1/16
Learning Intrusion Prevention Policies
Through Optimal Stopping
CDIS Research Workshop
Kim Hammar & Rolf Stadler
kimham@kth.se & stadler@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
Oct 14, 2021
2/16
Approach: Game Model & Reinforcement Learning
I Challenges:
I Evolving & automated attacks
I Complex infrastructures
I Our Goal:
I Automate security tasks
I Adapt to changing attack methods
I Our Approach:
I Model network attack and defense as
games.
I Use reinforcement learning to learn
policies.
I Incorporate learned policies in
self-learning systems.
[Figure: IT infrastructure topology with an attacker, clients, a defender, and a gateway with an IDS in front of application servers 1-31; the gateway sends alerts to the defender]
3/16
Use Case: Intrusion Prevention
I A Defender owns an infrastructure
I Consists of connected components
I Components run network services
I Defender defends the infrastructure
by monitoring and active defense
I An Attacker seeks to intrude on the
infrastructure
I Has a partial view of the
infrastructure
I Wants to compromise specific
components
I Attacks by reconnaissance,
exploitation and pivoting
We formulate this use case as an Optimal Stopping problem
5/16
Background on Optimal Stopping Problems
I The General Problem:
I A Markov process (s_t)_{t=1}^T is observed sequentially
I Two options per t: (i) continue to observe; or (ii) stop
I Find the optimal stopping time τ∗:

    τ∗ = arg max_τ E_τ [ Σ_{t=1}^{τ−1} γ^{t−1} R^C_{s_t s_{t+1}} + γ^{τ−1} R^S_{s_τ s_τ} ]    (1)

    where R^S_{ss′} and R^C_{ss′} are the stop/continue rewards
I History:
I Studied in the 18th century to analyze a gambler’s fortune
I Formalized by Abraham Wald in 1947 [1]
I Since then it has been generalized and developed by Chow [2], Shiryaev & Kolmogorov [3], Bather [4], Bertsekas [5], etc.
I Applications & Use Cases:
I Change detection [6], machine replacement, hypothesis testing, gambling, selling decisions [7], queue management [8], advertisement scheduling [9], the secretary problem, intrusion prevention [10], etc.
References:
[1] Abraham Wald. Sequential Analysis. Wiley and Sons, New York, 1947.
[2] Y. Chow, H. Robbins, and D. Siegmund. “Great expectations: The theory of optimal stopping”. 1971.
[3] Albert N. Shiryaev. Optimal Stopping Rules. Reprint of the Russian edition from 1969. Springer-Verlag, Berlin, 2007.
[4] John Bather. Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions. John Wiley and Sons, Inc., 2000. ISBN: 0471976490.
[5] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. 3rd ed. Vol. I. Belmont, MA, USA: Athena Scientific, 2005.
[6] Alexander G. Tartakovsky et al. “Detection of intrusions in information systems by sequential change-point methods”. Statistical Methodology 3.3 (2006). https://doi.org/10.1016/j.stamet.2005.05.003.
[7] Jacques du Toit and Goran Peskir. “Selling a stock at the ultimate maximum”. The Annals of Applied Probability 19.3 (2009). https://doi.org/10.1214/08-AAP566.
[8] Arghyadip Roy et al. “Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes”. CoRR (2019). http://arxiv.org/abs/1912.10325.
[9] Vikram Krishnamurthy, Anup Aprem, and Sujay Bhatt. “Multiple stopping time POMDPs: Structural results & application in interactive advertising on social media”. Automatica 95 (2018), pp. 385–398. https://doi.org/10.1016/j.automatica.2018.06.013.
[10] Kim Hammar and Rolf Stadler. Learning Intrusion Prevention Policies through Optimal Stopping. 2021. arXiv: 2106.07160 [cs.AI].
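To make problem (1) concrete, below is a minimal sketch that solves a small finite-horizon stopping problem of this form by backward induction (dynamic programming). The two-state chain, the rewards, and the discount factor are illustrative assumptions and are not the intrusion-prevention model introduced later in the talk.

```python
# Minimal sketch: solve a finite-horizon stopping problem of the form (1) by backward induction.
# The chain, rewards, and discount factor below are illustrative assumptions.
import numpy as np

T = 20                    # horizon
gamma = 0.95              # discount factor
P = np.array([[0.8, 0.2],             # transition matrix P[s' | s]; state 1 is absorbing
              [0.0, 1.0]])
R_C = np.array([[ 1.0,  1.0],         # continue rewards R^C_{s s'}
                [-2.0, -2.0]])
R_S = np.array([0.0, 10.0])           # stop rewards R^S_{s s}; stopping in state 1 pays off

V = np.zeros((T + 1, 2))              # V[t, s]: optimal expected reward-to-go at time t in state s
stop = np.zeros((T + 1, 2), dtype=bool)
for t in range(T - 1, -1, -1):
    for s in range(2):
        v_stop = R_S[s]                                  # reward if we stop now
        v_cont = P[s] @ (R_C[s] + gamma * V[t + 1])      # expected reward if we continue
        stop[t, s] = v_stop >= v_cont
        V[t, s] = max(v_stop, v_cont)

print("stop (True) or continue (False) per time-step and state:")
print(stop[:T])
```

With these illustrative numbers, the computed rule stops as soon as the chain reaches the absorbing state 1, which is the qualitative behaviour the intrusion-prevention use case aims for.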
6/16
Formulating Intrusion Prevention as a Stopping Problem
[Figure: episode timeline from time-step t = 1 to t = T, marking the intrusion event, the period when the intrusion is ongoing, early stopping times, and stopping times that affect the intrusion; infrastructure topology as before]
I Intrusion Prevention as Optimal Stopping Problem:
I The system evolves in discrete time-steps.
I Defender observes the infrastructure (IDS, log files, etc.).
I An intrusion occurs at an unknown time.
I The defender can make L stops.
I Each stop is associated with a defensive action.
I The final stop shuts down the infrastructure.
I Based on the observations, when is it optimal to stop?
I We formalize this problem with a POMDP
7/16
A Partially Observed MDP Model for the Defender
I States:
I Intrusion state s_t ∈ {0, 1}, terminal state ∅.
I Observations:
I Severe/Warning IDS alerts (∆x, ∆y), login attempts ∆z, stops remaining l_t ∈ {1, .., L}, with observation distribution f_XYZ(∆x, ∆y, ∆z | s_t, I_t, t)
I Actions:
I “Stop” (S) and “Continue” (C)
I Rewards:
I Reward: security and service. Penalty: false alarms and intrusions
I Transition probabilities:
I Bernoulli process (Q_t)_{t=1}^T ∼ Ber(p) defines the intrusion start time I_t ∼ Ge(p)
I Objective and Horizon:
I max_{π_θ} E_{π_θ}[ Σ_{t=1}^{T_∅} r(s_t, a_t) ], horizon T_∅
[Figure: state transition diagram over the intrusion state, with states 0 (no intrusion), 1 (intrusion ongoing), and the terminal state ∅, and transitions for the intrusion starting (Q_t = 1), the intrusion being prevented, the final stop (l_t = 0), the intrusion succeeding (t ≥ I_t + T_int), and the episode ending]
[Figure: CDF of the intrusion start time I_t ∼ Ge(p = 0.2)]
We analyze the optimal policy using optimal stopping theory
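Since the intrusion state s_t is hidden, the defender can summarize its observation history in a belief b_t(1) = P[s_t = 1 | o_1, . . . , o_t] and base its stopping decisions on it (the quantity b(1) also appears in the evaluation plots later). Below is a minimal sketch of the recursive belief update for this two-state model; the Poisson observation model is an illustrative assumption, whereas in the actual system the distribution f_XYZ is estimated from measurements of the emulation.

```python
# Minimal sketch of the defender's belief update b_t(1) = P[intrusion | o_1..o_t].
# Assumption: Poisson alert/login counts whose rates increase during an intrusion;
# the real observation distribution f_XYZ is estimated from the emulation.
import numpy as np
from scipy.stats import poisson

p = 0.2                                          # Bernoulli/geometric intrusion-start parameter
RATES_NO_INTRUSION = np.array([4.0, 2.0, 1.0])   # (severe alerts, warning alerts, login attempts)
RATES_INTRUSION    = np.array([20.0, 10.0, 4.0])

def obs_likelihood(obs, intrusion: bool) -> float:
    rates = RATES_INTRUSION if intrusion else RATES_NO_INTRUSION
    return float(np.prod(poisson.pmf(obs, rates)))

def belief_update(b1: float, obs) -> float:
    """One step of the Bayes filter for the two-state chain (state 1 is absorbing)."""
    pred1 = b1 + (1.0 - b1) * p          # predicted P[s_{t+1} = 1] before seeing the observation
    pred0 = (1.0 - b1) * (1.0 - p)
    w1 = pred1 * obs_likelihood(obs, True)
    w0 = pred0 * obs_likelihood(obs, False)
    return w1 / (w1 + w0)

# Example: a burst of alerts drives the belief towards 1.
b1 = 0.0
for obs in [(3, 2, 1), (5, 1, 0), (18, 9, 3), (25, 12, 5)]:
    b1 = belief_update(b1, np.array(obs))
    print(f"obs={obs} -> b(1)={b1:.3f}")
```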
8/16
Threshold Properties of the Optimal Defender Policy
h̃_t = ∆x_t + ∆y_t + ∆z_t
∆x = Severe IDS alerts at time t
∆y = Warning IDS alerts at time t
∆z = Login attempts at time t

π∗_l(h_t) = S ⟺ h̃_t ≥ β∗_l,   l ∈ 1, . . . , L

[Figure: the range of h̃_t (from 0 upwards) partitioned into stopping sets S_1, S_2, . . . , S_L with corresponding thresholds β∗_1, β∗_2, . . . , β∗_L]
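A minimal sketch of such a multi-threshold policy is given below; the threshold values β∗_l are made up for illustration, whereas in the paper they emerge from the optimal (or learned) policy.

```python
# Minimal sketch of the multi-threshold stopping policy pi*_l(h) = S  <=>  h >= beta*_l.
# The threshold values are made up for illustration; in the paper they are derived/learned.
BETAS = {3: 40.0, 2: 60.0, 1: 80.0}   # hypothetical thresholds beta*_l for l stops remaining

def defender_action(h_t: float, stops_remaining: int) -> str:
    """Return 'S' (stop) or 'C' (continue) given h_t = dx + dy + dz and l_t."""
    return "S" if h_t >= BETAS[stops_remaining] else "C"

# Example: a rising alert count triggers the three stops one after another.
l = 3
for h in [10, 35, 45, 55, 70, 90]:
    a = defender_action(h, l)
    print(f"h_t={h:>3}, l_t={l}: action={a}")
    if a == "S":
        l -= 1
        if l == 0:
            break
```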
9/16
Our Method for Finding Effective Security Strategies
[Figure: method overview. A real-world target infrastructure is selectively replicated in an emulation system; measurements from the emulation drive model creation & system identification for a simulation system; reinforcement learning & generalization in the simulation produces a policy π, which is evaluated in the emulation (policy evaluation & model estimation), mapped back (policy mapping π), and eventually implemented in the target infrastructure (policy implementation π), yielding automation & self-learning systems]
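In pseudocode form, the loop in the figure can be sketched as below; every function is a trivial stand-in so that the sketch runs, and none of the names correspond to the actual implementation.

```python
# Skeleton of the emulation/simulation loop in the figure above. Every function is a trivial
# stand-in so the sketch runs; none of the names correspond to the actual implementation.
import random

def replicate_selectively(infra):
    return {"emulation_of": infra}                    # emulation system (selective replication)

def run_emulation_episodes(emulation):
    return [random.random() for _ in range(100)]      # measurement traces (e.g., IDS alert counts)

def identify_system(traces):
    return {"alert_mean": sum(traces) / len(traces)}  # model creation & system identification

def build_simulation(model):
    return model                                      # simulation system driven by the estimated model

def reinforcement_learning(simulation):
    return {"threshold": simulation["alert_mean"]}    # learned defender policy (here: a threshold)

def evaluate_policy(emulation, policy):
    print("evaluating", policy, "in", emulation)      # policy evaluation & model estimation

def automation_loop(target_infrastructure, iterations=3):
    emulation = replicate_selectively(target_infrastructure)
    policy = None
    for _ in range(iterations):
        traces = run_emulation_episodes(emulation)
        model = identify_system(traces)
        simulation = build_simulation(model)
        policy = reinforcement_learning(simulation)
        evaluate_policy(emulation, policy)
    return policy                                     # policy implementation in the target infrastructure

automation_loop("target infrastructure")
```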
10/16
The Target Infrastructure
I Topology:
I 30 Application Servers, 1 Gateway/IDS (Snort), 3 Clients, 1 Attacker,
1 Defender
I Services
I 31 SSH, 8 HTTP, 1 DNS, 1 Telnet, 2 FTP, 1 MongoDB, 2 SMTP, 2
Teamspeak 3, 22 SNMP, 12 IRC, 1 Elasticsearch, 12 NTP, 1 Samba,
19 PostgreSQL
I RCE Vulnerabilities
I 1 CVE-2010-0426, 1 CVE-2014-6271, 1 SQL Injection, 1
CVE-2015-3306, 1 CVE-2016-10033, 1 CVE-2015-5602, 1
CVE-2015-1427, 1 CVE-2017-7494
I 5 Brute-force vulnerabilities
I Operating Systems
I 23 Ubuntu-20, 1 Debian 9.2, 1 Debian Wheezy, 6 Debian Jessie, 1 Kali
[Figure: target infrastructure, with the attacker, clients, defender, and a gateway with an IDS in front of the application servers (nodes 1-31)]
11/16
Emulating the Client Population
Client | Functions              | Application servers
1      | HTTP, SSH, SNMP, ICMP  | N2, N3, N10, N12
2      | IRC, PostgreSQL, SNMP  | N31, N13, N14, N15, N16
3      | FTP, DNS, Telnet       | N10, N22, N4
Table 1: Emulated client population; each client interacts with
application servers using a set of functions at short intervals.
12/16
Emulating the Defender’s Actions
l_t | Action         | Command in the emulation
3   | Reset users    | deluser --remove-home <username>
2   | Blacklist IPs  | iptables -A INPUT -s <ip> -j DROP
1   | Block gateway  | iptables -A INPUT -i eth0 -j DROP
Table 2: Commands used to implement the defender’s stop actions in the
emulation.
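For illustration, the mapping from the stops-remaining counter l_t to the commands in Table 2 could be dispatched as in the sketch below; the username and IP address are placeholders, and the real emulation manages the defender actions through its own management code.

```python
# Sketch: mapping the stops-remaining counter l_t to the emulation commands in Table 2.
# The username and IP below are placeholders; the real emulation manages these itself.
import subprocess

STOP_COMMANDS = {
    3: ["deluser", "--remove-home", "compromised_user"],                 # reset users
    2: ["iptables", "-A", "INPUT", "-s", "198.51.100.7", "-j", "DROP"],  # blacklist an IP
    1: ["iptables", "-A", "INPUT", "-i", "eth0", "-j", "DROP"],          # block the gateway interface
}

def execute_stop(stops_remaining: int, dry_run: bool = True) -> None:
    cmd = STOP_COMMANDS[stops_remaining]
    print("stop action:", " ".join(cmd))
    if not dry_run:                     # requires root privileges on the emulated host
        subprocess.run(cmd, check=True)

execute_stop(3)   # first stop (l_t = 3): reset users
```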
13/16
Static Attackers to Emulate Intrusions
Time-steps 1 - I_t ∼ Ge(0.2): intrusion has not started (all attackers).

NoviceAttacker:
  I_t+1 - I_t+6:   Recon1, brute-force attacks (SSH, Telnet, FTP) on N2, N4, N10, login(N2, N4, N10), backdoor(N2, N4, N10)
  I_t+7 - I_t+10:  Recon1, CVE-2014-6271 on N17, login(N17), backdoor(N17)
  I_t+11 - I_t+14: SSH brute-force attack on N12, login(N12), CVE-2010-0426 exploit on N12, Recon1

ExperiencedAttacker:
  I_t+1 - I_t+6:   Recon2, CVE-2017-7494 exploit on N4, brute-force attack (SSH) on N2, login(N2, N4), backdoor(N2, N4), Recon2
  I_t+7 - I_t+10:  CVE-2014-6271 on N17, login(N17), backdoor(N17), SSH brute-force attack on N12
  I_t+11 - I_t+14: login(N12), CVE-2010-0426 exploit on N12, Recon2, SQL Injection on N18
  I_t+15 - I_t+16: login(N18), backdoor(N18)
  I_t+17 - I_t+19: Recon2, CVE-2015-1427 on N25, login(N25)

ExpertAttacker:
  I_t+1 - I_t+6:   Recon3, CVE-2017-7494 exploit on N4, login(N4), backdoor(N4), Recon3, SQL Injection on N18
  I_t+7 - I_t+10:  login(N18), backdoor(N18), Recon3, CVE-2015-1427 on N25
  I_t+11 - I_t+14: login(N25), backdoor(N25), Recon3, CVE-2017-7494 exploit on N27
  I_t+15 - I_t+16: login(N27), backdoor(N27)
Table 3: Attacker actions to emulate intrusions.
14/16
Learning Intrusion Prevention Policies through Optimal
Stopping
[Figure: learning curves against the Novice, Experienced, and Expert attackers; panels show reward per episode, episode length (steps), P[intrusion interrupted], P[early stopping], and duration of intrusion as functions of the number of training episodes (0-750K); curves: π_θ emulation, π_θ simulation, the t = 6 baseline, the (∆x + ∆y) ≥ 1 baseline, and the upper bound π∗]
Learning curves of training defender policies against static attackers,
L = 3.
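The curves above are obtained by training π_θ with reinforcement learning in the simulation system and evaluating it in the emulation system. As a self-contained toy illustration of the same idea, the sketch below learns a single stopping threshold with REINFORCE from synthetic episodes; the alert model, rewards, and hyper-parameters are made up and do not correspond to the experiments in the figure.

```python
# Toy sketch: learn a single stopping threshold with REINFORCE on synthetic episodes.
# Alert model, rewards, and hyper-parameters are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)
p_start, horizon, episodes, lr = 0.2, 30, 3000, 0.005

def run_episode(theta):
    """Simulate one episode with the soft threshold policy pi(S|h) = sigmoid((h - theta)/5)."""
    intrusion, grad, ret = False, 0.0, 0.0
    for _ in range(horizon):
        intrusion = intrusion or rng.random() < p_start
        h = rng.poisson(25.0 if intrusion else 8.0)     # synthetic total alert count
        p_stop = 1.0 / (1.0 + np.exp(-(h - theta) / 5.0))
        if rng.random() < p_stop:                       # stop
            grad += (p_stop - 1.0) / 5.0                # d/dtheta log pi(S|h)
            ret += 100.0 if intrusion else -50.0        # stopped intrusion vs. false alarm
            return grad, ret
        grad += p_stop / 5.0                            # d/dtheta log pi(C|h)
        ret += -1.0 if intrusion else 1.0               # intrusion penalty / service reward
    return grad, ret - (100.0 if intrusion else 0.0)    # undetected intrusion is penalized

theta, baseline = 0.0, 0.0
for _ in range(episodes):
    grad, ret = run_episode(theta)
    baseline += 0.01 * (ret - baseline)                 # running-average baseline
    theta += lr * grad * (ret - baseline)               # REINFORCE update
print(f"learned threshold ≈ {theta:.1f}")
```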
15/16
Threshold Properties of the Learned Policies, L = 3
[Figure: stopping probabilities of the learned policies π_θ^{l_t=3}(S), π_θ^{l_t=2}(S), π_θ^{l_t=1}(S) and the belief b(1) as functions of the total number of alerts x + y (for z = 0 and t = 5 - l_t, t = 10 - l_t), against the Novice, Experienced, and Expert attackers; annotations mark where the first, second, and third stops occur, which stops are not used, and that all stops are used when x + y ≥ 40]
16/16
Conclusions & Future Work
I Conclusions:
I We develop a method to learn intrusion prevention policies
I (1) emulation system; (2) system identification; (3) simulation system; (4) reinforcement
learning and (5) domain randomization and generalization.
I We formulate intrusion prevention as a multiple stopping
problem
I We present a POMDP model of the use case
I We apply optimal stopping theory to establish structural results for the optimal policy
I Our research plans:
I Extending the theoretical model
I Relaxing simplifying assumptions (e.g. more dynamic defender actions)
I Active attacker
I Evaluation on real-world infrastructures
More Related Content

Similar to Learning Intrusion Prevention Policies Through Optimal Stopping

Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayKim Hammar
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Kim Hammar
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingKim Hammar
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...Kim Hammar
 
Nse seminar 4_dec_hammar_stadler
Nse seminar 4_dec_hammar_stadlerNse seminar 4_dec_hammar_stadler
Nse seminar 4_dec_hammar_stadlerKim Hammar
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseKim Hammar
 
Monitoring your organization against threats - Critical System Control
Monitoring your organization against threats - Critical System ControlMonitoring your organization against threats - Critical System Control
Monitoring your organization against threats - Critical System ControlMarc-Andre Heroux
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Huntpedia
HuntpediaHuntpedia
HuntpediaJc Sv
 
ITD BSides PDX Slides
ITD BSides PDX SlidesITD BSides PDX Slides
ITD BSides PDX SlidesEricGoldstrom
 
Lecture #8: Clark-Wilson & Chinese Wall Model for Multilevel Security
Lecture #8: Clark-Wilson & Chinese Wall Model for Multilevel SecurityLecture #8: Clark-Wilson & Chinese Wall Model for Multilevel Security
Lecture #8: Clark-Wilson & Chinese Wall Model for Multilevel SecurityDr. Ramchandra Mangrulkar
 
Smau Bologna 2013 Stefano Zanero
Smau Bologna 2013 Stefano ZaneroSmau Bologna 2013 Stefano Zanero
Smau Bologna 2013 Stefano ZaneroSMAU
 
huntpedia.pdf
huntpedia.pdfhuntpedia.pdf
huntpedia.pdfCecilSu
 
Coco co-desing and co-verification of masked software implementations on cp us
Coco   co-desing and co-verification of masked software implementations on cp usCoco   co-desing and co-verification of masked software implementations on cp us
Coco co-desing and co-verification of masked software implementations on cp usRISC-V International
 
[Bucharest] Attack is easy, let's talk defence
[Bucharest] Attack is easy, let's talk defence[Bucharest] Attack is easy, let's talk defence
[Bucharest] Attack is easy, let's talk defenceOWASP EEE
 

Similar to Learning Intrusion Prevention Policies Through Optimal Stopping (20)

Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
 
Nse seminar 4_dec_hammar_stadler
Nse seminar 4_dec_hammar_stadlerNse seminar 4_dec_hammar_stadler
Nse seminar 4_dec_hammar_stadler
 
10probs.ppt
10probs.ppt10probs.ppt
10probs.ppt
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
The Edge to AI
The Edge to AIThe Edge to AI
The Edge to AI
 
Monitoring your organization against threats - Critical System Control
Monitoring your organization against threats - Critical System ControlMonitoring your organization against threats - Critical System Control
Monitoring your organization against threats - Critical System Control
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
The red book
The red book  The red book
The red book
 
Huntpedia
HuntpediaHuntpedia
Huntpedia
 
ITD BSides PDX Slides
ITD BSides PDX SlidesITD BSides PDX Slides
ITD BSides PDX Slides
 
Lecture #8: Clark-Wilson & Chinese Wall Model for Multilevel Security
Lecture #8: Clark-Wilson & Chinese Wall Model for Multilevel SecurityLecture #8: Clark-Wilson & Chinese Wall Model for Multilevel Security
Lecture #8: Clark-Wilson & Chinese Wall Model for Multilevel Security
 
Smau Bologna 2013 Stefano Zanero
Smau Bologna 2013 Stefano ZaneroSmau Bologna 2013 Stefano Zanero
Smau Bologna 2013 Stefano Zanero
 
huntpedia.pdf
huntpedia.pdfhuntpedia.pdf
huntpedia.pdf
 
Coco co-desing and co-verification of masked software implementations on cp us
Coco   co-desing and co-verification of masked software implementations on cp usCoco   co-desing and co-verification of masked software implementations on cp us
Coco co-desing and co-verification of masked software implementations on cp us
 
[Bucharest] Attack is easy, let's talk defence
[Bucharest] Attack is easy, let's talk defence[Bucharest] Attack is easy, let's talk defence
[Bucharest] Attack is easy, let's talk defence
 

More from Kim Hammar

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesKim Hammar
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHKim Hammar
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion ResponseKim Hammar
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlKim Hammar
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionKim Hammar
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security AutomationKim Hammar
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Kim Hammar
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Kim Hammar
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Kim Hammar
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Kim Hammar
 
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...Kim Hammar
 
Självlärande system för cybersäkerhet
Självlärande system för cybersäkerhetSjälvlärande system för cybersäkerhet
Självlärande system för cybersäkerhetKim Hammar
 
MuZero - ML + Security Reading Group
MuZero - ML + Security Reading GroupMuZero - ML + Security Reading Group
MuZero - ML + Security Reading GroupKim Hammar
 

More from Kim Hammar (16)

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion Response
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security Automation
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
 
Självlärande system för cybersäkerhet
Självlärande system för cybersäkerhetSjälvlärande system för cybersäkerhet
Självlärande system för cybersäkerhet
 
MuZero - ML + Security Reading Group
MuZero - ML + Security Reading GroupMuZero - ML + Security Reading Group
MuZero - ML + Security Reading Group
 

Recently uploaded

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Learning Intrusion Prevention Policies Through Optimal Stopping

  • 1. 1/16 Learning Intrusion Prevention Policies Through Optimal Stopping CDIS Research Workshop Kim Hammar & Rolf Stadler kimham@kth.se & stadler@kth.se Division of Network and Systems Engineering KTH Royal Institute of Technology Oct 14, 2021
  • 2. 2/16 Challenges: Evolving and Automated Attacks I Challenges: I Evolving & automated attacks I Complex infrastructures Attacker Clients . . . Defender 1 IDS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 3. 2/16 Goal: Automation and Learning I Challenges I Evolving & automated attacks I Complex infrastructures I Our Goal: I Automate security tasks I Adapt to changing attack methods Attacker Clients . . . Defender 1 IDS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 4. 2/16 Approach: Game Model & Reinforcement Learning I Challenges: I Evolving & automated attacks I Complex infrastructures I Our Goal: I Automate security tasks I Adapt to changing attack methods I Our Approach: I Model network attack and defense as games. I Use reinforcement learning to learn policies. I Incorporate learned policies in self-learning systems. Attacker Clients . . . Defender 1 IDS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 5. 3/16 Use Case: Intrusion Prevention I A Defender owns an infrastructure I Consists of connected components I Components run network services I Defender defends the infrastructure by monitoring and active defense I An Attacker seeks to intrude on the infrastructure I Has a partial view of the infrastructure I Wants to compromise specific components I Attacks by reconnaissance, exploitation and pivoting Attacker Clients . . . Defender 1 IDS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 6. 4/16 Use Case: Intrusion Prevention I A Defender owns an infrastructure I Consists of connected components I Components run network services I Defender defends the infrastructure by monitoring and active defense I An Attacker seeks to intrude on the infrastructure I Has a partial view of the infrastructure I Wants to compromise specific components I Attacks by reconnaissance, exploitation and pivoting Attacker Clients . . . Defender 1 IDS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31 We formulate this use case as an Optimal Stopping problem
  • 7. 5/16 Background on Optimal Stopping Problems I The General Problem: I A Markov process (st)T t=1 is observed sequentially I Two options per t: (i) continue to observe; or (ii) stop I Find the optimal stopping time τ∗ : τ∗ = arg max τ Eτ " τ−1 X t=1 γt−1 RC st st+1 + γτ−1 RS sτ sτ # (1) where RS ss0 & RC ss0 are the stop/continue rewards I History: I Studied in the 18th century to analyze a gambler’s fortune I Formalized by Abraham Wald in 1947 I Since then it has been generalized and developed by (Chow, Shiryaev & Kolmogorov, Bather, Bertsekas, etc.) I Applications & Use Cases: I Change detection, machine replacement, hypothesis testing, gambling, selling decisions, queue management, advertisement scheduling, the secretary problem, etc.
  • 8. 5/16 Background on Optimal Stopping Problems I The General Problem: I A Markov process (st)T t=1 is observed sequentially I Two options per t: (i) continue to observe; or (ii) stop I History: I Studied in the 18th century to analyze a gambler’s fortune I Formalized by Abraham Wald in 19471 I Since then it has been generalized and developed by (Chow2 , Shiryaev & Kolmogorov3 , Bather4 , Bertsekas5 , etc.) 1 Abraham Wald. Sequential Analysis. Wiley and Sons, New York, 1947. 2 Y. Chow, H. Robbins, and D. Siegmund. “Great expectations: The theory of optimal stopping”. In: 1971. 3 Albert N. Shirayev. Optimal Stopping Rules. Reprint of russian edition from 1969. Springer-Verlag Berlin, 2007. 4 John Bather. Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions. USA: John Wiley and Sons, Inc., 2000. isbn: 0471976490. 5 Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. 3rd. Vol. I. Belmont, MA, USA: Athena Scientific, 2005.
  • 9. 5/16 Background on Optimal Stopping Problems I The General Problem: I A Markov process (st)T t=1 is observed sequentially I Two options per t: (i) continue to observe; or (ii) stop I History: I Studied in the 18th century to analyze a gambler’s fortune I Formalized by Abraham Wald in 1947 I Since then it has been generalized and developed by (Chow, Shiryaev & Kolmogorov, Bather, Bertsekas, etc.) I Applications & Use Cases: I Change detection6 , selling decisions7 , queue management8 , advertisement scheduling9 , etc. 6 Alexander G. Tartakovsky et al. “Detection of intrusions in information systems by sequential change-point methods”. In: Statistical Methodology 3.3 (2006). issn: 1572-3127. doi: https://doi.org/10.1016/j.stamet.2005.05.003. url: https://www.sciencedirect.com/science/article/pii/S1572312705000493. 7 Jacques du Toit and Goran Peskir. “Selling a stock at the ultimate maximum”. In: The Annals of Applied Probability 19.3 (2009). issn: 1050-5164. doi: 10.1214/08-aap566. url: http://dx.doi.org/10.1214/08-AAP566. 8 Arghyadip Roy et al. “Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes”. In: CoRR (2019). http://arxiv.org/abs/1912.10325. eprint: 1912.10325. 9 Vikram Krishnamurthy, Anup Aprem, and Sujay Bhatt. “Multiple stopping time POMDPs: Structural results & application in interactive advertising on social media”. In: Automatica 95 (2018), pp. 385–398. issn: 0005-1098. doi: https://doi.org/10.1016/j.automatica.2018.06.013. url: https://www.sciencedirect.com/science/article/pii/S0005109818303054.
  • 10. 5/16 Background on Optimal Stopping Problems I The General Problem: I A Markov process (st )T t=1 is observed sequentially I Two options per t: (i) continue to observe; or (ii) stop I History: I Studied in the 18th century to analyze a gambler’s fortune I Formalized by Abraham Wald in 1947 I Since then it has been generalized and developed by (Chow, Shiryaev & Kolmogorov, Bather, Bertsekas, etc.) I Applications & Use Cases: I Change detection10 , selling decisions11 , queue management12 , advertisement scheduling13 , intrusion prevention14 etc. 10 Alexander G. Tartakovsky et al. “Detection of intrusions in information systems by sequential change-point methods”. In: Statistical Methodology 3.3 (2006). issn: 1572-3127. doi: https://doi.org/10.1016/j.stamet.2005.05.003. url: https://www.sciencedirect.com/science/article/pii/S1572312705000493. 11 Jacques du Toit and Goran Peskir. “Selling a stock at the ultimate maximum”. In: The Annals of Applied Probability 19.3 (2009). issn: 1050-5164. doi: 10.1214/08-aap566. url: http://dx.doi.org/10.1214/08-AAP566. 12 Arghyadip Roy et al. “Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes”. In: CoRR (2019). http://arxiv.org/abs/1912.10325. eprint: 1912.10325. 13 Vikram Krishnamurthy, Anup Aprem, and Sujay Bhatt. “Multiple stopping time POMDPs: Structural results & application in interactive advertising on social media”. In: Automatica 95 (2018), pp. 385–398. issn: 0005-1098. doi: https://doi.org/10.1016/j.automatica.2018.06.013. url: https://www.sciencedirect.com/science/article/pii/S0005109818303054. 14 Kim Hammar and Rolf Stadler. Learning Intrusion Prevention Policies through Optimal Stopping. 2021. arXiv: 2106.07160 [cs.AI].
6/16
Formulating Intrusion Prevention as a Stopping Problem
[Figure: episode timeline from t = 1 to t = T. The intrusion event splits the episode into early stopping times (before the intrusion) and stopping times that affect the intrusion; next to it, the IT infrastructure with attacker, clients, defender, gateway IDS, and application servers.]
I Intrusion Prevention as an Optimal Stopping Problem:
I The system evolves in discrete time-steps.
I The defender observes the infrastructure (IDS, log files, etc.).
I An intrusion occurs at an unknown time.
I The defender can make L stops.
I Each stop is associated with a defensive action.
I The final stop shuts down the infrastructure.
I Based on the observations, when is it optimal to stop?
I We formalize this problem with a POMDP (a simulation sketch of the episode structure follows below).
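As a rough illustration of this episode structure, the following Python sketch samples a geometric intrusion start time and counts early stops versus stops that affect the intrusion. The parameter values and the random stop behaviour are assumptions made only for illustration; they are not the defender policy studied here.

```python
import numpy as np

# Toy simulation of one episode: the intrusion start time is geometric,
# the defender has L stops, and (only for illustration) the defender
# stops at uniformly random times.
def simulate_episode(p=0.2, T=100, L=3, stop_prob=0.05, seed=None):
    rng = np.random.default_rng(seed)
    intrusion_start = rng.geometric(p)       # I_t ~ Ge(p)
    stops_remaining = L
    early_stops, effective_stops = 0, 0
    for t in range(1, T + 1):
        if stops_remaining > 0 and rng.random() < stop_prob:
            stops_remaining -= 1
            if t < intrusion_start:
                early_stops += 1             # false alarm: stop before the intrusion
            else:
                effective_stops += 1         # stop during the intrusion
        if stops_remaining == 0:             # the final stop shuts the system down
            break
    return intrusion_start, early_stops, effective_stops

print(simulate_episode(seed=1))
```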
7/16
A Partially Observed MDP Model for the Defender
I States:
I Intrusion state s_t ∈ {0, 1}; terminal state ∅.
I Observations:
I Severe/warning IDS alerts (∆x, ∆y), login attempts ∆z, and stops remaining l_t ∈ {1, .., L}; observation distribution f_XYZ(∆x, ∆y, ∆z | s_t, I_t, t).
I Actions:
I "Stop" (S) and "Continue" (C).
I Rewards:
I Reward for security and service; penalty for false alarms and intrusions.
I Transition probabilities:
I A Bernoulli process (Q_t)_{t=1}^T ∼ Ber(p) defines the intrusion start time I_t ∼ Ge(p).
I Objective and Horizon:
I max_θ E_{π_θ}[ Σ_{t=1}^{T_∅} r(s_t, a_t) ], where T_∅ is the random time at which the terminal state ∅ is reached.
[Figure: state transition diagram over the states 0, 1, and ∅, with transitions labeled by the intrusion start (Q_t = 1), the remaining stops l_t, the final stop (l_t = 0), and the intrusion succeeding (t ≥ I_t + T_int); and the CDF of the intrusion start time I_t ∼ Ge(p = 0.2).]
We analyze the optimal policy using optimal stopping theory (a belief-update sketch follows below).
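The defender only observes alert counts, so any concrete implementation has to track a belief about the hidden intrusion state. Below is a hedged sketch of such a belief update for the two-state model; the Poisson alert rates and the numeric example are assumptions for illustration, not the observation distribution estimated in this work.

```python
from scipy.stats import poisson

# Sketch of the belief update b_t = P[s_t = 1 | o_1, ..., o_t] for a
# two-state intrusion model with an absorbing intrusion state.
# p and the Poisson alert rates are hypothetical illustration values.
p = 0.2                       # probability that the intrusion starts next step
rates = {0: 2.0, 1: 15.0}     # mean IDS alerts per step, per intrusion state

def belief_update(b, n_alerts):
    b_pred = b + (1.0 - b) * p                       # predict: 0 -> 1 w.p. p
    lik0 = poisson.pmf(n_alerts, rates[0]) * (1.0 - b_pred)
    lik1 = poisson.pmf(n_alerts, rates[1]) * b_pred
    return lik1 / (lik0 + lik1)                      # correct with the observation

b = 0.0
for n in [1, 3, 2, 12, 18, 20]:                      # hypothetical alert counts
    b = belief_update(b, n)
    print(f"alerts={n:2d}  belief={b:.3f}")
```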
8/16
Threshold Properties of the Optimal Defender Policy
I Aggregate observation: h̃_t = ∆x_t + ∆y_t + ∆z_t, where
I ∆x = severe IDS alerts at time t
I ∆y = warning IDS alerts at time t
I ∆z = login attempts at time t
I The optimal policy has a multi-threshold structure:
I π*_l(h_t) = S ⟺ h̃_t ≥ β*_l, for l ∈ {1, . . . , L}
[Figure: the h̃_t axis partitioned by the thresholds β*_1, β*_2, . . . , β*_L into stopping regions S_1, S_2, . . . , S_L.]
(A minimal policy sketch follows below.)
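A minimal sketch of such a multi-threshold policy in Python. The function name, the concrete threshold values, and their ordering are assumptions made for illustration (in this work the thresholds are learned), not values from the evaluation.

```python
# Illustrative multi-threshold stopping policy. beta[l-1] is the threshold
# applied when l stops remain; the values below are hypothetical.
def threshold_policy(delta_x, delta_y, delta_z, stops_remaining, beta=(120, 80, 40)):
    h = delta_x + delta_y + delta_z                  # aggregate statistic h~_t
    return "S" if h >= beta[stops_remaining - 1] else "C"

# With all three stops left the (assumed) threshold is low, so the defender
# takes an early defensive action; with one stop left it waits for stronger
# evidence before shutting down the infrastructure.
print(threshold_policy(30, 10, 2, stops_remaining=3))   # 'S' (42 >= 40)
print(threshold_policy(30, 10, 2, stops_remaining=1))   # 'C' (42 < 120)
```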
9/16
Our Method for Finding Effective Security Strategies
[Figure: the method's pipeline. The real-world target infrastructure is selectively replicated in an emulation system, which is used for policy evaluation and model estimation; model creation and system identification yield a simulation system, where reinforcement learning and generalization take place; learned policies are mapped back to the emulation (policy mapping π) and eventually implemented in the infrastructure (policy implementation π), closing the loop toward automation and self-learning systems.]
10/16
The Target Infrastructure
I Topology:
I 30 application servers, 1 gateway/IDS (Snort), 3 clients, 1 attacker, 1 defender
I Services:
I 31 SSH, 8 HTTP, 1 DNS, 1 Telnet, 2 FTP, 1 MongoDB, 2 SMTP, 2 Teamspeak 3, 22 SNMP, 12 IRC, 1 Elasticsearch, 12 NTP, 1 Samba, 19 PostgreSQL
I RCE Vulnerabilities:
I 1 CVE-2010-0426, 1 CVE-2014-6271, 1 SQL Injection, 1 CVE-2015-3306, 1 CVE-2016-10033, 1 CVE-2015-5602, 1 CVE-2015-1427, 1 CVE-2017-7494
I 5 brute-force vulnerabilities
I Operating Systems:
I 23 Ubuntu 20, 1 Debian 9.2, 1 Debian Wheezy, 6 Debian Jessie, 1 Kali
[Figure: target infrastructure topology.]
11/16
Emulating the Client Population

Client   Functions                  Application servers
1        HTTP, SSH, SNMP, ICMP      N2, N3, N10, N12
2        IRC, PostgreSQL, SNMP      N31, N13, N14, N15, N16
3        FTP, DNS, Telnet           N10, N22, N4

Table 1: Emulated client population; each client interacts with application servers using a set of functions at short intervals.
12/16
Emulating the Defender's Actions

l_t   Action           Command in the emulation
3     Reset users      deluser --remove-home <username>
2     Blacklist IPs    iptables -A INPUT -s <ip> -j DROP
1     Block gateway    iptables -A INPUT -i eth0 -j DROP

Table 2: Commands used to implement the defender's stop actions in the emulation (a dispatch sketch follows below).
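How these commands could be dispatched programmatically in an emulation is sketched below. Only the commands themselves come from Table 2; the wrapper function, its dry_run flag, and the example IP address are hypothetical.

```python
import subprocess

# Mapping from stops remaining (l_t) to the Table 2 commands; the wrapper
# around them is an illustrative assumption, not the emulation's actual code.
STOP_COMMANDS = {
    3: "deluser --remove-home {username}",      # reset users
    2: "iptables -A INPUT -s {ip} -j DROP",     # blacklist an IP
    1: "iptables -A INPUT -i eth0 -j DROP",     # block the gateway
}

def execute_stop(stops_remaining, dry_run=True, **params):
    cmd = STOP_COMMANDS[stops_remaining].format(**params)
    if dry_run:
        print(f"would run: {cmd}")              # avoid touching the host
        return None
    return subprocess.run(cmd.split(), check=True)

execute_stop(2, ip="198.51.100.7")              # documentation-range example IP
```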
13/16
Static Attackers to Emulate Intrusions

Before the intrusion starts (t = 1, .., I_t, with I_t ∼ Ge(0.2)), no attacker actions are taken. Afterwards, the three attacker profiles act as follows.

Novice attacker:
I t = I_t+1 .. I_t+6: Recon1, brute-force attacks (SSH, Telnet, FTP) on N2, N4, N10, login(N2, N4, N10), backdoor(N2, N4, N10)
I t = I_t+7 .. I_t+10: Recon1, CVE-2014-6271 on N17, login(N17), backdoor(N17)
I t = I_t+11 .. I_t+14: SSH brute-force attack on N12, login(N12), CVE-2010-0426 exploit on N12, Recon1

Experienced attacker:
I t = I_t+1 .. I_t+6: Recon2, CVE-2017-7494 exploit on N4, brute-force attack (SSH) on N2, login(N2, N4), backdoor(N2, N4), Recon2
I t = I_t+7 .. I_t+10: CVE-2014-6271 on N17, login(N17), backdoor(N17), SSH brute-force attack on N12
I t = I_t+11 .. I_t+14: login(N12), CVE-2010-0426 exploit on N12, Recon2, SQL Injection on N18
I t = I_t+15 .. I_t+16: login(N18), backdoor(N18)
I t = I_t+17 .. I_t+19: Recon2, CVE-2015-1427 on N25, login(N25)

Expert attacker:
I t = I_t+1 .. I_t+6: Recon3, CVE-2017-7494 exploit on N4, login(N4), backdoor(N4), Recon3, SQL Injection on N18
I t = I_t+7 .. I_t+10: login(N18), backdoor(N18), Recon3, CVE-2015-1427 on N25
I t = I_t+11 .. I_t+14: login(N25), backdoor(N25), Recon3, CVE-2017-7494 exploit on N27
I t = I_t+15 .. I_t+16: login(N27), backdoor(N27)

Table 3: Attacker actions used to emulate intrusions.
14/16
Learning Intrusion Prevention Policies through Optimal Stopping
[Figure: learning curves of training defender policies against the static Novice, Experienced, and Expert attackers, L = 3. For each attacker the panels show, as a function of the number of training episodes: reward per episode, episode length (steps), P[intrusion interrupted], P[early stopping], and duration of intrusion. The curves compare π_θ in emulation and simulation with a t = 6 baseline, a (∆x + ∆y) ≥ 1 baseline, and the upper bound π*.]
15/16
Threshold Properties of the Learned Policies, L = 3
[Figure: probability of stopping π_θ^{l_t}(S) for l_t ∈ {1, 2, 3}, and the belief b(1), as a function of the total number of alerts x + y (with z = 0 and t = 5 − l_t or t = 10 − l_t). Against the Novice attacker only the first stop is used; against the Experienced attacker the first and second stops are used but not the third; against the Expert attacker all stops are used when x + y ≥ 40.]
16/16
Conclusions & Future Work
I Conclusions:
I We develop a method to learn intrusion prevention policies:
I (1) emulation system; (2) system identification; (3) simulation system; (4) reinforcement learning; and (5) domain randomization and generalization.
I We formulate intrusion prevention as a multiple stopping problem.
I We present a POMDP model of the use case.
I We apply stopping theory to establish structural results about the optimal policy.
I Our research plans:
I Extending the theoretical model.
I Relaxing simplifying assumptions (e.g., more dynamic defender actions).
I Active attacker.
I Evaluation on real-world infrastructures.