SlideShare a Scribd company logo
1 of 36
Download to read offline
0/24
Automated Security Response through
Online Learning with Adaptive Conjectures1
Kim Hammar, Tao Li, Rolf Stadler, & Quanyan Zhu
kimham@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
April 5, 2024
1
Kim Hammar, Tao Li, Rolf Stadler, and Quanyan Zhu. Automated Security Response through Online Learning
with Adaptive Conjectures. Submitted to the IEEE, https://arxiv.org/abs/2402.12499. 2024.
1/24
Challenge: IT Systems are Complex
I It is not realistic that any model will capture all the details.
I =⇒ We have to work with approximate models.
I =⇒ model misspecification.
I How does misspecification affect optimality and convergence?
2/24
Our Contribution: Conjectural Online Learning
πA,1
πD,1
Conjecture ρ
(A)
1
Conjecture ρ
(D)
1
πA,2
πD,2
Conjecture ρ
(A)
2
Conjecture ρ
(D)
2
. . .
πA,t
πD,t
Conjecture ρ
(A)
t
Conjecture ρ
(D)
t
Learning Learning
3/24
Use Case: Security Response
I A defender owns an infrastructure.
I Defends the infrastructure by
monitoring and response.
I Has partial observability.
I An attacker seeks to intrude on the
infrastructure.
I Wants to compromise specific
components.
I Attacks by reconnaissance,
exploitation and pivoting.
Attacker Clients
. . .
Defender
1 ids
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
4/24
Prior Work
I Assumes a stationary model with no misspecification
I Limitation: fails to capture many real-world systems.
I Focuses on offline computation of defender strategies
I Limitation: computationally intractable for realistic models.
1 2 3
0
1,000
2,000
3,000
# Servers N
Time
(s)
(a) |O| = 10.
10 100 1000
0
2,000
4,000
6,000
8,000
Observation space size |O|
Time
(s)
(b) N = 1.
Time required to compute a perfect Bayesian equilibrium with hsvi.
5/24
Problem
I Partially-observed stochastic game Γθt .
I Γθt is parameterized by θt, which is hidden.
I Player k has a conjecture of θt, denoted by θt ∈ Θk.
I The player is misspecified if θt 6∈ Θk.
I As θt evolves, the player adapts its conjecture.
I The player uses the conjecture to update its strategy πk,t.
I What is an effective method to update conjectures and
strategies?
6/24
Our Method: Conjectural Online Learning
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t
The conjecture distribution ρ
(k)
t is calibrated through Bayesian
learning and the strategy πk,t is updated through rollout.
6/24
Our Method: Conjectural Online Learning
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t
The conjecture distribution ρ
(k)
t is calibrated through Bayesian
learning and the strategy πk,t is updated through rollout.
6/24
Our Method: Conjectural Online Learning
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t
The conjecture distribution ρ
(k)
t is calibrated through Bayesian
learning and the strategy πk,t is updated through rollout.
6/24
Our Method: Conjectural Online Learning
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t
The conjecture distribution ρ
(k)
t is calibrated through Bayesian
learning and the strategy πk,t is updated through rollout.
6/24
Our Method: Conjectural Online Learning
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t
The conjecture distribution ρ
(k)
t is calibrated through Bayesian
learning and the strategy πk,t is updated through rollout.
6/24
Our Method: Conjectural Online Learning
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t
The conjecture distribution ρ
(k)
t is calibrated through Bayesian
learning and the strategy πk,t is updated through rollout.
7/24
Strategy Adaptation through Conjectural Rollout (1/2)
7/24
Strategy Adaptation through Conjectural Rollout (1/2)
a
(k)
t
bt
Conjectured lookahead tree of player k.
7/24
Strategy Adaptation through Conjectural Rollout (1/2)
a
(k)
t
c
(k)
t
bt
bt+1
Conjectured lookahead tree of player k.
7/24
Strategy Adaptation through Conjectural Rollout (1/2)
a
(k)
t
c
(k)
t
bt
bt+1
Conjectured lookahead tree of player k.
7/24
Strategy Adaptation through Conjectural Rollout (1/2)
a
(k)
t
c
(k)
t
bt
bt+1
T T T
T
T
T
T
Conjectured lookahead tree of player k
7/24
Strategy Adaptation through Conjectural Rollout (1/2)
a
(k)
t
c
(k)
t
bt
bt+1
arg mina
(k)
t
T T T
T
T
T
T
`k-step rollout.
8/24
Strategy Adaptation through Conjectural Rollout (2/2)
`k-step rollout based on the conjectured model:
πk,t(bt) ∈ R(θ
(k)
t , bt, J
(πt )
k , `k) , arg min
a
(k)
t ,a
(k)
t+1,...,a
(k)
t+`k−1
(1)
Eπt


t+`k−1
X
j=t
γj−t
ck(Sj, A
(D)
j ) + γ`k
J
(πt )
k (Bt+`k
) | bt

 .
I θ
(k)
t is the model conjecture.
I ck is the cost function.
I J
πt
k is the conjectured cost-to-go under strategy profile πt.
I bt is the current belief state.
9/24
Performance Guarantees of Rollout (1/2)
Theorem
The conjectured cost of player k’s rollout strategy πk,t satisfies
J
(πk,t ,π−k,t )
k (b) ≤ J
(πk,1,π−k,t )
k (b) ∀b ∈ B. (A)
Intuition:
I The rollout policy improves the base policy in the
conjectured model (A).
9/24
Performance Guarantees of Rollout (1/2)
Theorem
The conjectured cost of player k’s rollout strategy πk,t satisfies
J
(πk,t ,π−k,t )
k (b) ≤ J
(πk,1,π−k,t )
k (b) ∀b ∈ B. (A)
Assuming (θ
(k)
t , `−k) predicts the game `k steps ahead, then
kJ
(πk,t ,π−k,t )
k − J?
k k ≤
2γ`k
1 − γ
kJ
(πk,1,π−k,t )
k − J?
k k, (B)
where J?
k is the optimal cost-to-go. k·k is the maximum norm
kJk = maxx |J(x)|.
Intuition:
I The rollout policy improves the base policy in the
conjectured model (A).
I If the conjectured model is wrong but can predict the next `k
steps, then we can bound the performance (B).
10/24
Performance Guarantees of Rollout (2/2)
The performance bound
kJ
(πk,t ,π−k,t )
k − J?
k k ≤
2γ`k
1 − γ
kJ
(πk,1,π−k,t )
k − J?
k k, (B)
improves superlinearly with the lookahead horizon `k.
20 40 60 80 100 120 140 160 180 200
50
100
150
200 γ = 0.99 γ = 0.985 γ = 0.98 γ = 0.975
`k
2γ`k
1−γ
11/24
Compute Time of the Rollout Operator
1 2 3 4
102
103
104
Time
(s)
`k
Compute time of the rollout operator for varying lookahead horizons `k.
12/24
Our Method: Conjectural Online Learning
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t
The conjecture distribution ρ
(k)
t is calibrated through Bayesian
learning and the strategy πk,t is updated through rollout.
13/24
Bayesian Learning to Calibrate Conjectures
ρ
(k)
t is calibrated through Bayesian learning as
ρ
(k)
t (θ
(k)
t ) ,
P[i
(k)
t | θ
(k)
t , bt−1]ρ(k)(θ
(k)
t−1)
R
Θk
P[i
(k)
t | θ
(k)
t , bt−1]ρ
(k)
t−1(dθ
(k)
t )
,
where i
(k)
t is the information feedback at time t.
I We want to characterize limt→∞ ρ
(k)
t .
I Does the conjecture converge?
I Is the conjecture consistent asymptotically?
14/24
Asymptotic Analysis of Bayesian Learning
I Let ν ∈ ∆(B) be an occupancy measure over the belief space.
I We say that a conjecture θ
(k)
is consistent if it minimizes
the weighted KL-divergence:
K(θ
(k)
, ν) , Eb∼νEI(k)
"
ln
P[I(k) | θ, b]
P[I(k) | θ
(k)
, b]
!
| θ, b
#
.
I Let Θ?
k denote the set of consistent conjectures.
Remark
Due to misspecification, θ
(k)
t ∈ Θ?
k does not imply that θ
(k)
t
equals the true parameter vector θt.
15/24
Bayesian Learning Converges to Consistent Conjectures
Intuitively, consistent conjectures are “closest” to the true model.
1
0.1 t = 1
1
0.1
t = 5
1
0.1
t = 10
1
0.2 t = 15
1
0.2
t = 20
1
0.2
t = 25
1
0.2
t = 30
1
0.3
t = 35
1
0.3
t = 40
1
0.3
t = 45
1
0.3
t = 50
1
0.4
t = 55
1
0.5
t = 60
1
0.7
Conjecture ρ
(k)
t
Consistent True model θ Θk
t = 200
. . .
θ θ
16/24
Bayesian Learning is Consistent Asymptotically
As t → ∞, the conjecture distribution ρ
(k)
t concentrates on
the set of consistent conjectures.
Theorem
Given certain regularity conditions, the following property is
guaranteed by col.
lim
t→∞
Z
Θk

K(θ, νt) − K?
Θk
(νt)

ρ
(k)
t (dθ) = 0
a.s.-PR, where K?
Θk
denotes the minimal weighted kl-divergence.
17/24
Our Method: Conjectural Online Learning
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t
The conjecture distribution ρ
(k)
t is calibrated through Bayesian
learning and the strategy πk,t is updated through rollout.
18/24
Evaluation - Target Infrastructure
I Target infrastructure to the right.
I Defender monitors the
infrastructure through ids alerts.
I Attacker seeks to compromise
servers.
I The position of the attacker is
unknown.
I Defender can recover compromised
servers at a cost.
rd zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
19/24
Model Parameter
I Let θt represent the number of clients.
I Clients arrive according to the rate function.
λ(t) = exp
dim(ψ)
X
i=1
ψi ti
| {z }
trend
+
dim(χ)
X
k=1
χk sin(ωkt + φk)
| {z }
periodic
!
.
50 100 150 200 250 300 350
20
40
X ∼ Po(λ(t))
λ(t)
λ(t) non-periodic
t
20/24
Correlation Between Observations and the Model
I We collect measurements from our testbed to estimate the
distribution of ids alerts.
20 40 60 80 100 120 140 160 180 200 220 240
50
100
150
200
Average number of clients
t
20 40 60 80 100 120 140 160 180 200 220 240
104
2 · 104
3 · 104
4 · 104
No intrusion 1 compromised server 2 compromised servers
Priority-weighted sum of ids alerts ot
t
21/24
Evaluation of col (1/3)
10 20 30 40 50 60 70
20
40
60
80
100
θt
t
10 20 30 40 50 60 70
5
10
E
θ
(D)
∼ρ
(D)
t
[K(θ
(D)
, νt)]
t
Remark
The conjectures do not converge if θt keep changing.
22/24
Evaluation of col (2/3)
Fix the number of clients to be θt = 12 for all t.
2 4 6 8 10 12 14 16 18 20
0.5
1
ρ
(D)
t (θt = 12)
t
4 6 8 10 12 14 16 18 20
−0.05
0.05
E
θ
(D)
∼ρ
(D)
t
[K(θ
(D)
, νt)]
t
23/24
Evaluation of col (3/3)
Fix the number of clients to be θt = 12 for all t.23
2 8 14 20 26 30
0
10
20
|ΘD|
# Time steps to converge
2
Kim Hammar, Tao Li, Rolf Stadler, and Quanyan Zhu. Automated Security Response through Online Learning
with Adaptive Conjectures. Submitted to the IEEE, https://arxiv.org/abs/2402.12499. 2024.
3
Further evaluations can be found in the paper.
24/24
Conclusion
I We introduce a novel game-theoretic formulation of
automated security response where each player has a
probabilistic conjecture about the game model.
I We present Conjectural Online Learning, a theoretically-sound
method for online learning of security strategies in
non-stationary and uncertain environments.
ρ
(k)
t
posterior
s1,1 s1,2 . . .
θ
(k)
t
conjecture
πk,t
strategy
θ
(k)
t ∼ ρ
(k)
t
rollout action
a
(k)
t
prior ρ
(k)
1
Bayesian
learning
information feedback i
(k)
t

More Related Content

Similar to Automated Security Response through Online Learning with Adaptive Con jectures

Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Willy Marroquin (WillyDevNET)
 
A Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentA Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentKelly Taylor
 
Dual-time Modeling and Forecasting in Consumer Banking (2016)
Dual-time Modeling and Forecasting in Consumer Banking (2016)Dual-time Modeling and Forecasting in Consumer Banking (2016)
Dual-time Modeling and Forecasting in Consumer Banking (2016)Aijun Zhang
 
Bayesian Experimental Design for Stochastic Kinetic Models
Bayesian Experimental Design for Stochastic Kinetic ModelsBayesian Experimental Design for Stochastic Kinetic Models
Bayesian Experimental Design for Stochastic Kinetic ModelsColin Gillespie
 
Response Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationResponse Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationAlexander Litvinenko
 
Developing fast low-rank tensor methods for solving PDEs with uncertain coef...
Developing fast  low-rank tensor methods for solving PDEs with uncertain coef...Developing fast  low-rank tensor methods for solving PDEs with uncertain coef...
Developing fast low-rank tensor methods for solving PDEs with uncertain coef...Alexander Litvinenko
 
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Alexander Litvinenko
 
SIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsSIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsJagadeeswaran Rathinavel
 
Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Rediet Moges
 
First Place Memocode'14 Design Contest Entry
First Place Memocode'14 Design Contest EntryFirst Place Memocode'14 Design Contest Entry
First Place Memocode'14 Design Contest EntryKevin Townsend
 
Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Alexander Litvinenko
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
 
Metodo Monte Carlo -Wang Landau
Metodo Monte Carlo -Wang LandauMetodo Monte Carlo -Wang Landau
Metodo Monte Carlo -Wang Landauangely alcendra
 
An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...Alexander Decker
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Pooyan Jamshidi
 
Theoretical and Practical Bounds on the Initial Value of Skew-Compensated Cl...
Theoretical and Practical Bounds on the Initial Value of  Skew-Compensated Cl...Theoretical and Practical Bounds on the Initial Value of  Skew-Compensated Cl...
Theoretical and Practical Bounds on the Initial Value of Skew-Compensated Cl...Xi'an Jiaotong-Liverpool University
 
Sequential Learning in the Position-Based Model
Sequential Learning in the Position-Based ModelSequential Learning in the Position-Based Model
Sequential Learning in the Position-Based Modelrecsysfr
 

Similar to Automated Security Response through Online Learning with Adaptive Con jectures (20)

Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...
 
A Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentA Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic Assignment
 
Dual-time Modeling and Forecasting in Consumer Banking (2016)
Dual-time Modeling and Forecasting in Consumer Banking (2016)Dual-time Modeling and Forecasting in Consumer Banking (2016)
Dual-time Modeling and Forecasting in Consumer Banking (2016)
 
Bayesian Experimental Design for Stochastic Kinetic Models
Bayesian Experimental Design for Stochastic Kinetic ModelsBayesian Experimental Design for Stochastic Kinetic Models
Bayesian Experimental Design for Stochastic Kinetic Models
 
Response Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationResponse Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty Quantification
 
Developing fast low-rank tensor methods for solving PDEs with uncertain coef...
Developing fast  low-rank tensor methods for solving PDEs with uncertain coef...Developing fast  low-rank tensor methods for solving PDEs with uncertain coef...
Developing fast low-rank tensor methods for solving PDEs with uncertain coef...
 
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
 
SIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsSIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithms
 
Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03
 
Ydstie
YdstieYdstie
Ydstie
 
First Place Memocode'14 Design Contest Entry
First Place Memocode'14 Design Contest EntryFirst Place Memocode'14 Design Contest Entry
First Place Memocode'14 Design Contest Entry
 
Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
Metodo Monte Carlo -Wang Landau
Metodo Monte Carlo -Wang LandauMetodo Monte Carlo -Wang Landau
Metodo Monte Carlo -Wang Landau
 
An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
 
Theoretical and Practical Bounds on the Initial Value of Skew-Compensated Cl...
Theoretical and Practical Bounds on the Initial Value of  Skew-Compensated Cl...Theoretical and Practical Bounds on the Initial Value of  Skew-Compensated Cl...
Theoretical and Practical Bounds on the Initial Value of Skew-Compensated Cl...
 
Sequential Learning in the Position-Based Model
Sequential Learning in the Position-Based ModelSequential Learning in the Position-Based Model
Sequential Learning in the Position-Based Model
 
lecture 1.pptx
lecture 1.pptxlecture 1.pptx
lecture 1.pptx
 

More from Kim Hammar

Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHKim Hammar
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion ResponseKim Hammar
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlKim Hammar
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionKim Hammar
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security AutomationKim Hammar
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Kim Hammar
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Kim Hammar
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingKim Hammar
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...Kim Hammar
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseKim Hammar
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Kim Hammar
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayKim Hammar
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Kim Hammar
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Kim Hammar
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingKim Hammar
 

More from Kim Hammar (20)

Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion Response
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security Automation
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 

Recently uploaded

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Recently uploaded (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Automated Security Response through Online Learning with Adaptive Con jectures

  • 1. 0/24 Automated Security Response through Online Learning with Adaptive Conjectures1 Kim Hammar, Tao Li, Rolf Stadler, & Quanyan Zhu kimham@kth.se Division of Network and Systems Engineering KTH Royal Institute of Technology April 5, 2024 1 Kim Hammar, Tao Li, Rolf Stadler, and Quanyan Zhu. Automated Security Response through Online Learning with Adaptive Conjectures. Submitted to the IEEE, https://arxiv.org/abs/2402.12499. 2024.
  • 2. 1/24 Challenge: IT Systems are Complex I It is not realistic that any model will capture all the details. I =⇒ We have to work with approximate models. I =⇒ model misspecification. I How does misspecification affect optimality and convergence?
  • 3. 2/24 Our Contribution: Conjectural Online Learning πA,1 πD,1 Conjecture ρ (A) 1 Conjecture ρ (D) 1 πA,2 πD,2 Conjecture ρ (A) 2 Conjecture ρ (D) 2 . . . πA,t πD,t Conjecture ρ (A) t Conjecture ρ (D) t Learning Learning
  • 4. 3/24 Use Case: Security Response I A defender owns an infrastructure. I Defends the infrastructure by monitoring and response. I Has partial observability. I An attacker seeks to intrude on the infrastructure. I Wants to compromise specific components. I Attacks by reconnaissance, exploitation and pivoting. Attacker Clients . . . Defender 1 ids 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 5. 4/24 Prior Work I Assumes a stationary model with no misspecification I Limitation: fails to capture many real-world systems. I Focuses on offline computation of defender strategies I Limitation: computationally intractable for realistic models. 1 2 3 0 1,000 2,000 3,000 # Servers N Time (s) (a) |O| = 10. 10 100 1000 0 2,000 4,000 6,000 8,000 Observation space size |O| Time (s) (b) N = 1. Time required to compute a perfect Bayesian equilibrium with hsvi.
  • 6. 5/24 Problem I Partially-observed stochastic game Γθt . I Γθt is parameterized by θt, which is hidden. I Player k has a conjecture of θt, denoted by θt ∈ Θk. I The player is misspecified if θt 6∈ Θk. I As θt evolves, the player adapts its conjecture. I The player uses the conjecture to update its strategy πk,t. I What is an effective method to update conjectures and strategies?
  • 7. 6/24 Our Method: Conjectural Online Learning ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t The conjecture distribution ρ (k) t is calibrated through Bayesian learning and the strategy πk,t is updated through rollout.
  • 8. 6/24 Our Method: Conjectural Online Learning ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t The conjecture distribution ρ (k) t is calibrated through Bayesian learning and the strategy πk,t is updated through rollout.
  • 9. 6/24 Our Method: Conjectural Online Learning ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t The conjecture distribution ρ (k) t is calibrated through Bayesian learning and the strategy πk,t is updated through rollout.
  • 10. 6/24 Our Method: Conjectural Online Learning ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t The conjecture distribution ρ (k) t is calibrated through Bayesian learning and the strategy πk,t is updated through rollout.
  • 11. 6/24 Our Method: Conjectural Online Learning ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t The conjecture distribution ρ (k) t is calibrated through Bayesian learning and the strategy πk,t is updated through rollout.
  • 12. 6/24 Our Method: Conjectural Online Learning ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t The conjecture distribution ρ (k) t is calibrated through Bayesian learning and the strategy πk,t is updated through rollout.
  • 13. 7/24 Strategy Adaptation through Conjectural Rollout (1/2)
  • 14. 7/24 Strategy Adaptation through Conjectural Rollout (1/2) a (k) t bt Conjectured lookahead tree of player k.
  • 15. 7/24 Strategy Adaptation through Conjectural Rollout (1/2) a (k) t c (k) t bt bt+1 Conjectured lookahead tree of player k.
  • 16. 7/24 Strategy Adaptation through Conjectural Rollout (1/2) a (k) t c (k) t bt bt+1 Conjectured lookahead tree of player k.
  • 17. 7/24 Strategy Adaptation through Conjectural Rollout (1/2) a (k) t c (k) t bt bt+1 T T T T T T T Conjectured lookahead tree of player k
  • 18. 7/24 Strategy Adaptation through Conjectural Rollout (1/2) a (k) t c (k) t bt bt+1 arg mina (k) t T T T T T T T `k-step rollout.
  • 19. 8/24 Strategy Adaptation through Conjectural Rollout (2/2) `k-step rollout based on the conjectured model: πk,t(bt) ∈ R(θ (k) t , bt, J (πt ) k , `k) , arg min a (k) t ,a (k) t+1,...,a (k) t+`k−1 (1) Eπt   t+`k−1 X j=t γj−t ck(Sj, A (D) j ) + γ`k J (πt ) k (Bt+`k ) | bt   . I θ (k) t is the model conjecture. I ck is the cost function. I J πt k is the conjectured cost-to-go under strategy profile πt. I bt is the current belief state.
  • 20. 9/24 Performance Guarantees of Rollout (1/2) Theorem The conjectured cost of player k’s rollout strategy πk,t satisfies J (πk,t ,π−k,t ) k (b) ≤ J (πk,1,π−k,t ) k (b) ∀b ∈ B. (A) Intuition: I The rollout policy improves the base policy in the conjectured model (A).
  • 21. 9/24 Performance Guarantees of Rollout (1/2) Theorem The conjectured cost of player k’s rollout strategy πk,t satisfies J (πk,t ,π−k,t ) k (b) ≤ J (πk,1,π−k,t ) k (b) ∀b ∈ B. (A) Assuming (θ (k) t , `−k) predicts the game `k steps ahead, then kJ (πk,t ,π−k,t ) k − J? k k ≤ 2γ`k 1 − γ kJ (πk,1,π−k,t ) k − J? k k, (B) where J? k is the optimal cost-to-go. k·k is the maximum norm kJk = maxx |J(x)|. Intuition: I The rollout policy improves the base policy in the conjectured model (A). I If the conjectured model is wrong but can predict the next `k steps, then we can bound the performance (B).
  • 22. 10/24 Performance Guarantees of Rollout (2/2) The performance bound kJ (πk,t ,π−k,t ) k − J? k k ≤ 2γ`k 1 − γ kJ (πk,1,π−k,t ) k − J? k k, (B) improves superlinearly with the lookahead horizon `k. 20 40 60 80 100 120 140 160 180 200 50 100 150 200 γ = 0.99 γ = 0.985 γ = 0.98 γ = 0.975 `k 2γ`k 1−γ
  • 23. 11/24 Compute Time of the Rollout Operator 1 2 3 4 102 103 104 Time (s) `k Compute time of the rollout operator for varying lookahead horizons `k.
  • 24. 12/24 Our Method: Conjectural Online Learning ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t The conjecture distribution ρ (k) t is calibrated through Bayesian learning and the strategy πk,t is updated through rollout.
  • 25. 13/24 Bayesian Learning to Calibrate Conjectures ρ (k) t is calibrated through Bayesian learning as ρ (k) t (θ (k) t ) , P[i (k) t | θ (k) t , bt−1]ρ(k)(θ (k) t−1) R Θk P[i (k) t | θ (k) t , bt−1]ρ (k) t−1(dθ (k) t ) , where i (k) t is the information feedback at time t. I We want to characterize limt→∞ ρ (k) t . I Does the conjecture converge? I Is the conjecture consistent asymptotically?
  • 26. 14/24 Asymptotic Analysis of Bayesian Learning I Let ν ∈ ∆(B) be an occupancy measure over the belief space. I We say that a conjecture θ (k) is consistent if it minimizes the weighted KL-divergence: K(θ (k) , ν) , Eb∼νEI(k) " ln P[I(k) | θ, b] P[I(k) | θ (k) , b] ! | θ, b # . I Let Θ? k denote the set of consistent conjectures. Remark Due to misspecification, θ (k) t ∈ Θ? k does not imply that θ (k) t equals the true parameter vector θt.
  • 27. 15/24 Bayesian Learning Converges to Consistent Conjectures Intuitively, consistent conjectures are “closest” to the true model. 1 0.1 t = 1 1 0.1 t = 5 1 0.1 t = 10 1 0.2 t = 15 1 0.2 t = 20 1 0.2 t = 25 1 0.2 t = 30 1 0.3 t = 35 1 0.3 t = 40 1 0.3 t = 45 1 0.3 t = 50 1 0.4 t = 55 1 0.5 t = 60 1 0.7 Conjecture ρ (k) t Consistent True model θ Θk t = 200 . . . θ θ
  • 28. 16/24 Bayesian Learning is Consistent Asymptotically As t → ∞, the conjecture distribution ρ (k) t concentrates on the set of consistent conjectures. Theorem Given certain regularity conditions, the following property is guaranteed by col. lim t→∞ Z Θk K(θ, νt) − K? Θk (νt) ρ (k) t (dθ) = 0 a.s.-PR, where K? Θk denotes the minimal weighted kl-divergence.
  • 29. 17/24 Our Method: Conjectural Online Learning ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t The conjecture distribution ρ (k) t is calibrated through Bayesian learning and the strategy πk,t is updated through rollout.
  • 30. 18/24 Evaluation - Target Infrastructure I Target infrastructure to the right. I Defender monitors the infrastructure through ids alerts. I Attacker seeks to compromise servers. I The position of the attacker is unknown. I Defender can recover compromised servers at a cost. rd zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 31. 19/24 Model Parameter I Let θt represent the number of clients. I Clients arrive according to the rate function. λ(t) = exp dim(ψ) X i=1 ψi ti | {z } trend + dim(χ) X k=1 χk sin(ωkt + φk) | {z } periodic ! . 50 100 150 200 250 300 350 20 40 X ∼ Po(λ(t)) λ(t) λ(t) non-periodic t
  • 32. 20/24 Correlation Between Observations and the Model I We collect measurements from our testbed to estimate the distribution of ids alerts. 20 40 60 80 100 120 140 160 180 200 220 240 50 100 150 200 Average number of clients t 20 40 60 80 100 120 140 160 180 200 220 240 104 2 · 104 3 · 104 4 · 104 No intrusion 1 compromised server 2 compromised servers Priority-weighted sum of ids alerts ot t
  • 33. 21/24 Evaluation of col (1/3) 10 20 30 40 50 60 70 20 40 60 80 100 θt t 10 20 30 40 50 60 70 5 10 E θ (D) ∼ρ (D) t [K(θ (D) , νt)] t Remark The conjectures do not converge if θt keep changing.
  • 34. 22/24 Evaluation of col (2/3) Fix the number of clients to be θt = 12 for all t. 2 4 6 8 10 12 14 16 18 20 0.5 1 ρ (D) t (θt = 12) t 4 6 8 10 12 14 16 18 20 −0.05 0.05 E θ (D) ∼ρ (D) t [K(θ (D) , νt)] t
  • 35. 23/24 Evaluation of col (3/3) Fix the number of clients to be θt = 12 for all t.23 2 8 14 20 26 30 0 10 20 |ΘD| # Time steps to converge 2 Kim Hammar, Tao Li, Rolf Stadler, and Quanyan Zhu. Automated Security Response through Online Learning with Adaptive Conjectures. Submitted to the IEEE, https://arxiv.org/abs/2402.12499. 2024. 3 Further evaluations can be found in the paper.
  • 36. 24/24 Conclusion I We introduce a novel game-theoretic formulation of automated security response where each player has a probabilistic conjecture about the game model. I We present Conjectural Online Learning, a theoretically-sound method for online learning of security strategies in non-stationary and uncertain environments. ρ (k) t posterior s1,1 s1,2 . . . θ (k) t conjecture πk,t strategy θ (k) t ∼ ρ (k) t rollout action a (k) t prior ρ (k) 1 Bayesian learning information feedback i (k) t