K. Horák, B. Bošanský and P. Tomášek et al. / Computers & Security 87 (2019) 101579
Finally, the interactions can be very long (as in the case of APTs), or
very frequent (as in the case of IoBT networks under short but very
intense attacks). Previous work using game theory to model lateral
movement (Kamdem et al., 2017) and honeypot allocation (Durkota
et al., 2015; Kiekintveld et al., 2015; Píbil et al., 2012; Wang et al.,
2017) has not been able to address all of these problems.
Partially Observable Stochastic Games (POSGs) are a class of game-
theoretic models that encapsulates all of these properties and
allows us to reason about strategies in dynamic scenarios with
imperfect information and very long (even infinite) time horizon.
While POSGs are a very general model, this comes at the cost of
computational complexity. Computing optimal strategies in POSGs
is highly intractable from the algorithmic perspective, even in the
strictly competitive case (i.e., two-player zero-sum setting). If we
assume that both players have partial information about the environment,
the players need to reason not only about their belief
over possible states, but also about the belief the opponent has
over the possible states, beliefs over beliefs, and so on. Restricting to
subclasses of POSGs where this issue of nested beliefs does not arise,
however, allows us to design and implement algorithms that are
guaranteed to converge to optimal strategies (Horák and Bošanský,
2019; Horák et al., 2017).
Here, we develop new algorithms for POSG models of the lateral
movement problem in cybersecurity, and show that these new
algorithms allow us to solve problems of a realistic size even with
all of these complications. We study a model of an attacker moving
laterally in a computer network: we assume that the attacker has
infiltrated some nodes in a network and aims to reach a certain
host/node (e.g., the main database). We model this problem as a
two-player game on a graph between the attacker and the
defender. The nodes of the graph correspond to hosts in the
network, and the edges correspond to attacking another host using
a particular vulnerability on the target host, which has an
associated cost. The attacker sequentially chooses which hosts to
attack to progress towards the goal. The defender chooses edges
that will act as honeypots (i.e., fake vulnerabilities), which are able
to detect certain moves by the attacker through the network and
allow him to take additional mitigating actions. The defender can
also change his honeypot deployment after any detection event to
take into account the new piece of information about the attack.
We refer to this game as the Lateral Movement POSG. We
formalize it as a One-Sided POSG (Horák et al., 2017) where the
defender has partial information about the assets of the attacker,
while the attacker always knows his current assets and knows
when he has been detected. This allows us to apply existing
methods for restricted classes of POSGs to this problem. Unfortunately,
these existing algorithms have very limited scalability for our
games due to two fundamental limitations in the game representation
and algorithm design. The first limitation is the complexity of
representing, updating, and reasoning about the uncertainty of the
defender. In existing POSGs (as well as in similar single-agent
Partially Observable Markov Decision Processes (POMDPs)),
uncertainty is modeled using beliefs that correspond to probability
distributions over the possible states. In the Lateral Movement POSG,
the defender must reason over all possible subsets of hosts in a
network, maintaining a probability that any specific subset of hosts
has already been compromised. Since the number of possible subsets
grows exponentially in the size of the network, the number of
possible states quickly becomes too large for the algorithm to scale.
The second limitation comes from the number of possible actions
of the players. In the Lateral Movement POSG, the actions of the
defender correspond to placing multiple honeypots on edges in a
graph. Even if there are only 20 possible edges in a network
and the defender has 3 honeypots, the defender has up to
6.8 · 10^3 possible actions to choose from at a single decision
point in the game. Since the game is dynamic with many decision
points, this problem is exacerbated even further by the length of
the game.
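As a quick sanity check of this branching factor (our own back-of-the-envelope computation, not taken from the paper), the figure 6.8 · 10^3 corresponds to ordered placements of 3 distinguishable honeypots on 20 edges; with indistinguishable honeypots, one per edge, the count would be C(20, 3) = 1140:

```python
import math

# Ways to place 3 honeypots on 20 candidate edges.
edges, honeypots = 20, 3

# Unordered placements (indistinguishable honeypots, one per edge):
unordered = math.comb(edges, honeypots)   # C(20, 3) = 1140

# Ordered placements (distinguishable honeypots):
ordered = math.perm(edges, honeypots)     # 20 * 19 * 18 = 6840 ~ 6.8e3

print(unordered, ordered)
```

Either way, the count is polynomial of a high degree in the number of edges, so enumerating all joint placements at every decision point quickly becomes infeasible.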
We address both of these limitations and propose two key
contributions for improving the scalability of POSG algorithms.
These specifically address the problems that arise in our cybersecurity
domain, but can apply to many other similar classes of
POSGs as well. In particular: (1) we replace the representation of
beliefs over the exponential number of possible states with a
summarized abstraction that captures key information but reduces
the dimensionality of the beliefs; (2) we use an incremental
strategy generation technique to iteratively expand the strategy
space of the players in order to overcome the large branching factor.1
We present a general version of the algorithms and novel solution
concepts, and then show how they can be applied specifically to
the Lateral Movement POSG. We give theoretical justification and
analysis of the key elements of the algorithm. In addition, we
evaluate the performance of our algorithm empirically on realistic
instances of the Lateral Movement POSG. Our algorithm is capable
of finding strategies that are very near the optimal solutions for
small cases where we can compute the true optimal value. It also
scales dramatically better than the previous state of the art,
allowing us to solve games with twenty or more nodes that are of
practical significance. For example, we can find defense strategies
in smaller computer networks or abstractions of larger networks.
With more aggressive approximation we could scale to even larger
examples using the same algorithms.
2. Related work
The concept of a honeypot has been around for more than 30
years in cybersecurity (Stoll, 1989). Honeypots have been used for
many different purposes, and have evolved to be much more
sophisticated, with greater abilities to mimic real hosts and to
capture useful information about attackers (Mairh et al., 2011;
Nawrocki et al., 2016). Recently, a large number of game-theoretic
models have been developed to capture aspects of how honeypots
and other deception methods can be strategically used to improve
cybersecurity (Durkota et al., 2015; Garg and Grosu, 2007;
Kiekintveld et al., 2015; La et al., 2016; Píbil et al., 2012; Wang et al., 2017).
While none of these previous models address the full scope of
long-term dynamic games with uncertainty that we consider here,
several have considered similar aspects of uncertainty or dynamic
interactions. For example, Wang et al. (2017) investigate the use of
honeypots in the smart grid to mitigate denial-of-service attacks
through the lens of Bayesian games. Durkota et al. (2015) developed
a model of honeypot allocation under uncertainty. La et al.
(2016) also model honeypots mitigating denial-of-service attacks in
the Internet-of-Things domain. Du et al. (2017) tackle a similar
"honeypots for denial-of-service attacks" problem with Bayesian
game modeling in the social networking domain. In addition, a
number of works have considered deception as a signaling game
(e.g., Pawlick and Zhu (2015)).
In games with infinite horizon, the problem of nested beliefs
prevents one from designing an (approximately) optimal algorithm
for fully general settings. Nested beliefs can be tackled directly with
histories: one of the few such approaches is bottom-up dynamic
programming for constructing relevant finite-horizon policy trees
for individual players while pruning out dominated
1 A shorter version of this paper based on a subset of the work has been
published in the IJCAI 2019 conference (Horák et al., 2019). We provide the
following significant novel contributions: (1) integration of the incremental
strategy-generation algorithm for solving stage games, (2) generalization of the
problem so that the defender can use multiple honeypots, (3) extended discussion
throughout the text and an example of the algorithm, and (4) extended
experimental evaluation demonstrating additional significant speed-up over the
conference version of the algorithm, and on more domain-specific examples.
strategies (Hansen et al., 2004; Kumar and Zilberstein, 2009).
However, due to the explicit dependence on history, the scalability
in horizon is very limited.
A more common approach is to focus on a subclass of POSGs. In
Ghosh et al. (2004), zero-sum POSGs with public actions and
observations are considered. The authors show that the game has a
value and present an algorithm that exploits the transformation of
such a model into a game with complete information. Another
significant subclass of POSGs are One-Sided POSGs, where one
player has perfect information (Basu and Stettner, 2015; Chatterjee
and Doyen, 2014; Horák et al., 2017). This subclass is particularly
important for security applications since it provides naturally
robust strategies against a worst-case, fully informed opponent.
While our algorithm is based on the algorithm for the one-sided
case, we provide significant generalizations of the previous work,
especially in the representation of the value function and in the
definition of, and algorithms for computing, the value-backup operator.
Replacing the representation of beliefs over the exponential
number of possible states with a compact characteristic vector that
captures key information reduces the dimensionality of the beliefs.
For POMDPs, similar compact representations have appeared in the
existing literature (e.g., in Li et al. (2010); Roy et al. (2005); Zhou et
al. (2010)). The most general approach was to choose an abstraction
based on Principal Components Analysis (Roy et al., 2005). For our
Lateral Movement POSG, we can exploit a natural representation in
marginal probabilities, i.e., reason about the probability of each
resource being infected as a characteristic vector, instead of
explicitly considering all possible subsets of infected resources.
While this offers a path to scaling to much larger problems, it has
consequences for the solution quality as well as for the construction of
the solution algorithm. Many components of state-of-the-art POSG
solvers are based on manipulating the full belief distribution, and
we show how these can be redesigned to operate using the more
compact summarized belief representations.
Finally, we also use an incremental strategy generation method,
often known as the double-oracle algorithm, that is frequently
used for solving games with a large combinatorial number of actions
in security games (e.g., in Jain et al. (2013, 2011); McMahan et al.
(2003), but also many others); more recently, the double-oracle
method has also been used in combination with reinforcement
learning in domains where computing a best response exactly is
not possible and it must be approximated (Lanctot et al., 2017;
Oliehoek et al., 2017; Wang et al., 2019). The double-oracle algorithm
restricts the space of possible actions to choose from: the algorithm
forms a restricted stage game that is iteratively expanded by adding
new actions into the restricted game. These actions are typically
computed as best responses to the current strategy of the opponent
in the restricted problem. In the worst case, all actions have to be
added into the restricted problem. This, however, rarely happens in
practice, and double-oracle algorithms are often able to find an
optimal strategy using only a fraction of all possible plans (see, for
example, Bošanský et al. (2014); Jain et al. (2011); Lanctot et al. (2017)).
While the domain-independent idea of the double-oracle algorithm is
simple, our contribution is in (1) the full integration of this
methodology into the algorithm for solving stage games of Lateral
Movement POSGs and (2) the description of exact and heuristic oracle
algorithms for determining which actions should be added into the
restricted game.
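To make the double-oracle loop concrete, the following is a minimal, domain-independent sketch for a zero-sum matrix game. This is our illustration, not the oracles used in the paper: scipy's LP solver is assumed to be available, and the best-response oracles are simple enumeration over all actions.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    """Solve a zero-sum matrix game (row player maximizes x^T M y).
    Returns (row player's mixed strategy x, game value v)."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0            # variables (x, v); minimize -v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # v - x^T M[:, j] <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0   # sum(x) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)], method="highs")
    return res.x[:m], res.x[-1]

def double_oracle(M, eps=1e-9):
    """Grow restricted action sets with best responses until neither
    player can improve on the restricted game's value."""
    rows, cols = [0], [0]
    while True:
        sub = M[np.ix_(rows, cols)]
        x_r, v = solve_zero_sum(sub)
        y_c, _ = solve_zero_sum(-sub.T)          # column player's strategy
        x = np.zeros(M.shape[0]); x[rows] = x_r  # lift to full action space
        y = np.zeros(M.shape[1]); y[cols] = y_c
        br_row = int(np.argmax(M @ y))           # best response of row player
        br_col = int(np.argmin(x @ M))           # best response of column player
        improved = False
        if (M @ y)[br_row] > v + eps and br_row not in rows:
            rows.append(br_row); improved = True
        if (x @ M)[br_col] < v - eps and br_col not in cols:
            cols.append(br_col); improved = True
        if not improved:
            return x, y, v

# Rock-paper-scissors: unique equilibrium is uniform play with value 0.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x, y, v = double_oracle(rps)
```

Starting from a single action per player, the loop adds paper and then scissors for both players before the best responses stop improving on the restricted value.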
3. One-sided POSGs
A one-sided partially observable stochastic game (Horák et al.,
2017), or one-sided POSG (OS-POSG), is an imperfect-information
two-player zero-sum infinite-horizon game with perfect recall
represented by a tuple (S, A1, A2, O, T, R). The game is played for an
infinite number of stages. At each stage, the game is in one
of the states s ∈ S and the players choose their actions a1 ∈ A1 and
a2 ∈ A2 simultaneously. An initial state of the game is drawn from
a probability distribution b0 ∈ Δ(S) over states termed the initial
belief. The one-sided nature of the game translates to the fact that
while player 1 lacks detailed information about the course of the
game, player 2 is able to observe the game perfectly (i.e., his only
uncertainty is the action a1 player 1 is about to take in the current
stage).
The choice of actions determines the outcome of the current
stage: player 1 gets an observation o ∈ O and the game transitions
to a state s′ ∈ S with probability T(o, s′ | s, a1, a2), where s is the
current state. Furthermore, player 1 gets a reward R(s, a1, a2) for
this transition, and player 2 receives −R(s, a1, a2). The rewards are
discounted over time with discount factor γ < 1.
One-sided POSGs can be solved by approximating the optimal
value function V*: Δ(S) → R using a pair of value functions V
(a lower bound on V*) and V̄ (an upper bound on V*). The algorithm
proposed in Horák et al. (2017) refines these bounds by solving a
sequence of stage games. In each of these stage games, the
algorithm focuses on deciding the optimal strategies of the players
in this stage (i.e., π1 ∈ Δ(A1) for player 1 and π2(s) ∈ Δ(A2) for
player 2) while assuming that the play in the subsequent stages
yields values represented by the value functions V or V̄, respectively.
Solving the stage games also defines the fixed-point equation for V*,

V*(b) = [HV*](b)
      = min_{π2} max_{π1} ( E_{b,π1,π2}[R] + γ Σ_{a1,o} Pr_{b,π1,π2}[a1, o] · V*(τ(b, a1, π2, o)) ),   (1)

where τ(b, a1, π2, o) denotes the Bayesian update of the belief
of player 1 when he played a1 and observed o. Note that the actions of
player 2, who has complete information, affect the observations
player 1 receives and thus the belief of player 1.
Each stage game is parameterized by the current belief b ∈ Δ(S)
of player 1. For piecewise linear and convex V and V̄, a stage game
is solved using linear programming. We show this linear program
for V (represented as a point-wise maximum over a set Γ of linear
functions α_i: Δ(S) → R):

min  V                                                            (2a)
s.t. Σ_{a2} π2(s ∧ a2) = b(s)                        ∀s           (2b)
     V ≥ Σ_{s,a2} π2(s ∧ a2) R(s, a1, a2) + γ Σ_o V^{a1,o}        ∀a1          (2c)
     b^{a1,o}(s′) = Σ_{s,a2} π2(s ∧ a2) T(o, s′ | s, a1, a2)      ∀a1, o, s′   (2d)
     V^{a1,o} ≥ Σ_{s′} α_i(s′) b^{a1,o}(s′)          ∀a1, o, α_i  (2e)
     π2(s ∧ a2) ≥ 0                                  ∀s, a2       (2f)
In this linear program, player 2 seeks a strategy π2 for the
current stage of the game to minimize the utility V of player 1.
Here π2(s ∧ a2) stands for the joint probability (ensured by
constraints (2b) and (2f)) that the current state of the game is s and
player 2 plays the action a2. Constraint (2c) stands for player 1
choosing the best-responding action a1. Constraint (2d) expresses
the belief in the subsequent stage of the game when a1 was played
by player 1 and observation o was seen (multiplied by the
probability of seeing that observation), and finally constraint (2e)
represents the value of V in the belief b^{a1,o}. Such a linear program
can then be combined with point-based variants of the value-iteration
algorithm, such as Heuristic Search Value Iteration (HSVI)
(Horák et al., 2017; Smith and Simmons, 2004).
A different approximation scheme is used for the upper bound
V̄ on V*. Instead of using a point-wise maximum over linear
functions, V̄ is expressed using a set Υ = {(b^(i), y^(i)) | 1 ≤ i ≤ |Υ|} of
points. The lower convex hull of this set of points is then formed to
obtain the value of V̄(b),

V̄(b) = min_{λ ∈ R^{|Υ|}, λ ≥ 0} { Σ_{1≤i≤|Υ|} λ_i y^(i) | Σ_{1≤i≤|Υ|} λ_i = 1, Σ_{1≤i≤|Υ|} λ_i b^(i) = b }.   (3)

Constraint (2e) can then be adapted to use this representation of
V̄, as detailed in Horák and Bošanský (2016).
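Evaluating the upper bound in Eq. (3) at a query belief is itself a small linear program over the stored points. A minimal sketch (the two-state point set is a hypothetical example of ours; scipy's `linprog` stands in for whatever LP solver is used in practice):

```python
import numpy as np
from scipy.optimize import linprog

def upper_bound(points, b):
    """V_bar(b) = min sum(lam_i * y_i)
    s.t. sum(lam_i) = 1, sum(lam_i * b_i) = b, lam >= 0   (Eq. (3))."""
    beliefs = np.array([p[0] for p in points])   # shape (|Upsilon|, |S|)
    ys = np.array([p[1] for p in points])        # objective coefficients
    n = len(points)
    A_eq = np.vstack([np.ones((1, n)), beliefs.T])
    b_eq = np.concatenate([[1.0], b])
    res = linprog(ys, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.fun

# Two states; values stored at the simplex corners and one interior point.
pts = [(np.array([1.0, 0.0]), 0.0),
       (np.array([0.0, 1.0]), 2.0),
       (np.array([0.5, 0.5]), 0.6)]   # below the corner interpolation (1.0)
v = upper_bound(pts, np.array([0.5, 0.5]))
```

At b = (0.5, 0.5) the lower convex hull picks the interior point, giving 0.6 rather than the 1.0 obtained by interpolating the corners alone.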
4. Compact representation of V*
The dimension of the value function V* depends on the number
of states, which can be very large in games like the Lateral
Movement POSG, where the number of states is exponential in the
size of the network. We propose an abstraction scheme called
summarized abstraction to decrease the dimensionality of the
problem by creating a simplified representation of the beliefs over
the state space.
We associate each belief b ∈ Δ(S) in the game with a characteristic
vector χ(b) = A·b (for some matrix A ∈ R^{k×|S|} where k ≪ |S|)
and we define an (approximate) value function V̂*: R^k → R over
the space of characteristic vectors. We denote by χ0 the characteristic
vector corresponding to the initial belief b0, i.e., χ0 = A·b0.
The main goal is to adapt algorithms based on value iteration
to operate over the more compact space R^k instead of the original
belief space Δ(S). First, we adapt the fixed-point Eq. (1) for OS-POSGs
to work with the compact V̂* (instead of the original V*). We let
player 2 choose any belief that is consistent with the current
characteristic vector χ (by adding an extra minimization term), and we
replace the value of a belief b, V*(b), by the value of its
characterization, V̂*(A·b). We also denote this compound function V̂* ∘ A.

V̂*(χ) = [ĤV̂*](χ)
       = min_{b | Ab=χ} min_{π2} max_{π1} ( E_{b,π1,π2}[R] + γ Σ_{a1,o} Pr_{b,π1,π2}[a1, o] · V̂*(A·τ(b, a1, π2, o)) )   (4)
Next, we show that the value function V̂* satisfying the fixed-point
Eq. (4) (and obtained as a limit of applying the operator Ĥ) provides a
valid lower bound on the solution of the original game
representation over the belief space.

Theorem 1. V̂*(χ(b)) ≤ V*(b) for every b ∈ Δ(S).

Proof. We prove the theorem by induction. Let V̂^0 be arbitrary
and let V^0 of the original game be V^0(b) = V̂^0(χ(b)) (i.e., V̂^0 ∘ A ≤ V^0).
Assume that V̂^t(χ(b)) ≤ V^t(b) for every b ∈ Δ(S). Now H(V̂^t ∘ A) ≤
H(V^t). The extra minimization over beliefs b in ĤV̂^t can only decrease
the utility of the stage game, and hence

V̂^{t+1}(χ(b)) = [ĤV̂^t](χ(b)) ≤ [H(V̂^t ∘ A)](b) ≤ [HV^t](b) = V^{t+1}(b).   (5)  □
Note that Eq. (4) can also be solved using linear programming.
Consider a lower bound V̂_LB on V̂* formed as a point-wise
maximum over linear functions α_i(χ) = (a^(i))^T χ + z^(i). We modify
the linear program (2) by considering the belief b to be a variable
(constrained by χ),

constraints (2b), (2c), (2d), (2f)                        (6a)
Σ_s b(s) = 1                                              (6b)
A·b = χ                                                   (6c)
b(s) ≥ 0                              ∀s                  (6d)

and we replace constraint (2e), to account for the different
representation of V̂_LB, by

χ^{a1,o} = A·b^{a1,o}                             ∀a1, o         (6e)
q^{a1,o} = 1^T·b^{a1,o}                           ∀a1, o         (6f)
V^{a1,o} ≥ (a^(i))^T·χ^{a1,o} + z^(i)·q^{a1,o}    ∀a1, o, α_i    (6g)

where q^{a1,o} = 1^T·b^{a1,o} is the probability that o is generated when
player 1 uses action a1.
Observe that the only constraints with non-zero right-hand-side
constants in this new linear program are constraints (6b) and (6c).
This is critical in the following proof that V̂* is convex.

Theorem 2. The value function V̂* is convex.
Proof. Consider the dual formulation of the linear program (2)
updated according to Eqs. (6). Since the only non-zero right-hand-side
terms in the primal are 1 and χ, the objective of the
dual formulation is o(χ) = χ^T·a + z. Moreover, this is the only
place where the characteristic vector χ occurs. Hence the polytope
of feasible solutions of the dual problem is the same for every
characteristic vector χ ∈ R^k, and o(χ) (after fixing the variables a
and z) forms a bound on the objective value of the solution for an
arbitrary χ. Since we maximize over all possible o(χ) in the dual,
the objective value of the linear program (and hence ĤV̂_LB) is convex
in the parameter χ.
Now, starting with an arbitrary convex (e.g., linear) V̂_LB^0, the
sequence of functions {V̂_LB^t}_{t=0}^∞, where V̂_LB^{t+1} = ĤV̂_LB^t, is formed only
by convex functions. Therefore, the fixed point V̂* is also a convex
function. □
4.1. HSVI algorithm for compact POSGs

The algorithm we propose for solving abstracted games
(Algorithm 1) is a modified version of the original heuristic search
value iteration (HSVI) algorithm for solving unabstracted one-sided
POSGs (Horák et al., 2017). The key difference here is that we use
the value functions V̂_LB and V̂_UB (instead of V and V̄), and we have
modified all parts of the algorithm to use the abstracted
representation of the beliefs.

Algorithm 1: HSVI algorithm for one-sided POSGs when summarized abstraction is used.
1  Initialize V̂_LB and V̂_UB to a lower and an upper bound on V̂*
2  while V̂_UB(χ0) − V̂_LB(χ0) > ε do
3      Explore(χ0, ε, 0)
4  procedure Explore(χ, ε, t)
5      (b, π2) ← optimal belief and strategy of player 2 in ĤV̂_LB(χ)
6      (a1, o) ← select according to heuristic
7      χ′ ← τ(χ, a1, π2, o)
8      Update Γ and Υ based on the solutions of ĤV̂_LB(χ) and ĤV̂_UB(χ)
9      if V̂_UB(χ′) − V̂_LB(χ′) > εγ^{−t} then
10         Explore(χ′, ε, t + 1)
11     Update Γ and Υ based on the solutions of ĤV̂_LB(χ) and ĤV̂_UB(χ)
First, we initialize the bounds V̂_LB and V̂_UB (line 1) to valid
piecewise linear and convex lower and upper bounds on V̂* (using a
set Γ of linear functions α_i = (a^(i))^T χ + z^(i) and a set Υ of points (χ^(i),
y^(i))). Then, we perform a sequence of trials (lines 2-3) from the
initial characteristic vector χ0 = A·b0 until the desired precision
ε > 0 is reached.
In each of the trials, we first compute the optimal optimistic
strategy of player 2, which in this case is the selection of the
belief b and the strategy π2 (line 5). Next, we choose the action a1
of player 1 and the observation o (lines 6-7) so that the excess
approximation error V̂_UB(χ′) − V̂_LB(χ′) − εγ^{−t} in the subsequent
stage (where the belief is described by the characteristic vector
χ′ = τ(χ, a1, π2, o)) is maximized. If this excess approximation error
is positive, we recurse to the characteristic vector χ′ (line 10).
Before and after the recursion, the bounds V̂_LB(χ) and V̂_UB(χ) are
improved using the solutions of ĤV̂_LB(χ) and ĤV̂_UB(χ) (lines 8 and
11). The update of V̂_UB is straightforward: a new point (χ,
ĤV̂_UB(χ)) is added to Υ. To obtain a new linear function to
add to the set Γ, we use the objective function o(χ) = χ^T·a + z
(after fixing the variables a and z) of the dual linear program to (6), which
forms a lower bound on ĤV̂_LB and V̂* (see the proof of Theorem 2 for
more discussion).
4.2. Extracting the strategy of player 1
In Sections 3 and 4, we have presented linear programs to solve
the stage game ĤV̂(χ) from the perspective of player 2. These
linear programs solve a minimax problem of player 2 by explicitly
reasoning about player 2's strategies. The actions of player 1, however,
are only represented as best-response constraints, which does
not provide the strategy of player 1 directly. In this section, we
discuss the dual formulation of the linear program (6) and show
the way the strategy of player 1 can be extracted from the dual
variables.
Consider constraints (2c), (6e), (6f) and (6g), and let us establish
the following mapping to dual variables:

Constraints (2c):  dual variables π1(a1) ≥ 0
Constraints (6g):  dual variables β^{a1,o}(α_i) ≥ 0
Constraints (6e):  dual variables â^{a1,o} ∈ R^k
Constraints (6f):  dual variables ẑ^{a1,o} ∈ R

The dual formulation of (6) then contains the following constraints
(corresponding to the variables V, V^{a1,o}, χ^{a1,o} and q^{a1,o}):

Σ_{a1} π1(a1) = 1                                        (7a)
Σ_{α_i} β^{a1,o}(α_i) = γ·π1(a1)           ∀a1, o        (7b)
â^{a1,o} = Σ_{α_i} a^(i)·β^{a1,o}(α_i)     ∀a1, o        (7c)
ẑ^{a1,o} = Σ_{α_i} z^(i)·β^{a1,o}(α_i)     ∀a1, o.       (7d)
Constraints (7) give the following interpretation to the variables
π1, β^{a1,o}, â^{a1,o} and ẑ^{a1,o}. The variables π1 directly represent the
strategy of player 1 for the current stage, π1 ∈ Δ(A1). The variables
β^{a1,o}(α_i) are the probabilities of playing the strategy whose value
is represented by the linear function α_i(χ) = (a^(i))^T χ + z^(i) in the
subgame where player 1 played action a1 and received observation o. These
probabilities are represented in the form of realization plans, i.e.,
they sum to γ·π1(a1) instead of one. We can, however, obtain the
true probabilities by dividing β^{a1,o} by γ·π1(a1). The variables â^{a1,o}
and ẑ^{a1,o} then define a linear function ζ̂^{a1,o}(χ) = (â^{a1,o})^T χ + ẑ^{a1,o}
representing the value of playing the given mix of strategies in the
subgame; they are used to express the value of the game in the rest
of the dual formulation.
By normalizing the variables β^{a1,o}, we obtain a linear function
ζ^{a1,o}: R^k → R, called a gadget, which represents the value in the
subgame given that the subgame has been reached,

ζ^{a1,o}(χ) = (a^{a1,o})^T χ + z^{a1,o}, where
a^{a1,o} = Σ_{α_i} (β^{a1,o}(α_i) / γπ1(a1)) · a^(i)
z^{a1,o} = Σ_{α_i} (β^{a1,o}(α_i) / γπ1(a1)) · z^(i).     (8)

Note that the gadgets ζ^{a1,o} are only defined for subgames that can
be reached, i.e., where π1(a1) > 0.
The representation of a strategy of player 1 using the variables π1
and ζ can then be used to play the game, following ideas similar to
those used in continual resolving in the domain of
extensive-form games (Moravčík et al., 2017).
Algorithm 2: Playing a compact OS-POSG using V̂_LB.
1  χ ← χ0, ζ ← initial gadget ζ(χ) = −∞
2  while player 1 is required to act do
3      Solve ĤV̂_LB(χ) with the additional constraint o(χ) ≥ ζ(χ) in
       the dual formulation and extract the strategy π1 and gadgets ζ
       of player 1
4      Sample an action a1 ∼ π1, execute it and observe an
       observation o
5      χ ← χ^{a1,o}/q^{a1,o}, ζ ← ζ^{a1,o}

Algorithm 2 describes the algorithmic scheme to play a game in
the compact representation. Whenever player 1 is required to act, he
solves a stage game ĤV̂_LB(χ). In the first stage, he solves the game
without any additional constraints (since initially ζ(χ) = −∞) to
find the near-optimal strategy for the entire game. In the subsequent
stages, player 1 has to ensure that the strategy he is about to
play is consistent with the strategy assumed in the first stage, i.e.,
that it provides (at least) the value represented by the gadget ζ^{a1,o}
from the previous stage. To this end, the constraint o(χ) ≥ ζ(χ) is added
to the dual formulation of (6) (recall that o(χ) is the objective of
the dual linear program).
5. The lateral movement POSG

We now formally define the lateral movement POSG and describe
the key steps of the algorithm specifically for this domain.
The game is played on a graph G = (V, E) representing a computer
network. We assume that V = {v1, ..., v|V|}. The goal of the attacker
is to reach the vertex v|V| from the initial source of the infection
v1 by traversing the edges of the graph, while minimizing the cost
to do so; see Fig. 1(a).
Initially, the attacker controls only the vertex v1, i.e., the initial
state of infection is I0 = {v1}. Then, in each stage of the game,
the attacker chooses a directed path P = {P(i)}_{i=1}^{k} (where k is the
length of the path) from any of the infected vertices to the target
vertex v|V|. Fig. 1(a) provides examples of paths the attacker may
consider; however, he can only use paths P2 and P3 if vertex v2 has
been previously infected.
Unless the defender takes countermeasures, the attacker infects
all the vertices on the path, including the target vertex v|V|, and
pays the cost of traversing each of the edges on the path,

c^{∅,P} = Σ_{P(i)∈P} C(P(i)),     (9)

where C(P(i)) is the cost associated with taking edge P(i).
The defender tries to discover the attacker and increase his cost
to reach the target (and thus possibly discourage the attack) by
Fig. 2. Dual interpretation of the projection on the convex hull.
is always infected and we treat all states where v|V| is infected as
a single terminal state of the game). We use the marginal probabilities
of a vertex being infected as the characteristic vectors χ ∈ R^{|V|}, i.e.,

χ_i(b) = Σ_{I∈S | vi∈I} b(I).     (13)
Consider the example of a belief from Fig. 1(b). The belief in the
original OS-POSG model is a probability distribution over
states, i.e., an |S|-dimensional vector with |S| − 1 degrees of freedom.
In this game, the number of non-terminal reachable states (i.e.,
states where v|V| is not yet infected) is 14. In comparison, the
characteristic vector (see Fig. 1(c)) has only |V| dimensions. Moreover,
observe that in this case the set {b | Ab = χ(b)} of possible
beliefs that get projected to χ(b) is a singleton and contains only
the belief b from Fig. 1(b).
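The projection in Eq. (13) is straightforward to compute from an explicit belief. A small sketch (the belief below is an illustrative example of ours, not the one from Fig. 1(b)):

```python
def characteristic_vector(belief, n_vertices):
    """chi_i(b) = sum of b(I) over all infected sets I containing v_i  (Eq. (13)).
    `belief` maps frozensets of vertex indices to probabilities."""
    chi = [0.0] * n_vertices
    for infected, prob in belief.items():
        for i in infected:
            chi[i] += prob
    return chi

# Illustrative belief over infected subsets of {v0, v1, v2}; v0 always infected.
b = {frozenset({0}): 0.7,
     frozenset({0, 1}): 0.2,
     frozenset({0, 1, 2}): 0.1}
chi = characteristic_vector(b, 3)   # marginals: v0 -> 1.0, v1 -> 0.3, v2 -> 0.1
```

The exponentially large belief collapses into a vector of one marginal per vertex; the price is that, in general, several beliefs can share the same marginals.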
5.2. Value function representation

The algorithm from Section 4.1 approximates V̂* using a pair of
value functions, the lower bound V̂_LB and the upper bound V̂_UB.
While we have discussed the representation of the lower bound
(which is represented using a set Γ of linear functions
α_i(χ) = (a^(i))^T χ + z^(i)), the representation of the upper bound is
more challenging.
In the original algorithm, the value function V̄ is defined (see Eq.
(3)) over the probability simplex Δ(S). In that case, it suffices to
consider |S| points in Υ to define V̄ for every belief. In contrast, the
space of characteristic vectors (i.e., marginal probabilities) is formed
by the hypercube [0, 1]^{|V|} with 2^{|V|} vertices, which would make the
straightforward point representation (using Eq. (3)) impractical.
We can, however, leverage the fact that in this domain infecting
an additional node can only decrease the cost to the target (and
hence V̂* is decreasing). Consider the dual formulation of the
optimization problem (3). In this formulation, the projection of χ
onto the lower convex hull of a set of points is represented by the
optimal linear function a^T χ + z defining a facet of the convex hull
(see Fig. 2). Since V̂* is decreasing in χ, we can also enforce that
a^T χ + z is decreasing in χ (i.e., add the constraint a ≤ 0 to the
dual formulation). This additional constraint translates to a change
of the equality Σ_{1≤i≤|Υ|} λ_i χ^(i) = χ in the primal problem to an
inequality:

V̂_UB(χ) = min_{λ ∈ R^{|Υ|}, λ ≥ 0} { Σ_{1≤i≤|Υ|} λ_i y^(i) | Σ_{1≤i≤|Υ|} λ_i = 1, Σ_{1≤i≤|Υ|} λ_i χ^(i) ≤ χ }   (14)

Now it is sufficient for the set Υ to contain just one point (χ^(i), y^(i))
with χ^(i) = 0^{|V|} (instead of 2^{|V|} points) to make the constraint
Σ_{1≤i≤|Υ|} λ_i χ^(i) ≤ χ satisfiable.
It is possible to adapt constraint (6g) to use the representation
from (14), similarly to the original (unabstracted) one-sided
POSGs (Horák and Bošanský, 2016), and thus obtain a linear program
to perform the upper bound computation.
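Compared with Eq. (3), the only change in Eq. (14) is that the characteristic-vector constraint becomes an inequality, which keeps the program feasible for every query χ as soon as a single point at χ = 0 is stored. A minimal sketch (the three-vertex numbers are hypothetical, chosen by us so that the value decreases as infection increases; scipy's LP solver is assumed):

```python
import numpy as np
from scipy.optimize import linprog

def upper_bound_relaxed(points, chi):
    """V_hat_UB(chi) per Eq. (14): like Eq. (3) but with
    sum(lam_i * chi_i) <= chi, valid because V_hat* is decreasing in chi."""
    chis = np.array([p[0] for p in points])
    ys = np.array([p[1] for p in points])
    n = len(points)
    res = linprog(ys,
                  A_ub=chis.T, b_ub=chi,              # sum lam_i chi_i <= chi
                  A_eq=np.ones((1, n)), b_eq=[1.0],   # sum lam_i = 1
                  bounds=[(0, None)] * n, method="highs")
    return res.fun

# One point at chi = 0 keeps the program feasible for every query chi.
pts = [(np.zeros(3), 5.0),             # nothing infected: high value
       (np.array([1., 1., 0.]), 2.0)]  # v1, v2 surely infected: lower value
v_full = upper_bound_relaxed(pts, np.array([1., 1., 1.]))   # dominated query
v_low = upper_bound_relaxed(pts, np.array([0.5, 0., 0.]))   # forces lam on chi=0
```

For the first query the second point is usable (its marginals are componentwise below the query), so the bound drops to 2.0; for the second query only the χ = 0 point fits, giving 5.0.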
5.3. Using marginalized strategies in stage games
The linear program formed by the modifications from Eqs. (6) still
requires solving the stage game for the original, unabstracted
problem. In this section, we show that it is possible to avoid
expressing the belief b explicitly, and to compute the stage game
directly using the characteristic vectors and marginalized strategies
of the attacker.
First, we present the representation of the stage-game strategies
of the attacker. Instead of representing the joint probabilities π2(I∧P)
of choosing path P in state I, we only model the probability
π̂2(P) of choosing path P aggregated over all states I ∈ S. Furthermore,
we allow the attacker to choose the probability ξ(P∧vi) that
vertex vi is infected while he opts to follow path P.

Σ_P π̂2(P) = 1                             (15a)
0 ≤ ξ(P∧vi) ≤ π̂2(P)        ∀P, vi         (15b)
π̂2(P) ≥ 0                  ∀P             (15c)

To ensure that the strategy represented by the variables π̂2 and ξ
is feasible, it must be consistent with the characteristic vector χ,
where χ_i is the probability that the vertex vi is infected at the
beginning of the stage.

Σ_P ξ(P∧vi) = χ_i           ∀vi            (15d)

Furthermore, the path P must start in an already infected
vertex (denoted as Pre(P)), i.e., the conditional probability
Pr[Pre(P) ∈ I | P] of Pre(P) being infected when path P is chosen
has to be 1. Now, since ξ(P∧v) is the joint probability
ξ(P∧v) = Pr[Pre(P) ∈ I | P]·π̂2(P),

ξ(P∧v) = π̂2(P)             ∀P, v = Pre(P).    (15e)
Example 1. Consider the example from Fig. 1(a) and assume that the attacker wanted to play a strategy $\sigma_2$ in the original game, where

$$\sigma_2(I_0 \wedge P_1) = 0.7 \qquad \sigma_2(I_8 \wedge P_1) = 0.1$$
$$\sigma_2(I_8 \wedge P_2) = 0.1 \qquad \sigma_2(I_8 \wedge P_3) = 0.1.$$

The same strategy can be described using the marginalized representation $\hat\sigma_2$ and $\xi$:

$$\hat\sigma_2(P_1) = \sigma_2(I_0 \wedge P_1) + \sigma_2(I_8 \wedge P_1) = 0.8$$
$$\hat\sigma_2(P_2) = \sigma_2(I_8 \wedge P_2) = 0.1$$
$$\hat\sigma_2(P_3) = \sigma_2(I_8 \wedge P_3) = 0.1$$
$$\xi(P_1 \wedge v_1) = 0.8 \qquad \xi(P_1 \wedge v_2) = 0.1$$
$$\xi(P_2 \wedge v_1) = 0.1 \qquad \xi(P_2 \wedge v_2) = 0.1$$
$$\xi(P_3 \wedge v_1) = 0.1 \qquad \xi(P_3 \wedge v_2) = 0.1$$

(all other variables $\hat\sigma_2$ and $\xi$ are zero). It is straightforward to verify that the variables $\hat\sigma_2$ and $\xi$ satisfy Eqs. (15a)–(15e) when the characteristic vector $\rho(b)$ from Fig. 1(c) is considered.
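The aggregation in Example 1 can be reproduced mechanically. The sketch below assumes hypothetical infection-state contents I0 = {v1} and I8 = {v1, v2}, chosen only to be consistent with the numbers above (Fig. 1 is not reproduced here); it computes $(\hat\sigma_2, \xi)$ from $\sigma_2$ and checks constraints (15a) and (15b).

```python
# Marginalize the joint attacker strategy from Example 1 into the variables
# (sigma2_hat, xi) and check the constraints (15a)-(15b). The contents of the
# infection states, I0 = {v1} and I8 = {v1, v2}, are a hypothetical
# reconstruction of Fig. 1(a) chosen to be consistent with the numbers above.

def marginalize(sigma2, states):
    """sigma2: dict (state, path) -> prob.; states: dict state -> infected vertices."""
    sigma2_hat, xi = {}, {}
    for (I, P), prob in sigma2.items():
        sigma2_hat[P] = sigma2_hat.get(P, 0.0) + prob
        for v in states[I]:
            xi[(P, v)] = xi.get((P, v), 0.0) + prob
    return sigma2_hat, xi

states = {"I0": {"v1"}, "I8": {"v1", "v2"}}  # hypothetical state contents
sigma2 = {("I0", "P1"): 0.7, ("I8", "P1"): 0.1,
          ("I8", "P2"): 0.1, ("I8", "P3"): 0.1}
s_hat, xi = marginalize(sigma2, states)

assert abs(sum(s_hat.values()) - 1.0) < 1e-9                           # Eq. (15a)
assert all(-1e-9 <= q <= s_hat[P] + 1e-9 for (P, _), q in xi.items())  # Eq. (15b)
assert abs(s_hat["P1"] - 0.8) < 1e-9   # as in Example 1
assert abs(xi[("P1", "v2")] - 0.1) < 1e-9
```

Summing $\xi$ over paths recovers the marginal infection probabilities of the two vertices under the assumed state contents, i.e., constraint (15d).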
The representation of the strategies of the attacker using variables $\hat\sigma_2$ and $\xi$ is sufficient to express the expected immediate reward of the strategy $\hat\sigma_2$, hence the constraint (2c) can be changed to use the marginalized strategies,

$$V \ge \sum_P \hat\sigma_2(P)\, c_{H,P} + \sum_{h \in H} V_{H,\det(h)} \qquad \forall H \in E^{[N_H]}. \tag{15f}$$
Importantly, we can also skip the computation of the belief $b_{H,\det(h)}$ and compute the characteristic vector formed by the marginals $\rho^{H,\det(h)}$ directly from the variables $\hat\sigma_2$ and $\xi$. We now present the equation to compute the updated marginal $\rho^{H,\det(h)}$ given that the attacker has been detected while traversing the honeypot edge $h$ (and this has been the first honeypot edge he traversed). Denote by $\mathcal{P}^h_H$ the set of all paths satisfying this condition, i.e., paths $P$ that contain $h$ and on which no other honeypot edge from $H$ precedes $h$ (and let $P_{\le h}$ denote the prefix of path $P$ up to and including the edge $h$).

$$\rho^{H,\det(h)}_i = \sum_{P \in \mathcal{P}^h_H \,\mid\, (\cdot,v_i) \in P_{\le h}} \hat\sigma_2(P) \;+ \sum_{P \in \mathcal{P}^h_H \,\mid\, (\cdot,v_i) \notin P_{\le h}} \xi(P \wedge v_i) \tag{15g}$$

The first sum stands for the probability that the attacker is detected while traversing edge $h$, but he managed to infect $v_i$ on his chosen path before the detection. The second sum represents the probability that the attacker was detected on the edge $h$ as well, but this time he has not infected $v_i$ using path $P$; the vertex $v_i$ had, however, already been infected before he started to execute path $P$.

Analogously, we can obtain the probability $q_{H,\det(h)}$ that the attacker got detected while traversing edge $h$ as

$$q_{H,\det(h)} = \sum_{P \in \mathcal{P}^h_H} \hat\sigma_2(P). \tag{15h}$$
We need not consider the subsequent stages where the attacker
has not been detected (i.e., Ā¬det observation has been generated) or
the honeypot edge h reaches the target vertex v|V|. In each of these
cases, the target vertex has been reached and thus the value of the
subsequent stage is zero.
Example 2. Consider once again the example from Fig. 1 and the strategy of the attacker $\sigma_2$ from Example 1, and assume that a honeypot is deployed on the edge $h = (v_3, v_5)$ only (i.e., $H = \{h\}$) and the attacker has been detected there. In such a case, the attacker could have used either path $P_1$ or $P_2$, but not $P_3$ as it does not reach $h$.

If the attacker used $P_1$ in $I_0$, he infects $v_3$ and $v_5$ up to the point of the detection and thus reaches a new state of infection $I_3$ (this might have happened with probability 0.7). Similarly, using $P_1$ in $I_8$ reaches $I_4$ (with probability 0.1) and using $P_2$ in $I_8$ reaches $I_5$ (with probability 0.1). The (denormalized) belief $b_{H,\det(h)}$ (corresponding to Eq. (2d)) is thus

$$b_{H,\det(h)}(I_3) = 0.7 \qquad b_{H,\det(h)}(I_4) = b_{H,\det(h)}(I_5) = 0.1.$$

Computing marginal probabilities of each vertex being infected in the belief $b_{H,\det(h)}$ yields

$$\rho^{H,\det(h)}_1 = \rho^{H,\det(h)}_3 = \rho^{H,\det(h)}_5 = 1 \qquad \rho^{H,\det(h)}_2 = 0.2 \qquad \rho^{H,\det(h)}_4 = 0.1.$$
We can obtain exactly the same marginal probabilities when using the marginalized strategy $\hat\sigma_2$ and $\xi$ from Example 1. For example, neither $P_1$ nor $P_2$ infects vertex $v_2$ (i.e., no edge $(\cdot, v_2)$ is present). Therefore, the first sum in Eq. (15g) is empty and $\rho^{H,\det(h)}_2 = \xi(P_1 \wedge v_2) + \xi(P_2 \wedge v_2) = 0.2$. Similarly, only $P_2$ infects $v_4$ before reaching edge $h$, hence it is contained in the first sum in (15g), and $\rho^{H,\det(h)}_4 = \hat\sigma_2(P_2) + \xi(P_1 \wedge v_4) = 0.1$.
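The update (15g)–(15h) can be sketched in code. The paths below are a hypothetical reconstruction of Fig. 1 ($P_1$ and $P_2$ traverse $h = (v_3, v_5)$, $P_3$ avoids it), so only quantities Example 2 states explicitly are checked; the returned marginals are denormalized, mirroring the denormalized belief $b_{H,\det(h)}$.

```python
# Update the marginals after a detection on honeypot edge h, following the
# structure of Eqs. (15g)-(15h). Paths P1-P3 are a hypothetical reconstruction
# of Fig. 1, chosen to be consistent with Example 2 (P3 never reaches h).

paths = {  # path -> ordered list of edges
    "P1": [("v1", "v3"), ("v3", "v5")],
    "P2": [("v1", "v4"), ("v4", "v3"), ("v3", "v5")],
    "P3": [("v1", "v2"), ("v2", "v5")],  # avoids h entirely
}
s_hat = {"P1": 0.8, "P2": 0.1, "P3": 0.1}
xi = {("P1", "v1"): 0.8, ("P1", "v2"): 0.1,
      ("P2", "v1"): 0.1, ("P2", "v2"): 0.1,
      ("P3", "v1"): 0.1, ("P3", "v2"): 0.1}

def detect_update(h, H, paths, s_hat, xi, vertices):
    def prefix(P):  # edges of P up to and including h
        return paths[P][: paths[P].index(h) + 1]
    # P_H^h: paths that traverse h with no earlier honeypot edge from H
    ph = [P for P in paths
          if h in paths[P] and not any(e in H and e != h for e in prefix(P))]
    q = sum(s_hat[P] for P in ph)                       # Eq. (15h)
    rho = {}
    for v in vertices:                                  # Eq. (15g), denormalized
        rho[v] = sum(s_hat[P] for P in ph
                     if any(e[1] == v for e in prefix(P)))
        rho[v] += sum(xi.get((P, v), 0.0) for P in ph
                      if not any(e[1] == v for e in prefix(P)))
    return rho, q

h = ("v3", "v5")
rho, q = detect_update(h, {h}, paths, s_hat, xi, ["v1", "v2", "v3", "v4", "v5"])
assert abs(rho["v2"] - 0.2) < 1e-9   # xi(P1 ^ v2) + xi(P2 ^ v2), as in Example 2
assert abs(rho["v4"] - 0.1) < 1e-9   # s_hat(P2) only, as in Example 2
```

Dividing the marginals by the detection probability q normalizes them, analogously to normalizing the belief $b_{H,\det(h)}$.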
5.4. Initializing bounds
We now describe our approach to initialize the bounds $\hat V_{LB}$ and $\hat V_{UB}$ (line 1 of Algorithm 1). We start with the former. To obtain a lower bound $\hat V_{LB}$, we assume that no honeypots can be deployed in the network. The attacker then only needs to find the cheapest (i.e., shortest) path in the graph when costs $C$ are considered. Denote $C^*(v_i)$ the cost of the cheapest path from vertex $v_i$. We initialize the lower bound $\hat V_{LB}$ using two linear functions. Firstly, the cost cannot be negative and we use a linear function $\alpha_1(\rho) = 0$. Next, we consider a linear function $\alpha_2(\rho) = (a^{(2)})^T \rho + z^{(2)}$ where $z^{(2)} = C^*(v_1)$ and $a^{(2)}_i = \min\{C^*(v_i) - C^*(v_1),\, 0\}$, which is a tighter lower bound for some $\rho$ (especially close to the initial characteristic vector $\rho^0$ where $\rho^0_1 = 1$ and other vertices are not infected, i.e., $\rho^0_i = 0$ for $i \ge 2$).
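The lower-bound initialization can be sketched as follows; the toy DAG, its costs, and all names are illustrative.

```python
# Initialize the lower bound from shortest-path costs, as in Section 5.4:
# alpha1(rho) = 0 and alpha2(rho) = sum_i a_i * rho_i + z with
# z = C*(v1) and a_i = min{C*(v_i) - C*(v1), 0}. The toy DAG is illustrative.

def cheapest_to_target(edges, cost, target, vertices):
    """C*(v): cost of the cheapest path from v to the target (DAG, memoized)."""
    memo = {target: 0.0}
    def c(v):
        if v not in memo:
            outs = [cost[(v, w)] + c(w) for (u, w) in edges if u == v]
            memo[v] = min(outs) if outs else float("inf")
        return memo[v]
    for v in vertices:
        c(v)
    return memo

vertices = ["v1", "v2", "v3", "v4"]
edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")]
cost = {("v1", "v2"): 3.0, ("v1", "v3"): 1.0,
        ("v2", "v4"): 1.0, ("v3", "v4"): 4.0}
cstar = cheapest_to_target(edges, cost, "v4", vertices)  # C*(v1) = 4.0 here

z = cstar["v1"]
a = {v: min(cstar[v] - cstar["v1"], 0.0) for v in vertices}

def alpha2(rho):
    return sum(a[v] * rho[v] for v in vertices) + z

rho0 = {"v1": 1.0, "v2": 0.0, "v3": 0.0, "v4": 0.0}  # initial characteristic vector
# at rho0, alpha2 evaluates to C*(v1), the cheapest attack cost from v1
```

Since every coefficient $a^{(2)}_i$ is non-positive, $\alpha_2$ is decreasing in $\rho$, as required of the lower bound.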
To initialize the upper bound $\hat V_{UB}$, we consider a perfect information variant of the game (similarly to HorĆ”k et al. (2017)). However, since the number of states (i.e., possible subsets of infected vertices) in such a game can be exponential, we consider a simpler version of the game where the attacker is only in charge of the vertex he visited last (instead of a subset of all vertices visited in the past), which can only increase the cost for the attacker. Denote $\bar C^*(v_i)$ the expected cost of the attacker in the perfect information variant of the game where the attacker controls vertex $v_i$ only. We then use the set $\Upsilon = \{(\rho^{(j)}, y^{(j)}) \mid 1 \le j \le |V|\}$ of points to initialize $\hat V_{UB}$, where

$$\rho^{(j)}_i = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases} \qquad y^{(j)} = \bar C^*(v_j). \tag{16}$$
6. Scaling up: incremental strategy generation
In the previous section, we introduced marginalized beliefs and
strategies to deal with the large number of game states. However,
the number of actions grows fast as well, which makes the game computationally challenging and may cause memory consumption issues in practical implementations of the approach. In this section,
we address this issue by adopting a double-oracle approach to
generate actions of the players incrementally.
6.1. Double-Oracle scheme for solving stage games
Before describing the oracle algorithms used in the experiments, we provide a generic scheme for using the double-oracle approach to solve stage games $\hat H\hat V(\rho)$, see Algorithm 3.

Algorithm 3: Generic double-oracle scheme for solving stage games.
1 $\hat A_1, \hat A_2 \leftarrow$ initial subsets of actions of player 1 and player 2 defining a restricted game
2 When solving a stage game $\hat H\hat V(\rho)$:
3   do
4     Solve the stage game $\hat H\hat V(\rho)$ considering actions $\hat A_1$ and $\hat A_2$ only
5     if an improving allocation $H$ for the defender is found then
6       $\hat A_1 \leftarrow \hat A_1 \cup \{H\}$
7       continue
8     else if an improving path $P$ for the attacker is found then
9       $\hat A_2 \leftarrow \hat A_2 \cup \{P\}$
10      continue
11    break
12  while an improving action is found for one of the players

The
algorithm keeps global sets of actions $\hat A_1$ and $\hat A_2$ defining a restricted game where only these actions are considered (instead of the entire sets of actions $A_1$ and $A_2$). Set $\hat A_1$ is initialized to a single arbitrary honeypot allocation; set $\hat A_2$ is initialized to an arbitrary set of $|V|$ paths $P_1, \ldots, P_{|V|}$, one from each vertex $v_i$ to $v_{|V|}$. When solving a stage game, we first query the oracle of the defender before querying the oracle of the attacker. This decision is motivated by the possibility to use a heuristic oracle for generating actions of the defender (see Section 6.4), which is less computationally demanding compared to the exact computation.
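The shape of Algorithm 3 can be sketched on a toy zero-sum game. Here the restricted games are solved approximately by fictitious play rather than by the linear program used in the paper, and the game itself (three guardable edges, three paths, a fourfold cost increase on a guarded edge) is purely illustrative.

```python
# Double-oracle scheme in the spirit of Algorithm 3, on a toy zero-sum game:
# the defender guards one of three edges, the attacker walks one of three
# paths, and a path costs 10 (40 if its edge is guarded). The restricted
# games are solved approximately by fictitious play here, whereas the paper
# solves them exactly by a linear program; the toy game is illustrative.

N = 3  # three candidate honeypot edges / three attack paths
COST = [[10.0 + (30.0 if e == p else 0.0) for p in range(N)] for e in range(N)]

def solve_restricted(A1, A2, iters=20000):
    """Approximate equilibrium of the restricted game by fictitious play."""
    n1, n2 = [0] * len(A1), [0] * len(A2)
    n1[0] = n2[0] = 1
    for _ in range(iters):
        # each player best-responds to the opponent's empirical strategy
        i = max(range(len(A1)),
                key=lambda a: sum(n2[b] * COST[A1[a]][A2[b]] for b in range(len(A2))))
        j = min(range(len(A2)),
                key=lambda b: sum(n1[a] * COST[A1[a]][A2[b]] for a in range(len(A1))))
        n1[i] += 1
        n2[j] += 1
    x = [c / sum(n1) for c in n1]  # defender's empirical mixed strategy
    y = [c / sum(n2) for c in n2]  # attacker's empirical mixed strategy
    lo = min(sum(x[a] * COST[A1[a]][p] for a in range(len(A1))) for p in A2)
    hi = max(sum(y[b] * COST[e][A2[b]] for b in range(len(A2))) for e in A1)
    return (lo + hi) / 2.0, x, y

A1, A2 = [0], [0]  # arbitrary initial restricted game
while True:
    v, x, y = solve_restricted(A1, A2)
    # defender's oracle: best honeypot edge against the attacker's strategy
    br1 = {e: sum(y[b] * COST[e][A2[b]] for b in range(len(A2))) for e in range(N)}
    e_br = max(br1, key=br1.get)
    if e_br not in A1 and br1[e_br] > v + 0.5:   # improving allocation found
        A1.append(e_br)
        continue
    # attacker's oracle: best path against the defender's strategy
    br2 = {p: sum(x[a] * COST[A1[a]][p] for a in range(len(A1))) for p in range(N)}
    p_br = min(br2, key=br2.get)
    if p_br not in A2 and br2[p_br] < v - 0.5:   # improving path found
        A2.append(p_br)
        continue
    break
# both restricted sets grow to the full action sets; v approximates the value 20
```

As in the text, the defender's oracle is queried first; the loop terminates once neither oracle finds an action improving on the restricted-game value.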
$$UB(H) = LB(H) + \sum_{i=1}^{N_H - |H|} \Delta^*_i, \tag{21}$$

where $\Delta^*_1, \ldots, \Delta^*_{N_H-|H|}$ are the $N_H - |H|$ highest values among the $\Delta_h$.
6.4. Heuristic oracles
In practice, ļ¬nding exact solutions of oracles from Sections 6.2
and 6.3 can be computationally expensive. To mitigate the impact
of the oracle computation times on the performance of the
algorithm, we present a heuristic version of the oracle of the
defender. The heuristic oracle is a greedy variant of the exact oracle
presented in Section 6.3.
The use of the heuristic oracle of the defender from Algorithm 4 can lead to a degradation of the quality of the solution found by the HSVI algorithm. By omitting some actions of the defender from the support of the equilibrium, the cost of the attacker can be lowered. Nevertheless, the lower-bound guarantees on the quality of the solution found by the algorithm are retained.

Algorithm 4: Heuristic version of the oracle of the defender.
1 $H \leftarrow \emptyset$
2 while $|H| < N_H$ do
3   $h^* \leftarrow \arg\max_{h \in E_H} \Delta_h$
4   $H \leftarrow H \cup \{h^*\}$
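The greedy selection of Algorithm 4, together with the upper bound of Eq. (21) on completions of a partial allocation, can be sketched as follows; the $\Delta_h$ values are illustrative, not derived from an actual stage game.

```python
# Greedy heuristic oracle in the spirit of Algorithm 4: repeatedly add the
# honeypot edge with the highest marginal value Delta_h. The Delta values
# below are illustrative, not derived from a real stage game.

def greedy_allocation(deltas, n_h):
    H, remaining = set(), dict(deltas)
    while len(H) < n_h and remaining:
        h_star = max(remaining, key=remaining.get)  # argmax over E_H
        H.add(h_star)
        del remaining[h_star]
    return H

deltas = {"e1": 4.0, "e2": 9.0, "e3": 1.0, "e4": 6.0}
H = greedy_allocation(deltas, 2)  # picks the two largest values: e2 and e4

# Eq. (21): upper bound on any completion of a partial allocation H,
# obtained by adding the N_H - |H| highest remaining Delta values
def upper_bound(lb, deltas, H, n_h):
    rest = sorted((d for h, d in deltas.items() if h not in H), reverse=True)
    return lb + sum(rest[: n_h - len(H)])
```

Because the greedy pick is exactly the repeated argmax, its result coincides with taking the $N_H$ largest $\Delta_h$ values at once.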
7. Experimental evaluation
In this section we focus on the experimental evaluation of the
proposed techniques on the lateral movement stochastic game
introduced in Section 5. We consider three variants of the HSVI
algorithm for compact POSGs depending on the oracle algorithm
used: (1) the version where no oracles are used (i.e., AĖi= Ai), (2)
the variant using exact oracles for both of the players (described in
Sections 6.2 and 6.3), and (3) the variant where the heuristic oracle of the defender (described in Section 6.4) is used instead of
the exact computation of the best response. We also compare these
approaches with the original HSVI algorithm (HorƔk et al., 2017)
where the summarized abstraction is not used and the computa-
tion is performed based on the entire (exponential) state space.
7.1. Experimental setting
The evaluation has been performed on a set of randomly
generated graphs with varying parametersānumber of vertices in
the network |V|, number of honeypots NH and the density of the
graph. For simplicity we used directed acyclic graphs (DAGs) for the experiments; however, the techniques presented in the paper are general enough to apply to scenarios where the network is not acyclic.
We use the following algorithm to generate mesh-like net-
works. First, positions of |V| nodes are randomly generated from a
2-dimensional interval [0, 20] Ć [0, 20] with v1 positioned at (0,0)
and v|V| positioned at (20,20) (pi ā[0, 20]2 denotes the position
of vi). For each vertex vi a set of k nearest neighbors Nk(vi) (with
respect to positions pj) is determined and an edge connecting
vertices vi and vj is created for every vj āNk(vi). The orientation
of the edges is set to always lead towards the vertex closer to the
target v|V|. Note that increasing the parameter k leads to an
increase in the number of edges in the graph.
The graph generated by the above method may, however, contain multiple vertices with zero in-degree or out-degree (apart from vertices $v_1$ and $v_{|V|}$). We fix this by regenerating positions of such nodes until finding a directed acyclic graph where only $v_1$ and $v_{|V|}$ have zero in-degree and out-degree, respectively.

Fig. 3. Example of a mesh network used in experiments. Contour lines depict the cost function f used to derive the costs of the edges. The vertices are partitioned into 3 security levels (Īŗ1, Īŗ2, Īŗ3).
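The generator described above can be sketched as follows; the regeneration step that removes surplus zero-degree vertices is omitted for brevity, and all names are illustrative.

```python
# Sketch of the mesh-network generator from Section 7.1: random positions in
# [0, 20] x [0, 20], k-nearest-neighbour edges, each edge oriented towards the
# vertex closer to the target. The regeneration step that removes extra
# zero-degree vertices is omitted for brevity.

import math
import random

def generate_mesh(n, k, seed=0):
    rng = random.Random(seed)
    pos = {1: (0.0, 0.0), n: (20.0, 20.0)}  # v1 and the target v_n are fixed
    for i in range(2, n):
        pos[i] = (rng.uniform(0, 20), rng.uniform(0, 20))

    def dist(i, j):
        return math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])

    edges = set()
    for i in pos:
        neighbours = sorted((j for j in pos if j != i), key=lambda j: dist(i, j))[:k]
        for j in neighbours:
            # orient the edge towards the vertex closer to the target v_n
            u, w = (i, j) if dist(j, n) < dist(i, n) else (j, i)
            edges.add((u, w))
    return pos, edges

pos, edges = generate_mesh(14, 4)
# every edge leads strictly closer to the target, hence the graph is acyclic
```

Since each edge strictly decreases the distance to the target, no directed cycle can arise, matching the DAG assumption of the experiments.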
Next, we deļ¬ne a cost function f (x, y) ā R>0. This function
indicates the hardness to operate in point (x, y). In our exper-
iments, we use f (x, y) = x + y which attains its maximum in
(20,20) where the target vertex v|V| is located (i.e., the network is
the most secured around the target vertex). To obtain the cost C(vi,
vj) of traversing an edge (vi, vj) given that the honeypot is not
deployed there, we integrate the cost function f along the line
connecting points pi and pj,
$$C(v_i, v_j) = \|p_i - p_j\|_2 \int_0^1 f(\lambda p_i + (1-\lambda) p_j)\, d\lambda. \tag{22}$$
Furthermore, we partition the vertices into 3 security perimeters (with multiplicative constants Īŗ1, Īŗ2 and Īŗ3) based on the distance from the position of the target vertex $v_{|V|}$. The vertices closest to the target vertex $v_{|V|}$ are the most secured, hence the attacker has to use a potentially costly zero-day exploit to proceed (and risk its revelation). Thus the cost of traversing an edge entering a vertex $v_j$ that lies in the most secured zone is scaled up,

$$C(v_i, v_j) \leftarrow \kappa_3 \cdot C(v_i, v_j) \qquad \text{for } \kappa_3 = 4. \tag{23}$$
For security perimeters further away from the target vertex v|V|
we use Īŗ2 = 1.5 and Īŗ1 = 1. Fig. 3 provides an example of such a
network and its accompanying cost function f.
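The cost computation of Eq. (22) can be checked numerically; for the linear hardness f(x, y) = x + y used in the experiments, the integral reduces to the average of f at the two endpoints. The positions below are illustrative.

```python
# Edge costs via Eq. (22): integrate the hardness f along the segment between
# the endpoint positions and scale by the segment length. For the linear
# hardness f(x, y) = x + y, the integral equals the average of f at the two
# endpoints, which the midpoint-rule check below confirms.

import math

def f(x, y):
    return x + y

def edge_cost(pi, pj, steps=1000):
    length = math.hypot(pi[0] - pj[0], pi[1] - pj[1])
    total = 0.0
    for s in range(steps):  # midpoint rule on [0, 1]
        lam = (s + 0.5) / steps
        x = lam * pi[0] + (1 - lam) * pj[0]
        y = lam * pi[1] + (1 - lam) * pj[1]
        total += f(x, y)
    return length * total / steps

pi, pj = (2.0, 3.0), (6.0, 6.0)                                # illustrative positions
closed_form = math.hypot(4.0, 3.0) * (f(*pi) + f(*pj)) / 2.0   # exact for linear f
assert abs(edge_cost(pi, pj) - closed_form) < 1e-9
# an edge entering the innermost security perimeter would additionally be
# scaled by kappa_3 = 4, per Eq. (23)
```

The midpoint rule is exact for a linear integrand, so the numerical and closed-form values agree up to floating-point error.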
All computational results have been obtained on computers equipped with Intel Xeon Scalable Gold 6146 processors and 16 GB of available RAM, while limiting the runtime of the algorithms to 2 hours. CPLEX 12.9 has been used to solve the (mixed integer) linear programs. The algorithms were required to find an $\varepsilon$-optimal solution, where $\varepsilon$ is set to 1% of the error $\hat V_{UB}(\rho^0) - \hat V_{LB}(\rho^0)$ in the initial characteristic vector $\rho^0$ after the initialization phase described in Section 5.4 is completed. If the algorithm failed to reach this level of precision within 2 hours, we report an instance as unsolved. The results are based on 100 randomly generated networks for each parameter set.
7.2. Scalability in the size of graphs
First, we focus on the scalability of the algorithms in size of the
networkāin the number of vertices |V|, and in the density of the
graph (controlled by the parameter k).
Fig. 4 depicts the scalability of the algorithms in the number of
vertices. First of all, observe that the original HSVI algorithm for
one-sided POSGs has been unable to solve all but the smallest
Fig. 4. Scalability in the number of vertices in the network |V| (averages based on 100 instances for each parameter set). Conļ¬dence intervals mark the standard error. The
reported runtimes include only instances solved by the algorithms. The percentage of instances where the algorithms failed to terminate within 2 hours are reported separately.
Fig. 5. Scalability of the algorithms: (Left) in the density (i.e., the number k of nearest neighbors considered when forming the edges) of the network with 14 vertices and
NH = 2 honeypots, (Right) in the number of honeypots NH in a network with 14 vertices and k = 6. Conļ¬dence intervals mark the standard error. The reported runtimes
include only instances solved by the algorithms. The percentage of instances where the algorithms failed to terminate within 2 hours are reported separately.
instances with 8–12 vertices, and even then only with NH = 1 honeypot (while exhausting the memory on larger instances). For this reason, we did not evaluate this algorithm on more complicated instances with NH > 1. In contrast, the algorithm relying on the technique of summarized abstractions has been able to solve most of the
instances with NH = 1 even without the use of the incremental
strategy generation technique described in Section 6. The versions
of the algorithm relying on the incremental strategy generation
were then able to solve all 100 randomly generated instances for
each parametrization when NH = 1 honeypot is considered. Im-
portantly, the bounds computed by the versions using incremental
strategy generation overlap with the bounds computed by the re-
maining variants. This demonstrates that in spite of improving the
runtimes by using the incremental strategy generation technique,
the quality of the solution does not deteriorate.
Increasing the number of honeypots to NH = 2 highlights the
advantages of using the oracles to generate actions of the players
incrementally. While the algorithm that does not use the oracles
failed to solve most of the instances with |V| ā„ 18 in the time limit
of 2 hours, both of the variants relying on the incremental strategy
generation technique were able to solve even most of the largest
instances with 24 vertices and k = 4 in less than 20 minutes on
average (considering only the instances solved by the algorithms).
Increasing the number of honeypots to NH = 3 highlights the
computational advantages of using the heuristic version of the
oracle of the defender presented in Section 6.4. This version of the
algorithm was able to solve signiļ¬cantly more instances compared
to the version using the exact version of the oracle of the defender
(Section 6.3). Note that the version that does not use the strategy generation technique was able to solve only a few instances with |V| ⤠14. This is mainly attributed to the size of the linear programs used to solve the stage games; e.g., the number of possible combinations of actions of the players in the games with 14 vertices and k = 5 is up to $|A_1| \times |A_2| \approx 5.2 \cdot 10^7$. Note that the algorithms using the oracles were able to solve nearly all instances in this setting (|V| = 14 and k = 5).
In Fig. 5 (left), we focus on the scalability of the three vari-
ants of the algorithm using the summarized abstraction based on
the density of the network (controlled by the parameter k). The
number of actions in the game can grow fast with the addition of
new edges to the network which makes solving the game chal-
lenging for the variant without the use of oracles. However, the
addition of the edges need not increase the support of the equilib-
rium and hence the runtime of the variants relying on incremental
strategy generation is not affected by the increasing value of the
parameter k.
7.3. Scalability in the number of honeypots
In Fig. 5 (right) we focus on the scalability of the algorithm in
the number of honeypots. The number of actions of the defender is
exponential in the number of honeypots NH. Hence, it is not
surprising that the algorithm that does not use the incremental
strategy generation to form defenderās actions can solve only the
smallest instances in terms of the number of honeypots NH. The
variants of the algorithm relying on the oracles to incrementally generate the actions perform significantly better; however, when the number of honeypots is large (NH ā„ 4), computing the best response of the defender exactly (using the approach from Section 6.3) is prohibitively expensive. In contrast, the heuristic variant of the oracle of the defender allows the algorithm to scale better and solve most instances up to NH = 6.
Interestingly, we observed that using the heuristic oracle did
not impact the quality of the solution in a negative way. Consid-
ering the instances from Fig. 5 (right) that were solved by the
variant using the exact oracle of the defender, we observed that
the lower bounds VĖLB(Ļ 0) computed by these two variants of the
algorithm were within 0.6% of each other.
7.4. Applicability to computer networks
Previous experiments have been focused on randomly gener-
ated mesh networks. The physical properties of mesh networks
typically disallow direct communication between the devices (e.g.,
due to the presence of obstacles, or low signal strength caused by
large distances between the devices). This leads to a signiļ¬cantly
lower number of edges in the corresponding game graph com-
pared to regular computer network architectures. To demonstrate
the versatility of our approach, we evaluate our algorithms on a
graph originating from a fundamentally different network topology
given as an example by a network security expert.
In Fig. 6(a), the physical topology of the considered computer
network is shown. Based on this topology, we derived logical
communication links, shown in Fig. 6(b), that the attacker can use
to perform lateral movement and that do not directly correspond
to the physical links. For example, every device can directly
Fig. 6. Computer network topology considered in the experiments: (a) Physical topology of the network. Nodes marked with asterisk exist in multiple instances and can be
used as honeypots. (b) Logical communication links in the network. Dashed lines correspond to easy diļ¬culty levels, solid lines to medium diļ¬culty and ļ¬nally bold solid lines
represent hard diļ¬culty.
communicate with the web server and thus the attacker can use this communication channel to carry out his attack. Moreover, compromising the web server allows an attack on the database server. We also reflect potential side channels in the network, such as social engineering, allowing the attacker to compromise end-user machines by publishing malicious files on the web presentation of the company.
Each logical link can be used to infect an adjacent machine. The links, however, differ in the difficulty of carrying out such an attack along the edge; e.g., it is substantially easier to infiltrate the web server via an exploitable web presentation than to compromise the domain controller running the latest versions of its services. We distinguish three difficulty levels (with three different associated costs representing the time needed to successfully traverse the edge): easy (C(e) = 10 units of time), medium (C(e) = 20) and hard (C(e) = 40). In the presence of a honeypot, these costs increase to 40, 80 and 160 units, respectively.2
In our example, we assume that the attacker uses a computer
outside of the network (i.e., in the internet zone) to inļ¬ltrate the
network, with the aim of compromising the target node within
the internal part of the network (assume that sensitive data are
located at this host). The defender attempts to slow down
attackerās progress by deploying a honeypot on one of the hosts
within the network (hosts that can be selected are denoted by an
asterisk in Fig. 6(a)).
We assume that each such host is present in d instances
within the network (e.g., one being the production server and the
remaining instances serving for backup purposes). In order to serve
the legitimate users, the defender is allowed to operate a honeypot on one of the instances only.

2 Note that the exact values of costs do not affect the usability of our algorithm and hence can be set specifically for each network.

For d = 2 instances of
each server, the algorithm (using the exact versions of the oracles)
took 982s to ļ¬nd a solution with the lower bound on attackerās
cost of 69.4, while for d = 3 instances of each server a solution
with quality 63.3 has been found in 5950s. The shortest attack path
in the graph has a cost of 50, which marks an increase in the
attackerās cost by 38.8% and 26.6%, respectively. The larger of
the two instances (with d = 3) is represented by a graph with 52
vertices and 504 edges. Note that the decreasing value of the game
with increasing value of d is caused by the fact that the defender is
able to cover only one of the instances of a server; thus the
probability that the attacker manages to choose an unprotected
instance increases.
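This effect can be illustrated by a back-of-envelope computation (an illustration of the argument, not the paper's exact game-theoretic value): if the attacker picks one of the d interchangeable instances uniformly at random, he meets the single honeypot with probability 1/d.

```python
# Illustrative only: expected cost of traversing one server with d identical
# instances when a single honeypot covers one of them and the attacker picks
# an instance uniformly at random (not the paper's exact game computation).

def expected_cost(c, c_hp, d):
    return (1.0 / d) * c_hp + (1.0 - 1.0 / d) * c

# an "easy" edge: 10 units of time normally, 40 when trapped
assert abs(expected_cost(10.0, 40.0, 2) - 25.0) < 1e-9
assert abs(expected_cost(10.0, 40.0, 3) - 20.0) < 1e-9  # more instances, cheaper attack
```

The expected cost decreases in d, which is consistent with the observed drop in the value of the game from d = 2 to d = 3.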
While we have demonstrated the capabilities of the algorithm
to solve problems related to computer networks, we believe that
its scalability on this type of networks can be signiļ¬cantly
improved and we discuss this in the next section.
8. Discussion and future extensions
The model introduced in Section 5 represents the interaction
with an attacker who is determined to reach his goal (in our case
reach the target vertex v|V|). The attacker accepts the risk of getting
detected in the course of the attack, and understands the
implications of a successful detection. First, the cost of traversing a honeypot edge is higher. More importantly, the defender gets vital
information about the progress of the attack which the defender
can use to harden further progress of the attacker.
In Section 7.4, we presented an application of the proposed
algorithm to improve security of computer networks. Real-world
applications, however, require that the legitimate services do not
get replaced by honeypots to sustain the operation of the network.
15. K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 15
McMahan, H.B., Gordon, G.J., Blum, A., 2003. Planning in the presence of Cost Func-
tions Controlled by an Adversary. In: 20th International Conference on Machine
Learning, pp. 536ā543.
MoravÄĆk, M., Schmid, M., Burch, N., LisĆ½, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., Bowling, M., 2017. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356 (6337), 508–513.
Nawrocki, M., WƤhlisch, M., Schmidt, T.C., Keil, C., Schƶnfelder, J., 2016. A survey on
honeypot software and data analysis. arXiv:1608.06249.
Nguyen, T.H., Wellman, M.P., Singh, S., 2017. A stackelberg game model for botnet
data exļ¬ltration. In: International Conference on Decision and Game Theory for
Security, pp. 151ā170.
Noureddine, M.A., Fawaz, A., Sanders, W.H., BaÅar, T., 2016. A game-theoretic approach to respond to attacker lateral movement. In: International Conference on Decision and Game Theory for Security. Springer, pp. 294–313.
Oliehoek, F.A., Savani, R., Gallego-Posada, J., van der Pol, E., de Jong, E.D., Gross, R.,
2017. GANGs: Generative adversarial network games. arXiv:1712.00679.
Pawlick, J., Zhu, Q., 2015. Deception by design: evidence-based signaling games for
network defense. arXiv:1503.05458.
PĆbil, R., LisĆ½, V., Kiekintveld, C., BoÅ”anskĆ½, B., PÄchouÄek, M., 2012. Game theoretic model of strategic honeypot selection in computer networks. In: International Conference on Decision and Game Theory for Security, pp. 201–220.
Roy, N., Gordon, G., Thrun, S., 2005. Finding approximate POMDP solutions through
belief compression. J. Artif. Intell. Res. 23, 1ā40.
Smith, T., Simmons, R., 2004. Heuristic Search Value Iteration for POMDPs. In: 20th
Conference on Uncertainty in Artiļ¬cial Intelligence, pp. 520ā527.
Stoll, C., 1989. The Cuckooās Egg: Tracking a Spy Through the Maze of Computer
Espionage. Doubleday.
VanÄk, O., Yin, Z., Jain, M., BoÅ”anskĆ½, B., Tambe, M., PÄchouÄek, M., 2012. Game-theoretic Resource Allocation for Malicious Packet Detection in Computer Networks. In: 11th International Conference on Autonomous Agents and Multiagent Systems, pp. 902–915.
Wang, K., Du, M., Maharjan, S., Sun, Y., 2017. Strategic honeypot game model for
distributed denial of service attacks in the smart grid. IEEE Trans. Smart. Grid 8
(5), 2474ā2482.
Wang, Y., Shi, Z.R., Yu, L., Wu, Y., Singh, R., Joppa, L., Fang, F., 2019. Deep reinforcement learning for green security games with real-time information. In: 33rd AAAI Conference on Artificial Intelligence, pp. 1401–1408.
Zhou, E., Fu, M.C., Marcus, S.I., 2010. Solving continuous-state POMDPs via density projection. IEEE Trans. Automat. Contr. 55 (5), 1101–1116.
Karel HorƔk is a postgraduate student at the Dept. of Computer Science of the
Faculty of Electrical Engineering, at the Czech Technical University in Prague. He
received his Master's degree in artificial intelligence from the Czech Technical University in Prague in 2015. Currently, he is interested in algorithms for solving infinite-horizon stochastic games with imperfect information and their application to security. His work has been published at the AAAI and IJCAI conferences and other
venues.
Branislav BoÅ”anskĆ½ is an assistant professor at the Dept. of Computer Science of the
Faculty of Electrical Engineering, at the Czech Technical University in Prague. He
received his PhD in 2015 at the Faculty of Electrical Engineering, at the Czech Tech-
nical University in Prague and spent one year as a postdoc researcher at the Aarhus
University with prof. Peter Bro Miltersen. Branislav's research focuses on algorithmic game theory and dynamic games. He is interested in computing approximate
equilibria in dynamic games and designing scalable algorithms for using dynamic
games in computer network security and robust machine learning methods used in
an environment with an adversary. He has published 20+ papers and presented at
the top AI conferences (AAAI, IJCAI, AAMAS) and top AI journals (AIJ, JAIR).
Petr TomĆ”Å”ek is a postgraduate student at the Dept. of Computer Science of the
Faculty of Electrical Engineering, at the Czech Technical University in Prague. He
received his Master's degree in 2018 at the Faculty of Electrical Engineering, at the
Czech Technical University in Prague. He is interested in game theory, especially in
solving large extensive-form games.
Christopher Kiekintveld is currently an associate professor at the University of
Texas at El Paso (UTEP), and received his Ph.D. in 2008 from the University of Michigan. His research is in the area of intelligent systems, focusing on multi-agent systems and computational decision making, as well as applications in security and
other areas with potential social beneļ¬t. He has worked on several deployed ap-
plications of game theory for security, including systems in use by the Federal Air
Marshals Service and Transportation Security Administration. He has authored more
than 80 papers in peer-reviewed conferences and journals (e.g., AAMAS, IJCAI, AAAI,
JAIR, JAAMAS, ECRA). He has received several best paper awards, the David Rist Prize,
and an NSF CAREER award.
Charles A. Kamhoua is a researcher at the Network Security Branch of the U.S. Army
Research Laboratory (ARL) in Adelphi, MD, where he is responsible for conducting
and directing basic research in the area of game theory applied to cyber security.
Prior to joining the Army Research Laboratory, he was a researcher at the
U.S. Air Force Research Laboratory (AFRL), Rome, New York for 6 years and an edu-
cator in different academic institutions for more than 10 years. He has held visiting
research positions at the University of Oxford and Harvard University. He has co-
authored more than 150 peer-reviewed journal and conference papers. He received a
B.S. in electronics from the University of Douala (ENSET), Cameroon, in 1999, an
M.S. in Telecommunication and Networking from Florida International University
(FIU) in 2008, and a Ph.D. in Electrical Engineering from FIU in 2011.