SlideShare a Scribd company logo
1 of 15
Computers & Security 87 (2019) 101579
Contents lists available at ScienceDirect
Computers & Security
journal homepage: www.elsevier.com/locate/cose
Optimizing honeypot strategies against dynamic lateral movement
using partially observable stochastic games
Karel HorĆ”ka,āˆ—, Branislav BoÅ”anskĆ½ a, Petr TomĆ”Å”eka, Christopher Kiekintveldb,
Charles Kamhoua c
a Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czechia
b Computer Science Department, University of Texas at El Paso, 500 W. University Ave., El Paso, TX 79968-0518, United States
c U.S. Army Research Laboratory, 2800 Powder Mill Road, Adelphi, MD 20783-1138, United States
a r t i c l e i n f o
Article history:
Received 16 March 2019
Revised 24 June 2019
Accepted 18 July 2019
Available online 26 July 2019
Keywords:
Dynamic honeypot allocation
Lateral movement
Partially observable stochastic games
Compact representation
Incremental strategy generation
a b s t r a c t
Partially observable stochastic games (POSGs) are a general game-theoretic model for capturing dynamic
interactions where players have partial information. The existing algorithms for solving subclasses of
POSGs have theoretical guarantees for converging to approximate optimal strategies, however, their scal-
ability is limited and they cannot be directly used to solve games of realistic sizes. In our problem, the
attacker uses lateral movement through the network in order to reach a speciļ¬c host, while the defender
wants to discover the attacker by dynamically reallocating honeypots. We demonstrate that restricting to
a speciļ¬c domain allows us to substantially improve existing algorithms: (1) we formulate a compact rep-
resentation of uncertainty the defender faces, (2) we exploit the incremental strategy-generation method
that over iterations expands the possible actions for players. The experimental evaluation shows that our
novel algorithms scale several orders of magnitude better compared to the existing state of the art.
Ā© 2019 Elsevier Ltd. All rights reserved.
1. Introduction
One of the most potent threats in cybersecurity are the Ad-
vanced Persistent Threat (APT) attackers. These are sophisticated
(possibly state-sponsored) attackers who execute highly targeted,
long-term, stealthy attacks against government, military, and cor-
porate organizations. In many cases of known attacks, APTs have
been successful at establishing a deep and persistent foothold in a
target network and maintaining this presence for months or even
years before being discovered (Mandiant Intelligence Center, 2013).
Another emerging challenge in cybersecurity is securing highly
dynamic, diverse mobile networks against short- term but very
intense attacks, as in the Internet of Battleļ¬eld Things (IoBT)
(Abuzainab and Saad, 2018). In both of these cases lateral
movement plays a key role in the attack, as an attacker with a
foothold needs to move to compromise additional network
resources to achieve the goal of the attack. One of the methods a
defender can use to detect and mitigate lateral movement is using
honeypots and other deception technologies to detect, confuse,
āˆ—Corresponding author.
E-mail addresses: horak@agents.fel.cvut.cz (K. HorƔk), bosansky@agents.fel.cvut.cz
(B. BoÅ”anskĆ½), petr.tomasek@aic.fel.cvut.cz (P. TomĆ”Å”ek), cdkiekintveld@utep.edu (C.
Kiekintveld), charles.a.kamhoua.civ@mail.mil
(C. Kamhoua).
and redirect the attacker (Kamdem et al., 2017; Noureddine et al.,
2016).
The resulting adversarial competition for control of a network
between the defender and the attacker can be modeled as a game,
where the defender is allocating defensive resources (e.g.,
honeypots), and the attacker is taking actions to gain greater
control while remaining undetected. Game theory is becoming an
increasingly important tool for optimizing cybersecurity resources,
including strategic allocation of honeypots (Durkota et al., 2015;
Kiekintveld et al., 2015; PĆ­bil et al., 2012), allocating resources to
perform the deep packet inspection (VaneĖ‡k et al., 2012), and
modifying the structure or characteristics of the network (Cai et al.,
2016; Jajodia et al., 2012; Nguyen et al., 2017).
We adopt a game-theoretic framework for optimizing honeypot
deployments for detecting and defeating an adversary. However,
motivated by the scenarios of APT attackers and attacks on mobile
IoBT infrastructure, we consider a richer class of games where the
interactions between the attacker and defender are dynamic, involve
uncertainty, and are long-lived. First, in the realistic cases we want
to model the interaction which is dynamic, in that the defender
and attacker can observe and react to new information and the
moves of the other player, allowing them to update their strategies
over time. Second, the players have imperfect informa- tion; for
example, the defender typically does not know exactly what set of
resources an attacker may control at any given time.
https://doi.org/10.1016/j.cose.2019.101579
0167-4048/Ā© 2019 Elsevier Ltd. All rights reserved.
2 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579
Finally, the interactions can be very long (as in the case of APTs), or
very frequent (as in the case of IoBT networks under short but very
intense attacks). Previous work using game theory to model lateral
movement (Kamdem et al., 2017) and honeypot allocation (Durkota
et al., 2015; Kiekintveld et al., 2015; PĆ­bil et al., 2012; Wang et al.,
2017) has not been able to address all of these problems.
Partially Observable Stochastic Games (POSGs) are a class of game-
theoretic models that encapsulates all of these properties and
allows us to reason about strategies in dynamic scenarios with
imperfect information and very long (even inļ¬nite) time horizon.
While POSGs are a very general model, this comes at the cost of
computational complexity. Computing optimal strategies in POSGs
is highly intractable from the algorithmic perspective, even in the
strictly competitive case (i.e., two-player zero-sum setting). If we
assume that both players have partial information about the en-
vironment, the players need to reason not only about their belief
over possible states, but also about the belief the opponent has
over the possible states, beliefs over beliefs, and so on. Restricting to
subclasses of POSGs where this issue of nested beliefs does not arise,
however, allows us to design and implement algorithms that are
guaranteed to converge to optimal strategies (HorĆ”k and BoÅ”anskĆ½,
2019; HorƔk et al., 2017),
Here, we develop new algorithms for POSG models of the lateral
movement problem in cybsersecurity, and show that these new
algorithms allow us to solve problems of a realistic size even with
all of these complications. We study a model of attacker moving
laterally in a computer network ā€“ we assume that the attacker
inļ¬ltrated some nodes in a network and aims to reach a certain
host/node (e.g., the main database). We model this problem as a
two-player game on a graph between the attacker and the
defender. The nodes of the graph correspond to hosts in the
network, and the edges correspond to attacking another host using
a particular vulnerability on the target host, which has an
associated cost. The attacker sequentially chooses which hosts to
attack to progress towards the goal. The defender chooses edges
that will act as honeypots (i.e., fake vulnerabilities) which are able
to detect certain moves by the attacker through the network and
allow him to take additional mitigating actions. The defender can
also change his honeypot deployment after any detection event to
take into account the new piece of information about the attack.
We refer to this game as the Lateral Movement POSG. We for-
malize it as a One-Sided POSGs (HorƔk et al., 2017) where the
defender has partial information about the assets of the attacker,
while the attacker always knows his current assets and knows
when he has been detected. This allows us to apply existing meth-
ods for restricted classes of POSGs to this problem. Unfortunately,
these existing algorithms have very limited scalability for our
games due to two fundamental limitations in the game representa-
tion and algorithm design. The ļ¬rst limitation is the complexity of
representing, updating, and reasoning about the uncertainty of the
defender. In the existing POSGs (as well as in similar single-agent
Partially Observable Markov Decision Processes (POMDPs)), uncer-
tainty is modeled using beliefs that correspond to probability dis-
tributions over the possible states. In the Lateral Movement POSG
the defender must reason over all possible subsets of hosts in a
network, maintaining a probability that any speciļ¬c subset of hosts
has already been compromised. Since the number of possible sub-
sets grows exponentially in the size of the network, the number of
possible states quickly become too large for the algorithm to scale.
The second limitation comes from the number of possible ac- tions
of the players. In the Lateral Movement POSG, the actions of the
defender correspond to placing multiple honeypots on edges in a
graph. Even if there are only 20 possible edges in a network
and the defender has 3 honeypots, the defender has up to 6.8 Ā·
103
possible actions to choose from at a single decision point in the
game. Since the game is dynamic with many decision points, this
problem is exacerbated even further by the length of the game.
We address both of these limitations and propose two key
contributions for improving the scalability of POSG algorithms.
These speciļ¬cally address the problems that arise in our cyber-
security domain, but can apply to many other similar classes of
POSG as well. In particular: (1) we replace the representation of
beliefs over the exponential number of possible states with a
summarized abstraction that captures key information but reduces
the dimensionality of the beliefs; (2) we use an incremental
strategy generation technique to iteratively expand the strategy
space of the players in order to overcome large branching factor1
We present general version of the algorithms and novel solution
concepts, and then show how they can be applied speciļ¬cally to
the Lateral Movement POSG. We give theoretical justiļ¬cation and
analysis of the key elements of the algorithm. In addition, we
evaluate the performance of our algorithm empirically on realistic
instances of the Lateral Movement POSG. Our algorithm is capable
of ļ¬nding strategies that are very near the optimal solutions for
small cases where we can compute the true optimal value. It also
scales dramatically better than the previous state of the art,
allowing us to solve games with twenty or more nodes that are of
practical signiļ¬cance. For example, we can ļ¬nd defense strategies
in smaller computer networks or abstractions of larger network.
With more aggressive approximation we could scale to even larger
examples using the same algorithms.
2. Related work
The concept of a honeypot has been around for more than 30
years in cybersecurity (Stoll, 1989). They have been used for many
different purposes, and have evolved to be much more sophisti-
cated with greater abilities to mimic real hosts and to capture use-
ful information about attackers (Mairh et al., 2011; Nawrocki et al.,
2016). Recently, there have been a large number of game theory
models developed to capture aspects of how honeypots and other
deception methods can be strategically used to improve cyber-
security (Durkota et al., 2015; Garg and Grosu, 2007; Kiekintveld et
al., 2015; La et al., 2016; PĆ­bil et al., 2012; Wang et al., 2017).
While none of these previous models address the full scope of
long-term dynamic games with uncertainty that we consider here,
several have considered similar aspects of uncertainty or dynamic
interactions. For example, Wang et al. (2017) investigate the use of
honeypots in the smart grid to mitigate denial of service attacks
through the lens of Bayesian games. Durkota et al. (2015) de-
veloped a model of honeypot allocation under uncertainty. La et al.
(2016) also model honeypots mitigating denial of service attacks in
the Internet-of-Things domain. Du et al. (2017) tackle a similar
ā€œhoneypots for denial of service attackā€ problem with Bayesian
game modeling in the social networking domain. In ad- dition, a
number of works have consider deception as a signaling game (e.g.,
Pawlick and Zhu (2015)).
In games with inļ¬nite horizon, the problem with nested beliefs
prevents one from designing an (approximate) optimal algorithm
for fully general settings. Nested beliefs can be tackled directly with
histories ā€“ one of the few such approaches is bottom-up dynamic
programming for constructing relevant ļ¬nite-horizon policy trees
for individual players while pruning-out dominated
1 A shorter version of this paper based on a subset of the work has been pub-
lished in IJCAI 2019 conference (HorƔk et al., 2019). We provide following signif- icant
novel contributions: (1) integration of the incremental strategy-generation algorithm
for solving stage games, (2) generalization of the problem so that the defender can
use multiple honeypots, (3) extended discussion throughout the text, and an
example of the algorithm, (4) extended experimental evaluation demonstrat- ing
additional signiļ¬cant speed-up over the conference version of the algorithm, and on
more domain-speciļ¬c examples.
K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 3
strategies (Hansen et al., 2004; Kumar and Zilberstein, 2009).
However, due to the explicit dependence on history, the scalability
in horizon is very limited.
A more common approach is to focus on a subclass of POSGs. In
Ghosh et al. (2004), zero-sum POSGs with public actions and
observations are considered. The authors show that the game has a
value and present an algorithm that exploits the transformation of
such a model into a game with complete information. Another
signiļ¬cant subclass of POSGs are One-Sided POSGs where one
player has perfect information (Basu and Stettner, 2015; Chatterjee
and Doyen, 2014; HorƔk et al., 2017). This subclass is particularly
important for security applications since it provides a naturally
robust strategies against the worst-case fully-informed opponent.
While our algorithm is based on the algorithm for the one-sided
case, we provide signiļ¬cant generalizations of the previous work,
especially in the representation of value function, deļ¬nition and
algorithms for computing value-backup operator.
Replacing the representation of beliefs over the exponential
number of possible states using a compact characteristic vector that
captures key information reduces the dimensionality of the beliefs.
For POMDPs, a similar compact representation has appeared in
existing literature (e.g., in Li et al. (2010); Roy et al. (2005); Zhou et
al. (2010)). The most general approach was to choose an abstraction
based on Principal Components Analysis (Roy et al., 2005). For our
Lateral Movement POSG, we can exploit natural representation in
marginal probabilities ā€“ i.e., reason about prob- ability of each
resource being infected as a characteristic vector, instead of
explicitly considering all possible subsets of infected resources.
While this offers a path to scaling to much larger prob- lems, it has
consequences for the solution quality as well as the construction of
the solution algorithm. Many components of state- of-the-art POSG
solvers are based on manipulating the full belief distribution and
we show how these can be redesigned to operate using the more
compact summarized belief representations.
Finally, we also use incremental strategy generation method
that is often known as the double-oracle algorithm and of- ten
used for solving games with large combinatorial number of actions
in security games (e.g., in Jain et al. (2013, 2011); McMahan et al.
(2003), but also many others) and more recently, the double oracle
method is being also used in combination with reinforcement
learning on domains where computing (approxi- mate) best
response is not possible (Lanctot et al., 2017; Oliehoek et al., 2017;
Wang et al., 2019). Double-oracle algorithm restricts the space of
possible actions to choose from ā€“ the algorithm forms a restricted
stage game that is iteratively expanded by adding new actions into
the restricted game. These actions are often computed as best
responses to the current strategy of the opponent in the restricted
problem. In the worst case, all actions have to be added into the
restricted problem. This, however, rarely happens in practice and
double-oracle algorithms are often able to ļ¬nd an optimal strategy
using only a fraction of all possible plans (see, for example,
BoÅ”anskĆ½ et al. (2014); Jain et al. (2011); Lanctot et al. (2017)).
While the domain-independent idea of double-oracle algorithm is
simple, our contribution is in (1) full integration of this
methodology into the algorithm for solving stage games of Lateral
Movement POSGs and (2) description of exact and heuristic oracle
algorithms for determining which actions should be added into the
restricted game.
3. One-sided POSGs
A one-sided partially observable stochastic game (HorƔk et al.,
2017), or one-sided POSG (OS-POSG), is an imperfect-information
two-player zero-sum inļ¬nite-horizon game with perfect recall
represented by a tuple (S, A1, A2, O, T, R). The game is played for an
inļ¬nite number of stages. At each stage, the game is in one
of the states s āˆˆS and players choose their actions a1 āˆˆA1 and
a2 āˆˆA2 simultaneously. An initial state of the game is drawn from
a probability distribution b0 āˆˆļæ½(S) over states termed the initial
belief. The one-sided nature of the game translates to the fact that
while player 1 lacks detailed information about the course of the
game, player 2 is able to observe the game perfectly (i.e., his only
uncertainty is the action a1 player 1 is about to take in the current
stage).
The choice of actions determines the outcome of the current
stage: Player 1 gets an observation o āˆˆO and the game transitions
to a state s āˆˆS with probability T(o, s | s, a1, a2), where s is the
current state. Furthermore, player 1 gets a reward R(s, a1, a2) for
this transition, and player 2 receives āˆ’R(s, a1, a2). The rewards are
discounted over time with discount factor Ī³ < 1.
One-sided POSGs can be solved by approximating the optimal
value function Vāˆ—: ļæ½(S) ā†’ R using a pair of value functions V
(lower bound on Vāˆ—) and V (upper bound on Vāˆ—). The algorithm
proposed in HorĆ”k et al. (2017) reļ¬nes these bounds by solving a
sequence of stage games. In each of these stage games, the
algorithm focuses on deciding the optimal strategies of the players
in this stage (i.e., Ļ€1 āˆˆļæ½(A1) for player 1 and Ļ€2(s) āˆˆļæ½(A2) for
player 2) while assuming that the play in the subsequent stages
yields values represented by value functions V or V, respectively.
Solving the stage games also deļ¬nes the ļ¬xed point equation for
Vāˆ—,
Vāˆ—
(b) =HVāˆ—
(b)
Ļ€2 Ļ€1
(
= min max Eb,Ļ€ ,Ļ€
1 2
[R]+
a1 ,o
1 2
b,Ļ€ ,Ļ€ 1 1 2
+Ī³ Pr [a , o] Ā·Vāˆ—(Ļ„(b, a , Ļ€ , o)) , (1)
1 2
where Ļ„ (b, a , Ļ€ , o) denotes the Bayesian update of the belief
of player 1 when he played a1 and observed o. Note that actions of
player 2 who has complete information affect the observations
player 1 receives and thus belief of player 1.
Each stage game is parameterized by the current belief b āˆˆ ļæ½(S)
of player 1. For piecewise-linear and convex V and V, a stage game
is solved using linear programming. We show this linear program
for V (represented as a point-wise maximum over a set r of linear
i
functions Ī± : ļæ½(S) ā†’ R).
min V (2a)
s.t.
a2
2 2
Ļ€ (s āˆ§a ) =b(s) (2b)
s,a2
2 2
V ā‰„ Ļ€ (s āˆ§a )R(s, a ,a
1 2
o
) + Ī³ V 1
a o (2c)
1
a o
s,a2
2
b (s ) = Ļ€ (sāˆ§ 2 1 2
a )T(o, s |s,a , a ) (2d)
1
a o
s
i
1
a o
V ā‰„ Ī± (s )b (s )
āˆ€s
āˆ€a1
āˆ€a1,o,s
āˆ€a1,o,Ī±i (2e)
2 2
Ļ€ (s āˆ§a ) ā‰„0 āˆ€s,a 2 (2f)
In this linear program, player 2 seeks a strategy Ļ€2 for the
current stage of the game to minimize the utility V of player 1.
Here Ļ€2(sāˆ§a2) stands for the joint probability (ensured by con-
straints (2b) and (2f)) that the current state of the game is s and
player 2 plays the action a2. Constraint (2c) stands for player 1
choosing the best-responding action a1. Constraint (2d) expresses
the belief in the subsequent stage of the game when a1 was played
by player 1 and observation o was seen (multiplied by the
probability of seeing that observation), and ļ¬nally con- straint (2e)
represents the value of V in the belief ba1o. Such a linear program
can be then combined with point-based variants of the value-
iteration algorithm, such as Heuristic Search Value Iteration (HSVI)
(HorƔk et al., 2017; Smith and Simmons, 2004).
4 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579
A different approximation scheme is used for the upper bound
V on Vāˆ—. Instead of using the point-wise maximum over linear
function, V is expressed using a set Ļ’ = {(b(i), y(i)) |1 ā‰¤ i ā‰¤ |Ļ’|} of
V(b) =min
Ī»āˆˆR
|Ļ’ |
ā‰„0
i
(i) T
1ā‰¤iā‰¤|Ļ’| 1ā‰¤iā‰¤|Ļ’|
i
(i)
points. The lower convex hull of this set of points is then formed to
obtain the value of V (b),
r
Ī» y |1 Ī» = 1, Ī» b = b . (3)
Constraint (2e) can then be adapted to use this representation of
V as detailed in HorĆ”k and BoÅ”anskĆ½ (2016).
4. Compact representation of Vāˆ—
The dimension of the value function Vāˆ— depends on the num-
ber of states, which can be very large in games like the lateral
movement POSG where the number of states is exponential in the
size of the network. We propose an abstraction scheme called
summarized abstraction to decrease the dimensionality of the
problem by creating a simpliļ¬ed representation of the beliefs over
the state space.
We associate each belief b āˆˆļæ½(S) in the game with a character-
istic vector Ļ‡(b) = A Ā·b (for some matrix A āˆˆRkƗ|S| where k Ā« |S|)
and we deļ¬ne an (approximate) value function V
Ėœāˆ—: Rk ā†’ R over
the space of characteristic vectors. We denote Ļ‡ 0 the characteristic
vector corresponding to the initial belief b0, i.e., Ļ‡ 0 = A Ā·b0.
The main goal is to adapt algorithms based on value iteration
to operate over the more compact space Rk instead of the original
belief space ļæ½(S). First, we adapt the ļ¬xed point Eq. (1) for OS-
POSGs to work with compact V
Ėœāˆ—(instead of original Vāˆ—). We let the
player 2 choose any belief that is consistent with the current char-
acteristic vector Ļ‡ (by adding an extra minimization term), and we
replace the value of belief b, Vāˆ—(b), by the value of its characteri-
zation V
Ėœāˆ—(A Ā·b). We also denote this compound function V
Ėœāˆ—ā—¦ A.
V
Ėœāˆ—(Ļ‡) = H
Ėœ
V
Ėœāˆ—(Ļ‡)
b|Ab=Ļ‡ Ļ€2 Ļ€1
(
= min min max Eb,Ļ€1 ,Ļ€2
[R]+
a1 ,o
1 2
b,Ļ€ ,Ļ€ 1
Ėœ āˆ—
( 1 2
+Ī³ Pr [a , o] Ā·V A Ā·Ļ„(b, a , Ļ€ , o)) (4)
Next, we show that the value function V
Ėœāˆ—satisfying the ļ¬xed point
Eq. (4) (and obtained as a limit of applying operator H
Ėœ) provides a
valid lower bound on the solution of the original game
representation over the belief space.
Theorem 1. V
Ėœāˆ—(Ļ‡(b)) ā‰¤ Vāˆ—(b) for every b āˆˆļæ½(S).
Proof. We prove the theorem by induction. Let V
Ėœ0 be arbitrary
Ėœ
0 0 0
(b) Ėœ0 0
and let V of the original game be V (b) = V (Ļ‡ ) (i.e., V ā‰¤ V ).
Assume VĖœt(Ļ‡(b)) ā‰¤ Vt(b) for every b āˆˆļæ½(S). Now H(VĖœt ā—¦ A) ā‰¤
H(Vt). The extra minimization over beliefs b in HVĖœcan only decrease
the utility of the stage game and hence
Ėœ (b) Ėœ Ėœ (b) Ėœ
t+1 t t t t+1
V (Ļ‡ ) = HV (Ļ‡ ) ā‰¤ H(V ā—¦ A)(b) ā‰¤ HV (b) = V (b). (5)
D
Note that the Eq. (4) can also be solved using linear program-
ming. Consider a lower bound VĖœL
B on V
Ėœāˆ— formed as a point-wise
maximum over linear functions Ī±i(Ļ‡) = (a(i))T Ļ‡ + z(i). We modify
the linear program (2) by considering that the belief b is a variable
(constrained by Ļ‡),
constraints (2b), (2c), (2d), (2f) (6a)
s
b(s) =1 (6b)
A Ā·b = Ļ‡ (6c)
b(s) ā‰„0 āˆ€
s, (6d)
and we replace the constraint (2e) to account for different
representation of VĖœL
B by
Ļ‡a1 o = A Ā·ba1 o āˆ€a1,o (6e)
q = 1 Ā·b
1 1
a o T a o
1
āˆ€a ,o (6f)
1
a o
1 1
(i) T a o (i) a o
V ā‰„ (a ) Ā·Ļ‡ + z Ā·q āˆ€a1,o,Ī±i, (6g)
where qa1o = 1T Ā·ba1o is the probability that o is generated when
player 1 uses action a1.
Observe that the only constraints with non-zero right-hand side
constants in this new linear program are constraints (6b) and (6c).
This is critical in the following proof that V
Ėœāˆ—is convex.
Theorem 2. Value function V
Ėœāˆ—is convex.
Proof. Consider a dual formulation of the linear program (2) up-
dated according to equations (6). Since the only non-zero right-
hand side terms in the primal are 1 and Ļ‡ , the objective of the
dual formulation is o(Ļ‡) = Ļ‡ T Ā·a + z. Moreover, this is the only
place where the characteristic vector Ļ‡ occurs. Hence the polytope
of the feasible solutions of the dual problem is the same for every
characteristic vector Ļ‡ āˆˆRk and o(Ļ‡ ) (after ļ¬xing variables a
and z) forms a bound on the objective value of the solution for
arbitrary Ļ‡ . Since we maximize over all possible o(Ļ‡ ) in the dual,
the objective value of the linear program (and also HĖœVĖœLB) is convex
in the parameter Ļ‡.
LB
Now, starting with an arbitrary convex (e.g., linear) V
Ėœ0 , the
LB t= 0
Ėœ t āˆž Ėœ t+1
LB
Ėœ Ėœt
LB
sequence of functions {V } , where V = HV , is formed only
by convex functions. Therefore, the ļ¬xed point V
Ėœāˆ—is also a convex
function. D
4.1. HSVI algorithm for compact POSGs
The Algorithm 1 we propose for solving abstracted games
Algorithm 1: HSVI algorithm for one-sided POSGs when sum-
marized abstraction is used.
1 Initialize VĖœL
B and VĖœUB to lower and upper bound on V
Ėœ āˆ—
ĖœUB
0 ĖœLB
0
2 while V (Ļ‡ ) āˆ’ V (Ļ‡ ) > E do
3 Explore(Ļ‡0, E,0)
4 procedure Explore(Ļ‡, E,t)
5 (b, Ļ€2) ā† optimal belief and strategy of player 2 in
HĖœVĖœLB(Ļ‡)
6 (a1, o) ā† select according to heuristic
7 Ļ‡ ā† Ļ„(Ļ‡, a1, Ļ€2, o)
8
Ėœ Ėœ
Update r and Ļ’ based on the solutions of HĖœVĖœLB(Ļ‡) and
HVUB(Ļ‡)
9 if VĖœUB(Ļ‡ ) āˆ’ VĖœLB(Ļ‡ ) > EĪ³āˆ’t then
10 Explore(Ļ‡ , E,t + 1)
11 Update r and Ļ’ based on the solutions of HĖœVĖœLB(Ļ‡ )
and HĖœVĖœUB(Ļ‡)
is a modiļ¬ed version of the original heuristic search value it-
eration algorithm (HSVI) for solving unabstracted one-sided POSGs
(HorƔk et al., 2017). The key difference here is that we use the value
functions VĖœL
B and VĖœUB (instead of V and V ) and we must have
modiļ¬ed all parts of the algorithm to use the abstracted
representation of the beliefs.
First, we initialize bounds VĖœL
B and VĖœUB (line 1) to valid piece-
wise linear and convex lower and upper bounds on V
Ėœāˆ—(using sets r
of linear functions Ī±i = (a(i))T Ļ‡ + z(i) and Ļ’ of points (Ļ‡ (i),
K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 5
y(i))). Then, we perform a sequence of trials (lines 2ā€“3) from the
initial characteristic vector Ļ‡ 0 = Ab0 until the desired precision
E> 0 isreached.
In each of the trials, we ļ¬rst compute the optimal optimistic
strategy of player 2, which in this case is the selection of the
belief b and strategy Ļ€ (line 5). Next, we choose the action a
2 1
of player 1 and the observation o (lines 6ā€“7) so that the excess
approximation error VĖœUB(Ļ‡ ) āˆ’ VĖœLB(Ļ‡ ) āˆ’ EĪ³āˆ’t in the subsequent
stage (where the belief is described by a characteristic vector
Ļ‡ = Ļ„(Ļ‡, a1, Ļ€2, o)) is maximized. If this excess approximation er-
ror is positive, we recurse to the characteristic vector Ļ‡ (line 10).
Before and after the recursion the bounds VĖœLB(Ļ‡ ) and VĖœUB(Ļ‡) are
improved using the solution of HĖœVĖœLB(Ļ‡) and HĖœVĖœUB(Ļ‡) (lines 8 and
11). The update of VĖœUBis straightforward and a new point (Ļ‡,
HĖœVĖœUB(Ļ‡)) is added to Ļ’ . To obtain a new linear function to
add to the set r , we use the objective function o(Ļ‡) = Ļ‡ T Ā·a + z
(after ļ¬xing variables a and z) of the dual linear program to (6) that
forms a lower bound on HĖœVĖœL
Band V
Ėœāˆ—(see the proof of Theorem 2 for
more discussion).
4.2. Extracting strategy of the player 1
In Sections 3 and 4, we have presented linear programs to solve
stage game H
Ėœ
V
Ėœ (Ļ‡) from the perspective of the player 2. These
linear programs solved a minimax problem of player 2 by explicitly
reasoning about her strategies. The actions of the player 1, however,
were only represented as best-response con- straints which does
not provide the strategy of player 1 directly. In this section, we
discuss the dual formulation of the linear program (6) and show
the way the strategy of the player 1 can be extracted from dual
variables.
Consider constraints (2c), (6e), (6f) and (6g) and let us establish
the following mapping to dual variables.
Constraints (2c)
Constraints (6g)
Constraints (6e)
Constraints (6 f )
Dual variables Ļ€1(a1) ā‰„ 0
Dual variables Ī²a1 o(Ī±i) ā‰„ 0
Dual variables aĢ‚a1 o āˆˆRk
Dual variables zĢ‚a1 o āˆˆR
The dual formulation of (6) then contains the following constraints
1
(corresponding to variables V, Va o, Ļ‡a1 o and qa1o):
a1
1 1
Ļ€ (a ) = 1 (7a)
Ī±i
1
a o
i 1 1
Ī² (Ī± ) = Ī³ Ļ€(a ) āˆ€a1,o (7b)
1
a o
j
Ī±i
j
1
(i) a o
i
a
Ė† = a Ī² (Ī± ) 1
āˆ€a ,o (7c)
1
a o
Ī±i
1
(i) a o
i
zĖ† = z Ī² (Ī± ) 1
āˆ€a ,o. (7d)
Constraints (7) give the following interpretation to variables
Ļ€1, Ī²a1o, aĖ†a1o and zĖ†a1o. Variables Ļ€1 directly represent the strategy
of player 1 for the current stage, Ļ€1 āˆˆļæ½(A1). Variables Ī²a1o(Ī±i) are
the probabilities of playing strategy the value of which is repre-
sented by the linear function Ī±i(Ļ‡) = (a(i))T Ļ‡ + z(i) in a subgame
where player 1 played action a1 and received observation o. These
probabilities are represented in the form of realization plans, i.e.,
they sum to Ī³Ļ€ 1(a1) instead of one. We can, however, obtain the
true probabilities by dividing Ī²a1o by Ī³Ļ€ 1(a1). Variables aĖ†a1o
and zĖ†a1o then deļ¬ne a linear function Ī¶Ė†a1o(Ļ‡ ) = (aĖ†a1o)T Ļ‡ + zĖ†a1o
representing the value of playing the given mix of strategies in the
subgame and are used to express the value of the game in the rest
of the dual formulation.
By normalizing variables Ī²a1o we obtain a linear function
Ī¶ a1o : Rk ā†’ R, called gadget, which represent the value in the
subgame given that subgame has been reached,
Ī¶a1 o(Ļ‡ ) = (aa1 o)TĻ‡ + za1 o where
1
a o
Ī±i
1
i 1 1
a = (Ī² (Ī± )/Ī³ Ļ€(a )) Ā·a
a o (i)
1
a o
Ī±i
1
i 1 1
a o (i)
z = (Ī² (Ī± )/Ī³Ļ€ (a )) Ā·z . (8)
Note that the gadgets Ī¶ a1o are only deļ¬ned for subgames that can
be reached, i.e., where Ļ€1(a1) > 0.
The representation of a strategy of player 1 using variables Ļ€ 1
and Ī¶ can then be used to play the game following the similar
ideas that are used in continual resolving in the domain of
extensive-form games (MoravcĖ‡Ć­k et al., 2017).
Algorithm 2: Playing a compact OS-POSG using VĖœLB.
0
1 Ļ‡ ā† Ļ‡ , Ī¶ ā† initial gadget Ī¶ (Ļ‡) =āˆ’āˆž
2 while player~1 is required to act do
3 Solve HĖœVĖœLB(Ļ‡) with additional constraint o(Ļ‡) ā‰„ Ī¶ (Ļ‡) in
the dual formulation and extract strategy Ļ€1, Ī¶ of
player~1
4 Sample action a1 āˆ¼ Ļ€1, execute it and observe an
observation o
5 Ļ‡ ā† Ļ‡a1o/qa1o, Ī¶ ā† Ī¶ a1o
Algorithm 2 describes the algorithmic scheme to play a game in
compact representation. Whenever player 1 is required to act, he
solves a stage game HĖœVĖœLB(Ļ‡). In the ļ¬rst stage, he solves the game
without any additional constraints (since initially Ī¶ (Ļ‡) = āˆ’āˆž) to
ļ¬nd the near-optimal strategy for the entire game. In the subse-
quent stages, player 1 has to ensure that the strategy he is about to
play is consistent with the strategy assumed in the ļ¬rst stage, i.e.,
it provides (at least) the value represented by the gadget Ī¶ a1o
from the previous stage. To this end, a constraint o(Ļ‡ ) ā‰„ Ī¶ is added
to the dual formulation of (6) (recall that o(Ļ‡ ) is the objective of
the dual linear program).
5. The lateral movement POSG
We now formally deļ¬ne the lateral movement POSG and de-
scribe the key steps of the algorithm speciļ¬cally for this domain.
The game is played on a graph G = (V, E) representing a computer
network. We assume that V = {v1, . . . , v|V|}. The goal of the at-
tacker is to reach vertex v from the initial source of the infection
|V|
v1 by traversing the edges of the graph, while minimizing the cost
to do so, see Fig. 1(a).
Initially, the attacker controls only the vertex v1, i.e., the initial
state of infection is I0 = {v1}. Then, in each stage of the game,
k
i=1
the attacker chooses a directed path P = {P(i)} (where k is the
length of the path) from any of the infected vertices to the target
vertex v|V|. Fig. 1(a) provides examples of paths the attacker may
consider, however, he can only use paths P2 and P3 if vertex v2 has
been previously infected.
Unless the defender takes countermeasures, the attacker infects
all the vertices on the path, including the target vertex v|V|, and
pays the cost of traversing each of the edges on the path,
āˆ’,P
P(i)āˆˆP
c = C(P(i)), (9)
where C(P(i)) is a cost associated with taking edge P(i).
The defender tries to discover the attacker and increase his cost
to reach the target (and thus possibly discourage the attack) by
6 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579
v1
v|V|
v2
v3
v5
v4
a
b(I0) = 0.7
I0 I1 I2
I3 I4 I5
I6 I7 I8
b
v|V|
v2
v3
v5
v1
v4
1
Ļ‡(b)
= 1
3
Ļ‡(b)
= 0
5
Ļ‡(b)
= 0
2
Ļ‡(b)
= 0.3
4
Ļ‡(b)
= 0
b(I8) = 0.3
c
Fig. 1. Example of a lateral mo vement POSG: (a) Example of a network with 6 vertices. Attacker starts in vertex v1 and is trying to reach vertex v|V| while minimizing the cost. (b)
Subset of states of the game (in total, 14 non-terminal states are reachable). Initial state of the game is I0 (only v1 is infected). Further states are reached by infecting adjacent
vertices to an already infected vertex. OS-POSGs deļ¬ne the belief as a probability distribution over statesā€”b is an example of such belief. (c) Our approach
i
summarizes the belief b using a characteristic vector Ļ‡ (b). In this case, Ļ‡(b) represents the probability that vertex vi is infected when belief b from Fig. 1(b) is considered
2 4 5 6 8
(b)
2 4 5 6 8
(here, v is infected in the states of infection I , I , I and I , i.e., Ļ‡ = b(I ) + b(I ) + b(I ) + b(I ) = 0.3).
deploying honeypots into the network. In each stage, the defender
can decide a subset H āˆˆE[NH] of edges E (where E[NH] denotes all
subsets of E with cardinality NH) to be honeypots. A honeypot
is then deployed on every edge h āˆˆH and is able to detect if the
attacker traverses that speciļ¬c edge. Furthermore, it also increases
the cost for using the edge h to C(h). If the attacker observes that
he has traversed any of the honeypot edges, he may decide to
change his plan and therefore he does not execute the rest of his
originally intended path P. The cost of playing a path P against a
honeypot placement H is therefore
H,P
c =
P(i)āˆˆPā‰¤H hāˆˆH
hāˆˆPā‰¤H
C(P(i)) + 1 Ā·[C(h) āˆ’C(h)] (10)
where Pā‰¤ H is the preļ¬x of the path P until the interaction with
any honeypot edge h āˆˆH,
ā‰¤H
P =
P Pāˆ©H = āˆ…
{P(i)}min{ j|P(j)āˆˆH}
i=1 otherwise.
(11)
For the reasons of notational simplicity, we write Pā‰¤ h instead of
Pā‰¤ {h} when it is clear that h is a singleton.
Since the attacker does not need to continue to execute his
selected path P (since the defender can update his belief over the
possible subsets of infected nodes and thus reconļ¬gure the hon-
eypots), the new state of infection IH,P (i.e., the subset of vertices
that are infected at the beginning of the next stage) becomes
IH,P = I āˆŖ{v |(u, v) āˆˆPā‰¤H}. (12)
We assume the worst case scenario from the perspective of the
defender where the attacker is assumed to be able to infer the
position of all honeypots upon interacting with any of them.
The above problem can be formalized as a one-sided POSG
without using the summarized abstraction where
ā€¢ states are possible subsets of infected vertices (or infections),
i.e., SāŠ† 2V (Fig. 1(b) shows an example of a state space where
ļ¬lled dots represent infected vertices),
ā€¢ actions of the defender are honeypot allocations, i.e., A1 = E[NH]
where E[NH] denotes NH-element subsets of E,
ā€¢ actions A2 of the attacker are directed paths in G reaching v|V|,
ā€¢ observations denote whether and where the defender detected
the attacker (observations det(h)) or the attacker reached the
target v|V| undetected (observation Ā¬det),
ā€¢ transitions follow the Eq. (12) and an observation det(h) is gen-
erated if and only if the honeypot edge h is the ļ¬rst honeypot
edge traversed on the path chosen by the attacker,
ā€¢ reward of the defender is the cost of the attacker, i.e.,
R(H,P) = cH,P,
ā€¢ discount factor of the game is Ī³ = 1, and
0 0
0
ā€¢ initial belief b of the game satisļ¬es b (I ) = 1.
Note that the original HSVI algorithm for one-sided POSGs has
been deļ¬ned and proved for discounted problems with Ī³ < 1. In
this particular case, however, the convergence properties translate
even to the undiscounted case since the game is essentially ļ¬nite
(in a ļ¬nite number of steps, all vertices, including v|V|, get infected
and the game ends).
5.1. Characteristic vectors
The number of states in the game is exponential in the number
of vertices of the graph, up to |S|= 2|V |āˆ’2 + 1 (we consider that v1
K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 7
R|V|
aTĻ‡ + k
Ļ‡
VĢƒU B
Fig. 2. Dual interpretation of the projection on the convex hull.
is always infected and we treat all states where v|V| is infected as
a single terminal state of the game). We use marginal probabilities
of a vertex being infected as characteristic vectors Ļ‡ āˆˆR|V|, i.e.
(b)
i
Ļ‡ =
IāˆˆS|viāˆˆI
b(I). (13)
Consider the example of a belief from Fig. 1(b). The belief in the
original OS-POSG model is a probability distribution over
states, i.e., |S|-dimensional vector with |S|āˆ’ 1 degrees of freedom.
In this game, the number of non-terminal reachable states (i.e.,
states where v|V| is not yet infected) is 14. In comparison, the
characteristic vector (see Fig. 1(c)) has only |V| dimensions. More-
over, observe that in this case the set {b |Ab = Ļ‡(b)} of possible
beliefs that get projected to Ļ‡ (b) is a singleton and contains only
the belief b from Fig. 1(b).
5.2. Value function representation
The algorithm from Section 4.1 approximates V
Ėœāˆ—using a pair of
value functions, the lower bound VĖœL
B and the upper bound VĖœUB.
While we have discussed the representation of the lower bound
(which is represented using a set r of linear functions
Ī±i(Ļ‡) = (a(i))T Ļ‡ + z(i)), the representation of the upper bound is
more challenging.
In the original algorithm, the value function V is deļ¬ned (see Eq.
(3)) over the probability simplex ļæ½(S). In that case, it s
u
ļ¬ƒ
c
e
s to
consider |S| points in Ļ’ to deļ¬ne V for every belief. In contrary, the
space of characteristic vectors (i.e., marginal probabilities) is formed
by a hypercube [0, 1]|V| with 2|V| vertices, which would make the
straightforward point representation (using Eq. (3)) impractical.
We can, however, leverage the fact that in this domain infecting
an additional node can only decrease the cost to the target (and
hence V
Ėœ āˆ— is decreasing). Consider the dual formulation of the
optimization problem (3). In this formulation, the projection of Ļ‡
to the lower convex hull of a set of points is represented by the
optimal linear function aTĻ‡ + z deļ¬ning a facet of the convex hull
(see Fig. 2). Since V
Ėœāˆ— is decreasing in Ļ‡ , we can also enforce that
aTĻ‡ + z is decreasing in Ļ‡ (i.e., add the constraint a ā‰¤ 0 to the
dual formulation). This additional constraint translates to a change
1ā‰¤iā‰¤|Ļ’| i
(i)
Ī» Ļ‡ = Ļ‡ in the primal problem to an
of the equality
inequality.
VĖœUB(b) =min
Ī»āˆˆR
|Ļ’ |
ā‰„0
r
i
(i) T
1ā‰¤iā‰¤|Ļ’| 1ā‰¤iā‰¤|Ļ’|
i
(i)
Ī» y |1 Ī» = 1, Ī» Ļ‡ ā‰¤Ļ‡ (14)
Now it is suļ¬ƒcient that the set Ļ’ contains just one point (Ļ‡ (i), y(i))
where Ļ‡(i) = 0|V| (instead of 2|V| points) to make the constraint
:E1 ā‰¤iā‰¤|Ļ’ | i
(i)
Ī» Ļ‡ ā‰¤ Ļ‡ satisļ¬able.
It is possible to adapt the constraint (6g) to use the represen-
tation from (14), similarly to the original (unabstracted) one-sided
POSGs (HorĆ”k and BoÅ”anskĆ½, 2016), and thus obtain linear program
to perform the upper bound computation.
5.3. Using marginalized strategies in stage games
The linear program formed by modiļ¬cations from Eqs. (6) still
requires solving the stage game for the original, unabstracted
problem. In this section, we show that it is possible to avoid
expressing the belief b explicitly, and to compute the stage game
directly using the characteristic vectors and marginalized strategies
of the attacker.
First, we present the representation of the stage-game strategies
of the attacker. Instead of representing joint probabilities Ļ€ 2(Iāˆ§P)
of choosing path P in state I, we only model the probability
Ļ€
Ėœ2(P) of choosing path P aggregated over all states I āˆˆS. Further-
more, we allow the attacker to choose the probability Ī¾ (Pāˆ§vi) that
vertex vi is infected while he opts to follow path P.
P
2
Ļ€
Ėœ (P) =1 (15a)
i
0 ā‰¤ Ī¾(P āˆ§v ) ā‰¤ Ļ€2
Ėœ (P) āˆ€
P, vi (15b)
Ļ€
Ėœ2(P) ā‰„0 āˆ€
P (15c)
To ensure that the strategy represented by variables Ļ€
Ėœ2 and Ī¾
is feasible it must be consistent with the characteristic vector Ļ‡ ,
i i
where Ļ‡ is the probability that the vertex v is infected at the
beginning of the stage.
P
Ī¾(P āˆ§v ) = Ļ‡
i i āˆ€
vi (15d)
Furthermore, the path P must start in an already infected
vertex (denoted as Pre(P)), i.e., the conditional probability Pr[Pre(P)
āˆˆI |P] of Pre(P) being infected when path P is cho-
sen has to be 1. Now, since Ī¾ (Pāˆ§v) is the joint probability,
Ī¾(P āˆ§v) = Pr[Pre(P) āˆˆI|P] Ā·Ļ€
Ėœ2(P),
Ī¾(P āˆ§v) = Ļ€
Ėœ2(P) āˆ€P,v =Pre(P). (15e)
Example 1. Consider the example from Fig. 1(a) and assume that
the attacker wanted to play a strategy Ļ€ 2 in the original game,
where
Ļ€2(I0 āˆ§P1) = 0.7 Ļ€2(I8 āˆ§P1) =0.1
Ļ€2(I8 āˆ§P2) = 0.1 Ļ€2(I8 āˆ§P3) =0.1.
The same strategy can be described using the marginalized
representation Ļ€
Ėœ2 and Ī¾:
Ļ€
Ėœ2(P1) = Ļ€2(I0 āˆ§P1) + Ļ€2(I8 āˆ§P1) = 0.8
Ļ€
Ėœ2(P2) = Ļ€2(I8 āˆ§P2)= 0.1
Ļ€
Ėœ2(P3) = Ļ€2(I8 āˆ§P3)= 0.1
Ī¾ (P1 āˆ§ v1) = 0.8 Ī¾ (P1 āˆ§ v2) = 0.1 Ī¾
(P2 āˆ§ v1) = 0.1 Ī¾ (P2 āˆ§ v2) = 0.1 Ī¾
(P3 āˆ§v1) = 0.1 Ī¾ (P3 āˆ§v2) = 0.1
2
all other variables Ļ€
Ėœ and Ī¾ are zero
It is straightforward to verify that variables Ļ€
Ėœ 2 and Ī¾ satisfy Eqs.
(15a)ā€“(15e) when the characteristic vector Ļ‡ (b) from Fig. 1(c) is
considered.
The representation of strategies of the attacker using variables
2
Ļ€
Ėœ and Ī¾ is suļ¬ƒcient to express the expected immediate reward
of the strategy Ļ€
Ėœ2, hence the constraint (2c) can be changed to use
the marginalized strategies,
P
V ā‰„ Ļ€
Ėœ2(P)cH,P +
hāˆˆH
VH,det(h)
[NH]
āˆ€
H āˆˆE . (15f)
Importantly, we can also skip the computation of the belief
bH,det(h) and compute the characteristic vector formed by the
marginals Ļ‡ H,det(h) directly from the variables Ļ€
Ėœ2 and Ī¾ . We now
8 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579
present the equation to compute the updated marginal Ļ‡ H,det(h)
given that the attacker has been detected while traversing the
honeypot edge h (and this has been the ļ¬rst honeypot edge he
traversed). Denote the set of all paths satisfying this condition
h
H ā‰¤h ā‰¤H
P = {P |h āˆˆPāˆ§P āŠ† P }.
Ļ‡ H,det(h)
i
=
h
H i
PāˆˆP |P āŠ†P
ā‰¤(Ā·,v ) ā‰¤h
2
Ļ€
Ėœ (P) +
h
H i
PāˆˆP |P hāŠ†P
ā‰¤(Ā·,v ) ā‰¤h
i
Ī¾(P āˆ§v ) (15g)
The ļ¬rst sum stands for the probability that the attacker is de-
tected while traversing edge h, but he managed to infect v on his
i
chosen path before the detection. The second sum represents the
probability that the attacker was detected on the edge h as well,
but this time he has not infected v using path P, however, the
qH,det(h)
PāˆˆPh
H
= Ļ€2
Ėœ (P). (15h)
We need not consider the subsequent stages where the attacker
has not been detected (i.e., Ā¬det observation has been generated) or
the honeypot edge h reaches the target vertex v|V|. In each of these
cases, the target vertex has been reached and thus the value of the
subsequent stage is zero.
Example 2. Consider once again the example from Fig. 1 and the
strategy of the attacker Ļ€ 2 from Example 1 and assume that a
honeypot is deployed on the edge h = (v3, v5) only (i.e., H = {h})
and the attacker has been detected there. In such case, the attacker
could have either used path P1 or P2, but not P3 as it does not reach
h.
If the attacker used P1 in I0, he infects v3 and v5 up to the point
of the detection and thus reaches a new state of infection
I (this might have happened with probability 0.7). Similarly,
3
using P1 in I8 reaches I4 (with probability 0.1) and using P2 in I8
reaches I5 (with probability 0.1). The (denormalized) belief bH,det(h)
(corresponding to Eq. (2d)) is thus
bH,det(h)(I3) = 0.7 bH,det(h)(I4) = bH,det(h)(I5) = 0.1.
Computing marginal probabilities of each vertex being infected in
the belief bH,det(h) yields
Ļ‡1
H,det(h) H,det(h) H,det(h)
3 5
= Ļ‡ = Ļ‡ =1
Ļ‡ H,det(h)
2
= 0.2 Ļ‡ H,det(h)
4
= 0.1.
We can obtain exactly the same marginal probabilities when
2
using marginalized strategy Ļ€
Ėœ and Ī¾ from Example 1. For ex-
ample, neither P1 nor P2 infects vertex v2 (i.e., no edge ( Ā·, v2)
Ļ‡
is present). Therefore, the ļ¬rst sum in Eq. (15g) is empty and
H,det(h)
2 1 2 2 2 2
= Ī¾ (P āˆ§v ) + Ī¾ (P āˆ§v ) = 0.2. Similarly, only P infects
in (15g), and Ļ‡
v4 before reaching edge h, hence is contained in the ļ¬rst sum
H,det(h)
4 2 2 1 4
= Ļ€
Ėœ (P ) + Ī¾ (P āˆ§v ) =0.1.
5.4. Initializing bounds
We now describe our approach to initialize bounds VĖœL
B and VĖœUB
(line 1 of Algorithm 1). We start with the former. To obtain a lower
bound VĖœLB, we assume that no honeypots can be deployed in the
network. The attacker then only needs to ļ¬nd the cheapest (i.e.,
shortest) path in the graph when costs C are considered. Denote
Cāˆ—(vi) the cost of the cheapest path from vertex vi. We ini- tialize
the lower bound VĖœL
B using two linear functions. Firstly, the
cost cannot be negative and we use a linear function Ī±1(Ļ‡) = 0.
Next, we consider linear function Ī±2(Ļ‡) = (a(2))T Ļ‡ + z(2) where
(2)
1
(2)
i
z = Cāˆ—(v ) and a =min{ i 1
Cāˆ—(v ) āˆ’ Cāˆ—(v ), 0} which is a tighter
lower bound for some Ļ‡ (especially close to the initial character-
0 0
1
istic vector Ļ‡ where Ļ‡ = 1 and other vertices are not infected,
0
i
i.e., Ļ‡ = 0 for i ā‰„ 2).
To initialize the upper bound VĖœUB, we consider a perfect in-
formation variant of the game (similarly to HorƔk et al. (2017)).
However, since the number of states (i.e., possible subsets of
infected vertices) in such a game can be exponential, we consider a
simpler version of the game where the attacker is only in charge of
the vertex he visited the last (instead of a subset of all ver- tices
visited in the past) which can only increase the cost for the
i
attacker. Denote C
āˆ—
(v ) the expected cost of the attacker in the per-
fect information variant of the game where the attacker controls
vertex vi only. We then use the set Ļ’ = {(Ļ‡( j), y( j)) |1 ā‰¤ j ā‰¤ |V|}
UB
of points to initialize VĢƒ where
( j)
i
i
vertex vi has already been infected before he started to execute
path P.
Analogously, we can obtain the probability qH,det(h) that the Ļ‡ =
attacker got detected while traversing edge h as
1 i = j
0 i h= j
( j) āˆ—
j
y = C (v ). (16)
6. Scaling up: incremental strategy generation
In the previous section, we introduced marginalized beliefs and
strategies to deal with the large number of game states. However,
the number of actions grows fast as well which makes the game
computationally challenging and may cause memory consumption
issues in practical implementation of the approach. In this section,
we address this issue by adopting a double-oracle approach to
generate actions of the players incrementally.
6.1. Double-Oracle scheme for solving stage games
Before describing the oracle algorithms used in the exper-
iments, we provide a generic scheme for using double-oracle
approach to solving stage games H
Ėœ
V
Ėœ(Ļ‡), see Algorithm 3. The
Algorithm 3: Generic double-oracle scheme for solving stage
games.
Ėœ Ėœ
1 2
1 A , A ā† initial subsets of actions of player~1 and player~2
deļ¬ning a restricted game
2 When solving a stage game H
Ėœ
V
Ėœ (Ļ‡):
3 do
4 Solve stage game H
Ėœ
V
Ėœ(Ļ‡) considering actions AĖœ1and AĖœ2
only
if improving allocation H for the defender is found then
5
6 AĖœ1 ā† AĖœ1āˆŖ{H}
7 continue
8 else if improving path P for the attacker is found then
9 AĖœ2 ā† AĖœ2āˆŖ{P}
10 continue
11 break
12 while an improving action is found for one of the players
algorithm keeps global sets of actions AĖœ1 and AĖœ2 deļ¬ning a re-
stricted game where only these actions are considered (instead of
entire sets of actions A1 and A2). Set A1 is initialized to a single
arbitrary honeypot allocation, set A2 is initialized to an
arbitrary set of |V| paths P1, . . . , P|V| from each of the vertices
vi to v|V|. When solving a stage game, we ļ¬rst query the oracle of
the defender before querying the oracle of the attacker. This
decision is motivated by the possibility to use a heuristic oracle for
generating actions of the defender (see Section 6.4) which is less
computationally demanding compared to the exact computation.
K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 9
6.2. Exact oracle of the attacker
In Section 4.2, we have described the way the strategy of player
1 can be extracted in the form of variables Ļ€1 and gadgets
Ī¶ H,det(h)(Ļ‡ ) = (aH,det(h))T Ļ‡ + zH,det(h). The oracle of the attacker
inspects this strategy and its lower bound o(Ļ‡ ) (extracted from the
objective of the dual formulation of the stage game) and tries to
prove or disprove that the value o(Ļ‡ ) is a valid lower bound on the
solution.
To this end, the attacker selects a single path from any of the
vertices of the graph towards the target vertex v and selects a
|V|
characteristic vector Ļ‡ where this pure strategy of the attacker is
applicable.
Ī“(vj)+ Pij =
(vi,vj)āˆˆE (vj,vk)āˆˆE
Pjk āˆ€vjh=
v|V |
(17a)
vih=v|
V|
i
Ī“(v ) = 1 (17b)
ij
P āˆˆ{0, 1} i j
āˆ€(v, v )āˆˆE (17c)
Ī“(vi) ā‰„ 0 vi h=
v|V |
(17d)
The single path of the attacker is represented as an integral ļ¬‚ow in
the graph of capacity 1. Constraint (17a) represents the Kirchoffā€™s
law which has to hold for any but two verticesā€”the source vertex
of the path which is indicated by Ī“(vi) = 1, and the target vertex
v|V|. Furthermore, constraint (17b) ensures that only one path is
considered.
The characteristic vector Ļ‡ must make the path represented
by the ļ¬‚ow P applicable, i.e. Ļ‡i = 1 for the source vertex vi where
Ī“(vi) =1.
i i
Ļ‡ ā‰„ Ī“(v ) āˆ€v h= v
i |V| (17e)
0 ā‰¤ Ļ‡i ā‰¤1 āˆ€viāˆˆV (17f)
We now express the value of playing a path represented by the
ļ¬‚ow P against the honeypot allocation H. To this end, we ļ¬rst
deļ¬ne an auxiliary ļ¬‚ow PH which is identical to P, except that it
gets blocked by the ļ¬rst honeypot edge (vi, vj) āˆˆ H.
j
Ī“(v ) + H
ij
P =
(vi,vj)āˆˆEH (vj,vk)āˆˆE
PH
jk
āˆ€v h= v
j |V| (17g)
H
P ā‰¤P
ij ij (17h)
H
ij
0 ā‰¤ P ā‰¤1
āˆ€(vi, vj) āˆˆE
āˆ€(vi, vj) āˆˆE (17i)
Note that the fact that ļ¬‚ow PH is identical to P (except for the
blocking on honeypot edges H) is ensured by the constraint (17h)ā€”
if the ļ¬‚ow has to continue from any vertex, it must follow the edge
contained in P. Now, we can express the immediate cost of playing
P against honeypot allocation H as
H,P
c =
(vi,vj)āˆˆEH
i j
H
ij
C(v , v )P +
(vi,vj)āˆˆH
i j ij
C(v ,v )PH. (17j)
Next, we express the characteristic vector Ļ‡ H in the subgame
given that the attacker got detected on any edge from H (here,
(v ,v )āˆˆE
H
i j ij
P is a binary value indicating whether the attacker
reached vj by the time he got detected).
H
j
Ļ‡ =
(vi,vj)āˆˆE
H
ij j
P + Ļ‡ 1 āˆ’
(vi,vj)āˆˆE
PH
ij
I 
(17k)
This allows us to derive the expected cost in the subgame given
that the attacker got detected on a honeypot edge h = (vx, vy) āˆˆ H.
VH,det(h)=
āŽ§
āŽØ0 vy =v|V|
viāˆˆV
āŽ© i
H,det(h) H H H,det(h) H
i xy xy
a Ļ‡ P + z P otherwise (17l)
Note that VH,det(h) is zero if the attacker does not reach edge h (in
H
xy |V|
that case P = 0) or the attacker reached his target v . This gives
us the expected cost VH of the path represented by the ļ¬‚ow P
against honeypot allocation H,
H
H,P
hāˆˆH
V = c + VH,det(h)
, (17m)
and the expected cost of using path P is
H
1
H
V = Ļ€ (H)V . (17n)
The optimization objective of the attacker is to disprove that
o(Ļ‡ ) is the lower bound. To this end, he attempts to ļ¬nd ab-
stracted belief Ļ‡ and path represented by P where the difference V
āˆ’ o(Ļ‡) is maximized, i.e.,
max o(Ļ‡) āˆ’V
s.t. constraints (17a)-(17n).
If o(Ļ‡) āˆ’ V > 0, the function o(Ļ‡ ) is provably not a lower
bound on the solution of the stage game. Hence, the path of the
attacker represented by P can be added into the restricted game.
Note that all products z = term1 Ā·term2 of variable expressions
in the above mathematical program only involves expressions
term1 āˆˆ[0, 1] and a binary expression term2 āˆˆ{0, 1}. Such products
can be rewritten using a set of linear constraints
z ā‰¤ term1 z ā‰¤ term2 z ā‰„ term1 āˆ’ (term2 āˆ’ 1) z ā‰„0
(18)
and thus obtain a mixed integer linear program formulation.
6.3. Exact oracle of the defender
The oracle of the defender is aimed on computing a best
response of the defender (i.e., the best possible placement of
honeypots on edges) against a ļ¬xed strategy of the attacker
(described by variables Ļ€
Ėœ 2 and Ī¾ ). To this end, we propose a
branch-and-bound approach.
Each node of a branch-and-bound search tree corresponds to
a partial allocation of the honeypots H. We can obtain a lower
bound on the cost of the attacker induced by all solutions H āŠ‡ H
by assuming that only honeypot edges in H are used,
P
LB(H) = Ļ€2 H,P
hāˆˆH
H,det(h) Ėœ
Ėœ (P)c + q V(Ļ‡ H,det(h) H,det(h)
/ q ). (19)
Note that the division by qH,det(h) is used to account for the fact that
Ļ‡ H,det(h) is denormalized (it is multiplied by the probability that
observation det(h) is generated).
An upper bound on the solutions H āŠ‡ H is obtained by ne-
glecting the sequential interaction of remaining honeypots (i.e.,
the attacker can be possibly caught more than once). By placing a
honeypot on an edge h āˆˆ
h H, the cost can be increased by at most
ļæ½h =
P|hāˆˆPāˆ§Pā‰¤hāŠ†Pā‰¤H
Ļ€
Ėœ2(P) Ā·(C(h) āˆ’C(h))
HāˆŖ{h},det(h) Ėœ HāˆŖ{h},det(h) HāˆŖ{h},det(h)
+q V(Ļ‡ /q ). (20)
Given that the attacker reaches edge h, he pays C(h) āˆ’ C(h) extra
cost and furthermore he enters a subgame where he got detected
on edge h from a honeypot set HāˆŖ{h}. We can then obtain the
upper bound using
10 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579
NHāˆ’|H|
i=1
i
UB(H) = LB(H) + ļæ½āˆ—
, (21)
where ļæ½āˆ—
1,. . . , ļæ½āˆ—
N āˆ’|H| are the NH āˆ’ |H| highest values among ļæ½h.
H
6.4. Heuristic oracles
In practice, ļ¬nding exact solutions of oracles from Sections 6.2
and 6.3 can be computationally expensive. To mitigate the impact
of the oracle computation times on the performance of the
algorithm, we present a heuristic version of the oracle of the
defender. The heuristic oracle is a greedy variant of the exact oracle
presented in Section 6.3.
The use of the heuristic oracle of the defender from the
Algorithm 4 can lead to a degradation of the quality of the
1 H ā† āˆ…
2 while |H| < NHdo
3 hāˆ—ā† arg maxhāˆˆEHļæ½h
4 Hā† HāˆŖ{hāˆ—}
solution found by the HSVI algorithm. By omitting some actions of
the defender from the support of the equilibrium, the cost of the
attacker can be lowered. Nevertheless, the lower-bound guarantees
on the quality of the solution found by the algorithm are retained.
7. Experimental evaluation
In this section we focus on the experimental evaluation of the
proposed techniques on the lateral movement stochastic game
introduced in Section 5. We consider three variants of the HSVI
algorithm for compact POSGs depending on the oracle algorithm
used: (1) the version where no oracles are used (i.e., AĖœi= Ai), (2)
the variant using exact oracles for both of the players (described in
Sections 6.2 and 6.3), (3) and ļ¬nally the variant where heuristic
oracle for the defender (described in Section 6.4) is used instead of
the exact computation of the best response. We also compare these
approaches with the original HSVI algorithm (HorƔk et al., 2017)
where the summarized abstraction is not used and the computa-
tion is performed based on the entire (exponential) state space.
7.1. Experiments setting
The evaluation has been performed on a set of randomly
generated graphs with varying parametersā€”number of vertices in
the network |V|, number of honeypots NH and the density of the
graph. For simplicity we used directed acyclic graphs (DAGs) for the
experiments, however, the techniques presented in the paper are
general enough to apply on scenarios where the network is not
acyclic.
We use the following algorithm to generate mesh-like net-
works. First, positions of |V| nodes are randomly generated from a
2-dimensional interval [0, 20] Ɨ [0, 20] with v1 positioned at (0,0)
and v|V| positioned at (20,20) (pi āˆˆ[0, 20]2 denotes the position
of vi). For each vertex vi a set of k nearest neighbors Nk(vi) (with
respect to positions pj) is determined and an edge connecting
vertices vi and vj is created for every vj āˆˆNk(vi). The orientation
of the edges is set to always lead towards the vertex closer to the
target v|V|. Note that increasing the parameter k leads to an
increase in the number of edges in the graph.
The graph generated by the above method may, however,
contain multiple vertices with zero in-degree or out-degree (apart
of vertices v1 and v|V|). We ļ¬x this by regenerating positions of
Security Level 1 ( Īŗ1)
Security Level 2 ( Īŗ2)
Security Level 3 ( Īŗ3)
Algorithm 4: Heuristic version of the oracle of the defender.
Fig. 3. Example of a mesh network used in experiments. Contour lines depict the
cost function f used to derive the costs C and C of the edges. The vertices are par-
titioned into 3 security levels.
such nodes until ļ¬nding a directed acyclic graphs where only v1
and v|V| have zero in-degree and out-degree, respectively.
Next, we deļ¬ne a cost function f (x, y) ā†’ R>0. This function
indicates the hardness to operate in point (x, y). In our exper-
iments, we use f (x, y) = x + y which attains its maximum in
(20,20) where the target vertex v|V| is located (i.e., the network is
the most secured around the target vertex). To obtain the cost C(vi,
vj) of traversing an edge (vi, vj) given that the honeypot is not
deployed there, we integrate the cost function f along the line
connecting points pi and pj,
{ 1
0
C(vi, vj) = Ipi āˆ’ pj I2 f(Ī»pi + (1 āˆ’Ī»)p j) dĪ». (22)
Furthermore, we partition the vertices into 3 security perime- ters
(with multiplicative constants Īŗ1, Īŗ2 and Īŗ3) based on the distance
from the position of the target vertex v|V|. The vertices closest to
the target vertex v|V| are the most secured, hence the attacker has
to use a potentially costly zero-day exploit to proceed (and risk its
revelation). Thus the cost C(vi, v j) of traversing an edge entering
vertex vj that is present in the most secured zone is high,
C(vi,vj) = Īŗ3 Ā·C(vi,vj) for Īŗ3 = 4. (23)
For security perimeters further away from the target vertex v|V|
we use Īŗ2 = 1.5 and Īŗ1 = 1. Fig. 3 provides an example of such a
network and its accompanying cost function f.
All computational results have been obtained on computers
equipped with Intel Xeon Scalable Gold 6146 processors and 16GB
of available RAM while limiting the runtime of the algorithms to 2
hours. CPLEX 12.9 has been used to solve (mixed integer) linear
programs. The algorithms were required to ļ¬nd an E-optimal
solution where E is set to 1% of the error VĖœUB(Ļ‡ 0) āˆ’ VĖœLB(Ļ‡ 0) in
the initial characteristic vector Ļ‡ 0 after the initialization phase
described in Section 5.4 is completed. If the algorithm failed to
reach this level of precision within 2 hours, we report an instance
as unsolved. The results are based on 100 randomly generated
networks for each parameter set.
7.2. Scalability in the size of graphs
First, we focus on the scalability of the algorithms in size of the
networkā€”in the number of vertices |V|, and in the density of the
graph (controlled by the parameter k).
Fig. 4 depict the scalability of the algorithms in the number of
vertices. First of all, observe that the original HSVI algorithm for
one-sided POSGs has been unable to solve all but the smallest
K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 11
Fig. 4. Scalability in the number of vertices in the network |V| (averages based on 100 instances for each parameter set). Conļ¬dence intervals mark the standard error. The
reported runtimes include only instances solved by the algorithms. The percentage of instances where the algorithms failed to terminate within 2 hours are reported separately.
12 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579
Fig. 5. Scalability of the algorithms: (Left) in the density (i.e., the number k of nearest neighbors considered when forming the edges) of the network with 14 vertices and
NH = 2 honeypots, (Right) in the number of honeypots NH in a network with 14 vertices and k = 6. Conļ¬dence intervals mark the standard error. The reported runtimes
include only instances solved by the algorithms. The percentage of instances where the algorithms failed to terminate within 2 hours are reported separately.
instances with 8ā€“12 vertices and even with only NH = 1 honeypots
(while exhausting the memory on larger instances). To this end, we
did not evaluate this algorithm on more complicated instances
with NH > 1. In contrary, the algorithm relying on the technique of
summarized abstractions have been able to solve most of the
instances with NH = 1 even without the use of the incremental
strategy generation technique described in Section 6. The versions
of the algorithm relying on the incremental strategy generation
were then able to solve all 100 randomly generated instances for
each parametrization when NH = 1 honeypot is considered. Im-
portantly, the bounds computed by the versions using incremental
strategy generation overlap with the bounds computed by the re-
maining variants. This demonstrates that in spite of improving the
runtimes by using the incremental strategy generation technique,
the quality of the solution does not deteriorate.
Increasing the number of honeypots to NH = 2 highlights the
advantages of using the oracles to generate actions of the players
incrementally. While the algorithm that does not use the oracles
failed to solve most of the instances with |V| ā‰„ 18 in the time limit
of 2 hours, both of the variants relying on the incremental strategy
generation technique were able to solve even most of the largest
instances with 24 vertices and k = 4 in less than 20 minutes on
average (considering only the instances solved by the algorithms).
Increasing the number of honeypots to NH = 3 highlights the
computational advantages of using the heuristic version of the
oracle of the defender presented in Section 6.4. This version of the
algorithm was able to solve signiļ¬cantly more instances compared
to the version using the exact version of the oracle of the defender
(Section 6.3). Note that the version that does not use the strategy
generation technique was able to solve only a
few instances with |V| ā‰¤ 14. This is mainly attributed to the size
of the linear programs to solve the stage gamesā€”e.g., the number
of possible combinations of actions of players in the games with
14 vertices and k = 5 we considered is up to |A1| Ɨ |A2| '.: 5.2Ā·
107.
Note that the algorithms using the oracles were able to solve
nearly all instances in this setting (|V|= 14 and k = 5).
In Fig. 5 (left), we focus on the scalability of the three vari-
ants of the algorithm using the summarized abstraction based on
the density of the network (controlled by the parameter k). The
number of actions in the game can grow fast with the addition of
new edges to the network which makes solving the game chal-
lenging for the variant without the use of oracles. However, the
addition of the edges need not increase the support of the equilib-
rium and hence the runtime of the variants relying on incremental
strategy generation is not affected by the increasing value of the
parameter k.
7.3. Scalability in the number of honeypots
In Fig. 5 (right) we focus on the scalability of the algorithm in
the number of honeypots. The number of actions of the defender is
exponential in the number of honeypots NH. Hence, it is not
surprising that the algorithm that does not use the incremental
strategy generation to form defenderā€™s actions can solve only the
smallest instances in terms of the number of honeypots NH. The
variants of the algorithm relying on the oracles to incremen- tally
generate the actions perform signiļ¬cantly better, however,
when the number of honeypots is large (NH ā‰„ 4), computing the
best response of the defender exactly (using the approach from
Section 6.3) is prohibitively expensive. In contrary, the heuristic
variant of the oracle of the defender allows the algorithm to scale
better and solve most instances up to NH = 6.
Interestingly, we observed that using the heuristic oracle did
not impact the quality of the solution in a negative way. Consid-
ering the instances from Fig. 5 (right) that were solved by the
variant using the exact oracle of the defender, we observed that
the lower bounds VĖœLB(Ļ‡ 0) computed by these two variants of the
algorithm were within 0.6% of each other.
7.4. Applicability to computer networks
Previous experiments have been focused on randomly gener-
ated mesh networks. The physical properties of mesh networks
typically disallow direct communication between the devices (e.g.,
due to the presence of obstacles, or low signal strength caused by
large distances between the devices). This leads to a signiļ¬cantly
lower number of edges in the corresponding game graph com-
pared to regular computer network architectures. To demonstrate
the versatility of our approach, we evaluate our algorithms on a
graph originating from a fundamentally different network topology
given as an example by a network security expert.
In Fig. 6(a), the physical topology of the considered computer
network is shown. Based on this topology, we derived logical
communication links, shown in Fig. 6(b), that the attacker can use
to perform lateral movement and that do not directly correspond
to the physical links. For example, every device can directly
K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 13
9 end-user machines
5 end-user machines
target
sensor2
sensor1
*
*
*
sensor3
web1*
email*
transf*
*
er
db *
SECON
*
DC *
web2*
a
sensor3
db
SECON
target 5 end-user machines
web1
transfer
web2
sensor1
sensor2
email
Possible attack sources
(7 machines)
9 end-user machines
DC
b
Fig. 6. Computer network topology considered in the experiments: (a) Physical topology of the network. Nodes marked with asterisk exist in multiple instances and can be
used as honeypots. (b) Logical communication links in the network. Dashed lines correspond to easy diļ¬ƒculty levels, solid lines to medium diļ¬ƒculty and ļ¬nally bold solid lines
represent hard diļ¬ƒculty.
communicate with the web server and thus the attacker can use
this communication channel to carry on his attack. Moreover,
compromising the web server allows an attack on the database
server. We also reļ¬‚ect potential side channels in the networks such
as social engineering allowing the attacker to compromise end-
user machines by publishing malicious ļ¬les on the web
presentation of the company.
Each logical link can be used to infect an adjacent machine. The
links, however, differ in the diļ¬ƒculty of carrying such an attack
along the edge, e.g., it is substantially easier to inļ¬ltrate web server
via an exploitable web presentation than attempting to compro-
mise the domain controller running the latest versions of the
services. We distinguish three diļ¬ƒculty levels (with three different
associated costs representing the time needed to successfully tra-
verse the edge): easy (C(e) = 10 units of time), medium (C(e) = 20)
and hard (C(e) = 40). In the presence of a honeypot, the costs
increase to C(e) = 40, C(e) = 80 and C(e) = 160, respectively2
In our example, we assume that the attacker uses a computer
outside of the network (i.e., in the internet zone) to inļ¬ltrate the
network, with the aim of compromising the target node within
the internal part of the network (assume that sensitive data are
located at this host). The defender attempts to slow down
attackerā€™s progress by deploying a honeypot on one of the hosts
within the network (hosts that can be selected are denoted by an
asterisk in Fig. 6(a)).
We assume that each such host is present in d instances
within the network (e.g., one being the production server and the
remaining instances serving for backup purposes). In order to serve
the legitimate users, the defender is allowed to operate a
2 Note that the exact values of costs do not affect usability of our algorithm and
hence can be set speciļ¬cally for each network.
honeypot on one of the instances only. For d = 2 instances of
each server, the algorithm (using the exact versions of the oracles)
took 982s to ļ¬nd a solution with the lower bound on attackerā€™s
cost of 69.4, while for d = 3 instances of each server a solution
with quality 63.3 has been found in 5950s. The shortest attack path
in the graph has a cost of 50, which marks an increase in the
attackerā€™s cost by 38.8% and 26.6%, respectively. The larger of
the two instances (with d = 3) is represented by a graph with 52
vertices and 504 edges. Note that the decreasing value of the game
with increasing value of d is caused by the fact that the defender is
able to cover only one of the instances of a serverā€”thus the
probability that the attacker manages to choose an unprotected
instance increases.
While we have demonstrated the capabilities of the algorithm
to solve problems related to computer networks, we believe that
its scalability on this type of networks can be signiļ¬cantly
improved and we discuss this in the next section.
8. Discussion and future extensions
The model introduced in Section 5 represents the interaction
with an attacker who is determined to reach his goal (in our case
reach the target vertex v|V|). The attacker accepts the risk of getting
detected in the course of the attack, and understands the
implications of a successful detection. First, the cost of traversing an
honeypot edge is higher. More importantly, the defender gets vital
information about the progress of the attack which the defender
can use to harden further progress of the attacker.
In Section 7.4, we presented an application of the proposed
algorithm to improve security of computer networks. Real-world
applications, however, require that the legitimate services do not
get replaced by honeypots to sustain the operation of the network.
14 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579
We reļ¬‚ected this by operating each server in multiple instances
(essentially by duplicating vertices and their incident logical
communication links in the network) and allowing the defender to
use only one of the instances as a honeypot (thus confusing the
attacker without compromising the experience of legitimate users).
Such an approach inherently introduces symmetries to the
problemā€”the defender can decide to choose any of the instances of
a single service as a honeypot, while the attacker can choose any
instance (not knowing which one is a decoy) to carry on the
attack. Symmetries, however, have negative impact on the time
needed to solve the game as the experiments in Section 7.4 show.
We believe that our approach is well suited to deal with
1 d
such symmetries. Since all instances vi , . . . , vi of a service are
indistinguishable within the graph, it is possible to replace coor-
dinates Ļ‡i1
, . . . , Ļ‡id
of a characteristic vector Ļ‡ by a single value
i id
Ļ‡ + Ā·
Ā·
Ā·+ Ļ‡ without compromising the quality of the solution.
1
The location of the honeypots is typically assumed to be known
to legitimate users, and the interaction with a honeypot can thus
be considered as a reliable indication of an ongoing attack. Real-
world computer networks rely also on intrusion detection systems
(IDS) to discover potential malicious actions of the attackers. The
detection of IDS is, however, not reliable. Another natural direction
is thus to extend the algorithm (especially the Eqs. (15)) to account
for unreliable detection represented by stochastic transitions in the
game.
9. Conclusions
Cybersecurity domains present a wide variety of challenging
problems due to the changing nature of the threats and the
complexity of the decisions. We focus on the problem of using
honeypots to detect and mitigate the lateral movement of cyber
attackers. While previous work has considered basic versions of
this problem, we consider a much more realistic model where the
defender has uncertainty about the resources that attackers may
control, and there is a long-term dynamic interaction between the
defender and the attacker. We present a formulation of this
problem using partially observable stochastic games (POSGs).
However, existing state of the art solution algorithms are not able
to solve anything but the most trivial examples of this problem.
We present two major advances in solving POSGs that dramat-
ically improve our ability to solve Lateral Movement POSG. First, we
introduce a novel compact representation of partial informa- tion
in these games using a summarized abstraction of the belief state.
Second, we describe how we can use incremental strategy-
generation in combination with this compact representation to
further improve the solution algorithm. We show experimentally
that these improvements lead to a great improvement in scala-
bility of the algorithms to solve the Lateral Movement POSG, and
allow us to ļ¬nd near-optimal solution for problems with realistic
network sizes. These techniques should also be applicable to many
other dynamic games with similar problem structure.
Declaration of Competing Interest
The authors declare that they have no known competing ļ¬nan-
cial interests or personal relationships that could have appeared to
inļ¬‚uence the work reported in this paper.
Acknowledgments
This research was supported by the Czech Science Founda- tion
(no. 19-24384Y), by the Ministry of Education, Youth and Sports
(MEYS) funded project CZ.02.1.01/0.0/0.0/16 019/0000765 ā€œResearch
Center for Informaticsā€, and by the U.S. Army Combat Capabilities
Development Command Army Research Laboratory
and was accomplished under Cooperative Agreement Number
W911NF-13-2-0045 (ARL Cyber Security CRA). The views and con-
clusions contained in this document are those of the authors and
should not be interpreted as representing the oļ¬ƒcial policies, ei-
ther expressed or implied, of the Combat Capabilities Development
Command Army Research Laboratory or the U.S. Government. The
U.S. Government is authorized to reproduce and distribute reprints
for Government purposes not with standing any copyright notation
here on. We thank Sridhar Venkatesan for providing the expertise
on experiments with the business network.
References
Abuzainab, N., Saad, W., 2018. Dynamic connectivity game for adversarial internet of
battleļ¬eld things systems. IEEE Internet Things J. 5 (1), 378ā€“390.
Basu, A., Stettner, L., 2015. Finite- and inļ¬nite-Horizon shapley games with nonsym-
metric partial observation. SIAM J. Control Optim. 53 (6), 3584ā€“3619.
BoÅ”anskĆ½, B., Kiekintveld, C., LisĆ½, V., PeĖ‡choucĖ‡ek, M., 2014. An exact double-Oracle
algorithm for zero-Sum extensive-Form games with imperfect information. J.
Artiļ¬cial Intell. Res. 51, 829ā€“866.
Cai, G.-l., Wang, B.-s., Hu, W., Wang, T.-z., 2016. Moving target defense: state of the art
and characteristics. Front. Inf. Technol. Electron. Eng. 17 (11), 1122ā€“1153.
Chatterjee, K., Doyen, L., 2014. Partial-observation stochastic games: ho w to win
when belief fails. ACM Trans. Comput. Logic (TOCL) 15 (2).
Du, M., Li, Y., Lu, Q., Wang, K., 2017. Bayesian game based pseudo Honeypot model in
social Networks. In: International Conference on Cloud Computing and Secu- rity.
Springer, pp. 62ā€“71.
Durkota, K., LisĆ½, V., BoÅ”anskĆ½, B., Kiekintveld, C., 2015. Approximate solutions for
attack graph games with imperfect information. In: International Conference on
Decision and Game Theory for Security, pp. 228ā€“249.
Garg, N., Grosu, D., 2007. Deception in honeynets: A game-theoretic analysis. In: 2007
IEEE SMC Information Assurance and Security Workshop. IEEE, pp. 107ā€“113.
Ghosh, M.K., McDonald, D., Sinha, S., 2004. Zero-sum stochastic games with partial
information. J. Optim. Theory Appl. 121 (1), 99ā€“118.
Hansen, E.A., Bernstein, D.S., Zilberstein, S., 2004. Dynamic programming for par-
tially observable stochastic games. In: National Conference on Artiļ¬cial Intelli-
gence (AAAI), pp. 709ā€“715.
HorĆ”k, K., BoÅ”anskĆ½, B., 2019. A point-based approximate algorithm for one-sided
partially observable pursuit-evasion games. In: International Conference on De-
cision and Game Theory for Security, pp. 435ā€“454.
HorĆ”k, K., BoÅ”anskĆ½, B., 2019. Solving partially observable stochastic games with
public observations. In: 33rd AAAI Conference on Artiļ¬cial Intelligence, pp.
2029ā€“2036.
HorĆ”k, K., BoÅ”anskĆ½, B., PeĖ‡choucĖ‡ek, M., 2017. Heuristic search value iteration for one-
sided partially observable stochastic games. In: 31st AAAI Conference on
Artiļ¬cial Intelligence, pp. 558ā€“564.
HorĆ”k, K., BoÅ”anskĆ½, B., Kiekintveld, C., Kamhoua, C., 2019. Compact representation of
value function in partially observable stochastic games, 28th International Joint
Conference on Artiļ¬cial Intelligence. In press.
Jain, M., Conitzer, V., Tambe, M., 2013. Security scheduling for real-world networks.
In: International Conference on Autonomous Agents and Multiagent Systems, pp.
215ā€“222.
Jain, M., Korzhyk, D., VaneĖ‡k, O., Conitzer, V., Tambe, M., PeĖ‡choucĖ‡ek, M., 2011. Double
Oracle algorithm for zero-sum security games on graph. In: 10th International
Conference on Autonomous Agents and Multiagent Systems, pp. 327ā€“334.
Jajodia, S., Ghosh, A.K., Subrahmanian, V., Swarup, V., Wang, C., Wang, X.S., 2012.
Moving target defense II: application of game theory and adversarial modeling,
100. Springer.
Kamdem, G., Kamhoua, C., Lu, Y., Shetty, S., Njilla, L., 2017. A Markov game theoritic
approach for power grid security. In: 2017 IEEE 37th International Conference on
Distributed Computing Systems Workshops (ICDCSW). IEEE, pp. 139ā€“144.
Kiekintveld, C., LisĆ½, V., PĆ­bil, R., 2015. Game-theoretic foundations for the strategic
use of honeypots in network security. In: Cyber Warfare. Springer, pp. 81ā€“101.
Kumar, A., Zilberstein, S., 2009. Dynamic programming approximations for par- tially
observable stochastic games. In: 22nd International FLAIRS Conference, pp. 547ā€“
552.
La, Q.D., Quek, T.Q., Lee, J., Jin, S., Zhu, H., 2016. Deceptive attack and defense game in
honeypot-enabled networks for the internet of things. IEEE Internet Things J. 3
(6), 1025ā€“1035.
Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Tuyls, K., PĆ©rolat, J., Sil-
ver, D., Graepel, T., 2017. A uniļ¬ed game-theoretic approach to multiagent re-
inforcement learning. In: Advances in Neural Information Processing Systems,
pp. 4190ā€“4203.
Li, X., Cheung, W.K., Liu, J., 2010. Improving POMDP tractability via belief compres-
sion and clustering. IEEE Trans. Syst. Man. Cybern., Part B (Cybernetics) 40 (1),
125ā€“136.
Mairh, A., Barik, D., Verma, K., Jena, D., 2011. Honeypot in network security: a sur-
vey. In: Proceedings of the 2011 International Conference on Communication,
Computing & security. ACM, pp. 600ā€“605.
Mandiant Intelligence Center, 2013. APT1: Exposing one of Chinaā€™s cyber espionage
units. Mandiant, Tech. Rep. https://www.ļ¬reeye.com/content/dam/ļ¬reeye-www/
services/pdfs/mandiant-apt1-report.pdf.
K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 15
McMahan, H.B., Gordon, G.J., Blum, A., 2003. Planning in the presence of Cost Func-
tions Controlled by an Adversary. In: 20th International Conference on Machine
Learning, pp. 536ā€“543.
MoravcĖ‡Ć­k, M., Schmid, M., Burch, N., LisĆ½, V., Morrill, D., Bard, N., Davis, T., Waugh, K.,
Johanson, M., Bowling, M., 2017. Deepstack: expert-level artiļ¬cial in- telligence in
heads-up no-limit poker. Science 508ā€“513.
Nawrocki, M., WƤhlisch, M., Schmidt, T.C., Keil, C., Schƶnfelder, J., 2016. A survey on
honeypot software and data analysis. arXiv:1608.06249.
Nguyen, T.H., Wellman, M.P., Singh, S., 2017. A stackelberg game model for botnet
data exļ¬ltration. In: International Conference on Decision and Game Theory for
Security, pp. 151ā€“170.
Noureddine, M.A., Fawaz, A., Sanders, W.H., BasĀø ar, T., 2016. A game-theoretic ap-
proach to respond to attacker lateral movement. In: International Conference on
Decision and Game Theory for Security. Springer, pp. 294ā€“313.
Oliehoek, F.A., Savani, R., Gallego-Posada, J., van der Pol, E., de Jong, E.D., Gross, R.,
2017. GANGs: Generative adversarial network games. arXiv:1712.00679.
Pawlick, J., Zhu, Q., 2015. Deception by design: evidence-based signaling games for
network defense. arXiv:1503.05458.
PĆ­bil, R., LisĆ½, V., Kiekintveld, C., BoÅ”anskĆ½, B., PeĖ‡choucĖ‡ek, M., 2012. Game theoretic
model of strategic honeypot selection in computer networks. In: International
Conference on Decision and Game Theory for Security, pp. 201ā€“220.
Roy, N., Gordon, G., Thrun, S., 2005. Finding approximate POMDP solutions through
belief compression. J. Artif. Intell. Res. 23, 1ā€“40.
Smith, T., Simmons, R., 2004. Heuristic Search Value Iteration for POMDPs. In: 20th
Conference on Uncertainty in Artiļ¬cial Intelligence, pp. 520ā€“527.
Stoll, C., 1989. The Cuckooā€™s Egg: Tracking a Spy Through the Maze of Computer
Espionage. Doubleday.
VaneĖ‡k, O., Yin, Z., Jain, M., BoÅ”anskĆ½, B., Tambe, M., PeĖ‡choucĖ‡ek, M., 2012. Game-theo-
retic Resource Allocation for Malicious Packet Detection in Computer Networks.
In: 11th International Conference on Autonomous Agents and Multiagent Sys-
tems, pp. 902ā€“915.
Wang, K., Du, M., Maharjan, S., Sun, Y., 2017. Strategic honeypot game model for
distributed denial of service attacks in the smart grid. IEEE Trans. Smart. Grid 8
(5), 2474ā€“2482.
Wang, Y., Shi, Z.R., Yu, L., Wu, Y., Singh, R., Joppa, L., Fang, F., 2019. Deep reinforce-
me nt learning for green security games with real-time information. In: 33rd
AAAI Conference on Artiļ¬cial Intelligence, pp. 1401ā€“1408.
Zhou, E., Fu, M.C., Marcus, S.I., 2010. Solving continuous-state POMDPs via densit y
projection. IEEE Trans. Automat. Contr. 55 (5), 1101ā€“1116.
Karel HorƔk is a postgraduate student at the Dept. of Computer Science of the
Faculty of Electrical Engineering, at the Czech Technical University in Prague. He
received his Masters degree in artiļ¬cial intelligence from Czech Technical University
in Prague in 2015. Currently, he is interested in algorithms for solving inļ¬nite-
horizon stochastic games with imperfect information and their application to the
security. His works have been published at the AAAI and IJCAI conferences and other
venues.
Branislav BoÅ”anskĆ½ is an assistant professor at the Dept. of Computer Science of the
Faculty of Electrical Engineering, at the Czech Technical University in Prague. He
received his PhD in 2015 at the Faculty of Electrical Engineering, at the Czech Tech-
nical University in Prague and spent one year as a postdoc researcher at the Aarhus
University with prof. Peter Bro Miltersen. Branislavs research focuses on algorith-
mic game theory and dynamic games. He is interested in computing approximate
equilibria in dynamic games and designing scalable algorithms for using dynamic
games in computer network security and robust machine learning methods used in
an environment with an adversary. He has published 20+ papers and presented at
the top AI conferences (AAAI, IJCAI, AAMAS) and top AI journals (AIJ, JAIR).
Petr TomĆ”Å”ek is a postgraduate student at the Dept. of Computer Science of the
Faculty of Electrical Engineering, at the Czech Technical University in Prague. He
received his Masters degree in 2018 at the Faculty of Electrical Engineering, at the
Czech Technical University in Prague. He is interested in game theory, especially in
solving large extensive-form games.
Christopher Kiekintveld is currently an associate professor at the University of
Texas at El Paso (UTEP), and received his Ph.D in 2008 from the University of Michi-
gan. His research is in the area of intelligent systems, focusing on multi-agent sys-
tems and computational decision making, as well as applications in security and
other areas with potential social beneļ¬t. He has worked on several deployed ap-
plications of game theory for security, including systems in use by the Federal Air
Marshals Service and Transportation Security Administration. He has authored more
than 80 papers in peer-reviewed conferences and journals (e.g., AAMAS, IJCAI, AAAI,
JAIR, JAAMAS, ECRA). He has received several best paper awards, the David Rist Prize,
and an NSF CAREER award.
Charles A. Kamhoua is a researcher at the Network Security Branch of the U.S. Army
Research Laboratory (ARL) in Adelphi, MD, where he is responsible for con- ducting
and directing basic research in the area of game theory applied to cyber security.
Prior to joining the Army Research Laboratory, he was a researcher at the
U.S. Air Force Research Laboratory (AFRL), Rome, New York for 6 years and an edu-
cator in different academic institutions for more than 10 years. He has held visiting
research positions at the University of Oxford and Harvard University. He has co-
authored more than 150 peer-reviewed journal and conference papers. He received a
B.S. in electronics from the University of Douala (ENSET), Cameroon, in 1999, an
M.S. in Telecommunication and Networking from Florida International University
(FIU) in 2008, and a Ph.D. in Electrical Engineering from FIU in 2011.

More Related Content

Similar to Optimizing honeypot strategies against dynamic lateral movement using partially observable stochastic games.pptx

Development of a secure routing protocol using game theory model in mobile ad...
Development of a secure routing protocol using game theory model in mobile ad...Development of a secure routing protocol using game theory model in mobile ad...
Development of a secure routing protocol using game theory model in mobile ad...LogicMindtech Nologies
Ā 
Development of a secure routing protocol using game theory model in mobile ad...
Development of a secure routing protocol using game theory model in mobile ad...Development of a secure routing protocol using game theory model in mobile ad...
Development of a secure routing protocol using game theory model in mobile ad...LogicMindtech Nologies
Ā 
Optimal Security Response to Attacks on Open Science Grids Mine Altunay, Sven...
Optimal Security Response to Attacks on Open Science Grids Mine Altunay, Sven...Optimal Security Response to Attacks on Open Science Grids Mine Altunay, Sven...
Optimal Security Response to Attacks on Open Science Grids Mine Altunay, Sven...Information Security Awareness Group
Ā 
Trust Metric-Based Anomaly Detection Via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection Via Deep Deterministic Policy Gradient R...Trust Metric-Based Anomaly Detection Via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection Via Deep Deterministic Policy Gradient R...IJCNCJournal
Ā 
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...IJCNCJournal
Ā 
Detection of Replica Nodes in Wireless Sensor Network: A Survey
Detection of Replica Nodes in Wireless Sensor Network: A SurveyDetection of Replica Nodes in Wireless Sensor Network: A Survey
Detection of Replica Nodes in Wireless Sensor Network: A SurveyIOSR Journals
Ā 
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...IJNSA Journal
Ā 
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...IJNSA Journal
Ā 
Game theory for resource sharing in large distributed systems
Game theory for resource sharing in large distributed systemsGame theory for resource sharing in large distributed systems
Game theory for resource sharing in large distributed systemsIJECEIAES
Ā 
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...ijcseit
Ā 
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...ijcseit
Ā 
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYDIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYijgca
Ā 
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYDIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYijgca
Ā 
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYDIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYijgca
Ā 
Application of Attack Graphs in Intrusion Detection Systems: An Implementation
Application of Attack Graphs in Intrusion Detection Systems: An ImplementationApplication of Attack Graphs in Intrusion Detection Systems: An Implementation
Application of Attack Graphs in Intrusion Detection Systems: An ImplementationCSCJournals
Ā 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Sameera Horawalavithana
Ā 
A SYMMETRIC TOKEN ROUTING FOR SECURED COMMUNICATION OF MANET
A SYMMETRIC TOKEN ROUTING FOR SECURED COMMUNICATION OF MANET A SYMMETRIC TOKEN ROUTING FOR SECURED COMMUNICATION OF MANET
A SYMMETRIC TOKEN ROUTING FOR SECURED COMMUNICATION OF MANET cscpconf
Ā 
Replay of Malicious Traffic in Network Testbeds
Replay of Malicious Traffic in Network TestbedsReplay of Malicious Traffic in Network Testbeds
Replay of Malicious Traffic in Network TestbedsDETER-Project
Ā 
A Secure & Optimized Data Hiding Technique Using DWT With PSNR Value
A Secure & Optimized Data Hiding Technique Using DWT With PSNR ValueA Secure & Optimized Data Hiding Technique Using DWT With PSNR Value
A Secure & Optimized Data Hiding Technique Using DWT With PSNR ValueIJERA Editor
Ā 

Similar to Optimizing honeypot strategies against dynamic lateral movement using partially observable stochastic games.pptx (20)

Development of a secure routing protocol using game theory model in mobile ad...
Development of a secure routing protocol using game theory model in mobile ad...Development of a secure routing protocol using game theory model in mobile ad...
Development of a secure routing protocol using game theory model in mobile ad...
Ā 
Development of a secure routing protocol using game theory model in mobile ad...
Development of a secure routing protocol using game theory model in mobile ad...Development of a secure routing protocol using game theory model in mobile ad...
Development of a secure routing protocol using game theory model in mobile ad...
Ā 
Optimal Security Response to Attacks on Open Science Grids Mine Altunay, Sven...
Optimal Security Response to Attacks on Open Science Grids Mine Altunay, Sven...Optimal Security Response to Attacks on Open Science Grids Mine Altunay, Sven...
Optimal Security Response to Attacks on Open Science Grids Mine Altunay, Sven...
Ā 
Trust Metric-Based Anomaly Detection Via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection Via Deep Deterministic Policy Gradient R...Trust Metric-Based Anomaly Detection Via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection Via Deep Deterministic Policy Gradient R...
Ā 
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
Ā 
Detection of Replica Nodes in Wireless Sensor Network: A Survey
Detection of Replica Nodes in Wireless Sensor Network: A SurveyDetection of Replica Nodes in Wireless Sensor Network: A Survey
Detection of Replica Nodes in Wireless Sensor Network: A Survey
Ā 
P017129296
P017129296P017129296
P017129296
Ā 
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
Ā 
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
BOTNET ECONOMICS AND DEVISING DEFENCE SCHEMES FROM ATTACKERā€™S OWN REWARD PROC...
Ā 
Game theory for resource sharing in large distributed systems
Game theory for resource sharing in large distributed systemsGame theory for resource sharing in large distributed systems
Game theory for resource sharing in large distributed systems
Ā 
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
Ā 
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
Ā 
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYDIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
Ā 
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYDIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
Ā 
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYDIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITY
Ā 
Application of Attack Graphs in Intrusion Detection Systems: An Implementation
Application of Attack Graphs in Intrusion Detection Systems: An ImplementationApplication of Attack Graphs in Intrusion Detection Systems: An Implementation
Application of Attack Graphs in Intrusion Detection Systems: An Implementation
Ā 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Ā 
A SYMMETRIC TOKEN ROUTING FOR SECURED COMMUNICATION OF MANET
A SYMMETRIC TOKEN ROUTING FOR SECURED COMMUNICATION OF MANET A SYMMETRIC TOKEN ROUTING FOR SECURED COMMUNICATION OF MANET
A SYMMETRIC TOKEN ROUTING FOR SECURED COMMUNICATION OF MANET
Ā 
Replay of Malicious Traffic in Network Testbeds
Replay of Malicious Traffic in Network TestbedsReplay of Malicious Traffic in Network Testbeds
Replay of Malicious Traffic in Network Testbeds
Ā 
A Secure & Optimized Data Hiding Technique Using DWT With PSNR Value
A Secure & Optimized Data Hiding Technique Using DWT With PSNR ValueA Secure & Optimized Data Hiding Technique Using DWT With PSNR Value
A Secure & Optimized Data Hiding Technique Using DWT With PSNR Value
Ā 

More from prathamgunj

Computer in nursing seminar.pptx
Computer in nursing seminar.pptxComputer in nursing seminar.pptx
Computer in nursing seminar.pptxprathamgunj
Ā 
Computer Applications in Health Care.pptx
Computer Applications in Health Care.pptxComputer Applications in Health Care.pptx
Computer Applications in Health Care.pptxprathamgunj
Ā 
Microsoft PowerPoint 2019 Fundamentals.pdf
Microsoft PowerPoint 2019 Fundamentals.pdfMicrosoft PowerPoint 2019 Fundamentals.pdf
Microsoft PowerPoint 2019 Fundamentals.pdfprathamgunj
Ā 
Computer in nursing seminar.pdf
Computer in nursing seminar.pdfComputer in nursing seminar.pdf
Computer in nursing seminar.pdfprathamgunj
Ā 
cse40822-amazon.pptx
cse40822-amazon.pptxcse40822-amazon.pptx
cse40822-amazon.pptxprathamgunj
Ā 
CPU Register Organization.ppt
CPU Register Organization.pptCPU Register Organization.ppt
CPU Register Organization.pptprathamgunj
Ā 

More from prathamgunj (7)

Computer in nursing seminar.pptx
Computer in nursing seminar.pptxComputer in nursing seminar.pptx
Computer in nursing seminar.pptx
Ā 
Computer Applications in Health Care.pptx
Computer Applications in Health Care.pptxComputer Applications in Health Care.pptx
Computer Applications in Health Care.pptx
Ā 
Microsoft PowerPoint 2019 Fundamentals.pdf
Microsoft PowerPoint 2019 Fundamentals.pdfMicrosoft PowerPoint 2019 Fundamentals.pdf
Microsoft PowerPoint 2019 Fundamentals.pdf
Ā 
Computer in nursing seminar.pdf
Computer in nursing seminar.pdfComputer in nursing seminar.pdf
Computer in nursing seminar.pdf
Ā 
cse40822-amazon.pptx
cse40822-amazon.pptxcse40822-amazon.pptx
cse40822-amazon.pptx
Ā 
mano.ppt
mano.pptmano.ppt
mano.ppt
Ā 
CPU Register Organization.ppt
CPU Register Organization.pptCPU Register Organization.ppt
CPU Register Organization.ppt
Ā 

Recently uploaded

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
Ā 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
Ā 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
Ā 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
Ā 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
Ā 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
Ā 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
Ā 
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdfssuser54595a
Ā 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
Ā 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
Ā 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
Ā 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
Ā 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
Ā 
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
Ā 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
Ā 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
Ā 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
Ā 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
Ā 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
Ā 

Recently uploaded (20)

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Ā 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
Ā 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
Ā 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
Ā 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
Ā 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
Ā 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Ā 
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
Ā 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
Ā 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
Ā 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Ā 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
Ā 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
Ā 
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
Ā 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
Ā 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
Ā 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
Ā 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
Ā 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
Ā 

Optimizing honeypot strategies against dynamic lateral movement using partially observable stochastic games.pptx

  • 1. Computers & Security 87 (2019) 101579 Contents lists available at ScienceDirect Computers & Security journal homepage: www.elsevier.com/locate/cose Optimizing honeypot strategies against dynamic lateral movement using partially observable stochastic games Karel HorĆ”ka,āˆ—, Branislav BoÅ”anskĆ½ a, Petr TomĆ”Å”eka, Christopher Kiekintveldb, Charles Kamhoua c a Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czechia b Computer Science Department, University of Texas at El Paso, 500 W. University Ave., El Paso, TX 79968-0518, United States c U.S. Army Research Laboratory, 2800 Powder Mill Road, Adelphi, MD 20783-1138, United States a r t i c l e i n f o Article history: Received 16 March 2019 Revised 24 June 2019 Accepted 18 July 2019 Available online 26 July 2019 Keywords: Dynamic honeypot allocation Lateral movement Partially observable stochastic games Compact representation Incremental strategy generation a b s t r a c t Partially observable stochastic games (POSGs) are a general game-theoretic model for capturing dynamic interactions where players have partial information. The existing algorithms for solving subclasses of POSGs have theoretical guarantees for converging to approximate optimal strategies, however, their scal- ability is limited and they cannot be directly used to solve games of realistic sizes. In our problem, the attacker uses lateral movement through the network in order to reach a speciļ¬c host, while the defender wants to discover the attacker by dynamically reallocating honeypots. We demonstrate that restricting to a speciļ¬c domain allows us to substantially improve existing algorithms: (1) we formulate a compact rep- resentation of uncertainty the defender faces, (2) we exploit the incremental strategy-generation method that over iterations expands the possible actions for players. The experimental evaluation shows that our novel algorithms scale several orders of magnitude better compared to the existing state of the art. Ā© 2019 Elsevier Ltd. All rights reserved. 1. Introduction One of the most potent threats in cybersecurity are the Ad- vanced Persistent Threat (APT) attackers. These are sophisticated (possibly state-sponsored) attackers who execute highly targeted, long-term, stealthy attacks against government, military, and cor- porate organizations. In many cases of known attacks, APTs have been successful at establishing a deep and persistent foothold in a target network and maintaining this presence for months or even years before being discovered (Mandiant Intelligence Center, 2013). Another emerging challenge in cybersecurity is securing highly dynamic, diverse mobile networks against short- term but very intense attacks, as in the Internet of Battleļ¬eld Things (IoBT) (Abuzainab and Saad, 2018). In both of these cases lateral movement plays a key role in the attack, as an attacker with a foothold needs to move to compromise additional network resources to achieve the goal of the attack. One of the methods a defender can use to detect and mitigate lateral movement is using honeypots and other deception technologies to detect, confuse, āˆ—Corresponding author. E-mail addresses: horak@agents.fel.cvut.cz (K. HorĆ”k), bosansky@agents.fel.cvut.cz (B. BoÅ”anskĆ½), petr.tomasek@aic.fel.cvut.cz (P. TomĆ”Å”ek), cdkiekintveld@utep.edu (C. Kiekintveld), charles.a.kamhoua.civ@mail.mil (C. Kamhoua). and redirect the attacker (Kamdem et al., 2017; Noureddine et al., 2016). The resulting adversarial competition for control of a network between the defender and the attacker can be modeled as a game, where the defender is allocating defensive resources (e.g., honeypots), and the attacker is taking actions to gain greater control while remaining undetected. Game theory is becoming an increasingly important tool for optimizing cybersecurity resources, including strategic allocation of honeypots (Durkota et al., 2015; Kiekintveld et al., 2015; PĆ­bil et al., 2012), allocating resources to perform the deep packet inspection (VaneĖ‡k et al., 2012), and modifying the structure or characteristics of the network (Cai et al., 2016; Jajodia et al., 2012; Nguyen et al., 2017). We adopt a game-theoretic framework for optimizing honeypot deployments for detecting and defeating an adversary. However, motivated by the scenarios of APT attackers and attacks on mobile IoBT infrastructure, we consider a richer class of games where the interactions between the attacker and defender are dynamic, involve uncertainty, and are long-lived. First, in the realistic cases we want to model the interaction which is dynamic, in that the defender and attacker can observe and react to new information and the moves of the other player, allowing them to update their strategies over time. Second, the players have imperfect informa- tion; for example, the defender typically does not know exactly what set of resources an attacker may control at any given time. https://doi.org/10.1016/j.cose.2019.101579 0167-4048/Ā© 2019 Elsevier Ltd. All rights reserved.
  • 2. 2 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 Finally, the interactions can be very long (as in the case of APTs), or very frequent (as in the case of IoBT networks under short but very intense attacks). Previous work using game theory to model lateral movement (Kamdem et al., 2017) and honeypot allocation (Durkota et al., 2015; Kiekintveld et al., 2015; PĆ­bil et al., 2012; Wang et al., 2017) has not been able to address all of these problems. Partially Observable Stochastic Games (POSGs) are a class of game- theoretic models that encapsulates all of these properties and allows us to reason about strategies in dynamic scenarios with imperfect information and very long (even inļ¬nite) time horizon. While POSGs are a very general model, this comes at the cost of computational complexity. Computing optimal strategies in POSGs is highly intractable from the algorithmic perspective, even in the strictly competitive case (i.e., two-player zero-sum setting). If we assume that both players have partial information about the en- vironment, the players need to reason not only about their belief over possible states, but also about the belief the opponent has over the possible states, beliefs over beliefs, and so on. Restricting to subclasses of POSGs where this issue of nested beliefs does not arise, however, allows us to design and implement algorithms that are guaranteed to converge to optimal strategies (HorĆ”k and BoÅ”anskĆ½, 2019; HorĆ”k et al., 2017), Here, we develop new algorithms for POSG models of the lateral movement problem in cybsersecurity, and show that these new algorithms allow us to solve problems of a realistic size even with all of these complications. We study a model of attacker moving laterally in a computer network ā€“ we assume that the attacker inļ¬ltrated some nodes in a network and aims to reach a certain host/node (e.g., the main database). We model this problem as a two-player game on a graph between the attacker and the defender. The nodes of the graph correspond to hosts in the network, and the edges correspond to attacking another host using a particular vulnerability on the target host, which has an associated cost. The attacker sequentially chooses which hosts to attack to progress towards the goal. The defender chooses edges that will act as honeypots (i.e., fake vulnerabilities) which are able to detect certain moves by the attacker through the network and allow him to take additional mitigating actions. The defender can also change his honeypot deployment after any detection event to take into account the new piece of information about the attack. We refer to this game as the Lateral Movement POSG. We for- malize it as a One-Sided POSGs (HorĆ”k et al., 2017) where the defender has partial information about the assets of the attacker, while the attacker always knows his current assets and knows when he has been detected. This allows us to apply existing meth- ods for restricted classes of POSGs to this problem. Unfortunately, these existing algorithms have very limited scalability for our games due to two fundamental limitations in the game representa- tion and algorithm design. The ļ¬rst limitation is the complexity of representing, updating, and reasoning about the uncertainty of the defender. In the existing POSGs (as well as in similar single-agent Partially Observable Markov Decision Processes (POMDPs)), uncer- tainty is modeled using beliefs that correspond to probability dis- tributions over the possible states. In the Lateral Movement POSG the defender must reason over all possible subsets of hosts in a network, maintaining a probability that any speciļ¬c subset of hosts has already been compromised. Since the number of possible sub- sets grows exponentially in the size of the network, the number of possible states quickly become too large for the algorithm to scale. The second limitation comes from the number of possible ac- tions of the players. In the Lateral Movement POSG, the actions of the defender correspond to placing multiple honeypots on edges in a graph. Even if there are only 20 possible edges in a network and the defender has 3 honeypots, the defender has up to 6.8 Ā· 103 possible actions to choose from at a single decision point in the game. Since the game is dynamic with many decision points, this problem is exacerbated even further by the length of the game. We address both of these limitations and propose two key contributions for improving the scalability of POSG algorithms. These speciļ¬cally address the problems that arise in our cyber- security domain, but can apply to many other similar classes of POSG as well. In particular: (1) we replace the representation of beliefs over the exponential number of possible states with a summarized abstraction that captures key information but reduces the dimensionality of the beliefs; (2) we use an incremental strategy generation technique to iteratively expand the strategy space of the players in order to overcome large branching factor1 We present general version of the algorithms and novel solution concepts, and then show how they can be applied speciļ¬cally to the Lateral Movement POSG. We give theoretical justiļ¬cation and analysis of the key elements of the algorithm. In addition, we evaluate the performance of our algorithm empirically on realistic instances of the Lateral Movement POSG. Our algorithm is capable of ļ¬nding strategies that are very near the optimal solutions for small cases where we can compute the true optimal value. It also scales dramatically better than the previous state of the art, allowing us to solve games with twenty or more nodes that are of practical signiļ¬cance. For example, we can ļ¬nd defense strategies in smaller computer networks or abstractions of larger network. With more aggressive approximation we could scale to even larger examples using the same algorithms. 2. Related work The concept of a honeypot has been around for more than 30 years in cybersecurity (Stoll, 1989). They have been used for many different purposes, and have evolved to be much more sophisti- cated with greater abilities to mimic real hosts and to capture use- ful information about attackers (Mairh et al., 2011; Nawrocki et al., 2016). Recently, there have been a large number of game theory models developed to capture aspects of how honeypots and other deception methods can be strategically used to improve cyber- security (Durkota et al., 2015; Garg and Grosu, 2007; Kiekintveld et al., 2015; La et al., 2016; PĆ­bil et al., 2012; Wang et al., 2017). While none of these previous models address the full scope of long-term dynamic games with uncertainty that we consider here, several have considered similar aspects of uncertainty or dynamic interactions. For example, Wang et al. (2017) investigate the use of honeypots in the smart grid to mitigate denial of service attacks through the lens of Bayesian games. Durkota et al. (2015) de- veloped a model of honeypot allocation under uncertainty. La et al. (2016) also model honeypots mitigating denial of service attacks in the Internet-of-Things domain. Du et al. (2017) tackle a similar ā€œhoneypots for denial of service attackā€ problem with Bayesian game modeling in the social networking domain. In ad- dition, a number of works have consider deception as a signaling game (e.g., Pawlick and Zhu (2015)). In games with inļ¬nite horizon, the problem with nested beliefs prevents one from designing an (approximate) optimal algorithm for fully general settings. Nested beliefs can be tackled directly with histories ā€“ one of the few such approaches is bottom-up dynamic programming for constructing relevant ļ¬nite-horizon policy trees for individual players while pruning-out dominated 1 A shorter version of this paper based on a subset of the work has been pub- lished in IJCAI 2019 conference (HorĆ”k et al., 2019). We provide following signif- icant novel contributions: (1) integration of the incremental strategy-generation algorithm for solving stage games, (2) generalization of the problem so that the defender can use multiple honeypots, (3) extended discussion throughout the text, and an example of the algorithm, (4) extended experimental evaluation demonstrat- ing additional signiļ¬cant speed-up over the conference version of the algorithm, and on more domain-speciļ¬c examples.
  • 3. K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 3 strategies (Hansen et al., 2004; Kumar and Zilberstein, 2009). However, due to the explicit dependence on history, the scalability in horizon is very limited. A more common approach is to focus on a subclass of POSGs. In Ghosh et al. (2004), zero-sum POSGs with public actions and observations are considered. The authors show that the game has a value and present an algorithm that exploits the transformation of such a model into a game with complete information. Another signiļ¬cant subclass of POSGs are One-Sided POSGs where one player has perfect information (Basu and Stettner, 2015; Chatterjee and Doyen, 2014; HorĆ”k et al., 2017). This subclass is particularly important for security applications since it provides a naturally robust strategies against the worst-case fully-informed opponent. While our algorithm is based on the algorithm for the one-sided case, we provide signiļ¬cant generalizations of the previous work, especially in the representation of value function, deļ¬nition and algorithms for computing value-backup operator. Replacing the representation of beliefs over the exponential number of possible states using a compact characteristic vector that captures key information reduces the dimensionality of the beliefs. For POMDPs, a similar compact representation has appeared in existing literature (e.g., in Li et al. (2010); Roy et al. (2005); Zhou et al. (2010)). The most general approach was to choose an abstraction based on Principal Components Analysis (Roy et al., 2005). For our Lateral Movement POSG, we can exploit natural representation in marginal probabilities ā€“ i.e., reason about prob- ability of each resource being infected as a characteristic vector, instead of explicitly considering all possible subsets of infected resources. While this offers a path to scaling to much larger prob- lems, it has consequences for the solution quality as well as the construction of the solution algorithm. Many components of state- of-the-art POSG solvers are based on manipulating the full belief distribution and we show how these can be redesigned to operate using the more compact summarized belief representations. Finally, we also use incremental strategy generation method that is often known as the double-oracle algorithm and of- ten used for solving games with large combinatorial number of actions in security games (e.g., in Jain et al. (2013, 2011); McMahan et al. (2003), but also many others) and more recently, the double oracle method is being also used in combination with reinforcement learning on domains where computing (approxi- mate) best response is not possible (Lanctot et al., 2017; Oliehoek et al., 2017; Wang et al., 2019). Double-oracle algorithm restricts the space of possible actions to choose from ā€“ the algorithm forms a restricted stage game that is iteratively expanded by adding new actions into the restricted game. These actions are often computed as best responses to the current strategy of the opponent in the restricted problem. In the worst case, all actions have to be added into the restricted problem. This, however, rarely happens in practice and double-oracle algorithms are often able to ļ¬nd an optimal strategy using only a fraction of all possible plans (see, for example, BoÅ”anskĆ½ et al. (2014); Jain et al. (2011); Lanctot et al. (2017)). While the domain-independent idea of double-oracle algorithm is simple, our contribution is in (1) full integration of this methodology into the algorithm for solving stage games of Lateral Movement POSGs and (2) description of exact and heuristic oracle algorithms for determining which actions should be added into the restricted game. 3. One-sided POSGs A one-sided partially observable stochastic game (HorĆ”k et al., 2017), or one-sided POSG (OS-POSG), is an imperfect-information two-player zero-sum inļ¬nite-horizon game with perfect recall represented by a tuple (S, A1, A2, O, T, R). The game is played for an inļ¬nite number of stages. At each stage, the game is in one of the states s āˆˆS and players choose their actions a1 āˆˆA1 and a2 āˆˆA2 simultaneously. An initial state of the game is drawn from a probability distribution b0 āˆˆļæ½(S) over states termed the initial belief. The one-sided nature of the game translates to the fact that while player 1 lacks detailed information about the course of the game, player 2 is able to observe the game perfectly (i.e., his only uncertainty is the action a1 player 1 is about to take in the current stage). The choice of actions determines the outcome of the current stage: Player 1 gets an observation o āˆˆO and the game transitions to a state s āˆˆS with probability T(o, s | s, a1, a2), where s is the current state. Furthermore, player 1 gets a reward R(s, a1, a2) for this transition, and player 2 receives āˆ’R(s, a1, a2). The rewards are discounted over time with discount factor Ī³ < 1. One-sided POSGs can be solved by approximating the optimal value function Vāˆ—: ļæ½(S) ā†’ R using a pair of value functions V (lower bound on Vāˆ—) and V (upper bound on Vāˆ—). The algorithm proposed in HorĆ”k et al. (2017) reļ¬nes these bounds by solving a sequence of stage games. In each of these stage games, the algorithm focuses on deciding the optimal strategies of the players in this stage (i.e., Ļ€1 āˆˆļæ½(A1) for player 1 and Ļ€2(s) āˆˆļæ½(A2) for player 2) while assuming that the play in the subsequent stages yields values represented by value functions V or V, respectively. Solving the stage games also deļ¬nes the ļ¬xed point equation for Vāˆ—, Vāˆ— (b) =HVāˆ— (b) Ļ€2 Ļ€1 ( = min max Eb,Ļ€ ,Ļ€ 1 2 [R]+ a1 ,o 1 2 b,Ļ€ ,Ļ€ 1 1 2 +Ī³ Pr [a , o] Ā·Vāˆ—(Ļ„(b, a , Ļ€ , o)) , (1) 1 2 where Ļ„ (b, a , Ļ€ , o) denotes the Bayesian update of the belief of player 1 when he played a1 and observed o. Note that actions of player 2 who has complete information affect the observations player 1 receives and thus belief of player 1. Each stage game is parameterized by the current belief b āˆˆ ļæ½(S) of player 1. For piecewise-linear and convex V and V, a stage game is solved using linear programming. We show this linear program for V (represented as a point-wise maximum over a set r of linear i functions Ī± : ļæ½(S) ā†’ R). min V (2a) s.t. a2 2 2 Ļ€ (s āˆ§a ) =b(s) (2b) s,a2 2 2 V ā‰„ Ļ€ (s āˆ§a )R(s, a ,a 1 2 o ) + Ī³ V 1 a o (2c) 1 a o s,a2 2 b (s ) = Ļ€ (sāˆ§ 2 1 2 a )T(o, s |s,a , a ) (2d) 1 a o s i 1 a o V ā‰„ Ī± (s )b (s ) āˆ€s āˆ€a1 āˆ€a1,o,s āˆ€a1,o,Ī±i (2e) 2 2 Ļ€ (s āˆ§a ) ā‰„0 āˆ€s,a 2 (2f) In this linear program, player 2 seeks a strategy Ļ€2 for the current stage of the game to minimize the utility V of player 1. Here Ļ€2(sāˆ§a2) stands for the joint probability (ensured by con- straints (2b) and (2f)) that the current state of the game is s and player 2 plays the action a2. Constraint (2c) stands for player 1 choosing the best-responding action a1. Constraint (2d) expresses the belief in the subsequent stage of the game when a1 was played by player 1 and observation o was seen (multiplied by the probability of seeing that observation), and ļ¬nally con- straint (2e) represents the value of V in the belief ba1o. Such a linear program can be then combined with point-based variants of the value- iteration algorithm, such as Heuristic Search Value Iteration (HSVI) (HorĆ”k et al., 2017; Smith and Simmons, 2004).
  • 4. 4 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 A different approximation scheme is used for the upper bound V on Vāˆ—. Instead of using the point-wise maximum over linear function, V is expressed using a set Ļ’ = {(b(i), y(i)) |1 ā‰¤ i ā‰¤ |Ļ’|} of V(b) =min Ī»āˆˆR |Ļ’ | ā‰„0 i (i) T 1ā‰¤iā‰¤|Ļ’| 1ā‰¤iā‰¤|Ļ’| i (i) points. The lower convex hull of this set of points is then formed to obtain the value of V (b), r Ī» y |1 Ī» = 1, Ī» b = b . (3) Constraint (2e) can then be adapted to use this representation of V as detailed in HorĆ”k and BoÅ”anskĆ½ (2016). 4. Compact representation of Vāˆ— The dimension of the value function Vāˆ— depends on the num- ber of states, which can be very large in games like the lateral movement POSG where the number of states is exponential in the size of the network. We propose an abstraction scheme called summarized abstraction to decrease the dimensionality of the problem by creating a simpliļ¬ed representation of the beliefs over the state space. We associate each belief b āˆˆļæ½(S) in the game with a character- istic vector Ļ‡(b) = A Ā·b (for some matrix A āˆˆRkƗ|S| where k Ā« |S|) and we deļ¬ne an (approximate) value function V Ėœāˆ—: Rk ā†’ R over the space of characteristic vectors. We denote Ļ‡ 0 the characteristic vector corresponding to the initial belief b0, i.e., Ļ‡ 0 = A Ā·b0. The main goal is to adapt algorithms based on value iteration to operate over the more compact space Rk instead of the original belief space ļæ½(S). First, we adapt the ļ¬xed point Eq. (1) for OS- POSGs to work with compact V Ėœāˆ—(instead of original Vāˆ—). We let the player 2 choose any belief that is consistent with the current char- acteristic vector Ļ‡ (by adding an extra minimization term), and we replace the value of belief b, Vāˆ—(b), by the value of its characteri- zation V Ėœāˆ—(A Ā·b). We also denote this compound function V Ėœāˆ—ā—¦ A. V Ėœāˆ—(Ļ‡) = H Ėœ V Ėœāˆ—(Ļ‡) b|Ab=Ļ‡ Ļ€2 Ļ€1 ( = min min max Eb,Ļ€1 ,Ļ€2 [R]+ a1 ,o 1 2 b,Ļ€ ,Ļ€ 1 Ėœ āˆ— ( 1 2 +Ī³ Pr [a , o] Ā·V A Ā·Ļ„(b, a , Ļ€ , o)) (4) Next, we show that the value function V Ėœāˆ—satisfying the ļ¬xed point Eq. (4) (and obtained as a limit of applying operator H Ėœ) provides a valid lower bound on the solution of the original game representation over the belief space. Theorem 1. V Ėœāˆ—(Ļ‡(b)) ā‰¤ Vāˆ—(b) for every b āˆˆļæ½(S). Proof. We prove the theorem by induction. Let V Ėœ0 be arbitrary Ėœ 0 0 0 (b) Ėœ0 0 and let V of the original game be V (b) = V (Ļ‡ ) (i.e., V ā‰¤ V ). Assume VĖœt(Ļ‡(b)) ā‰¤ Vt(b) for every b āˆˆļæ½(S). Now H(VĖœt ā—¦ A) ā‰¤ H(Vt). The extra minimization over beliefs b in HVĖœcan only decrease the utility of the stage game and hence Ėœ (b) Ėœ Ėœ (b) Ėœ t+1 t t t t+1 V (Ļ‡ ) = HV (Ļ‡ ) ā‰¤ H(V ā—¦ A)(b) ā‰¤ HV (b) = V (b). (5) D Note that the Eq. (4) can also be solved using linear program- ming. Consider a lower bound VĖœL B on V Ėœāˆ— formed as a point-wise maximum over linear functions Ī±i(Ļ‡) = (a(i))T Ļ‡ + z(i). We modify the linear program (2) by considering that the belief b is a variable (constrained by Ļ‡), constraints (2b), (2c), (2d), (2f) (6a) s b(s) =1 (6b) A Ā·b = Ļ‡ (6c) b(s) ā‰„0 āˆ€ s, (6d) and we replace the constraint (2e) to account for different representation of VĖœL B by Ļ‡a1 o = A Ā·ba1 o āˆ€a1,o (6e) q = 1 Ā·b 1 1 a o T a o 1 āˆ€a ,o (6f) 1 a o 1 1 (i) T a o (i) a o V ā‰„ (a ) Ā·Ļ‡ + z Ā·q āˆ€a1,o,Ī±i, (6g) where qa1o = 1T Ā·ba1o is the probability that o is generated when player 1 uses action a1. Observe that the only constraints with non-zero right-hand side constants in this new linear program are constraints (6b) and (6c). This is critical in the following proof that V Ėœāˆ—is convex. Theorem 2. Value function V Ėœāˆ—is convex. Proof. Consider a dual formulation of the linear program (2) up- dated according to equations (6). Since the only non-zero right- hand side terms in the primal are 1 and Ļ‡ , the objective of the dual formulation is o(Ļ‡) = Ļ‡ T Ā·a + z. Moreover, this is the only place where the characteristic vector Ļ‡ occurs. Hence the polytope of the feasible solutions of the dual problem is the same for every characteristic vector Ļ‡ āˆˆRk and o(Ļ‡ ) (after ļ¬xing variables a and z) forms a bound on the objective value of the solution for arbitrary Ļ‡ . Since we maximize over all possible o(Ļ‡ ) in the dual, the objective value of the linear program (and also HĖœVĖœLB) is convex in the parameter Ļ‡. LB Now, starting with an arbitrary convex (e.g., linear) V Ėœ0 , the LB t= 0 Ėœ t āˆž Ėœ t+1 LB Ėœ Ėœt LB sequence of functions {V } , where V = HV , is formed only by convex functions. Therefore, the ļ¬xed point V Ėœāˆ—is also a convex function. D 4.1. HSVI algorithm for compact POSGs The Algorithm 1 we propose for solving abstracted games Algorithm 1: HSVI algorithm for one-sided POSGs when sum- marized abstraction is used. 1 Initialize VĖœL B and VĖœUB to lower and upper bound on V Ėœ āˆ— ĖœUB 0 ĖœLB 0 2 while V (Ļ‡ ) āˆ’ V (Ļ‡ ) > E do 3 Explore(Ļ‡0, E,0) 4 procedure Explore(Ļ‡, E,t) 5 (b, Ļ€2) ā† optimal belief and strategy of player 2 in HĖœVĖœLB(Ļ‡) 6 (a1, o) ā† select according to heuristic 7 Ļ‡ ā† Ļ„(Ļ‡, a1, Ļ€2, o) 8 Ėœ Ėœ Update r and Ļ’ based on the solutions of HĖœVĖœLB(Ļ‡) and HVUB(Ļ‡) 9 if VĖœUB(Ļ‡ ) āˆ’ VĖœLB(Ļ‡ ) > EĪ³āˆ’t then 10 Explore(Ļ‡ , E,t + 1) 11 Update r and Ļ’ based on the solutions of HĖœVĖœLB(Ļ‡ ) and HĖœVĖœUB(Ļ‡) is a modiļ¬ed version of the original heuristic search value it- eration algorithm (HSVI) for solving unabstracted one-sided POSGs (HorĆ”k et al., 2017). The key difference here is that we use the value functions VĖœL B and VĖœUB (instead of V and V ) and we must have modiļ¬ed all parts of the algorithm to use the abstracted representation of the beliefs. First, we initialize bounds VĖœL B and VĖœUB (line 1) to valid piece- wise linear and convex lower and upper bounds on V Ėœāˆ—(using sets r of linear functions Ī±i = (a(i))T Ļ‡ + z(i) and Ļ’ of points (Ļ‡ (i),
  • 5. K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 5 y(i))). Then, we perform a sequence of trials (lines 2ā€“3) from the initial characteristic vector Ļ‡ 0 = Ab0 until the desired precision E> 0 isreached. In each of the trials, we ļ¬rst compute the optimal optimistic strategy of player 2, which in this case is the selection of the belief b and strategy Ļ€ (line 5). Next, we choose the action a 2 1 of player 1 and the observation o (lines 6ā€“7) so that the excess approximation error VĖœUB(Ļ‡ ) āˆ’ VĖœLB(Ļ‡ ) āˆ’ EĪ³āˆ’t in the subsequent stage (where the belief is described by a characteristic vector Ļ‡ = Ļ„(Ļ‡, a1, Ļ€2, o)) is maximized. If this excess approximation er- ror is positive, we recurse to the characteristic vector Ļ‡ (line 10). Before and after the recursion the bounds VĖœLB(Ļ‡ ) and VĖœUB(Ļ‡) are improved using the solution of HĖœVĖœLB(Ļ‡) and HĖœVĖœUB(Ļ‡) (lines 8 and 11). The update of VĖœUBis straightforward and a new point (Ļ‡, HĖœVĖœUB(Ļ‡)) is added to Ļ’ . To obtain a new linear function to add to the set r , we use the objective function o(Ļ‡) = Ļ‡ T Ā·a + z (after ļ¬xing variables a and z) of the dual linear program to (6) that forms a lower bound on HĖœVĖœL Band V Ėœāˆ—(see the proof of Theorem 2 for more discussion). 4.2. Extracting strategy of the player 1 In Sections 3 and 4, we have presented linear programs to solve stage game H Ėœ V Ėœ (Ļ‡) from the perspective of the player 2. These linear programs solved a minimax problem of player 2 by explicitly reasoning about her strategies. The actions of the player 1, however, were only represented as best-response con- straints which does not provide the strategy of player 1 directly. In this section, we discuss the dual formulation of the linear program (6) and show the way the strategy of the player 1 can be extracted from dual variables. Consider constraints (2c), (6e), (6f) and (6g) and let us establish the following mapping to dual variables. Constraints (2c) Constraints (6g) Constraints (6e) Constraints (6 f ) Dual variables Ļ€1(a1) ā‰„ 0 Dual variables Ī²a1 o(Ī±i) ā‰„ 0 Dual variables aĢ‚a1 o āˆˆRk Dual variables zĢ‚a1 o āˆˆR The dual formulation of (6) then contains the following constraints 1 (corresponding to variables V, Va o, Ļ‡a1 o and qa1o): a1 1 1 Ļ€ (a ) = 1 (7a) Ī±i 1 a o i 1 1 Ī² (Ī± ) = Ī³ Ļ€(a ) āˆ€a1,o (7b) 1 a o j Ī±i j 1 (i) a o i a Ė† = a Ī² (Ī± ) 1 āˆ€a ,o (7c) 1 a o Ī±i 1 (i) a o i zĖ† = z Ī² (Ī± ) 1 āˆ€a ,o. (7d) Constraints (7) give the following interpretation to variables Ļ€1, Ī²a1o, aĖ†a1o and zĖ†a1o. Variables Ļ€1 directly represent the strategy of player 1 for the current stage, Ļ€1 āˆˆļæ½(A1). Variables Ī²a1o(Ī±i) are the probabilities of playing strategy the value of which is repre- sented by the linear function Ī±i(Ļ‡) = (a(i))T Ļ‡ + z(i) in a subgame where player 1 played action a1 and received observation o. These probabilities are represented in the form of realization plans, i.e., they sum to Ī³Ļ€ 1(a1) instead of one. We can, however, obtain the true probabilities by dividing Ī²a1o by Ī³Ļ€ 1(a1). Variables aĖ†a1o and zĖ†a1o then deļ¬ne a linear function Ī¶Ė†a1o(Ļ‡ ) = (aĖ†a1o)T Ļ‡ + zĖ†a1o representing the value of playing the given mix of strategies in the subgame and are used to express the value of the game in the rest of the dual formulation. By normalizing variables Ī²a1o we obtain a linear function Ī¶ a1o : Rk ā†’ R, called gadget, which represent the value in the subgame given that subgame has been reached, Ī¶a1 o(Ļ‡ ) = (aa1 o)TĻ‡ + za1 o where 1 a o Ī±i 1 i 1 1 a = (Ī² (Ī± )/Ī³ Ļ€(a )) Ā·a a o (i) 1 a o Ī±i 1 i 1 1 a o (i) z = (Ī² (Ī± )/Ī³Ļ€ (a )) Ā·z . (8) Note that the gadgets Ī¶ a1o are only deļ¬ned for subgames that can be reached, i.e., where Ļ€1(a1) > 0. The representation of a strategy of player 1 using variables Ļ€ 1 and Ī¶ can then be used to play the game following the similar ideas that are used in continual resolving in the domain of extensive-form games (MoravcĖ‡Ć­k et al., 2017). Algorithm 2: Playing a compact OS-POSG using VĖœLB. 0 1 Ļ‡ ā† Ļ‡ , Ī¶ ā† initial gadget Ī¶ (Ļ‡) =āˆ’āˆž 2 while player~1 is required to act do 3 Solve HĖœVĖœLB(Ļ‡) with additional constraint o(Ļ‡) ā‰„ Ī¶ (Ļ‡) in the dual formulation and extract strategy Ļ€1, Ī¶ of player~1 4 Sample action a1 āˆ¼ Ļ€1, execute it and observe an observation o 5 Ļ‡ ā† Ļ‡a1o/qa1o, Ī¶ ā† Ī¶ a1o Algorithm 2 describes the algorithmic scheme to play a game in compact representation. Whenever player 1 is required to act, he solves a stage game HĖœVĖœLB(Ļ‡). In the ļ¬rst stage, he solves the game without any additional constraints (since initially Ī¶ (Ļ‡) = āˆ’āˆž) to ļ¬nd the near-optimal strategy for the entire game. In the subse- quent stages, player 1 has to ensure that the strategy he is about to play is consistent with the strategy assumed in the ļ¬rst stage, i.e., it provides (at least) the value represented by the gadget Ī¶ a1o from the previous stage. To this end, a constraint o(Ļ‡ ) ā‰„ Ī¶ is added to the dual formulation of (6) (recall that o(Ļ‡ ) is the objective of the dual linear program). 5. The lateral movement POSG We now formally deļ¬ne the lateral movement POSG and de- scribe the key steps of the algorithm speciļ¬cally for this domain. The game is played on a graph G = (V, E) representing a computer network. We assume that V = {v1, . . . , v|V|}. The goal of the at- tacker is to reach vertex v from the initial source of the infection |V| v1 by traversing the edges of the graph, while minimizing the cost to do so, see Fig. 1(a). Initially, the attacker controls only the vertex v1, i.e., the initial state of infection is I0 = {v1}. Then, in each stage of the game, k i=1 the attacker chooses a directed path P = {P(i)} (where k is the length of the path) from any of the infected vertices to the target vertex v|V|. Fig. 1(a) provides examples of paths the attacker may consider, however, he can only use paths P2 and P3 if vertex v2 has been previously infected. Unless the defender takes countermeasures, the attacker infects all the vertices on the path, including the target vertex v|V|, and pays the cost of traversing each of the edges on the path, āˆ’,P P(i)āˆˆP c = C(P(i)), (9) where C(P(i)) is a cost associated with taking edge P(i). The defender tries to discover the attacker and increase his cost to reach the target (and thus possibly discourage the attack) by
  • 6. 6 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 v1 v|V| v2 v3 v5 v4 a b(I0) = 0.7 I0 I1 I2 I3 I4 I5 I6 I7 I8 b v|V| v2 v3 v5 v1 v4 1 Ļ‡(b) = 1 3 Ļ‡(b) = 0 5 Ļ‡(b) = 0 2 Ļ‡(b) = 0.3 4 Ļ‡(b) = 0 b(I8) = 0.3 c Fig. 1. Example of a lateral mo vement POSG: (a) Example of a network with 6 vertices. Attacker starts in vertex v1 and is trying to reach vertex v|V| while minimizing the cost. (b) Subset of states of the game (in total, 14 non-terminal states are reachable). Initial state of the game is I0 (only v1 is infected). Further states are reached by infecting adjacent vertices to an already infected vertex. OS-POSGs deļ¬ne the belief as a probability distribution over statesā€”b is an example of such belief. (c) Our approach i summarizes the belief b using a characteristic vector Ļ‡ (b). In this case, Ļ‡(b) represents the probability that vertex vi is infected when belief b from Fig. 1(b) is considered 2 4 5 6 8 (b) 2 4 5 6 8 (here, v is infected in the states of infection I , I , I and I , i.e., Ļ‡ = b(I ) + b(I ) + b(I ) + b(I ) = 0.3). deploying honeypots into the network. In each stage, the defender can decide a subset H āˆˆE[NH] of edges E (where E[NH] denotes all subsets of E with cardinality NH) to be honeypots. A honeypot is then deployed on every edge h āˆˆH and is able to detect if the attacker traverses that speciļ¬c edge. Furthermore, it also increases the cost for using the edge h to C(h). If the attacker observes that he has traversed any of the honeypot edges, he may decide to change his plan and therefore he does not execute the rest of his originally intended path P. The cost of playing a path P against a honeypot placement H is therefore H,P c = P(i)āˆˆPā‰¤H hāˆˆH hāˆˆPā‰¤H C(P(i)) + 1 Ā·[C(h) āˆ’C(h)] (10) where Pā‰¤ H is the preļ¬x of the path P until the interaction with any honeypot edge h āˆˆH, ā‰¤H P = P Pāˆ©H = āˆ… {P(i)}min{ j|P(j)āˆˆH} i=1 otherwise. (11) For the reasons of notational simplicity, we write Pā‰¤ h instead of Pā‰¤ {h} when it is clear that h is a singleton. Since the attacker does not need to continue to execute his selected path P (since the defender can update his belief over the possible subsets of infected nodes and thus reconļ¬gure the hon- eypots), the new state of infection IH,P (i.e., the subset of vertices that are infected at the beginning of the next stage) becomes IH,P = I āˆŖ{v |(u, v) āˆˆPā‰¤H}. (12) We assume the worst case scenario from the perspective of the defender where the attacker is assumed to be able to infer the position of all honeypots upon interacting with any of them. The above problem can be formalized as a one-sided POSG without using the summarized abstraction where ā€¢ states are possible subsets of infected vertices (or infections), i.e., SāŠ† 2V (Fig. 1(b) shows an example of a state space where ļ¬lled dots represent infected vertices), ā€¢ actions of the defender are honeypot allocations, i.e., A1 = E[NH] where E[NH] denotes NH-element subsets of E, ā€¢ actions A2 of the attacker are directed paths in G reaching v|V|, ā€¢ observations denote whether and where the defender detected the attacker (observations det(h)) or the attacker reached the target v|V| undetected (observation Ā¬det), ā€¢ transitions follow the Eq. (12) and an observation det(h) is gen- erated if and only if the honeypot edge h is the ļ¬rst honeypot edge traversed on the path chosen by the attacker, ā€¢ reward of the defender is the cost of the attacker, i.e., R(H,P) = cH,P, ā€¢ discount factor of the game is Ī³ = 1, and 0 0 0 ā€¢ initial belief b of the game satisļ¬es b (I ) = 1. Note that the original HSVI algorithm for one-sided POSGs has been deļ¬ned and proved for discounted problems with Ī³ < 1. In this particular case, however, the convergence properties translate even to the undiscounted case since the game is essentially ļ¬nite (in a ļ¬nite number of steps, all vertices, including v|V|, get infected and the game ends). 5.1. Characteristic vectors The number of states in the game is exponential in the number of vertices of the graph, up to |S|= 2|V |āˆ’2 + 1 (we consider that v1
  • 7. K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 7 R|V| aTĻ‡ + k Ļ‡ VĢƒU B Fig. 2. Dual interpretation of the projection on the convex hull. is always infected and we treat all states where v|V| is infected as a single terminal state of the game). We use marginal probabilities of a vertex being infected as characteristic vectors Ļ‡ āˆˆR|V|, i.e. (b) i Ļ‡ = IāˆˆS|viāˆˆI b(I). (13) Consider the example of a belief from Fig. 1(b). The belief in the original OS-POSG model is a probability distribution over states, i.e., |S|-dimensional vector with |S|āˆ’ 1 degrees of freedom. In this game, the number of non-terminal reachable states (i.e., states where v|V| is not yet infected) is 14. In comparison, the characteristic vector (see Fig. 1(c)) has only |V| dimensions. More- over, observe that in this case the set {b |Ab = Ļ‡(b)} of possible beliefs that get projected to Ļ‡ (b) is a singleton and contains only the belief b from Fig. 1(b). 5.2. Value function representation The algorithm from Section 4.1 approximates V Ėœāˆ—using a pair of value functions, the lower bound VĖœL B and the upper bound VĖœUB. While we have discussed the representation of the lower bound (which is represented using a set r of linear functions Ī±i(Ļ‡) = (a(i))T Ļ‡ + z(i)), the representation of the upper bound is more challenging. In the original algorithm, the value function V is deļ¬ned (see Eq. (3)) over the probability simplex ļæ½(S). In that case, it s u ļ¬ƒ c e s to consider |S| points in Ļ’ to deļ¬ne V for every belief. In contrary, the space of characteristic vectors (i.e., marginal probabilities) is formed by a hypercube [0, 1]|V| with 2|V| vertices, which would make the straightforward point representation (using Eq. (3)) impractical. We can, however, leverage the fact that in this domain infecting an additional node can only decrease the cost to the target (and hence V Ėœ āˆ— is decreasing). Consider the dual formulation of the optimization problem (3). In this formulation, the projection of Ļ‡ to the lower convex hull of a set of points is represented by the optimal linear function aTĻ‡ + z deļ¬ning a facet of the convex hull (see Fig. 2). Since V Ėœāˆ— is decreasing in Ļ‡ , we can also enforce that aTĻ‡ + z is decreasing in Ļ‡ (i.e., add the constraint a ā‰¤ 0 to the dual formulation). This additional constraint translates to a change 1ā‰¤iā‰¤|Ļ’| i (i) Ī» Ļ‡ = Ļ‡ in the primal problem to an of the equality inequality. VĖœUB(b) =min Ī»āˆˆR |Ļ’ | ā‰„0 r i (i) T 1ā‰¤iā‰¤|Ļ’| 1ā‰¤iā‰¤|Ļ’| i (i) Ī» y |1 Ī» = 1, Ī» Ļ‡ ā‰¤Ļ‡ (14) Now it is suļ¬ƒcient that the set Ļ’ contains just one point (Ļ‡ (i), y(i)) where Ļ‡(i) = 0|V| (instead of 2|V| points) to make the constraint :E1 ā‰¤iā‰¤|Ļ’ | i (i) Ī» Ļ‡ ā‰¤ Ļ‡ satisļ¬able. It is possible to adapt the constraint (6g) to use the represen- tation from (14), similarly to the original (unabstracted) one-sided POSGs (HorĆ”k and BoÅ”anskĆ½, 2016), and thus obtain linear program to perform the upper bound computation. 5.3. Using marginalized strategies in stage games The linear program formed by modiļ¬cations from Eqs. (6) still requires solving the stage game for the original, unabstracted problem. In this section, we show that it is possible to avoid expressing the belief b explicitly, and to compute the stage game directly using the characteristic vectors and marginalized strategies of the attacker. First, we present the representation of the stage-game strategies of the attacker. Instead of representing joint probabilities Ļ€ 2(Iāˆ§P) of choosing path P in state I, we only model the probability Ļ€ Ėœ2(P) of choosing path P aggregated over all states I āˆˆS. Further- more, we allow the attacker to choose the probability Ī¾ (Pāˆ§vi) that vertex vi is infected while he opts to follow path P. P 2 Ļ€ Ėœ (P) =1 (15a) i 0 ā‰¤ Ī¾(P āˆ§v ) ā‰¤ Ļ€2 Ėœ (P) āˆ€ P, vi (15b) Ļ€ Ėœ2(P) ā‰„0 āˆ€ P (15c) To ensure that the strategy represented by variables Ļ€ Ėœ2 and Ī¾ is feasible it must be consistent with the characteristic vector Ļ‡ , i i where Ļ‡ is the probability that the vertex v is infected at the beginning of the stage. P Ī¾(P āˆ§v ) = Ļ‡ i i āˆ€ vi (15d) Furthermore, the path P must start in an already infected vertex (denoted as Pre(P)), i.e., the conditional probability Pr[Pre(P) āˆˆI |P] of Pre(P) being infected when path P is cho- sen has to be 1. Now, since Ī¾ (Pāˆ§v) is the joint probability, Ī¾(P āˆ§v) = Pr[Pre(P) āˆˆI|P] Ā·Ļ€ Ėœ2(P), Ī¾(P āˆ§v) = Ļ€ Ėœ2(P) āˆ€P,v =Pre(P). (15e) Example 1. Consider the example from Fig. 1(a) and assume that the attacker wanted to play a strategy Ļ€ 2 in the original game, where Ļ€2(I0 āˆ§P1) = 0.7 Ļ€2(I8 āˆ§P1) =0.1 Ļ€2(I8 āˆ§P2) = 0.1 Ļ€2(I8 āˆ§P3) =0.1. The same strategy can be described using the marginalized representation Ļ€ Ėœ2 and Ī¾: Ļ€ Ėœ2(P1) = Ļ€2(I0 āˆ§P1) + Ļ€2(I8 āˆ§P1) = 0.8 Ļ€ Ėœ2(P2) = Ļ€2(I8 āˆ§P2)= 0.1 Ļ€ Ėœ2(P3) = Ļ€2(I8 āˆ§P3)= 0.1 Ī¾ (P1 āˆ§ v1) = 0.8 Ī¾ (P1 āˆ§ v2) = 0.1 Ī¾ (P2 āˆ§ v1) = 0.1 Ī¾ (P2 āˆ§ v2) = 0.1 Ī¾ (P3 āˆ§v1) = 0.1 Ī¾ (P3 āˆ§v2) = 0.1 2 all other variables Ļ€ Ėœ and Ī¾ are zero It is straightforward to verify that variables Ļ€ Ėœ 2 and Ī¾ satisfy Eqs. (15a)ā€“(15e) when the characteristic vector Ļ‡ (b) from Fig. 1(c) is considered. The representation of strategies of the attacker using variables 2 Ļ€ Ėœ and Ī¾ is suļ¬ƒcient to express the expected immediate reward of the strategy Ļ€ Ėœ2, hence the constraint (2c) can be changed to use the marginalized strategies, P V ā‰„ Ļ€ Ėœ2(P)cH,P + hāˆˆH VH,det(h) [NH] āˆ€ H āˆˆE . (15f) Importantly, we can also skip the computation of the belief bH,det(h) and compute the characteristic vector formed by the marginals Ļ‡ H,det(h) directly from the variables Ļ€ Ėœ2 and Ī¾ . We now
  • 8. 8 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 present the equation to compute the updated marginal Ļ‡ H,det(h) given that the attacker has been detected while traversing the honeypot edge h (and this has been the ļ¬rst honeypot edge he traversed). Denote the set of all paths satisfying this condition h H ā‰¤h ā‰¤H P = {P |h āˆˆPāˆ§P āŠ† P }. Ļ‡ H,det(h) i = h H i PāˆˆP |P āŠ†P ā‰¤(Ā·,v ) ā‰¤h 2 Ļ€ Ėœ (P) + h H i PāˆˆP |P hāŠ†P ā‰¤(Ā·,v ) ā‰¤h i Ī¾(P āˆ§v ) (15g) The ļ¬rst sum stands for the probability that the attacker is de- tected while traversing edge h, but he managed to infect v on his i chosen path before the detection. The second sum represents the probability that the attacker was detected on the edge h as well, but this time he has not infected v using path P, however, the qH,det(h) PāˆˆPh H = Ļ€2 Ėœ (P). (15h) We need not consider the subsequent stages where the attacker has not been detected (i.e., Ā¬det observation has been generated) or the honeypot edge h reaches the target vertex v|V|. In each of these cases, the target vertex has been reached and thus the value of the subsequent stage is zero. Example 2. Consider once again the example from Fig. 1 and the strategy of the attacker Ļ€ 2 from Example 1 and assume that a honeypot is deployed on the edge h = (v3, v5) only (i.e., H = {h}) and the attacker has been detected there. In such case, the attacker could have either used path P1 or P2, but not P3 as it does not reach h. If the attacker used P1 in I0, he infects v3 and v5 up to the point of the detection and thus reaches a new state of infection I (this might have happened with probability 0.7). Similarly, 3 using P1 in I8 reaches I4 (with probability 0.1) and using P2 in I8 reaches I5 (with probability 0.1). The (denormalized) belief bH,det(h) (corresponding to Eq. (2d)) is thus bH,det(h)(I3) = 0.7 bH,det(h)(I4) = bH,det(h)(I5) = 0.1. Computing marginal probabilities of each vertex being infected in the belief bH,det(h) yields Ļ‡1 H,det(h) H,det(h) H,det(h) 3 5 = Ļ‡ = Ļ‡ =1 Ļ‡ H,det(h) 2 = 0.2 Ļ‡ H,det(h) 4 = 0.1. We can obtain exactly the same marginal probabilities when 2 using marginalized strategy Ļ€ Ėœ and Ī¾ from Example 1. For ex- ample, neither P1 nor P2 infects vertex v2 (i.e., no edge ( Ā·, v2) Ļ‡ is present). Therefore, the ļ¬rst sum in Eq. (15g) is empty and H,det(h) 2 1 2 2 2 2 = Ī¾ (P āˆ§v ) + Ī¾ (P āˆ§v ) = 0.2. Similarly, only P infects in (15g), and Ļ‡ v4 before reaching edge h, hence is contained in the ļ¬rst sum H,det(h) 4 2 2 1 4 = Ļ€ Ėœ (P ) + Ī¾ (P āˆ§v ) =0.1. 5.4. Initializing bounds We now describe our approach to initialize bounds VĖœL B and VĖœUB (line 1 of Algorithm 1). We start with the former. To obtain a lower bound VĖœLB, we assume that no honeypots can be deployed in the network. The attacker then only needs to ļ¬nd the cheapest (i.e., shortest) path in the graph when costs C are considered. Denote Cāˆ—(vi) the cost of the cheapest path from vertex vi. We ini- tialize the lower bound VĖœL B using two linear functions. Firstly, the cost cannot be negative and we use a linear function Ī±1(Ļ‡) = 0. Next, we consider linear function Ī±2(Ļ‡) = (a(2))T Ļ‡ + z(2) where (2) 1 (2) i z = Cāˆ—(v ) and a =min{ i 1 Cāˆ—(v ) āˆ’ Cāˆ—(v ), 0} which is a tighter lower bound for some Ļ‡ (especially close to the initial character- 0 0 1 istic vector Ļ‡ where Ļ‡ = 1 and other vertices are not infected, 0 i i.e., Ļ‡ = 0 for i ā‰„ 2). To initialize the upper bound VĖœUB, we consider a perfect in- formation variant of the game (similarly to HorĆ”k et al. (2017)). However, since the number of states (i.e., possible subsets of infected vertices) in such a game can be exponential, we consider a simpler version of the game where the attacker is only in charge of the vertex he visited the last (instead of a subset of all ver- tices visited in the past) which can only increase the cost for the i attacker. Denote C āˆ— (v ) the expected cost of the attacker in the per- fect information variant of the game where the attacker controls vertex vi only. We then use the set Ļ’ = {(Ļ‡( j), y( j)) |1 ā‰¤ j ā‰¤ |V|} UB of points to initialize VĢƒ where ( j) i i vertex vi has already been infected before he started to execute path P. Analogously, we can obtain the probability qH,det(h) that the Ļ‡ = attacker got detected while traversing edge h as 1 i = j 0 i h= j ( j) āˆ— j y = C (v ). (16) 6. Scaling up: incremental strategy generation In the previous section, we introduced marginalized beliefs and strategies to deal with the large number of game states. However, the number of actions grows fast as well which makes the game computationally challenging and may cause memory consumption issues in practical implementation of the approach. In this section, we address this issue by adopting a double-oracle approach to generate actions of the players incrementally. 6.1. Double-Oracle scheme for solving stage games Before describing the oracle algorithms used in the exper- iments, we provide a generic scheme for using double-oracle approach to solving stage games H Ėœ V Ėœ(Ļ‡), see Algorithm 3. The Algorithm 3: Generic double-oracle scheme for solving stage games. Ėœ Ėœ 1 2 1 A , A ā† initial subsets of actions of player~1 and player~2 deļ¬ning a restricted game 2 When solving a stage game H Ėœ V Ėœ (Ļ‡): 3 do 4 Solve stage game H Ėœ V Ėœ(Ļ‡) considering actions AĖœ1and AĖœ2 only if improving allocation H for the defender is found then 5 6 AĖœ1 ā† AĖœ1āˆŖ{H} 7 continue 8 else if improving path P for the attacker is found then 9 AĖœ2 ā† AĖœ2āˆŖ{P} 10 continue 11 break 12 while an improving action is found for one of the players algorithm keeps global sets of actions AĖœ1 and AĖœ2 deļ¬ning a re- stricted game where only these actions are considered (instead of entire sets of actions A1 and A2). Set A1 is initialized to a single arbitrary honeypot allocation, set A2 is initialized to an arbitrary set of |V| paths P1, . . . , P|V| from each of the vertices vi to v|V|. When solving a stage game, we ļ¬rst query the oracle of the defender before querying the oracle of the attacker. This decision is motivated by the possibility to use a heuristic oracle for generating actions of the defender (see Section 6.4) which is less computationally demanding compared to the exact computation.
  • 9. K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 9 6.2. Exact oracle of the attacker In Section 4.2, we have described the way the strategy of player 1 can be extracted in the form of variables Ļ€1 and gadgets Ī¶ H,det(h)(Ļ‡ ) = (aH,det(h))T Ļ‡ + zH,det(h). The oracle of the attacker inspects this strategy and its lower bound o(Ļ‡ ) (extracted from the objective of the dual formulation of the stage game) and tries to prove or disprove that the value o(Ļ‡ ) is a valid lower bound on the solution. To this end, the attacker selects a single path from any of the vertices of the graph towards the target vertex v and selects a |V| characteristic vector Ļ‡ where this pure strategy of the attacker is applicable. Ī“(vj)+ Pij = (vi,vj)āˆˆE (vj,vk)āˆˆE Pjk āˆ€vjh= v|V | (17a) vih=v| V| i Ī“(v ) = 1 (17b) ij P āˆˆ{0, 1} i j āˆ€(v, v )āˆˆE (17c) Ī“(vi) ā‰„ 0 vi h= v|V | (17d) The single path of the attacker is represented as an integral ļ¬‚ow in the graph of capacity 1. Constraint (17a) represents the Kirchoffā€™s law which has to hold for any but two verticesā€”the source vertex of the path which is indicated by Ī“(vi) = 1, and the target vertex v|V|. Furthermore, constraint (17b) ensures that only one path is considered. The characteristic vector Ļ‡ must make the path represented by the ļ¬‚ow P applicable, i.e. Ļ‡i = 1 for the source vertex vi where Ī“(vi) =1. i i Ļ‡ ā‰„ Ī“(v ) āˆ€v h= v i |V| (17e) 0 ā‰¤ Ļ‡i ā‰¤1 āˆ€viāˆˆV (17f) We now express the value of playing a path represented by the ļ¬‚ow P against the honeypot allocation H. To this end, we ļ¬rst deļ¬ne an auxiliary ļ¬‚ow PH which is identical to P, except that it gets blocked by the ļ¬rst honeypot edge (vi, vj) āˆˆ H. j Ī“(v ) + H ij P = (vi,vj)āˆˆEH (vj,vk)āˆˆE PH jk āˆ€v h= v j |V| (17g) H P ā‰¤P ij ij (17h) H ij 0 ā‰¤ P ā‰¤1 āˆ€(vi, vj) āˆˆE āˆ€(vi, vj) āˆˆE (17i) Note that the fact that ļ¬‚ow PH is identical to P (except for the blocking on honeypot edges H) is ensured by the constraint (17h)ā€” if the ļ¬‚ow has to continue from any vertex, it must follow the edge contained in P. Now, we can express the immediate cost of playing P against honeypot allocation H as H,P c = (vi,vj)āˆˆEH i j H ij C(v , v )P + (vi,vj)āˆˆH i j ij C(v ,v )PH. (17j) Next, we express the characteristic vector Ļ‡ H in the subgame given that the attacker got detected on any edge from H (here, (v ,v )āˆˆE H i j ij P is a binary value indicating whether the attacker reached vj by the time he got detected). H j Ļ‡ = (vi,vj)āˆˆE H ij j P + Ļ‡ 1 āˆ’ (vi,vj)āˆˆE PH ij I (17k) This allows us to derive the expected cost in the subgame given that the attacker got detected on a honeypot edge h = (vx, vy) āˆˆ H. VH,det(h)= āŽ§ āŽØ0 vy =v|V| viāˆˆV āŽ© i H,det(h) H H H,det(h) H i xy xy a Ļ‡ P + z P otherwise (17l) Note that VH,det(h) is zero if the attacker does not reach edge h (in H xy |V| that case P = 0) or the attacker reached his target v . This gives us the expected cost VH of the path represented by the ļ¬‚ow P against honeypot allocation H, H H,P hāˆˆH V = c + VH,det(h) , (17m) and the expected cost of using path P is H 1 H V = Ļ€ (H)V . (17n) The optimization objective of the attacker is to disprove that o(Ļ‡ ) is the lower bound. To this end, he attempts to ļ¬nd ab- stracted belief Ļ‡ and path represented by P where the difference V āˆ’ o(Ļ‡) is maximized, i.e., max o(Ļ‡) āˆ’V s.t. constraints (17a)-(17n). If o(Ļ‡) āˆ’ V > 0, the function o(Ļ‡ ) is provably not a lower bound on the solution of the stage game. Hence, the path of the attacker represented by P can be added into the restricted game. Note that all products z = term1 Ā·term2 of variable expressions in the above mathematical program only involves expressions term1 āˆˆ[0, 1] and a binary expression term2 āˆˆ{0, 1}. Such products can be rewritten using a set of linear constraints z ā‰¤ term1 z ā‰¤ term2 z ā‰„ term1 āˆ’ (term2 āˆ’ 1) z ā‰„0 (18) and thus obtain a mixed integer linear program formulation. 6.3. Exact oracle of the defender The oracle of the defender is aimed on computing a best response of the defender (i.e., the best possible placement of honeypots on edges) against a ļ¬xed strategy of the attacker (described by variables Ļ€ Ėœ 2 and Ī¾ ). To this end, we propose a branch-and-bound approach. Each node of a branch-and-bound search tree corresponds to a partial allocation of the honeypots H. We can obtain a lower bound on the cost of the attacker induced by all solutions H āŠ‡ H by assuming that only honeypot edges in H are used, P LB(H) = Ļ€2 H,P hāˆˆH H,det(h) Ėœ Ėœ (P)c + q V(Ļ‡ H,det(h) H,det(h) / q ). (19) Note that the division by qH,det(h) is used to account for the fact that Ļ‡ H,det(h) is denormalized (it is multiplied by the probability that observation det(h) is generated). An upper bound on the solutions H āŠ‡ H is obtained by ne- glecting the sequential interaction of remaining honeypots (i.e., the attacker can be possibly caught more than once). By placing a honeypot on an edge h āˆˆ h H, the cost can be increased by at most ļæ½h = P|hāˆˆPāˆ§Pā‰¤hāŠ†Pā‰¤H Ļ€ Ėœ2(P) Ā·(C(h) āˆ’C(h)) HāˆŖ{h},det(h) Ėœ HāˆŖ{h},det(h) HāˆŖ{h},det(h) +q V(Ļ‡ /q ). (20) Given that the attacker reaches edge h, he pays C(h) āˆ’ C(h) extra cost and furthermore he enters a subgame where he got detected on edge h from a honeypot set HāˆŖ{h}. We can then obtain the upper bound using
  • 10. 10 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 NHāˆ’|H| i=1 i UB(H) = LB(H) + ļæ½āˆ— , (21) where ļæ½āˆ— 1,. . . , ļæ½āˆ— N āˆ’|H| are the NH āˆ’ |H| highest values among ļæ½h. H 6.4. Heuristic oracles In practice, ļ¬nding exact solutions of oracles from Sections 6.2 and 6.3 can be computationally expensive. To mitigate the impact of the oracle computation times on the performance of the algorithm, we present a heuristic version of the oracle of the defender. The heuristic oracle is a greedy variant of the exact oracle presented in Section 6.3. The use of the heuristic oracle of the defender from the Algorithm 4 can lead to a degradation of the quality of the 1 H ā† āˆ… 2 while |H| < NHdo 3 hāˆ—ā† arg maxhāˆˆEHļæ½h 4 Hā† HāˆŖ{hāˆ—} solution found by the HSVI algorithm. By omitting some actions of the defender from the support of the equilibrium, the cost of the attacker can be lowered. Nevertheless, the lower-bound guarantees on the quality of the solution found by the algorithm are retained. 7. Experimental evaluation In this section we focus on the experimental evaluation of the proposed techniques on the lateral movement stochastic game introduced in Section 5. We consider three variants of the HSVI algorithm for compact POSGs depending on the oracle algorithm used: (1) the version where no oracles are used (i.e., AĖœi= Ai), (2) the variant using exact oracles for both of the players (described in Sections 6.2 and 6.3), (3) and ļ¬nally the variant where heuristic oracle for the defender (described in Section 6.4) is used instead of the exact computation of the best response. We also compare these approaches with the original HSVI algorithm (HorĆ”k et al., 2017) where the summarized abstraction is not used and the computa- tion is performed based on the entire (exponential) state space. 7.1. Experiments setting The evaluation has been performed on a set of randomly generated graphs with varying parametersā€”number of vertices in the network |V|, number of honeypots NH and the density of the graph. For simplicity we used directed acyclic graphs (DAGs) for the experiments, however, the techniques presented in the paper are general enough to apply on scenarios where the network is not acyclic. We use the following algorithm to generate mesh-like net- works. First, positions of |V| nodes are randomly generated from a 2-dimensional interval [0, 20] Ɨ [0, 20] with v1 positioned at (0,0) and v|V| positioned at (20,20) (pi āˆˆ[0, 20]2 denotes the position of vi). For each vertex vi a set of k nearest neighbors Nk(vi) (with respect to positions pj) is determined and an edge connecting vertices vi and vj is created for every vj āˆˆNk(vi). The orientation of the edges is set to always lead towards the vertex closer to the target v|V|. Note that increasing the parameter k leads to an increase in the number of edges in the graph. The graph generated by the above method may, however, contain multiple vertices with zero in-degree or out-degree (apart of vertices v1 and v|V|). We ļ¬x this by regenerating positions of Security Level 1 ( Īŗ1) Security Level 2 ( Īŗ2) Security Level 3 ( Īŗ3) Algorithm 4: Heuristic version of the oracle of the defender. Fig. 3. Example of a mesh network used in experiments. Contour lines depict the cost function f used to derive the costs C and C of the edges. The vertices are par- titioned into 3 security levels. such nodes until ļ¬nding a directed acyclic graphs where only v1 and v|V| have zero in-degree and out-degree, respectively. Next, we deļ¬ne a cost function f (x, y) ā†’ R>0. This function indicates the hardness to operate in point (x, y). In our exper- iments, we use f (x, y) = x + y which attains its maximum in (20,20) where the target vertex v|V| is located (i.e., the network is the most secured around the target vertex). To obtain the cost C(vi, vj) of traversing an edge (vi, vj) given that the honeypot is not deployed there, we integrate the cost function f along the line connecting points pi and pj, { 1 0 C(vi, vj) = Ipi āˆ’ pj I2 f(Ī»pi + (1 āˆ’Ī»)p j) dĪ». (22) Furthermore, we partition the vertices into 3 security perime- ters (with multiplicative constants Īŗ1, Īŗ2 and Īŗ3) based on the distance from the position of the target vertex v|V|. The vertices closest to the target vertex v|V| are the most secured, hence the attacker has to use a potentially costly zero-day exploit to proceed (and risk its revelation). Thus the cost C(vi, v j) of traversing an edge entering vertex vj that is present in the most secured zone is high, C(vi,vj) = Īŗ3 Ā·C(vi,vj) for Īŗ3 = 4. (23) For security perimeters further away from the target vertex v|V| we use Īŗ2 = 1.5 and Īŗ1 = 1. Fig. 3 provides an example of such a network and its accompanying cost function f. All computational results have been obtained on computers equipped with Intel Xeon Scalable Gold 6146 processors and 16GB of available RAM while limiting the runtime of the algorithms to 2 hours. CPLEX 12.9 has been used to solve (mixed integer) linear programs. The algorithms were required to ļ¬nd an E-optimal solution where E is set to 1% of the error VĖœUB(Ļ‡ 0) āˆ’ VĖœLB(Ļ‡ 0) in the initial characteristic vector Ļ‡ 0 after the initialization phase described in Section 5.4 is completed. If the algorithm failed to reach this level of precision within 2 hours, we report an instance as unsolved. The results are based on 100 randomly generated networks for each parameter set. 7.2. Scalability in the size of graphs First, we focus on the scalability of the algorithms in size of the networkā€”in the number of vertices |V|, and in the density of the graph (controlled by the parameter k). Fig. 4 depict the scalability of the algorithms in the number of vertices. First of all, observe that the original HSVI algorithm for one-sided POSGs has been unable to solve all but the smallest
  • 11. K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 11 Fig. 4. Scalability in the number of vertices in the network |V| (averages based on 100 instances for each parameter set). Conļ¬dence intervals mark the standard error. The reported runtimes include only instances solved by the algorithms. The percentage of instances where the algorithms failed to terminate within 2 hours are reported separately.
  • 12. 12 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 Fig. 5. Scalability of the algorithms: (Left) in the density (i.e., the number k of nearest neighbors considered when forming the edges) of the network with 14 vertices and NH = 2 honeypots, (Right) in the number of honeypots NH in a network with 14 vertices and k = 6. Conļ¬dence intervals mark the standard error. The reported runtimes include only instances solved by the algorithms. The percentage of instances where the algorithms failed to terminate within 2 hours are reported separately. instances with 8ā€“12 vertices and even with only NH = 1 honeypots (while exhausting the memory on larger instances). To this end, we did not evaluate this algorithm on more complicated instances with NH > 1. In contrary, the algorithm relying on the technique of summarized abstractions have been able to solve most of the instances with NH = 1 even without the use of the incremental strategy generation technique described in Section 6. The versions of the algorithm relying on the incremental strategy generation were then able to solve all 100 randomly generated instances for each parametrization when NH = 1 honeypot is considered. Im- portantly, the bounds computed by the versions using incremental strategy generation overlap with the bounds computed by the re- maining variants. This demonstrates that in spite of improving the runtimes by using the incremental strategy generation technique, the quality of the solution does not deteriorate. Increasing the number of honeypots to NH = 2 highlights the advantages of using the oracles to generate actions of the players incrementally. While the algorithm that does not use the oracles failed to solve most of the instances with |V| ā‰„ 18 in the time limit of 2 hours, both of the variants relying on the incremental strategy generation technique were able to solve even most of the largest instances with 24 vertices and k = 4 in less than 20 minutes on average (considering only the instances solved by the algorithms). Increasing the number of honeypots to NH = 3 highlights the computational advantages of using the heuristic version of the oracle of the defender presented in Section 6.4. This version of the algorithm was able to solve signiļ¬cantly more instances compared to the version using the exact version of the oracle of the defender (Section 6.3). Note that the version that does not use the strategy generation technique was able to solve only a few instances with |V| ā‰¤ 14. This is mainly attributed to the size of the linear programs to solve the stage gamesā€”e.g., the number of possible combinations of actions of players in the games with 14 vertices and k = 5 we considered is up to |A1| Ɨ |A2| '.: 5.2Ā· 107. Note that the algorithms using the oracles were able to solve nearly all instances in this setting (|V|= 14 and k = 5). In Fig. 5 (left), we focus on the scalability of the three vari- ants of the algorithm using the summarized abstraction based on the density of the network (controlled by the parameter k). The number of actions in the game can grow fast with the addition of new edges to the network which makes solving the game chal- lenging for the variant without the use of oracles. However, the addition of the edges need not increase the support of the equilib- rium and hence the runtime of the variants relying on incremental strategy generation is not affected by the increasing value of the parameter k. 7.3. Scalability in the number of honeypots In Fig. 5 (right) we focus on the scalability of the algorithm in the number of honeypots. The number of actions of the defender is exponential in the number of honeypots NH. Hence, it is not surprising that the algorithm that does not use the incremental strategy generation to form defenderā€™s actions can solve only the smallest instances in terms of the number of honeypots NH. The variants of the algorithm relying on the oracles to incremen- tally generate the actions perform signiļ¬cantly better, however, when the number of honeypots is large (NH ā‰„ 4), computing the best response of the defender exactly (using the approach from Section 6.3) is prohibitively expensive. In contrary, the heuristic variant of the oracle of the defender allows the algorithm to scale better and solve most instances up to NH = 6. Interestingly, we observed that using the heuristic oracle did not impact the quality of the solution in a negative way. Consid- ering the instances from Fig. 5 (right) that were solved by the variant using the exact oracle of the defender, we observed that the lower bounds VĖœLB(Ļ‡ 0) computed by these two variants of the algorithm were within 0.6% of each other. 7.4. Applicability to computer networks Previous experiments have been focused on randomly gener- ated mesh networks. The physical properties of mesh networks typically disallow direct communication between the devices (e.g., due to the presence of obstacles, or low signal strength caused by large distances between the devices). This leads to a signiļ¬cantly lower number of edges in the corresponding game graph com- pared to regular computer network architectures. To demonstrate the versatility of our approach, we evaluate our algorithms on a graph originating from a fundamentally different network topology given as an example by a network security expert. In Fig. 6(a), the physical topology of the considered computer network is shown. Based on this topology, we derived logical communication links, shown in Fig. 6(b), that the attacker can use to perform lateral movement and that do not directly correspond to the physical links. For example, every device can directly
  • 13. K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 13 9 end-user machines 5 end-user machines target sensor2 sensor1 * * * sensor3 web1* email* transf* * er db * SECON * DC * web2* a sensor3 db SECON target 5 end-user machines web1 transfer web2 sensor1 sensor2 email Possible attack sources (7 machines) 9 end-user machines DC b Fig. 6. Computer network topology considered in the experiments: (a) Physical topology of the network. Nodes marked with asterisk exist in multiple instances and can be used as honeypots. (b) Logical communication links in the network. Dashed lines correspond to easy diļ¬ƒculty levels, solid lines to medium diļ¬ƒculty and ļ¬nally bold solid lines represent hard diļ¬ƒculty. communicate with the web server and thus the attacker can use this communication channel to carry on his attack. Moreover, compromising the web server allows an attack on the database server. We also reļ¬‚ect potential side channels in the networks such as social engineering allowing the attacker to compromise end- user machines by publishing malicious ļ¬les on the web presentation of the company. Each logical link can be used to infect an adjacent machine. The links, however, differ in the diļ¬ƒculty of carrying such an attack along the edge, e.g., it is substantially easier to inļ¬ltrate web server via an exploitable web presentation than attempting to compro- mise the domain controller running the latest versions of the services. We distinguish three diļ¬ƒculty levels (with three different associated costs representing the time needed to successfully tra- verse the edge): easy (C(e) = 10 units of time), medium (C(e) = 20) and hard (C(e) = 40). In the presence of a honeypot, the costs increase to C(e) = 40, C(e) = 80 and C(e) = 160, respectively2 In our example, we assume that the attacker uses a computer outside of the network (i.e., in the internet zone) to inļ¬ltrate the network, with the aim of compromising the target node within the internal part of the network (assume that sensitive data are located at this host). The defender attempts to slow down attackerā€™s progress by deploying a honeypot on one of the hosts within the network (hosts that can be selected are denoted by an asterisk in Fig. 6(a)). We assume that each such host is present in d instances within the network (e.g., one being the production server and the remaining instances serving for backup purposes). In order to serve the legitimate users, the defender is allowed to operate a 2 Note that the exact values of costs do not affect usability of our algorithm and hence can be set speciļ¬cally for each network. honeypot on one of the instances only. For d = 2 instances of each server, the algorithm (using the exact versions of the oracles) took 982s to ļ¬nd a solution with the lower bound on attackerā€™s cost of 69.4, while for d = 3 instances of each server a solution with quality 63.3 has been found in 5950s. The shortest attack path in the graph has a cost of 50, which marks an increase in the attackerā€™s cost by 38.8% and 26.6%, respectively. The larger of the two instances (with d = 3) is represented by a graph with 52 vertices and 504 edges. Note that the decreasing value of the game with increasing value of d is caused by the fact that the defender is able to cover only one of the instances of a serverā€”thus the probability that the attacker manages to choose an unprotected instance increases. While we have demonstrated the capabilities of the algorithm to solve problems related to computer networks, we believe that its scalability on this type of networks can be signiļ¬cantly improved and we discuss this in the next section. 8. Discussion and future extensions The model introduced in Section 5 represents the interaction with an attacker who is determined to reach his goal (in our case reach the target vertex v|V|). The attacker accepts the risk of getting detected in the course of the attack, and understands the implications of a successful detection. First, the cost of traversing an honeypot edge is higher. More importantly, the defender gets vital information about the progress of the attack which the defender can use to harden further progress of the attacker. In Section 7.4, we presented an application of the proposed algorithm to improve security of computer networks. Real-world applications, however, require that the legitimate services do not get replaced by honeypots to sustain the operation of the network.
  • 14. 14 K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 We reļ¬‚ected this by operating each server in multiple instances (essentially by duplicating vertices and their incident logical communication links in the network) and allowing the defender to use only one of the instances as a honeypot (thus confusing the attacker without compromising the experience of legitimate users). Such an approach inherently introduces symmetries to the problemā€”the defender can decide to choose any of the instances of a single service as a honeypot, while the attacker can choose any instance (not knowing which one is a decoy) to carry on the attack. Symmetries, however, have negative impact on the time needed to solve the game as the experiments in Section 7.4 show. We believe that our approach is well suited to deal with 1 d such symmetries. Since all instances vi , . . . , vi of a service are indistinguishable within the graph, it is possible to replace coor- dinates Ļ‡i1 , . . . , Ļ‡id of a characteristic vector Ļ‡ by a single value i id Ļ‡ + Ā· Ā· Ā·+ Ļ‡ without compromising the quality of the solution. 1 The location of the honeypots is typically assumed to be known to legitimate users, and the interaction with a honeypot can thus be considered as a reliable indication of an ongoing attack. Real- world computer networks rely also on intrusion detection systems (IDS) to discover potential malicious actions of the attackers. The detection of IDS is, however, not reliable. Another natural direction is thus to extend the algorithm (especially the Eqs. (15)) to account for unreliable detection represented by stochastic transitions in the game. 9. Conclusions Cybersecurity domains present a wide variety of challenging problems due to the changing nature of the threats and the complexity of the decisions. We focus on the problem of using honeypots to detect and mitigate the lateral movement of cyber attackers. While previous work has considered basic versions of this problem, we consider a much more realistic model where the defender has uncertainty about the resources that attackers may control, and there is a long-term dynamic interaction between the defender and the attacker. We present a formulation of this problem using partially observable stochastic games (POSGs). However, existing state of the art solution algorithms are not able to solve anything but the most trivial examples of this problem. We present two major advances in solving POSGs that dramat- ically improve our ability to solve Lateral Movement POSG. First, we introduce a novel compact representation of partial informa- tion in these games using a summarized abstraction of the belief state. Second, we describe how we can use incremental strategy- generation in combination with this compact representation to further improve the solution algorithm. We show experimentally that these improvements lead to a great improvement in scala- bility of the algorithms to solve the Lateral Movement POSG, and allow us to ļ¬nd near-optimal solution for problems with realistic network sizes. These techniques should also be applicable to many other dynamic games with similar problem structure. Declaration of Competing Interest The authors declare that they have no known competing ļ¬nan- cial interests or personal relationships that could have appeared to inļ¬‚uence the work reported in this paper. Acknowledgments This research was supported by the Czech Science Founda- tion (no. 19-24384Y), by the Ministry of Education, Youth and Sports (MEYS) funded project CZ.02.1.01/0.0/0.0/16 019/0000765 ā€œResearch Center for Informaticsā€, and by the U.S. Army Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and con- clusions contained in this document are those of the authors and should not be interpreted as representing the oļ¬ƒcial policies, ei- ther expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes not with standing any copyright notation here on. We thank Sridhar Venkatesan for providing the expertise on experiments with the business network. References Abuzainab, N., Saad, W., 2018. Dynamic connectivity game for adversarial internet of battleļ¬eld things systems. IEEE Internet Things J. 5 (1), 378ā€“390. Basu, A., Stettner, L., 2015. Finite- and inļ¬nite-Horizon shapley games with nonsym- metric partial observation. SIAM J. Control Optim. 53 (6), 3584ā€“3619. BoÅ”anskĆ½, B., Kiekintveld, C., LisĆ½, V., PeĖ‡choucĖ‡ek, M., 2014. An exact double-Oracle algorithm for zero-Sum extensive-Form games with imperfect information. J. Artiļ¬cial Intell. Res. 51, 829ā€“866. Cai, G.-l., Wang, B.-s., Hu, W., Wang, T.-z., 2016. Moving target defense: state of the art and characteristics. Front. Inf. Technol. Electron. Eng. 17 (11), 1122ā€“1153. Chatterjee, K., Doyen, L., 2014. Partial-observation stochastic games: ho w to win when belief fails. ACM Trans. Comput. Logic (TOCL) 15 (2). Du, M., Li, Y., Lu, Q., Wang, K., 2017. Bayesian game based pseudo Honeypot model in social Networks. In: International Conference on Cloud Computing and Secu- rity. Springer, pp. 62ā€“71. Durkota, K., LisĆ½, V., BoÅ”anskĆ½, B., Kiekintveld, C., 2015. Approximate solutions for attack graph games with imperfect information. In: International Conference on Decision and Game Theory for Security, pp. 228ā€“249. Garg, N., Grosu, D., 2007. Deception in honeynets: A game-theoretic analysis. In: 2007 IEEE SMC Information Assurance and Security Workshop. IEEE, pp. 107ā€“113. Ghosh, M.K., McDonald, D., Sinha, S., 2004. Zero-sum stochastic games with partial information. J. Optim. Theory Appl. 121 (1), 99ā€“118. Hansen, E.A., Bernstein, D.S., Zilberstein, S., 2004. Dynamic programming for par- tially observable stochastic games. In: National Conference on Artiļ¬cial Intelli- gence (AAAI), pp. 709ā€“715. HorĆ”k, K., BoÅ”anskĆ½, B., 2019. A point-based approximate algorithm for one-sided partially observable pursuit-evasion games. In: International Conference on De- cision and Game Theory for Security, pp. 435ā€“454. HorĆ”k, K., BoÅ”anskĆ½, B., 2019. Solving partially observable stochastic games with public observations. In: 33rd AAAI Conference on Artiļ¬cial Intelligence, pp. 2029ā€“2036. HorĆ”k, K., BoÅ”anskĆ½, B., PeĖ‡choucĖ‡ek, M., 2017. Heuristic search value iteration for one- sided partially observable stochastic games. In: 31st AAAI Conference on Artiļ¬cial Intelligence, pp. 558ā€“564. HorĆ”k, K., BoÅ”anskĆ½, B., Kiekintveld, C., Kamhoua, C., 2019. Compact representation of value function in partially observable stochastic games, 28th International Joint Conference on Artiļ¬cial Intelligence. In press. Jain, M., Conitzer, V., Tambe, M., 2013. Security scheduling for real-world networks. In: International Conference on Autonomous Agents and Multiagent Systems, pp. 215ā€“222. Jain, M., Korzhyk, D., VaneĖ‡k, O., Conitzer, V., Tambe, M., PeĖ‡choucĖ‡ek, M., 2011. Double Oracle algorithm for zero-sum security games on graph. In: 10th International Conference on Autonomous Agents and Multiagent Systems, pp. 327ā€“334. Jajodia, S., Ghosh, A.K., Subrahmanian, V., Swarup, V., Wang, C., Wang, X.S., 2012. Moving target defense II: application of game theory and adversarial modeling, 100. Springer. Kamdem, G., Kamhoua, C., Lu, Y., Shetty, S., Njilla, L., 2017. A Markov game theoritic approach for power grid security. In: 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW). IEEE, pp. 139ā€“144. Kiekintveld, C., LisĆ½, V., PĆ­bil, R., 2015. Game-theoretic foundations for the strategic use of honeypots in network security. In: Cyber Warfare. Springer, pp. 81ā€“101. Kumar, A., Zilberstein, S., 2009. Dynamic programming approximations for par- tially observable stochastic games. In: 22nd International FLAIRS Conference, pp. 547ā€“ 552. La, Q.D., Quek, T.Q., Lee, J., Jin, S., Zhu, H., 2016. Deceptive attack and defense game in honeypot-enabled networks for the internet of things. IEEE Internet Things J. 3 (6), 1025ā€“1035. Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Tuyls, K., PĆ©rolat, J., Sil- ver, D., Graepel, T., 2017. A uniļ¬ed game-theoretic approach to multiagent re- inforcement learning. In: Advances in Neural Information Processing Systems, pp. 4190ā€“4203. Li, X., Cheung, W.K., Liu, J., 2010. Improving POMDP tractability via belief compres- sion and clustering. IEEE Trans. Syst. Man. Cybern., Part B (Cybernetics) 40 (1), 125ā€“136. Mairh, A., Barik, D., Verma, K., Jena, D., 2011. Honeypot in network security: a sur- vey. In: Proceedings of the 2011 International Conference on Communication, Computing & security. ACM, pp. 600ā€“605. Mandiant Intelligence Center, 2013. APT1: Exposing one of Chinaā€™s cyber espionage units. Mandiant, Tech. Rep. https://www.ļ¬reeye.com/content/dam/ļ¬reeye-www/ services/pdfs/mandiant-apt1-report.pdf.
  • 15. K. HorĆ”k, B. BoÅ”anskĆ½ and P. TomĆ”Å”ek et al. / Computers & Security 87 (2019) 101579 15 McMahan, H.B., Gordon, G.J., Blum, A., 2003. Planning in the presence of Cost Func- tions Controlled by an Adversary. In: 20th International Conference on Machine Learning, pp. 536ā€“543. MoravcĖ‡Ć­k, M., Schmid, M., Burch, N., LisĆ½, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., Bowling, M., 2017. Deepstack: expert-level artiļ¬cial in- telligence in heads-up no-limit poker. Science 508ā€“513. Nawrocki, M., WƤhlisch, M., Schmidt, T.C., Keil, C., Schƶnfelder, J., 2016. A survey on honeypot software and data analysis. arXiv:1608.06249. Nguyen, T.H., Wellman, M.P., Singh, S., 2017. A stackelberg game model for botnet data exļ¬ltration. In: International Conference on Decision and Game Theory for Security, pp. 151ā€“170. Noureddine, M.A., Fawaz, A., Sanders, W.H., BasĀø ar, T., 2016. A game-theoretic ap- proach to respond to attacker lateral movement. In: International Conference on Decision and Game Theory for Security. Springer, pp. 294ā€“313. Oliehoek, F.A., Savani, R., Gallego-Posada, J., van der Pol, E., de Jong, E.D., Gross, R., 2017. GANGs: Generative adversarial network games. arXiv:1712.00679. Pawlick, J., Zhu, Q., 2015. Deception by design: evidence-based signaling games for network defense. arXiv:1503.05458. PĆ­bil, R., LisĆ½, V., Kiekintveld, C., BoÅ”anskĆ½, B., PeĖ‡choucĖ‡ek, M., 2012. Game theoretic model of strategic honeypot selection in computer networks. In: International Conference on Decision and Game Theory for Security, pp. 201ā€“220. Roy, N., Gordon, G., Thrun, S., 2005. Finding approximate POMDP solutions through belief compression. J. Artif. Intell. Res. 23, 1ā€“40. Smith, T., Simmons, R., 2004. Heuristic Search Value Iteration for POMDPs. In: 20th Conference on Uncertainty in Artiļ¬cial Intelligence, pp. 520ā€“527. Stoll, C., 1989. The Cuckooā€™s Egg: Tracking a Spy Through the Maze of Computer Espionage. Doubleday. VaneĖ‡k, O., Yin, Z., Jain, M., BoÅ”anskĆ½, B., Tambe, M., PeĖ‡choucĖ‡ek, M., 2012. Game-theo- retic Resource Allocation for Malicious Packet Detection in Computer Networks. In: 11th International Conference on Autonomous Agents and Multiagent Sys- tems, pp. 902ā€“915. Wang, K., Du, M., Maharjan, S., Sun, Y., 2017. Strategic honeypot game model for distributed denial of service attacks in the smart grid. IEEE Trans. Smart. Grid 8 (5), 2474ā€“2482. Wang, Y., Shi, Z.R., Yu, L., Wu, Y., Singh, R., Joppa, L., Fang, F., 2019. Deep reinforce- me nt learning for green security games with real-time information. In: 33rd AAAI Conference on Artiļ¬cial Intelligence, pp. 1401ā€“1408. Zhou, E., Fu, M.C., Marcus, S.I., 2010. Solving continuous-state POMDPs via densit y projection. IEEE Trans. Automat. Contr. 55 (5), 1101ā€“1116. Karel HorĆ”k is a postgraduate student at the Dept. of Computer Science of the Faculty of Electrical Engineering, at the Czech Technical University in Prague. He received his Masters degree in artiļ¬cial intelligence from Czech Technical University in Prague in 2015. Currently, he is interested in algorithms for solving inļ¬nite- horizon stochastic games with imperfect information and their application to the security. His works have been published at the AAAI and IJCAI conferences and other venues. Branislav BoÅ”anskĆ½ is an assistant professor at the Dept. of Computer Science of the Faculty of Electrical Engineering, at the Czech Technical University in Prague. He received his PhD in 2015 at the Faculty of Electrical Engineering, at the Czech Tech- nical University in Prague and spent one year as a postdoc researcher at the Aarhus University with prof. Peter Bro Miltersen. Branislavs research focuses on algorith- mic game theory and dynamic games. He is interested in computing approximate equilibria in dynamic games and designing scalable algorithms for using dynamic games in computer network security and robust machine learning methods used in an environment with an adversary. He has published 20+ papers and presented at the top AI conferences (AAAI, IJCAI, AAMAS) and top AI journals (AIJ, JAIR). Petr TomĆ”Å”ek is a postgraduate student at the Dept. of Computer Science of the Faculty of Electrical Engineering, at the Czech Technical University in Prague. He received his Masters degree in 2018 at the Faculty of Electrical Engineering, at the Czech Technical University in Prague. He is interested in game theory, especially in solving large extensive-form games. Christopher Kiekintveld is currently an associate professor at the University of Texas at El Paso (UTEP), and received his Ph.D in 2008 from the University of Michi- gan. His research is in the area of intelligent systems, focusing on multi-agent sys- tems and computational decision making, as well as applications in security and other areas with potential social beneļ¬t. He has worked on several deployed ap- plications of game theory for security, including systems in use by the Federal Air Marshals Service and Transportation Security Administration. He has authored more than 80 papers in peer-reviewed conferences and journals (e.g., AAMAS, IJCAI, AAAI, JAIR, JAAMAS, ECRA). He has received several best paper awards, the David Rist Prize, and an NSF CAREER award. Charles A. Kamhoua is a researcher at the Network Security Branch of the U.S. Army Research Laboratory (ARL) in Adelphi, MD, where he is responsible for con- ducting and directing basic research in the area of game theory applied to cyber security. Prior to joining the Army Research Laboratory, he was a researcher at the U.S. Air Force Research Laboratory (AFRL), Rome, New York for 6 years and an edu- cator in different academic institutions for more than 10 years. He has held visiting research positions at the University of Oxford and Harvard University. He has co- authored more than 150 peer-reviewed journal and conference papers. He received a B.S. in electronics from the University of Douala (ENSET), Cameroon, in 1999, an M.S. in Telecommunication and Networking from Florida International University (FIU) in 2008, and a Ph.D. in Electrical Engineering from FIU in 2011.