Journal of the Eastern Asia Society for Transportation Studies, Vol.11, 2015
A New Perspective of Traffic Assignment: A Game Theoretical Approach
Genaro PEQUE, Jr. a, Toshihiko MIYAGI b, Fumitaka KURAUCHI c
a,b,c Department of Civil Engineering, Gifu University, Gifu, 501-1193, Japan
a E-mail: gpequejr@gifu-u.ac.jp
b E-mail: t_miyagi@gifu-u.ac.jp
c E-mail: kurauchi@gifu-u.ac.jp
Abstract: Traditional equilibrium models consider transportation networks with well-defined
link travel time functions and continuous drivers. Recently, researchers focused on adding the
behavioral dimension lacking in traditional equilibrium models by treating drivers as individual
decision-makers (atomic drivers). However, there is currently no underpinning theory that
supports the shift from macroscopic to microscopic traffic assignment modeling.
In this paper, a game theoretical model which provides this link is presented. We will
show that this model describes drivers' adaptive behaviors as they perform day-to-day route
choices. Drivers acquire payoffs, with unknown noise, for their chosen and alternative routes.
This scenario describes a transportation network with the presence of a Traffic Management
Center (TMC).
Finally, a simulation-based dynamic traffic assignment simulation is carried out to
validate the model using the Simulation of Urban MObility (SUMO) open source software. The
simulation shows that Nash equilibrium can be achieved almost surely.
Keywords: Nash Equilibrium, Multi-agent Model, Stochastic Congestion Game
1. INTRODUCTION
Traditional equilibrium models have been widely used as a modeling tool in traffic assignment.
The governing solution concept in these models is the Wardrop equilibrium. A solution to a
traffic assignment problem is a situation in which travel demand and travel supply are consistent
with each other; traffic equilibria are mathematically described in terms of a fixed point
(Nonoyama and Miyagi, 1982; Miyagi et al., 1991) where the interaction of travel demand
and travel supply does not change the input or the outcome. This equilibrium is described by
either the user equilibrium (UE) or the stochastic user equilibrium (SUE).
A user equilibrium (UE) suggests that the flow on a route in a transportation network is
zero if the route has non-minimal cost (Wardrop, 1952). Hence, a UE is attained when all users
are on the routes with minimal costs. An analyst’s interpretation of a UE would be based on the
user’s perspective where a user can estimate the current best route in the transportation network.
This would imply that link travel time functions are common knowledge, route choices can be
observed and users calculate their best route choice based on this information (best-response).
However, assuming that users have the ability to calculate the current best route is highly
unrealistic and computationally expensive. An alternative approach is to relax these
assumptions and not require the best (optimal) route but rather to consider a user’s β€œperceived”
best route, caused by a user-specific random utility term, while maintaining the common
knowledge assumption (a user knows the distribution of her random utility as well). The process
requires the distribution of demand onto the routes based on the different route cost perceptions
of each user. Route flows fulfill some distribution and flows are shifted towards the desired
route-choice distribution. The shifting of flows happens gradually (iteratively) until
some stopping criterion is fulfilled, indicating that a fixed point has been reached. A stochastic
user equilibrium (SUE) is then obtained when all users take the route of perceived minimal cost
(Daganzo and Sheffi, 1977).
Recently, researchers have focused on the structure of real travel decisions, identifying it
as a major contributing factor in travel demand. Travel decisions are based on users' reactions
from their interactions with each other, which are not accounted for in traditional equilibrium
models. Implementations of the traditional equilibrium models such as UE and SUE focus on
a single representative of the population, which means that the users being studied are
homogeneous and thus behavior is invariable. Naturally, to account for real travel decisions,
different representatives from the demand population are required. This increases the level of
detail of the model which consequently increases the degree of heterogeneity of the
transportation network users. Additionally, the traditional equilibrium model treats a
user/traveler (henceforth, we will refer to a β€œuser” as β€œtraveler” to describe an individual
decision maker representing a single or group of users in the population with specific
characteristics) as a non-atomic particle (infinitely divisible). When the demand model accounts
for an increasing number of travelers, traffic assignment becomes computationally intractable
because of the combinatorial nature of all possible choices a single traveler encounters during
a single day and the non-atomic particle representation of each traveler (Nagel and Flotterod, 2012).
To overcome this, a traveler can be interpreted as an atomic particle (a discrete decision
maker or a single agent) representing an individual in the population with a different
characteristic. The demand population can now be represented by multiple decision-makers
(multi-agent model). Flow distributions can then be reinterpreted as choice distributions over
the demand drawn using Monte Carlo techniques which maintains its mathematical
interpretation. With the advancement of computing power, micro-traffic simulators [SUMO,
VISSIM, MATSim, TRANSIMS] are being widely adopted for this purpose. A multi-agent
model typically used in micro-traffic simulation samples travelers with different characteristics
from the population and simulates the travelers' interactions in the network. Traveler interaction
occurs during each iterative traffic assignment simulation until a stopping criterion is met.
Additionally, a traveler’s choice distribution is reinterpreted as random draws from her own
choice set (i.e. route set, a plan set, and activity chain set). Thus, an iterative solution procedure
in the traditional equilibrium models can be reinterpreted as a day-to-day learning behavioral
loop. An important aspect of traditional equilibrium models is the functional relationship
between link travel times and link flows, which is not carried over to micro-traffic simulation.
Instead, the cost-flow relationships merely serve as look-up tables (where link travel time
functions are implicitly assumed) rather than as functional relationships. Moreover, the main
advantage of using the traditional equilibrium traffic assignment models is the robustness of
their solution, the Wardrop equilibrium. Therefore, in order to overcome the limitations of the
traditional equilibrium models while preserving their solution concept, there is a need to
reinterpret (rather than change) it. We then turned to game theory in modeling traveler behavior
where we focus on the Nash equilibrium solution concept which consequently implies a
Wardrop equilibrium. For an extensive review on game theory’s development and application,
the readers are referred to Tadelis (2012) and Fudenberg and Levine (1998).
Miyagi and Peque (2012) proposed a game theoretical model which accounts for the
adaptive behavior of players (travelers) in a transportation network. In addition, the authors
defined three classes of players, a.) Partially informed-users (PIU) with anticipated payoffs, b.)
Partially informed-users with announced payoffs and c.) NaΓ―ve users (NU), as a consequence
of whether players’ user-specific random utility is known or unknown and whether players’
actions can be observed or cannot be observed in addition to the user-specific travel time
functions. Although the authors believe the stochastic congestion game model they proposed is
applicable to a dynamic traffic assignment setting, so far they have only validated it under a
static traffic assignment setting with PIUs with anticipated payoffs and with naΓ―ve users. In
contrast, this paper focuses on the PIU with announced payoffs and its close
relation to transportation networks with the presence of Traffic Management Centers (TMCs)
that β€œnowcast” travel times to all drivers in the transportation network to be used by all drivers
in making route choice decisions for the following day (day-to-day dynamics), a scenario
typical of a transportation network utilizing Intelligent Transportation Systems (ITS). To further
develop this model, we use the Simulation of Urban MObility (SUMO) software to validate it.
A clear motivation in building on this model is the need to develop comprehensive and
sophisticated traffic simulation procedures that include traffic flow simulation in which drivers’
decisions on route choices are interactively connected to the travel times generated by the traffic
simulation. Moreover, the convergence properties in dynamic route choice behavior based on
microscopic simulation are not yet fully established because travel times of the trips generated
by microscopic traffic simulation are not continuous and the expected values of the travel time
functions are not known in advance.
A similar case to the PIU with announced payoffs has been extensively studied in game
theory (Hart and Mas-Colell, 2000; Marden et al., 2009) and reinforcement learning (Borkar,
2008; Miyagi, 2005). In game theory, this is mostly in the better-reply variety of no-regret
algorithms. Hart and Mas-Colell's (2000) work focused on convergence to the set of
correlated equilibria using regret-matching, while Marden et al.'s (2009) work strengthened
the guarantees of regret-based learning in weakly acyclic games; they proved convergence to
Nash equilibrium almost surely. However, players' payoffs in these cases are unperturbed (no
additive random utility). On the other hand, reinforcement learning using stochastic
approximation (Robbins and Monro, 1951) was extensively studied by Borkar (2008) and was
applied by Miyagi (2005) to transportation, however, under the continuous player assumption.
Reinforcement learning is normally used when players’ payoffs are initially unknown and must
be estimated over time due to noisy observations (i.e. corrupted payoffs due to the unobserved
switches in actions by the other players, delay/inaccuracy of the information received, etc.).
We follow the reinforcement learning approach used by these authors (Leslie and Collins, 2003;
Leslie and Collins, 2005; Leslie and Collins, 2006; Cominetti et al., 2010; Chapman et al., 2013),
although they considered naΓ―ve users.
Our contribution in this paper is the application of the stochastic congestion game model
with PIU with announced payoffs proposed by the authors (Miyagi and Peque, 2012) to a
simulation-based dynamic traffic assignment simulation. In the simulation, we used a
generalised weakened fictitious play actor-critic algorithm (Leslie and Collins, 2006), proposed
for the naΓ―ve user case, in the PIU with announced payoffs case. However, we slightly modified
the temperature (dispersion or logit) parameter updating scheme by using a regret-based
updating scheme (Miyagi and Peque, 2012; Miyagi et al., 2013) wherein players' route choices
improve, based on their regret, as time progresses, which readily justifies the algorithm as
a model of learning. More importantly, our simulation results show that convergence to Nash
equilibrium is achieved almost surely.
The paper progresses as follows: in the next section, we introduce the notation,
definitions and concepts used in game theory and how they are applied to the traffic assignment
problem. In section 3, we introduce the stochastic congestion game model together with the
derivation of some of the updating formulations used in this paper. In section 4, we introduce
the generalised weakened fictitious play actor-critic learning model and its development.
In section 5, we present the simulation-based dynamic traffic assignment
simulation using the Simulation of Urban MObility (SUMO) software and show that players'
payoffs converge to Nash equilibrium almost surely. In section 6, we present our conclusions.
2. CONGESTION GAMES
In this section, we introduce a game, including its notation and some definitions, describing
the transportation network and its players; we then define the desired outcome of the
corresponding game.
2.1 Notations
Consider a game $\mathcal{G}$ described by the triple,

$\mathcal{G} = (\mathcal{I}, \{\mathcal{A}^i, u^i\}_{i\in\mathcal{I}})$. (2.1.1)

The sets $\mathcal{I} = \{1, \dots, i, \dots, I\}$, where $I = |\mathcal{I}|$, and $\mathcal{A}^i = \{a^i_1, \dots, a^i_k, \dots, a^i_N\}$, where $N = |\mathcal{A}^i|$, represent the set of players and the set of actions of each player $i$, respectively. We use the notation $a^{-i} \in \mathcal{A}^{-i}$ to represent the action taken by the opponent(s) of player $i$, $a^{-i} = (a^1, \dots, a^{i-1}, a^{i+1}, \dots, a^I)$, and the action set of her opponent(s), $\mathcal{A}^{-i} = \mathcal{A}^1 \times \dots \times \mathcal{A}^{i-1} \times \mathcal{A}^{i+1} \times \dots \times \mathcal{A}^I$. An action profile is a vector denoted by $a = (a^1, \dots, a^i, \dots, a^I) \in \mathcal{A} = \mathcal{A}^1 \times \dots \times \mathcal{A}^i \times \dots \times \mathcal{A}^I$. We use the conventional notation $a = (a^i, a^{-i})$ to represent an action profile when we want to explicitly show the action taken by player $i$ against the actions taken by her opponent(s), $-i$. In this analysis, these sets are assumed finite, non-empty, non-unitary and time-invariant. In the game $\mathcal{G}$, each player $i$ represents a driver in the transportation network choosing among her set of routes, represented by $\mathcal{A}^i$, from her origin to her destination. We sometimes use the terms driver, user, traveler and player interchangeably. The game $\mathcal{G}$ is played stage by stage as a repeated game. In a repeated game, each stage $t \in T = \{0,1,2,\dots\} \subseteq \mathbb{N}$ lasts until all the players have chosen an action $a^i_t$, denoted by $a_t = (a^1_t, \dots, a^i_t, \dots, a^I_t)$.
The payoff of each player $i$ in a one-shot game, $T = \{0\}$, is determined by the function $u^i: \mathcal{A} \to \mathbb{R}$. When the one-shot game is repeated finitely or infinitely often, $T = \{0,1,2,\dots\}$, each player $i \in \mathcal{I}$ observes a sample $U^i_t$, which is the player's payoff at stage $t$, expressed as $U^i_t = u^i(a^i_t, a^{-i}_t)$. Each player's action $a^i_t$ at stage $t$ is chosen according to a probability distribution, $\pi^i_t$, which we will refer to as the strategy of player $i$ at stage $t$. A player's strategy at stage $t$ relies only on her observations from stages $T = \{0,1,2,\dots,t-1\}$, which depend on the information restrictions assumed.
We define the empirical frequency of an action selected by player $i$ at stage $t$ as,

$z^i_t(a^i) = \frac{1}{t}\sum_{s=0}^{t-1}\mathbb{I}\{a^i_s = a^i\}$, (2.1.2)

where $\mathbb{I}\{\cdot\}$ is the indicator function, which takes the value 1 if the statement in the braces is true and 0 otherwise.
From the stage payoffs, each player can estimate her action values, denoted by,

$\bar{\mathcal{V}}^i_t(\tilde{a}^i) = \frac{1}{t}\sum_{s=0}^{t-1}\mathbb{I}\{a^{-i}_s = a^{-i}\}\,u^i(\tilde{a}^i, a^{-i}_s) = u^i(\tilde{a}^i, z^{-i}_t), \quad \forall \tilde{a}^i \in \mathcal{A}^i$. (2.1.3)
An average of the realized payoffs for player $i$ at stage $t$ can then be defined as,

$\bar{U}^i_t = \sum_{s=0}^{t-1} z^i_s\,u^i(a^i, z^{-i}_s) := u^i(z^i_t, z^{-i}_t)$, (2.1.4)
where $z^{-i}_t = (z^1_t, \dots, z^{i-1}_t, z^{i+1}_t, \dots, z^I_t)$. For now, let the empirical frequencies $z^i_t(a^i), \forall a^i \in \mathcal{A}^i$, of player $i$ denote the (empirical) mixed-strategy, $z^i_t(a^i) = \pi^i_t(a^i) \in \Delta(\mathcal{A}^i), \forall a^i \in \mathcal{A}^i$, of player $i$ at stage $t$. Consider a discrete-time process where the objective of each player is to maximize her expected payoff based on her mixed-strategy, denoted by,
maxπœ‹π“Šπ‘–
(πœ‹π‘‘
𝑖
, πœ‹π‘‘
βˆ’π‘–
) = lim
π‘‘β†’βˆž
π”Όπœ‹ [
1
𝑑
βˆ‘ 𝒰𝑠
𝑖
π‘‘βˆ’1
𝑠=0 ] = lim
π‘‘β†’βˆž
π”Όπœ‹ [𝒰
̅𝑑
𝑖
]. (2.1.5)
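To make the bookkeeping in equations (2.1.2) and (2.1.4) concrete, a minimal Python sketch of a single player's empirical-frequency and average-payoff tracking is given below; the class name and the incremental-average form are ours, not from the paper.

```python
import numpy as np

class EmpiricalTracker:
    """Tracks z_t^i (eq. 2.1.2) and the average realized payoff (eq. 2.1.4)."""

    def __init__(self, num_actions):
        self.counts = np.zeros(num_actions)  # how often each action was chosen
        self.t = 0                           # current stage
        self.avg_payoff = 0.0                # running average of realized payoffs

    def update(self, action, realized_payoff):
        self.t += 1
        self.counts[action] += 1
        # incremental form of (1/t) * sum_{s<t} U_s^i
        self.avg_payoff += (realized_payoff - self.avg_payoff) / self.t

    def empirical_frequency(self):
        # z_t^i(a^i) = (1/t) * sum_{s<t} I{a_s^i = a^i}
        return self.counts / self.t
```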
A player's strategy $\sigma^i \in \Sigma^i$ is the function $\sigma^i: \bar{\mathcal{V}}^i_t \to \Delta(\mathcal{A}^i)$, which induces the set of probability distributions, or mixed-strategies, at each stage, $\{\pi^i_t\}_{t>0}$, and $\Sigma^i$ is the set of all possible strategies of player $i$. Let $\Sigma = (\Sigma^1, \dots, \Sigma^i, \dots, \Sigma^I)$ be the set of all strategy profiles. Whenever the mixed-strategies at stage $t$, $\pi_t$, induce the same probability distributions, $\pi^i_t \in \Delta(\mathcal{A}^i), \forall a^i \in \mathcal{A}^i, i \in \mathcal{I}$, in the succeeding stages such that they maximize the players' payoffs and none of the players can obtain a performance improvement by unilaterally using another mixed-strategy, this is called a mixed-strategy Nash equilibrium. A mixed-strategy Nash equilibrium is formally defined as follows.
Definition 2.1.1. (Mixed-strategy Nash equilibrium). In the game $\mathcal{G}$, a strategy profile $\pi^* \in \Delta(\mathcal{A})$ is a mixed-strategy Nash equilibrium if, for all $i \in \mathcal{I}$ and for all $\pi^i \in \Delta(\mathcal{A}^i)$,

$u^i(\pi^{*i}, \pi^{*-i}) \geq u^i(\pi^i, \pi^{*-i})$. (2.1.6)
When every player assigns probability 1 to only one action, i.e. $\pi^i(a^i) = 1$, and the condition above is satisfied, we get a Nash equilibrium in pure strategies, which we formally define below.

Definition 2.1.2. (Pure-strategy Nash equilibrium). In the game $\mathcal{G}$, an action profile $a^* \in \mathcal{A}$ is a pure-strategy Nash equilibrium if, for all $i \in \mathcal{I}$ and for all $a^i \in \mathcal{A}^i$,

$u^i(a^{*i}, a^{*-i}) \geq u^i(a^i, a^{*-i})$. (2.1.7)
Nash equilibrium is one of the central solution concepts of game theory. Therefore, one
of the objectives of learning models is to study the kind of behavioral rules that lead to this
equilibrium as a consequence of the long-run, non-equilibrium process of learning.
2.2 Potential Games and Weakly Acyclic Games
We define the transportation network as a traffic game with atomic flow. A traffic game with
atomic flow was first proposed by Rosenthal (1973) and is known to be equivalent to a
(deterministic) congestion game. A congestion game is a special case of a potential game
(Monderer and Shapley, 1996), in which the incentive of all players to change their strategy can
be expressed using a single global function called the potential function, $\phi$. For now, we define
a potential game and its generalizations. A potential game is formally defined as follows.
Definition 2.2.1. (Potential games). A finite $I$-player game with action sets $\{\mathcal{A}^i\}_{i\in\mathcal{I}}$ and payoff functions $\{u^i\}_{i\in\mathcal{I}}$ is a potential game if, for all $i \in \mathcal{I}$, for all $a^{-i} \in \mathcal{A}^{-i}$, for all pairs $(a^i, \tilde{a}^i) \in \mathcal{A}^i \times \mathcal{A}^i$ and for some potential function $\phi: \mathcal{A} \to \mathbb{R}$,

$u^i(a^i, a^{-i}) - u^i(\tilde{a}^i, a^{-i}) = \phi(a^i, a^{-i}) - \phi(\tilde{a}^i, a^{-i})$. (2.2.1)
This means that each player's payoff function is aligned with the potential function. Additionally, potential games have the finite improvement property (FIP): any best- or better-response of a player to some action profile increases the potential function, and every best- or better-response path leads to a Nash equilibrium. Figure 2.2.1 below shows a game of three players with two actions each, $\mathcal{A}^i = \{0,1\}$, and two Nash equilibria (blue nodes). Each node represents an action profile (the actions chosen by the players), while each directed link represents an improvement of a player's payoff. The left figure shows an example of a potential game, where each sequence of directed links is an improvement path.
We define a general type of potential game where the players’ payoff function alignment
with the potential function is relaxed. It is defined as follows.
Figure 2.2.1. A potential game (left) and a weakly acyclic game (right)
Definition 2.2.2. (Generalized ordinal potential games). A finite $I$-player game with action sets $\{\mathcal{A}^i\}_{i\in\mathcal{I}}$ and payoff functions $\{u^i\}_{i\in\mathcal{I}}$ is a generalized ordinal potential game if, for all $i \in \mathcal{I}$, for all $a^{-i} \in \mathcal{A}^{-i}$, for all pairs $(a^i, \tilde{a}^i) \in \mathcal{A}^i \times \mathcal{A}^i$ and for some potential function $\phi: \mathcal{A} \to \mathbb{R}$,

$u^i(a^i, a^{-i}) - u^i(\tilde{a}^i, a^{-i}) > 0 \implies \phi(a^i, a^{-i}) - \phi(\tilde{a}^i, a^{-i}) > 0$. (2.2.2)
A generalized ordinal potential game also has the FIP.
A less restrictive class of games, more general than both potential and generalized ordinal potential games, and the one we use in this paper, is the class of weakly acyclic games. A weakly acyclic game requires only that at least one player's payoff function is aligned with the potential function. Before defining weakly acyclic games, we first define better- and best-response actions and strategies, formally as follows.
Definition 2.2.3. (Better-response). An action $a^i \in \mathcal{A}^i$ is a better-response of player $i$ to an action profile $(\tilde{a}^i, a^{-i})$ if $u^i(a^i, a^{-i}) > u^i(\tilde{a}^i, a^{-i})$. A mixed-strategy $\pi^i \in \Delta(\mathcal{A}^i)$ is a better-response of player $i$ to a strategy profile $(\tilde{\pi}^i, \pi^{-i})$ if $u^i(\pi^i, \pi^{-i}) > u^i(\tilde{\pi}^i, \pi^{-i})$.
Definition 2.2.4. (Best-response). An action $a^i \in \mathcal{A}^i$ is a best-response of player $i$ to an action profile $a^{-i} \in \mathcal{A}^{-i}$ of the other players if $a^i \in \arg\max_{\tilde{a}^i} u^i(\tilde{a}^i, a^{-i})$. A mixed-strategy $\pi^i \in \Delta(\mathcal{A}^i)$ is a best-response of player $i$ to a mixed-strategy profile $\pi^{-i} \in \Delta(\mathcal{A}^{-i})$ of the other players if $\pi^i \in \arg\max_{\tilde{\pi}^i} u^i(\tilde{\pi}^i, \pi^{-i})$.
A best-response strategy is normally used when unperturbed payoffs with complete information are assumed, where greedy algorithms can easily be applied. On the other hand, perturbed payoffs with incomplete information require a better-response strategy, as it relies on a player's beliefs (which may not be accurate) about her environment; these beliefs improve over time, getting closer to or becoming equal to a best-response as she gains experience.
We now formally define weakly acyclic games as follows.

Definition 2.2.5. (Weakly acyclic games). A finite $I$-player game with action sets $\{\mathcal{A}^i\}_{i\in\mathcal{I}}$ and payoff functions $\{u^i\}_{i\in\mathcal{I}}$ is a weakly acyclic game if there exists a potential function $\phi: \mathcal{A} \to \mathbb{R}$ with the following property: for any action profile $\tilde{a} = (\tilde{a}^i, a^{-i})$ that is not a Nash equilibrium, there exists a player $i \in \mathcal{I}$ with an action $a^i \in \mathcal{A}^i$ such that,

$u^i(a^i, a^{-i}) - u^i(\tilde{a}^i, a^{-i}) > 0$ and $\phi(a^i, a^{-i}) - \phi(\tilde{a}^i, a^{-i}) > 0$. (2.2.3)
The right figure in figure 2.2.1 shows a weakly acyclic game. The red directed links represent a loop in which at least one player's payoff function is aligned with the potential function. Weakly acyclic games are generalizations of the Cournot adjustment process of two firms (i.e. players), in which each period one firm chooses a pure strategy that is a best-response to the strategy of the other firm from the previous stage.
Weakly acyclic games do not necessarily have the finite improvement property, as shown above. The class was originally defined for better-responses but has recently also been defined for best-responses (Fabrikant et al., 2013).
2.3 Flows and Costs
We begin with the flow conservation equations in traffic games with atomic flow. For simplicity, we restrict our analysis to a transportation network with a single origin-destination (OD) pair connected by a set of paths, $\mathcal{K} = \{1, \dots, k, \dots, N\}$, each made up of a subset of the links $\ell \in \mathcal{L}$. We assume that the set of available paths is the same for all players in the transportation network and defines the players' action sets, i.e. $\{a^i_1(1), \dots, a^i_k(k), \dots, a^i_N(N)\} := \{1, \dots, k, \dots, N\}, \forall i \in \mathcal{I}$. To avoid confusion, we drop the path index $k$ in the notation $a^i_k(k)$, which means that we use $a^i$ and $k$ interchangeably to denote an action or a path selected by player $i$. Path flows are denoted by an $N$-dimensional vector $h = (h(1), \dots, h(k), \dots, h(N))$, where each element represents the number of players who chose path $k$, $h(k) = |\{i: a^i = k\}|$. Hence, $\sum_{k\in\mathcal{K}} h(k) = I$.
A visit to path $k$ by player $i$ at stage $t$ is expressed as,

$z^i_t(k) = \mathbb{I}\{a^i_t = k\}$. (2.3.1)

The aggregated path flows at an arbitrary stage $t$ are then defined as follows:

$\sum_{k\in\mathcal{A}^i} z^i_t(k) = 1$, (2.3.2)

$\sum_{i\in\mathcal{I}} z^i_t(k) = h_t(k)$. (2.3.3)
Let $\{\delta_\ell(k)\}_{\ell\in k}$ denote the elements of the link-path incidence matrix and $f_\ell$ be the flow on link $\ell$. We can then define the link flows as,

$\sum_{k\in\mathcal{K}} \delta_\ell(k)\,h_t(k) = f_{\ell,t}, \quad \forall \ell \in \mathcal{L}$. (2.3.4)

We also use the following notation on link flows,

$f^i_\ell = \sum_{k\in\mathcal{A}^i}\mathbb{I}\{\ell \in k\}, \quad \forall i \in \mathcal{I}$, (2.3.5)

$\sum_{i\in\mathcal{I}} f^i_\ell = f_\ell$. (2.3.6)
Congestion games are a specific class of games in which players' payoff functions have a special structure. Let $\mathcal{L} = \{\ell_1, \ell_2, \dots\}$ denote a finite set of links. For each link $\ell \in \mathcal{L}$, there is an associated congestion or travel time function,

$\tau_\ell: \{0,1,2,\dots\} \to \mathbb{R}$, (2.3.7)

which gives the travel time for β€œusing” link $\ell$ as a function of the number of players using that link.
The link travel time is given by a real-valued, non-decreasing, but not necessarily continuously differentiable function, $\tau_\ell(f_\ell)$. The cost of a path $k \in \mathcal{A}^i$ chosen by player $i$ at stage $t$ is defined as,

$c^i(k) = \sum_{\ell\in\mathcal{L}} \delta_\ell(k)\left(\gamma^i\tau_\ell + F_\ell\right)$, (2.3.8)

where $\gamma^i$ is the value of time of player $i$ and $F_\ell$ is the fare imposed on link $\ell$. We define the payoff that a player receives when she chooses a path $k \in \mathcal{A}^i$ as $u^i(k) = -c^i(k)$. Since a path flow depends on the link flows, which in turn depend on the number of discrete players, the payoff function is discontinuous.
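As a concrete illustration of equations (2.3.3), (2.3.4) and (2.3.8), the following Python sketch aggregates players' route choices into path and link flows and evaluates each path cost. The network, the BPR-style travel time function and all numeric values are our own placeholders, since the paper leaves $\tau_\ell$ generic.

```python
import numpy as np

# Hypothetical single-OD network with 3 paths over 4 links; the incidence
# matrix, free-flow times and capacities are illustrative, not from the paper.
delta = np.array([[1, 0, 0],    # delta_l(k) = 1 if link l lies on path k
                  [0, 1, 0],
                  [0, 0, 1],
                  [1, 1, 0]])
free_flow = np.array([5.0, 6.0, 7.0, 2.0])       # free-flow travel times (minutes)
capacity = np.array([400.0, 300.0, 500.0, 350.0])

def link_travel_times(f):
    # Placeholder BPR-style tau_l(f_l); the paper leaves tau_l generic (eq. 2.3.7).
    return free_flow * (1.0 + 0.15 * (f / capacity) ** 4)

def path_costs(actions, gamma=1.0, fares=None):
    """Path costs c^i(k) of eq. (2.3.8), given every player's chosen path index."""
    h = np.bincount(actions, minlength=delta.shape[1])   # path flows h(k), eq. (2.3.3)
    f = delta @ h                                        # link flows f_l, eq. (2.3.4)
    tau = link_travel_times(f)
    F = np.zeros_like(tau) if fares is None else fares
    return delta.T @ (gamma * tau + F)                   # payoffs are u^i(k) = -c^i(k)

actions = np.random.default_rng(0).integers(0, 3, size=1000)   # 1000 players
print(path_costs(actions))
```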
3. STOCHASTIC CONGESTION GAMES
3.1 Travel Information and Route Choice
Following Selten et al. (2004), Miyagi and Peque (2012) introduced different classes of players
defined by the player’s knowledge about the states of the routes on a traffic network: Partially
informed-users (PIUs) and NaΓ―ve users (NUs).
PIUs are further categorized into partially informed-users with announced payoffs and
partially informed-users with anticipated payoffs. The first type of players is assumed not to
know the structural form of their payoff functions, nor any information about the other players.
However, a Transportation Management Center (TMC) announces to all players, in hindsight,
the observed realized payoffs of the actions taken by all the players in the transportation
network. Additionally, payoffs of alternative actions not taken by the players are also
announced to all players. Therefore, each player can get the realized travel times in all of the
available routes between any O-D pair. On the other hand, for the second type of players, each
player knows the structural form of her own payoff function and is capable of observing the
actions of all the other players at every stage. However, she doesn't know the structural form of
the other players’ payoff functions. Each player can estimate the expected payoffs that she
would receive by taking other actions different from the action taken at stage 𝑑 through
exploration where the actions of the other players are held constant. Furthermore, each player
believes that the other players' action selections are based on empirical frequencies. NaΓ―ve users
are more realistic in the sense that the only information available to them is the realized travel
time of the selected route on that day.
In this paper, we restrict our attention to the equilibrium problem of traffic networks used by PIUs with
announced payoffs. The assumptions on the PIUs with announced payoffs follow the prevailing
assumptions used in traditional route choice models; however, we assume that the travel time
functions (or cost functions) are not common knowledge (the setting would be similar if we
assumed that players' true expected payoffs are the same). Furthermore, the PIU
with the announced payoffs can be regarded as a situation where a TMC observes traffic
volumes and vehicle average speeds on each link in the network through sensors allocated in
the system, and computes all possible paths during a specified time period for any origin-
destination pair in the network.
3.2 The Model
For now we set $\gamma^i = 1, \forall i \in \mathcal{I}$. We define the potential function in a (deterministic) congestion game, which we are trying to minimize, as,

$\min \phi(h) = \sum_{\ell\in\mathcal{L}}\sum_{m=0}^{f_\ell(h)} \tau_\ell(m)$. (3.2.1)
The traffic game was shown by Rosenthal (1973) to have at least one pure-strategy Nash
equilibrium.
Lemma 3.2.1. (Rosenthal, 1973). A game with a strictly increasing cost function with respect to $f_\ell$ of the form (2.3.8), with a potential function of the form (3.2.1), possesses at least one pure-strategy Nash equilibrium.
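As an illustration of the exact potential property (2.2.1) under Rosenthal's potential (3.2.1), the following sketch (reusing the hypothetical network and placeholder cost function above) compares a single player's cost change against the potential change under a unilateral route switch; the two printed differences coincide.

```python
def potential(actions):
    # phi(h) = sum_l sum_{m=0}^{f_l(h)} tau_l(m), eq. (3.2.1), with the placeholder tau.
    f = delta @ np.bincount(actions, minlength=delta.shape[1])
    return sum(free_flow[l] * (1.0 + 0.15 * (m / capacity[l]) ** 4)
               for l in range(len(f)) for m in range(int(f[l]) + 1))

a = actions.copy()
cost_before, phi_before = path_costs(a)[a[0]], potential(a)
a[0] = (a[0] + 1) % 3                       # player 0 unilaterally switches routes
cost_after, phi_after = path_costs(a)[a[0]], potential(a)
# Exact potential property (2.2.1): both differences are equal.
print(cost_after - cost_before, phi_after - phi_before)
```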
A (deterministic) congestion game is an exact potential game. The action $a^i$ is a best-response of player $i$ when,

$\sum_{\ell\in\tilde{a}^i} \tau_\ell(f^{-i}_\ell + 1) > \sum_{\ell\in a^i} \tau_\ell(f^{-i}_\ell + 1), \quad \forall \tilde{a}^i \in \mathcal{A}^i,\ \tilde{a}^i \neq a^i$, (3.2.2)
holds. Equation (3.2.2) expresses the exploration process in which each player can compare the payoffs (or costs) among alternative routes and judge which alternative is best. This exploration process might be automatically accomplished by a micro-processor equipped in a traveler's own car. Exploration implies that player $i$ knows the structural form of her payoff function, and hence the cost function, and can search the action values of her alternative actions provided that the actions of her opponents remain unchanged.
In (deterministic) congestion games, it is assumed that all players have complete and perfect knowledge of the game. Therefore, the realized payoff, which results from the best action (route) chosen by player $i$ at each stage, is always equal to the expected payoff, denoted as,

$U^i_t = u^i(a^i_t, a^{-i}_t)$. (3.2.3)
Equation (3.2.3) explicitly shows that player $i$ can get information about the other players' actions, $a^{-i}$. To prevent drivers from making deterministic decisions, a stochastic congestion game is used. In stochastic congestion games the payoff function is perturbed. It is assumed that the realized payoff consists of the expected payoff $u^i(a^i, a^{-i})$ and a player-specific random term $e^i_t(a^i_t)$. That is,

$U^i_t = u^i(a^i_t, a^{-i}_t) + e^i_t(a^i_t)$, (3.2.4)
where π“Šπ‘–
is the true expected payoff and 𝑒𝑑
𝑖
(𝒢𝑑
𝑖
) is a component of the player-specific and
time-dependent noise or private information, 𝑒𝑑
𝑖
= (𝑒𝑑
𝑖(1), … , 𝑒𝑑
𝑖(π‘˜), … , 𝑒𝑑
𝑖(π‘šπ‘‘)). It should be
noted that the realized payoff is defined when all the actions of players who are participants of
the game are observed. Each player believes that the action selection of the other players are
executed based on their mixed-strategies. Therefore, each player’s strategy is formulated as
follows:
$\tilde{\beta}^i(u^i) = \arg\max_{\pi^i\in \mathrm{int}(\Delta(\mathcal{A}^i))}\left[\sum_{a^i\in\mathcal{A}^i}\pi^i(a^i)\,u^i(a^i, \pi^{-i}) + \mu^i\psi^i(\pi^i)\right]$, (3.2.5)
where πœ‡π‘–
> 0 is a smoothing parameter and the function πœ“π‘–
(πœ‹π‘–
) is known only to player 𝑖
and is assumed to be a smooth, strictly differentiable concave function satisfying the boundary
condition that as πœ‹π‘–
approaches the boundary of the simplex, the slope of πœ“π‘–
becomes infinite.
Fudenberg and Levine (1998) assumed the following entropy function:
πœ“π‘–
(πœ‹π‘–) = βˆ’ βˆ‘ πœ‹π‘–(𝒢𝑖) log πœ‹π‘–(𝒢𝑖)
π’Άπ‘–βˆˆπ’œπ‘– . (3.2.6)
This formulation generates the so-called smooth best-response function:

$\tilde{\beta}^i(u^i)(a^i) = \frac{\exp\{u^i(a^i,\pi^{-i})/\mu^i\}}{\sum_{b^i\in\mathcal{A}^i}\exp\{u^i(b^i,\pi^{-i})/\mu^i\}} \in \Delta(\mathcal{A}^i)$. (3.2.7)
Equation (3.2.7) is a map from payoffs to choice probabilities; it is the standard choice probability function from the additive random utility model of discrete choice theory (McFadden, 1974), where the random utility $e^i$ is distributed according to the double exponential distribution. Miyagi (1983) showed a duality relation between the entropy function and the satisfaction function (or log-sum function) of the logit model using Fenchel's duality theorem, while Hofbauer and Sandholm (2002) used an analysis based on Legendre transforms. These imply that the log-sum function is the optimized function of the random utility model and gives a potential function for a stochastic congestion game. However, the duality holds if and only if the random utility $e^i_t(a^i_t)$ is specified by the double exponential distribution.
In our PIU with announced payoffs specification, the other players' actions cannot be observed and the distribution of the random utility is unknown. Additionally, all stage payoffs (i.e. the payoffs of chosen and alternative actions) are announced, denoted by,

$U^i_t = u^i(a^i_t) + e^i_t(a^i_t), \quad \forall a^i \in \mathcal{A}^i, \forall t$. (3.2.8)
Hence, payoffs must be estimated. A player tries to maximize her utility by choosing an action using the Boltzmann-Gibbs action selection procedure with a vanishing temperature parameter, $\mu^i_t$, for her strategy selection, given by the equation,

$\tilde{\beta}^i(\mathcal{Q}^i_t)(a^i) = \frac{\exp\{\mathcal{Q}^i_t(a^i)/\mu^i_t\}}{\sum_{b^i\in\mathcal{A}^i}\exp\{\mathcal{Q}^i_t(b^i)/\mu^i_t\}}, \quad a^i \in \mathcal{A}^i,\ i \in \mathcal{I}$, (3.2.9)
and $\mathcal{Q}$-learning, given by equation (3.2.10), for her payoff estimation,

$\mathcal{Q}^i_t(a^i) = \mathcal{Q}^i_{t-1}(a^i) + \lambda_t\left(U^i_t(a^i) - \mathcal{Q}^i_{t-1}(a^i)\right), \quad \forall a^i \in \mathcal{A}^i$. (3.2.10)
To show the equivalence of equations (3.2.7) and (3.2.9), the condition,

$\|\mathcal{Q}^i_t(a^i) - U^i_t(a^i, \pi^{-i})\| \to 0$ as $t \to \infty$ a.s. (3.2.11)

must hold.
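A minimal Python sketch of these two building blocks, the Boltzmann-Gibbs action selection (3.2.9) and the $\mathcal{Q}$-learning payoff update (3.2.10), is given below. The function names and the max-subtraction for numerical stability are ours; the announced payoff vector stands in for the TMC broadcast.

```python
import numpy as np

def boltzmann_gibbs(Q, mu):
    # Smooth best response of eq. (3.2.9); subtracting max(Q) avoids overflow
    # without changing the resulting probabilities.
    x = (Q - Q.max()) / mu
    p = np.exp(x)
    return p / p.sum()

def q_update(Q, announced, lam):
    # Eq. (3.2.10); with announced payoffs, ALL action values are updated each stage.
    return Q + lam * (announced - Q)

rng = np.random.default_rng(1)
Q = np.zeros(3)                       # action-value estimates for 3 routes
pi = boltzmann_gibbs(Q, mu=1.0)       # uniform at the start
action = rng.choice(3, p=pi)          # sample a route from the mixed strategy
```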
4. THE PARTIALLY INFORMED-USER ALGORITHM
In this section, we first introduce the generalised weakened fictitious play process and then proceed with the actor-critic learning algorithm for the day-to-day route choice of players in the stochastic congestion game with PIUs with announced payoffs.
4.1 The Generalised Weakened Fictitious Play
The generalised weakened fictitious play (GWFP) actor-critic algorithm was proposed by Leslie and Collins (2006) for the naΓ―ve user case, where they proved that, with probability 1, the mixed-strategies follow a GWFP process. The generalised weakened fictitious play itself was proposed as an extension of the weakened fictitious play of Van der Genugten (2000), normally considered for games in which players' actions can be observed, which is closely related to the PIU with anticipated payoffs. Additionally, the GWFP process considers a vanishing best-response perturbation as a mechanism for speeding up the convergence of fictitious play, which implies that strategies are also estimated according to a stochastic approximation process.
Before we formally define a GWFP process, let $b^i(\pi^{-i})$ be the best-response set of player $i$ to the mixed-strategy $\pi^{-i}$ and let,

$b^i_\varepsilon(\pi^{-i}) = \{\pi^i \in \Delta(\mathcal{A}^i): u^i(\pi^i, \pi^{-i}) \geq u^i(b^i(\pi^{-i}), \pi^{-i}) - \varepsilon\}$. (4.1.1)

That is, $b^i_\varepsilon(\pi^{-i})$ is the set of player $i$'s strategies that perform no more than $\varepsilon$ worse than her best-response. The joint $\varepsilon$-best-response to the mixed-strategy profile $\pi$ is defined as the set,

$b_\varepsilon(\pi) = \left(b^1_\varepsilon(\pi^{-1}), \dots, b^i_\varepsilon(\pi^{-i}), \dots, b^I_\varepsilon(\pi^{-I})\right)$. (4.1.2)
Definition 4.1.1. (GWFP process). A generalised weakened fictitious play process is any process $\{\pi_t\}_{t\geq1}$ such that,

$\pi_{t+1} \in (1 - \alpha_{t+1})\pi_t + \alpha_{t+1}\left(b_{\varepsilon_t}(\pi_t) + M_{t+1}\right)$, (4.1.3)

with $\alpha_t \to 0$ and $\varepsilon_t \to 0$ as $t \to \infty$, $\sum_{t\geq1}\alpha_t = \infty$, and $\{M_t\}_{t\geq1}$ a sequence of perturbations such that, for any $T > 0$,

$\lim_{t\to\infty}\sup_k\left\{\left\|\sum_{s=t}^{k-1}\alpha_{s+1}M_{s+1}\right\|: \sum_{s=t}^{k-1}\alpha_{s+1}\leq T\right\} = 0$.
In other words, the current strategies are adapted towards a (possibly perturbed) joint $\varepsilon$-best-response. Leslie and Collins (2006) showed that allowing non-zero $\varepsilon_t$, letting $\alpha_t$ be chosen differently, and allowing (certain) perturbations do not affect the convergence result.
Lemma 4.1.1. (Leslie and Collins, 2006). The set of limit points of a generalised weakened
fictitious play process is a connected-internally chain-recurrent set of the best response
differential inclusion.
They subsequently presented the following result.
Lemma 4.1.2. (Leslie and Collins, 2006). Any generalised weakened fictitious play
process will converge to the set of Nash equilibria in potential games.
4.2 The Boltzmann-Gibbs Actor-critic Algorithm
Actor-critic algorithms are normally used in cases where players only obtain payoffs for the actions they have chosen, which is our definition of the naΓ―ve user case. However, due to the complex nature of the simulation-based dynamic traffic assignment of PIUs with announced payoffs that we consider, we use one to estimate both the strategies and the payoffs of each player.
Definition 4.2.1. (Boltzmann-Gibbs actor-critic algorithm). A Boltzmann-Gibbs actor-critic algorithm is a process $\{\pi_t, \mathcal{Q}_t\}$ such that, $\forall a^i \in \mathcal{A}^i, \forall i \in \mathcal{I}$,

$\pi^i_t(a^i) = (1 - \alpha_t)\pi^i_{t-1}(a^i) + \alpha_t\beta^i_t(a^i)$,
$\mathcal{Q}^i_t(a^i) = \mathcal{Q}^i_{t-1}(a^i) + \lambda_t\left(U^i_t(a^i) - \mathcal{Q}^i_{t-1}(a^i)\right)$, (4.2.1)

where $\beta^i_t(\mathcal{Q}^i_t)$ is defined by equation (3.2.9) and the temperature parameter is updated according to the regret-based updating scheme,

$\mu^i_t = \mu^i_{t-1} + \frac{1}{t}\left(\max([R]_+, 0) - \mu^i_{t-1}\right)$, $[R]_+ = \max_{k\in\mathcal{A}^i}\left(\mathcal{Q}^i_t(k) - \bar{U}^i_t\right) > 0$. (4.2.2)
We present our result without proof.

Proposition 4.2.2. Suppose that $\{\pi_t, \mathcal{Q}_t\}$ is a Boltzmann-Gibbs actor-critic process for which,
1.) $\alpha_t = (C_\alpha + t)^{-\rho_\alpha}$, where $C_\alpha > 0$ and $\rho_\alpha \in\ ]0.5, 1]$, (4.2.3)
2.) $\lambda_t = (C_\lambda + t)^{-\rho_\lambda}$, where $C_\lambda > 0$ and $\rho_\lambda \in\ ]0.5, \rho_\alpha[$, (4.2.4)
3.) $\mu^i_t$ is calculated using equation (4.2.2).
Then, with probability 1, $\pi_t$ follows a generalised weakened fictitious play process.
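Putting Definition 4.2.1 and Proposition 4.2.2 together, one stage of the process for a single player might look as follows. This is a sketch under our own constant choices ($C_\alpha = C_\lambda = 1$, $\rho_\alpha = 0.9$, $\rho_\lambda = 0.6$); the announced payoff vector would come from the traffic simulation, and the small floor on $\mu$ is only a numerical guard, since the theory lets $\mu^i_t \to 0$.

```python
def actor_critic_stage(pi, Q, mu, avg_payoff, announced, action, t,
                       rho_alpha=0.9, rho_lambda=0.6):
    """One stage of the Boltzmann-Gibbs actor-critic process (Definition 4.2.1)."""
    alpha = (1.0 + t) ** -rho_alpha          # eq. (4.2.3), with C_alpha = 1 (our choice)
    lam = (1.0 + t) ** -rho_lambda           # eq. (4.2.4), with C_lambda = 1 (our choice)
    Q = Q + lam * (announced - Q)            # critic update, eq. (4.2.1)
    avg_payoff += (announced[action] - avg_payoff) / t   # running average U_bar_t
    regret = max(np.max(Q) - avg_payoff, 0.0)
    mu = mu + (regret - mu) / t              # regret-based temperature, eq. (4.2.2)
    pi = (1.0 - alpha) * pi + alpha * boltzmann_gibbs(Q, max(mu, 1e-8))  # actor update
    return pi, Q, mu, avg_payoff
```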
The regret-based temperature parameter updating scheme is used since it reduces the exogenous variables unknown to the model. A player's regret is directly connected to her strategy selection and payoffs, upon which an improving action selection policy should depend; this is more logical than the difference between the maximum and minimum estimates used by Leslie and Collins (2006) with the exogenous variable $\rho_\pi$,

$\mu^i_t = \frac{\max_{k\in\mathcal{A}^i}\mathcal{Q}^i_t(k) - \min_{k\in\mathcal{A}^i}\mathcal{Q}^i_t(k)}{\rho_\pi\log t}$. (4.2.5)
Additionally, since we are dealing with PIUs with announced payoffs, the action counts in the players' payoff learning rates used in Leslie and Collins (2006),

$\lambda_t = \left(C_\lambda + \#^i_t(a^i_t)\right)^{-\rho_\lambda}$, $\#^i_t(a^i_t) = \sum_t\mathbb{I}\{a^i_t = a^i\}$, (4.2.6)

are replaced by just the iteration count. The action counts acted as a kind of unbiased estimator (Leslie and Collins, 2005) compensating for the infrequent updates of action values with low probabilities: actions played infrequently do not receive updates of their values, so when they are played, any reward prediction error must have a greater influence on the value than if updates occurred frequently. However, in the PIU with announced payoffs scenario, all action values are updated at each stage, so there is no need for such an estimator.
Using the results of Singh et al. (2000) and Leslie and Collins (2006), the goal is to show that $\|\mathcal{Q}^i_t - U^i_t(\pi^i)\| \to 0$ and $\mu^i_t \to 0$ as $t \to \infty$. We can rewrite equation (4.2.2) as,

$\mu^i_t - \mu^i_{t-1}\left(1 - \frac{1}{t}\right) = \frac{\max\left[\max_{k\in\mathcal{A}^i}\left(\mathcal{Q}^i_t(k) - \bar{U}^i_t\right), 0\right]}{t}$. (4.2.7)

The right-hand side goes to zero almost surely as $t \to \infty$ if $\|\mathcal{Q}^i_t - U^i_t(\pi^i)\| \to 0$, which makes $\mu^i_t \to 0$. So we only need to show that $\|\mathcal{Q}^i_t - U^i_t(\pi^i)\| \to 0$ almost surely as $t \to \infty$, which we show through our simulation result.
5. SIMULATION RESULTS
We present a simulation of the transportation network shown in figure 5.1. We assume that each player has the same set of actions (the set of available routes), $\mathcal{A}^i = \{1,2,3\}, \forall i \in \mathcal{I}$. We assigned 1000 players to the network, which is composed of a single origin-destination (OD) pair with 3 routes, i.e. $I = 1000$. The flow conservation is described by the equation $I = \sum_{k\in\mathcal{K}} h(k)$, $\mathcal{K} = \{1,2,3\}$.
Figure 5.1. The test network
Table 5.1. Link segment settings

Link segment   Length        Maximum allowed speed   Number of lanes
1              500 meters    13.89 m/s               2
2              1005 meters   13.89 m/s               1
3              1005 meters   13.89 m/s               2
4              1005 meters   13.89 m/s               2
5              1005 meters   13.89 m/s               1
6              200 meters    13.89 m/s               1
7              500 meters    13.89 m/s               2
The simulation-based dynamic traffic assignment is carried out using the Simulation of
Urban MObility (SUMO) software. SUMO is a free and open traffic simulation suite which has
been available since 2001. SUMO allows modelling of intermodal traffic systems including
road vehicles, public transport and pedestrians.
In the simulation, players use equations (4.2.1)-(4.2.2) to update their route choices
and payoff estimates. There are 3600 simulation seconds per iteration, over 1000 iterations,
and players have player-specific, Poisson-distributed, dynamic departure times. We assume
that speed, flow and density are collected by sensors positioned throughout the links. Travel
times on all routes are announced to all the players in the network where for an unused route,
the free-flow travel time is announced.
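The day-to-day loop couples these learning updates to SUMO. A hedged sketch of this coupling through SUMO's TraCI Python API is shown below; the configuration file name, route IDs, `players` container and `collect_route_travel_times` helper are ours (hypothetical), and departure-time handling and sensor readout are simplified away.

```python
import numpy as np
import traci   # SUMO's TraCI Python API, bundled with SUMO

ROUTES = ["route1", "route2", "route3"]      # assumed to be defined in the .rou.xml
rng = np.random.default_rng(2)

for day in range(1000):                      # day-to-day iterations
    traci.start(["sumo", "-c", "network.sumocfg"])   # hypothetical config file
    for i, p in enumerate(players):          # players carry pi, Q, mu, avg (see above)
        p.action = rng.choice(len(ROUTES), p=p.pi)
        traci.vehicle.add(f"veh{i}", ROUTES[p.action])
    while traci.simulation.getMinExpectedNumber() > 0:   # run until all vehicles arrive
        traci.simulationStep()
    traci.close()
    announced = -collect_route_travel_times()    # hypothetical sensor readout; u = -c
    for p in players:
        p.pi, p.Q, p.mu, p.avg = actor_critic_stage(
            p.pi, p.Q, p.mu, p.avg, announced, p.action, t=day + 1)
```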
The intersection formed by links 4 and 6 is priority-based, with link 4 having the main priority. This means that vehicles traversing link 6 must wait for a gap in link 4 before they can enter link 5. This also occurs at the intersection between links 3 and 5, where link 3 has the priority. The legal speed limit on each link is 13.89 m/s. We assume that all vehicles accelerate at 0.8 m/sΒ² and decelerate at 4.5 m/sΒ². The maximum speed of a vehicle is assumed to be 70 m/s (the achievable speed of the vehicle's engine). Each player has an imperfection coefficient (sigma), a braking probability, which we set to 0.05. To ensure variable vehicle speeds at each time step, we set the speed deviation parameter to 0.1, which results in a speed distribution where 95% of the vehicles drive between 80% and 120% of the legal speed limit.
Figure 5.2. Fundamental diagram of the first iteration
Figure 5.2 shows the relationships among speed, flow and density, which compose the fundamental diagram of traffic flow used to predict the capability of a road system or its behavior when applying inflow regulation or speed limits. The upper-right panel shows the speed-density relationship with a negative linear slope, which means that as density increases, the speed on the link decreases. The intercept on the speed axis is the free-flow speed, while the intercept on the density axis is the jam density. The figure shows that the speed approaches the free-flow speed as the density approaches zero. As the density increases, the speed of the vehicles on the links decreases, reaching zero when the density equals the jam density. However, link 3 has a positive slope because the road segment is fed by a single lane (link 2) that transfers to a two-lane road segment, which distributes the incoming vehicles between the lanes and thus does not cause as much congestion as the other links in the network.
The flow-density relationship in the lower-right panel of figure 5.2 follows a triangular-shaped curve, which is approximated by a parabolic curve (inverted about the density axis in the figure). Normally, the flow-density graph is represented by two vectors representing the free-flow velocity (negative slope in the figure) and the congested branch (positive slope in the figure). The congested branch implies that even though there are more vehicles on the road, the number of vehicles passing a single point is less than if there were fewer vehicles on the road. Flows on links 3 and 7 are almost unaffected by the increase in density for two reasons: 1.) route 1, to which link 3 belongs, has priority at the intersection where links 3 and 5 meet, meaning that vehicles using link 3 do not stop to allow vehicles on link 5; and 2.) the flow of vehicles comes from link 2, a single-lane link, transferring to link 3, a dual-lane link.
The speed-flow diagram in the upper-left panel of figure 5.2 is used to determine the speed at which maximum flow occurs; it consists of the free-flow and congested branches. There is currently no function that approximates it; however, the linear approximations (read from left to right) show that the average speed decreases as the average flow decreases, implying that this is the congested branch of the speed-flow diagram. Additionally, the approximations for links 3 and 7 show that these links are almost at optimal flow, for the same reasons stated above.
Figure 5.3. Fundamental diagram of the last iteration
Comparing the fundamental diagrams of figures 5.2 and 5.3 shows dramatically that the players have learned to avoid long travel times. Link 1 is slightly congested because vehicles can change lanes and are inserted into the network randomly between the two lanes. This means that when a vehicle that is set to use route 3 is inserted in the upper lane, it needs to wait for a gap in the lower lane to be able to reach link 4. This random insertion causes slight congestion on this link. Looking at the speed-flow diagram in figure 5.3, link 5 has a negative slope, which is caused not by congestion on link 5 but by a long waiting time due to the priorities at the merging links. Link 5 belongs to route 3, which has a lower priority than link 3, which belongs to route 1. This causes vehicles using link 5 to wait for a gap in order to move to link 7. Lastly, no vehicle uses link 6, which belongs to route 2, as this route has a very high travel time: a vehicle using route 2 must wait for a gap at two intersections because of lower link priorities, first at the intersection between links 4 and 6, where link 4 has the higher priority, and again at the intersection between links 3 and 5, where link 3 has the higher priority.
Figure 5.4 below shows that vehicles immediately realize that route 1 is the best route choice, since link 3 has priority at the intersection of links 3 and 5. However, the fluctuations that can be observed in the mean route travel time plot in the middle are caused by vehicles switching from route 1 to routes 2 or 3. Vehicles with faster speeds are limited by the speed of the vehicles ahead of them on link 2, making the travel time on this specific iteration higher for this route. Therefore, the probabilities for this route decrease in the next iteration.
Figure 5.4. Link and route information
In figure 5.5 below, the strategy and payoff learning parameters, $\alpha$ and $\lambda$ respectively, are shown to slowly decrease to zero as time progresses, as required by our result. The temperature parameter, $\mu$, appears to decrease to zero, which is required to show convergence to a generalised weakened fictitious play process. More importantly, it shows that the temperature parameter is player-specific, which implies that players are learning and updating their strategies independently from each other, validating the multi-agent model; as this parameter decreases, the probability of choosing the best action increases. This validates the actor-critic algorithm as a learning model in which players' strategy selection improves with experience.
The top plot in figure 5.6 below shows the averaged route probabilities of the selected routes of all players. This plot is almost identical to the route counts plot (bottom of figure 5.4) because both represent the choice distributions of the players. The only difference is that the former shows the 'real' route probabilities of the players for selecting each action (i.e. mixed strategies), which makes it slightly lower than the route counts plot; if these were based on pure strategies (probability 1), the two would be exactly the same. The middle and bottom plots show that, as time progresses, the distance between the estimated payoff (-239.5752278) and the average payoff (-239.7410205) approaches zero, which is necessary to show equivalence to the case in which players can actually observe the other players' actions, and is a requirement for convergence to a generalised weakened fictitious play process. Significantly, it can be observed that even though the information the players receive is not very accurate (see the mean route travel time in the middle plot of figure 5.4), the estimates of the players' payoffs and strategies still converge. Furthermore, the simulation has been carried out more than 50 times and we get the same consistent result (approximately -239.5) within a reasonable number of iterations (i.e. after only 200 iterations with 1000 players).
Figure 5.5. Learning parameters
Figure 5.6. Link and route information
6. CONCLUSION
This paper further developed the stochastic congestion game model proposed by Miyagi and
Peque (2012) by applying it to a simulation-based dynamic traffic assignment simulation with
PIU with announced payoffs. Our motivation is the application of such a model to a
transportation network where a Traffic Management Center (TMC) is present which announces
travel times to all drivers. This scenario is typical in a transportation network where Intelligent
Transportation Systems (ITS) are utilized. Examples of such a scenario are the availability of
the Vehicle Information and Communication System (VICS) technology of Japan and the Traffic
Message Channel (TMC) technology of Europe.
The motivation for proposing the game theoretical model was the lack of behavioral realism inherent in traditional equilibrium models such as the UE and SUE. This led to the discretization of demand, where users are treated as individual decision-makers, making it a multi-agent model. Since the Wardrop equilibrium is applicable only to the case where players are non-atomic, the Nash equilibrium is used, which preserves its mathematical interpretation.
The authors (Miyagi and Peque, 2012) have shown that Nash equilibrium can be achieved
in two of the three classes of players they have defined, namely, the PIU with anticipated
payoffs (Miyagi and Peque, 2012) and naΓ―ve users (Miyagi et al., 2013). However, the results
were only shown for the static traffic assignment setting in both papers. Hence, to validate their
model, we tackled the case where players are PIU with announced payoffs under a simulation-
based dynamic traffic assignment simulation. Moreover, players have player-specific,
Poisson-distributed, dynamic departure times, making the problem highly complex. To solve this,
players in the transportation network are learning and updating their strategy and payoff
estimates using an actor-critic algorithm proposed by Leslie and Collins (2006) which we
slightly modified to fit the scenario. Regardless, we obtain the results they presented, which
consequently validates the efficacy of the game theoretical model. As an additional consequence,
we are able to analyze the evolution of the players' route choices and behaviors as they learn
how to use the transportation network. Finally, the simulation shows that
even when the players receive information with noise, convergence to Nash equilibrium is
achieved almost surely within a reasonable iteration interval.
Although the current simulation results are restrictive, they are significant. They show that the multi-agent model is capable of simultaneously including player-specific attributes in the traffic simulation and route choice models, and that it is likely to have the global convergence property inherent in the usual traffic environment.
Our next step is to apply our methods to a more sophisticated network (i.e. a larger
network, a network with traffic lights and loop detectors, etc.). Furthermore, we are interested
in extending the simulation to the naΓ―ve user setting. The naΓ―ve user setting closely resembles
the assumptions currently used in micro-traffic simulation models and is a more realistic and
plausible model of current transportation networks.
ACKNOWLEDGEMENT
This research is supported by MEXT Grants-in-Aid for Scientific Research, No. 26420511, for
the term 2014-2016.
REFERENCES
Borkar, V. (2008) Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge
University Press.
Chapman, A., Leslie, D., Rogers, A., Jennings, N. (2013) Convergent Learning Algorithms for
Unknown Reward Games. SIAM Journal on Control and Optimization, 51(4), 3154-3180.
Cominetti, R., Melo, E., Sorin, S. (2010) A payoff-based learning procedure and its application
to traffic games. Games and Economic Behavior, 70, pp.71-83.
Daganzo, C., Sheffi, Y. (1977) On stochastic models of traffic assignment. Transpn. Sci. 11,
253-274.
Fabrikant, A., Jaggard, A., Schapira, M. (2013) On the Structure of Weakly Acyclic Games.
Theory of Computing Systems, 53, 107-122.
Fudenberg, D., Levine, D. (1998) The Theory of Learning in Games. The MIT Press, Cambridge,
MA, USA.
Hart, S., Mas-Colell, A. (2000) A simple adaptive procedure leading to a correlated equilibrium.
Econometrica, 68:1127-1150.
Hofbauer, J., Sandholm, W. (2002) On the global convergence of stochastic fictitious play.
Econometrica 70, 2265-2294.
Leslie, D., Collins E. (2003) Convergent multiple-timescales reinforcement learning algorithm
in normal form games. Ann. Appl. Probab., 13, pp. 1231-1251.
Leslie, D., Collins E. (2005) Individual Q-learning in normal form games. SIAM J. Control
Optim, 44(2), pp. 495-514.
Leslie, D., Collins E. (2006) Generalised weakened fictitious play. Games and Economic
Behavior, 56:285–298.
Marden, J., Young, P., Arslan, G., Shamma, J. (2009) Payoff-based dynamics for multiplayer
weakly acyclic games. SIAM J. Control and Optimization, 48(1).
McFadden, D. (1974) Conditional logit analysis of qualitative choice-behavior. In Zarembka P,
(Ed.) Frontiers in econometrics, Academic Press, New York.
Miyagi, T. (1983) Dual approach to the modal equilibrium problem. Technical Report, No.83-
TE-MT3-8, Dept. of Civil Engineering, Gifu University.
Miyagi, T. (2005) Stochastic fictitious play, reinforcement learning and the user equilibrium in
transportation networks. A paper presented at the IVth meeting on "Mathematics in
Transport", University College London.
Miyagi, T., Peque, G. (2012) Informed user algorithms that converge to a pure Nash equilibrium
in traffic games. Procedia- Social and Behavioral Sciences, Volume 54, 4 October, pp. 438–
449.
Miyagi, T., Peque, G., Fukumoto, J. (2013) Adaptive Learning Algorithms for Traffic Games
with Naive Users. Procedia - Social and Behavioral Sciences, Volume 80, 7 June, Pages 806-
817.
Miyagi, T., Ohno, E., Morisugi, H. (1991) A fixed point algorithm for solving the traffic equilibria. Studies of Regional Sciences, No. 21, 229-246.
Monderer, D., Shapley, L. (1996) Potential games. Games and Economic Behavior, 14, 124-143.
Nagel, K., Flotterod, G. (2012) Agent-based traffic assignment: going from trips to behavioral travelers. In R. Pendyala and C. Bhat (Eds.), Travel Behaviour Research in an Evolving World. Emerald Group Publishing, Bingley, UK, 261-293.
Nonoyama, H., Miyagi, T. (1982) A fixed point approach to the supply-demand equilibrium
problem in traffic network. Proc. of Infrastructure Planning.
Robbins, H., Monro, S. (1951) A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400-407.
Rosenthal, R. (1973) A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory, 2, 65-67.
Selten, R., Schreckenberg, M., Chmura, T., Pitz, T., Kube, S., Hafstein, S., Chrobok, R., Pottmeier, A., Wahle, J. (2004) Experimental investigation of day-to-day route-choice behaviour and network simulations of autobahn traffic in North Rhine-Westphalia. In: Schreckenberg, M., Selten, R. (Eds.), Human Behaviour and Traffic Networks. Springer, Berlin Heidelberg, 1-21.
Singh, S., Jaakkola, T., Littman, M., Szepesvari, C. (2000) Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38, 287-308.
Tadelis, S. (2012) Game Theory: An Introduction. Princeton University Press, Princeton, NJ.
Van der Genugten, B. (2000) A weakened form of fictitious play in two-person zero-sum games. International Game Theory Review, 2, 307-328.
Wardrop, J. (1952) Some theoretical aspects of road traffic research. Proceedings of the Institution of Civil Engineers, Part II, 325-378.
  • 2. Journal of the Eastern Asia Society for Transportation Studies, Vol.11, 2015 489 route-choice distribution. The shifting of flows happen in a gradual manner (iteratively) until some stopping criterion is fulfilled indicating that a fixed point has been reached. A stochastic user equilibrium (SUE) is then obtained when all users take the route of perceived minimal cost (Daganzo and Sheffi, 1977). Recently, researchers have focused on the structure of real travel decisions identifying it as a major contributing factor in travel demand. Travel decisions are based on users’ reactions from their interaction with each other which are not accounted for in traditional equilibrium models. Implementation of the traditional equilibrium models such as UE and SUE focuses on a single representative of the population, which means that the users being studied are homogeneous and thus, behavior is invariable. Naturally, to account for real travel decisions, different representatives from the demand population are required. This increases the level of detail of the model which consequently increases the degree of heterogeneity of the transportation network users. Additionally, the traditional equilibrium model treats a user/traveler (henceforth, we will refer to a β€œuser” as β€œtraveler” to describe an individual decision maker representing a single or group of users in the population with specific characteristics) as a non-atomic particle (infinitely divisible). When the demand model accounts for the increase of travelers, because of the combinatorial nature of all possible choices a single traveler encounters during a single day and the non-atomic particle representation of each traveler, traffic assignment becomes computationally intractable (Nagel and Flotterod, 2012). To overcome this, a traveler can be interpreted as an atomic particle (a discrete decision maker or a single agent) representing an individual in the population with a different characteristic. The demand population can now be represented by multiple decision-makers (multi-agent model). Flow distributions can then be reinterpreted as choice distributions over the demand drawn using Monte Carlo techniques which maintains its mathematical interpretation. With the advancement of computing power, micro-traffic simulators [SUMO, VISSIM, MATSim, TRANSIMS] are being widely adopted for this purpose. A multi-agent model typically used in micro-traffic simulation sample travelers with different characteristics in the population and simulates the travelers’ interactions in the network. Traveler interaction occurs during each iterative traffic assignment simulation until a stopping criterion is met. Additionally, a traveler’s choice distribution is reinterpreted as random draws from her own choice set (i.e. route set, a plan set, and activity chain set). Thus, an iterative solution procedure in the traditional equilibrium models can be reinterpreted as a day-to-day learning behavioral loop. An important aspect in traditional equilibrium models is the functional relationship between link travel times and link flows which aren’t carried over to micro-traffic simulation. Instead, the cost-flow relationships merely serve as look-up tables (where link travel time functions are implicitly assumed) rather than as functional relationships. Moreover, the main advantage of using the traditional equilibrium traffic assignment models is the robustness of its solution, the Wardrop equilibrium. 
Therefore, in order to overcome the limitations of the traditional equilibrium models while preserving its solution concept, there is a need to reinterpret (rather than change) it. We then turned to game theory in modeling traveler behavior where we focus on the Nash equilibrium solution concept which consequently implies a Wardrop equilibrium. For an extensive review on game theory’s development and application, the readers are referred to Tadelis (2012) and Fudenberg and Levine (1998). Miyagi and Peque (2012) proposed a game theoretical model which accounts for the adaptive behavior of players (travelers) in a transportation network. In addition, the authors defined three classes of players, a.) Partially informed-users (PIU) with anticipated payoffs, b.) Partially informed-users with announced payoffs and c.) NaΓ―ve users (NU), as a consequence of whether players’ user-specific random utility is known or unknown and whether players’ actions can be observed or cannot be observed in addition to the user-specific travel time
  • 3. Journal of the Eastern Asia Society for Transportation Studies, Vol.11, 2015 490 functions. From the stochastic congestion game model the authors have proposed, even though they believe it is applicable to a dynamic traffic assignment setting, so far they have only validated their model under a static traffic assignment setting with PIU with anticipated payoffs and naΓ―ve users. In contrast, this paper focuses on the PIU with announced payoffs and its close relation to transportation networks with the presence of Traffic Management Centers (TMCs) that β€œnowcast” travel times to all drivers in the transportation network to be used by all drivers in making route choice decisions for the following day (day-to-day dynamics), a scenario typical of a transportation network utilizing Intelligent Transportation Systems (ITS). To further develop this model, we use the Simulation of Urban MObility (SUMO) software to validate it. A clear motivation in building on this model is the need to develop comprehensive and sophisticated traffic simulation procedures that include traffic flow simulation in which drivers’ decisions on route choices are interactively connected to the travel times generated by the traffic simulation. Moreover, the convergence properties in dynamic route choice behavior based on microscopic simulation are not yet fully established because travel times of the trips generated by microscopic traffic simulation are not continuous and the expected values of the travel time functions are not known in advance. A similar case to the PIU with announced payoffs has been extensively studied in game theory (Hart and Mas-Colell, 2000; Marden et al., 2009) and reinforcement learning (Borkar, 2008; Miyagi, 2005). In game theory, this is mostly in the better-reply variety of no-regret algorithms. Hart and Mas-Collel’s (2000) work focused on the convergence to the set of correlated equilibrium using regret-matching while Marden et al’s. (2009) work strengthened the guarantees of regret-based learning in weakly acyclic games. They proved convergence to Nash equilibrium almost surely. Although, players’ payoffs in these cases are unperturbed (no additive random utility). On the other hand, reinforcement learning using stochastic approximation (Robbins and Monro, 1951) was extensively studied by Borkar (2008) and was applied by Miyagi (2005) to transportation, however, under the continuous player assumption. Reinforcement learning is normally used when players’ payoffs are initially unknown and must be estimated over time due to noisy observations (i.e. corrupted payoffs due to the unobserved switches in actions by the other players, delay/inaccuracy of the information received, etc.). This has been used by the authors (Leslie and Collins, 2003; Leslie and Collins, 2005; Leslie and Collins, 2006; Cominetti et al., 2010; Chapman et al., 2013) we follow but they considered naΓ―ve users. Our contribution in this paper is the application of the stochastic congestion game model with PIU with announced payoffs proposed by the authors (Miyagi and Peque, 2012) to a simulation-based dynamic traffic assignment simulation. In the simulation, we used a generalised weakened fictitious play actor-critic algorithm (Leslie and Collins, 2006), proposed for the naΓ―ve user case, in the PIU with announced payoffs case. 
However, we slightly modified the temperature (dispersion or logit) parameter updating scheme by using a regret-based updating scheme (Miyagi and Peque, 2012; Miyagi et al., 2013) wherein players route choices are improving, based on their regret, as time progresses which readily justifies the algorithm as a model of learning. More importantly, our simulation results show that convergence to Nash equilibrium is achieved almost surely. The paper progresses as follows: In the next section, we introduce the notations, definitions and concepts used in game theory and how it is applied to the traffic assignment problem. In section 3, we introduce the stochastic congestion game model together with the derivation of some of the updating formulations we use in this paper. We introduce the generalised weakened fictitious play actor-critic learning model and its development and then present it in section 4. In section 5, we present the simulation-based dynamic traffic assignment simulation using the Simulation of Urban MObility (SUMO) software and show that players’
  • 4. Journal of the Eastern Asia Society for Transportation Studies, Vol.11, 2015 491 payoffs converge to Nash equilibrium almost surely. In section 6, we present our conclusions. 2. CONGESTION GAMES In this section, we introduce a game, including its notations and some definitions, describing the transportation network and its players, we then define the desired outcome of the corresponding game. 2.1 Notations Consider a game 𝒒 described by the triple, 𝒒 = (ℐ, {π’œπ‘– , 𝑒𝑖 }π‘–βˆˆβ„ ). (2.1.1) The sets ℐ = {1, … , 𝑖, … , 𝐼}, where 𝐼 = |ℐ| and π’œπ‘– = {𝒢1 𝑖 , … , π’Άπ‘˜ 𝑖 , … , 𝒢𝑁 𝑖 }, where 𝑁 = |π’œπ‘– |, represent the set of players and the set of actions of each player 𝑖, respectively. We use the notation π’Άβˆ’π‘– ∈ π’œβˆ’π‘– to represent the action taken by the opponent(s) of player 𝑖, π’Άβˆ’π‘– = (𝒢1 , … , π’Άπ‘–βˆ’1 , 𝒢𝑖+1 , … , 𝒢𝐼 )and the action set of her opponent(s), π’œβˆ’π‘– = π’œ1 Γ— β‹― Γ— π’œπ‘–βˆ’1 Γ— π’œπ‘–+1 Γ— β‹― Γ— π’œπΌ . An action profile is a vector denoted by 𝒢 = (𝒢1 , … , 𝒢𝑖 , … , 𝒢𝐼 ) ∈ π’œ = π’œ1 Γ— β‹― Γ— π’œπ‘– Γ— β‹― Γ— π’œπΌ . We use the conventional notation 𝒢 = (𝒢𝑖 , π’Άβˆ’π‘– ) to represent an action profile to explicitly show an action taken by player 𝑖 against the actions taken by her opponent(s), βˆ’π‘–. In this analysis, these sets are assumed finite, non-empty, non-unitary and time-invariant. In the game 𝒒, each player 𝑖 represents a driver in the transportation network choosing among her set of routes represented by π’œπ‘– from her origin to her destination. We sometimes interchangeably use the terms driver, user, traveler and player. The game 𝒒 is played stage by stage as a repeated game. In a repeated game, each stage 𝑑 ∈ 𝑇 = {0,1,2, … } βŠ† β„• lasts when all the players have chosen an action 𝒢𝑑 𝑖 denoted by 𝒢𝑑 = {𝒢𝑑 1 , … , 𝒢𝑑 𝑖 , … , 𝒢𝑑 𝐼 }. The payoff of each player 𝑖 in a one-shot game, 𝑇 = {0}, is determined by the function π“Šπ‘– : π’œ β†’ ℝ. When the one-shot game is repeated finitely or infinitely often, 𝑇 = {0,1,2, … }, each player 𝑖 ∈ ℐ observes a sample 𝒰𝑑 𝑖 which is the player’s payoff at stage 𝑑 expressed as 𝒰𝑑 𝑖 = π“Šπ‘– (𝒢𝑑 𝑖 , 𝒢𝑑 βˆ’π‘– ). Each player’s action 𝒢𝑑 𝑖 at stage 𝑑 is chosen according to a probability distribution, πœ‹π‘‘ 𝑖 , which we will refer to as the strategy of player 𝑖 at stage 𝑑. A player’s strategy at stage 𝑑 relies only on her observations from stages 𝑇 = {0,1,2, … , 𝑑 βˆ’ 1} which are dependent on the information restrictions assumed. We define the empirical frequency of an action selected by player 𝑖 at stage 𝑑 as, 𝓏𝑑 𝑖 (𝒢𝑖 ) = 1 𝑑 βˆ‘ 𝕀{𝒢𝑠 𝑖 = 𝒢𝑖 } π‘‘βˆ’1 𝑠=0 , (2.1.2) where 𝕀{β‹…} is the indicator function that takes the value of 1 if the statement in the parenthesis is true and 0 otherwise. From the stage payoffs, each player can estimate their action values denoted by, 𝒱 ̅𝑑 𝑖 (𝒢 ̃𝑖 ) = 1 𝑑 βˆ‘ 𝕀{𝒢𝑠 βˆ’π‘– = π’Άβˆ’π‘– }π“Šπ‘– (𝒢 ̃𝑖 , 𝒢𝑠 βˆ’π‘– ) = π‘‘βˆ’1 𝑠=0 π“Šπ‘– (𝒢 ̃𝑖 , 𝓏𝑠 βˆ’π‘– ), βˆ€π’Άπ‘– ∈ π’œπ‘– . (2.1.3) An average of the realized payoffs for player 𝑖 at stage 𝑑 can then be defined as, 𝒰 ̅𝑑 𝑖 = βˆ‘ 𝓏𝑠 𝑖 π“Šπ‘– (𝒢𝑖 , 𝓏𝑠 βˆ’π‘– ) π‘‘βˆ’1 𝑠=0 ≔ π“Šπ‘– (𝓏𝑑 𝑖 , 𝓏𝑑 βˆ’π‘– ), (2.1.4) where 𝓏𝑑 βˆ’π‘– = (𝓏𝑑 1 , … , 𝓏𝑑 π‘–βˆ’1 , 𝓏𝑑 𝑖+1 , … , 𝓏𝑑 𝐼 ). For now, let the empirical frequencies, 𝓏𝑑 𝑖 (𝒢𝑖 ), βˆ€π’Άπ‘– ∈ π’œπ‘– , of player 𝑖 denote the (empirical) mixed-strategy, 𝓏𝑑 𝑖 (𝒢𝑖 ) = πœ‹π‘‘ 𝑖 (𝒢𝑖 ) ∈ Ξ”(π’œπ‘– ), βˆ€π’Άπ‘– ∈ π’œπ‘– , of player 𝑖 at stage 𝑑. 
Consider a discrete-time process where the objective of each player is to maximize her expected payoff based on her mixed-strategy denoted by,
  • 5. Journal of the Eastern Asia Society for Transportation Studies, Vol.11, 2015 492 maxπœ‹π“Šπ‘– (πœ‹π‘‘ 𝑖 , πœ‹π‘‘ βˆ’π‘– ) = lim π‘‘β†’βˆž π”Όπœ‹ [ 1 𝑑 βˆ‘ 𝒰𝑠 𝑖 π‘‘βˆ’1 𝑠=0 ] = lim π‘‘β†’βˆž π”Όπœ‹ [𝒰 ̅𝑑 𝑖 ]. (2.1.5) A player’s strategy πœŽπ‘– ∈ 𝛴𝑖 is the function πœŽπ‘– : 𝒱 ̅𝑑 𝑖 β†’ Ξ”(π’œπ‘– ) which induces the set of probability distributions or mixed-strategies at each stage, {πœ‹π‘‘ 𝑖 }𝑑>0 and 𝛴𝑖 is the set of all possible strategies of player 𝑖. Let 𝛴 = (𝛴1 , … , 𝛴𝑖 , … , 𝛴𝐼 ) be the set of all strategy profiles. Whenever the mixed-strategies at stage 𝑑, πœ‹π‘‘, induces the same probability distributions, πœ‹π‘‘ 𝑖 ∈ Ξ”(π’œπ‘– ), βˆ€π’Άπ‘– ∈ π’œπ‘– , 𝑖 ∈ ℐ, in the succeeding stages such that it maximizes the players’ payoffs and that none of the players can obtain a performance improvement by unilaterally using another mixed-strategy, it is called a mixed-strategy Nash equilibrium. A mixed-strategy Nash equilibrium is formally defined as follows. Definition 2.1.1. (Mixed-strategy Nash equilibrium). In the game 𝒒, a strategy profile πœ‹βˆ— ∈ Ξ”(π’œπ‘– ) is a mixed-strategy Nash equilibrium if it satisfies for all 𝑖 ∈ ℐ and for all πœ‹π‘– ∈ Ξ”(π’œπ‘– ) such that π“Šπ‘–(πœ‹βˆ— 𝑖 ,πœ‹βˆ— βˆ’π‘–) β‰₯ π“Šπ‘–(πœ‹π‘–,πœ‹βˆ— βˆ’π‘–). (2.1.6) When all players assign a probability 1 to only one action, i.e. πœ‹π‘– (𝒢𝑖 ) = 1 and it satisfies the condition above, we get a Nash equilibrium in pure strategies which we formally define below. Definition 2.1.2. (Pure-strategy Nash equilibrium). In the game 𝒒, a strategy profile π’Άβˆ— ∈ π’œπ‘– is a pure-strategy Nash equilibrium if it satisfies for all 𝑖 ∈ ℐ and for all 𝒢𝑖 ∈ π’œπ‘– , that π“Šπ‘–(π’Άβˆ— 𝑖 ,π’Άβˆ— βˆ’π‘–) β‰₯ π“Šπ‘–(𝒢𝑖,π’Άβˆ— βˆ’π‘–). (2.1.7) Nash equilibrium is one of the central solution concepts of game theory. Therefore, one of the objectives of learning models is to study the kind of behavioral rules that lead to this equilibrium as a consequence of the long-run, non-equilibrium process of learning. 2.2 Potential Games and Weakly Acyclic Games We define the transportation network as a traffic game with atomic flow. A traffic game with atomic flow was first proposed by Rosenthal (1973) and is known to be equivalent to a (deterministic) congestion game. A congestion game is a special case of potential game (Monderer and Shapley, 1996) where the incentive of all players to change their strategy can be expressed using a single global function called the potential function, πœ™. For now, we define a potential game and its generalizations. A potential game is formally defined as follows. Definition 2.2.1. (Potential games). A finite 𝐼 βˆ’ player game with action sets {π’œπ‘– }π‘–βˆˆβ„ and payoff functions {π“Šπ‘– }π‘–βˆˆβ„ is a potential game if for all 𝑖 ∈ ℐ, for all π’Άβˆ’π‘– ∈ π’œβˆ’π‘– , for all pairs (𝒢𝑖 , 𝒢 ̃𝑖 ) ∈ π’œπ‘– Γ— π’œπ‘– and for some potential function πœ™: π’œ β†’ ℝ, π“Šπ‘– (𝒢𝑖,π’Άβˆ’π‘–) βˆ’ π“Šπ‘– (𝒢 ̃𝑖 ,π’Άβˆ’π‘–) = πœ™(𝒢𝑖,π’Άβˆ’π‘–) βˆ’ πœ™ (𝒢 ̃𝑖 ,π’Άβˆ’π‘–). (2.2.1) This means that each player’s payoff function is aligned with the potential function. Additionally, potential games have the finite improvement property (FIP) where any best or better-response of a player to some action profile increases the potential function and every path in the best or better-response leads to a Nash equilibrium. 
The figure 2.2.1 below shows a game of three players with two actions each, π’œπ‘– = {0,1}, and two Nash equilibria (blue nodes). A node represents the actions chosen by each player while the directed links represent an improvement of a player’s payoff. The left figure shows an example of a potential game where the nodes represent an action profile and each directed link represents an improvement path. We define a general type of potential game where the players’ payoff function alignment with the potential function is relaxed. It is defined as follows.
  • 6. Journal of the Eastern Asia Society for Transportation Studies, Vol.11, 2015 493 Figure 2.2.1. A potential game (left) and a weakly acyclic game (right) Definition 2.2.2. (Generalized ordinal potential games). A finite 𝐼 βˆ’ player game with action sets {π’œπ‘– }π‘–βˆˆβ„ and payoff functions {π“Šπ‘– }π‘–βˆˆβ„ is a generalized ordinal potential game if for all 𝑖 ∈ ℐ, for all π’Άβˆ’π‘– ∈ π’œβˆ’π‘– , for all pairs (𝒢𝑖 , 𝒢 ̃𝑖 ) ∈ π’œπ‘– Γ— π’œπ‘– and for some potential function πœ™: π’œ β†’ ℝ, π“Šπ‘– (𝒢𝑖,π’Άβˆ’π‘–) βˆ’ π“Šπ‘– (𝒢 ̃𝑖 ,π’Άβˆ’π‘–) > 0 ⟹ πœ™(𝒢𝑖,π’Άβˆ’π‘–) βˆ’ πœ™ (𝒢 ̃𝑖 ,π’Άβˆ’π‘–) > 0. (2.2.2) A generalized ordinal potential game also has the FIP. A less restrictive class of game which is more general than both the potential and generalized ordinal potential game which we use in this paper is called a weakly acyclic game. A weakly acyclic game requires only that at least one player’s payoff function is aligned with the potential function. Before defining weakly acyclic games, we first define a better and best- response action and strategy. This is formally defined as follows. Definition 2.2.3. (Better-response). An action 𝒢𝑖 ∈ π’œπ‘– is a better-response of player 𝑖 to an action profile (𝒢 ̃𝑖 ,π’Άβˆ’π‘–) if (𝒢𝑖,π’Άβˆ’π‘–) > (𝒢 ̃𝑖 ,π’Άβˆ’π‘–). A mixed-strategy πœ‹π‘– ∈ Ξ”(π’œπ‘– ) is a better-response of player i to a strategy profile (πœ‹ ̃𝑖 ,πœ‹βˆ’π‘–) if (πœ‹π‘–,πœ‹βˆ’π‘–) > (πœ‹ ̃𝑖 ,πœ‹βˆ’π‘–). Definition 2.2.4. (Best-response). An action 𝒢𝑖 ∈ π’œπ‘– is a best-response of player 𝑖 to an action profile π’Άβˆ’π‘– ∈ π’œβˆ’π‘– of the other players if 𝒢𝑖 ∈ argmax𝒢 Μƒπ‘–π“Šπ‘– (𝒢 ̃𝑖 , π’Άβˆ’π‘– ). A mixed- strategy πœ‹π‘– ∈ Ξ”(π’œπ‘– ) is a best-response of player 𝑖 to a mixed-strategy profile πœ‹βˆ’π‘– ∈ Ξ”(π’œβˆ’π‘– ) of the other players if πœ‹π‘– ∈ argmaxπœ‹ Μƒπ‘–π“Šπ‘– (πœ‹ ̃𝑖 , πœ‹βˆ’π‘– ). A best-response strategy is normally used when unperturbed payoffs with complete information is assumed where greedy algorithms can easily be applied. On the other hand, perturbed payoffs with incomplete information requires a better-response strategy as it relies on player’s beliefs (which may not be accurate) about her environment which improves over time, getting closer to or becoming equal to a best-response, as she gains experience. We now formally define weakly acyclic games as follows. Definition 2.2.5. (Weakly acyclic games). A finite 𝐼 βˆ’ player game with action sets {π’œπ‘– }π‘–βˆˆβ„ and payoff functions {π“Šπ‘– }π‘–βˆˆβ„ is a weakly acyclic game if there exist a potential function, πœ™: π’œ β†’ ℝ, with the following property: For any action profile 𝒢 that is not a Nash equilibrium, βˆƒπ‘– ∈ ℐ with an action 𝒢𝑖 ∈ π’œπ‘– for all π’Άβˆ’π‘– ∈ π’œβˆ’π‘– , for all pairs (𝒢𝑖 , 𝒢 ̃𝑖 ) ∈ π’œπ‘– Γ— π’œπ‘– such that, π“Šπ‘– (𝒢𝑖,π’Άβˆ’π‘–) βˆ’ π“Šπ‘– (𝒢 ̃𝑖 ,π’Άβˆ’π‘–) > 0 and πœ™(𝒢𝑖,π’Άβˆ’π‘–) βˆ’ πœ™ (𝒢 ̃𝑖 ,π’Άβˆ’π‘–) > 0. (2.2.3) The right figure in figure 2.2.1 shows a weakly acyclic game. The red directed links represent a loop where at least one of the player’s payoff function is aligned with the potential function. Weakly acyclic games are generalizations of the Cournot adjustment process of two firms (i.e. players). The Cournot adjustment assumes that in each period one firm chooses a pure strategy that is a best-response to the strategy of the other firm from the previous stage. 111 011 100 110 010 000 001 101 101 111 000 010 001 011 100 110
  • 7. Journal of the Eastern Asia Society for Transportation Studies, Vol.11, 2015 494 Weakly acyclic games doesn’t necessarily have the finite improvement property as shown above and it was originally defined for better-responses but has been recently also defined for best-responses (Fabrikant et al., 2013). 2.3 Flows and Costs We begin with the flow conservation equations in traffic games with atomic flow. For simplicity, we restrict our analysis to a transportation network with a single origin-destination (OD) pair connected by a set of paths, 𝒦 = {1, … , π‘˜, … , 𝑁}, made up of a subset of links, β„“ ∈ β„’. We assume that for all players in the transportation network, the set of available paths are the same and is defined to be the players’ action sets, i.e. {𝒢1 𝑖 (1),… , π’Άπ‘˜ 𝑖 (π‘˜), … , 𝒢𝑁 𝑖 (𝑁)} ≔ {1, . . , π‘˜, … , 𝑁}, βˆ€π‘– ∈ ℐ . To avoid confusion, we drop the path index π‘˜ in the notation, π’Άπ‘˜ 𝑖 (π‘˜), which means that we use 𝒢𝑖 and π‘˜ interchangeably to denote an action or a path selected by player 𝑖. Path flows are denoted by an 𝑁 βˆ’ dimensional vector β„Ž = (β„Ž(1), . . , β„Ž(π‘˜), … , β„Ž(𝑁)) where each element represents the number of players who chose the path π‘˜, β„Ž(π‘˜) = |{𝑖: 𝒢𝑖 }|. Hence, βˆ‘ β„Ž(π‘˜) = |𝐼| π‘˜βˆˆπ’¦ . A visit to path π‘˜ by player 𝑖 at stage 𝑑 is expressed as, 𝓏𝑑 𝑖(π‘˜) = 𝕀{𝒢𝑑 𝑖 = π‘˜}. (2.3.1) The aggregated path flows at an arbitrary stage 𝑑 are then defined as follows. βˆ‘ 𝓏𝑑 𝑖(π‘˜) = 1 π’Άπ‘–βˆˆπ’œπ‘– , (2.3.2) βˆ‘ 𝓏𝑑 𝑖(π‘˜) = β„Žπ‘‘(π‘˜) π‘–βˆˆβ„ . (2.3.3) Let {𝛿ℓ(π‘˜)}β„“βˆˆπ‘˜ denote elements in the link-path incidence matrix and 𝑓ℓ be the flow on the link β„“. We can then define the link flows as, βˆ‘ 𝛿ℓ(π‘˜) π‘˜βˆˆπ’¦ β„Žπ‘‘(π‘˜) = 𝑓ℓ,𝑑, βˆ€β„“ ∈ β„’. (2.3.4) We also use the following notation on link flows, 𝑓ℓ 𝑖 = βˆ‘ 𝕀{β„“ = π‘˜} π‘˜βˆˆπ’œπ‘– , βˆ€π‘– ∈ ℐ, (2.3.5) βˆ‘ 𝑓ℓ 𝑖 = 𝑓ℓ π‘–βˆˆβ„ . (2.3.6) Congestion games are a specific class of games in which players’ payoff functions have a special structure. Let β„’ = {β„“1,β„“2, … } denote a finite set of links. For each link β„“ ∈ β„’, there is an associated congestion or travel time function denoted by, πœβ„“: {0,1,2, … } β†’ ℝ, (2.3.7) which reflects the travel time for β€œusing” the link as a function of the number of players using that link, β„“. The link travel time is given by a real-valued, non-decreasing but not necessarily continuous differentiable function, πœβ„“(𝑓ℓ ). The cost of a path π‘˜ ∈ π’œπ‘– chosen by player 𝑖 at stage 𝑑 is defined as, 𝑐𝑖(π‘˜) = βˆ‘ 𝛿ℓ(π‘˜)(𝛾𝑖 πœβ„“ + 𝐹ℓ) β„“βˆˆβ„’ , (2.3.8) where 𝛾𝑖 is the value of time for player 𝑖 and 𝐹ℓ is the fare imposed on the link β„“. We define the payoff that a player receives when she chooses a path π‘˜ ∈ π’œπ‘– as π“Šπ‘–(π‘˜) = βˆ’π‘π‘–(π‘˜). Since a path flow is dependent on the link flows which are also dependent on the number of discrete players, the payoff function is discontinuous. 3. STOCHASTIC CONGESTION GAMES 3.1 Travel Information and Route Choice
3. STOCHASTIC CONGESTION GAMES

3.1 Travel Information and Route Choice

Following Selten et al. (2004), Miyagi and Peque (2012) introduced different classes of players, defined by each player's knowledge about the states of the routes in a traffic network: partially informed-users (PIUs) and naïve users (NUs). PIUs are further categorized into partially informed-users with announced payoffs and partially informed-users with anticipated payoffs.

The first type of players are assumed to know neither the structural form of their payoff functions nor any information about the other players. However, a Transportation Management Center (TMC) announces to all players, in hindsight, the observed realized payoffs of the actions taken by all the players in the transportation network. Payoffs of alternative actions not taken by the players are also announced to all players. Therefore, each player can obtain the realized travel times on all of the available routes between any OD pair.

For the second type of players, on the other hand, each player knows the structural form of her own payoff function and is capable of observing the actions of all the other players at every stage; however, she does not know the structural form of the other players' payoff functions. Each player can estimate the expected payoffs that she would receive by taking actions other than the action taken at stage $t$ through exploration, holding the actions of the other players constant. Furthermore, each player believes that the other players' action selections are based on empirical frequencies.

Naïve users are more realistic in the sense that the only information available to such a player is the realized travel time of the route she selected on that day.

In this paper, we restrict our attention to the equilibrium problem of traffic networks used by PIUs with announced payoffs. The assumptions on PIUs with announced payoffs follow the prevailing assumptions used in traditional route choice models, except that we do not assume the travel time functions (or cost functions) to be common knowledge (the setting would be similar if we assumed that players' true expected payoffs are the same). Furthermore, the PIU with announced payoffs setting can be regarded as a situation in which a TMC observes traffic volumes and average vehicle speeds on each link in the network through sensors allocated in the system, and computes all possible paths during a specified time period for any origin-destination pair in the network.

3.2 The Model

For now we set $\gamma^i = 1$ for all $i\in\mathcal{I}$. We define the potential function in a (deterministic) congestion game, which we are trying to minimize, as

$\min \phi(h) = \sum_{\ell\in\mathcal{L}} \sum_{m=0}^{f_\ell(h)} \tau_\ell(m).$ (3.2.1)

The traffic game was shown by Rosenthal (1973) to have at least one pure-strategy Nash equilibrium.

Lemma 3.2.1. (Rosenthal, 1973). A game with a cost function of the form (2.3.8), strictly increasing with respect to $f_\ell$, and a potential function of the form (3.2.1) possesses at least one pure-strategy Nash equilibrium.

A (deterministic) congestion game is an exact potential game. The action $a^i$ is a best-response of player $i$ when

$\sum_{\ell\in\tilde{a}^i} \tau_\ell(f^{-i}_\ell + 1) > \sum_{\ell\in a^i} \tau_\ell(f^{-i}_\ell + 1), \quad \forall \tilde{a}^i\in\mathcal{A}^i,\ \tilde{a}^i \ne a^i$ (3.2.2)

holds. Equation (3.2.2) expresses the exploration process in which each player can compare the payoffs (or costs) among alternative routes and judge which alternative is best.
This exploration process might be accomplished automatically by a microprocessor installed in a traveler's own car.
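As an illustration, the sketch below evaluates Rosenthal's potential (3.2.1) and performs the exploration step (3.2.2) for one player, reusing the illustrative `delta` matrix and `tau` function from the toy-network sketch in Section 2.3:

```python
def rosenthal_potential(f, tau):
    """Rosenthal's potential (3.2.1): for each link, accumulate the travel
    times tau_l(0) + ... + tau_l(f_l), then sum over links."""
    return sum(tau(m) for fl in f for m in range(int(fl) + 1))

def explore_best_path(i, choices, tau):
    """Exploration step of (3.2.2): holding all opponents fixed, price every
    path as if player i joined it, and return the cheapest one."""
    h = np.bincount(np.delete(choices, i), minlength=delta.shape[1])
    f_minus_i = delta @ h                        # f_l^{-i}: link flows without player i
    trial_costs = delta.T @ tau(f_minus_i + 1)   # cost of each path with player i added
    return int(np.argmin(trial_costs))
```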
Exploration implies that player $i$ knows the structural form of her payoff function, and hence of the cost function, and can search the action values of her alternative actions provided that the actions of her opponents remain unchanged. In (deterministic) congestion games, it is assumed that all players have complete and perfect knowledge of the game. Therefore, the realized payoff always equals the expected payoff resulting from the best action (route) chosen by player $i$ at each stage, denoted as

$\mathcal{U}^i_t = u^i(a^i_t, a^{-i}_t).$ (3.2.3)

Equation (3.2.3) explicitly shows that player $i$ can obtain information about the other players' actions, $a^{-i}$. To avoid players making purely deterministic decisions, a stochastic congestion game is used.

In stochastic congestion games the payoff function is perturbed. It is assumed that the realized payoff consists of the expected payoff $u^i(a^i,a^{-i})$ and a player-specific random term $e^i_t(a^i_t)$. That is,

$\mathcal{U}^i_t = u^i(a^i_t, a^{-i}_t) + e^i_t(a^i_t),$ (3.2.4)

where $u^i$ is the true expected payoff and $e^i_t(a^i_t)$ is a component of the player-specific and time-dependent noise, or private information, $e^i_t = (e^i_t(1),\dots,e^i_t(k),\dots,e^i_t(m_t))$. It should be noted that the realized payoff is defined once the actions of all players participating in the game are observed.

Each player believes that the other players select their actions according to their mixed strategies. Therefore, each player's strategy is formulated as follows:

$\tilde{\beta}^i(u^i) = \arg\max_{\pi^i\in \mathrm{int}(\Delta(\mathcal{A}^i))} \left[ \sum_{a^i\in\mathcal{A}^i} \pi^i(a^i)\, u^i(a^i,\pi^{-i}) + \mu^i \psi^i(\pi^i) \right],$ (3.2.5)

where $\mu^i > 0$ is a smoothing parameter and the function $\psi^i(\pi^i)$ is known only to player $i$ and is assumed to be a smooth, strictly concave function satisfying the boundary condition that the slope of $\psi^i$ becomes infinite as $\pi^i$ approaches the boundary of the simplex. Fudenberg and Levine (1998) assumed the following entropy function:

$\psi^i(\pi^i) = -\sum_{a^i\in\mathcal{A}^i} \pi^i(a^i) \log \pi^i(a^i).$ (3.2.6)

This formulation generates the so-called smooth best response function:

$\tilde{\beta}^i(u^i) = \frac{\exp\{u^i(a^i,\pi^{-i})/\mu^i\}}{\sum_{b^i\in\mathcal{A}^i} \exp\{u^i(b^i,\pi^{-i})/\mu^i\}} \in \Delta(\mathcal{A}^i).$ (3.2.7)

Equation (3.2.7) is a map from payoffs to choice probabilities; it is the standard choice probability function of the additive random utility model of discrete choice theory (McFadden, 1974), in which the random utility $e^i$ is distributed according to the double exponential distribution. Miyagi (1983) showed a duality relation between the entropy function and the satisfaction function (or log-sum function) of the logit model using Fenchel's duality theorem, while Hofbauer and Sandholm (2002) used an analysis based on Legendre transforms. These results imply that the log-sum function is the optimized function of the random utility model and gives a potential function for a stochastic congestion game. However, the duality holds if and only if the random utility $e^i_t(a^i_t)$ is specified by the double exponential distribution.
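The smooth best response (3.2.7) is a temperature-scaled softmax. A minimal sketch follows; subtracting the maximum payoff is a numerical-stability device of ours and does not change the result:

```python
import numpy as np

def smooth_best_response(u, mu):
    """Smooth best response (3.2.7): a logit/Boltzmann-Gibbs map from a
    payoff vector u to choice probabilities, with smoothing mu > 0."""
    w = np.exp((u - np.max(u)) / mu)
    return w / w.sum()
```

As $\mu^i \to 0$ this map concentrates on the best action(s); as $\mu^i \to \infty$ it approaches the uniform distribution.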
In our PIU with announced payoffs specification, other players' actions cannot be observed and the distribution of the random utility is unknown. Additionally, the stage payoffs of all actions (i.e., the chosen and alternative actions) are announced, which is denoted by

$\mathcal{U}^i_t(a^i) = u^i(a^i) + e^i_t(a^i), \quad \forall a^i\in\mathcal{A}^i, \ \forall t.$ (3.2.8)

Hence, payoffs must be estimated. A player tries to maximize her utility by choosing an action using the Boltzmann-Gibbs action selection procedure with a vanishing temperature parameter, $\mu^i_t$, for her strategy selection, given by
$\tilde{\beta}^i(\mathcal{Q}^i_t) = \frac{\exp\{\mathcal{Q}^i_t(a^i)/\mu^i_t\}}{\sum_{b^i\in\mathcal{A}^i} \exp\{\mathcal{Q}^i_t(b^i)/\mu^i_t\}}, \quad a^i\in\mathcal{A}^i,\ i\in\mathcal{I},$ (3.2.9)

and $\mathcal{Q}$-learning, given by equation (3.2.10), for her payoff estimation:

$\mathcal{Q}^i_t(a^i) = \mathcal{Q}^i_{t-1}(a^i) + \lambda_t \left( \mathcal{U}^i_t(a^i) - \mathcal{Q}^i_{t-1}(a^i) \right), \quad \forall a^i\in\mathcal{A}^i.$ (3.2.10)

To show the equivalence of equations (3.2.7) and (3.2.9), it is necessary for the condition

$\|\mathcal{Q}^i_t(a^i) - \mathcal{U}^i_t(a^i,\pi^{-i})\| \to 0 \quad \text{as } t\to\infty \text{ a.s.}$ (3.2.11)

to hold.

4. THE PARTIALLY INFORMED-USER ALGORITHM

In this section, we first introduce the generalised weakened fictitious play process and then proceed to the actor-critic learning algorithm for the day-to-day route choice of players in the stochastic congestion game with partially informed-users with announced payoffs.

4.1 The Generalised Weakened Fictitious Play

The generalised weakened fictitious play (GWFP) actor-critic algorithm was proposed by Leslie and Collins (2006) for the naïve user case, where they proved that with probability 1 the mixed strategies follow a GWFP process. The generalised weakened fictitious play itself was proposed as an extension of the weakened fictitious play of Van der Genugten (2000), which is normally considered for games in which players' actions can be observed and is thus closely related to the PIU with anticipated payoffs. Additionally, the GWFP process considers a vanishing best-response perturbation as a mechanism for speeding up the convergence of fictitious play, which implies that strategies are also estimated according to a stochastic approximation process.

Before we formally define a GWFP process, let $b^i(\pi^{-i})$ be the best-response set of player $i$ to the mixed strategy $\pi^{-i}$ and let

$b^i_\varepsilon(\pi^{-i}) = \{\pi^i\in\Delta(\mathcal{A}^i) : \mathcal{U}^i_t(\pi^i,\pi^{-i}) \ge \mathcal{U}^i_t(b^i(\pi^{-i}),\pi^{-i}) - \varepsilon\}.$ (4.1.1)

That is, $b^i_\varepsilon(\pi^{-i})$ is the set of player $i$'s strategies that perform no more than $\varepsilon$ worse than her best-response. The joint $\varepsilon$-best-response to the mixed-strategy profile $\pi$ is defined as the set

$b_\varepsilon(\pi) = b^1_\varepsilon(\pi^{-1}) \times \dots \times b^i_\varepsilon(\pi^{-i}) \times \dots \times b^I_\varepsilon(\pi^{-I}).$ (4.1.2)

Definition 4.1.1. (GWFP process). A generalised weakened fictitious play process is any process $\{\pi_t\}_{t\ge1}$ such that

$\pi_{t+1} \in (1-\alpha_{t+1})\pi_t + \alpha_{t+1}\left( b_{\varepsilon_t}(\pi_t) + M_{t+1} \right),$ (4.1.3)

with $\alpha_t\to 0$ and $\varepsilon_t\to 0$ as $t\to\infty$, $\sum_{t\ge1}\alpha_t = \infty$, and $\{M_t\}_{t\ge1}$ a sequence of perturbations such that, for any $T>0$,

$\lim_{t\to\infty} \sup_k \left\{ \left\| \sum_{s=t}^{k-1} \alpha_{s+1} M_{s+1} \right\| : \sum_{s=t}^{k-1} \alpha_{s+1} \le T \right\} = 0.$

In other words, the current strategies are adapted towards a (possibly perturbed) joint $\varepsilon$-best-response.
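To fix ideas, the sketch below states the membership test of (4.1.1) and one strategy update of (4.1.3) in code; representing the best-response set by a single sampled element and the perturbation by a scalar `M` are simplifications we introduce for illustration:

```python
import numpy as np

def in_epsilon_best_response(pi_i, expected_payoffs, eps):
    """Membership in b_eps^i (4.1.1): pi_i earns within eps of a best
    response, with opponents' mixed strategies held fixed.
    expected_payoffs[a] is player i's expected payoff of pure action a."""
    return float(pi_i @ expected_payoffs) >= expected_payoffs.max() - eps

def gwfp_step(pi, br_sample, alpha, M=0.0):
    """One GWFP update (4.1.3): mix the strategy profile toward a sampled
    (epsilon-)best response, with an optional perturbation M."""
    return (1.0 - alpha) * pi + alpha * (br_sample + M)
```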
Leslie and Collins (2006) showed that allowing non-zero $\varepsilon_t$, letting $\alpha_t$ be chosen differently, and allowing (certain) perturbations does not affect the convergence result.

Lemma 4.1.1. (Leslie and Collins, 2006). The set of limit points of a generalised weakened fictitious play process is a connected, internally chain-recurrent set of the best-response differential inclusion.

They subsequently presented the following result.

Lemma 4.1.2. (Leslie and Collins, 2006). Any generalised weakened fictitious play process will converge to the set of Nash equilibria in potential games.

4.2 The Boltzmann-Gibbs Actor-critic Algorithm

Actor-critic algorithms are normally used in cases where players only obtain payoffs for the actions they have chosen, which is our definition of the naïve user case. However, due to the complexity of the simulation-based dynamic traffic assignment with PIUs with announced payoffs considered here, we use an actor-critic algorithm to estimate both the strategies and the payoffs of each player.

Definition 4.2.1. (Boltzmann-Gibbs actor-critic algorithm). A Boltzmann-Gibbs actor-critic algorithm is a process $\{\pi_t, \mathcal{Q}_t\}$ such that

$\pi^i_t(a^i) = (1-\alpha_t)\,\pi^i_{t-1}(a^i) + \alpha_t\,\beta^i_t(a^i),$
$\mathcal{Q}^i_t(a^i) = \mathcal{Q}^i_{t-1}(a^i) + \lambda_t \left( \mathcal{U}^i_t(a^i) - \mathcal{Q}^i_{t-1}(a^i) \right), \quad \forall a^i\in\mathcal{A}^i,\ \forall i\in\mathcal{I},$ (4.2.1)

where $\beta^i_t(\mathcal{Q}^i_t)$ is defined by equation (3.2.9) and the temperature parameter is updated according to the regret-based scheme

$\mu^i_t = \mu^i_{t-1} + \frac{1}{t}\left( \max([R]^+, 0) - \mu^i_{t-1} \right), \quad [R]^+ = \max_{k\in\mathcal{A}^i}\left( \mathcal{Q}^i_t(k) - \bar{\mathcal{U}}^i_t \right),$ (4.2.2)

where $\bar{\mathcal{U}}^i_t$ denotes player $i$'s average realized payoff. We present our result without proof.

Proposition 4.2.2. Suppose that $\{\pi_t, \mathcal{Q}_t\}$ is a Boltzmann-Gibbs actor-critic process for which
1) $\alpha_t = (C_\alpha + t)^{-\rho_\alpha}$, where $C_\alpha > 0$ and $\rho_\alpha \in (0.5, 1]$, (4.2.3)
2) $\lambda_t = (C_\lambda + t)^{-\rho_\lambda}$, where $C_\lambda > 0$ and $\rho_\lambda \in (0.5, \rho_\alpha)$, (4.2.4)
3) $\mu^i_t$ is calculated using equation (4.2.2).
Then, with probability 1, $\pi_t$ follows a generalised weakened fictitious play process.

The regret-based temperature updating scheme is used because it reduces the number of exogenous variables unknown to the model. A player's regret is directly connected to her strategy selection and payoffs, on which an improving action selection policy should depend; this is more logical than the difference between the maximum and minimum estimates used by Leslie and Collins (2006), which requires the exogenous variable $\rho_\pi$:

$\mu^i_t = \frac{\max_{k\in\mathcal{A}^i} \mathcal{Q}^i_t(k) - \min_{k\in\mathcal{A}^i} \mathcal{Q}^i_t(k)}{\rho_\pi \log t}.$ (4.2.5)

Additionally, since we are dealing with PIUs with announced payoffs, the action counts in the payoff learning rates used by Leslie and Collins (2006),

$\lambda_t = \left( C_\lambda + \#^i_t(a^i_t) \right)^{-\rho_\lambda}, \quad \#^i_t(a^i_t) = \sum_t \mathbb{I}\{a^i_t = a^i\},$ (4.2.6)

are replaced by the iteration count. In the naïve user case, the action counts act as a kind of unbiased estimator (Leslie and Collins, 2005), compensating for the infrequent updates of action values with low selection probabilities: actions played infrequently do not receive updates of their values, so when they are played, the reward prediction error must influence the value more strongly than if updates occurred frequently. In the PIU with announced payoffs scenario, however, all action values are updated at each stage, so no such estimator is needed.

Using the results of Singh et al. (2000) and Leslie and Collins (2006), the goal is to show that $\|\mathcal{Q}^i_t - \mathcal{U}^i_t(\pi^i)\| \to 0$ and $\mu^i_t \to 0$ as $t\to\infty$. We can rewrite equation (4.2.2) as

$\mu^i_t - \mu^i_{t-1}\left(1 - \frac{1}{t}\right) = \frac{\max\left[ \max_{k\in\mathcal{A}^i}\left( \mathcal{Q}^i_t(k) - \bar{\mathcal{U}}^i_t \right), 0 \right]}{t}.$ (4.2.7)
The right-hand side goes to zero almost surely as $t\to\infty$ if $\|\mathcal{Q}^i_t - \mathcal{U}^i_t(\pi^i)\| \to 0$, which makes $\mu^i_t \to 0$. Hence we only need to show that $\|\mathcal{Q}^i_t - \mathcal{U}^i_t(\pi^i)\| \to 0$ almost surely as $t\to\infty$, which we show through our simulation results.
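A compact sketch of one stage of the process (4.2.1)-(4.2.4) for a single player follows, under assumptions we introduce for illustration: the payoffs of all routes are announced, the average payoff $\bar{\mathcal{U}}^i_t$ is approximated by the expected announced payoff under the current strategy, and all parameter values are illustrative rather than the authors' exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def actor_critic_step(t, pi, Q, mu, announced,
                      C_alpha=1.0, rho_alpha=0.9, C_lambda=1.0, rho_lambda=0.6):
    """One stage of the Boltzmann-Gibbs actor-critic process for one player.
    `announced` holds the announced payoffs U_t(a) for every route; t >= 1.
    A sketch, not the authors' exact implementation."""
    alpha_t = (C_alpha + t) ** (-rho_alpha)      # strategy learning rate (4.2.3)
    lambda_t = (C_lambda + t) ** (-rho_lambda)   # payoff learning rate (4.2.4)

    Q = Q + lambda_t * (announced - Q)           # critic: Q-learning update (3.2.10)

    z = (Q - Q.max()) / max(mu, 1e-12)           # Boltzmann-Gibbs selection (3.2.9),
    beta = np.exp(z) / np.exp(z).sum()           # shifted by max(Q) for stability
    pi = (1.0 - alpha_t) * pi + alpha_t * beta   # actor: strategy update (4.2.1)

    U_bar = float(pi @ announced)                # stand-in for the average payoff
    regret = max(float(np.max(Q - U_bar)), 0.0)  # [R]^+ of (4.2.2)
    mu = mu + (regret - mu) / t                  # regret-based temperature (4.2.2)

    action = int(rng.choice(len(pi), p=pi))      # sample the day's route
    return pi, Q, mu, action
```

Iterating this step for every player, with the announced payoffs recomputed from the traffic simulation each day, yields the process analyzed in Proposition 4.2.2.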
5. SIMULATION RESULTS

We present a simulation of the transportation network shown in figure 5.1. We assume that each player has the same action set, $\mathcal{A}^i = \{1,2,3\}$ for all $i\in\mathcal{I}$, i.e., the same set of available routes. We assigned 1,000 players to the network, which is composed of a single origin-destination (OD) pair connected by 3 routes; hence $I = 1000$. Flow conservation is described by $I = \sum_{k\in\mathcal{K}} h(k)$, $\mathcal{K} = \{1,2,3\}$.

Figure 5.1. The test network

Table 5.1. Link segment settings

Link segment   Length    Maximum allowed speed   Number of lanes
1              500 m     13.89 m/s               2
2              1005 m    13.89 m/s               1
3              1005 m    13.89 m/s               2
4              1005 m    13.89 m/s               2
5              1005 m    13.89 m/s               1
6              200 m     13.89 m/s               1
7              500 m     13.89 m/s               2

The simulation-based dynamic traffic assignment is carried out using the Simulation of Urban MObility (SUMO) software. SUMO is a free and open traffic simulation suite that has been available since 2001 and allows modelling of intermodal traffic systems including road vehicles, public transport and pedestrians.

In the simulation, players use equations (4.2.1)-(4.2.2) to update their route choices and payoff estimates. Each iteration comprises 3,600 simulation seconds, and the simulation is run for 1,000 iterations; players have player-specific, Poisson-distributed, dynamic departure times. We assume that speed, flow and density are collected by sensors positioned throughout the links.
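The day-to-day loop can be organized as sketched below. This is our own illustration of how the learning rule couples to the traffic simulator: `run_sumo_iteration` is a hypothetical wrapper (it could, for instance, be implemented with SUMO's TraCI interface) and is not part of SUMO's API, and `actor_critic_step` is the sketch from Section 4.2:

```python
import numpy as np

N_PLAYERS, N_ROUTES, N_ITER = 1000, 3, 1000
rng = np.random.default_rng(1)

pi = np.full((N_PLAYERS, N_ROUTES), 1.0 / N_ROUTES)  # mixed strategies pi_0
Q = np.zeros((N_PLAYERS, N_ROUTES))                  # payoff estimates Q_0
mu = np.ones(N_PLAYERS)                              # temperatures mu_0

def run_sumo_iteration(choices):
    """Hypothetical wrapper: write each player's chosen route and Poisson
    departure time to a SUMO route file, run one 3600-second iteration,
    and return the realized travel time of every route, reporting the
    free-flow travel time for unused routes."""
    raise NotImplementedError  # stands in for the actual SUMO run

for t in range(1, N_ITER + 1):
    choices = np.array([rng.choice(N_ROUTES, p=pi[i]) for i in range(N_PLAYERS)])
    announced = -run_sumo_iteration(choices)    # payoffs = -travel times
    for i in range(N_PLAYERS):                  # every player sees the same announcement
        pi[i], Q[i], mu[i], _ = actor_critic_step(t, pi[i], Q[i], mu[i], announced)
```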
Travel times on all routes are announced to all players in the network; for an unused route, the free-flow travel time is announced.

The intersection formed by links 4 and 6 is priority-based, with link 4 as the main priority. This means that vehicles traversing link 6 must wait for a gap in link 4 before they can enter link 5. The same occurs at the intersection between links 3 and 5, where link 3 has priority. The legal speed limit on each link is 13.89 m/s. We assume that all vehicles accelerate at 0.8 m/s² and decelerate at 4.5 m/s². The maximum speed of a vehicle is assumed to be 70 m/s (the achievable speed of the vehicle's engine). Each player has an imperfection coefficient (sigma), a braking probability, which we set to 0.05. To ensure variable vehicle speeds at each time step, we set the speed deviation parameter to 0.1, which results in a speed distribution in which 95% of the vehicles drive between 80% and 120% of the legal speed limit.

Figure 5.2. Fundamental diagram of the first iteration

Figure 5.2 shows the relationships among speed, flow and density, which compose the fundamental diagram of traffic flow used to predict the capability of a road system, or its behavior when applying inflow regulation or speed limits. The upper-right panel shows the speed-density relationship with a negative linear slope, meaning that as density increases, the speed on the link decreases. The line crosses the speed axis at the free-flow speed and the density axis at the jam density. The figure shows that speed approaches the free-flow speed as density approaches zero; as density increases, the speed of the vehicles on the links decreases, reaching zero when the density equals the jam density. However, link 3 has a positive slope because the road segment is fed by a single lane (link 2) that transfers to a two-lane segment, which distributes the incoming vehicles between its lanes and thus does not become as congested as the other links in the network.

The flow-density relationship in the lower-right panel of figure 5.2 follows a triangular-shaped curve, approximated by a parabolic curve, although here it is inverted along the density axis. Normally, the flow-density graph is represented by two vectors representing the free-flow velocity (the negative slope in the figure) and the congested branch (the positive slope in the figure). The congested branch implies that even though there are more vehicles on the road, the number of vehicles passing a single point is smaller than if there were fewer vehicles on the road.
Flow on links 3 and 7 is almost unaffected by the increase in density for two reasons: 1) route 1, to which link 3 belongs, has priority at the intersection where links 3 and 5 meet, so vehicles using link 3 do not stop to give way to vehicles on link 5; and 2) the flow of vehicles comes from link 2, a single-lane link, and transfers to link 3, a dual-lane link.

The speed-flow diagram in the upper-left panel of figure 5.2 is used to determine the speed at which maximum flow occurs and consists of the free-flow and congested branches. There is currently no function that approximates it; however, the linear approximations (read from left to right) show that the average speed decreases as the average flow decreases, implying operation on the congested branch of the speed-flow diagram. Additionally, the approximations for links 3 and 7 show that these links are almost at optimal flow, for the same reasons stated above.

Figure 5.3. Fundamental diagram of the last iteration

Comparing the fundamental diagrams of figures 5.2 and 5.3 shows dramatically that the players have learned to avoid long travel times. Link 1 is slightly congested because vehicles can change lanes and are inserted into the network randomly between the two lanes: when a vehicle assigned to route 3 is inserted in the upper lane, it must wait for a gap in the lower lane before it can move to link 4, and this random insertion causes slight congestion on the link. In the speed-flow diagram of figure 5.3, link 5 has a negative slope, which is caused not by congestion on link 5 but by long waiting times due to the priorities at the merging links: link 5 belongs to route 3, which has lower priority than link 3 on route 1, so vehicles using link 5 must wait for a gap before moving to link 7. Lastly, no vehicle uses link 6, which belongs to route 2, as this route has a very high travel time. A vehicle using route 2 encounters two intersections where it must wait for a gap before moving to the next link, both caused by lower link priorities: the intersection between links 4 and 6, where link 4 has higher priority, and again the intersection between links 3 and 5, where link 3 has higher priority.

Figure 5.4 below shows that vehicles quickly realize that route 1 is the best route choice, since link 3 has priority at the intersection of links 3 and 5. The fluctuations observable in the mean route travel time plot in the middle, however, are caused by vehicles switching from route 1 to routes 2 or 3. Vehicles with faster speeds are limited by the speed of
the vehicles ahead of them on link 2, making the travel time of this route higher in that particular iteration; consequently, the choice probabilities for this route decrease in the next iteration.

Figure 5.4. Link and route information

In figure 5.5 below, the strategy and payoff learning parameters, $\alpha$ and $\lambda$ respectively, are shown to decrease slowly to zero as time progresses, as required by our result. The temperature parameter $\mu$ also appears to decrease to zero, which is required to show convergence to a generalised weakened fictitious play process. More importantly, the figure shows that the temperature parameter is player-specific, which implies that players learn and update their strategies independently of each other, validating the multi-agent model; as this parameter decreases, the probability of choosing the best action increases. This validates the actor-critic algorithm as a learning model in which players' strategy selection improves with experience.

The top panel of figure 5.6 below shows the averaged route probabilities of the selected routes of all players. It closely resembles the route counts plot (bottom of figure 5.4) because both represent the players' choice distributions. The only difference is that the top panel shows the players' "real" route probabilities for selecting each action (i.e., their mixed strategies), which makes it slightly lower than the route counts plot; if the choices were based on pure strategies (probability 1), the two would be identical. The middle and bottom panels show that, as time progresses, the distance between the estimated payoffs (-239.5752278) and the average payoff (-239.7410205) approaches zero, which is necessary to establish equivalence to the case in which players can actually observe the other players' actions, and is required to show convergence to a generalised weakened fictitious play process. Significantly, even though the information the players receive is not very accurate (see the mean route travel times in the middle plot of figure 5.4), the estimates of the players' payoffs and strategies still converge. Furthermore, the simulation has been carried out more than 50 times and consistently yields the same result (approximately -239.5) within a reasonable number of iterations (i.e., after only 200 iterations with 1,000 players). A sketch of this convergence diagnostic is given below.
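The convergence check reported above can be computed as follows; this is our own illustrative formulation of the gap in (3.2.11), assuming `announced_history` stacks the announced payoff vectors of all iterations so far:

```python
import numpy as np

def convergence_gap(Q, announced_history):
    """Per-player gap between the payoff estimates Q (one row per player)
    and the running average of the announced payoffs, which should shrink
    toward zero as iterations accumulate (cf. eq. 3.2.11)."""
    avg_payoff = announced_history.mean(axis=0)   # average announced payoff per route
    return np.linalg.norm(Q - avg_payoff, axis=1) # one gap value per player
```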
Figure 5.5. Learning parameters

Figure 5.6. Link and route information

6. CONCLUSION

This paper further developed the stochastic congestion game model proposed by Miyagi and Peque (2012) by applying it to a simulation-based dynamic traffic assignment with PIUs with announced payoffs. Our motivation is the application of such a model to a transportation network in which a Traffic Management Center (TMC) is present and announces travel times to all drivers. This scenario is typical of transportation networks where Intelligent Transportation Systems (ITS) are utilized; examples are the Vehicle Information and Communication System (VICS) technology of Japan and the Traffic
Message Channel (TMC) technology of Europe.

The motivation for proposing the game theoretical model was the lack of behavioral realism inherent in traditional equilibrium models such as the UE and SUE. This led to the discretization of demand, in which users are treated as individual decision-makers, making it a multi-agent model. Since the Wardrop equilibrium applies only to the case where players are non-atomic, the Nash equilibrium is used, which preserves its mathematical interpretation.

The authors (Miyagi and Peque, 2012) have shown that Nash equilibrium can be achieved for two of the three classes of players they defined, namely, the PIU with anticipated payoffs (Miyagi and Peque, 2012) and naïve users (Miyagi et al., 2013). However, in both papers the results were shown only for the static traffic assignment setting. Hence, to validate their model, we tackled the case where players are PIUs with announced payoffs in a simulation-based dynamic traffic assignment. Moreover, players have player-specific, Poisson-distributed, dynamic departure times, making the problem highly complex. To solve this, players in the transportation network learn and update their strategy and payoff estimates using the actor-critic algorithm proposed by Leslie and Collins (2006), which we slightly modified to fit the scenario. Nevertheless, we obtain the results they presented, which consequently validates the efficacy of the game theoretical model. As an additional consequence, we are able to analyze the evolution of the players' route choices and behaviors as they learn how to use the transportation network. Finally, the simulation shows that even when players receive noisy information, convergence to Nash equilibrium is achieved almost surely within a reasonable number of iterations.

Although the current simulation results are restrictive, they are significant. They show that the multi-agent model is capable of including player-specific attributes in the traffic simulation and the route choice model simultaneously, and that it is likely to have the global convergence property inherent in the usual traffic environment. Our next step is to apply our methods to a more sophisticated network (i.e., a larger network, a network with traffic lights and loop detectors, etc.). Furthermore, we are interested in extending the simulation to the naïve user setting, which closely resembles the assumptions currently used in micro-traffic simulation models and is a more realistic and plausible model of current transportation networks.

ACKNOWLEDGEMENT

This research is supported by MEXT Grants-in-Aid for Scientific Research, No. 26420511, for the term 2014-2016.

REFERENCES

Borkar, V. (2008) Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press.
Chapman, A., Leslie, D., Rogers, A., Jennings, N. (2013) Convergent learning algorithms for unknown reward games. SIAM Journal on Control and Optimization, 51(4), 3154-3180.
Cominetti, R., Melo, E., Sorin, S. (2010) A payoff-based learning procedure and its application to traffic games. Games and Economic Behavior, 70, 71-83.
Daganzo, C., Sheffi, Y. (1977) On stochastic models of traffic assignment. Transportation Science, 11, 253-274.
Fabrikant, A., Jaggard, A., Schapira, M. (2013) On the structure of weakly acyclic games. Theory of Computing Systems, 53, 107-122.
Fudenberg, D., Levine, D. (1998) The Theory of Learning in Games. The MIT Press, Cambridge, MA, USA.
Hart, S., Mas-Colell, A. (2000) A simple adaptive procedure leading to a correlated equilibrium. Econometrica, 68, 1127-1150.
Hofbauer, J., Sandholm, W. (2002) On the global convergence of stochastic fictitious play. Econometrica, 70, 2265-2294.
Leslie, D., Collins, E. (2003) Convergent multiple-timescales reinforcement learning algorithms in normal form games. Annals of Applied Probability, 13, 1231-1251.
Leslie, D., Collins, E. (2005) Individual Q-learning in normal form games. SIAM Journal on Control and Optimization, 44(2), 495-514.
Leslie, D., Collins, E. (2006) Generalised weakened fictitious play. Games and Economic Behavior, 56, 285-298.
Marden, J., Young, P., Arslan, G., Shamma, J. (2009) Payoff-based dynamics for multiplayer weakly acyclic games. SIAM Journal on Control and Optimization, 48(1).
McFadden, D. (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (Ed.) Frontiers in Econometrics, Academic Press, New York.
Miyagi, T. (1983) Dual approach to the modal equilibrium problem. Technical Report, No.83-TE-MT3-8, Dept. of Civil Engineering, Gifu University.
Miyagi, T. (2005) Stochastic fictitious play, reinforcement learning and the user equilibrium in transportation networks. Paper presented at the IVth meeting on "Mathematics in Transport", University College London.
Miyagi, T., Peque, G. (2012) Informed user algorithms that converge to a pure Nash equilibrium in traffic games. Procedia - Social and Behavioral Sciences, 54, 438-449.
Miyagi, T., Peque, G., Fukumoto, J. (2013) Adaptive learning algorithms for traffic games with naive users. Procedia - Social and Behavioral Sciences, 80, 806-817.
Miyagi, T., Ohno, E., Morisugi, H. (1991) A fixed point algorithm for solving the traffic equilibria. Studies of Regional Sciences, 21, 229-246.
Monderer, D., Shapley, L. (1996) Potential games. Games and Economic Behavior, 14, 124-143.
Nagel, K., Flotterod, G. (2012) Agent-based traffic assignment: going from trips to behavioral travelers. In: Pendyala, R., Bhat, C. (Eds.) Travel Behaviour Research in an Evolving World, Emerald Group Publishing, Bingley, UK, 261-293.
Nonoyama, H., Miyagi, T. (1982) A fixed point approach to the supply-demand equilibrium problem in traffic networks. Proc. of Infrastructure Planning.
Robbins, H., Monro, S. (1951) A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400.
Rosenthal, R. (1973) A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory, 2, 65-67.
Selten, R., Schreckenberg, M., Chmura, T., Pitz, T., Kube, S., Hafstein, S., Chrobok, R., Pottmeier, A., Wahle, J. (2004) Experimental investigation of day-to-day route-choice behaviour and network simulations of autobahn traffic in North Rhine-Westphalia. In: Schreckenberg, M., Selten, R. (Eds.) Human Behaviour and Traffic Networks. Springer, Berlin Heidelberg, 1-21.
Singh, S., Jaakkola, T., Littman, M., Szepesvari, C. (2000) Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38, 287-308.
Tadelis, S. (2012) Game Theory: An Introduction. Princeton University Press.
Van der Genugten, B. (2000) A weakened form of fictitious play in two-person zero-sum games. International Game Theory Review, 2, 307-328.
Wardrop, J. (1952) Some theoretical aspects of road traffic research. In: Proceedings of the Institute of Civil Engineers, Part II, 325-378.