Abstract
In this thesis we consider the ‘max-plus’ algebra; that is, the set Rmax = R ∪ {−∞}
endowed with the operations a ⊕ b = max{a, b} and a ⊗ b = a + b. It is shown
that Rmax has the structure of a semiring with several additional useful properties.
We introduce the idea of matrices over the max-plus semiring and develop max-plus
variants of several familiar concepts from classical linear algebra; most notably the
theory of eigenvalues and eigenvectors. In Chapter 2 we introduce the theory of
event graphs which are used to model dynamical systems which admit a degree of
synchronisation such as rail networks or automated manufacturing processes. We use
the theory of max-plus algebra developed in Chapter 1 to derive results concerning
the time evolution of such systems and also consider their long-term behaviour.
Finally, in Chapter 3 we consider event graphs in which the timed elements form
sequences of random variables. We look for steady state distributions and conditions
for their existence, and attempt to characterise the asymptotic behaviour of the event
timings concerned. We conclude by exploring how we can represent certain types
of queuing systems by stochastic event graphs and present a key theorem regarding
the stability of their waiting times.
Chapter 0
Introduction
Exotic semirings such as (R ∪ {−∞}, max, +) and (R ∪ {+∞}, min, +) have been studied at
length since the 1950s, beginning primarily in the area of operational research. Nowadays the
term ‘tropical mathematics’ is often used to describe their study, though this term originally
referred to one particular discrete version of the max-plus algebra introduced by I. Simon
in 1988 [15]. Their applications span a wide range of fields including optimisation & control,
mathematical physics, algebraic geometry, dynamic programming and mathematical biology [10,
15]. In particular, the study of such algebras in relation to discrete event system theory (both
deterministic and stochastic), graph theory, Markov decision processes, asymptotic analysis and
language theory has led to some significant progress in these areas over the last 30 years [8].
Many of the concepts developed in conventional linear algebra have been ‘translated’ into the
world of max-plus, including solutions to linear and non-linear systems (both analytical and
numerical), linear dependence and independence, determinants, eigenvalues and eigenvectors
[9]. In 1979 Cuninghame-Green authored the first comprehensive unified account of these results
entitled “Minimax Algebra” [7], building on many papers published over the preceding 20 years
from various disciplines within mathematics, economics and computer science. As recently
as 2006, Heidergott, Olsder and Woude published what they consider the first ‘textbook’ in
the area of max-plus algebra [13], and many of the ideas explored below can be found in this
publication.
In the first chapter of this thesis, we aim to give an overview of max-plus linear algebra and
to build the necessary groundwork required for the applications discussed in the chapters that
follow. In particular, we present two celebrated theorems in the area of max-plus theory. The
first, which can be found in [7], concerns spectral theory and says that under mild conditions,
a matrix over the max-plus algebra has a unique eigenvalue with a simple graph-theoretic
interpretation. The second, originally proved by M. Viot in 1983 [2, 6], relates to the asymptotic
behaviour of sequential powers of max-plus matrices, which turns out to be essentially periodic
and has great implications for the material explored in Chapters 2 & 3.
In Chapter 2 we introduce the concept of timed Petri nets & event graphs. For a thorough
discussion on the scope of their application readers are referred to [18]; in this thesis we focus
solely on their use in the modelling of the time behaviour of a class of dynamic systems
known as ‘discrete event dynamic systems’. In simple terms, these are systems in which a finite
number of resources (e.g. processors or machines) are shared by several users (e.g. packets or
manufactured objects) which all contribute to the achievement of some common goal (e.g. a
parallel computation or the assembly of a product) [2]. We will see that under certain conditions
these systems, while highly non-linear in the conventional sense, can be ‘linearised’ by using
the max-plus algebra. This observation, first made in [5], is of vital importance and constitutes
one of the main reasons for the continued study of max-plus algebra today. The main content
of Chapter 2 concerns the ‘basic autonomous equation’ which governs the time evolution of
discrete event systems, and the steps towards its solution. We are then able to apply some ideas
from Chapter 1 to explore the long-term behaviour of such systems.
Chapter 3 concerns stochastic event graphs, which can be thought of as a natural extension
to the concepts introduced in Chapter 2. As the name suggests, we now assume a degree of
randomness in the event timings of the systems we are trying to model. Amongst other things,
stochastic event graphs can be used to model many types of queuing systems [3], the simplest
of which is the G/G/1 queue. We introduce several key ‘first order’ theorems which
establish the nature of stationary regimes in terms of the inverse throughput, and explore the
conditions under which such regimes are reached. We end by presenting a ‘second order’ theorem
concerning the stability of inter-event timings (for example, waiting times) in the context of
queuing systems.
Chapter 1
Max-Plus Algebra
1.1 The Max-Plus Semiring
1.1.1 Basic Definitions and Properties
In this thesis we work exclusively with the max-plus algebra (Rmax, ⊕, ⊗), where Rmax = R ∪
{−∞}, and for a, b ∈ Rmax:
a ⊕ b := max{a, b}
a ⊗ b := a + b
We begin by examining its algebraic structure, and we will then move on to vectors and matrices
over Rmax. We start by defining the term semiring.
Definition 1.1. A semiring is a triple (R, +, ×) where R is a non-empty set and +, × are
binary operations on R (referred to as addition and multiplication respectively) such that
(i) (R, +) is commutative and associative, with zero element εR:
(a) a + b = b + a
(b) (a + b) + c = a + (b + c)
(c) εR + a = a + εR = a
(ii) (R, ×) is associative, with unit element eR:
(a) (a × b) × c = a × (b × c)
(b) eR × a = a × eR = a
(iii) Multiplication distributes over addition:
(a) a × (b + c) = (a × b) + (a × c)
(b) (a + b) × c = (a × c) + (b × c)
(iv) Multiplication by εR annihilates R:
(a) εR × a = a × εR = εR
Note that the final axiom is not required in the definition of a standard ring since it follows
from the others, but it is needed here.
As the title of this section suggests, the max-plus algebra is a semiring with additive identity
ε := −∞ and multiplicative identity e := 0. It is straightforward to verify that all the axioms
of Definition 1.1 hold in the case of (Rmax, ⊕, ⊗). For example, the first distributive law holds
since
a ⊗ (b ⊕ c) = a + max{b, c}
= max{a + b, a + c}
= (a ⊗ b) ⊕ (a ⊗ c)
and the others follow similarly. For the sake of simplicity we will write Rmax for (Rmax, ⊕, ⊗)
when the context is clear.
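The verification is easy to experiment with in code. The following sketch is our own illustration (the names `oplus`, `otimes` and `EPS` are not from the text); it models Rmax with Python floats, using float("-inf") for ε:

```python
# Max-plus scalar arithmetic over Rmax = R ∪ {−∞}.
EPS = float("-inf")   # the additive identity ε
E = 0.0               # the multiplicative identity e

def oplus(a, b):
    """Max-plus addition: a ⊕ b = max{a, b}."""
    return max(a, b)

def otimes(a, b):
    """Max-plus multiplication: a ⊗ b = a + b."""
    return a + b

# Spot-check the first distributive law: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
a, b, c = 3.0, -1.0, 5.0
assert otimes(a, oplus(b, c)) == oplus(otimes(a, b), otimes(a, c))
# ε is neutral for ⊕, annihilates under ⊗, and e is neutral for ⊗:
assert oplus(EPS, a) == a and otimes(EPS, a) == EPS and otimes(E, a) == a
```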
Below we list three additional algebraic properties of Rmax which do not form part of the
definition of a semiring:
(i) Commutativity of ⊗:
∀a, b ∈ Rmax : a ⊗ b = b ⊗ a
(ii) Existence of multiplicative inverses:
∀a ∈ Rmax \ {ε} ∃ b ∈ Rmax such that a ⊗ b = e
(iii) Idempotency of ⊕:
∀a ∈ Rmax : a ⊕ a = a
The first two properties follow directly from the fact that (R, +) forms an abelian group, and
the third property is easily proved: a ⊕ a = max{a, a} = a. Properties (i) and (ii) mean that
we could refer to (Rmax, ⊕, ⊗) as a semifield (i.e. a field without additive inverses), though
this term can be ambiguous and is seldom used in mathematical literature. Note also that any
semiring in which addition is idempotent is called an idempotent semiring. The term
dioid (originating from the phrase double monoid) was introduced by Baccelli et al. in 1992 to
mean idempotent semiring [2], but we do not use this word here.
The crucial difference between a semiring and a ring in general is that an element of the former
need not have an additive inverse. Note that this does not say that additive inverses can never
exist; there may be a non-empty subset of R containing elements which do have additive
inverses (which could be thought of as the additive analogue to the set of units in a standard
ring). However, the following lemma immediately tells us that no elements of Rmax (apart from
4
10. ε) have additive inverses.
Lemma 1.2. Let (R, +, ×) be a semiring. If + is idempotent then additive inverses do not
exist.
Proof. Suppose that a ∈ R with a ≠ εR has an additive inverse b. Then
a + b = εR
Adding a to both sides of the equation yields
a + a + b = a + εR
By idempotency of +, the left-hand side is equal to a + b, whereas the right-hand side is equal
to a. Hence we have
a + b = a
which contradicts a + b = εR. Thus a does not have an additive inverse.
1.1.2 Other Algebraic Definitions
For a ∈ Rmax, n ∈ N, define
a⊗n := a ⊗ a ⊗ · · · ⊗ a (n times)
Thus exponentiation in max-plus is equivalent to conventional multiplication a⊗n = n×a. Some
of the laws of exponentiation are therefore different to what we are used to. For a, b ∈ Rmax,
m, n ∈ N:
(i) a⊗m ⊗ a⊗n = ma + na = (m + n)a = a⊗(m+n)
(ii) (a⊗m)⊗n = (ma)⊗n = nma = a⊗(mn)
(iii) a⊗1 = 1a = a
(iv) a⊗m ⊗ b⊗m = ma + mb = m(a + b) = (a ⊗ b)⊗m
and we also adopt the natural conventions a⊗ε := ε and a⊗e := e. For negative exponents we
can take
a⊗(−n) := (a⊗n)⊗(−1)
where the outer exponent on the right-hand side denotes the max-plus multiplicative inverse,
which was shown to exist in the previous section. Finally, we can extend the concept of
exponentiation in Rmax to non-integer exponents using conventional notation in the following
way:
a⊗(n/m) := (n/m) × a
which is well-defined, assuming m ≠ 0.
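Since a⊗n is just n × a, these laws are one-liners to check numerically. The sketch below is our own (the helper name `mp_pow` is hypothetical) and covers integer, negative and rational exponents for finite a:

```python
def mp_pow(a, n):
    """Max-plus exponentiation a⊗n = n × a, for finite a and real n."""
    return n * a

a = 3.0
assert mp_pow(a, 2) + mp_pow(a, 4) == mp_pow(a, 6)   # a⊗2 ⊗ a⊗4 = a⊗(2+4)
assert mp_pow(mp_pow(a, 2), 4) == mp_pow(a, 8)       # (a⊗2)⊗4 = a⊗(2·4)
assert mp_pow(a, -2) == -mp_pow(a, 2)                # negative exponents
assert mp_pow(a, 0.5) == 1.5                         # a⊗(1/2) = a/2
```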
Next, we can equip the max-plus algebra with a natural order relation as follows:
Definition 1.3. For a, b ∈ Rmax, we say a ≤ b if a ⊕ b = b.
It is easily verified that the max-plus operations ⊕ and ⊗ preserve this order, i.e. ∀a, b, c ∈ Rmax,
a ≤ b ⇒ a ⊕ c ≤ b ⊕ c and a ⊗ c ≤ b ⊗ c.
Finally, infinite sums in max-plus are defined by ⊕_{i∈I} xi := sup{xi : i ∈ I} for any possibly
infinite (even uncountable) family {xi}i∈I of elements of Rmax, when the supremum exists. In
general, we say that an idempotent semiring is complete if any such family has a supremum,
and if the product distributes over infinite sums. The max-plus semiring Rmax is not complete
(a complete idempotent semiring must have a maximal element), but it can be embedded in
the complete semiring (R̄max, ⊕, ⊗), where R̄max := Rmax ∪ {+∞}.
1.2 Vectors and Matrices over Rmax
1.2.1 Definitions and Structure
Let n, m ∈ N. We denote the set of n × m matrices over Rmax by Rn×m max. For i ∈ {1, . . . , n},
j ∈ {1, . . . , m}, the element of a matrix A ∈ Rn×m max in row i and column j is denoted by [A]ij,
or simply aij for notational convenience. Thus A ∈ Rn×m max can be written as
a11 a12 · · · a1m
a21 a22 · · · a2m
⋮   ⋮   ⋱   ⋮
an1 an2 · · · anm
where a11, . . . , anm ∈ Rmax. In a similar vein, the elements of Rn max := Rn×1 max are called
max-plus vectors, and we write the i-th element of a vector x ∈ Rn max as [x]i, or simply xi.
Typical concepts and operations from conventional algebra are defined for max-plus matrices
in the usual way (replacing + and × with ⊕ and ⊗ respectively), as outlined in the following
definitions.
Definition 1.4. The n × n max-plus identity matrix, denoted En, is defined by
[En]ij = 0 if i = j, ε if i ≠ j
We will write E := En whenever the context is clear.
Definitions 1.5. (i) For A, B ∈ Rn×m max, their sum A ⊕ B is defined by
[A ⊕ B]ij = aij ⊕ bij = max{aij, bij}
(ii) For A ∈ Rn×k max and B ∈ Rk×m max, their product A ⊗ B is defined by
[A ⊗ B]il = ⊕_{j=1}^{k} (aij ⊗ bjl) = max_{j=1,...,k} (aij + bjl)
(iii) The transpose of a matrix A ∈ Rn×m max is denoted by Aᵀ and is defined as usual by
[Aᵀ]ij = [A]ji
(iv) For A ∈ Rn×n max and k ∈ N, the k-th power of A, denoted A⊗k, is defined by
A⊗k = A ⊗ A ⊗ · · · ⊗ A (k times)
For k = 0, A⊗0 := En.
(v) For A ∈ Rn×m max and α ∈ Rmax, α ⊗ A is defined by
[α ⊗ A]ij = α ⊗ [A]ij
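These definitions translate directly into code. The sketch below (our own helpers over plain lists of lists, not from the text) implements the matrix sum, product and identity:

```python
EPS = float("-inf")  # the max-plus zero ε

def mat_oplus(A, B):
    """Matrix sum: [A ⊕ B]ij = max{aij, bij}."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_otimes(A, B):
    """Matrix product: [A ⊗ B]il = max over j of (aij + bjl)."""
    n, k, m = len(A), len(B), len(B[0])
    return [[max(A[i][j] + B[j][l] for j in range(k)) for l in range(m)]
            for i in range(n)]

def mp_identity(n):
    """The identity En: e = 0 on the diagonal, ε elsewhere."""
    return [[0.0 if i == j else EPS for j in range(n)] for i in range(n)]

A = [[0.0, 2.0], [EPS, 1.0]]
assert mat_otimes(A, mp_identity(2)) == A   # En is a multiplicative identity
assert mat_oplus(A, A) == A                 # matrix addition is idempotent
```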
We now look at a crucial result concerning the algebraic structure of square matrices over Rmax.
Proposition 1.6. (Rn×n max, ⊕, ⊗) is an idempotent semiring with multiplicative identity En.
Proof. The axioms of Definition 1.1 all follow from the semiring structure of Rmax, and are
readily verified. For example, for A, B, C ∈ Rn×n max we have that
[A ⊗ (B ⊕ C)]il = ⊕_{j=1}^{n} (aij ⊗ (bjl ⊕ cjl))
= ⊕_{j=1}^{n} ((aij ⊗ bjl) ⊕ (aij ⊗ cjl))
= (⊕_{j=1}^{n} (aij ⊗ bjl)) ⊕ (⊕_{j=1}^{n} (aij ⊗ cjl))
= [(A ⊗ B) ⊕ (A ⊗ C)]il
and so A ⊗ (B ⊕ C) = (A ⊗ B) ⊕ (A ⊗ C). The other axioms follow similarly.
Note that since addition in (Rn×n max, ⊕, ⊗) is idempotent, we can apply Lemma 1.2 once again
to see that no element of Rn×n max has an additive inverse. However, unlike in Rmax, multiplication
of matrices over Rmax is not commutative. For example (writing matrices row by row)
(1 e; ε −2) ⊗ (2 −1; 3 ε) = (3 e; 1 ε) ≠ (3 2; 4 3) = (2 −1; 3 ε) ⊗ (1 e; ε −2)
Also unlike Rmax, matrices over Rmax do not necessarily have multiplicative inverses (i.e. they
are not necessarily invertible). We explore this in the next section.
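The non-commutativity example above can be checked mechanically. Here is a sketch (our own, reusing a small max-plus product helper) that reproduces both products:

```python
EPS = float("-inf")  # ε; the unit e is 0.0

def mat_otimes(A, B):
    """Max-plus matrix product: [A ⊗ B]il = max over j of (aij + bjl)."""
    return [[max(A[i][j] + B[j][l] for j in range(len(B)))
             for l in range(len(B[0]))] for i in range(len(A))]

A = [[1.0, 0.0], [EPS, -2.0]]   # (1 e; ε −2)
B = [[2.0, -1.0], [3.0, EPS]]   # (2 −1; 3 ε)
assert mat_otimes(A, B) == [[3.0, 0.0], [1.0, EPS]]   # (3 e; 1 ε)
assert mat_otimes(B, A) == [[3.0, 2.0], [4.0, 3.0]]   # (3 2; 4 3)
assert mat_otimes(A, B) != mat_otimes(B, A)           # ⊗ is not commutative
```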
1.2.2 Matrix Inversion
Definition 1.7. Let A, B ∈ Rn×n max. B is a right inverse of A if A ⊗ B = E, and B is a left
inverse of A if B ⊗ A = E.
Definition 1.8. A max-plus permutation matrix is a matrix A ∈ Rn×n max with each row and
each column containing exactly one entry equal to e, with all other entries equal to ε. If σ :
{1, . . . , n} → {1, . . . , n} is a permutation, the max-plus permutation matrix Pσ is defined by
[Pσ]ij := e if i = σ(j), ε if i ≠ σ(j)
As the name suggests, left multiplication by Pσ permutes the rows of a matrix: the i-th row of
a matrix A ∈ Rn×n max will appear as the σ(i)-th row of Pσ ⊗ A. For example, if n = 2 and σ is
defined by σ(1) = 2, σ(2) = 1:
(ε e; e ε) ⊗ (1 2; 3 4) = (3 4; 1 2)
Similarly, it is straightforward to see that right multiplication by Pσ permutes the columns of
a matrix.
Definition 1.9. A matrix A ∈ Rn×n max is diagonal if [A]ij = ε for all i ≠ j. If a1, . . . , an ∈
Rmax \ {ε}, the diagonal matrix D(a1, . . . , an) is defined by
[D(a1, . . . , an)]ij := ai if i = j, ε if i ≠ j
Combining these two definitions, if σ is a permutation and a1, . . . , an ∈ Rmax \ {ε}, Pσ ⊗
D(a1, . . . , an) gives a matrix in which each row and each column contains exactly one finite
entry. This class of matrices (sometimes referred to as generalised permutation matrices) in
max-plus turns out to be of some significance, as the theorem below shows.
Theorem 1.10. A matrix A ∈ Rn×n max has a right inverse if and only if A = Pσ ⊗ D(a1, . . . , an)
for some permutation σ and a1, . . . , an ∈ Rmax \ {ε}.
Proof. Suppose A = Pσ ⊗ D(a1, . . . , an) for some permutation σ and a1, . . . , an ∈ Rmax \ {ε}.
Recalling from Section 1.1.1 that multiplicative inverses exist in Rmax, define B ∈ Rn×n max by
[B]ij = ([A]ji)⊗(−1) if [A]ji ≠ ε, ε otherwise
Then for i, j = 1, . . . , n we have that
[A ⊗ B]ij = max_{k=1,...,n} (aik ⊗ bkj) = e if j = i, ε if j ≠ i
since if j ≠ i, at least one of aik, bkj is equal to ε for each k = 1, . . . , n (because A has only one
finite element per column and row). Thus A ⊗ B = E, and B is a right inverse of A.
Conversely, suppose A has right inverse B ∈ Rn×n max. For i, j = 1, . . . , n we have
⊕_{k=1}^{n} ([A]ik ⊗ [B]kj) = [E]ij
and therefore for each i = 1, . . . , n there is a (least) index c(i) (1 ≤ c(i) ≤ n) such that [A]ic(i)
and [B]c(i)i are both finite, since [E]ii = e. Moreover we cannot have [A]hc(i) finite with h ≠ i,
since then
[A ⊗ B]hi ≥ [A]hc(i) ⊗ [B]c(i)i > ε = [E]hi
which contradicts our assumption that B is a right inverse of A. It follows that the mapping i ↦
c(i) is a bijection, i.e. each column of A is labelled c(i) for some i and contains exactly one finite
element, and each row of A contains exactly one finite element. That is, A = Pσ ⊗ D(a1, . . . , an)
for some permutation σ and a1, . . . , an ∈ Rmax \ {ε}.
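The inverse construction from this proof is easy to carry out explicitly. The sketch below is our own illustration (the helper names `mat_otimes` and `mp_inverse` are not from the text): it builds B by negating and transposing the finite entries of a generalised permutation matrix, then checks that it is a two-sided inverse.

```python
EPS = float("-inf")  # the max-plus zero ε

def mat_otimes(A, B):
    """Max-plus matrix product: [A ⊗ B]ij = max over k of (aik + bkj)."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mp_inverse(A):
    """[B]ij = ([A]ji)^(⊗−1) = −[A]ji where [A]ji is finite, ε elsewhere."""
    n = len(A)
    return [[-A[j][i] if A[j][i] != EPS else EPS for j in range(n)]
            for i in range(n)]

# A = Pσ ⊗ D(a1, a2): exactly one finite entry in each row and column.
A = [[EPS, 5.0], [2.0, EPS]]
B = mp_inverse(A)
E = [[0.0, EPS], [EPS, 0.0]]
assert mat_otimes(A, B) == E   # B is a right inverse of A
assert mat_otimes(B, A) == E   # and also a left inverse (cf. Theorem 1.11)
```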
Theorem 1.11. For A, B ∈ Rn×n max, A ⊗ B = E if and only if B ⊗ A = E (i.e. right and left
inverses are equivalent), and A uniquely determines B.
Proof. Suppose that A has right inverse BR ∈ Rn×n max. Then by Theorem 1.10, we know that
A = Pσ ⊗ D(a1, . . . , an) for some permutation σ and a1, . . . , an ∈ Rmax \ {ε}. Now, as before,
define BL ∈ Rn×n max by
[BL]ij = ([A]ji)⊗(−1) if [A]ji ≠ ε, ε otherwise
and using the same reasoning as before we observe that BL is a left inverse of A. Finally, note
that
BR = E ⊗ BR = (BL ⊗ A) ⊗ BR = BL ⊗ (A ⊗ BR) = BL ⊗ E = BL
showing that BR is uniquely determined, and is also a left inverse.
Theorem 1.11 tells us that we do not need to make a distinction between right and left inverses,
as we did in Definition 1.7. Before moving on we show one last result which says that the
product of two invertible matrices is also invertible.
Proposition 1.12. If A, B ∈ Rn×n max are invertible then A ⊗ B is also invertible.
Proof. This proof uses some simple results regarding diagonal and permutation matrices in
conventional algebra, whose analogues are easily proved in max-plus. To start, recall that for a
permutation matrix Pσ, we have that (Pσ)⊗(−1) = Pσ−1. Thus if D(a1, . . . , an) is a diagonal matrix:
D(a1, . . . , an) ⊗ Pσ = (Pσ ⊗ Pσ−1) ⊗ D(a1, . . . , an) ⊗ Pσ
= Pσ ⊗ (Pσ−1 ⊗ D(a1, . . . , an) ⊗ Pσ)
= Pσ ⊗ D(aσ(1), . . . , aσ(n))
Now from Theorem 1.10 we can write A = PσA ⊗ D(a1, . . . , an), B = PσB ⊗ D(b1, . . . , bn). Then
using the above
A ⊗ B = PσA ⊗ D(a1, . . . , an) ⊗ PσB ⊗ D(b1, . . . , bn)
= PσA ⊗ PσB ⊗ D(aσB(1), . . . , aσB(n)) ⊗ D(b1, . . . , bn)
= PσA∘σB ⊗ D(aσB(1) ⊗ b1, . . . , aσB(n) ⊗ bn)
and therefore A ⊗ B is invertible by Theorem 1.10.
1.2.3 Determinants
Recall that in conventional algebra, the determinant of a matrix A ∈ Rn×n is defined as
det(A) = Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} aiσ(i)
where Sn is the symmetric group on n elements (so an element of Sn is a permutation σ :
{1, . . . , n} → {1, . . . , n}), and the sign of a permutation σ ∈ Sn, denoted sgn(σ), is defined by
sgn(σ) = 1 if σ is even, −1 if σ is odd
Unfortunately this definition cannot be immediately translated into max-plus (i.e. by replacing
+ and × with ⊕ and ⊗ respectively) because the use of the sign function requires that we have
additive inverses. Instead, two related concepts are introduced below which offer alternatives
to the notion of the determinant in the case of the max-plus algebra.
Definition 1.13. Let A ∈ Rn×n max. The permanent of A, denoted perm(A), is defined as
perm(A) = ⊕_{σ∈Sn} ⊗_{i=1}^{n} aiσ(i)
Note that, crudely put, the permanent is the max-plus analogue of the determinant with the
minuses simply removed. The formula can be understood as giving the maximal sum of the
diagonal values over all permutations of the columns of A. The permanent has been studied at
length both in the case of conventional algebra (see [17]) and in max-plus & related semirings
(see [19]).
Note that if A ∈ Rn×n max is invertible then by Theorem 1.10, A = Pσ ⊗ D(a1, . . . , an) and so
perm(A) = ⊗_{i=1}^{n} ai ≠ ε. However, unlike in the case of determinants in conventional matrix
algebra, the converse is not necessarily true.
The second concept in max-plus related to the determinant, known as the dominant, can be
thought of as a refinement of the permanent. It is defined below.
Definition 1.14. Let A ∈ Rn×n max and let the matrix z^A be defined by [z^A]ij = z^aij. The dominant
of A, denoted dom(A), is defined as
dom(A) = highest exponent in det(z^A) if det(z^A) ≠ 0, ε otherwise
The dominant can be used to prove max-plus analogues of major results such as Cramér’s
Theorem and the Cayley-Hamilton Theorem. We do not have the space to include these here;
for a comprehensive discussion readers are again referred to [19].
1.3 Graph-theoretic Interpretations in Max-Plus
As in conventional linear algebra, when working with vectors and matrices it is often natural
to interpret definitions and theorems graphically. It turns out that in the case of max-plus
algebra, it is not only natural to do so but also rather insightful. We will only really be able to
appreciate this when we come to look at the eigenvalue problem in the next section, but firstly
we must define all of the graph-theoretic concepts that we will require.
Definitions 1.15. (i) A directed graph G is a pair (V, E) where V is the set of vertices (or
nodes) and E ⊆ V × V is the set of edges (or arcs).
(ii) A path from vertex i to vertex j is a sequence of vertices p = (i1, . . . , is+1) with i1 = i and
is+1 = j, such that (ik, ik+1) ∈ E for all k ∈ {1, . . . , s}.
(iii) The length of a path p = (i1, . . . , is+1), denoted |p|l, is equal to s. The set of paths from
vertex i to vertex j of length k is denoted Pk(i, j).
(iv) The weight of a path p from vertex i to vertex j of length d is given by
|p|w = ⊗_{k=1}^{d} a_{i_{k+1} i_k}
where i1 = i and id+1 = j.
(v) The average weight of a path p is given by |p|w / |p|l.
(vi) A circuit of length s is a path of length s which starts and finishes at the same vertex, i.e.
a path c = (i1, . . . , is+1) such that i1 = is+1.
(vii) A circuit c = (i1, . . . , is+1) is elementary if i1, . . . , is are distinct, and s ≥ 1. We denote
the set of elementary circuits in G(A) by C(A).
(viii) For A ∈ Rn×n max, the communication graph (or the precedence graph) of A, denoted G(A),
is the graph with vertex set V(A) = {1, . . . , n} and edge set E(A) = {(i, j) : aji ≠ ε}. The
weight of the edge (i, j) ∈ E(A) is given by the entry aji.
Note that the (i, j)-th entry of the matrix A specifies the weight of the edge in G(A) from vertex
j to vertex i. This is common practice in the area of max-plus and graph theory but may not
appear intuitive to those new to the subject.
We now move on to looking at two particular matrices that play a vital role in relating graph
theory to max-plus linear algebra. For A ∈ Rn×n max, let
A+ := ⊕_{k=1}^{∞} A⊗k
The element [A+]ji gives the maximal weight of any path from i to j in G(A). This statement
is non-trivial, but follows directly from the theorem below.
Theorem 1.16. Let A ∈ Rn×n max. Then ∀k ∈ N:
[A⊗k]ji = max{|p|w : p ∈ Pk(i, j)} if Pk(i, j) ≠ ∅, ε otherwise
Proof. We use induction on k. Let i, j ∈ {1, . . . , n}. When k = 1, P1(i, j) either contains a
single path of length 1, namely the edge (i, j), or is empty if no such edge exists. In the first case,
the weight of the path is by definition [A]ji, and in the second case max{|p|w : p ∈ P1(i, j)} = ε,
which is again equal to the value [A]ji (since there is no edge from i to j).
Now suppose the result holds for some k. Firstly, assume that Pk+1(i, j) ≠ ∅. A path p ∈
Pk+1(i, j) can be split up into a subpath of length k running from i to some vertex l, and a
path consisting of a single edge from l to j. More formally:
p = p̂ ∘ (l, j) with p̂ ∈ Pk(i, l)
The maximal weight of any path in Pk+1(i, j) can thus be obtained from
max_{l=1,...,n} ([A]jl + max{|p̂|w : p̂ ∈ Pk(i, l)})
= max_{l=1,...,n} ([A]jl + [A⊗k]li) (inductive hypothesis)
= ⊕_{l=1}^{n} ([A]jl ⊗ [A⊗k]li)
= [A ⊗ A⊗k]ji
= [A⊗(k+1)]ji
which is what we wanted to prove. Finally, consider the case when Pk+1(i, j) = ∅; i.e. when
there exists no path of length k + 1 from i to j. This implies that for any vertex l, either there
is no path of length k from i to l or there is no edge from l to j (or possibly both). Hence
for any l, at least one of the values [A]jl, [A⊗k]li equals ε. Therefore [A⊗(k+1)]ji = ε, and this
completes the proof.
Note that Theorem 1.16 immediately tells us that A+ is not necessarily well-defined. For
example, if there exists a circuit c = (i1, . . . , is+1) in G(A) in which every edge has positive
weight, then [A⊗k]ji diverges (i.e. tends to +∞) as k → ∞ for any i, j ∈ {i1, . . . , is+1} (since
we can loop around the circuit c as many times as we like, creating a path of higher and higher
weight). The next lemma provides us with a sufficient condition for A+ to be well-defined, and
also reduces the complexity of the infinite sum.
Lemma 1.17. Let A ∈ Rn×n max be such that any circuit in G(A) has non-positive average weight
(i.e. less than or equal to e). Then we have
A+ = A⊗1 ⊕ A⊗2 ⊕ A⊗3 ⊕ · · · ⊕ A⊗n ∈ Rn×n max
Proof. Since A is of dimension n, any path p in G(A) from i to j of length greater than n
necessarily contains at least one circuit. We have assumed that all of the circuits in G(A) have
non-positive weights, so removing the circuits in p yields a path from i to j of length at most
n, and of at least as great a weight. It follows that
[A+]ji ≤ max{[A⊗k]ji : k ∈ {1, . . . , n}}
and the reverse inequality is immediate from the definition of A+. This concludes the proof.
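Lemma 1.17 turns A+ into a finite computation. The sketch below is our own illustration (the helper names are not from the text); it follows the edge-weight convention of Definitions 1.15, where [A]ji is the weight of the edge from i to j.

```python
EPS = float("-inf")  # the max-plus zero ε

def mat_otimes(A, B):
    """Max-plus matrix product: [A ⊗ B]ij = max over k of (aik + bkj)."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_oplus(A, B):
    """Entrywise max-plus sum of two matrices."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def a_plus(A):
    """A+ = A ⊕ A⊗2 ⊕ … ⊕ A⊗n, valid when all circuits are non-positive."""
    power, total = A, A
    for _ in range(len(A) - 1):
        power = mat_otimes(power, A)
        total = mat_oplus(total, power)
    return total

# Edge 1→2 of weight 2 and edge 2→1 of weight −3: the only circuit has
# weight −1 ≤ e, so A+ is well-defined.
A = [[EPS, -3.0], [2.0, EPS]]
assert a_plus(A) == [[-1.0, -3.0], [2.0, -1.0]]
```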
Before moving on, we prove one simple property of A+ that will come in handy later on.
Proposition 1.18. For A ∈ Rn×n max, we have that A+ ⊗ A+ = A+.
Proof. Consider two vertices i, l ∈ {1, . . . , n}. A path of maximal weight from i to l can be split
up as a path of maximal weight from i to j plus a path of maximal weight from j to l, for any
j ∈ {1, . . . , n} for which the sum of the two path weights is maximal. Indeed this relationship
holds if and only if j is in the path of maximal weight from i to l, but for our purposes we can
simply take the maximum over all vertices.
By Theorem 1.16, the weight of such a path is given by [A+]li. Thus in max-plus notation
(recalling that ⊗ is commutative for scalars α ∈ Rmax), we can write
[A+]li = ⊕_{j=1}^{n} ([A+]ji ⊗ [A+]lj)
= ⊕_{j=1}^{n} ([A+]lj ⊗ [A+]ji) = [A+ ⊗ A+]li
and therefore A+ = A+ ⊗ A+ as required.
We now introduce one more definition which is closely related to the object A+ defined above.
This will prove to be an integral concept throughout the rest of this chapter and beyond, and
as such, this is one of the most important definitions in this thesis.
Definition 1.19. For A ∈ Rn×n max, let
A∗ := ⊕_{k=0}^{∞} A⊗k = E ⊕ A+
Clearly, A∗ and A+ only differ on the leading diagonal. By Theorem 1.16, the (j, i)-th entry of A∗
could be interpreted as the maximal weight of any path from i to j in G(A), provided we
recognise the additional concept of an empty circuit of length 0 and weight e from every vertex
to itself.
Using Lemma 1.17, it is immediate from the definition of A∗ that if all the circuits in G(A) have
non-positive average weight, then A∗ = A⊗0 ⊕ A⊗1 ⊕ · · · ⊕ A⊗n. However, as the lemma below
shows, thanks to the addition of the identity matrix (i.e. the A⊗0 term) in A∗, we are able to
refine this result slightly by dropping the final term in the sum.
Lemma 1.20. Let A ∈ Rn×n max be such that any circuit in G(A) has non-positive average weight.
Then we have
A∗ = A⊗0 ⊕ A⊗1 ⊕ A⊗2 ⊕ · · · ⊕ A⊗(n−1) ∈ Rn×n max
Proof. The same argument applies as in the proof of Lemma 1.17. Note that any path p in G(A)
from i to j of length n or greater necessarily contains at least one circuit, and so removing the
circuit(s) yields a path from i to j of length at most n − 1 and of at least as great a weight. For
the special case when i = j and p is an elementary circuit of length n (so visiting each vertex
in G(A) exactly once), the i-th entry on the diagonal of A⊗0 (which equals e by definition) will
always be at least the corresponding entry in A⊗n, since e is the maximum possible weight
of any circuit. This is why we can drop the A⊗n term.
Note that we also have a direct analogue of Proposition 1.18 for the matrix A∗, and this will be
useful in the analysis that follows:
Proposition 1.21. For A ∈ Rn×n max, we have that A∗ ⊗ A∗ = A∗.
Proof. From Proposition 1.18 we have that A+ = A+ ⊗ A+. Recalling the definition of A∗ and
using idempotency of matrix addition, we have
A∗ ⊗ A∗ = (A+ ⊕ E) ⊗ (A+ ⊕ E)
= (A+ ⊗ A+) ⊕ (A+ ⊗ E) ⊕ (E ⊗ A+) ⊕ E
= A+ ⊕ A+ ⊕ A+ ⊕ E
= A+ ⊕ E = A∗
as required.
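Proposition 1.21 is also easy to confirm numerically. Under the hypothesis of Lemma 1.20, A∗ is the finite sum A⊗0 ⊕ · · · ⊕ A⊗(n−1); the sketch below (our own helper names, not from the text) computes it and checks idempotency.

```python
EPS = float("-inf")  # the max-plus zero ε

def mat_otimes(A, B):
    """Max-plus matrix product: [A ⊗ B]ij = max over k of (aik + bkj)."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_oplus(A, B):
    """Entrywise max-plus sum of two matrices."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def a_star(A):
    """A* = A⊗0 ⊕ A⊗1 ⊕ … ⊕ A⊗(n−1), per Lemma 1.20 (non-positive circuits)."""
    n = len(A)
    power = [[0.0 if i == j else EPS for j in range(n)] for i in range(n)]  # En
    total = power
    for _ in range(n - 1):
        power = mat_otimes(power, A)
        total = mat_oplus(total, power)
    return total

A = [[EPS, -3.0], [2.0, EPS]]   # circuit weight −1 ≤ e
S = a_star(A)
assert S == [[0.0, -3.0], [2.0, 0.0]]
assert mat_otimes(S, S) == S    # A* ⊗ A* = A*
```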
To finish this section, we introduce one more important property of square matrices over
max-plus known as irreducibility. The definition comes in three parts:
Definitions 1.22. (i) In a graph G, a vertex j is reachable from vertex i if there exists a
path from i to j.
(ii) A graph is strongly connected if every vertex is reachable from every other vertex.
(iii) A matrix A ∈ Rn×n max is irreducible if G(A) is strongly connected.
The class of irreducible matrices over max-plus will turn out to be of real significance in Section
1.4. From a practical point of view it is not obvious how to determine whether a given matrix
A ∈ Rn×n max is irreducible, but as the proposition below shows, one option is to examine the
matrix A+. Combined with Lemma 1.17 (when A has the appropriate properties), this provides
us with a handy (and computationally quick) way to check for matrix irreducibility over max-plus.
Proposition 1.23. A matrix A ∈ Rn×n max is irreducible if and only if all the entries of A+ are
different from ε.
Proof. A matrix is irreducible if and only if there is a path between any two vertices i and j in
G(A), which by Theorem 1.16 occurs exactly when the entry [A+]ji is not equal to ε.
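Proposition 1.23 gives an immediate algorithmic test: compute ⊕_{k=1}^{n} A⊗k and look for ε entries. The finiteness pattern of this finite sum matches that of A+ whatever the weights are, so the test does not need the non-positive-circuit hypothesis. A sketch with our own helper names:

```python
EPS = float("-inf")  # the max-plus zero ε

def mat_otimes(A, B):
    """Max-plus matrix product: [A ⊗ B]ij = max over k of (aik + bkj)."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_oplus(A, B):
    """Entrywise max-plus sum of two matrices."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def is_irreducible(A):
    """A is irreducible iff every entry of A ⊕ A⊗2 ⊕ … ⊕ A⊗n is finite."""
    power, total = A, A
    for _ in range(len(A) - 1):
        power = mat_otimes(power, A)
        total = mat_oplus(total, power)
    return all(x != EPS for row in total for x in row)

assert is_irreducible([[EPS, -3.0], [2.0, EPS]])      # edges 1 → 2 and 2 → 1
assert not is_irreducible([[EPS, EPS], [2.0, EPS]])   # only the edge 1 → 2
```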
1.4 Spectral Theory
1.4.1 Eigenvalues and Eigenvectors
Given a matrix A ∈ Rn×n max, we consider the problem of existence of eigenvalues and eigenvectors.
The main result in max-plus spectral theory is that, under mild conditions, A has a unique
eigenvalue with a simple graph-theoretic interpretation. As can be seen below, the definition of
max-plus eigenvalues and eigenvectors is a direct translation from conventional linear algebra,
with the × operator replaced with ⊗:
Definition 1.24. Let A ∈ Rn×n max. If there exists a scalar µ ∈ Rmax and a vector v ∈ Rn max
(containing at least one finite element) such that
A ⊗ v = µ ⊗ v
then µ is an eigenvalue of A and v is an eigenvector of A associated with the eigenvalue µ.
Note that Definition 1.24 allows an eigenvalue to be µ = ε. However, the proposition below says
that this can only happen when A has a column in which all entries are ε. In graph-theoretic
terms this means that G(A) has a vertex which, once visited, can never be left (sometimes called
a sink). This is uninteresting from an analytical point of view, so it is reasonable to consider
the case µ = ε to be trivial. Before we prove this result, we introduce some simple notation.
Notation. Let A ∈ Rn×n max. For i ∈ {1, . . . , n}, we denote the i-th row of A by [A]i·. Similarly,
for j ∈ {1, . . . , n}, we denote the j-th column of A by [A]·j.
Proposition 1.25. ε is an eigenvalue of A ∈ Rn×n max iff A has at least one column in which all
entries are ε.
Proof. Let A ∈ Rn×n max be such that [A]·j = (ε, . . . , ε) for some j ∈ {1, . . . , n}. Let v ∈ Rn max be
such that [v]i = ε ∀i ≠ j and [v]j = α ≠ ε. Then it is easy to verify that [A ⊗ v]i = ε for all
i = 1, . . . , n; that is, ε is an eigenvalue of A with an associated eigenvector v.
Conversely, suppose A ∈ Rn×n max has eigenvalue ε with an associated eigenvector v. Let J = {j :
vj ≠ ε}, which is non-empty by definition. Then for each i = 1, . . . , n we have
ε = [A ⊗ v]i = ⊕_{j=1}^{n} (aij ⊗ vj) = ⊕_{j∈J} (aij ⊗ vj)
⟹ aij = ε ∀j ∈ J
So every column j of A for which vj ≠ ε has all its entries equal to ε. In particular, A contains
at least one column in which all entries are ε.
Corollary 1.26. If A ∈ Rn×n max is irreducible then ε is not an eigenvalue of A.
Proof. If A is irreducible then it cannot have a column in which all entries are ε. Thus by
Proposition 1.25, ε is not an eigenvalue of A.
Note that eigenvectors are not unique: any scalar multiple of an eigenvector is also an
eigenvector, and more generally, if µ is an eigenvalue of A, v1, v2 are associated eigenvectors and
α1, α2 ∈ Rmax \ {ε}, then we have
A ⊗ ((α1 ⊗ v1) ⊕ (α2 ⊗ v2)) = (A ⊗ (α1 ⊗ v1)) ⊕ (A ⊗ (α2 ⊗ v2))
= (α1 ⊗ (A ⊗ v1)) ⊕ (α2 ⊗ (A ⊗ v2))
= (α1 ⊗ (µ ⊗ v1)) ⊕ (α2 ⊗ (µ ⊗ v2))
= µ ⊗ ((α1 ⊗ v1) ⊕ (α2 ⊗ v2))
So (α1 ⊗ v1) ⊕ (α2 ⊗ v2) is also an eigenvector associated with the eigenvalue µ. In fact, the
eigenvectors associated with a given eigenvalue form a vector space in max-plus called the
eigenspace which we shall explore in depth later.
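A small numerical example (our own, not from the text) makes Definition 1.24 and this closure property concrete. For the matrix below, µ = 2.5 is the maximal average circuit weight (anticipating Lemma 1.27) and v = (0, −0.5) satisfies A ⊗ v = µ ⊗ v:

```python
def mat_vec(A, v):
    """Max-plus matrix-vector product: [A ⊗ v]i = max over j of (aij + vj)."""
    return [max(a + x for a, x in zip(row, v)) for row in A]

A = [[1.0, 3.0], [2.0, 1.0]]
mu, v = 2.5, [0.0, -0.5]
assert mat_vec(A, v) == [mu + x for x in v]   # A ⊗ v = µ ⊗ v

# Any max-plus scalar multiple α ⊗ v is again an eigenvector:
alpha = 7.0
w = [alpha + x for x in v]
assert mat_vec(A, w) == [mu + x for x in w]
```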
As we mentioned at the beginning of Section 1.3, many of the results in the area of max-plus
spectral theory can be interpreted graphically, and the next key lemma constitutes the first step
in doing just that.
Lemma 1.27. Let A ∈ Rn×n max have finite eigenvalue µ. Then µ is the average weight of some elementary circuit in G(A).
Proof. Let v be an associated eigenvector of µ. Then by definition not all the entries of v equal ε, i.e. there exists a vertex/index j1 ∈ {1, . . . , n} such that vj1 ≠ ε. Now v is an eigenvector and so we have [A ⊗ v]j1 = µ ⊗ vj1 ≠ ε. But [A ⊗ v]j1 = ⊕_{k=1}^{n} aj1k ⊗ vk, and therefore there exists a vertex j2 such that
aj1j2 ⊗ vj2 = [A ⊗ v]j1 ≠ ε (1.1)
which implies aj1j2 ≠ ε, i.e. (j2, j1) is an edge in G(A). (1.1) also implies that vj2 ≠ ε, so we can continue in the same fashion to find a vertex j3 with (j3, j2) an edge in G(A) and vj3 ≠ ε. Proceeding in this way, eventually some vertex, say, vertex jh, must be encountered for a second time since the number of vertices is finite. Thus by ignoring the edges prior to encountering jh for the first time, we have found an elementary circuit
c = ((jh, jh+l−1), (jh+l−1, jh+l−2), . . . , (jh+1, jh))
of length |c|l = l, and with weight
|c|w = ⊗_{k=0}^{l−1} a_{j_{h+k} j_{h+k+1}} (1.2)
where jh+l = jh. By construction, we have that
⊗_{k=0}^{l−1} (a_{j_{h+k} j_{h+k+1}} ⊗ v_{j_{h+k+1}}) = µ⊗l ⊗ (⊗_{k=0}^{l−1} v_{j_{h+k}})
or equivalently in conventional algebra (for ease of manipulation):
Σ_{k=0}^{l−1} (a_{j_{h+k} j_{h+k+1}} + v_{j_{h+k+1}}) = (l × µ) + Σ_{k=0}^{l−1} v_{j_{h+k}}
Now, because jh = jh+l it follows that Σ_{k=0}^{l−1} v_{j_{h+k+1}} = Σ_{k=0}^{l−1} v_{j_{h+k}}, so subtracting Σ_{k=0}^{l−1} v_{j_{h+k}} from both sides yields
Σ_{k=0}^{l−1} a_{j_{h+k} j_{h+k+1}} = l × µ
and translated back into max-plus, we can substitute this into (1.2) to see that |c|w = µ⊗l.
Thus we have that the average weight of the circuit c is equal to
|c|w / |c|l = (µ⊗l) / l = µ
as required.
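Lemma 1.27 narrows the candidate eigenvalues down to the finitely many average circuit weights. For small matrices these can simply be enumerated; the following brute-force sketch (our code, exponential in n, so suitable only for tiny examples) lists the average weight of every elementary circuit. The edge convention matches the text: A[i][j] is the weight of the edge from vertex j to vertex i.

```python
# Brute-force enumeration of elementary circuits (illustration only).
from itertools import permutations

EPS = float('-inf')  # the max-plus zero element ε (absent edge)

def circuit_means(A):
    """Average weight of every elementary circuit of G(A)."""
    n = len(A)
    means = []
    for r in range(1, n + 1):
        for cyc in permutations(range(n), r):
            if cyc[0] != min(cyc):          # one canonical rotation per circuit
                continue
            # weight of the circuit cyc[0] -> cyc[1] -> ... -> cyc[0]
            w = sum(A[cyc[(k + 1) % r]][cyc[k]] for k in range(r))
            if w > EPS:                      # skip tuples that use absent edges
                means.append(w / r)
    return means
```

For instance, applied to the 4 × 4 matrix of the worked example in Section 1.4.3, this returns the four average weights −0.5, 4, 6 and 6 found there by hand.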
Lemma 1.27 tells us that the only candidates for eigenvalues are the average weights of circuits
in G(A). However, it does not tell us which circuits actually define an eigenvalue and which
do not. Fortunately, when A is irreducible the answer to this question is very simple: only
the maximal average circuit weight defines an eigenvalue. This result is established in the two
theorems below, but first we require some additional definitions and notation.
Definitions 1.28. (i) A circuit c ∈ C(A) is critical if its average weight is maximal.
(ii) For A ∈ Rn×n max, the critical graph of A, denoted Gc(A), is the graph containing the vertices and edges which belong to the critical circuits in G(A). We write Gc(A) = (Vc(A), Ec(A)), and refer to the vertices in Vc(A) as critical vertices.
(iii) The critical classes of A ∈ Rn×n max are the maximal strongly connected components of Gc(A).
Notation. Let A ∈ Rn×n max. For β ∈ Rmax \ {ε}, define the matrix Aβ by [Aβ]ij = aij − β. Note that the ‘−’ operator is to be interpreted in conventional algebra, where we adopt the convention ε − x = ε ∀x ∈ R. If β is an eigenvalue of A, the matrix Aβ is sometimes called the normalised matrix.
Note that the communication graphs G(A) and G(Aβ) are identical except for their edge weights,
and if a circuit c in G(A) has average weight w then the same circuit in G(Aβ) has average weight
w − β. In particular, if G(A) has finite maximal average circuit weight λ then the maximal
average circuit weight in G(Aλ) is λ − λ = 0. Furthermore, a circuit in G(A) is critical if and
only if it is critical in G(Aλ), and therefore Gc(A) and Gc(Aλ) are identical (again, except for
their edge weights).
Consider the matrix A+λ, which is to be read (Aλ)+. By Theorem 1.16, the element [A+λ]ij gives the maximal weight of any path from j to i in G(Aλ). In particular, since all circuits in G(Aλ) have non-positive average weight, we must have [A+λ]ii ≤ e for all i ∈ {1, . . . , n}. Furthermore, for the matrix A∗λ (also to be read (Aλ)∗) we obtain [A∗λ]ii = e ⊕ [A+λ]ii = e for all i ∈ {1, . . . , n}.
Theorem 1.29. Let the communication graph G(A) of a matrix A ∈ Rn×n max have finite maximal average circuit weight λ. Then λ is an eigenvalue of A, with an associated eigenvector [A∗λ]·j for any vertex j ∈ Vc(A).
Proof. Firstly note that all the circuits in G(Aλ) have non-positive average weight, and therefore A+λ is well-defined by Lemma 1.17. Now, every vertex in Gc(Aλ) is contained in a non-empty circuit which has weight e, i.e.
∀j ∈ Vc(A) : [A+λ]jj = e (1.3)
Next, write
[A∗λ]ij = [E ⊕ A+λ]ij = ε ⊕ [A+λ]ij for i ≠ j, and e ⊕ [A+λ]ij for i = j.
Then from (1.3), for j ∈ Vc(A) it follows that
[A+λ]·j = [A∗λ]·j (1.4)
Now, note that we have
A+λ = A⊗1λ ⊕ A⊗2λ ⊕ · · · = Aλ ⊗ (A⊗0λ ⊕ A⊗1λ ⊕ · · ·) = Aλ ⊗ A∗λ
So substituting this into (1.4) gives, for j ∈ Vc(A),
[Aλ ⊗ A∗λ]·j = [A∗λ]·j ⇐⇒ Aλ ⊗ [A∗λ]·j = [A∗λ]·j
⇐⇒ A ⊗ [A∗λ]·j = λ ⊗ [A∗λ]·j
Therefore λ is an eigenvalue of A and the j-th column of A∗λ is an associated eigenvector for any j ∈ Vc(A).
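Theorem 1.29 is straightforward to check computationally. The sketch below (our code, with ε as float('-inf')) uses a hand-checked 2 × 2 example in which the maximal average circuit weight is λ = 4 (attained by the circuit through both vertices, so both are critical), and verifies that a column of A∗λ is an eigenvector of A.

```python
# Normalise A by λ, form A*_λ as a finite max-plus sum of powers, and check
# that a column indexed by a critical vertex is an eigenvector (our helpers).
EPS = float('-inf')

def mp_mul(X, Y):
    """Max-plus matrix product: [X ⊗ Y]_ij = max_k (x_ik + y_kj)."""
    n = len(X)
    return [[max(X[i][k] + Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mp_star(A):
    """A* = E ⊕ A ⊕ A^⊗2 ⊕ ... ⊕ A^⊗(n-1); valid when no circuit is positive."""
    n = len(A)
    S = [[(0 if i == j else EPS) for j in range(n)] for i in range(n)]  # E
    P = [row[:] for row in S]
    for _ in range(n - 1):
        P = mp_mul(P, A)
        S = [[max(S[i][j], P[i][j]) for j in range(n)] for i in range(n)]
    return S

A = [[2, 5],
     [3, 2]]               # maximal average circuit weight: (5 + 3)/2 = 4
lam = 4.0
A_lam = [[a - lam for a in row] for row in A]   # normalised matrix A_λ
v = [row[0] for row in mp_star(A_lam)]          # column 1; vertex 1 is critical
```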
Theorem 1.30. Let A ∈ Rn×n max be irreducible. Then A has a unique eigenvalue, denoted λ(A), which is finite and equal to the maximal average circuit weight in G(A).
Proof. Let the maximal average circuit weight in G(A) be denoted by λ. Since A is irreducible,
G(A) must contain a circuit and therefore λ is necessarily finite. Thus by Theorem 1.29 we
know that λ is an eigenvalue of A, and it remains to show uniqueness.
Let c = (j1, . . . , jl+1) be an arbitrary circuit in C(A) of length l = |c|l, with jl+1 = j1. Then ajk+1jk ≠ ε for all k ∈ {1, . . . , l}. Further, suppose that µ is an eigenvalue of A with an associated eigenvector v. Note that A is irreducible, so by Corollary 1.26 we have that µ ≠ ε. Now, since A ⊗ v = µ ⊗ v, it follows that
ajk+1jk ⊗ vjk ≤ µ ⊗ vjk+1, k ∈ {1, . . . , l}
and arguing as in Lemma 1.27 (replacing equalities with the appropriate inequalities), we see that the average weight of the circuit c satisfies
|c|w / |c|l ≤ (µ⊗l) / l = µ (1.5)
That is, µ ≥ λ (since (1.5) holds for all c ∈ C(A), and we already have that the maximal average circuit weight is λ). But by Lemma 1.27, µ is equal to the average weight of some circuit c ∈ C(A), and so µ ≤ λ also. Hence µ = λ, i.e. λ is the unique eigenvalue of A.
When A is large it is often difficult to identify the maximal average circuit weight in G(A) by inspection. There exist several numerical procedures for determining the eigenvalue of an irreducible matrix in max-plus, including Karp’s Algorithm and the Power Algorithm. However, none of these has a particularly attractive order of complexity: for example, the complexity of Karp’s Algorithm is of order n³, and the complexity of the Power Algorithm is not known precisely (see [11]). We do not have space here to describe the methods in detail; for more information readers are referred to chapter five of [13].
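As an illustration of what such a procedure looks like, here is a sketch of Karp’s Algorithm (our implementation, not taken from [11] or [13]), which computes the maximal average circuit weight of an irreducible matrix in O(n³) time. As before, A[i][j] holds the weight of the edge from j to i, with ε = float('-inf') for absent edges.

```python
# Karp's algorithm for the maximal average circuit weight (our sketch).
EPS = float('-inf')

def karp_max_cycle_mean(A):
    """Maximal cycle mean of G(A); A is assumed irreducible."""
    n = len(A)
    # D[k][v] = maximal weight of a path of length k from vertex 0 to v
    D = [[EPS] * n for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for v in range(n):
            D[k][v] = max(A[v][u] + D[k - 1][u] for u in range(n))
    # Karp's formula: max over v of min over k of (D_n(v) - D_k(v)) / (n - k)
    best = EPS
    for v in range(n):
        if D[n][v] == EPS:
            continue
        ratios = [(D[n][v] - D[k][v]) / (n - k)
                  for k in range(n) if D[k][v] > EPS]
        if ratios:
            best = max(best, min(ratios))
    return best
```

Applied to the 4 × 4 matrix of the worked example in Section 1.4.3, this returns the eigenvalue 6 found there by enumerating circuits.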
We end this section with a simple proposition that, while interesting in its own right, will come
in handy when we begin to look at the eigenspace.
Proposition 1.31. Let A ∈ Rn×n max be an irreducible matrix with eigenvalue λ and associated eigenvector v. We have that vi > ε for all i ∈ {1, . . . , n}.
Proof. Call the set of vertices of G(A) corresponding to the finite entries of v the support of v, denoted Z(v). Suppose that Z(v) does not contain all the elements of V(A). Since A is irreducible, there must be edges from the vertices in Z(v) to vertices not belonging to Z(v). Hence there exist vertices j ∈ Z(v), i ∉ Z(v) with aij ≠ ε. Then
[A ⊗ v]i ≥ aij ⊗ vj > ε
That is, Z(A ⊗ v) is strictly bigger than Z(v). But A ⊗ v = λ ⊗ v (and λ is finite by Theorem 1.30), so Z(v) and Z(A ⊗ v) should be equal. This is a contradiction, and so Z(v) must contain all the elements of V(A).
1.4.2 The Eigenspace
Let A ∈ Rn×n max have finite eigenvalue λ. In this part of our analysis we let V (A, λ) denote the
set of all eigenvectors of A associated with the eigenvalue λ, which we call the eigenspace of A
w.r.t. λ. If A is irreducible then by Theorem 1.30 we know that it has a unique eigenvalue, so
we can drop the dependence on λ and denote the eigenspace of A simply by V (A).
The main aim of this section is to find an expression that completely characterises the eigenspace of A. In Theorem 1.29 we established that [A∗λ]·j is an eigenvector of A for any j ∈ Vc(A), but are these the only eigenvectors (up to taking linear combinations, as discussed above)? We will eventually see that the answer to this question is yes, but first we require some intermediate steps.
Lemma 1.32. Let A ∈ Rn×n max have finite maximal average circuit weight λ. We have that A∗λ = (E ⊕ Aλ)⊗(n−1).
Proof. If n = 1 then the result is trivial. Otherwise, since E and Aλ commute, we can carry out the iterated multiplication (E ⊕ Aλ) ⊗ · · · ⊗ (E ⊕ Aλ) to obtain
(E ⊕ Aλ)⊗(n−1) = E ⊕ ⊕_{i=1}^{n−1} (A⊗iλ ⊕ · · · ⊕ A⊗iλ), with each A⊗iλ appearing (n−1 choose i) times (1.6)
Each power A⊗0λ, . . . , A⊗(n−1)λ occurs at least once, so by idempotency of ⊕, (1.6) becomes
(E ⊕ Aλ)⊗(n−1) = E ⊕ Aλ ⊕ A⊗2λ ⊕ · · · ⊕ A⊗(n−1)λ (1.7)
However, noting that every circuit in G(Aλ) must have non-positive weight, we can apply Lemma 1.20 to see that the right-hand side of (1.7) is equal to A∗λ. This completes the proof.
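Lemma 1.32 is convenient computationally, since it replaces the separate powers of Aλ by repeated multiplication of the single matrix E ⊕ Aλ. The identity can be sanity-checked numerically; the sketch below (our helper code) computes both sides for a small normalised matrix whose circuits all have non-positive weight.

```python
# Two ways of computing A*_λ (our code): the sum of powers, and the
# binomial form (E ⊕ A_λ)^⊗(n−1) from Lemma 1.32.
EPS = float('-inf')

def mp_mul(X, Y):
    n = len(X)
    return [[max(X[i][k] + Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mp_add(X, Y):
    return [[max(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[(0 if i == j else EPS) for j in range(n)] for i in range(n)]

def star_by_powers(A):                 # E ⊕ A ⊕ ... ⊕ A^⊗(n-1)
    n = len(A)
    S, P = identity(n), identity(n)
    for _ in range(n - 1):
        P = mp_mul(P, A)
        S = mp_add(S, P)
    return S

def star_by_binomial(A):               # (E ⊕ A)^⊗(n-1)
    n = len(A)
    B, R = mp_add(identity(n), A), identity(n)
    for _ in range(n - 1):
        R = mp_mul(R, B)
    return R

A_lam = [[-2, 1, EPS],
         [-1, -2, 0],
         [EPS, -3, -1]]   # every circuit has non-positive weight
```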
Lemma 1.33. Let A ∈ Rn×n max be an irreducible matrix, with eigenvalue λ and an associated eigenvector v. Then the matrix A∗λ has eigenvalue e, also with an associated eigenvector v.
Proof. Firstly, note that for any j ∈ {1, . . . , n}
[λ ⊗ v]j = [A ⊗ v]j ⇐⇒ vj = [A ⊗ v]j − λ ⇐⇒ e ⊗ vj = [Aλ ⊗ v]j
That is, e ⊗ v = Aλ ⊗ v, and v is also an eigenvector of Aλ (whose unique eigenvalue must be e by Theorem 1.30). Thus the eigenspaces V (A) and V (Aλ) coincide. Next, note that
(E ⊕ Aλ) ⊗ v = (E ⊗ v) ⊕ (Aλ ⊗ v) = v ⊕ v = v
Therefore, using Lemma 1.32:
A∗λ ⊗ v = (E ⊕ Aλ)⊗(n−1) ⊗ v = v = e ⊗ v
as required.
Definition 1.34. Let A ∈ Rn×n max be a matrix with eigenvalue λ and associated eigenvector v. The saturation graph of A with respect to λ, denoted Sλ(A, v), is the graph consisting of those edges (j, i) ∈ E(A) such that aij ⊗ vj = λ ⊗ vi, with vi, vj ≠ ε.
Recall that by definition, if v is an eigenvector of A then there exists at least one i ∈ {1, . . . , n} such that vi ≠ ε. Then, since A ⊗ v = λ ⊗ v we have that ⊕_{j=1}^{n} aij ⊗ vj = λ ⊗ vi, which implies that there exists (at least one) j ∈ {1, . . . , n} such that aij ⊗ vj = λ ⊗ vi. This value is finite (assuming λ ≠ ε), so we must have (j, i) ∈ E(Sλ(A, v)). That is, the saturation graph of A w.r.t. λ is never empty. Indeed, if A is irreducible, by Proposition 1.31 we know that vi > ε for all i ∈ {1, . . . , n}, and so by the same argument, Sλ(A, v) contains all the vertices in V(A). In this case we know that the eigenvalue λ is unique, and therefore we drop the dependence on λ and simply refer to the saturation graph of A.
Lemma 1.35. Let A ∈ Rn×n max be an irreducible matrix, with eigenvalue λ and associated eigenvector v. We have:
(i) For each vertex i ∈ V(A), there exists a circuit in S(A, v) from which vertex i can be
reached in a finite number of steps.
(ii) Any circuit in S(A, v) belongs to Gc(A).
Proof. (i) A is irreducible, so by Proposition 1.31 we know that vi > ε for all i ∈ {1, . . . , n}. Let i ∈ V(A), which by the discussion above we know is a vertex of the saturation graph S(A, v). Thus there is a vertex j such that λ ⊗ vi = aij ⊗ vj, i.e. (j, i) is an edge in S(A, v). Repeating this argument, we can identify a vertex k such that λ ⊗ vj = ajk ⊗ vk. Repeating this argument an arbitrary number of times, say, m, we get a path in S(A, v) of length m ending at vertex i. If m > n, the constructed path must contain a circuit, from which i can be reached.
(ii) Let c = (i1, i2, . . . , il+1) be a circuit of length l in S(A, v), with il+1 = i1. By definition, for all k ∈ {1, . . . , l} we have that
λ ⊗ vik+1 = aik+1ik ⊗ vik
which implies that
λ⊗l ⊗ vi1 = (⊗_{k=1}^{l} aik+1ik) ⊗ vi1
Hence, recalling that vi1 is finite:
λ⊗l = ⊗_{k=1}^{l} aik+1ik
But the right-hand side is simply equal to the weight of the circuit c, which thus has average weight λ. But A is irreducible, so by Theorem 1.30 λ is equal to the maximal average circuit weight in G(A). Thus c is critical, and belongs to Gc(A).
Lemma 1.36. Let A ∈ Rn×n max be an irreducible matrix, with eigenvalue λ and associated eigenvector v. Then v can be written as
v = ⊕_{j∈Vc(A)} αj ⊗ [A∗λ]·j
for some αj ∈ Rmax, j ∈ Vc(A).
Proof. Consider two vertices i, j in S(Aλ, v) such that there exists a path from i to j, say, (i1, i2, . . . , il+1), with i1 = i and il+1 = j. Then by definition of the saturation graph, this gives
[Aλ]ik+1ik ⊗ vik = vik+1, k ∈ {1, . . . , l}
Hence vj = a ⊗ vi, where a is given by
a = ⊗_{k=1}^{l} [Aλ]ik+1ik ≤ [A⊗lλ]ji ≤ [A∗λ]ji (1.8)
Now, using that vj = a ⊗ vi, for an arbitrary vertex ν ∈ {1, . . . , n}:
[A∗λ]νj ⊗ vj = [A∗λ]νj ⊗ a ⊗ vi
≤ [A∗λ]νj ⊗ [A∗λ]ji ⊗ vi (by (1.8))
≤ [A∗λ]νi ⊗ vi (1.9)
where the last inequality follows from Proposition 1.21. By applying Lemma 1.35, for any vertex j in S(Aλ, v) there exists a vertex i = i(j) ∈ Vc(A) from which j can be reached within S(Aλ, v). Inequality (1.9) therefore implies
⊕_{j∈S(Aλ,v)} [A∗λ]νj ⊗ vj ≤ ⊕_{i∈Vc(Aλ)} [A∗λ]νi ⊗ vi (1.10)
and this holds for any ν ∈ {1, . . . , n}.
Now, by Lemma 1.33, A∗λ has eigenvalue e with an associated eigenvector v, i.e. v = A∗λ ⊗ v. The value of vν is equal to [A∗λ]νj ⊗ vj for some j, which by definition has to be in the saturation graph S(Aλ, v). Thus it holds for ν ∈ {1, . . . , n} that
vν = ⊕_{j∈S(Aλ,v)} [A∗λ]νj ⊗ vj ≤ ⊕_{j∈Vc(Aλ)} [A∗λ]νj ⊗ vj (by (1.10))
On the other hand, since v is an eigenvector of A∗λ associated with the eigenvalue e,
vν = [A∗λ ⊗ v]ν = ⊕_{j=1}^{n} [A∗λ]νj ⊗ vj ≥ ⊕_{i∈Vc(Aλ)} [A∗λ]νi ⊗ vi
which also holds for any ν ∈ {1, . . . , n}. Thus we have shown
vν = ⊕_{i∈Vc(Aλ)} [A∗λ]νi ⊗ vi
and since Vc(Aλ) = Vc(A) (see the proof of Theorem 1.29), the proof is complete.
The lemma above shows that for an irreducible matrix A, the vectors [A∗λ]·j, with j ∈ Vc(A), constitute a generating set for the eigenspace of A. Notice that in the proof we have actually identified the coefficients αj to which we referred in the statement of the lemma. If some of the columns of A∗λ are colinear then the αj’s are non-unique and some can be chosen to equal ε.
We have now done most of the work in characterising the eigenspace of an irreducible matrix.
We now require a small extension of our notation and one more lemma before we are able to
give a complete expression for the eigenspace, and we will end this section by referring to a
theorem which shows that it is not possible to simplify this expression any further.
Notation. Recall that the critical classes of a matrix A ∈ Rn×n max are the maximal strongly connected subgraphs of Gc(A). Let Nc(A) denote the number of critical classes of A, so Nc(A) ∈ N. For r ∈ {1, . . . , Nc(A)}, let Gcr(A) = (Vcr(A), Ecr(A)) denote the r-th critical class of A and let jcr := min{j ∈ Vcr(A)} be the smallest numbered vertex in the r-th critical class. We call {jc1, . . . , jcNc(A)} a set of representative vertices of the critical classes of A.
Note that in the way defined above, the set of representative vertices is unique. However, this is not important: in general, a representative vertex jcr of the r-th critical class of A can be any j ∈ Vcr(A).
Lemma 1.37. Let A ∈ Rn×n max be an irreducible matrix with eigenvalue λ. Then for i, j ∈ Vc(A), there exists α ∈ Rmax \ {ε} such that
α ⊗ [A∗λ]·i = [A∗λ]·j
iff i and j are members of the same critical class.
Proof. Suppose that i, j ∈ Vc(A) are members of the same critical class of Aλ. Then i and j communicate with each other in the critical graph, i.e. there is a path from i to j and a path from j to i in Gc(Aλ), which together form a circuit through both vertices. As we have argued before (see Theorem 1.29), any circuit in Gc(Aλ) must have weight e, and since [A∗λ]ji and [A∗λ]ij are bounded below by the weights of these two paths, we have
[A∗λ]ji ⊗ [A∗λ]ij ≥ e (1.11)
Now by a previous observation we know that [A∗λ]jj = e, and by Proposition 1.21 we have that A∗λ = A∗λ ⊗ A∗λ. Therefore we also have
[A∗λ]ji ⊗ [A∗λ]ij ≤ ⊕_{l=1}^{n} [A∗λ]jl ⊗ [A∗λ]lj = [A∗λ ⊗ A∗λ]jj = [A∗λ]jj = e (1.12)
and from (1.11) and (1.12) we conclude that [A∗λ]ji ⊗ [A∗λ]ij = e. Thus for all l ∈ {1, . . . , n}:
[A∗λ]li ⊗ [A∗λ]ij ≤ [A∗λ]lj = [A∗λ]lj ⊗ [A∗λ]ji ⊗ [A∗λ]ij ≤ [A∗λ]li ⊗ [A∗λ]ij
and therefore [A∗λ]lj = [A∗λ]li ⊗ [A∗λ]ij. Hence the statement of the lemma has been proved, with α = [A∗λ]ij.
Conversely, suppose now that i, j ∈ Vc(A) do not belong to the same critical class, and suppose for contradiction that we can find α ∈ Rmax \ {ε} such that α ⊗ [A∗λ]·i = [A∗λ]·j. The i-th and j-th components of this equation read
α ⊗ e = [A∗λ]ij and α ⊗ [A∗λ]ji = e
respectively, from which it follows that
[A∗λ]ij ⊗ [A∗λ]ji = e
Therefore the maximal-weight paths from j to i and from i to j together form a circuit through i and j of average weight e, which therefore belongs to Gc(Aλ). Thus vertices i and j are members of the same critical class (since they communicate with each other in the critical graph), which is a contradiction.
Theorem 1.38. Let A ∈ Rn×n max be an irreducible matrix with (unique) eigenvalue λ. The eigenspace of A is given by
V (A) = { ⊕_{r=1}^{Nc(A)} αr ⊗ [A∗λ]·jcr : αr ∈ Rmax, at least one αr finite }
for any set of representative vertices {jc1, . . . , jcNc(A)} of the critical classes of A.
Proof. By Lemma 1.36 we know that any eigenvector of A is a linear combination of the columns [A∗λ]·j, for j ∈ Vc(A). However, by Lemma 1.37 we know that the columns [A∗λ]·j for j in some critical class Vcr(A) are all colinear. Therefore to build any eigenvector we only need one column corresponding to each critical class, and so it suffices to take the sum over a set of representative vertices of the critical classes of A.
Theorem 1.39. No column [A∗λ]·i, for i ∈ Vc(A), can be expressed as a linear combination of the columns [A∗λ]·jcr, where jcr varies over the representative vertices of critical classes distinct from that of i.
Proof. The proof of this statement requires substantial groundwork which we do not have the
space to include. For all the details and a full proof, readers are referred to theorem 3.101 in
[2].
Theorem 1.39 above tells us that we cannot simplify any further the expression for V (A) given in Theorem 1.38. It also tells us that for an irreducible matrix A, the columns [A∗λ]·jcr, where {jc1, . . . , jcNc(A)} is a set of representative vertices of the critical classes of A, form a basis for the eigenspace V (A).
1.4.3 A Worked Example
Consider the matrix
A =
ε −2 ε 6
1 ε 4 ε
ε 8 ε ε
ε 5 ε 6
Thus G(A) is as shown in Figure 1.1.
Figure 1.1: Communication graph of the matrix A given above. Vertices are represented as circles and numbered 1–4 by convention. Edges are present only if the corresponding entry in A is finite, in which case this value specifies the edge weight.
We can see that G(A) is strongly connected, so A is irreducible. Thus by Theorem 1.30, A has
a unique eigenvalue λ given by the maximal average circuit weight in G(A). The elementary
circuits and their average weights are
c1 = (1, 2, 1) |c1|w/|c1|l = (1 ⊗ −2)/2 = −0.5
c2 = (1, 2, 4, 1) |c2|w/|c2|l = (1 ⊗ 5 ⊗ 6)/3 = 4
c3 = (2, 3, 2) |c3|w/|c3|l = (8 ⊗ 4)/2 = 6
c4 = (4, 4) |c4|w/|c4|l = (6)/1 = 6
and therefore λ = max{−0.5, 4, 6, 6} = 6. Circuits c3 and c4 are critical, so the critical graph
Gc(A) is as shown in Figure 1.2.
Figure 1.2: Critical graph of the matrix A given above. Both circuits have the maximal average weight of 6. The other circuits present in Figure 1.1 are no longer included because they are not critical (their average weight is not maximal).
We can see that Vc(A) = {2, 3, 4}, and Gc(A) has two critical classes with vertex sets Vc1(A) = {2, 3} and Vc2(A) = {4} respectively. Thus {jc1 = 2, jc2 = 4} is a set of representative vertices of the critical classes of A. Now, using that [Aλ]ij = aij − λ, we have
Aλ =
ε −8 ε e
−5 ε −2 ε
ε 2 ε ε
ε −1 ε e
and either by inspection of G(Aλ), or by using Lemma 1.17 and computing A⊗1λ, A⊗2λ, A⊗3λ and A⊗4λ, we can see that
A+λ =
−6 −1 −3 e
−5 e −2 −5
−3 2 e −3
−6 −1 −3 e
Similarly, by using Lemma 1.20 (or by simply replacing any negative diagonal entries in A+λ above by e), we obtain
A∗λ =
e −1 −3 e
−5 e −2 −5
−3 2 e −3
−6 −1 −3 e
Now by Theorems 1.38 and 1.39, the columns [A∗λ]·2 and [A∗λ]·4 form a basis for the eigenspace of A, i.e.
V (A) = { α1 ⊗ (−1, e, 2, −1)⊤ ⊕ α2 ⊗ (e, −5, −3, e)⊤ : α1, α2 ∈ Rmax, at least one αr finite }
For example, if we take α1 = −2, α2 = 1 we get
v := (−2 ⊗ (−1, e, 2, −1)⊤) ⊕ (1 ⊗ (e, −5, −3, e)⊤) = (−3, −2, e, −3)⊤ ⊕ (1, −4, −2, 1)⊤ = (1, −2, e, 1)⊤
eigenvalue λ = 6:
A ⊗ v =
ε −2 ε 6
1 ε 4 ε
ε 8 ε ε
ε 5 ε 6
1
−2
e
1
=
7
4
6
7
= 6 ⊗
1
−2
e
1
= λ ⊗ v
Finally, we can observe that
[A∗λ]·3 = (−3, −2, e, −3)⊤ = −2 ⊗ (−1, e, 2, −1)⊤ = −2 ⊗ [A∗λ]·2
That is, columns [A∗λ]·2 and [A∗λ]·3 are scalar multiples of each other, which we would expect (see Lemma 1.37) since vertices 2 and 3 are in the same critical class.
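The whole worked example can be reproduced in a few lines of code. The sketch below (our helper code, with ε as float('-inf')) recomputes A∗λ and confirms both the eigenvector property of column 2 and the colinearity of columns 2 and 3.

```python
# Reproducing the worked example numerically (our helper code).
EPS = float('-inf')

def mp_mul(X, Y):
    """Max-plus matrix product: [X ⊗ Y]_ij = max_k (x_ik + y_kj)."""
    n = len(X)
    return [[max(X[i][k] + Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[EPS, -2, EPS, 6],
     [1, EPS, 4, EPS],
     [EPS, 8, EPS, EPS],
     [EPS, 5, EPS, 6]]
lam = 6
A_lam = [[a - lam if a > EPS else EPS for a in row] for row in A]

# A⁺_λ = A_λ ⊕ A_λ^⊗2 ⊕ A_λ^⊗3 ⊕ A_λ^⊗4 (Lemma 1.17), then A*_λ = E ⊕ A⁺_λ.
P, plus = A_lam, [row[:] for row in A_lam]
for _ in range(3):
    P = mp_mul(P, A_lam)
    plus = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(plus, P)]
star = [[max(plus[i][j], 0 if i == j else EPS) for j in range(4)]
        for i in range(4)]
```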
1.5 Recurrence Relations & Periodicity
1.5.1 Solving Max-Plus Recurrence Relations
In many of the applications discussed in Chapters 2 and 3 we will need to solve recurrence
relations over the max-plus semiring. A key insight in doing this is to view implicit first-order
recurrence relations of the form x(k + 1) = (A ⊗ x(k + 1)) ⊕ (B ⊗ x(k)) as a system of max-plus
linear equations x = (A ⊗ x) ⊕ b. The result below uses the ∗ operator (see Definition 1.19) to
solve systems of this form.
Theorem 1.40. Let A ∈ Rn×n max and b ∈ Rn max. If the communication graph G(A) has no circuit
with positive average weight, then the equation
x = (A ⊗ x) ⊕ b (1.13)
has the solution x = A∗ ⊗ b. Furthermore, if all the circuit weights in G(A) are negative, then
this solution is unique.
Proof. By Lemma 1.20 we know that A∗ exists. We therefore have
A∗ ⊗ b = (⊕_{k=0}^{∞} A⊗k) ⊗ b
= ((⊕_{k=1}^{∞} A⊗k) ⊗ b) ⊕ (E ⊗ b)
= (A ⊗ ((⊕_{k=0}^{∞} A⊗k) ⊗ b)) ⊕ (E ⊗ b)
= (A ⊗ (A∗ ⊗ b)) ⊕ b
and therefore A∗ ⊗ b is indeed a solution of (1.13). To show uniqueness, suppose that x is a solution of x = b ⊕ (A ⊗ x); then we can substitute the expression for x back into the right-hand side of the equation to obtain
x = b ⊕ (A ⊗ b) ⊕ (A⊗2 ⊗ x)
Repeating this procedure yields
x = b ⊕ (A ⊗ b) ⊕ (A⊗2 ⊗ b) ⊕ (A⊗3 ⊗ x)
= · · ·
= b ⊕ (A ⊗ b) ⊕ · · · ⊕ (A⊗(k−1) ⊗ b) ⊕ (A⊗k ⊗ x)
= (⊕_{l=0}^{k−1} A⊗l ⊗ b) ⊕ (A⊗k ⊗ x) (1.14)
By Theorem 1.16, the entries of A⊗k are the maximal weights of paths of length k. For k large enough, these paths necessarily contain elementary circuits, which have negative weight by assumption. Indeed, as k → ∞ the number of elementary circuits in these paths also necessarily tends to ∞, and so the elements of A⊗k tend to ε. Hence, letting k → ∞ in (1.14) gives that x = A∗ ⊗ b (where once again we have applied Lemma 1.20), as required.
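A minimal computational sketch of Theorem 1.40 (our code): for a 2 × 2 matrix whose only circuit has negative weight, x = A∗ ⊗ b solves x = (A ⊗ x) ⊕ b, and the test below checks the fixed-point property directly.

```python
# Solving x = (A ⊗ x) ⊕ b as x = A* ⊗ b (our helper code).
EPS = float('-inf')

def mp_mul_vec(A, x):
    """[A ⊗ x]_i = max_j (a_ij + x_j)."""
    return [max(a + t for a, t in zip(row, x)) for row in A]

def mp_star(A):                        # A* = E ⊕ A ⊕ ... ⊕ A^⊗(n-1)
    n = len(A)
    S = [[(0 if i == j else EPS) for j in range(n)] for i in range(n)]
    P = [row[:] for row in S]
    for _ in range(n - 1):
        P = [[max(P[i][k] + A[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        S = [[max(S[i][j], P[i][j]) for j in range(n)] for i in range(n)]
    return S

A = [[EPS, -1],
     [-2, EPS]]                        # the only circuit has weight -3 < 0
b = [0, 1]
x = mp_mul_vec(mp_star(A), b)          # the (unique) solution
```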
As we mentioned above, Theorem 1.40 can be applied to the implicit recurrence relation
x(k + 1) = (A ⊗ x(k + 1)) ⊕ (B ⊗ x(k))
to yield the explicit recurrence relation
x(k + 1) = A∗ ⊗ B ⊗ x(k)
and this technique will be used several times in Chapter 2. However, can we extend this theory?
In many applications we will encounter systems whose dynamics follow a recurrence relation of order higher than one. Consider the most general (implicit) linear recurrence relation of order M ≥ 1:
x(k) = ⊕_{m=0}^{M} Am ⊗ x(k − m), k ≥ 0 (1.15)
Here, A0, . . . , AM ∈ Rn×n max and the initial conditions x(m) ∈ Rn max, −M ≤ m ≤ −1, are given. We show below that we can transform (1.15) into a first-order recurrence relation of the form x̂(k + 1) = Â ⊗ x̂(k), provided that G(A0) has no circuit of positive weight.
To begin, set
b(k) = ⊕_{m=1}^{M} Am ⊗ x(k − m)
Then (1.15) becomes
x(k) = (A0 ⊗ x(k)) ⊕ b(k) (1.16)
Then, since G(A0) has no circuit of positive weight by assumption, we can apply Theorem 1.40 to write (1.16) as
x(k) = A∗0 ⊗ b(k) = (A∗0 ⊗ A1 ⊗ x(k − 1)) ⊕ · · · ⊕ (A∗0 ⊗ AM ⊗ x(k − M)) (1.17)
Note that we have now changed the implicit M-th order recurrence relation (1.15) into the explicit M-th order recurrence relation (1.17) (the x(k) term does not feature on the right-hand side). To finish the job, we set
x̂(k) := (x⊤(k − 1), x⊤(k − 2), . . . , x⊤(k − M))⊤
and (with E denoting the n × n identity matrix and 𝓔 the n × n matrix of all εs):
Â :=
A∗0 ⊗ A1   A∗0 ⊗ A2   · · ·   A∗0 ⊗ AM−1   A∗0 ⊗ AM
E          𝓔          · · ·   𝓔            𝓔
𝓔          E          · · ·   𝓔            𝓔
· · ·
𝓔          𝓔          · · ·   E            𝓔
Then (1.15) can be written as
x̂(k + 1) = Â ⊗ x̂(k), k ≥ 0 (1.18)
which is what we were aiming for.
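The construction above is mechanical and easy to implement. The sketch below (our code) builds the companion matrix, taking the first-block-row matrices A∗0 ⊗ Am as given (in the scalar demo A0 is the all-ε matrix, so A∗0 = E and the blocks are just A1, A2), and checks the resulting first-order iteration against the original second-order recurrence.

```python
# Build Â for x̂(k+1) = Â ⊗ x̂(k) (our sketch of the construction above).
EPS = float('-inf')

def companion(blocks):
    """blocks = [A*0⊗A1, ..., A*0⊗AM], each n×n; returns the nM×nM matrix Â."""
    n, M = len(blocks[0]), len(blocks)
    Ahat = [[EPS] * (n * M) for _ in range(n * M)]
    for m, Am in enumerate(blocks):            # first block row
        for i in range(n):
            for j in range(n):
                Ahat[i][m * n + j] = Am[i][j]
    for r in range(1, M):                      # identity blocks on subdiagonal
        for i in range(n):
            Ahat[r * n + i][(r - 1) * n + i] = 0
    return Ahat

# Scalar recurrence x(k) = max(1 + x(k-1), 3 + x(k-2)), with x(-1) = x(0) = 0.
Ahat = companion([[[1]], [[3]]])
xhat, seq = [0, 0], []
for _ in range(5):
    xhat = [max(a + t for a, t in zip(row, xhat)) for row in Ahat]
    seq.append(xhat[0])                        # first block of x̂(k+1) is x(k)
```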
It will come as no surprise that problems of this form are closely related to the concept of
eigenvalues and eigenvectors studied in the previous section. For example, if the recurrence
relation x(k + 1) = A ⊗ x(k) is given the initial condition x(0), where x(0) is an eigenvector of
A with corresponding eigenvalue λ, then the solution x(k) is given by x(k) = λ⊗k ⊗ x(0). It
could then be said that the solution is periodic. The final section of this chapter explores the
limiting behaviour of the solution x(k) when the system is initialised with an arbitrary vector
x(0), and in particular whether we can say anything about its periodicity in general.
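The eigenvector-initialised case is easy to observe numerically. The sketch below (our code) iterates x(k + 1) = A ⊗ x(k) for the 4 × 4 matrix of Section 1.4.3, starting from the eigenvector found there, and checks that every entry increases by exactly λ = 6 per step, i.e. x(k) = λ⊗k ⊗ x(0).

```python
# Iterating x(k+1) = A ⊗ x(k) from an eigenvector initial condition (our code).
EPS = float('-inf')

A = [[EPS, -2, EPS, 6],
     [1, EPS, 4, EPS],
     [EPS, 8, EPS, EPS],
     [EPS, 5, EPS, 6]]
lam = 6
x = [1, -2, 0, 1]                      # eigenvector of A found in Section 1.4.3

trajectory = [x]
for _ in range(3):
    x = [max(a + t for a, t in zip(row, x)) for row in A]
    trajectory.append(x)               # each step adds λ to every entry
```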
1.5.2 Limiting Behaviour
In this section we state and prove a theorem which establishes an important result on the
asymptotic behaviour of the powers of an irreducible matrix A in terms of its unique eigenvalue
λ. In simple terms, this theorem says that sequential powers of A always exhibit periodic
behaviour after a finite number of steps. We will then apply this result to the recurrence
relations we studied in the previous section. It turns out that the periodicity depends on a
quantity known as the cyclicity of A, which we define below in two steps.
Definition 1.41. The cyclicity of a graph G, denoted σG, is defined as follows:
• If G is strongly connected, then its cyclicity equals the greatest common divisor of the
lengths of all the elementary circuits in G. If G consists of just one vertex without a
self-loop, then its cyclicity is taken to be 1.
• If G is not strongly connected, then its cyclicity equals the least common multiple of the
cyclicities of all the maximal strongly connected subgraphs of G.
Definition 1.42. The cyclicity of a matrix A ∈ Rn×n max, denoted σ(A), is equal to σGc(A), the cyclicity of the critical graph of A.
If A is a square matrix over Rmax then we often talk of the graph cyclicity and matrix cyclicity of A, where the graph cyclicity refers to the cyclicity of the communication graph G(A) and the matrix cyclicity to σ(A), the cyclicity of the critical graph Gc(A).
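The cyclicity of a strongly connected graph can be computed without enumerating circuits: fix a BFS labelling d from any root; the cyclicity is then the gcd of d(u) + 1 − d(v) over all edges u → v (a standard graph-theoretic fact, not proved in the thesis). The sketch below (our code) applies this to the two critical classes of the Section 1.4.3 example, giving σ(A) = lcm(2, 1) = 2.

```python
# Cyclicity of a strongly connected component via a BFS labelling (our code).
from collections import deque
from math import gcd

def scc_cyclicity(vertices, edges):
    """Cyclicity of one strongly connected component (vertex set + edge list)."""
    adj = {u: [] for u in vertices}
    for u, v in edges:
        adj[u].append(v)
    root = next(iter(vertices))
    d, q = {root: 0}, deque([root])
    while q:                               # BFS distances from the root
        u = q.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                q.append(v)
    g = 0
    for u, v in edges:                     # gcd over all edges of d(u)+1-d(v)
        g = gcd(g, abs(d[u] + 1 - d[v]))
    return g if g else 1

# Critical classes of the Section 1.4.3 example: {2, 3} and {4}.
c1 = scc_cyclicity({2, 3}, [(2, 3), (3, 2)])
c2 = scc_cyclicity({4}, [(4, 4)])
sigma = c1 * c2 // gcd(c1, c2)             # lcm of class cyclicities = σ(A)
```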
It may seem strange to define the cyclicity of a matrix A via its critical graph and not its communication graph. However, as we will see below, it turns out that the former quantity determines the periodic behaviour of the powers of A, so the reason for this choice should be clear.
Before proving the main theorem of this section we require several preliminary results. The first
one is an important lemma from graph theory, which we explore below.
Lemma 1.43. Let A ∈ Rn×n max be an irreducible matrix, and let the cyclicity of its communication graph be σG. Then, after a suitable relabelling of the vertices of G(A), the matrix A⊗σG corresponds to a block diagonal matrix with σG blocks on the diagonal. The communication graph of each diagonal block is strongly connected and has cyclicity one. Moreover, the eigenvalues of all diagonal blocks have the same value.
Proof. For i, j ∈ V(A), define the relation
i ∼ j ⇐⇒ the length of every path from i to j in G(A) is a multiple of σG.
It is easy to show that this is an equivalence relation on V(A). Therefore if k0 ∈ V(A) is fixed, we can introduce equivalence classes C0, C1, . . . , CσG−1 as
i ∈ Cl ⇐⇒ every path from k0 to i in G(A) has length (mod σG) equal to l, (1.19)
for l = 0, 1, . . . , σG − 1. Then for i, j ∈ V(A), we have that i ∼ j ⇐⇒ i, j ∈ Cl for some l ∈ {0, 1, . . . , σG − 1}.
Assume that there is a path from i to j of length σG. By definition of cyclicity, the length of any circuit starting and finishing at i must be divisible by σG, so there must also be a path from j to i whose length is a multiple of σG. Therefore every path from i to j must have a length that is a multiple of σG (since if not, we could use such a path to create a circuit whose length is not divisible by σG). Hence, every path of length σG must start and end in the same equivalence class as defined in (1.19). Since A⊗σG can be computed by considering all paths of length σG in G(A) (see Theorem 1.16), it follows that A⊗σG is block-diagonal, possibly after an appropriate relabelling of the vertices according to the classes C0, C1, . . . , CσG−1; for instance, by first labelling all vertices in C0, then all the vertices in C1, and so on.
Now let l ∈ {0, 1, . . . , σG − 1}. From our remark above we know that if i, j ∈ Cl then i ∼ j, i.e.
the length of every path from i to j is a multiple of σG. Since A is irreducible there must be at
least one such path, which can be split up into a number of subpaths, all of length σG and going
from one vertex in Cl to another vertex in Cl. It follows that the block of A⊗σG corresponding
to class Cl is irreducible.
Next, note that every circuit in G(A) must go through all the equivalence classes C0, C1, . . . , CσG−1. To see this, suppose there is a circuit going through just τ of the classes, where τ < σG. Then there must be a class Cl and vertices i, j ∈ Cl such that there is a path from i to j of length less than or equal to τ. This is a contradiction, since the length of a path between vertices in the same class must be a multiple of σG. Hence the number of circuits in G(A) is the same as the number of circuits going through any class Cl. Observe that circuits in G(A) of length κ × σG can be associated with circuits in G(A⊗σG) of length κ. Since the greatest common divisor of the lengths of all the circuits in G(A) is σG, it follows that the communication graph of the block of A⊗σG corresponding to class Cl has cyclicity one.
Finally, the fact that the eigenvalues of the diagonal blocks are identical follows immediately
from the irreducibility of A.
Corollary 1.44. Under the conditions of Lemma 1.43, let τ be a multiple of σG. Then, after
a relabelling of the vertices of G(A), the matrix A⊗τ corresponds to a block diagonal matrix
with σG blocks on the diagonal. The communication graph of each diagonal block is strongly
connected and has cyclicity one.
Proof. This follows along the same lines as the proof of Lemma 1.43.
Let A ∈ Rn×n max be an irreducible matrix and let Gc(A) be its critical graph. Define the critical matrix of A, denoted Ac, to be the submatrix of A such that the communication graph of Ac is equal to the critical graph of A, i.e. G(Ac) = Gc(A). The matrix Ac can be obtained from A by restricting A to those entries that correspond to edges in Gc(A). Clearly the critical graph of Ac is the same as its communication graph, i.e. Gc(Ac) = G(Ac), and therefore σGc(Ac) = σG(Ac). It then follows that the cyclicity of the matrix Ac is equal to the cyclicity of the communication graph G(Ac) (i.e. σ(Ac) = σG(Ac)); that is, for the critical matrix Ac both types of cyclicity coincide and are equal to σ(A). We know that G(Ac) = Gc(A) = Gc(Ac), but we can prove more:
Lemma 1.45. Let A be an irreducible matrix, and let Ac be its corresponding critical matrix. Then, for all k ≥ 1 we have
G((Ac)⊗k) = Gc(A⊗k) = Gc((Ac)⊗k).
Proof. As we noted above, Ac is a submatrix of A, and therefore (Ac)⊗k is a submatrix of A⊗k.
Furthermore, note that Gc(·) is a subgraph of G(·), which we shall denote Gc(·) ⊆ G(·). It follows
that Gc((Ac)⊗k) ⊆ Gc(A⊗k) and Gc((Ac)⊗k) ⊆ G((Ac)⊗k).
To prove the converse inclusions, note that any edge in G(A⊗k) from vertex i to vertex j
corresponds to a path in G(A) of length k from vertex i to vertex j. Thus if a number of edges
in G(A⊗k) form a circuit of length l, then the corresponding paths in G(A) form a circuit of
length k×l. Conversely, consider a circuit in G(A), choose any vertex on the circuit and traverse
the circuit with steps of length k until the chosen vertex is reached again. If l such steps are
required then there is a corresponding circuit in G(A⊗k) of length l. In the same way, critical
circuits in G(A⊗k) of length l correspond to critical circuits in G(A) of length k × l, and vice
versa.
If c is a critical circuit of length l in G(A⊗k) then there is a corresponding critical circuit c′ of length k × l in G(A). This circuit c′ must be in Gc(A) (because it is critical), which in turn implies that c is a critical circuit in G((Ac)⊗k). Hence, it follows that Gc((Ac)⊗k) ⊇ Gc(A⊗k). The other inclusion is proved in the same way.
Lemma 1.46. Let A ∈ Rn×n max be an irreducible matrix with cyclicity σ = σ(A). Then the
cyclicity of the matrix A⊗σ is equal to one.
Proof. Firstly, suppose the critical matrix Ac is irreducible. By the remarks prior to Lemma 1.45
we know that the cyclicity of Ac and that of its communication graph is equal to σ, so by Lemma
1.43, after a suitable relabelling of vertices, (Ac)⊗σ corresponds to a block diagonal matrix with
square diagonal blocks that are irreducible and have graph cyclicity one. However, by Lemma
1.45 with k = σ, we have that Gc((Ac)⊗σ) = G((Ac)⊗σ), and therefore the communication graph
of each of the diagonal blocks of (Ac)⊗σ coincides with its critical graph. Thus for each diagonal
block both cyclicities coincide, and therefore both are one.
If Ac is reducible then the same process can be done for each of the critical classes of Gc(A) with
their individual cyclicities. According to Definition 1.41, the least common multiple of these
cyclicities equals σ, the matrix cyclicity of A. Noting that σ is a multiple of σG(A), it follows
from Corollary 1.44 that each diagonal block of (Ac)⊗σ corresponds to a block diagonal matrix
with square diagonal blocks that are irreducible and have cyclicity one. Note that if Gc(A) does
not cover all the vertices of G(A) then we must augment the overall block diagonal matrix with
a square block with entries equal to ε in order to keep it the same size as the original matrix A.
In both cases it follows that each diagonal block of the block diagonal matrix corresponding to
(Ac)⊗σ is irreducible and has cyclicity one. Taking the least common multiple of all cyclicities,
this means that the cyclicity of the whole matrix (Ac)⊗σ is equal to one, and therefore the graph
cyclicity of Gc((Ac)⊗σ) is also equal to one. But by Lemma 1.45 with k = σ, this graph is the
same as Gc(A⊗σ), which therefore must also have cyclicity one. Thus A⊗σ has matrix cyclicity
one, which completes the proof.
We now state a fundamental theorem, the proof of which can be found in [4].
Theorem 1.47. Let β1, . . . , βq ∈ N be such that gcd{β1, . . . , βq} = 1. Then there exists N ∈ N
such that for all k ≥ N there exist n1, . . . , nq ∈ N0 such that k = (n1 × β1) + · · · + (nq × βq).
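Theorem 1.47 can be checked computationally for small instances. The following sketch (the helper name is ours, not from the thesis) finds the smallest such N for a given set of βi by marking which integers are representable as non-negative combinations:

```python
from math import gcd
from functools import reduce

def smallest_threshold(betas, limit=1000):
    """Find the least N such that every k >= N (up to `limit`) is a
    non-negative integer combination n1*b1 + ... + nq*bq of the betas."""
    assert reduce(gcd, betas) == 1  # hypothesis of Theorem 1.47
    representable = [False] * (limit + 1)
    representable[0] = True  # the empty combination
    for k in range(1, limit + 1):
        representable[k] = any(k >= b and representable[k - b] for b in betas)
    # one past the largest non-representable integer
    last_gap = max(k for k in range(limit + 1) if not representable[k])
    return last_gap + 1

print(smallest_threshold([3, 5]))  # 8: every k >= 8 is n1*3 + n2*5
```

For two coprime values β1, β2 the answer is the classical Frobenius bound (β1 − 1)(β2 − 1); for more values no closed form exists, which is why the theorem only asserts existence of N.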
We finally state and prove one last prerequisite result which is essentially a special case of the
theorem that follows. It turns out that the generalisation is relatively straightforward, so in
proving this lemma we will have done most of the work in proving the main result.
Lemma 1.48. Let A ∈ Rn×n max be an irreducible matrix with unique eigenvalue e and cyclicity
one. Then there exists N ∈ N such that
A⊗(k+1) = A⊗k
for all k ≥ N.
Proof. The proof comes in three stages. We show that there exists N ∈ N such that for all
k ≥ N:
1. [A⊗k]ii = [A+]ii = e for all i ∈ Vc(A),
2. [A⊗k]ij = [A+]ij for all i ∈ Vc(A) and j ∈ {1, . . . , n},
3. [A⊗k]ij = ⊕l∈Vc(A) [A+]il ⊗ [A+]lj for all i, j ∈ {1, . . . , n}.
The result then follows immediately from statement 3 since the right hand side does not depend
on k.
Statement 1. Consider i ∈ Vc(A). Then there is a critical class of Gc(A), say Gc1(A) =
(Vc1(A), Ec1(A)), such that i ∈ Vc1(A). Since the cyclicity of matrix A is one, it follows that
the cyclicity of graph Gc1(A) is equal to one too. Hence there exist circuits in Gc1(A), say
c1, . . . , cq, whose lengths have a greatest common divisor equal to one. Since Gc1(A) is a critical
class it must be strongly connected, and therefore there exists a circuit α in Gc1(A) that passes
through i and through all circuits c1, . . . , cq (i.e. α ∩ cj ≠ ∅ for all j = 1, . . . , q).
Now, by Theorem 1.47, there exists N ∈ N such that for each k ≥ N, there exist n1, . . . , nq ∈ N0
such that
k = |α|l + (n1 × |c1|l) + · · · + (nq × |cq|l).
For these n1, . . . , nq, we can construct a circuit passing through i, built from circuit α, n1 copies
of circuit c1, n2 copies of circuit c2 and so on, up to nq copies of circuit cq. Clearly this circuit
is in Gc1(A), so it must be critical with weight e. Since the maximal average circuit weight in
G(A) is e, it follows that [A⊗k]ii = e for all k ≥ N, which, by the definition of A+, also implies
that [A+]ii = e, as required.
Statement 2. By the definition of A+ there exists l ∈ N such that [A⊗l]ij = [A+]ij. In fact, since
the eigenvalue of A is e, it follows from Lemma 1.17 that l ≤ n. From statement 1, for k large
enough, i ∈ Vc(A) and j ∈ {1, . . . , n}, we then have
[A⊗(k+l)]ij ≥ [A⊗k]ii ⊗ [A⊗l]ij = [A⊗l]ij = [A+]ij.
In addition, clearly we also have
[A+]ij = ⊕m≥1 [A⊗m]ij ≥ [A⊗(k+l)]ij ≥ [A+]ij,
so by replacing k + l with k, it therefore follows that [A⊗k]ij = [A+]ij for all i ∈ Vc(A) and
j ∈ {1, . . . , n}, with k large enough. This is what we wanted to prove.
Statement 3. Following the same lines as in the proof of statement 2, we can also show that
[A⊗m]ij = [A+]ij for all i ∈ {1, . . . , n}, j ∈ Vc(A) and with m large enough. Together, take k
and m large enough such that [A⊗k]il = [A+]il and [A⊗m]lj = [A+]lj for all l ∈ Vc(A). Then
[A⊗(k+m)]ij ≥ [A⊗k]il ⊗ [A⊗m]lj = [A+]il ⊗ [A+]lj,
for all l ∈ Vc(A). By replacing k + m with k, it follows that for k large enough
[A⊗k]ij ≥ ⊕l∈Vc(A) [A+]il ⊗ [A+]lj.
Now let the maximal average weight of a non-critical circuit (i.e. a circuit not passing through
any vertex in Vc(A)) be δ. Then the weight of a path from j to i of length k + 1 in G(A) not
passing through any vertex in Vc(A) can be bounded above by [A+]ij + (k × δ) = [A+]ij ⊗ δ⊗k,
since such a path consists of an elementary path from j to i (whose weight is bounded above by
[A+]ij) and at most k non-critical circuits (whose weights are each bounded above by δ). Since
the maximal average circuit weight in G(A) is e we must have δ < e, and so for k large enough
[A+]ij ⊗ δ⊗k ≤ ⊕l∈Vc(A) [A+]il ⊗ [A+]lj.
Indeed, the right-hand side is fixed, while the left-hand side tends to ε as k → ∞. Hence for k
large enough we have that
[A⊗k]ij = ⊕l∈V(A) [A+]il ⊗ [A+]lj = ⊕l∈Vc(A) [A+]il ⊗ [A+]lj,
for all i, j = 1, . . . , n.
We can now state and prove the main theorem of this section.
Theorem 1.49. Let A ∈ Rn×n max be an irreducible matrix with unique eigenvalue λ and cyclicity
σ := σ(A). Then there exists N ∈ N such that
A⊗(k+σ) = λ⊗σ ⊗ A⊗k
for all k ≥ N.
Proof. Consider the matrix B := (Aλ)⊗σ. Recall that σ is the cyclicity of the critical graph of
A, which is a multiple of the cyclicity of the communication graph G(A). By Corollary 1.44,
after a suitable relabelling of the vertices of G(A), matrix B is a block diagonal matrix with
square diagonal blocks whose communication graphs are strongly connected and have cyclicity
one. By Lemma 1.46 we have that the cyclicity of B is one, which implies that the cyclicity of
each of its diagonal blocks is one. Hence by applying Lemma 1.48 to each diagonal block, it
ultimately follows that there exists M ∈ N such that B⊗(l+1) = B⊗l for all l ≥ M. That is,
((Aλ)⊗σ)⊗(l+1) = ((Aλ)⊗σ)⊗l,
which can further be written as (Aλ)⊗(l×σ+σ) = (Aλ)⊗(l×σ), or
A⊗(l×σ+σ) = λ⊗σ ⊗ A⊗(l×σ),
for all l ≥ M. Finally, note that A⊗(l×σ+j+σ) = λ⊗σ ⊗A⊗(l×σ+j) for any 0 ≤ j ≤ σ−1, implying
that for all k ≥ N := M × σ it follows that
A⊗(k+σ) = λ⊗σ ⊗ A⊗k,
as required.
Theorem 1.49 can be seen as the max-plus analogue of the Perron-Frobenius theorem in con-
ventional linear algebra. Strictly speaking it is the normalised matrix Aλ that exhibits periodic
behaviour, since the unique eigenvalue of Aλ is e = 0, and then Aλ⊗(k+σ) = Aλ⊗k for k sufficiently
large. However, we use the term ‘periodic’ to describe the more general behaviour seen here.
Note that the cyclicity of A is the smallest possible length of such periodic behaviour (see [2] for
the proof of this). For our purposes, we now move on to applying this result to the recurrence
relations studied in Section 1.5.1.
Recall the form of the basic first-order recurrence relation
x(k + 1) = A ⊗ x(k), k ≥ 0, (1.20)
which has the solution
x(k) = A⊗k ⊗ x(0).
We can apply Theorem 1.49 in this context to give us that for k sufficiently large:
x(k + σ(A)) = A⊗(k+σ(A)) ⊗ x(0) = λ⊗σ(A) ⊗ A⊗k ⊗ x(0) = λ⊗σ(A) ⊗ x(k).
That is, the solution x(k) is periodic with period σ(A). If we interpret k as a time index,
then also by Theorem 1.49, the solution enters this periodic regime after N =: t(A) time steps,
where t(A) is called the transient time of A. In particular, if A has cyclicity equal to 1 then
x(k+1) = A⊗x(k) = λ⊗x(k) ∀k ≥ t(A), and so for k sufficiently large x(k) effectively becomes
an eigenvector of A. In other words, after t(A) time steps, x(k) behaves like an eigenvector, and
the effect of the initial condition x(0) has died out.
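This periodic regime is easy to observe numerically. The sketch below (an illustration of ours, not from the thesis) takes a 2 × 2 matrix whose only circuit is 1 → 2 → 1, of weight 8 and length 2, so that λ = 4 and σ = 2, and checks that each entry of the max-plus powers grows by λ × σ every σ steps:

```python
EPS = float('-inf')  # the max-plus zero element ε

def mp_mat_mul(X, Y):
    """Max-plus matrix product: [X ⊗ Y]ij = max_l (X_il + Y_lj)."""
    n = len(X)
    return [[max(X[i][l] + Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

A = [[EPS, 5.0],
     [3.0, EPS]]       # single circuit 1 -> 2 -> 1: weight 8, length 2
lam, sigma = 4.0, 2    # eigenvalue λ = 8/2 = 4, cyclicity σ = 2

powers = [A]           # powers[k-1] holds A^{⊗k}
for _ in range(10):
    powers.append(mp_mat_mul(powers[-1], A))

# A^{⊗(k+σ)} = λ^{⊗σ} ⊗ A^{⊗k}: entrywise, add λ*σ (with ε + anything = ε)
k = 4                  # any k beyond the transient time
Ak, Aks = powers[k - 1], powers[k + sigma - 1]
assert all(Aks[i][j] == lam * sigma + Ak[i][j]
           for i in range(2) for j in range(2))
```

Representing ε by `float('-inf')` makes ⊗ behave correctly automatically, since adding anything to −∞ in floating point gives −∞.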
Note that the transient time of a matrix can be large even for systems of small dimension. For
example, the matrix A defined by
A =
−1 −N
e e
where N ∈ {2, 3, . . . } has transient time t(A) = N, while its cyclicity is clearly 1.
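This transient can be verified directly by iterating max-plus powers of A until they stabilise; a minimal sketch (with an illustrative N = 5, writing e = 0 and ε-free entries as floats):

```python
EPS = float('-inf')  # ε, unused here but kept for general matrices

def mp_mat_mul(X, Y):
    """Max-plus matrix product: [X ⊗ Y]ij = max_l (X_il + Y_lj)."""
    n = len(X)
    return [[max(X[i][l] + Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def transient_time(A, max_iter=1000):
    """Smallest k with A^{⊗(k+1)} = A^{⊗k}; valid here since λ = e = 0."""
    k, Ak = 1, A
    while k < max_iter:
        Anext = mp_mat_mul(Ak, A)
        if Anext == Ak:
            return k
        k, Ak = k + 1, Anext
    raise RuntimeError("no transient found within max_iter steps")

N = 5
A = [[-1.0, -float(N)],
     [0.0, 0.0]]          # e = 0 on the bottom row
print(transient_time(A))  # -> 5, i.e. t(A) = N
```

The (1,1) entry of A^{⊗k} is max(−k, −N), which keeps changing until k = N; this is exactly what the iteration detects.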
Finally, we make some observations regarding the growth rate of the solution x(k). Note that
if we take x(0) = v in (1.20), where v is an eigenvector of A, then we immediately obtain that
for all j = 1, . . . , n:
lim k→∞ xj(k)/k = λ,
where λ is the unique eigenvalue of A. By applying Theorem 1.49 it should be clear that this
holds true for any initial value x(0) and not just for eigenvectors; indeed this result is proved
in [13]. We therefore say that the solution has an asymptotic growth rate of λ. Assuming
irreducibility, all recurrence relations over max-plus exhibit this behaviour, regardless of the
choice of the matrix A!
Chapter 2
Petri Nets and Timed Event Graphs
2.1 A Motivating Example
The following example is adapted from chapter 1 of [2]. Consider a manufacturing system
consisting of three machines M1, M2 and M3, which produces three kinds of parts P1, P2 and
P3 according to different product mixes. The manufacturing process for each part is depicted
below.
Figure 2.1: Manufacturing Process for each part. Grey boxes represent the three machines;
arrows represent the routes that the different parts must take in their respective manufacture.
Processing times are different for each machine and each part, and are given in the following
table:
P1 P2 P3
M1 - 1 5
M2 3 2 3
M3 4 3 -
Table 2.1: Processing times for each part at each machine (arbitrary time units). Blank entries
correspond to combinations of machine & part that do not form part of the manufacturing
process.
Parts are carried through the manufacturing process on a limited number of pallets. We make
the following assumptions:
1. Only one pallet is available for each part type.
2. Once production of a part is completed, it is removed from its respective pallet and the
pallet returns to the beginning of the production line.
3. There are no set-up times or traveling times between machines.
4. The sequencing of part types on the machines is fixed, and for M1 is (P2, P3), for M2
(P1, P2, P3) and for M3 (P1, P2).
Assumption (3) gives no loss of generality since if set-up times or traveling times did exist,
we could combine them with the processing time at the appropriate machine. Assumption (4)
means that machines have to wait for the appropriate part rather than starting work on any
part that arrives first (see below for an example). This may or may not be realistic; extensions
to the theory presented below in which this assumption is dropped are discussed in chapter 9
of [2].
We can model the time evolution of this system by considering the time that each machine starts
working on the k-th part of type i, for i = 1, 2, 3 and k ∈ N. There are seven combinations of
machines and parts, so we define x(k) = (x1(k), . . . , x7(k)) as follows:
Variable xi(k) Definition
x1(k) time that M1 starts working on the k-th unit of P2
x2(k) time that M1 starts working on the k-th unit of P3
x3(k) time that M2 starts working on the k-th unit of P1
x4(k) time that M2 starts working on the k-th unit of P2
x5(k) time that M2 starts working on the k-th unit of P3
x6(k) time that M3 starts working on the k-th unit of P1
x7(k) time that M3 starts working on the k-th unit of P2
Table 2.2: Definitions of each entry of the state vector x(k), for k ∈ N.
By examining the production process, work by each machine on the (k+1)-st part is constrained
in the following way:
x1(k + 1) ≥ max{x7(k) + 3, x2(k) + 5}
x2(k + 1) ≥ max{x5(k) + 3, x1(k + 1) + 1}
x3(k + 1) ≥ max{x6(k) + 4, x5(k) + 3}
x4(k + 1) ≥ max{x3(k + 1) + 3, x1(k + 1) + 1}
x5(k + 1) ≥ max{x2(k + 1) + 5, x4(k + 1) + 2}
x6(k + 1) ≥ max{x3(k + 1) + 3, x7(k) + 3}
x7(k + 1) ≥ max{x6(k + 1) + 4, x4(k + 1) + 2}
For example, the inequality for x6(k + 1) comes from the fact that M3 cannot start working on
the (k + 1)-st unit of P1 until it has finished working on the k-th unit of P2, and until M2 has
finished working on the (k + 1)-st unit of P1.
If each machine starts work as soon as possible (i.e. we run the system at full speed), the
inequalities above become equalities. This is where the theory of max-plus algebra comes to
the fore. We can write the system in max-plus matrix
form as
x(k + 1) = A0 ⊗ x(k + 1) ⊕ A1 ⊗ x(k)
where
A0 =
ε ε ε ε ε ε ε
1 ε ε ε ε ε ε
ε ε ε ε ε ε ε
1 ε 3 ε ε ε ε
ε 5 ε 2 ε ε ε
ε ε 3 ε ε ε ε
ε ε ε 2 ε 4 ε

A1 =
ε 5 ε ε ε ε 3
ε ε ε ε 3 ε ε
ε ε ε ε 3 4 ε
ε ε ε ε ε ε ε
ε ε ε ε ε ε ε
ε ε ε ε ε ε 3
ε ε ε ε ε ε ε
This is a first-order recurrence relation like we have seen in Section 1.5. A quick examination of
G(A0) shows that it does not contain any circuits of positive weight (indeed it does not contain
any circuits at all), and therefore we can apply Theorem 1.40 to find the unique solution
x(k + 1) = A0∗ ⊗ A1 ⊗ x(k) = B ⊗ x(k) (2.1)
where B := A0∗ ⊗ A1, or explicitly:
B =
ε 5 ε ε ε ε 3
ε 6 ε ε 3 ε 4
ε ε ε ε 3 4 ε
ε 6 ε ε 6 7 4
ε 11 ε ε 8 9 9
ε ε ε ε 6 7 3
ε 8 ε ε 10 11 7
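The matrix B can be reproduced mechanically. A sketch (with ε represented by −∞ and helper names of our own choosing): since G(A0) is acyclic, A0∗ = I ⊕ A0 ⊕ A0⊗2 ⊕ · · · is a finite sum, and B is then a single max-plus product:

```python
E = float('-inf')  # ε

def mp_mul(X, Y):
    """Max-plus matrix product: [X ⊗ Y]ij = max_l (X_il + Y_lj)."""
    n = len(X)
    return [[max(X[i][l] + Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def mp_add(X, Y):
    """Max-plus matrix sum: entrywise maximum."""
    return [[max(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mp_star(A):
    """Kleene star A* = I ⊕ A ⊕ A^⊗2 ⊕ ... ⊕ A^⊗(n-1); finite here
    because the communication graph of A is acyclic."""
    n = len(A)
    star = [[0.0 if i == j else E for j in range(n)] for i in range(n)]
    term = A
    for _ in range(n - 1):
        star = mp_add(star, term)
        term = mp_mul(term, A)
    return star

A0 = [[E,E,E,E,E,E,E], [1,E,E,E,E,E,E], [E,E,E,E,E,E,E],
      [1,E,3,E,E,E,E], [E,5,E,2,E,E,E], [E,E,3,E,E,E,E],
      [E,E,E,2,E,4,E]]
A1 = [[E,5,E,E,E,E,3], [E,E,E,E,3,E,E], [E,E,E,E,3,4,E],
      [E,E,E,E,E,E,E], [E,E,E,E,E,E,E], [E,E,E,E,E,E,3],
      [E,E,E,E,E,E,E]]

B = mp_mul(mp_star(A0), A1)
# e.g. B[1][1] = 6 and B[4][1] = 11, matching the matrix displayed above
```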
If numerical values of x1(1), . . . , x7(1) are given then these values constitute the initial condition,
and the future evolution of the system is uniquely determined. There are no restrictions on
x(1) from a mathematical point of view, but given the physical interpretation of the system,
limitations do exist. For example, if we assume that all three pallets start at the beginning of
their respective production lines (with M1 working on P2 first), we have x1(1) = x3(1) = 0, but
x2(1) cannot be less than 1 since M1 has to finish working on P2 before it can start working on
P3.
Note that if we had allowed more than one pallet on any of the three production lines then the
system would have been of higher order (for example, if the production line of P1 had three
pallets then work on the (k +1)-st unit could start once the (k −2)-th unit had been produced).
This system would be solvable using the techniques developed at the end of Section 1.5.
Another possible extension would be to incorporate variable processing times rather than the
constant values given in table 2.1. The processing times could vary according to how many
parts the machines have already processed (i.e. vary with k), or they could exhibit stochastic
variability (i.e. following some specified probability distribution). The first type of variability
will be introduced with the basic autonomous equation below; stochastic event graph theory
will be discussed in Chapter 3.
Note that since we can describe the evolution of the system by a recurrence relation of the form
(2.1), we might expect that we can apply Theorem 1.49 to see that the system settles down into
a periodic regime after a finite length of time. However, upon closer inspection we see that the
matrix B has a column of ε, so it is not irreducible and thus Theorem 1.49 does not apply. Later
on in this chapter we will discuss some techniques which ensure that the evolution equation does
involve an irreducible matrix, and therefore enable us to draw the relevant conclusions.
To end this introductory example, note that the way we have modeled our system does not
immediately give us the production times of the k-th unit of P1, P2 and P3. In order to find
these we could introduce an output vector y(k) = (y1(k), y2(k), y3(k)) defined by
y(k) = C ⊗ x(k)
where
C =
ε ε ε ε ε 4 ε
ε ε ε ε ε ε 3
ε ε ε ε 3 ε ε
Left multiplication by C adds the appropriate processing time to the starting time at the last
machine in the production line of each part. Thus yi(k) gives us the time of production of the
k-th unit of part Pi.
In the following section we will introduce the concept of timed event graphs, which are the tools
we will use to model discrete event systems such as the production line we have considered here.
2.2 Preliminaries of Event Graph Theory
2.2.1 Definitions and Set-up
As we have seen above, max-plus algebra allows us to describe the evolution of events on
a network subject to synchronisation constraints. In our example, a part moving from one
machine to the next is an event. An appropriate tool to model events on a certain class of
networks is known as a Petri net. We will focus on a certain type of Petri net called an event
graph, which can be modeled by max-plus linear recurrence relations of the form discussed in
Section 1.5. We start by defining the relevant terms and setting out some notation. In order to
fully appreciate all the concepts we introduce, it may be helpful to read this section alongside
the example that follows (Section 2.2.2).
Definition 2.1. Let G = (V, E) be a graph and let i, j ∈ V. We say that i is a predecessor (or
an upstream vertex) of j if (i, j) ∈ E, and that i is a successor (or a downstream vertex) of j
if (j, i) ∈ E.
Definition 2.2. A Petri net is a pair (G, µ) where G = (V, E) is a directed graph and µ is a
vector, satisfying the following properties:
(i) G is bipartite, i.e. V is partitioned into two disjoint sets P and Q (called places and
transitions respectively) such that E only consists of edges of the form (pi, qj) and (qj, pi),
with pi ∈ P and qj ∈ Q.
(ii) µ is a |P|-vector of non-negative integers, known as the initial marking.
Definition 2.3. An event graph is a Petri net in which every place has exactly one upstream
and downstream transition.
Notation. For general i ∈ V, we let π(i) denote the set of all predecessors of i and σ(i) denote
the set of all successors of i. In the case of Petri nets and event graphs, when we want to work
with indices we will sometimes use the following additional notation: if pi ∈ π(qj), we write
i ∈ πq(j), and if qj ∈ π(pi), we write j ∈ πp(i). Similarly, if pi ∈ σ(qj), we write i ∈ σq(j), and
if qj ∈ σ(pi), we write j ∈ σp(i). Note that in the case of an event graph, for any place pi we
have that |πp(i)| = |σp(i)| = 1, so we often allow the abuse of notation πp(i) = j (as opposed
to πp(i) = {j}).
We can think of places as conditions and transitions as events. For example, a machine working
on a part is a place, and a transition occurs when the part moves on to the next machine. Each
place has an associated marking (given initially by the vector µ) which indicates whether or not
the condition has been fulfilled, e.g. whether or not a machine is working on a given part type.
Equivalently we say that each place has an associated number of tokens, which can be thought
of as the number of data items or resources available at each place. In our example each place
can have either 0 or 1 tokens, but in general there can be any amount (e.g. if machines are
capable of working on more than one part at once).
We say that a transition is enabled if each of its upstream places contains at least one token.
When this is the case the transition fires, meaning that one token is removed from each of its
upstream places and one token is added to each of its downstream places. If the initial marking
is µ, the firing of a transition qj gives a new marking µ′, defined by
µ′i = µi − 1 if pi ∈ π(qj)
µ′i = µi + 1 if pi ∈ σ(qj)
µ′i = µi otherwise
In this case we say that the marking µ′ is reachable from µ. It is easy to see that for a general
Petri net the total number of tokens can change when a transition fires; for example a transition
may have one upstream place but two downstream places, in which case the transition firing
causes the total number of tokens to increase by one. Furthermore, note that the definition of an
event graph allows for input and output transitions (known as sources and sinks respectively),
i.e. transitions that do not have any upstream or downstream places. Source transitions are
enabled by the outside world and deliver tokens into the system; sink transitions remove tokens
from the system completely. The following definition makes an important distinction between
two types of event graph:
Definition 2.4. An event graph is autonomous if it contains no source transitions, and non-
autonomous otherwise.
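The enabling and firing rules above translate directly into code. The following sketch (our own minimal representation, not notation from the thesis) stores a marking as a dictionary over places and fires an enabled transition of a toy event graph p1 → q → p2:

```python
# Upstream and downstream places of each transition
pre  = {"q": ["p1"]}
post = {"q": ["p2"]}

def enabled(marking, q):
    """A transition is enabled if every upstream place holds a token."""
    return all(marking[p] >= 1 for p in pre[q])

def fire(marking, q):
    """Remove one token from each upstream place and add one token to
    each downstream place, returning the new marking."""
    if not enabled(marking, q):
        raise ValueError(f"transition {q} is not enabled")
    new = dict(marking)
    for p in pre[q]:
        new[p] -= 1
    for p in post[q]:
        new[p] += 1
    return new

mu = {"p1": 1, "p2": 0}   # initial marking
mu2 = fire(mu, "q")       # the new marking, reachable from mu
```

In an event graph each place appears in exactly one `pre` list and one `post` list, which is precisely the no-conflict property discussed below.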
The important property of event graphs is that they do not model conflicts; that is,
a token in a given place can be consumed by only one predetermined transition. The ‘opposite’
to an event graph (i.e. a Petri net in which each transition has exactly one upstream place
and one downstream place), known as a state machine, does allow for this competition element
but does not admit synchronisation. It can be shown that state machines are equivalent to
the automata studied in computer science, which shows that Petri nets in general have more
modelling power than automata.
Up until now, the theory we have introduced is only concerned with the ordering of events. If
we wish to investigate network performance, it is necessary to introduce time. There are two
ways in which this could be done: we can either associate durations with transition firings, or
holding times with places. In fact, in many applications it could be that both times are present;
for example the real-life manufacturing system in Section 2.1 would exhibit travel times as well
as processing times. However, as we noted before, by incorporating the firing times into the
holding times at places, in the case of event graphs it may be assumed without loss of generality
that the firing times are equal to 0. We therefore introduce the concept of a timed event graph
below.
Definition 2.5. A timed event graph is an event graph endowed with a |P|-vector α of holding
times associated with each place.
Note that the definition of a timed event graph does not uniquely determine all future firing
times. This is because the initial marking does not specify how long each token has spent
in its respective place. We will deal with this more fully when we come to look at the basic
autonomous equation in the next section.
2.2.2 A Simple Example
To consolidate all of this theory, consider this simple example. A train network connects the
main stations of two cities. There are two routes from station S1 to station S2; one visiting
an intermediate station S3 along the way and the other visiting a different intermediate station
S4. Trains link up at S2 and return to S1 via a single fast track with no stops, where they then
split up again and repeat their respective journeys. There are also two inner-city loops at S1
and S2 which visit the suburbs of their respective cities. The travel time from Sj to Sl is given
as the (l, j)-th entry of the matrix A below:
A =
2 5 ε ε
ε 3 5 3
2 ε ε ε
4 ε ε ε
(2.2)
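Synchronised departures can be read off from a max-plus product with this matrix. Purely for illustration (ignoring for now the token positions that the event graph below encodes), if all four stations released trains simultaneously at time 0, the next departure times would be A ⊗ (0, 0, 0, 0)ᵀ:

```python
EPS = float('-inf')  # ε: no direct track from Sj to Sl

A = [[2.0, 5.0, EPS, EPS],
     [EPS, 3.0, 5.0, 3.0],
     [2.0, EPS, EPS, EPS],
     [4.0, EPS, EPS, EPS]]

def mp_mat_vec(M, x):
    """Max-plus matrix-vector product: the l-th departure waits for the
    latest arrival, max_j (a_lj + x_j)."""
    return [max(M[l][j] + x[j] for j in range(len(x))) for l in range(len(M))]

x0 = [0.0, 0.0, 0.0, 0.0]
print(mp_mat_vec(A, x0))  # [5.0, 5.0, 2.0, 4.0]
```

Note how the departures from S1 and S2 are both delayed to time 5, waiting for the slowest incoming train, while S3 and S4 depart as soon as their single incoming train arrives.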
We can represent this network as a standard graph as follows:
Figure 2.2: Standard graph of the simple train network. Stations (the vertices) are represented
by circles and tracks by weighted edges. The travel times are given by the edge weights.
Similarly to before, we can assume that there are no waiting times at stations by incorporating
them into the travel times. We want the system to be synchronised in the sense that trains
arriving at a station should wait for each other to allow for the changeover of passengers. This
means that departures from a given station will coincide (once the last train has arrived, all
trains can then depart). We can model this system with a timed event graph, where ‘tracks’
are represented by places (the timed elements of the network); trains by tokens and departures
at each station by transitions. Note that each transition has an upstream place so the event
graph will be autonomous.
In order to fully specify the event graph we need to state the positions of the trains in the
network at time 0, which corresponds to the initial marking. We assume that at time 0 there
is one train travelling from S1 to S3, two trains travelling from S1 to S4, one train travelling
back from S2 to S1 and one train on each of the inner-city loops. This gives the following timed
event graph, pictured at time 0:
Figure 2.3: Timed event graph of the train network depicted in Figure 2.2. The transitions q1,
q2, q3 and q4 represent departures from the four respective stations. The edges can be thought
of as the tracks between stations, with the intermediate places (depicted as circles) specifying
the travel times. Tokens inside the places represent trains on the tracks.
Note that transitions are depicted by bars, places by circles and tokens by counters inside the
circles. As we have noted before, we cannot tell which transition will fire first since we do not
know how long each token of the initial marking has spent in its respective place (i.e. how close
to their respective destinations the trains are at time 0). If transitions q3 and q4 both fire once,
the token distribution changes to the following:
Figure 2.4: Timed event graph of the train network after transitions q3 and q4 have fired. One
token has been removed from each of their upstream places and one token has been added to
each of their downstream places.
This corresponds to the train on the track from S1 to S3 having reached S3 and departed for
S2, and also one of the trains on the track from S1 to S4 having reached S4 and departed for
S2. Once these trains both reach S2 they link up to form one train, and assuming the inner-city
train at S2 is ready and waiting, transition q2 will fire and the token distribution of the event
graph will change to: