3.8 Fundamentals of Markov Chains
A Markov chain is a special class of state model. As with earlier state models, it consists of a
collection of states, only now we are modeling probabilities of transitions between states. The
weight assigned to each arrow is now interpreted as either the probability that something in the state
at the arrow's tail moves to the state at the arrow's head, or the percentage of things at the arrow's
tail which move to the state at the arrow's head. At each time step, something in one state must
either remain where it is or move to another state. Thus the sum of the weights on the arrows out of a state must be one. The state vector X(t) in a Markov model traditionally lists either the probability that the system is in each particular state at a particular time, or the percentage of the system which is in each state at a given time. Thus X(t) is a probability distribution vector and must sum to one. We have
occasionally mentioned such vectors in what we have done before, but when dealing with a Markov
model we deal with probability distribution vectors exclusively. Recapping, there are three properties which identify a state model as a Markov model: 1) The Markov assumption: the probability of moving from state i to state j depends only on the current state i, not on how the system arrived at state i. 2) Conservation: the sum of the probabilities out of a state must be one. 3) The vector X(t) is a probability distribution vector which describes the probability of the system's being in each of the states at time t.
In some sense, we have been assuming the Markov assumption all along. By this we mean that we
have been assuming that the number being assigned to a state during a time step depends only on
the way things were distributed during the prior time step and not any further back than that. This
was the fourth convention we made when defining state diagrams. Essentially it says that we are
considering only first-order recurrence relations. Strictly speaking the Markov assumption refers to
only probabilities, but we used equivalents of it with birth rates that were greater than one. When
discussing the probabilities associated with a Markov chain, the term conditional probability is
often used. Conditional probability means just the probability of something's happening given that
something else has already happened. In our case the probability of moving from state i to state j
assumes we were in state i to begin with, so, technically, this is a conditional probability.
The transition matrix for a Markov chain is then a matrix of probabilities (conditional probabilities, if we are perfectly correct) of moving from one state to another. Thus T = [p_ij], where p_ij is the conditional probability that the system moves to state i given that it is in state j. We also require that each column sums to one in order to satisfy the conservation property. The system moves from states given by column indices to states given by row indices. For example, p21 is the probability of the system's moving from state 2 to state 1.
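As a computational aside, the conservation property and the row/column convention can be checked directly. The three-state matrix below is a hypothetical example, not the one used later in the text:

```python
import numpy as np

# Hypothetical 3-state transition matrix; entry T[i, j] is the
# probability of moving FROM state j+1 TO state i+1 (columns = "from").
T = np.array([
    [0.5, 0.3, 0.1],
    [0.3, 0.4, 0.2],
    [0.2, 0.3, 0.7],
])

# Conservation property: each column must sum to one.
assert np.allclose(T.sum(axis=0), 1.0)

# p21: probability of moving from state 2 to state 1 (row 1, column 2).
p21 = T[0, 1]
print(p21)
```

Note that with this column convention, the state vector is multiplied on the left by T, so probabilities flow from column index to row index.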
We can represent a Markov chain using a state diagram (Figure 3.12).
The transition probabilities p_ij are shown as the flows between states.
Stages, States, and Classes
FIGURE 3.12 General state diagram of a Markov chain.
Consider the following transition matrix for a Markov chain:
There are three states for this chain, which we label i = 1,2,3. The state diagram for this chain is
shown in Figure 3.13.
Unlike models discussed earlier, the vector X(t) does not give the number of individuals in each state at time t; rather it gives the probability that the system is in each state at time t. It is conventional with Markov chains to denote X(t) as Xt. An initial distribution X0 is a distribution for the chance that the system is initially in each of the states. For instance, suppose

X0 = (0.5, 0.3, 0.2)^T.

The interpretation of X0 is that there is a 50% chance the system is initially in state 1, a 30% chance it is in state 2, and a 20% chance it is in state 3.
In this context, matrix multiplication gives the probability distribution one time step later. That is,

X1 = T X0,

where X0 is an initial distribution. Using the transition matrix and initial distribution from above, we have

X1 = T X0 = (0.36, 0.35, 0.29)^T,

so that after one time step, there is a 36% chance of the system's being in state 1, and 35% and 29% chances of being in states 2 and 3, respectively. Using this notation, the distribution after n time steps is given by

Xn = T^n X0.    (62)
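The one-step and n-step computations can be sketched numerically. The transition matrix of the example was not reproduced in this copy, so the matrix below is a stand-in; consequently the X1 computed here will not match the (0.36, 0.35, 0.29) of the text:

```python
import numpy as np

# Stand-in 3-state transition matrix (the one from the text's example
# was not reproduced here); columns index the "from" state and sum to one.
T = np.array([
    [0.5, 0.3, 0.1],
    [0.3, 0.4, 0.2],
    [0.2, 0.3, 0.7],
])

# Initial distribution from the text: 50%, 30%, 20%.
X0 = np.array([0.5, 0.3, 0.2])

# One time step: X1 = T X0.
X1 = T @ X0
assert np.isclose(X1.sum(), 1.0)  # still a probability distribution

# n time steps: Xn = T^n X0 (equation (62)).
n = 10
Xn = np.linalg.matrix_power(T, n) @ X0
assert np.isclose(Xn.sum(), 1.0)
print(np.round(X1, 4))
```

Because T is column stochastic, each multiplication by T maps a probability distribution to another probability distribution, which the assertions confirm.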
An important idea, which we make use of in the next two sections, is whether the sequence of column vectors Xn, n ≥ 1, converges to a steady-state (unchanging from time step to time step) column vector, which we denote X̄. Determining X̄ allows us to answer long-term behavior questions we may pose.
We observe here that if Xn → X̄ as n → ∞, we must have the matrix T^n approaching some fixed matrix L, that is,

L = lim_{n→∞} T^n.    (63)

The matrix L, if it exists, is referred to as the "steady-state" matrix. The convergence of the matrix T^n to the steady-state matrix L is independent of the initial distribution X0, as equation (63) shows.
The steady-state distribution X̄ and the steady-state matrix L can be shown to exist, provided that the transition matrix T satisfies the property that some power of T has all positive entries. Matrices satisfying this condition are called regular. If T is regular, we find the steady-state distribution X̄ by solving the equation

T X̄ = X̄    (64)

for X̄, along with the condition that the sum of the entries in X̄ must be one. The matrix equation (64) clearly conveys the idea that the steady-state distribution is a fixed point of the system of equations (62). Equivalently, X̄ is an eigenvector of T with eigenvalue one. An intuitively appealing method for determining the steady-state distribution X̄ is to compute (or approximate) the steady-state matrix L. Traditionally, this is done analytically using a method called matrix diagonalization. Since

L = lim_{n→∞} T^n,

we approximate L by computing T^n for large values of n. This is easily done using a software package, or, if the number of states is small, a calculator with matrix capabilities.
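Both approaches to the steady state can be sketched numerically. The matrix below is a stand-in (the text's example matrix was not reproduced in this copy), so the particular steady-state values are illustrative only:

```python
import numpy as np

# Stand-in regular transition matrix; columns sum to one.
T = np.array([
    [0.5, 0.3, 0.1],
    [0.3, 0.4, 0.2],
    [0.2, 0.3, 0.7],
])

# Method 1: solve T xbar = xbar, i.e. find the eigenvector of T with
# eigenvalue one (equation (64)), normalized so its entries sum to one.
vals, vecs = np.linalg.eig(T)
k = np.argmin(np.abs(vals - 1.0))   # index of the eigenvalue closest to 1
xbar = np.real(vecs[:, k])
xbar = xbar / xbar.sum()            # impose the sum-to-one condition

# Method 2: approximate L = lim T^n by computing T^n for large n.
L = np.linalg.matrix_power(T, 100)

# Every column of the steady-state matrix L should equal xbar.
assert np.allclose(L, np.column_stack([xbar, xbar, xbar]))
print(np.round(xbar, 4))
```

The final assertion checks the fact stated below equation (65): when L exists, all of its columns are identical and equal to the steady-state distribution.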
For our example, the steady-state matrix is approximately

(65)

The form of the matrix in equation (65) might at first glance appear surprising. If the steady-state matrix L exists, it has the form given in (65), where each of the columns is identical. This fact follows from the equation X̄ = L X0 and recalling that the sum of the entries in the column vector X0 is one. This equation also demonstrates that each column of L (≈ T^n, for n large) is X̄. Thus, for our example, we have the steady-state distribution
There is another class of Markov chains which have important modeling properties. Consider the
following example of a transition matrix.
(66)
The state diagram for this transition matrix is shown in Figure 3.14.
This system has some important features. States 4 and 5 are called "absorbing" states: once the system enters an absorbing state, it remains in that state from that time on. Absorbing states are easily identified from the state diagram, since each has a loop with weight one; states 4 and 5 both do.
Absorbing Markov chains are different in structure from those we have previously considered. An absorbing state precludes the transition matrix from being regular. The assumption that the transition matrix is regular is enough to ensure the existence of a steady-state matrix, but it is not a characterization: steady-state matrices exist for absorbing Markov chains, and the additional structure of absorbing chains provides useful information. The project section considers an example of a nonabsorbing, nonregular transition matrix for which a steady-state matrix can be calculated.

FIGURE 3.14 State diagram for the absorbing Markov chain.
If we compute the steady-state matrix for the above absorbing chain, we obtain
This matrix exhibits several properties that we need later on. Examining the structure of the transition matrix T in equation (66), we see that it can be decomposed into blocks of the form

T = [ A  0     ]
    [ B  I_2x2 ]

The matrix I_2x2 is just the 2 x 2 identity matrix, and the blocks are formed around the identity matrix block. This decomposition is always possible for absorbing Markov chains, though we may need to relabel the states so that the absorbing states are listed last (so the identity matrix is in the proper position). In general, if a Markov chain has a absorbing states and b nonabsorbing states, we can arrange the transition matrix to have the form

T = [ A  0     ]    (68)
    [ B  I_axa ]

where A is the b x b matrix of transitions among the nonabsorbing states, B is the a x b matrix of transitions from nonabsorbing to absorbing states, and 0 is a b x a block of zeros.
This block decomposition gives useful information about the absorbing Markov chain. The steady-state matrix L has the form

L = [ 0             0 ]
    [ B(I - A)^-1   I ]

The entries in the matrix B(I - A)^-1 represent the probability of being absorbed in the ith absorbing state if the system was initially in the jth nonabsorbing state. In the example,
These entries are viewed as "absorption" probabilities. For example, there is a 71.43% chance that
the system will be absorbed in state 4, given that it initially started in state 2. To understand which
state is which, refer back to the columns and rows of (67). The other entries have a similar
interpretation.
Further information is obtained from the fundamental matrix F = (I - A)^-1. The entries f_ij of this matrix are the average number of time steps the process spends in state i, given that it began in state j.
A proof of this result is in Olinick [48]. For our example, the fundamental matrix is
Recalling the block form of the transition matrix (68), the position of the submatrix A indicates that
i and j have values 1, 2, or 3, so that f1,1 = 1.25 is the average number of time steps that the system
is in state 1, given that it was initially in state 1. The other entries have analogous interpretations.
The sum of the entries of the jth column of the fundamental matrix F is the average number of time
steps for a process initially in state j to be absorbed. For example, if the system is initially in state 1,
it takes an average of 1.25 + 0.7143 + 0.7143 = 2.6786 time steps before the system enters an
absorbing state. The next two sections present models based upon Markov chains and use the above
analysis. The project section also contains some interesting Markov models, as well as some further
points of the theory of Markov chains.
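The absorbing-chain analysis above can be sketched end to end. Since the matrix in (66) was not reproduced in this copy, the five-state chain below uses invented numbers, but it has the same block structure (states 1-3 nonabsorbing, states 4-5 absorbing):

```python
import numpy as np

# Hypothetical absorbing chain in the block form of (68); columns sum to one.
A = np.array([            # nonabsorbing -> nonabsorbing (3 x 3)
    [0.2, 0.1, 0.0],
    [0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3],
])
B = np.array([            # nonabsorbing -> absorbing (2 x 3)
    [0.3, 0.4, 0.2],
    [0.1, 0.1, 0.4],
])
T = np.block([[A, np.zeros((3, 2))],
              [B, np.eye(2)]])
assert np.allclose(T.sum(axis=0), 1.0)   # conservation property

# Fundamental matrix F = (I - A)^-1: F[i, j] is the expected number of
# time steps spent in nonabsorbing state i+1, starting from state j+1.
F = np.linalg.inv(np.eye(3) - A)

# Absorption probabilities B F: entry (i, j) is the chance of ending in
# absorbing state i+4, starting from nonabsorbing state j+1.
absorb = B @ F
assert np.allclose(absorb.sum(axis=0), 1.0)  # absorption is certain

# Column sums of F: expected number of steps until absorption
# from each nonabsorbing starting state.
print(np.round(F.sum(axis=0), 4))
```

The two assertions mirror the theory: columns of T sum to one, and each column of B(I - A)^-1 sums to one because the process is eventually absorbed with probability one.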
3.9 Markovian Squirrels
The American gray squirrel (Sciurus carolinensis Gmelin) was introduced in Great Britain by a
series of releases from various sites starting in the late nineteenth century. In 1876, the first gray
squirrels were imported from North America, and have subsequently spread throughout England
and Wales, as well as parts of Scotland and Ireland.
Simultaneously, the native red squirrel (Sciurus vulgaris L.), considered the endemic subspecies,
has disappeared from most of the areas colonized by gray squirrels. Originally, the red squirrel was
distributed throughout Europe and eastward to northern China, Korea, and parts of the Japanese
archipelago. During the last century, the red squirrel has consistently declined, becoming extinct in
many areas of England and Wales, so that it is now confined almost solely to Northern England and
Scotland. A few isolated red squirrel populations exist on offshore islands in southern England and
mountainous Wales.
The introduction of the American gray squirrel continued until the early 1920s, by which time the
gray squirrels had rapidly spread throughout England. By 1930 it was apparent that the gray squirrel
was a pest in deciduous forests, and control measures were attempted. Once the pest status of the
gray squirrel was recognized, national distribution surveys were undertaken. The resulting
distribution maps clearly showed the tendency for the red squirrel to be lost from areas that had
been colonized by the gray squirrel during the preceding 15 to 20 years.
Since 1973, an annual questionnaire has been circulated to foresters by the British Forestry Commission. The questionnaire concerns the presence or absence of the two squirrel species. It also includes questions on the changes of squirrel abundance, details of tree damage, squirrel control measures, and the number of squirrels killed. Using the data collected by the Forestry Commission, we wish to construct a model to predict the trends in the distribution of both species of squirrels in Great Britain.
Several researchers have studied the British squirrel populations, notably Reynolds [53] and Usher et al. [68]. The annual Forestry Commission data has been summarized in the form of distribution maps reflecting change over a two-year period.
Usher et al. [68] used an overlay technique to extract data from the distribution maps. Each 10-km square on the overlay map that contained Forestry Commission land was classified into one of four states:
R: only red squirrels recorded in that year.
G: only gray squirrels recorded in that year.
B: both species of squirrels recorded in that year.
O: neither species of squirrels recorded in that year.
In order to satisfy the Markov assumption, only squares that were present in two consecutive years were counted. Counting the pairs of years, squares are allocated to any one of 16 classes, e.g., R -> R, R -> G, G -> G, B -> O, etc.
A summary of these transition counts for each pair of years from 1973-74 to 1987-88 is given in Table 3.3 and is reprinted by permission of Blackwell Science Inc.
A frequency interpretation is required to employ the Markov chain analysis. The entries in each column are totaled, and each matrix entry is found by division. For example, column R has a total of 2,529 + 61 + 282 + 3 = 2,875, so that the entry in the R, R position is 2,529/2,875 ≈ 0.8797. Care must be taken when calculating these frequencies: inappropriate rounding will violate the requirement that the columns sum to one.

TABLE 3.3 Red and Gray Squirrel Distribution Map Data for Great Britain.

The transition matrix (rows and columns are in R, G, B, O order) is

FIGURE 3.15 State diagram for the Markov Squirrels.
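The frequency calculation for column R can be checked directly from the counts quoted above; the other columns of Table 3.3 are handled in exactly the same way:

```python
# Transition counts out of state R (R->R, R->G, R->B, R->O),
# as quoted from Table 3.3 in the text.
counts_R = [2529, 61, 282, 3]
total = sum(counts_R)                 # squares observed in state R
probs_R = [c / total for c in counts_R]

print(total)                          # 2875
print(round(probs_R[0], 4))           # 0.8797, the R, R entry of T

# The column must still sum to one (rounding must not break this).
assert abs(sum(probs_R) - 1.0) < 1e-9
```

If the individual entries are rounded for display, the final assertion is the check that catches the rounding problem mentioned above.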
The state diagram of this transition matrix T is given in Figure 3.15. We interpret these transition
frequencies as conditional probabilities. For example, there is an 87.97% chance that squares that
are currently in state R (red squirrels only) will remain in state R; similarly, there is a 2.73% chance that squares currently occupied by both squirrel species, state B, will be occupied by neither species, state O, after the next time step. Since the data taken from the annual Forestry Commission survey are summarized as pairs of years, each time step represents a two-year period.
The matrix form of the transition probabilities is convenient for calculations. Using matrix
multiplication, we compute the two-time-step transition matrix as T2 = T x T, which is given by
The entries of this transition matrix are again interpreted as conditional probabilities. For instance,
there is a 17.33% chance that squares currently occupied by only red squirrels, state R, will be
occupied by both species, state B, in two time steps (four years).
Using the transition matrix T, it is possible to gain insight into the long-term behavior of the two
species of squirrels. We compute the steady-state matrix L for the two squirrel populations. The
question of interest in the study of the squirrel populations is what happens to the distribution of the
squirrel populations over a long period of time.
For our squirrel model, the steady-state matrix is approximately

Thus the steady-state distribution is approximately

X̄ ≈ (0.1705, 0.0560, 0.3421, 0.4314)^T    (in R, G, B, O order).

This result is interpreted as the long-term behavior of the squirrel populations in Great Britain as follows: 17.05% of the squares will be in state R, containing only red squirrels; 5.60% of the squares will be in state G, containing only gray squirrels; both species will be present in 34.21% of the squares (state B); and the majority of the squares, 43.14%, will be occupied by neither species (state O).
If the assumptions made in this model are correct, the red squirrel is not currently in danger. In fact,
it will have sole possession of more regions than the gray squirrel will have. In the long term, the
gray squirrels do not drive the reds to extinction. Actually this analysis says nothing about
population sizes, only about the number of regions controlled by each type of squirrel. While it
seems plausible that if the red squirrel territory (number of regions) is declining, then the population
is declining, then the population is declining, the opposite may be true. A problem in the projects section asks you to perform this analysis for the two squirrel species in Scotland, where the red squirrel is still widely distributed.