1) The document describes a toy model to help explain Markov chains used in forecasting. It involves the metabolic carbon flow in extinct arborescent lycopsids.
2) The toy model uses a transition matrix and state vector to represent the flow of carbon through different states over a diurnal cycle.
3) The toy model demonstrates how Markov chains can be used to model complex real-world processes and help explain sources of change and variability in large Markov chain forecasts.
1. #analyticsx
Peter Hickman
Palmetto GBA / Reforest the Tropics (Hickman.pw@gmail.com)
Markov Chains in Forecasting: From Toy Models to Reality
Abstract
The internal workings of Markov Chain forecasts based on ever-changing real
transitions and inflow data can be extremely difficult to explain as models of complex
real-world processes. As an approach to understanding the dynamics of Markov
Chains used in forecasting, a toy model is described. Some structures and problems
in explaining large Markov Chain forecasts are noted with the final suggestion being
that toy models may be an efficient way of elucidating the construction and use of
Markov Chain forecasts.
2. #analyticsx
What is a Markov Chain?
For this discussion, a Markov chain is a way of representing a set of related transitions as a matrix with dimensions n by n
that can then be multiplied by a vector of length n. The vector holds the states of interest (for example the number of
loans in different stages of processing) and the multiplication gives a new vector representing the totals in states resulting
from the transition probabilities represented by the matrix.
This is, of course, a tiny subset of the ways Markov Chains can be used. For example, I will only be discussing discrete-time
uses and only models that involve multiplying a state vector by a fixed transition matrix.
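As a minimal sketch of the multiplication described above, the Python below multiplies a row vector of state totals by an n by n transition matrix. The two loan stages and their probabilities are hypothetical, chosen only for illustration, and the helper name `step` is not from the presentation.

```python
def step(state, transition):
    """Multiply a row vector of state totals by a square transition matrix."""
    n = len(state)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-stage loan inventory: "in review" and "closed".
transition = [
    [0.6, 0.4],  # in review: 60% stay in review, 40% close
    [0.0, 1.0],  # closed: stays closed
]
state = [100.0, 0.0]  # 100 loans in review, none closed

new_state = step(state, transition)
print(new_state)
```

Each row of the matrix holds the outgoing probabilities for one state, so the total number of loans is unchanged by the multiplication.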
The development of a toy model is shown next. This presentation concludes
with a discussion of problems with explaining or modifying a forecast or limited
predictive model that toy models might help solve.
How does it work in a forecast?
The matrix of transition probabilities is generated from data for a time period. The increments of that
time period (e.g. business days) become the steps of the model or forecast: each step multiplies the
current vector of states (e.g. numbers of loans in the stages whose transitions the matrix represents)
by the matrix, and repeating the multiplication gives a series of future states. In terms of
SAS processes, the stepping can be done via PROC IML.
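The stepping can be sketched in Python rather than PROC IML. This is an illustration only: the three loan stages, the probabilities, and the daily inflow are hypothetical, and adding the inflow before multiplying is an assumption that mirrors the order used in the toy model later in this presentation.

```python
def step(state, transition):
    """Multiply a row vector of state totals by a square transition matrix."""
    n = len(state)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]

def forecast(state, transition, inflows):
    """Return the state after each period: add the period's inflow, then apply the transitions."""
    path = []
    for inflow in inflows:
        state = [s + a for s, a in zip(state, inflow)]
        state = step(state, transition)
        path.append(state)
    return path

# Hypothetical stages: application, underwriting, funded.
transition = [
    [0.7, 0.3, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
start = [50.0, 20.0, 0.0]
inflows = [[10.0, 0.0, 0.0]] * 3  # ten new applications on each of three business days

path = forecast(start, transition, inflows)
for day, totals in enumerate(path, start=1):
    print(day, [round(x, 2) for x in totals])
```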
3. #analyticsx
Toy Model : Metabolic Carbon Flow in Arborescent Lycopsids
These now-extinct plants are thought to have had CAM-like photosynthetic metabolism with aerenchyma (spaces for storing
and moving carbon dioxide) as an adaptation to the carbon-poor/oxygen-rich atmosphere of the late Paleozoic (see references
1-3). The arrows show the fixed transition rates. Different amounts are moved through the transitions by the state vector
according to the 4 different diurnal metabolic phases. The metabolic carbon is in arbitrary units related to flow in the diurnal cycle.
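[Flow diagram: four states (Carbon in Sediment, Carbon in Aerenchyma, Carbon in Photosynthetic cycles, Carbon in Sugars and the atmosphere) connected by arrows labeled with the fixed transition rates .30, .85, .10, .70, .80, .20, .60, .20, .20, .05]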
4. #analyticsx
From the flow diagram, we write up the matrix, which gives transition probabilities. The
outgoing probabilities for each state (the rows) add up to 1, but the inflows (the columns)
can be greater than 1, hence the build-up of coal in the Carboniferous:

                      Sediment   Aerenchyma   Photo cycles   Sugars and the air
Sediment                .85         .10                             .05
Aerenchyma                          .30           .70
Photo cycles                                      .20               .80
Sugars and the air      .60                       .20               .20
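The row and column sums can be checked with a short Python sketch. The values are those of the matrix above, but the column positions of the blank (zero) cells are an assumption: the placement used here keeps each row's values in their left-to-right order and puts the self-to-self rates on the diagonal.

```python
STATES = ["Sediment", "Aerenchyma", "Photo cycles", "Sugars and the air"]

# Rows are the "from" state, columns the "to" state.
# ASSUMPTION: the positions of the zero cells are inferred, not stated in the source.
T = [
    [0.85, 0.10, 0.00, 0.05],
    [0.00, 0.30, 0.70, 0.00],
    [0.00, 0.00, 0.20, 0.80],
    [0.60, 0.00, 0.20, 0.20],
]

row_sums = [sum(row) for row in T]
col_sums = [sum(row[j] for row in T) for j in range(len(STATES))]

for name, out_total, in_total in zip(STATES, row_sums, col_sums):
    print(f"{name:20s} outflow {out_total:.2f}  inflow {in_total:.2f}")
```

Under this placement every row sums to 1, and the Sediment column is the largest inflow, which fits the remark about carbon accumulating in sediment.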
5. #analyticsx
Toy Model : The four vectors of the diurnal cycle
So we start with the base vector showing the carbon in each state at the end of the day, the beginning of phase I of the CAM metabolic
cycle (see summary in reference 3). This is the end of the photosynthetic cycle and not very efficient:
Sediment Aerenchyma Photocycle Sugar and air
12 1 2 4
To this we add (vector addition; in a bank loan model this would be new loans coming into the inventory) the changes in the
presence of sunlight and shifts in enzyme cycles (at night, for example, to PEPC and Hatch/Slack/Kortschak, which fix carbon
interactively with the aerenchyma, acting like vacuoles in a standard CAM cycle):
Sediment Aerenchyma Photocycle Sugar and air
12 8 0 0
And then multiply the resulting phase I vector by the transition matrix (vresult = VphaseI * Tmatrix in SAS IML terms), giving a new vector of
states (vresult), which goes on to be added to by the phase II environment vector at sunrise.
To summarize the four phases: I Night – PEPC and other enzyme cycles store up carbon in the aerenchyma to be metabolized later; II
Early Morning – a mixed photosynthetic cycle; III – optimized use of all metabolic sources of carbon; IV – inefficient (photorespiration)
and transition back to carbon storage.
6. #analyticsx
Toy Model : From Night to Day
So night comes to the late Paleozoic. Adding the vectors and taking the sum through the transitions, we end up with the
vector below, showing the carbon in each state as it would be at dawn:
Sediment Aerenchyma Photocycle Sugar and air
22.8 3.6 7.5 3.6
This represents the metabolic carbon distribution as the photosynthetic cycles enter their most productive phase near mid-day.
The carbon in the aerenchyma will be used up, and by the end of the day the photosynthetic enzymes will be inefficiently
respiring due to a lack of metabolic carbon versus an abundance of oxygen, but at least that happens later in the day thanks to
the aerenchyma. That’s life in the late Paleozoic.
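The night-to-dawn step can be sketched in Python under two assumptions: that the vector 12 8 0 0 is the phase I addition to the base vector 12 1 2 4, and that the blank cells of the transition matrix sit where the matrix below places them. Neither is stated explicitly on the slides. With these assumptions the sketch gives 22.8, 7.5 and 3.6 for Sediment, Photocycle and Sugar and air, matching the dawn vector above, but about 5.1 rather than 3.6 for Aerenchyma, so either the assumed placement or the slide figure may be off.

```python
def step(state, transition):
    """Multiply a row vector of state totals by a square transition matrix."""
    n = len(state)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Order of states: Sediment, Aerenchyma, Photocycle, Sugar and air.
# ASSUMPTION: positions of the zero cells are inferred from the flattened slide table.
T = [
    [0.85, 0.10, 0.00, 0.05],
    [0.00, 0.30, 0.70, 0.00],
    [0.00, 0.00, 0.20, 0.80],
    [0.60, 0.00, 0.20, 0.20],
]

base = [12.0, 1.0, 2.0, 4.0]      # carbon in each state at the end of the day
phase_one = [12.0, 8.0, 0.0, 0.0]  # ASSUMPTION: read as the phase I addition

v_phase_one = [b + a for b, a in zip(base, phase_one)]  # vector addition
dawn = step(v_phase_one, T)                             # vresult = VphaseI * Tmatrix

print([round(x, 1) for x in dawn])
```

Because every row of the matrix sums to 1, the total carbon after the step equals the total before it (39 units here).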
Next: Things to note based on the toy model
7. #analyticsx
Things to note:
1. The diagonal flow. The diagonal carries a relatively high level of “non-transitions”, where the quantities tracked by the
matrix and the state vector tend to stay in the self-to-self states of stasis. These characteristics of the
diagonal are typical of the kind of stable processes that this kind of Markov Chain structure can model
with some success. The implied or ideal time-sequencing of the diagonal transitions has to be built into
the structure of this type of model, i.e. in the ordering of the states in the vector and the matrix. Or, to put
it another way, the flow in the diagonal is based on the way the builder of the matrix and vectors reads
the flow of the process.
2. Sources of change. The modeling of a diurnal cycle showed how variation in what is added to the state
vector can be a driver of overall changes in the dynamics of the model. In large, data-driven bank models,
the variation of the inflow to the state vector can be a crucial but subtle source of overall change in the
model dynamics. Another source of change (not shown in the toy model) is data-driven change in the
transition matrix, a type of change that is seemingly simple, but tends toward patterns that are often difficult
to explain except in terms of the model itself.
3. Representing or simulating sources of change. Change can be represented as additions (new volume, external
inflow, environmental impacts) to the state vector or as changes in the transition matrix. It is of course
possible to use multiple transition matrices sequentially in the same forecast, for example to simulate an
anticipated change in the future.
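The use of several transition matrices in one forecast, mentioned in note 3, can be sketched as a schedule that pairs each period with the matrix in force. Everything here is hypothetical: the two states ("open" and "done"), both matrices, the inflow, and the function name `forecast_with_schedule`.

```python
def step(state, transition):
    """Multiply a row vector of state totals by a square transition matrix."""
    n = len(state)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]

def forecast_with_schedule(state, schedule, inflow):
    """Apply one (possibly different) transition matrix per period, adding the inflow first."""
    path = []
    for transition in schedule:
        state = [s + a for s, a in zip(state, inflow)]
        state = step(state, transition)
        path.append(state)
    return path

t_now = [[0.5, 0.5], [0.0, 1.0]]    # hypothetical current process: half of open items finish
t_later = [[0.8, 0.2], [0.0, 1.0]]  # hypothetical anticipated slowdown: only 20% finish

schedule = [t_now, t_now, t_later]  # the change takes effect in the third period
path = forecast_with_schedule([100.0, 0.0], schedule, inflow=[10.0, 0.0])

for period, totals in enumerate(path, start=1):
    print(period, [round(x, 2) for x in totals])
```

Keeping the schedule as a plain list makes the anticipated change easy to move to another period or to replace with a data-driven series of matrices.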
8. #analyticsx
Transitions in the Business World: scenarios and sources of
variability
Leaving behind, for the moment, the landscape of the late Paleozoic, what happens with transition
models of loan processing featuring more than 20 loan states and hundreds of transitions? Loading
a series of matrices over time with the results of observing millions of transactions of course has an
impact all its own. Here are some problems I noticed that could be addressed by using toy models
as explanatory or heuristic tools. Of course toy models would need a better name. Something like
“business frames” might work, maybe.
1. Scenario development is potentially challenging when the scenarios have to function eventually as
modifications to full-blown transition-matrix-based models.
2. Isolating cause-and-effect can be non-trivial when both inputs and the matrices vary over time.
3. Combining the problems of scenario development and cause-and-effect when
forecasting a range of future impacts.
4. Interpretation and description of parameter variation (from model to matrix and back again) both in
terms of observed anomalies and forecasting future impacts.
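One way a toy model might help with the cause-and-effect problem is to rerun the forecast with only the inflow changed, only the matrix changed, and both changed, then compare the results. The sketch below illustrates that idea and is not a method from the presentation; all states, matrices and inflows are hypothetical.

```python
def step(state, transition):
    """Multiply a row vector of state totals by a square transition matrix."""
    n = len(state)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]

def open_after(state, transition, inflow, periods):
    """Return the 'open' total (first state) after the given number of periods."""
    for _ in range(periods):
        state = [s + a for s, a in zip(state, inflow)]
        state = step(state, transition)
    return state[0]

start = [100.0, 0.0]                                          # hypothetical states: open, done
t_a, t_b = [[0.5, 0.5], [0.0, 1.0]], [[0.6, 0.4], [0.0, 1.0]]  # old and new matrix
in_a, in_b = [10.0, 0.0], [20.0, 0.0]                          # old and new inflow

baseline = open_after(start, t_a, in_a, periods=2)
inflow_effect = open_after(start, t_a, in_b, periods=2) - baseline
matrix_effect = open_after(start, t_b, in_a, periods=2) - baseline
total_change = open_after(start, t_b, in_b, periods=2) - baseline
interaction = total_change - inflow_effect - matrix_effect

print(round(inflow_effect, 2), round(matrix_effect, 2), round(interaction, 2))
```

The interaction term is the part of the change that cannot be assigned to either source alone, which is one reason attribution gets harder when both inputs and matrices vary over time.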
10. #analyticsx
Reference 3
Kate Maxwell, Anne M. Borland, Richard P. Haslam, Brent R. Helliker, Andrew Roberts, and Howard Griffiths.
Modulation of Rubisco Activity during the Diurnal Phases of the Crassulacean Acid Metabolism Plant Kalanchoë daigremontiana.
Plant Physiol. 1999 Nov; 121(3): 849–856. PMCID: PMC59447.