#analyticsx
Peter Hickman
Palmetto GBA/(Reforest the Tropics-Hickman.pw@gmail.com)
Markov Chains in Forecasting: From Toy Models to Reality
Abstract
The internal workings of Markov Chain forecasts based on ever-changing real
transitions and inflow data can be extremely difficult to explain as models of complex
real-world processes. As an approach to understanding the dynamics of Markov
Chains used in forecasting, a toy model is described. Some structures and problems
in explaining large Markov Chain forecasts are noted with the final suggestion being
that toy models may be an efficient way of elucidating the construction and use of
Markov Chain forecasts.
What is a Markov Chain?
For this discussion, a Markov chain is a way of representing a set of related transitions as a matrix with dimensions n by n
that can then be multiplied by a vector of length n. The vector holds the states of interest (for example the number of
loans in different stages of processing) and the multiplication gives a new vector representing the totals in states resulting
from the transition probabilities represented by the matrix.
This is, of course, a tiny subset of the ways Markov Chains can be used. For example, I will only be discussing discrete-time
uses and only models that involve multiplying a state vector by a fixed transition matrix.
The development of a toy model is shown next. This presentation concludes
with a discussion of problems with explaining or modifying a forecast or limited
predictive model that toy models might help solve.
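The multiplication described above can be written out in a few lines. The symbols are assumptions introduced here for illustration and do not appear on the poster: x is the row vector of state totals, P is the n by n transition matrix, and p_ij is the probability of moving from state i to state j in one time step.

```latex
x' = x P, \qquad
x'_j = \sum_{i=1}^{n} x_i \, p_{ij}, \qquad
\sum_{j=1}^{n} p_{ij} = 1 \ \text{ for every state } i
```

Because each row of P sums to 1, the total held in x' equals the total held in x; a single step only redistributes the contents among the states.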
How does it work in a forecast?
The matrix of transition probabilities is generated from data for a historical time period. The time
increment used in that period (e.g. business days) then becomes the step size of the model or forecast:
each step multiplies the current vector of states (e.g. numbers of loans in the stages whose transitions
are represented by the matrix/Markov Chain) by the matrix, and repeating the multiplication steps the
forecast incrementally into a series of future states. In terms of SAS processes, the stepping can be
done via PROC IML.
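The two stages described above, building the matrix from observed transitions and then stepping forward, might be sketched as follows. This is a minimal illustration in plain Python rather than PROC IML; the loan states, the observed transitions, and the starting inventory are all hypothetical, and the rule for states with no observations is an assumption of the sketch.

```python
from collections import Counter

def estimate_transition_matrix(observed, states):
    """Row-normalize observed (from_state, to_state) counts into probabilities."""
    counts = Counter(observed)
    matrix = []
    for src in states:
        row_total = sum(counts[(src, dst)] for dst in states)
        if row_total == 0:
            # Assumption: a state never observed is treated as staying put.
            matrix.append([1.0 if dst == src else 0.0 for dst in states])
        else:
            matrix.append([counts[(src, dst)] / row_total for dst in states])
    return matrix

def step(vector, matrix):
    """One forecast step: row vector times transition matrix."""
    n = len(vector)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def forecast(vector, matrix, n_steps):
    """Return the starting vector followed by n_steps successive states."""
    path = [list(vector)]
    for _ in range(n_steps):
        path.append(step(path[-1], matrix))
    return path

# Hypothetical loan stages and one period of observed daily transitions.
STATES = ["received", "underwriting", "closed"]
OBSERVED = (
    [("received", "received")] * 1
    + [("received", "underwriting")] * 3
    + [("underwriting", "underwriting")] * 1
    + [("underwriting", "closed")] * 1
    + [("closed", "closed")] * 2
)

MATRIX = estimate_transition_matrix(OBSERVED, STATES)
PATH = forecast([100, 40, 0], MATRIX, n_steps=3)  # three business days ahead

for day, state in enumerate(PATH):
    print(day, state)
```

Each printed row is the inventory by stage after that many steps. With no inflow added, the total number of loans is the same on every row.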
Toy Model: Metabolic Carbon Flow in Arborescent Lycopsids
These now-extinct plants are thought to have had a CAM-like photosynthetic metabolism, with aerenchyma (spaces for storing
and moving carbon dioxide) as an adaptation to the carbon-poor, oxygen-rich atmosphere of the late Paleozoic (see references
1-3). The arrows show the fixed transition rates. Different amounts are moved through the transitions by the state vector
according to the 4 different diurnal metabolic phases. The metabolic carbon is in arbitrary units related to flow in the diurnal cycle.
[Flow diagram: four states (Carbon in Sediment, Carbon in Aerenchyma, Carbon in Photosynthetic cycles, Carbon in Sugars and
the atmosphere) connected by arrows labeled with the transition rates .30, .85, .10, .70, .80, .20, .60, .20, .20, .05]
From the flow diagram, we write up the matrix, which gives transition probabilities (the
outgoing probabilities for each state {the rows} add up to 1, but the inflows {the columns}
can be greater than 1, hence the build-up of coal in the Carboniferous):
                     Sediment   Aerenchyma   Photo cycles   Sugars and the air
Sediment             .85        .10                         .05
Aerenchyma                      .30          .70
Photo cycles                                 .20            .80
Sugars and the air   .60                     .20            .20
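The row and column property can be checked directly. The sketch below assumes one reading of the matrix: the listed values are placed left to right in column order with the self-to-self rates on the diagonal, and blank cells are taken as zero. That cell placement is an assumption of this example, not something the poster states.

```python
STATES = ["Sediment", "Aerenchyma", "Photo cycles", "Sugars and the air"]

# Rows are "from" states, columns are "to" states.
# Assumed placement of the poster's values; blank cells read as 0.
T = [
    [0.85, 0.10, 0.00, 0.05],  # Sediment
    [0.00, 0.30, 0.70, 0.00],  # Aerenchyma
    [0.00, 0.00, 0.20, 0.80],  # Photo cycles
    [0.60, 0.00, 0.20, 0.20],  # Sugars and the air
]

n = len(STATES)
row_sums = [sum(T[i]) for i in range(n)]                       # outflow per state
col_sums = [sum(T[i][j] for i in range(n)) for j in range(n)]  # inflow per state

for name, r, c in zip(STATES, row_sums, col_sums):
    print(f"{name:20s} row sum = {r:.2f}   column sum = {c:.2f}")
```

Under this reading every row sums to 1, while the Sediment column sums to 1.45, which is the sense in which a state can take in more than it gives up.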
Toy Model: The four vectors of the diurnal cycle
We start with the base vector showing the carbon in each state at the end of the day, which is the beginning of phase I of the
CAM metabolic cycle (see the summary in reference 3). This is the end of the photosynthetic cycle and not very efficient:
Sediment   Aerenchyma   Photocycle   Sugar and air
12         1            2            4
To this we add (by vector addition; in a bank loan model this would be new loans coming into the inventory) the changes due to
the presence of sunlight and shifts in enzyme cycles (at night, for example, to PEPC and Hatch/Slack/Kortschak, which fix carbon
interactively with the aerenchyma, acting like vacuoles in a standard CAM cycle):
Sediment   Aerenchyma   Photocycle   Sugar and air
12         8            0            0
We then multiply the phase I vector by the transition matrix (vresult = VphaseI * Tmatrix in SAS IML terms), giving a new vector
of states (vresult), which in turn is added to by the phase II environment vector at sunrise.
To summarize the four phases:
I. Night: PEPC and other enzyme cycles store up carbon in the aerenchyma to be metabolized later.
II. Early morning: a mixed photosynthetic cycle.
III. Optimized use of all metabolic sources of carbon.
IV. Inefficient (photorespiration) and transition back to carbon storage.
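The add-then-multiply step for phase I can be sketched in plain Python (the poster itself refers to SAS IML). The base and phase I vectors are the ones given above; the placement of the transition rates in the matrix is an assumed reading, with values in column order, self-to-self rates on the diagonal, and blank cells taken as zero.

```python
# Rows are "from" states, columns are "to" states, in the order
# Sediment, Aerenchyma, Photocycle, Sugar and air.
# Assumed cell placement; not confirmed by the poster.
T = [
    [0.85, 0.10, 0.00, 0.05],
    [0.00, 0.30, 0.70, 0.00],
    [0.00, 0.00, 0.20, 0.80],
    [0.60, 0.00, 0.20, 0.20],
]

BASE = [12, 1, 2, 4]            # carbon in each state at the end of the day
PHASE_I_INFLOW = [12, 8, 0, 0]  # what night adds to each state

def add_vectors(a, b):
    """Element-wise vector addition (the inflow step)."""
    return [x + y for x, y in zip(a, b)]

def step(vector, matrix):
    """Row vector times matrix: vresult = VphaseI * Tmatrix."""
    n = len(vector)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]

v_phase_i = add_vectors(BASE, PHASE_I_INFLOW)
v_result = step(v_phase_i, T)

print("phase I vector:", v_phase_i)
print("after transitions:", [round(x, 2) for x in v_result])
```

The inflow raises the total carbon from 19 to 39 units. The multiplication then only redistributes those 39 units, because every row of the matrix sums to 1.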
Toy Model: From Night to Day
So night comes to the late Paleozoic. Adding the phase I vector to the base vector and taking the sum through the transitions,
we end up with the carbon in each state as it would be at dawn:
Sediment Aerenchyma Photocycle Sugar and air
22.8 3.6 7.5 3.6
This represents the metabolic carbon distribution as the photosynthetic cycles enter their most productive phase near mid-day.
The carbon in the aerenchyma will be used up, and by the end of the day the photosynthetic enzymes will be respiring
inefficiently due to a lack of metabolic carbon versus an abundance of oxygen, but at least that happens later in the day
thanks to the aerenchyma. That’s life in the late Paleozoic.
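As a check on the arithmetic, the night step can be worked entry by entry. This assumes the transition rates sit in the matrix in column order with the self-to-self rates on the diagonal and blank cells equal to zero, which is a reading of the matrix and not something the poster confirms.

```latex
\begin{aligned}
x &= (12,\,1,\,2,\,4) + (12,\,8,\,0,\,0) = (24,\,9,\,2,\,4) \\
\text{Sediment} &: 24(0.85) + 4(0.60) = 20.4 + 2.4 = 22.8 \\
\text{Aerenchyma} &: 24(0.10) + 9(0.30) = 2.4 + 2.7 = 5.1 \\
\text{Photocycle} &: 9(0.70) + 2(0.20) + 4(0.20) = 6.3 + 0.4 + 0.8 = 7.5 \\
\text{Sugar and air} &: 24(0.05) + 2(0.80) + 4(0.20) = 1.2 + 1.6 + 0.8 = 3.6
\end{aligned}
```

On this reading the sediment, photocycle, and sugar entries reproduce the dawn vector shown above, while the aerenchyma entry comes out at 5.1 rather than 3.6. The difference may reflect a cell placement other than the one assumed here, so the check is indicative only. The total is 39 units both before and after the multiplication.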
Next: Things to note based on the toy model
Things to note:
1. The diagonal flow. The diagonal carries a relatively high level of “non-transitions”, where the quantities
tracked by the matrix and the state vector tend to stay in the self-to-self states of stasis. These
characteristics of the diagonal are typical of the kind of stable processes that this kind of Markov Chain
structure can model with some success. The implied or ideal time-sequencing of the diagonal transitions
has to be built into the structure of this type of model, i.e. in the ordering of the states in the vector and
the matrix. Or, to put it another way, the flow in the diagonal is based on the way the builder of the
matrix and vectors reads the flow of the process.
2. Sources of change. The modeling of a diurnal cycle showed how variation in what is added to the state
vector can be a driver of overall changes in the dynamics of the model. In large, data-driven bank models,
the variation of the inflow to the state vector can be a crucial but subtle source of overall change in the
model dynamics. Another source of change (not shown in the toy model) is data-driven change in the
transition matrix, a type of change that is seemingly simple but tends toward patterns that are often
difficult to explain except in terms of the model itself.
3. Representing or simulating sources of change. Change can be represented as additions (new volume,
external inflow, environmental impacts) to the state vector or as changes in the transition matrix. It is of
course possible to use multiple transition matrices sequentially in the same forecast, for example to
simulate an anticipated change in the future.
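Point 3 can be sketched as a forecast driven by a schedule, where each step supplies its own inflow vector and its own transition matrix. The two-state example, the matrices, and the inflows below are all hypothetical values chosen only to show the mechanics.

```python
def step(vector, matrix):
    """Row vector times transition matrix."""
    n = len(vector)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def run_schedule(start, schedule):
    """Apply a list of (inflow, matrix) pairs, one pair per time step.

    At each step the inflow is added to the state vector first, then the
    sum is taken through that step's transition matrix.
    """
    state = list(start)
    history = [state]
    for inflow, matrix in schedule:
        state = step([s + a for s, a in zip(state, inflow)], matrix)
        history.append(state)
    return history

# Hypothetical two-state process: index 0 = "open", index 1 = "closed".
M_NOW = [[0.5, 0.5], [0.0, 1.0]]      # current processing speed
M_LATER = [[0.75, 0.25], [0.0, 1.0]]  # anticipated slowdown
INFLOW = [20, 0]                      # new volume arriving each step

SCHEDULE = [(INFLOW, M_NOW), (INFLOW, M_NOW), (INFLOW, M_LATER)]
HISTORY = run_schedule([100, 0], SCHEDULE)

for t, state in enumerate(HISTORY):
    print(t, state)
```

Switching from M_NOW to M_LATER at the third step is the anticipated change; replacing the constant INFLOW with a different vector per step would model varying volume in the same way.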
Transitions in the Business World: scenarios and sources of variability
Leaving behind, for the moment, the landscape of the late Paleozoic, what happens with transition
models of loan processing featuring more than 20 loan states and hundreds of transitions? Loading
a series of matrices over time with the results of observing millions of transactions of course has an
impact all its own. Here are some problems I noticed that could be addressed by using toy models
as explanatory or heuristic tools. Of course toy models would need a better name. Something like
“business frames” might work, maybe.
1. Scenario development is potentially challenging when the scenarios have to function eventually as
modifications to full-blown transition-matrix-based models.
2. Isolating cause-and-effect can be non-trivial when both inputs and the matrices vary over time.
3. Combining the problems of scenario development and cause-and-effect when forecasting a range
of future impacts.
4. Interpretation and description of parameter variation (from model to matrix and back again), both in
terms of observed anomalies and in forecasting future impacts.
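One way a toy model might illustrate problem 2 is to rerun the same forecast while changing only the inflow, only the matrix, and then both. This is a sketch of one possible approach, not a method described on the poster, and every number in it is hypothetical.

```python
def step(vector, matrix):
    """Row vector times transition matrix."""
    n = len(vector)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def forecast(start, inflow, matrix, n_steps):
    """Add the inflow, then apply the matrix, for n_steps steps."""
    state = list(start)
    for _ in range(n_steps):
        state = step([s + a for s, a in zip(state, inflow)], matrix)
    return state

# Hypothetical two-state process: index 0 = "open", index 1 = "closed".
START = [100, 0]
BASE_INFLOW, NEW_INFLOW = [20, 0], [40, 0]
BASE_MATRIX = [[0.5, 0.5], [0.0, 1.0]]
NEW_MATRIX = [[0.75, 0.25], [0.0, 1.0]]
N_STEPS = 2

def open_count(inflow, matrix):
    """Open inventory at the end of the forecast horizon."""
    return forecast(START, inflow, matrix, N_STEPS)[0]

baseline = open_count(BASE_INFLOW, BASE_MATRIX)
inflow_effect = open_count(NEW_INFLOW, BASE_MATRIX) - baseline
matrix_effect = open_count(BASE_INFLOW, NEW_MATRIX) - baseline
total_effect = open_count(NEW_INFLOW, NEW_MATRIX) - baseline
interaction = total_effect - inflow_effect - matrix_effect

print("baseline open inventory:", baseline)
print("inflow change alone:", inflow_effect)
print("matrix change alone:", matrix_effect)
print("both together:", total_effect)
print("not attributable to either alone:", interaction)
```

In this example the two separate effects do not add up to the combined effect. The leftover interaction term is a small-scale version of why cause-and-effect is hard to isolate when inputs and matrices change together.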
Reference 1: Green WA. The function of the aerenchyma in arborescent lycopsids: evidence of an unfamiliar metabolic
strategy. Proc Biol Sci. 2010 Aug 7; 277(1692): 2257–2267. doi: 10.1098/rspb.2010.0224. PMCID: PMC2894907.
Reference 2: Beerling DJ. Low atmospheric CO2 levels during the Permo-Carboniferous glaciation inferred from fossil
lycopsids. Proc Natl Acad Sci U S A. 2002 Oct 1; 99(20): 12567–12571. Epub 2002 Sep 16.
Reference 3: Maxwell K, Borland AM, Haslam RP, Helliker BR, Roberts A, Griffiths H. Modulation of Rubisco Activity
during the Diurnal Phases of the Crassulacean Acid Metabolism Plant Kalanchoë daigremontiana. Plant Physiol.
1999 Nov; 121(3): 849–856. PMCID: PMC59447.
