Markov-Switching Models for Volatility Forecasting and Trading

of 16
(Version 1)
Volatility trading using Markov-Regime Switching models:
Dynamic replication of variance swap
Chong Seng Choi
Imperial College London, MSc
ARTICLE INFO
Keywords: Markov-Switching Multifractal model; Markov-Switching
GARCH model; Volatility; Variance swap; Volatility trading
ABSTRACT
In this paper, I evaluate the performance of the Markov-Switching
Multifractal (MSM) model and the Markov-Switching GARCH (MS-GARCH)
model in forecasting and trading volatility. Using data on four major
U.S. equity indices, I find the following results. First, MSM
outperforms MS-GARCH out-of-sample at forecasting horizons of 10-50
days but is comparable to MS-GARCH at the 1-day horizon. Second,
MS-GARCH generates forecasts that are too high in volatile periods and
for assets with high historical volatility, and too low in
low-volatility environments and in multi-step ahead forecasts. In
contrast, MSM is able to capture the multiscaling and long-memory
characteristics of volatility as well as structural changes in the
volatility process. Third, in terms of trading profits, MS-GARCH
outperforms MSM in both intra-day volatility and monthly variance swap
trading. However, I find that the outperformance in monthly trading is
due to inefficient market pricing of implied volatilities, which tend
to be under-priced in volatile periods but over-priced in
low-volatility periods. Lastly, using at-the-money implied volatility
as a predictor of the direction of future realized volatility, I find
significant returns from monthly variance swap trading, results that
lend support to the conclusion that implied volatilities are mispriced.
1. Introduction
The volatility of asset returns is integral to the profitable
trading and pricing of financial derivatives such as options.
However, the time-varying and highly persistent
characteristics of volatility mean that volatility can move
suddenly but also cluster around different volatility levels.
Early empirical research has shown that volatility
clustering can remain substantial over long horizons (Ding,
Granger, and Engle, 1993) but that volatility persistence can
vary from a horizon of a few days to a few years. Thus, given
that volatility is both highly persistent and
highly variable, it is natural to expect that volatility
fluctuations have a significant impact on valuation and risk
management.
Many academics and practitioners use the autoregressive
conditional heteroskedasticity (ARCH) model of Engle
(1982) and the generalized ARCH (GARCH) model of
Bollerslev (1986) to model time-varying volatility.
Although ARCH/GARCH models generally provide better
forecasts over historical and implied volatility, Klaassen
(2002) showed, using daily data on U.S dollar exchange
rates, that these models generate forecasts that are,
nonetheless, too high in volatile periods. Klaassen (2002)
attributed the excessive GARCH forecasts to the well-
known high persistence of individual shocks in those
forecasts. According to Lamoureux and Lastrapes (1990),
as cited by Klaassen (2002), persistence of shocks in
volatility may originate from structural change in the
volatility process: if shocks persist and remain constant for
some time, albeit short, the persistence of shocks in those
periods may result in volatility persistence. Standard
GARCH models, which pick up the short-run
autocorrelation in volatility, put all volatility persistence in
the persistence of individual shocks (Klaassen, 2002). For
this reason, one would expect to improve forecasts by
incorporating the structural changes in volatility process in
GARCH models. The Markov regime-switching model, first
adopted by Hamilton (1989, 1990) to describe the U.S.
business cycle, can be used to describe the switches between
regimes with different volatility processes. Klaassen (2002)
developed a two-regime GARCH model to solve the
problem of excessive GARCH forecasts. The resulting
Markov-Switching GARCH permits the conditional mean
and return volatility to depend on an unobserved latent state
that switches stochastically, thus capturing the changes in
volatility dynamics while yielding an extra source of
volatility persistence.

Although MS-GARCH generally improves upon
standard GARCH model, application of the MS-GARCH
requires researchers to process separately regime switches
at different frequencies and rely on GARCH components to
capture the autoregressive dynamics in volatility. If one
thinks of volatility as consisting of finitely many states, with
each state capturing a different degree of volatility persistence,
regime-switching ARCH/GARCH models quickly become
unusable when one tries to incorporate the entire volatility
dynamics, because the number of parameters grows quadratically
with the number of regimes.
The Markov-Switching Multifractal model of Calvet and
Fisher (2004) breaks this barrier by assuming that volatility
is the product of a large number of discrete states, each of which
represents a different frequency and can randomly
switch to a new value drawn from a pre-specified
distribution. Volatility jumps when regime switches affect
the high frequency components of volatility but the change
in volatility can be extremely persistent if the switches
affect the low frequency components. The multi-frequency
structure of the model is consistent with the intuition that
volatility shocks have highly heterogeneous degrees of
persistence, and the decomposition of volatility into
frequency components means that MSM can capture long-
memory feature in volatility, intermediate volatility
transition, and high-frequency volatility shocks all within a
single regime-switching model.
This paper examines the forecasting performance of MS-GARCH¹
and MSM and compares their performance
according to a set of statistical loss functions and the trading
profits obtained from a portfolio of hedged options that
replicates the payoffs of variance swaps. Section 1 reviews
the Markov-Switching GARCH. Section 2 details the
MSM model. Section 3 describes the data and
out-of-sample results. Section 4 discusses the trading strategies
and performance. Section 5 concludes.
2. Markov-Switching GARCH²
The main feature of Markov-Switching GARCH is that it
combines short-run autoregressive dynamics of GARCH
with regime switches to capture structural changes in the
data generating process (Gray, 1996; Klaassen, 2002). Let
$s_t \in \{1, 2\}$ be the variance regime at time $t$, where $s_t = 1$
denotes the low variance regime and $s_t = 2$ the high variance
regime. Further, it is assumed that the latent state $s_t$ follows a
first-order Markov process with transition probability
$p_{ij} = \mathbb{P}(s_t = j \mid s_{t-1} = i)$, which represents the
probability that state $i$ will be followed by state $j$
(Hamilton, 1994). Collecting the probabilities in matrix form, the
transition matrix of a two-state Markov chain is given by
$$\mathbb{P} = \begin{pmatrix} p_{11} & 1 - p_{22} \\ 1 - p_{11} & p_{22} \end{pmatrix}$$
where $p_{11}$ is the probability of remaining in regime 1 and
$1 - p_{11}$ is the probability of going to regime 2, given that the
current state is regime 1.
¹ Appendix 1 provides a formal review of the properties of the
standard GARCH model and its likelihood function.
² The first Markov-Switching GARCH was introduced by Gray (1996) but
the model does not permit convenient multi-period ahead variance
forecasts. For simplicity, this paper uses the model proposed by
Klaassen (2002) and focuses on the normal distribution.
Hence, in the most general form, a two-state Markov-Switching GARCH
can be written as
$$r_t \mid I_{t-1} \sim \begin{cases} f(\theta_t^{(1)}) & \text{with probability } p_{1,t} \\ f(\theta_t^{(2)}) & \text{with probability } 1 - p_{1,t} \end{cases}$$
where $I_{t-1}$ denotes the information set at date $t-1$, $f(\cdot)$
represents the conditional distribution of the variance process and
$\theta_t^{(i)}$ denotes the vector of parameters in the $i$-th
variance regime that characterizes the distribution.
For regime $i \in \{1, 2\}$, let
$h_t^{(i)} = Var_{t-1}(r_t \mid s_t = i)$ be the variance of return
$r_t$ conditional on $s_t = i$ and past returns
$\{r_j\}_{j=i}^{t-1}$. Klaassen (2002) assumes that, given the
unobserved regime path $\tilde{s}_t = (s_t, s_{t-1}, \dots)$, the
conditional variance of $r_t$ equals
$Var_{t-1}(r_t \mid \tilde{s}_t)$, and thus the conditional dynamics
of his specification equal
$$h_t^{(i)}(r_t \mid \tilde{s}_t) = \omega_i + \alpha_i \varepsilon_{t-1}^2 + \beta_i E_{t-1}\left[h_{t-1}^{(i)}(r_{t-1} \mid \tilde{s}_{t-1}) \mid \tilde{s}_t\right]$$
and
$$r_t = \mu_t^{(i)} + \varepsilon_t = \mu_t^{(i)} + \eta_t \sqrt{h_t^{(i)}}$$
where $\mu^{(i)} = E(r_t \mid I_{t-1})$ is the conditional mean of
regime $i$ and $\eta_t$ is a zero mean, unit variance process. The
conditional variance depends not only on the information $I_{t-1}$
and $r_t$ but also on the regime path $\tilde{s}_{t-1}$. To simplify
computation, Klaassen integrated out the regime $s_{t-1}$ at time
$t-1$ such that $h_{t-1}(r_{t-1} \mid s_{t-1})$ is independent of
$s_{t-2}$ and $h_t(r_t \mid s_t)$ depends on the current regime only,
thus
$$h_t^{(i)}(r_t \mid s_t) = \omega_i + \alpha_i \varepsilon_{t-1}^2 + \beta_i E_{t-1}\left[h_{t-1}^{(i)}(r_{t-1} \mid s_{t-1}) \mid s_t\right]$$
Further, one of the main advantages of Klaassen's (2002)
specification over Gray's (1996) is that it allows for convenient
multi-step ahead volatility forecasts, for which forecasts at time
$T$ depend only on information at time $T-1$. Let $h_{T,T+\tau}$ be
the $\tau$-day ahead volatility forecast at date $T$; the total
volatility over a period of $K$ days from $T+1$ to $T+K$ is
calculated as follows:

$$h_{T+1:T+K} = \sum_{\tau=1}^{K} \sum_{i=1}^{2} \Pr(s_{T+\tau} = i \mid I_{T-1}) \, h_{T,T+\tau}\{s_{T+\tau} = i\}$$
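where $h_{T,T+\tau}\{s_{T+\tau} = i\}$ is the $\tau$-day ahead
volatility forecast in regime $i$ made at date $T$ and can be
calculated as follows:
$$
\begin{aligned}
h_{T,T+\tau}\{s_{T+\tau} = i\} &= E_{t-1}\left[h_{T+\tau-1} \mid s_{T+\tau} = i\right] \\
&= \omega_i + \alpha_i E_{t-1}\left[\varepsilon_{T+\tau-1}^2 \mid s_{T+\tau} = i\right] + \beta_i E_{t-1}\left[h_{T+\tau-1}^{(i)} \mid s_{T+\tau} = i\right] \\
&= \omega_i + \alpha_i E_{t-1}\left[E_{t-1}\left[\varepsilon_{T+\tau-1}^2 \mid s_{T+\tau}\right] \mid s_{T+\tau} = i\right] + \beta_i E_{t-1}\left[h_{T+\tau-1}^{(i)} \mid s_{T+\tau} = i\right] \\
&= \omega_i + (\alpha_i + \beta_i) \, E_{t-1}\left[h_{T,T+\tau-1} \mid s_{T+\tau} = i\right]
\end{aligned}
$$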
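Hence, the forecast is a weighted average of the volatility
forecasts in each regime, where the weights are the ex ante
probabilities $\Pr(s_{T+\tau} = i \mid I_{T-1})$ given by
$$\begin{pmatrix} \Pr(s_{T+\tau} = 1 \mid I_{T-1}) \\ \Pr(s_{T+\tau} = 2 \mid I_{T-1}) \end{pmatrix} = \mathbb{P} \cdot \begin{pmatrix} \Pr(s_{T-1} = 1 \mid I_{T-1}) \\ \Pr(s_{T-1} = 2 \mid I_{T-1}) \end{pmatrix}$$
where $\mathbb{P}$ is the transition matrix and
$\Pr(s_{T-1} = 1 \mid I_{T-1})$ will be discussed in more detail
later. Let us first derive the expectation
$E_{t-1}\left[h_{T,T+\tau-1} \mid s_{T+\tau} = i\right]$: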
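$$
\begin{aligned}
E_{t-1}\left[h_{T,T+\tau-1} \mid s_{T+\tau} = i\right] &= E\left[h_{T+\tau-1} \mid s_{T+\tau} = i, I_{T-1}\right] \\
&= E\left[ E\left[r_{T+\tau-1}^2 \mid s_{T+\tau} = j, I_{T-1}\right] - \left(E\left[r_{T+\tau-1} \mid s_{T+\tau} = j, I_{T-1}\right]\right)^2 \mid s_{T+\tau} = i, I_{T-1} \right] \\
&= E\left[ E\left[r_{T+\tau-1}^2 \mid s_{T+\tau} = j, I_{T-1}\right] \mid s_{T+\tau} = i, I_{T-1} \right] - \left( E\left[ E\left[r_{T+\tau-1} \mid s_{T+\tau} = j, I_{T-1}\right] \right] \right)^2
\end{aligned}
$$
where,
$$
\begin{aligned}
E\left[ E\left[r_{T+\tau-1}^2 \mid s_{T+\tau} = j, I_{T-1}\right] \mid s_{T+\tau} = i, I_{T-1} \right]
&= \sum_{j=1}^{2} E\left[r_{T+\tau-1}^2\right] \cdot \Pr(s_{T+\tau-1} = j \mid s_{T+\tau} = i, I_{T-1}) \\
&= \sum_{j=1}^{2} E\left[(\mu_{T+\tau-1} + \varepsilon_{T+\tau-1})^2\right] \cdot \Pr(s_{T+\tau-1} = j \mid s_{T+\tau} = i, I_{T-1}) \\
&= \sum_{j=1}^{2} p_{ji,t-1} \, E\left[\mu_{T+\tau-1}^2 + h_{T+\tau-1} \mid s_{T+\tau} = j, I_{T-1}\right]
\end{aligned}
$$
Applying the same reasoning to the second term, we have
$$\left( E\left[ E\left[r_{T+\tau-1} \mid s_{T+\tau} = j, I_{T-1}\right] \right] \right)^2 = \left( \sum_{j=1}^{2} p_{ji,t-1} \, E\left[\mu_{T+\tau-1} \mid s_{T+\tau} = j, I_{T-1}\right] \right)^2$$
where, by Bayes' rule,
$$p_{ji,T-1} = \Pr(s_{T+\tau-1} = j \mid s_{T+\tau} = i, I_{T-1}) = \frac{p_{ji} \Pr(s_{T+\tau-1} = j \mid I_{T-1})}{\Pr(s_{T+\tau} = i \mid I_{T-1})}$$
and,
$$E_{t-1}\left[h_{T,T+\tau-1} \mid s_{T+\tau} = i\right] = \sum_{j=1}^{2} p_{ji,t-1} \, E\left[\mu_{T+\tau-1}^2 + h_{T+\tau-1} \mid s_{T+\tau} = j, I_{T-1}\right] - \left( \sum_{j=1}^{2} p_{ji,t-1} \, E\left[\mu_{T+\tau-1} \mid s_{T+\tau} = j, I_{T-1}\right] \right)^2$$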
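For example,
$$E_{t-1}\left[h_{t-1}^{(1)} \mid s_t\right] = p_{11,t-1}\left[\left(\mu_{t-1}^{(1)}\right)^2 + h_{t-1}^{(1)}\right] + (1 - p_{11,t-1})\left[\left(\mu_{t-1}^{(2)}\right)^2 + h_{t-1}^{(2)}\right] - \left[p_{11,t-1}\,\mu_{t-1}^{(1)} + (1 - p_{11,t-1})\,\mu_{t-1}^{(2)}\right]^2$$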
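The expectation derived above can be sketched numerically. The snippet below is a minimal illustration, not the author's implementation: the function names are hypothetical, the transition matrix follows the column layout of $\mathbb{P}$ shown earlier (entry in row $i$, column $j$ is the probability of moving from regime $j$ to regime $i$), and the regime probabilities passed in are assumed to be already known.

```python
import numpy as np

def backward_probs(P, prob_prev, prob_next):
    """B[i, j] = Pr(s_{t-1} = j | s_t = i, I) via Bayes' rule.

    P is column-stochastic: P[i, j] = Pr(s_t = i | s_{t-1} = j).
    prob_prev[j] = Pr(s_{t-1} = j | I), prob_next[i] = Pr(s_t = i | I).
    """
    return P * prob_prev[None, :] / prob_next[:, None]

def expected_lagged_variance(mu_prev, h_prev, B):
    """E[h_{t-1} | s_t = i]: probability-weighted second moment minus squared mean."""
    second_moment = B @ (mu_prev ** 2 + h_prev)
    first_moment = B @ mu_prev
    return second_moment - first_moment ** 2

def garch_update(omega, alpha, beta, eps_prev, exp_h_prev):
    """Regime-specific variance h_t^(i) given the integrated-out lagged variance."""
    return omega + alpha * eps_prev ** 2 + beta * exp_h_prev

# Illustrative (assumed) numbers: p11 = 0.95, p22 = 0.90.
P = np.array([[0.95, 0.10],
              [0.05, 0.90]])
prob_prev = np.array([0.7, 0.3])
prob_next = P @ prob_prev
B = backward_probs(P, prob_prev, prob_next)
exp_h = expected_lagged_variance(np.zeros(2), np.array([1.0, 4.0]), B)
```

With equal regime means the squared-mean correction vanishes, so each entry of `exp_h` is a weighted average of the two lagged regime variances.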
To calculate the regime probabilities, recall from above that the
conditional distribution of the return series $r_t$ can be written
as
$$r_t \mid I_{t-1} \sim \begin{cases} f(r_t \mid s_t = 1, I_{t-1}) & \text{with probability } p_{1,t} \\ f(r_t \mid s_t = 2, I_{t-1}) & \text{with probability } 1 - p_{1,t} \end{cases}$$
where $f(r_t \mid s_t = i, I_{t-1})$ denotes the assumed distribution
of the return series, i.e. a normal distribution. Then, given
$\Pr(s_{t-1} = j \mid I_{t-1})$ at time $t-1$, the regime
probabilities can be calculated as
$$
\begin{aligned}
p_{it} = \Pr(s_t = i \mid I_{t-1}) &= \sum_{j=1}^{2} \Pr(s_t = i, s_{t-1} = j \mid I_{t-1}) \\
&= \sum_{j=1}^{2} \Pr(s_t = i \mid s_{t-1} = j) \Pr(s_{t-1} = j \mid I_{t-1}) \\
&= \sum_{j=1}^{2} p_{ji} \Pr(s_{t-1} = j \mid I_{t-1})
\end{aligned}
$$
because the current regime $s_t$ depends only on the regime one
period ago, $s_{t-1}$.
Moreover, at the end of time $t$, we can observe the return at time
$t$ along with the information set. Thus, $\Pr(s_t = i \mid I_t)$
can be calculated as follows:
$$\Pr(s_t = i \mid I_t) = \Pr(s_t = i \mid r_t, I_{t-1}) = \frac{f(r_t, s_t = i \mid I_{t-1})}{f(r_t \mid I_{t-1})}$$
where
$$f(r_t, s_t = i \mid I_{t-1}) = f(r_t \mid s_t = i, I_{t-1}) \, f(s_t = i \mid I_{t-1}) = f(r_t \mid s_t = i, I_{t-1}) \Pr(s_t = i \mid I_{t-1})$$
and
$$f(r_t \mid I_{t-1}) = \sum_{i=1}^{2} f(r_t, s_t = i \mid I_{t-1}) = \sum_{i=1}^{2} f(r_t \mid s_t = i, I_{t-1}) \Pr(s_t = i \mid I_{t-1})$$
denote respectively the joint density of returns and the $i$-th
regime, and the marginal density function of returns. Hence,
$$\Pr(s_t = i \mid I_t) = \frac{f(r_t \mid s_t = i, I_{t-1}) \Pr(s_t = i \mid I_{t-1})}{\sum_{j=1}^{2} f(r_t \mid s_t = j, I_{t-1}) \Pr(s_t = j \mid I_{t-1})} = \frac{p_{it} \, f(r_t \mid s_t = i, I_{t-1})}{\sum_{j=1}^{2} p_{jt} \, f(r_t \mid s_t = j, I_{t-1})}$$
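Lastly, the likelihood function for calculating the model
parameters is given by
$$\mathcal{L}(\theta) = \sum_{t=1}^{T} \log\Big[ \Pr(s_t = 1 \mid I_{t-1}) \, f(r_t \mid s_t = 1, I_{t-1}) + \left[1 - \Pr(s_t = 1 \mid I_{t-1})\right] f(r_t \mid s_t = 2, I_{t-1}) \Big]$$
where
$$f(r_t \mid s_t = i, I_{t-1}) = \frac{1}{\sqrt{2\pi h_t^{(i)}}} \exp\left[-0.5 \, \frac{\left(r_t - \mu_t^{(i)}\right)^2}{h_t^{(i)}}\right]$$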
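The prediction step, the Bayesian update and the log-likelihood above can be combined in one loop. The following is a minimal sketch under simplifying assumptions: the regime-conditional means and variances are passed in as fixed arrays (in the full MS-GARCH they would be produced by the variance recursion), the transition matrix uses the column layout of $\mathbb{P}$ given earlier, and the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def normal_pdf(r, mu, h):
    """Normal density with mean mu and variance h."""
    return np.exp(-0.5 * (r - mu) ** 2 / h) / np.sqrt(2.0 * np.pi * h)

def filter_loglik(returns, mu, h, P, prob0):
    """Regime filter and log-likelihood for a two-regime normal mixture.

    mu, h : arrays of shape (T, 2) with regime-conditional means and variances
    P     : column-stochastic matrix, P[i, j] = Pr(s_t = i | s_{t-1} = j)
    prob0 : initial filtered probabilities Pr(s_0 = j | I_0)
    """
    filtered = np.asarray(prob0, dtype=float)
    loglik = 0.0
    history = []
    for t, r in enumerate(returns):
        predicted = P @ filtered                       # Pr(s_t = i | I_{t-1})
        joint = predicted * normal_pdf(r, mu[t], h[t])  # f(r_t, s_t = i | I_{t-1})
        marginal = joint.sum()                         # f(r_t | I_{t-1})
        loglik += np.log(marginal)
        filtered = joint / marginal                    # Pr(s_t = i | I_t)
        history.append(filtered)
    return loglik, np.array(history)

# Assumed toy inputs: low-variance regime h = 1, high-variance regime h = 9.
P = np.array([[0.95, 0.10],
              [0.05, 0.90]])
returns = np.array([0.1, -2.5, 0.3])
mu = np.zeros((3, 2))
h = np.tile([1.0, 9.0], (3, 1))
loglik, probs = filter_loglik(returns, mu, h, P, [0.5, 0.5])
```

In this toy run the large return on the second date shifts probability mass toward the high-variance regime, which is the behaviour the updating formula is meant to produce.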
2. Markov-Switching Multifractal model
The Markov-Switching Multifractal model of Calvet and Fisher (2004)
is fundamentally different from GARCH-class models in that it
incorporates the multiscaling³ or multifractal behaviour of
time-series data (Lux and Kaizoji, 2007). MSM assumes that
instantaneous volatility is determined by the product of $k$ random
multipliers, $M_t = (M_{1,t}, M_{2,t}, \dots, M_{k,t}) \in \mathbb{R}_+^k$,
with heterogeneous decay rates, and that the financial return is of
the form $r_t = \sigma_t \epsilon_t$.
³ According to Lux, Arias and Sattarhoff (2011), the notion of
multiscaling or multifractality in financial time series data refers
to 'the variations in the scaling behavior of various moments or to
different degrees of long-term dependence of various moments' (p.5).
In this specification, return volatility $\sigma_t$ is decomposed
into a large number of volatility components and is driven by the
following multiplicative structure
$$\sigma_t = \sigma \left( \prod_{i=1}^{k} M_{i,t} \right)^{1/2}$$
where the scale factor $\sigma$ is the unconditional standard
deviation of the innovation $r_t$ and the volatility components
$M_{i,t}$ are independent, persistent, non-negative and satisfy the
conditions $M_{i,t} \geq 0$ and $E(M_{i,t}) = 1$. Thus, for any state
values $m = (m_{1,t}, m_{2,t}, \dots, m_{k,t}) \in \mathbb{R}_+^k$, if
$g(m)$ denotes the product $\prod_{i=1}^{k} m_{i,t}$, the stochastic
volatility process can be written as $\sigma_t = \sigma [g(m)]^{1/2}$
(Calvet and Fisher, 2004).
The volatility components of the state vector $M_t$ are assumed to
have the same marginal distribution but evolve at different
frequencies. For example, if the volatility state vector is
constructed up to date $t$, then for $i \in \{1, \dots, k\}$, the
next-period multiplier $M_{i,t+1}$ is drawn from a pre-defined
distribution $M$ with probability $\gamma_i$ and is otherwise equal
to its previous value, $M_{i,t+1} = M_{i,t}$, with probability
$1 - \gamma_i$.
The switching probabilities
$\gamma \equiv (\gamma_1, \gamma_2, \dots, \gamma_k)$ are given by
$$\gamma_i = 1 - (1 - \gamma_k)^{b^{\,i-k}}$$
where $\gamma_k \in (0, 1)$ and $b \in (1, \infty)$.
The specification implies that the switching probabilities of the
low frequency components increase approximately at geometric rate
$b$ while the switching probabilities of the high frequency
components converge to 1 ($\gamma_i \sim 1$). This is consistent
with the intuition that volatility jumps frequently (high frequency
switches) but is less likely to cluster around its long-run mean
(low frequency switches).
Although MSM imposes minimal restrictions on the distribution of
the multipliers, this paper follows the lognormal specification of
Lux, Arias and Sattarhoff (2011) in which the random multipliers
take the form $M_{i,t} \sim LN(-\lambda, \nu^2)$. The condition that
$E(M_{i,t}) = 1$ leads to $\exp(-\lambda + 0.5\nu^2) = 1$, so that
the shape parameter is $\nu = \sqrt{2\lambda}$. Thus, the full
parameter vector of MSM is
$$\theta \equiv (\lambda, \sigma, b, \gamma_k) \in \mathbb{R}_+^k$$
where $\lambda$ characterizes the distribution of the volatility
components, $\sigma$ is the unconditional return volatility, and $b$
and $\gamma_k$ specify the transition probabilities.
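The switching rule and the lognormal multipliers described above can be made concrete with a short simulation. This is a minimal sketch, assuming standard normal innovations and illustrative parameter values; `switching_probs` and `simulate_msm` are hypothetical names, not functions from the paper.

```python
import numpy as np

def switching_probs(gamma_k, b, k):
    """gamma_i = 1 - (1 - gamma_k) ** (b ** (i - k)) for i = 1, ..., k."""
    i = np.arange(1, k + 1)
    return 1.0 - (1.0 - gamma_k) ** (float(b) ** (i - k))

def simulate_msm(n, sigma, lam, gamma_k, b, k, seed=0):
    """Simulate n returns r_t = sigma_t * eps_t with lognormal multipliers."""
    rng = np.random.default_rng(seed)
    gammas = switching_probs(gamma_k, b, k)
    nu = np.sqrt(2.0 * lam)  # ensures E[M] = exp(-lam + 0.5 * nu**2) = 1

    def draw(size):
        return rng.lognormal(mean=-lam, sigma=nu, size=size)

    M = draw(k)
    returns = np.empty(n)
    vols = np.empty(n)
    for t in range(n):
        switch = rng.random(k) < gammas      # component i renews with prob. gamma_i
        M = np.where(switch, draw(k), M)     # otherwise it keeps its previous value
        vols[t] = sigma * np.sqrt(M.prod())  # sigma_t = sigma * g(M_t) ** 0.5
        returns[t] = vols[t] * rng.standard_normal()
    return returns, vols

gammas = switching_probs(gamma_k=0.95, b=2.0, k=8)
returns, vols = simulate_msm(n=500, sigma=1.0, lam=0.1, gamma_k=0.95, b=2.0, k=8)
```

The first entries of `gammas` are close to zero (rarely switching, low frequency components) and the last equals `gamma_k`, which reproduces the ordering of frequencies implied by the formula.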
Similar to MS-GARCH/ARCH models, the volatility state vector $M_t$
is assumed to follow a first-order Markov process and its dynamics
are characterized by the transition matrix $\mathbb{P}$ with
transition probabilities
$p_{ij} = \Pr(M_{t+1} = m^i \mid M_t = m^j)$. Since the state vector
$M_t$ is latent, one can only observe the returns,
$r_t = \sigma [g(m)]^{1/2} \epsilon_t$, but not the state vector.
The state vector $M_t$ must therefore be computed recursively by
Bayesian updating (Calvet and Fisher, 2004).
Let $\Pi_t^j = \Pr(M_t = m^j \mid r_1, r_2, \dots, r_t)$ denote the
conditional probability that the period $t$ volatility state takes
the value $m^j$ conditional on past returns. The conditional
probabilities⁴ for the unobserved state values
$(m^1, m^2, \dots, m^j)$ can be computed by
$$\Pi_t = \frac{f(r_t \mid M_t = m^j) \odot (\Pi_{t-1}\mathbb{P})}{\left[f(r_t \mid M_t = m^j) \odot (\Pi_{t-1}\mathbb{P})\right] \mathbf{1}'}$$
where $a \odot b$ denotes the Hadamard product
$(a_1 b_1, \dots, a_j b_j) \in \mathbb{R}^j$, $\mathbb{P}$ denotes the
transition matrix, and $f(r_t \mid M_t = m^j)$ is given by
$$f(r_t \mid M_t = m^j) = \frac{1}{\sqrt{2\pi}\,\sigma\sqrt{g(m)}} \exp\left[-0.5 \left(\frac{r_t}{\sigma\sqrt{g(m)}}\right)^2\right]$$
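One step of the Bayesian update above can be sketched as follows. The example assumes a small, finite set of volatility states, each summarized by its product $g(m^j)$, and a row-stochastic transition matrix so that the row vector $\Pi_{t-1}$ can be multiplied from the left; the name `msm_update` and all numbers are illustrative.

```python
import numpy as np

def msm_update(prior, P, r, sigma, g):
    """Return Pi_t given Pi_{t-1} (prior), transition matrix P and return r.

    P[a, b] = Pr(next state = b | current state = a), rows sum to one.
    g[j]    = product of multipliers in state j, so volatility is sigma * sqrt(g[j]).
    """
    scale = sigma * np.sqrt(g)
    density = np.exp(-0.5 * (r / scale) ** 2) / (np.sqrt(2.0 * np.pi) * scale)
    numerator = density * (prior @ P)   # Hadamard product with predicted probabilities
    return numerator / numerator.sum()  # normalise so probabilities sum to one

# Assumed two-state example: a calm state (g = 0.5) and a volatile state (g = 2).
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
g = np.array([0.5, 2.0])
posterior = msm_update(np.array([0.5, 0.5]), P, r=3.0, sigma=1.0, g=g)
```

A return of 3 is far in the tail of the calm state, so nearly all posterior mass moves to the volatile state.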
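Similarly, the multi-step ahead forecast can be calculated by
$$\sigma_{T+1:T+K} = \sigma \sum_{\tau=1}^{K} \sum_{i=1}^{j} \Pr(M_{T+\tau} = m^i \mid I_{T-1}) \left(m^i\right)^{1/2}$$
where the ex ante probabilities are
$\Pr(M_{T+\tau} = m^i \mid I_{T-1}) = \Pi_t \mathbb{P}^{\tau}$. The log
likelihood function is given by
$$\mathcal{L}(\theta) = \sum_{t=1}^{T} \ln\left[ f(r_t \mid M_t = m^j) \cdot (\Pi_{t-1}\mathbb{P}) \right]$$
(Hamilton, 1994; Calvet and Fisher, 2004)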
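The multi-step forecast can be computed by propagating the filtered probabilities forward one step at a time, which is equivalent to using the powers $\mathbb{P}^{\tau}$. The sketch below is an illustration under assumptions: states are summarized by the product $g$ of their multipliers, the transition matrix is row-stochastic, and `msm_forecast` is a hypothetical helper name.

```python
import numpy as np

def msm_forecast(filtered, P, g, sigma, K):
    """Sum of expected volatilities over horizons tau = 1, ..., K.

    filtered : current state probabilities Pi_t (row vector)
    P        : row-stochastic transition matrix
    g        : product of multipliers for each state
    """
    probs = np.asarray(filtered, dtype=float)
    total = 0.0
    for _ in range(K):
        probs = probs @ P                        # Pi_t P^tau, built recursively
        total += sigma * (probs @ np.sqrt(g))    # expected volatility at this horizon
    return total

# With an identity transition matrix the state never changes,
# so the K-step forecast is K times the current state's volatility.
identity = np.eye(2)
g = np.array([1.0, 4.0])
low = msm_forecast([1.0, 0.0], identity, g, sigma=2.0, K=3)
high = msm_forecast([0.0, 1.0], identity, g, sigma=2.0, K=3)
```

The identity-matrix case is only a sanity check; with a genuine transition matrix the forecast decays from the current state's level toward the level implied by the ergodic distribution.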
⁴ Similar to MS-GARCH, the initial vector $\Pi_0$ is chosen to be
the ergodic distribution of the Markov process.
3. Data
I consider four major U.S. equity indices, namely the S&P500,
S&P100, Dow and NASDAQ 100, whose characteristics are summarized in
Appendix 2. The sample uses the daily adjusted closing prices of the
indices and covers the period from July 1997 to July 2017, resulting
in 5031 daily observations. To calculate the 1-month implied
variance swap strikes, I use the implied volatilities derived from
at-the-money options with 1 month to expiration. For delta hedging,
I use the adjusted closing prices of the indices' front month
futures. All data are obtained from Bloomberg.
3.1 Forecasting methodology
Using maximum likelihood, I employ a rolling window forecasting
method in which I estimate the parameters of the models in-sample
with data up to date $t$ and use the parameters to forecast
volatility $k$ periods ahead for horizons $k = 1, 10, 20, 50$⁵. I
then re-estimate the parameters of the models with data up to date
$t + k$. Moreover, to evaluate the forecasting performance in
different periods, I separate the out-of-sample results into three
periods. The first sample corresponds to the turmoil period from
July 2007 to July 2012. The second sample corresponds to the
tranquil period from July 2012 to July 2017. The third sample
contains the full forecasting results.
⁵ For the 20-day forecasting horizon, the number of days corresponds
to the number of actual trading days in any given month. For
example, the number of trading days in July 2007 is 21 days and thus
the forecasting horizon corresponds to $\sigma_{T+1:T+21}$.
3.2 Performance measure
As has been pointed out by previous studies such as Bollerslev,
Engle and Nelson (1994), it is difficult to choose a particular
statistical loss function as the best and unique standard for
evaluating the forecasting performance of volatility models.
Therefore, I employ the following four popular loss functions:
$$MAE = \frac{1}{T} \sum_{t=t+k}^{T} \left| \sigma_t - \hat{\sigma}_t \right|$$
$$MSE = \frac{1}{T} \sum_{t=t+k}^{T} \left( \sigma_t - \hat{\sigma}_t \right)^2$$
$$R^2LOG = \frac{1}{T} \sum_{t=t+k}^{T} \left[ \ln\left( \frac{\sigma_t}{\hat{\sigma}_t} \right) \right]^2$$
$$QLIKE = \frac{1}{T} \sum_{t=t+k}^{T} \left( \ln \hat{\sigma}_t^2 + \frac{\sigma_t^2}{\hat{\sigma}_t^2} \right)$$
where $\hat{\sigma}_t$ is the volatility forecast and $\sigma_t$ is
the $k$ period realized volatility computed as
$\sum_{t=t+1}^{t+k} r_t^2$. Furthermore, when evaluating the
forecasting accuracy of different models, it would be useful to
determine the number of times that a particular model has
successfully predicted the change in the direction of realized
volatility. For this analysis, I use a simple measure denoted as
Success Rate (SR) to calculate the percentage of times that a model
has successfully forecasted the direction of $k$ period ahead
volatility. The results are summarized in Table 1, and Appendix 3
plots the one-month volatility forecasts versus the subsequent
realized volatility.
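The four loss functions and the Success Rate can be computed in a few lines. This is a minimal sketch: the Success Rate here is assumed to compare the forecast and the subsequent realized volatility against the current realized volatility, which is one reading of the description above, and the function names are illustrative.

```python
import numpy as np

def loss_functions(realized, forecast):
    """MAE, MSE, R2LOG and QLIKE for realized volatility vs. forecast."""
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = realized - forecast
    return {
        "MAE": np.mean(np.abs(err)),
        "MSE": np.mean(err ** 2),
        "R2LOG": np.mean(np.log(realized / forecast) ** 2),
        "QLIKE": np.mean(np.log(forecast ** 2) + realized ** 2 / forecast ** 2),
    }

def success_rate(current, realized_next, forecast):
    """Share of dates on which the forecast gets the direction of change right.

    Assumption: direction is measured relative to the current realized volatility.
    """
    current = np.asarray(current, dtype=float)
    predicted_dir = np.sign(np.asarray(forecast, dtype=float) - current)
    actual_dir = np.sign(np.asarray(realized_next, dtype=float) - current)
    return np.mean(predicted_dir == actual_dir)

losses = loss_functions(realized=[1.0, 2.0], forecast=[1.0, 2.0])
sr = success_rate(current=[1.0, 1.0, 1.0],
                  realized_next=[2.0, 0.5, 2.0],
                  forecast=[1.5, 0.8, 0.9])
```

Note that QLIKE is not zero for a perfect forecast: it equals the average of $\ln \hat{\sigma}_t^2 + 1$, so only differences in QLIKE across models are informative.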
Table 1: Multi-step ahead forecasting results

I. One-day ahead forecasting results
Index Model | Turmoil period                         | Tranquil period                        | Full period
            |    MAE       MSE  QLIKE  R2LOG      SR |    MAE       MSE  QLIKE  R2LOG      SR |    MAE       MSE  QLIKE  R2LOG      SR
SPX   MSM   |   0.86       1.3   1.58   2.17  73.33% |   0.45      0.32    0.4   2.34  67.09% |   0.65      0.81   0.99   2.25  70.23%
SPX   MSG   |   1.16      3.25   1.85   2.37  69.13% |   0.43      0.35   2.14   2.06  70.75% |   0.79       1.8   1.99   2.22  69.95%
OEX   MSM   |   0.83      1.22   1.51   1.96  71.90% |   0.45      0.31   0.38   2.23  66.93% |   0.64      0.77   0.94   2.09  69.43%
OEX   MSG   |   1.18      3.49   1.77   2.21  67.46% |   0.43      0.37    2.3   1.93  70.59% |    0.8      1.93   2.03   2.07  69.04%
DJI   MSM   |   0.78      1.07   1.39   2.02  71.35% |   0.43      0.28   0.32   2.17  68.20% |   0.61      0.68   0.86   2.09  69.79%
DJI   MSG   |   0.99      2.57   1.98   2.05  70.63% |    0.4       0.3   2.26   1.83  72.26% |   0.69      1.43   2.12   1.94  71.46%
NDX   MSM   |   0.91      1.41   1.75   2.03  70.56% |   0.56      0.46   0.77   2.17  67.41% |   0.73      0.94   1.26    2.1  69.00%
NDX   MSG   |   2.14     16.23   2.13   2.94  61.59% |   0.71      1.15   1.02   2.35  65.90% |   1.43      8.69   1.57   2.64  63.76%

II. Ten-day ahead forecasting results
Index Model | Turmoil period                         | Tranquil period                        | Full period
            |    MAE       MSE  QLIKE  R2LOG      SR |    MAE       MSE  QLIKE  R2LOG      SR |    MAE       MSE  QLIKE  R2LOG      SR
SPX   MSM   |   4.68     38.91   5.88   0.25  58.87% |   2.95     13.07    4.7   0.35  54.76% |   3.81     25.89   5.29    0.3  56.57%
SPX   MSG   |   6.08    107.06   5.99   0.27  50.81% |   2.51     11.29   5.55   0.37  44.44% |   4.28     58.79   5.77   0.32  47.41%
OEX   MSM   |   4.54     37.32   5.81   0.26  57.26% |   2.87     12.56   4.66   0.36  52.38% |    3.7     24.84   5.23   0.31  54.58%
OEX   MSG   |   6.58    130.03   5.91   0.31  49.19% |   2.58     11.93    5.7    0.4  45.24% |   4.56     70.51    5.8   0.35  47.01%
DJI   MSM   |   4.33     33.97   5.71   0.27  57.26% |   2.79     11.38   4.62   0.34  53.97% |   3.55     22.59   5.16   0.31  55.38%
DJI   MSG   |   5.38     98.51   6.09   0.29  51.61% |   2.31      9.67   6.17   0.42  46.03% |   3.83     53.74   6.13   0.36  49.00%
NDX   MSM   |   5.04     42.84   6.08   0.25  50.81% |   3.35     17.54   5.06   0.34  54.76% |   4.19     30.09   5.56    0.3  52.59%
NDX   MSG   |  20.16   1314.02   6.66    0.9  45.97% |   5.68    101.92   5.24   0.55  50.79% |  12.86    703.16   5.95   0.72  48.21%

III. One-month ahead forecasting results
Index Model | Turmoil period                         | Tranquil period                        | Full period
            |    MAE       MSE  QLIKE  R2LOG      SR |    MAE       MSE  QLIKE  R2LOG      SR |    MAE       MSE  QLIKE  R2LOG      SR
SPX   MSM   |   8.97    134.95   7.33   0.22  57.63% |   6.24     59.36   6.17   0.32  50.85% |    7.6     97.16   6.75   0.27  53.78%
SPX   MSG   |   8.67    188.04   7.77   0.24  49.15% |   5.47     50.33   7.99   0.49  44.07% |   7.07    119.18   7.88   0.36  47.06%
OEX   MSM   |   8.85    132.78   7.25   0.22  55.93% |   6.06      55.9   6.14   0.33  52.54% |   7.45     94.34   6.69   0.27  53.78%
OEX   MSG   |   9.86    248.21    7.6   0.24  44.07% |   5.66     51.13   7.81   0.49  47.46% |   7.76    149.67   7.71   0.37  46.22%
DJI   MSM   |   8.79    128.38   7.14   0.23  59.32% |   5.97     52.97   6.11   0.34  50.85% |   7.38     90.67   6.62   0.28  54.62%
DJI   MSG   |   8.67    211.51   8.12   0.34  54.24% |   5.65     50.73      9   0.64  49.15% |   7.16    131.12   8.56   0.49  52.10%
NDX   MSM   |     10     150.5   7.51   0.21  52.54% |    7.2     78.27   6.54   0.31  54.24% |    8.6    114.38   7.02   0.26  52.94%
NDX   MSG   |   42.1   5342.69    8.1   0.86  45.76% |  12.81     572.7   6.74   0.56  52.54% |  27.46    2957.7   7.42   0.71  48.74%

Table 1: Loss function statistics and Success Rate. The table
displays results of the relative MAE, MSE, QLIKE, $R^2LOG$ and SR
for each forecasting horizon $k = 1, 10, 20, 50$. The models in
comparison are MSM with 8 volatility states and two-regime MS-GARCH
(MSG). The forecasting performance is measured using daily absolute
returns $\sum_{t=t+1}^{t+k} r_t^2$, where
$r_t = 100 \cdot (\log s_t - \log s_{t-1})$. Numbers in bold indicate
that the corresponding models have the lowest forecasting losses or
the highest success rate.

IV. Fifty-day ahead forecasting results
3.3 Out-of-sample performance
Table 1 reports the out-of-sample performance of the one-day,
ten-day, one-month, and fifty-day forecasts of the
models in terms of the statistical loss functions and the
success rate discussed above. Consistent with Calvet and
Fisher (2004), MSM is comparable to MS-GARCH at the one-day
forecasting horizon, but dominates MS-GARCH at the ten-day
and one-month horizons. However, the results for
the two models are mixed at the fifty-day forecasting horizon.
The relative merits of MSM and MS-GARCH are more
clearly revealed when the sample is separated into turmoil
and tranquil periods.
Looking at the volatile sub-period, MSM dominates MS-GARCH
on the MAE, MSE and SR criteria over the one-day,
ten-day and one-month forecasting horizons across the four
equity indices. The larger forecasting losses and lower SR
suggest that MS-GARCH generates forecasts that are too high in
volatile periods, results that contradict the conclusion
of Klaassen (2002).
For the tranquil sub-period, it is clear that MS-GARCH
performs significantly better in less volatile periods than in
volatile periods in terms of forecasting losses. However, the
relative performance of MSM and MS-GARCH is
unclear. MS-GARCH dominates MSM on the MAE and
MSE criteria over the one-day, ten-day and one-month
forecasting horizons for three of the equity indices. Although
MS-GARCH has lower forecasting losses relative to MSM
over the ten-day and one-month horizons, it also has lower
forecasting accuracy in terms of SR. I attribute this
observation to the explanation of Lamoureux and Lastrapes
(1990) that the persistence generated by GARCH models
becomes much weaker following structural changes in the
volatility process.
Turning to the fifty-day forecasts, MS-GARCH seems to
dominate MSM in both forecasting losses and forecasting
accuracy. However, upon closer evaluation, the fifty-day
forecasts generated by MS-GARCH for the S&P500, S&P100,
and Dow are too low compared to subsequent realized
volatilities (Appendix 4). In the majority of cases,
MS-GARCH is unable to generate a forecast that is higher than
the current fifty-day realized volatility, and the SR criterion
simply denotes the percentage of observations in which
realized volatility has decreased over the sample period. The
results are thus consistent with Calvet and Fisher (2004) that
GARCH-class models are unable to capture the long-memory
property of volatility.
From the results for the NASDAQ 100 (Appendix 3 and
Appendix 4), the forecasts generated by MS-GARCH are too
high compared to realized volatilities. As the in-sample data
include the period of the dot-com bubble, the results suggest
that the high persistence of individual shocks in GARCH-class
models continues to exist in MS-GARCH even after
accounting for structural changes in the variance process.
In sum, the results support the conclusion that MSM
performs significantly better in volatile periods and for assets
with high volatility. While GARCH-class models capture
the short-run autoregressive dynamics well, the weaker
persistence following a structural break explains the large
decrease in forecasting losses but lower SR relative to the
volatile period, because volatility forecasts remain too low.
4. Trading strategies
In recent years, volatility has emerged as an asset class in
that financial engineering has provided various ways to gain
exposure to the volatility of an asset. For example, one way
to gain an exposure to volatility of S&P500 is to trade
directly the VIX futures or volatility exchange-traded notes
such as VXX. Another way is through variance swaps
which offer pure exposure to the realized volatility of the
underlying. To evaluate the trading performance of MSM
and MS-GARCH over different horizons, I consider an
intra-day trading strategy on VXX and front month VIX
futures as well as a monthly strategy on the variance swaps
of the four indices considered in this paper.
IV. Fifty-day ahead forecasting results (Table 1, continued)
Index Model | Turmoil period                         | Tranquil period                        | Full period
            |    MAE       MSE  QLIKE  R2LOG      SR |    MAE       MSE  QLIKE  R2LOG      SR |    MAE       MSE  QLIKE  R2LOG      SR
SPX   MSM   |  28.93   1424.48   9.29   0.25  41.67% |   9.94    165.26    7.8   0.13  54.17% |  19.43    794.87   8.55   0.19  46.94%
SPX   MSG   |  20.98   1165.95  10.84    0.4  58.33% |  15.16    304.17  13.71    0.9  58.33% |  18.07    735.06  12.27   0.65  59.18%
OEX   MSM   |  28.11   1341.87    9.2   0.25  37.50% |   9.48    158.73   7.76   0.12  58.33% |   18.8     750.3   8.48   0.19  46.94%
OEX   MSG   |  19.21   1008.64  10.14   0.32  66.67% |  13.33    247.02  10.25   0.62  58.33% |  16.27    627.83   10.2   0.47  63.27%
DJI   MSM   |  26.81   1217.35   9.08   0.27  37.50% |   9.48    163.41   7.73   0.13  62.50% |  18.14    690.38    8.4    0.2  48.98%
DJI   MSG   |  24.71   1272.75  14.05   0.79  62.50% |  16.26    314.23  13.03   1.03  54.17% |  20.49    793.49  13.54   0.91  59.18%
NDX   MSM   |  31.13   1476.06   9.43   0.25  45.83% |  11.51    197.36   8.15   0.12  58.33% |  21.32    836.71   8.79   0.18  51.02%
NDX   MSG   | 103.29  23589.78  10.01   1.01  41.67% |  28.85   1445.48    8.5   0.42  41.67% |  66.07  12517.63   9.26   0.72  40.82%

4.1 Intra-day volatility trading
Since volatility futures and exchange-traded notes offer
exposure to an asset's implied volatility, it may seem logical
to trade the securities in the same direction as the volatility
forecasts. In the case of the S&P500, however, the negatively
skewed⁶ return distribution means that a prediction of an
increase in realized volatility has a higher probability of a
positive return tomorrow than a negative return, while a
forecast of a decrease in realized volatility may simply
imply a mean reversion after consecutive positive returns.
Given this view and the inverse relationship between return
and implied volatility, I consider a strategy that short sells
VXX and VIX futures when the models forecast an increase
in realized volatility and vice versa. Table 2 summarizes the
results of the daily strategy. Table 2a shows the results⁷ of
trading the S&P500 directly, which confirm the above
argument.
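The trading rule described above can be sketched as a simple signal and performance calculation. This is an illustration only: it assumes the signal compares the next-day forecast with the current realized volatility, ignores transaction costs, and annualizes by simple scaling with 252 trading days, none of which is confirmed as the paper's exact procedure. The function names are hypothetical.

```python
import numpy as np

def contrarian_positions(forecast_next, realized_now):
    """-1 (short the volatility product) when a rise in volatility is forecast, else +1."""
    forecast_next = np.asarray(forecast_next, dtype=float)
    realized_now = np.asarray(realized_now, dtype=float)
    return np.where(forecast_next > realized_now, -1.0, 1.0)

def strategy_stats(positions, instrument_returns, periods_per_year=252):
    """Average daily return, simple annualized return and annualized Sharpe ratio."""
    pnl = np.asarray(positions, dtype=float) * np.asarray(instrument_returns, dtype=float)
    mean = pnl.mean()
    std = pnl.std(ddof=1)
    return {
        "avg_daily": mean,
        "annualized": mean * periods_per_year,
        "sharpe": np.sqrt(periods_per_year) * mean / std,
    }

# Toy inputs (made up for illustration): forecasts vs. current realized volatility,
# and the next-day returns of the traded volatility product.
positions = contrarian_positions([0.20, 0.10, 0.30], [0.15, 0.12, 0.25])
stats = strategy_stats(positions, [-0.02, 0.01, 0.01])
```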
Table 2: Intra-day trading results
I. Intra-day volatility trading strategy
Table 2: Out-of-sample volatility trading results.
Note: Since VXX only began trading in January 2009, the
trading results only cover period from January 2009 to July
2017 for both VIX Futures and VXX in order to facilitate
comparison.
6
The skewness of the return distribution of S&P500 for the
sample from July 2007 to July 2017 is -0.3. Separating the
sample into turmoil and tranquil period, the skewness is -
0.08 and -0.32 respectively.
7
To examine the argument, I trade directly the S&P500 and
separate the trading gains in 2 cases. In the first case, I
reverse the volatility trading signals but filter out the
long/short S&P500 signals that have a negative/positive
skewness over the past five trading days. In the second case,
I trade only signals that were filtered out. In both cases, I
find positive returns over the sample period.
Table 2a: Intra-day S&P500 trading results
II. Intra-day trading on S&P500
Table 2a: Out-of-sample S&P500 trading results.
4.2 Medium term volatility trading
Although it is widely held that vanilla options can be used
to trade volatility, options are exposed factors such as
movements of the underlying, volatility and time-to-
maturity. From the standard Black-Scholes model, exposure
to underlying movements can be hedged away in the form
of continuous delta-hedging and exposure to volatility is
paid for in the form of option Theta and the Gamma PnL
resulted from delta-hedging (Allen, Einchcomb, and
Granger, 2006). However, in practice, stocks are not traded
continuously nor volatility remains constant over time. The
convexity of option payoff means that a replicating trade is
not only sensitive to stock movements but also sensitive to
option gamma8
. In light of this disadvantage, I consider a
trading strategy using variance swaps, which provide pure
exposure to realized volatility
4.3 Variance swap
The path dependence problem of trading volatility via
delta hedged options can be attributed to the fact that dollar
gamma9
increases linearly with strikes and stock prices
(Figure 1), thus causing non-constant exposure to realized
volatility. One solution to obtain constant exposure to dollar
gamma is to construct a portfolio with out-of-the-money
calls and puts each weighted by the inverse of strike squared
(Figure 1a). The method was originally pioneered by Carr
and Madan (1998) but the impractical assumption of trading
a continuous range of strikes was later modified by Derman
et al. (1999). According to Derman et al. (1999), exposure
8
Such path dependence means that the amount of Gamma
Pnl from delta hedging not only depends on the difference
between implied and realized volatility but also depends on
the strike of the option and where and when the volatility is
realized (Allen, Einchcomb, and Granger, 2006).
9
The dollar gamma represents the change in dollar delta, $∆,
for a 1% change in stock price and is given by multiplying
option gamma by the square of stock price divide by 100,
$Γ = Γ𝑆3
/100.
                Max. gain   Max. loss   Average daily return   Annualized return   Sharpe ratio
         MSM    30.70%      -27.25%     0.09%                  13.89%              0.27
         MSG    30.70%      -27.25%     0.28%                  25.94%              0.81
VXX      MSM    21.24%      -18.81%     0.27%                  25.34%              1.11
         MSG    21.24%      -18.81%     0.30%                  26.94%              1.26
Sample period from January 2009 to July 2017

VIX Futures     Max. gain   Max. loss   Average daily return   Annualized return   Sharpe ratio
Case 1   MSM    6.82%       -4.79%      0.04%                  6.96%               0.79
         MSG    6.82%       -4.79%      0.05%                  9.37%               1.18
Case 2   MSM    6.92%       -6.15%      0.01%                  2.70%               0.25
         MSG    6.92%       -6.15%      -5.30E-05              -1.41%              -0.11
Combine  MSM    6.92%       -6.15%      0.05%                  8.67%               0.72
         MSG    6.92%       -6.15%      0.05%                  8.67%               0.72
Full sample

to realized volatility can be obtained by trading and re-hedging
a position in a log contract, whose payoff can be
approximated by a portfolio of out-of-the-money calls
and puts weighted by the inverse of the strike squared,
combined with intra-day delta hedging via forward/futures
contracts. At maturity, a long options/short forward
portfolio represents a long position in realized volatility
with the payoff of a long variance swap, given by
$N_{var}(\sigma_r^2 - \sigma_K^2)$¹⁰.
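The effect of the inverse-strike-squared weighting can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes flat Black-Scholes volatility, a zero interest rate, and a hypothetical strike grid from 40 to 200 in steps of one index point. The function names are illustrative only. Under these assumptions the dollar gamma of a single option varies strongly with spot, while the dollar gamma of the weighted strip stays close to 0.01 (footnote 9 scaling).

```python
import math

def bs_gamma(spot, strike, vol, tau, rate=0.0):
    """Black-Scholes gamma of a European option (identical for calls and puts)."""
    sd = vol * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * tau) / sd
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return pdf / (spot * sd)

def dollar_gamma(spot, strike, vol, tau):
    """Dollar gamma as defined in footnote 9: gamma * S^2 / 100."""
    return bs_gamma(spot, strike, vol, tau) * spot ** 2 / 100.0

def strip_dollar_gamma(spot, strikes, vol, tau):
    """Dollar gamma of a strip of options, each weighted by dK / K^2."""
    step = strikes[1] - strikes[0]  # assumes an evenly spaced grid
    return sum(step / k ** 2 * dollar_gamma(spot, k, vol, tau) for k in strikes)

# Hypothetical inputs: 20% flat volatility, one month to expiry.
strikes = [float(k) for k in range(40, 201)]
for spot in (90.0, 100.0, 110.0):
    single = dollar_gamma(spot, 100.0, 0.2, 1.0 / 12.0)
    strip = strip_dollar_gamma(spot, strikes, 0.2, 1.0 / 12.0)
    print(f"spot={spot:6.1f}  single option={single:8.4f}  1/K^2 strip={strip:.6f}")
```

With a coarser or narrower strike grid, as in the limited strike ranges used in practice, the strip's dollar gamma is only approximately constant and falls off near the edges of the grid.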
Figure 1: Dollar gamma across strikes
Figure 1a: Option portfolio weighted by $1/k^2$
4.4 Replication methodology
To replicate the payoff of a variance swap, I use the method
of Allen, Einchcomb, and Granger (2006), which is an
analytical derivation of the variance swap pricing model of
Derman et al. (1999). For option data, I use the implied
volatility of at-the-money options with one month to expiry
and approximate the volatility skew¹¹ using the implied
volatilities of out-of-the-money calls and puts with the same
expiration date from January 2007 to June 2007. Due to
limited access to data, I consider only 20 strikes for
out-of-the-money calls and another 20 strikes for
out-of-the-money puts, with the spot price as the splitting
point between calls and puts, i.e.
$k_{20,p} < \dots < k_{1,p} < S_t < k_{1,c} < \dots < k_{20,c}$.
The approximated skews for the four equity indices are
illustrated in Figure 2.
Figure 2: Volatility skew
¹⁰ $\sigma_r^2$ is the realized variance, $\sigma_K^2$ is the
variance swap strike, and $N_{var}$ is the variance notional.
Figure 2: Average approximated implied volatilities of out-
of-the-money options for S&P500, S&P100, Dow and
NASDAQ 100 over the period from June 2007 to June 2017.
The x-axis represents the strike points above (call) and
below (put) spot price.
Furthermore, I consider the following equation from
Derman et al. (1999) to calculate the monthly variance swap
strikes for each index:
$$K_{var} = \frac{2}{T}\left[ rT - \left(\frac{s_0}{s_*}e^{rT} - 1\right) - \log\frac{s_*}{s_0} + e^{rT}\int_0^{s_*}\frac{1}{k^2}P(k)\,dk + e^{rT}\int_{s_*}^{\infty}\frac{1}{k^2}C(k)\,dk \right]$$
‘Fair value of future variance’ (Derman et al., 1999:p.23)
where I set $s_* = s_0 = S_t$, the spot price at which the
replicating portfolio is formed, and derive $P(k)$ and $C(k)$
from the standard Black-Scholes model with the interest rate,
$r$, equal to the 1-year U.S. Treasury yield. However, from a
practical perspective, it is not optimal to hedge with a large
number of out-of-the-money options because of transaction
costs and liquidity issues. Therefore, I use only a limited
strike range and strike interval for each index. Table 3
summarizes the strike data and Figure 3 shows the variance
swap strikes in volatility terms.
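As an illustration of how the strike formula can be evaluated on a finite set of strikes, the sketch below replaces the two integrals with a simple sum over an evenly spaced strike grid and prices each option with Black-Scholes. It is only a sketch under stated assumptions: the function names, the strike grid, and the `implied_vol` callable (strike to volatility) are hypothetical, and the reference point is fixed at the spot price as in the text. With a flat 20% volatility the result should be close to 20% in volatility terms.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_price(spot, strike, vol, tau, rate, is_call):
    """Black-Scholes price of a European call or put (no dividends)."""
    sd = vol * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * tau) / sd
    d2 = d1 - sd
    disc = math.exp(-rate * tau)
    if is_call:
        return spot * norm_cdf(d1) - strike * disc * norm_cdf(d2)
    return strike * disc * norm_cdf(-d2) - spot * norm_cdf(-d1)

def variance_swap_strike(spot, strikes, implied_vol, tau, rate):
    """Discretized fair variance with the reference point set to spot.

    strikes: evenly spaced grid; implied_vol: callable mapping strike to vol.
    Puts are used below spot and calls at or above spot.
    """
    step = strikes[1] - strikes[0]
    integral = 0.0
    for k in strikes:
        price = bs_price(spot, k, implied_vol(k), tau, rate, is_call=(k >= spot))
        integral += price / k ** 2 * step
    growth = math.exp(rate * tau)
    # With the reference point equal to spot, the log term is zero.
    return (2.0 / tau) * (rate * tau - (growth - 1.0) + growth * integral)

# Hypothetical example: flat 20% skew, one month, 1% interest rate.
grid = [float(k) for k in range(30, 301)]
k_var = variance_swap_strike(100.0, grid, lambda k: 0.2, 1.0 / 12.0, 0.01)
print(f"swap strike in volatility terms: {math.sqrt(k_var):.4f}")
```

Narrowing the grid to a handful of strikes, as in Table 3, truncates the sum and generally lowers the computed strike relative to the full-range value.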
Table 3: Strike interval data
Table 3: Strike interval data for S&P500, S&P100, Dow and
NASDAQ 100.
Note: Since there are no options traded on the full level of
Dow (DJIA), I use the CBOE 1/100 Dow (DJX) options.
The contract multiplier for the four indices is $100 per index
point.
¹¹ I approximate the volatility skew by calculating a set of
skew betas based on constrained regression,
$\min_x \frac{1}{2}\lVert C x - d \rVert_2^2$, using the
out-of-the-money implied volatilities.
                                S&P500   S&P100   Dow*   NASDAQ 100
Strike interval (index point)   5        5        1      5
Out-of-the-money calls          20       10       5      20
Out-of-the-money puts           20       10       5      20

Figure 3: Variance swap strike
Figure 3: One-month variance swap strikes estimated using
the pricing model of Derman et al. (1999). Swap strikes are
expressed in volatility terms.
4.5 Variance swap trading strategy
For the monthly volatility strategy, I consider a buy-and-hold
approach in which I forecast next month's aggregate
volatility at the close of the last trading day of each month
and enter a long/short swap position corresponding to a
forecast increase/decrease¹² in realized volatility.
Aside from the standard approach, I also consider two
additional scenarios in which the long/short signals are
filtered by a set of trading rules. The three cases considered
are summarized in the following table.
Table 4: Trading signals
The reason for considering Scenario 1 is that a forecast
increase/decrease in realized volatility may already have
been priced in by the market via a higher/lower swap strike;
conversely, an unusually high/low strike at month 𝑡
relative to month 𝑡’s aggregate realized volatility may mean
that the swap strike is overpriced/under-priced due to
market irrationality. Moreover, since implied volatility
represents the market's expectation of future realized
volatility, Scenario 2 simply examines whether economic
profits can be obtained from trading market expectations alone.
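The three signal rules can be written as one small function. The sketch below reflects one reading of Table 4 and should be treated as an assumption: the thresholds 0.9 and 0.7 are applied to the ratio of the current swap strike to the current month's realized volatility, all inputs are in volatility terms, and the function and argument names are hypothetical.

```python
def swap_signal(case, forecast_next, realized_now, strike_now, strike_prev=None):
    """Return +1 (long swap), -1 (short swap) or 0 (no trade).

    case: "standard", "scenario1" or "scenario2".
    forecast_next: model forecast of next month's volatility.
    realized_now: realized volatility of the current month.
    strike_now / strike_prev: swap strikes of the current and previous month.
    """
    up = forecast_next > realized_now
    down = forecast_next < realized_now
    if case == "standard":
        return 1 if up else (-1 if down else 0)
    if case == "scenario1":
        ratio = strike_now / realized_now
        # Long when the strike looks cheap relative to current realized volatility.
        if (up and ratio < 0.9) or (down and ratio < 0.7):
            return 1
        # Short when the strike looks rich relative to current realized volatility.
        if (down and ratio > 0.7) or (up and ratio > 0.9):
            return -1
        return 0
    if case == "scenario2":
        if strike_prev is None:
            raise ValueError("scenario2 needs the previous month's strike")
        if strike_now > strike_prev:
            return 1
        return -1 if strike_now < strike_prev else 0
    raise ValueError(f"unknown case: {case}")

# Forecast rises, but the strike is already 10% above realized: short in Scenario 1.
print(swap_signal("scenario1", 0.25, 0.20, 0.22))
```

Note that under this reading a forecast increase can still produce a short position in Scenario 1 when the strike is already high relative to realized volatility.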
The trading results for the three cases are summarized in
Table 5, and the equity curves are shown in Appendix 5.
Instead of calculating trading profits via the simple identity
$N_{var}(\sigma_r^2 - \sigma_K^2)$, the profit¹³ of each
position is calculated by summing the net payoff of the option
portfolio and the PnL from daily delta hedging using
front-month futures on the respective index. Lastly, results
are shown in dollar amounts with a variance notional of one
dollar, $N_{var} = \$1$, for each trade.
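For reference, the simple payoff identity mentioned above can be computed as follows. This is a sketch of the theoretical payoff only, not of the replication PnL used for Table 5. It assumes realized variance is measured from daily log returns with a zero mean and annualized with 252 trading days, and that volatilities are quoted in percentage points (so a 20% volatility corresponds to a variance of 400); these conventions and the function names are assumptions.

```python
import math

def realized_variance(prices, periods_per_year=252):
    """Annualized realized variance from a price series (zero-mean log returns)."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return periods_per_year * sum(r * r for r in rets) / len(rets)

def variance_swap_payoff(realized_vol, strike_vol, notional=1.0):
    """Payoff of a long variance swap: notional * (realized^2 - strike^2).

    Both volatilities are in percentage points (e.g. 20.0 for 20%).
    """
    return notional * (realized_vol ** 2 - strike_vol ** 2)

# Hypothetical month: three daily closes, strike of 20 volatility points.
closes = [100.0, 101.0, 100.0]
realized_vol = 100.0 * math.sqrt(realized_variance(closes))
print(f"realized vol: {realized_vol:.2f}")
print(f"payoff per $1 variance notional: {variance_swap_payoff(realized_vol, 20.0):.2f}")
```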
Table 5: Variance swap trading results
I. Standard case
Table 5: Variance swap trading results. Positions are entered at the last trading day of each month and held until end of next
month, and variance notional for all positions
¹² Since a variance swap provides direct exposure to future
realized volatility, the argument presented above for the
intra-day trading strategy on implied volatility does not
hold in this case.
¹³ Since the replication of the log contract is an imperfect
hedge, there are differences between the theoretical payoff
given by the above identity and the PnL obtained through
replication. Thus, the trading payoff is calculated in this way
to facilitate practical evaluation. Nevertheless, the average
adjusted $R^2$ and mean square error between the theoretical
and actual payoffs for the four indices across the sample are
around 90% and $150, respectively, for $N_{var} = \$1$.
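             Long                                                       Short
Standard     $\sigma_{t+1} > \sigma_t$                                  $\sigma_{t+1} < \sigma_t$
Scenario 1   $\sigma_{t+1} > \sigma_t$ and $K_{var,t}/\sigma_t < 0.9$   $\sigma_{t+1} < \sigma_t$ and $K_{var,t}/\sigma_t > 0.7$
             $\sigma_{t+1} < \sigma_t$ and $K_{var,t}/\sigma_t < 0.7$   $\sigma_{t+1} > \sigma_t$ and $K_{var,t}/\sigma_t > 0.9$
Scenario 2   $K_{var,t} > K_{var,t-1}$                                  $K_{var,t} < K_{var,t-1}$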
                Max gain   Max loss    Avg. monthly gain  Total gain  | Max gain  Max loss  Avg. monthly gain  Total gain  | Max gain   Max loss    Avg. monthly gain  Total gain
SPX  MSM        $4,427.00  -$2,251.31  $46.84             $2,810.23   | $548.50   -$302.14  -$30.00            -$1,800.03  | $4,427.00  -$2,251.31  $8.42              $1,010.21
     MS-GARCH   $4,429.25  -$2,504.30  $122.77            $7,366.50   | $301.37   -$548.68  $13.71             $822.51     | $4,429.25  -$2,504.30  $68.24             $8,189.01
OEX  MSM        $4,906.37  -$1,479.12  $47.31             $2,838.54   | $657.00   -$328.05  -$34.43            -$2,065.90  | $4,906.37  -$1,479.12  $6.44              $772.64
     MS-GARCH   $4,908.02  -$2,602.61  $91.46             $5,487.82   | $245.31   -$657.10  $22.26             $1,335.58   | $4,908.02  -$2,602.61  $56.86             $6,823.39
DJI  MSM        $4,793.37  -$1,345.61  $41.29             $2,477.51   | $652.80   -$630.90  -$47.06            -$2,823.38  | $4,793.37  -$1,345.61  -$2.88             -$345.87
     MS-GARCH   $4,793.37  -$2,217.82  $82.82             $4,969.18   | $630.90   -$652.80  $36.29             $2,177.58   | $4,793.37  -$2,217.82  $59.56             $7,146.77
NDX  MSM        $4,141.86  -$2,755.69  $171.17            $10,270.42  | $584.57   -$788.89  $16.62             $997.17     | $4,141.86  -$2,755.69  $93.90             $11,267.58
     MS-GARCH   $4,141.86  -$656.62    $260.68            $15,640.85  | $584.57   -$788.89  $24.48             $1,469.00   | $4,141.86  -$788.89    $142.58            $17,109.85

II. Scenario 1 & Scenario 2
Table 5: Variance swap trading results. Positions are entered at the last trading day of each month and held until end of next
month, and variance notional for all positions
4.6 Trading results evaluation
From a trading perspective, MS-GARCH dominates
MSM in both intra-day (Table 5) and monthly trading over
the period considered. In intra-day trading, it is not
surprising that MS-GARCH slightly outperforms
MSM. The short forecasting horizon benefits GARCH-class
models because they capture well the short-run
autoregressive dynamics of volatility, whereas the hyperbolic
decay of the autocorrelation function of absolute returns,
combined with the multi-scaling behaviour of time series
data, implies that MSM is a better candidate for long-run
forecasts.
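The short-memory behaviour of GARCH-class forecasts can be illustrated with the textbook multi-step forecast of a single-regime GARCH(1,1) model, in which the forecast reverts geometrically to the unconditional variance at rate $\alpha_1 + \beta_1$. This sketch is for a plain GARCH(1,1), not for the MS-GARCH model estimated in the paper, and the parameter values are hypothetical.

```python
def garch11_forecast_path(h_next, omega, alpha, beta, horizon):
    """k-step-ahead conditional variance forecasts for k = 1..horizon.

    h_next is the one-step-ahead conditional variance. The forecast decays
    geometrically towards the unconditional variance omega / (1 - alpha - beta).
    """
    persistence = alpha + beta
    if not 0.0 <= persistence < 1.0:
        raise ValueError("alpha + beta must lie in [0, 1)")
    long_run = omega / (1.0 - persistence)
    return [long_run + persistence ** (k - 1) * (h_next - long_run)
            for k in range(1, horizon + 1)]

# Hypothetical daily parameters; current variance is four times the long-run level.
path = garch11_forecast_path(h_next=4.0, omega=0.05, alpha=0.10, beta=0.85, horizon=50)
print(f"1-day: {path[0]:.3f}  10-day: {path[9]:.3f}  50-day: {path[49]:.3f}")
print(f"aggregate 50-day variance forecast: {sum(path):.3f}")
```

Because the excess over the long-run level shrinks by the factor $\alpha_1 + \beta_1$ each day, the influence of the current state fades exponentially with the horizon, in contrast to the hyperbolic decay described above.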
Under the Standard case of the swap trading strategy,
MSM fails to generate significant economic profits, and it
outperforms MS-GARCH only in Scenario 1 for two of the
equity indices. However, the monthly trading results alone
should not be taken as grounds to dismiss the superior
statistical results of MSM, because a large part of the
contradictory results can be attributed to inefficient market
pricing of implied volatilities. The main findings are
summarized in the following table:
As shown in the table, implied volatilities of the
four equity indices have a strong tendency to be overpriced
by the market. Taking the S&P500 as an example, implied
volatilities, and thus implied swap strikes, trade higher than
subsequent realized volatilities in over half of the sample
period. More importantly, there are also 26
months in which a successful prediction of the direction of
realized volatility by MSM resulted in trading losses.
Turning to trading positions, the net short position of
MS-GARCH is consistent with the conclusion in Section 3.3.
The converse of the argument is also supported by the
findings for NASDAQ 100.
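Counts of this kind can be tallied with a few lines of code. The sketch below is only an illustration of the bookkeeping, assuming monthly series of swap strikes and realized volatilities in the same units; the three quantities it counts (strike above next month's realized volatility, realized volatility rising, and both at once) are one reading of the summary table, and the names and sample numbers are hypothetical.

```python
def mispricing_counts(strikes, realized):
    """Count months by strike versus subsequent realized volatility.

    strikes[t]: swap strike set at the end of month t.
    realized[t]: realized volatility over month t.
    Returns (overpriced, rose, both) over all months t with a following month.
    """
    overpriced = rose = both = 0
    for t in range(len(realized) - 1):
        over = strikes[t] > realized[t + 1]   # strike exceeded next month's outcome
        up = realized[t + 1] > realized[t]    # volatility actually increased
        overpriced += over
        rose += up
        both += over and up
    return overpriced, rose, both

# Hypothetical three-month sample.
print(mispricing_counts([0.20, 0.20, 0.20], [0.30, 0.10, 0.25]))
```

A month counted in the third category is one where a correct forecast of rising volatility would still have lost money on a long swap, because the strike was set above the level that volatility eventually reached.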
To test whether implied volatilities are on average
mispriced, I use the direction of change in swap strikes
between month 𝑡 − 1 and month 𝑡 as a predictor of next
month’s realized volatility (Scenario 2). If implied
volatilities are fairly priced, trading on market expectations
alone should not yield any significant economic profits.
Nevertheless, as shown in Table 5, trading volatility in this
way outperforms both models in the volatile period but
underperforms in the low-volatility environment. The results
from Scenario 2 suggest that implied volatilities are
under-priced in volatile environments but over-priced in
low-volatility environments. Thus, I argue that the
discrepancies in the trading gains between MSM and
MS-GARCH are due to the fact that MS-GARCH continues to
generate too high a forecast in volatile periods and too low a
forecast in low-volatility periods, leading to a
trend-following type of strategy that is long volatility in
volatile periods and short volatility in low-volatility periods.
The discrepancies are also reinforced by inefficient market
pricing of implied volatilities.
                     Max gain   Max loss    Avg. monthly gain  Total gain  | Max gain  Max loss  Avg. monthly gain  Total gain  | Max gain   Max loss    Avg. monthly gain  Total gain
SPX  MSM             $4,427.00  -$1,871.47  $186.20            $11,171.90  | $548.50   -$302.14  $21.79             $1,307.56   | $4,427.00  -$1,871.47  $104.00            $12,479.46
     MS-GARCH        $4,427.00  -$2,503.61  $117.66            $7,059.90   | $302.14   -$548.50  $26.10             $1,565.85   | $4,427.00  -$2,503.61  $71.88             $8,625.75
     $K_{var,t-1}$   $4,427.00  -$2,503.61  $154.38            $9,262.87   | $302.14   -$548.50  -$9.65             -$578.88    | $4,427.00  -$2,503.61  $72.37             $8,684.00
OEX  MSM             $4,906.37  -$1,574.45  $190.99            $11,459.10  | $657.00   -$197.50  $31.66             $1,899.38   | $4,906.37  -$1,574.45  $111.32            $13,358.49
     MS-GARCH        $4,906.37  -$2,602.20  $106.36            $6,381.50   | $328.05   -$657.00  $29.94             $1,796.26   | $4,906.37  -$2,602.20  $68.15             $8,177.76
     $K_{var,t-1}$   $4,906.37  -$2,602.20  $134.68            $8,080.82   | $188.18   -$657.00  -$35.32            -$2,119.02  | $4,906.37  -$2,602.20  $49.68             $5,961.79
DJI  MSM             $4,793.37  -$2,217.82  $61.77             $3,706.50   | $630.90   -$652.80  $28.03             $1,681.71   | $4,793.37  -$2,217.82  $44.90             $5,388.21
     MS-GARCH        $4,793.37  -$2,217.82  $57.23             $3,433.66   | $630.90   -$652.80  $42.41             $2,544.60   | $4,793.37  -$2,217.82  $49.82             $5,978.26
     $K_{var,t-1}$   $4,793.37  -$1,345.61  $120.23            $7,214.01   | $326.35   -$652.80  -$25.94            -$1,556.31  | $4,793.37  -$1,345.61  $47.15             $5,657.70
NDX  MSM             $4,141.86  -$1,980.53  $188.46            $11,307.67  | $584.57   -$788.89  -$21.51            -$1,290.40  | $4,141.86  -$1,980.53  $83.48             $10,017.27
     MS-GARCH        $4,141.86  -$1,980.53  $188.46            $11,307.67  | $584.57   -$788.89  -$21.51            -$1,290.40  | $4,141.86  -$1,980.53  $83.48             $10,017.27
     $K_{var,t-1}$   $4,141.86  -$2,755.69  $104.69            $6,281.19   | $788.89   -$584.57  -$0.41             -$24.47     | $4,141.86  -$2,755.69  $52.14             $6,256.72
       Net position         $K_{var,t} > \sigma_{t+1}^2$   $\sigma_{t+1}^2 > \sigma_t^2$   $K_{var,t} > \sigma_{t+1}^2$, $\sigma_{t+1}^2 > \sigma_t^2$   Total loss
       MSM     MS-GARCH
SPX    100L    -58S         78                             56                              26                                                            -$2,236.55
OEX    100L    -60S         80                             56                              24                                                            -$1,812.15
DJI    102L    -78S         86                             58                              31                                                            -$1,281.78
NDX    106L    120L         45                             58                              7                                                             -$345.51

5. Conclusions
Multiscaling and long-run dependence are well-known
stylized facts of financial return volatility. However,
traditional GARCH-class models, which assume
exponential rather than hyperbolic decay of autocorrelation
functions of absolute returns, fail to capture many of the
characteristics of volatility especially in multi-step ahead
forecasts. Such issues have spurred the development of
Markov-Switching GARCH in an effort to capture the
multi-frequency and long-memory characteristics of
volatility as well as structural breaks in the volatility process.
More recently, the Markov-Switching Multifractal model of
Calvet and Fisher (2004) has been developed to account for
the multiscaling behaviour of financial time series. The
multiplicative structure of MSM and the decomposition of
volatility into a large number of states give rise to many of
the apparent characteristics of volatility, and in many cases,
MSM has been found to improve upon GARCH-class
models in terms of mean squared error.
Although numerous studies have found superior
forecasting performance of MSM, few have investigated its
performance in practical applications. In this paper, I
investigated the performance of Markov-Switching
Multifractal and Markov-Switching GARCH in 1-day,
10-day, 20-day and 50-day ahead forecasts as well as their
performance in volatility trading, that is, the ability to
forecast the direction of change in future realized volatility.
My results suggest that MSM outperforms MS-GARCH in
multi-step ahead forecasts in terms of several statistical loss
functions and success rate. However, MS-GARCH continues to
generate too high a forecast in volatile periods and for assets
with high volatility and too low a forecast in low-volatility
periods and in multi-step ahead forecasts, leading to spurious
outperformance over MSM in some cases. Further, from a
trading perspective, MSM is unable to generate significant
economic returns in variance swap trading and is only
comparable to MS-GARCH in intra-day trading.
However, the outperformance of MS-GARCH over MSM
in the monthly trading strategy is mainly due to inefficient
market pricing of implied volatilities. In sum, my results
lend support to earlier studies that MSM has greater
forecasting ability than GARCH-class models, especially
in multi-step ahead forecasts.
References
Allen, P., Einchcomb, S., and Granger, N. (2006) “Variance
Swap.” European Equity Derivatives Research, J.P. Morgan.
Bollerslev, T. (1986) “Generalized autoregressive
conditional heteroskedasticity.” Journal of Econometrics 31,
307–327.
Bollerslev, T., Engle, R. F., and Nelson, D. (1994) “ARCH
models.” In R. F. Engle & D. L. McFadden (Eds.),
Handbook of econometrics, vol. IV (pp. 2961–3038).
Amsterdam: Elsevier Science B.V.
Calvet, L. E., and Fisher, A. J. (2004) “How to forecast
long-run volatility: regime switching and the estimation of
multifractal process.” Journal of Financial Econometrics, 2,
49–83.
Carr, P., and Madan, D. (2001) “Towards a Theory of
Volatility Trading.” In R. Jarrow (Ed), Volatility: New
Estimation Techniques for Pricing Derivatives (London:
Risk Publications), 417–427.
Derman, E., Demeterfi, K., Kamal, M., and Zou, J. (1999)
“More than you ever wanted to know about variance swaps.”
Quantitative Strategies Research Notes, Goldman Sachs.
Ding, Z., Granger, C.W.J., and Engle, R.F. (1993) “A long
memory property of stock market returns and a new model.”
Journal of Empirical Finance 1, 83–106. 
Engle, R. (1982) “Autoregressive Conditional
Heteroscedasticity with Estimates of the Variance of United
Kingdom Inflation.” Econometrica 50, 987-1007. 
Gray, S.F. (1996a) “Modeling the conditional distribution
of interest rates as a regime-switching process.” Journal of
Financial Economics 42, 27–62.
Gray, S.F. (1996b) “An analysis of conditional regime-
switching models.” Working paper Fuqua School of
Business Duke University. 
Hamilton, J. (1989) “A New Approach to the Economic
Analysis of Nonstationary Time Series and the Business
Cycle.” Econometrica 57, 357-384.
Hamilton, J. (1990) “Analysis of Time Series Subject to
Change in Regime.” Journal of Econometrics 45, 39-70.
Hamilton, J. (1994) Time Series Analysis. Princeton, NJ:
Princeton University Press.
Klaassen, F. (2002) “Improving GARCH Volatility
Forecasts with Regime Switching GARCH.” Empirical
Economics 27, 363-394. 
Lamoureux, C., and Lastrapes, W. (1990) “Persistence in
variance, structural change and the GARCH model.”
Journal of Business and Economic Statistics, 8, 225–234. 
Lux, T., and Kaizoji, T. (2007) “Forecasting volatility and
volume in the Tokyo stock market: Long memory, fractality
and regime switching.” Journal of Economic Dynamics and
Control, 31, 1808–1843.

Lux, T., Arias, M., and Sattarhoff, C. (2011) “A Markov-
switching Multifractal Approach to Forecasting Realized
Volatility.” Working Paper, Kiel Institute for the World
Economy.
Marcucci, J. (2005) “Forecasting Stock Market Volatility
with Regime-Switching GARCH models.” Working Paper,
University of California, San Diego.
Sopipan, N., Sattayatham, P., and Premanode, B. (2011)
“Forecasting Volatility of Gold Price Using Markov
Regime Switching and Trading Strategy.” Journal of
Mathematical Finance, 2, 121–131.
Appendix
Appendix 1: GARCH model
A1. GARCH
Under the standard GARCH(p,q) model, returns are
characterized by
$$r_t = \sqrt{h_t}\,\varepsilon_t$$
where $\varepsilon_t$ is i.i.d. with zero mean and unit variance and
where $h_t$ denotes the conditional variance of $r_t$ given
information at date $t-1$. The conditional variance follows an
autoregressive process and evolves according to
$$h_t = \omega + \beta_1 h_{t-1} + \dots + \beta_p h_{t-p} + \alpha_1 r_{t-1}^2 + \dots + \alpha_q r_{t-q}^2$$
$$h_t = \omega + \sum_{i=1}^{p}\beta_i h_{t-i} + \sum_{j=1}^{q}\alpha_j r_{t-j}^2$$
(Hamilton, 1994)
for $\omega > 0$, $\beta_i \ge 0$, and $\alpha_j \ge 0$ to ensure
non-negativity of $h_t$. Therefore, the GARCH conditional
variance is a smooth deterministic function of past squared
returns. Moreover, calculation of the conditional variance
$\{h_t\}_{t=1}^{T}$ requires a pre-sample to estimate values for
$h_{-p+1}, \dots, h_0$ and $r_{-p+1}^2, \dots, r_0^2$. For
the GARCH(1,1) model, the variables can be initialized
according to a set of starting values where
$$h_0 = \omega(1 - \beta_1 - \alpha_1)^{-1}$$
Iterating forward to obtain $\{h_t\}_{t=1}^{T}$, the sequence
can be used to evaluate the log-likelihood function, which is
then maximized numerically to obtain the optimal parameters
for the GARCH process.
According to Hamilton (1994), the log-likelihood function is
given by
$$\mathcal{L}(\theta) = -\frac{T}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{T}\log h_t - \frac{1}{2}\sum_{t=1}^{T} h_t^{-1}\left(r_t - \mu - \phi r_{t-1}\right)^2$$
‘Likelihood function of GARCH model’ (Hamilton,
1994:p.352)
See also Baillie and Bollerslev (1992) for detailed
discussions of forecasts and deviations for GARCH process,
and Bollerslev (1986) for maximum likelihood estimates of
GARCH parameters.
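The recursion and the Gaussian log-likelihood can be sketched in a few lines for the GARCH(1,1) case. This is an illustrative sketch rather than the estimation code used in the paper. It makes three assumptions that are not stated in the text: the variance recursion is driven by squared residuals from the AR(1) mean equation (which reduce to squared returns when $\mu = \phi = 0$), the pre-sample variance and squared residual are both set to the unconditional variance $\omega/(1 - \alpha_1 - \beta_1)$, and the first observation serves only as the pre-sample return.

```python
import math

def garch11_loglik(returns, omega, alpha, beta, mu=0.0, phi=0.0):
    """Gaussian log-likelihood of an AR(1)-GARCH(1,1) model.

    returns[0] is treated as the pre-sample observation r_0, so the
    likelihood is summed over t = 1..T with T = len(returns) - 1.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    h = omega / (1.0 - alpha - beta)   # h_0: unconditional variance (assumption)
    prev_sq = h                        # pre-sample squared residual (assumption)
    n_obs = len(returns) - 1
    loglik = -0.5 * n_obs * math.log(2.0 * math.pi)
    for t in range(1, len(returns)):
        h = omega + beta * h + alpha * prev_sq           # variance recursion
        resid = returns[t] - mu - phi * returns[t - 1]   # mean equation residual
        loglik -= 0.5 * (math.log(h) + resid * resid / h)
        prev_sq = resid * resid
    return loglik

# Hypothetical daily returns in percent.
sample = [0.0, 0.5, -1.2, 0.3, 2.1, -0.7]
print(garch11_loglik(sample, omega=0.05, alpha=0.10, beta=0.85))
```

In practice this function would be passed, with its sign flipped, to a numerical optimizer to obtain the maximum likelihood estimates.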
Appendix 2: Descriptive statistics of S&P500, S&P100,
Dow and NASDAQ 100 daily returns
                    SPX       OEX       DJI       NDX
Mean                0.02%     0.02%     0.02%     0.03%
Daily volatility    1.23%     1.23%     1.16%     1.85%
Maximum             10.96%    10.66%    10.51%    17.20%
Minimum             -9.47%    -9.19%    -8.20%    -11.11%
Skewness            -0.23     -0.18     -0.14     0.10
Kurtosis            10.86     10.35     10.77     8.82
Note: The statistics cover the full sample period from July
1997 to July 2017.
Appendix 3: One-month volatility forecasts and
subsequent realized volatility
S&P 500

S&P 100
Dow Jones Industrial Average
NASDAQ 100
Appendix 4: MS-GARCH Fifty-day volatility forecasts
and subsequent realized volatility
S&P 500

S&P 100
NASDAQ 100
Appendix 5: Variance swap trading equity curves
S&P 500
S&P 100

NASDAQ 100
Appendix 5: Equity curves of monthly variance swap
trading strategy of S&P500, S&P100, Dow and NASDAQ
100.
Note: ‘o’ denotes the Standard case. ‘+’ denotes Scenario 1
and ‘∗’ denotes Scenario 2.
