Abstract
We forecast one-step-ahead volatility for the MSCI Emerging Markets Index
using a Stochastic Volatility model that is solved with a Kalman Filtering
technique. The stochastic model is evaluated against the popular Generalized
Autoregressive Conditional Heteroscedasticity (GARCH) model. The Stochastic
Differential Equations for the model are derived and linearized into a State
Space form that can be solved with a Kalman Filter. The source code in R is provided.
1 Introduction
The ability to estimate volatility effectively is key to understanding financial markets and
predicting returns. Unlike asset returns, volatility is not directly observable in the market.
In other words, volatility is a latent variable. Most of the academic literature on this sub-
ject is dedicated to estimating volatility using various econometric style models (ARCH,
GARCH, IGARCH, EGARCH) [1]. The focus of this article is estimating volatility using
a stochastic model implemented using a State Space filtering technique.
The MSCI Emerging Markets Index¹ is a free float-adjusted market capitalization index
designed to capture large and mid cap representation across 23 Emerging Market
countries². Several funds use emerging markets portfolios for their long-term growth
potential by capitalizing on the increasing consumption of the middle classes in these
markets. The nature of the Emerging Markets means that the index is highly volatile.
Since 2002 the index has lost value in six calendar years and gained in eight, with the
worst annual return in the financial crisis of 2008, when it plummeted by 53.33%,
followed by an annual gain of 78.51% the following year.
The highly fluctuating nature of the Emerging Markets index makes it an interesting
subject for volatility estimation. In this article, the focus is on the daily MSCI
Emerging Markets (EM) USD returns for estimation of the one-day-ahead volatility using a
Stochastic Volatility (SV) model. The fit and forecasts of the SV model are compared
to those of an autoregressive GARCH model that is trained on the same dataset.
The implementation has been done using R version 3.2.3 and the source code is also
available to download from GitHub³.

¹ https://www.msci.com/resources/
² EM countries include: Brazil, Chile, China, Colombia, Czech Republic, Egypt, Greece, Hungary,
India, Indonesia, Korea, Malaysia, Mexico, Peru, Philippines, Poland, Russia, Qatar, South Africa,
Taiwan, Thailand, Turkey and United Arab Emirates.
³ https://github.com/1000084/AdvancedFinancialDataAnalysis
2 Stochastic Volatility Model
Some of the empirically observed stylized facts of volatility are clustering and persistence,
leptokurtosis and mean reversion [2]. We follow the seminal paper by Harvey, Ruiz and
Shephard [3], who describe a stochastic volatility model that attempts to capture these
stylized facts. In this section we derive the master equations for the stochastic
volatility model, starting from a lognormal asset return process and an Ornstein-Uhlenbeck
process for the logarithm of the variance.
Let $S_t$ represent the value of the MSCI Index at time $t$, $\mu$ be the mean return and
$\sigma_t$ the volatility at time $t$. The log variance, $\ln\sigma_t^2$, follows a
mean-reverting process with mean reversion speed $\kappa$, mean reversion level $\nu$ and
volatility $\gamma$.
We get the following Stochastic Differential Equations, where the Brownian Motions
$W_t$ and $B_t$ are correlated with factor $\rho$.

$$
\begin{aligned}
\frac{dS_t}{S_t} &= \mu\,dt + \sigma_t\,dW_t \\
d\ln\sigma_t^2 &= \kappa\left(\nu - \ln\sigma_t^2\right)dt + \gamma\,dB_t \\
d\langle W_t, B_t\rangle &= \rho\,dt
\end{aligned}
\tag{1}
$$
The first step is to solve the Geometric Brownian Motion and represent it in terms of
log returns. We perform discretization by selecting $\Delta = \frac{1}{252}$, where 252 is
approximately the number of business days in a year.

$$
\ln S_{t+\Delta} - \ln S_t = \left(\mu - \frac{\sigma_t^2}{2}\right)\Delta + \sigma_t W_t
\tag{2}
$$
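We substitute $x_t$ for the log returns, set $h_t = \ln\sigma_t^2$ and
$\left(\mu - \frac{e^{h_t}}{2}\right)\Delta = \alpha_t$. We also know that the Brownian
increment $W_t$ over a step of length $\Delta$ can be written as $W_t = \sqrt{\Delta}\,U_t$,
where $U_t$ is a standard normal random variable.

$$
\begin{aligned}
x_t &= \left(\mu - \frac{\sigma_t^2}{2}\right)\Delta + \sigma_t\sqrt{\Delta}\,U_t \\
    &= \left(\mu - \frac{e^{h_t}}{2}\right)\Delta + e^{h_t/2}\sqrt{\Delta}\,U_t \\
    &= \alpha_t + e^{h_t/2}\sqrt{\Delta}\,U_t
\end{aligned}
\tag{3}
$$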
At this point we approximate $\alpha_t$ by an estimator of the mean of the log returns,
$\hat\alpha = \Delta\sum_{i=1}^{252} x_i$. We square both sides of the equation and take
natural logarithms to get,

$$
\ln(x_t - \hat\alpha)^2 = h_t + \ln(U_t^2)
\tag{4}
$$
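$\ln(U_t^2)$ follows a log-$\chi^2$ distribution with one degree of freedom, which has a
known expectation of approximately $-1.27$ and a variance of $\frac{\pi^2}{2}$, which is
roughly 4.93. However, these theoretical moments can only be replicated with a very large
number of samples. To take this into account we add and subtract $E[\ln(U_t^2)]$ in
Equation (4). In the second line we also allow for a constant offset, $\eta_0$, which is
estimated along with the other parameters and can absorb constants such as $\ln\Delta$
that arise when Equation (3) is squared.

$$
\begin{aligned}
\ln(x_t - \hat\alpha)^2 &= E[\ln(U_t^2)] + h_t + \ln(U_t^2) - E[\ln(U_t^2)] \\
                        &= \eta_0 - 1.27 + h_t + \xi_t
\end{aligned}
\tag{5}
$$

where $\xi_t = \ln(U_t^2) + 1.27$ has mean zero and variance $\frac{\pi^2}{2} \approx 4.93$.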
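The two moments quoted above can be checked numerically. The following is a minimal Python sketch (not part of the paper's R code): the closed forms $-\gamma_E - \ln 2$ and $\pi^2/2$ are the standard digamma and trigamma values at $1/2$, and the function names, sample size and seed are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def log_chi2_moments():
    """Exact mean and variance of ln(U^2) for U ~ N(0, 1)."""
    mean = -EULER_GAMMA - math.log(2.0)  # digamma(1/2) + ln 2
    var = math.pi ** 2 / 2.0             # trigamma(1/2)
    return mean, var


def simulate_log_chi2(n, seed=0):
    """Monte Carlo estimate of the same two moments (seed is arbitrary)."""
    rng = random.Random(seed)
    draws = [math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n)]
    m = sum(draws) / n
    v = sum((d - m) ** 2 for d in draws) / (n - 1)
    return m, v


exact_mean, exact_var = log_chi2_moments()
sample_mean, sample_var = simulate_log_chi2(200_000)
```

The simulated moments only approach the exact ones slowly as the sample grows, which illustrates why the text centres the noise term explicitly.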
We now focus on the mean-reverting part of Equation (1). A mean-reverting stochastic
differential equation can be solved by choosing a suitable integrating factor. In this
case, we select an integrating factor of $e^{\kappa t}$. This gives us,

$$
h_t = h_s e^{-\kappa(t-s)} + \nu\left(1 - e^{-\kappa(t-s)}\right)
      + \gamma\int_s^t e^{-\kappa(t-u)}\,dB_u
\tag{6}
$$
We follow a similar discretization scheme as before and substitute for the constant
factors to get,

$$
h_{t+\Delta} = \phi + \beta h_t + \zeta_t
\tag{7}
$$

where $\zeta_t$ is Gaussian white noise with mean 0 whose variance is estimated,
$\phi = \nu\left(1 - e^{-\kappa\Delta}\right)$ and $\beta = e^{-\kappa\Delta}$.
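The mapping between the continuous-time parameters $(\kappa, \nu)$ and the discrete coefficients $(\phi, \beta)$ of Equation (7) can be sketched in Python as below (the paper's own code is in R). The function names are hypothetical. `ou_noise_variance` gives the exact variance of the stochastic integral in Equation (6) over one step; it is included only for illustration, since the paper estimates the variance of $\zeta_t$ directly.

```python
import math

DELTA = 1.0 / 252.0  # one business day, as in the text


def discretize_ou(kappa, nu, delta=DELTA):
    """Return (phi, beta) of Equation (7) from the OU parameters."""
    beta = math.exp(-kappa * delta)
    phi = nu * (1.0 - beta)
    return phi, beta


def recover_ou(phi, beta, delta=DELTA):
    """Invert the mapping; requires 0 < beta < 1."""
    kappa = -math.log(beta) / delta
    nu = phi / (1.0 - beta)
    return kappa, nu


def ou_noise_variance(gamma, kappa, delta=DELTA):
    """Variance of gamma * int_t^{t+delta} exp(-kappa (t+delta-u)) dB_u."""
    # -expm1(-x) = 1 - exp(-x), computed without cancellation for small x
    return gamma ** 2 * (-math.expm1(-2.0 * kappa * delta)) / (2.0 * kappa)
```

For example, `recover_ou(phi, beta)` applied to fitted values of $\phi$ and $\beta$ returns the implied mean reversion speed and level.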
In summary, Equations (5) and (7) give us the Master Equations for the Stochastic
Volatility model,

$$
\begin{aligned}
\ln y_t^2 &= \eta_0 - 1.27 + h_t + \xi_t \\
h_{t+\Delta} &= \phi + \beta h_t + \zeta_t
\end{aligned}
\tag{8}
$$

where $y_t$ stands for the demeaned log return $x_t - \hat\alpha$ of Equation (5).
As we will see in later sections, this Stochastic Volatility model is able to capture
the Excess Kurtosis and Clustering properties of the volatility process. In addition, the
Ornstein-Uhlenbeck nature of the volatility ensures mean reversion.
The estimation of Equation (8) is tricky because, unlike in the GARCH model, the volatility
cannot be observed one step ahead. Harvey and Shephard [4] proposed estimating
the Stochastic Volatility model with a Quasi-Maximum Likelihood (QML) procedure
by transforming Equation (8) into a linear state space form. This allows estimation of
the parameters $\phi$, $\beta$, $\eta_0$ and the variances of $\xi_t$ and $\zeta_t$ by
treating both disturbances as normal and maximizing the prediction-error decomposition
form of the likelihood obtained via the Kalman Filter.
3 Kalman Filter
A State Space model has its origin in Control Theory and its dynamics are given by
Equation (9). The notation in this section follows that of Koopman et al. [5] and should
not be confused with that of the model described in the previous section.
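$$
\begin{aligned}
y_t &= d_t + Z_t\alpha_t + \varepsilon_t \\
\alpha_t &= c_t + T_t\alpha_{t-1} + R_t\eta_t
\end{aligned}
\tag{9}
$$

where $\eta_t \sim N(0, Q_t)$ and $\varepsilon_t \sim N(0, H_t)$.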
The first equation is the measurement equation that links the observation, $y_t$, with
the latent variable, $\alpha_t$, along with a noise term, $\varepsilon_t$, and a
deterministic input, $d_t$. The second equation is the state transition equation that
updates the latent variable using information from the previous step and some random
noise, $\eta_t$. The matrices $Z_t$, $T_t$, $R_t$, $Q_t$, $H_t$ can evolve over time as
long as they are known at time $t-1$.
The Kalman Filter is an iterative computational algorithm that is used to
forecast the latent variable, $\alpha_t$, and its variance at each step using a
combination of measurement and prediction update functions.
1. Time Update Equations: Perform a one-step-ahead forecast of the state variable
and compute its variance conditional on the observations up to the last time step.
Therefore, we define,

$$
\begin{aligned}
a_{t-1} &= E[\alpha_{t-1} \mid y_0, \ldots, y_{t-1}] \\
P_{t-1} &= E[(\alpha_{t-1} - a_{t-1})(\alpha_{t-1} - a_{t-1})^T]
\end{aligned}
\tag{10}
$$

where $a_t$ is the estimate of the state vector at time $t$ conditional on the
observations up to time $t$ and $P_t$ is the conditional covariance matrix at time $t$.
These are propagated by the time update equations,

$$
\begin{aligned}
a_{t|t-1} &= T_t a_{t-1} + c_t \\
P_{t|t-1} &= T_t P_{t-1} T_t^T + R_t Q_t R_t^T
\end{aligned}
\tag{11}
$$

The Kalman Filter requires initial estimates of $a_{t|t-1}$ and $P_{t|t-1}$ to start the
iteration.
2. Measurement Update Equations: In this step, we update the conditional
estimates of the state vector using the new observations. Let
$F_t = Z_t P_{t|t-1} Z_t^T + H_t$, then we can update $a_t$ and $P_t$ as,

$$
\begin{aligned}
a_t &= a_{t|t-1} + P_{t|t-1} Z_t^T F_t^{-1}\left(y_t - Z_t a_{t|t-1} - d_t\right) \\
P_t &= P_{t|t-1} - P_{t|t-1} Z_t^T F_t^{-1} Z_t P_{t|t-1}
\end{aligned}
\tag{12}
$$
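The two update steps can be sketched for a scalar state and scalar observation. This is a minimal Python illustration with time-invariant system values, not the FKF implementation used in the paper; the function name and argument layout are assumptions, and `a0`, `P0` denote the initial predicted state and variance.

```python
def kalman_filter_scalar(ys, d, Z, H, c, T, R, Q, a0, P0):
    """Run a scalar Kalman filter over the observations ys.

    Returns the filtered state means a_t and variances P_t.
    a0 and P0 are the initial predictions a_{1|0} and P_{1|0}.
    """
    a_pred, P_pred = a0, P0
    filtered, variances = [], []
    for y in ys:
        # Measurement update, Equation (12)
        F = Z * P_pred * Z + H
        v = y - Z * a_pred - d              # prediction error
        a_filt = a_pred + P_pred * Z / F * v
        P_filt = P_pred - P_pred * Z / F * Z * P_pred
        filtered.append(a_filt)
        variances.append(P_filt)
        # Time update, Equation (11)
        a_pred = T * a_filt + c
        P_pred = T * P_filt * T + R * Q * R
    return filtered, variances
```

With a constant state (`T=1`, `Q=0`), unit measurement noise and a unit prior variance, three observations equal to 1 give filtered means of 1/2, 2/3 and 3/4, which is the familiar running-average behaviour.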
Fernando Tusell provides a detailed comparison of various R implementations of Kalman
Filtering in [7]. For this paper, we have chosen the R package FKF version 0.1.3⁴, which
appeared on CRAN in March 2012.
4 Time Series Analysis and Pre-Processing
We choose MSCI Emerging Markets Index ETF data during the last 5 years (11 April
2011 until 02 March 2016) for our study. We download the daily closing prices from Yahoo
Finance (ticker: EEM) for this index and convert them into log returns using Equation
(13).
$$
r_t = \ln S_t - \ln S_{t-1}
\tag{13}
$$

where $r_t$ is the daily log return computed from the daily closing prices $S_t$.
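Equation (13) translates directly into code. A minimal Python sketch follows (the paper's own code is in R); the function name and the example prices are illustrative, not actual EEM data.

```python
import math


def log_returns(prices):
    """Daily log returns r_t = ln S_t - ln S_{t-1} from closing prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


# Hypothetical closing prices, for illustration only
example_returns = log_returns([100.0, 110.0, 99.0])
```

A series of N prices yields N - 1 returns, and the returns sum to the log of the total price change.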
Figure 1: MSCI Emerging Markets ETF Daily Prices
Figure 2: MSCI Emerging Markets Daily Log Returns
Figures 1 and 2 plot the MSCI Emerging Markets Index daily price level and log
returns during the sample period of the last 5 years. In total, we have 1257 data points
for the index. We divide this into in-sample data that we use to train our model and
out-sample data that is used for testing the model for the purposes of forecasting. The
in-sample data consists of 1131 points and spans the period from 11 April 2011 until 07
October 2015. The descriptive statistics for the MSCI EM return series are shown in Table
1. It shows the mean, median, kurtosis, skewness and the standard deviation of the series.

⁴ https://cran.r-project.org/web/packages/FKF/FKF.pdf
Table 1: Descriptive Statistics during 11 April 2011 - 08 April 2016
MSCI Emerging Markets Index ETF Returns

Statistic             Value
Mean                  −0.000320
Median                 0.0
Standard Deviation     0.014278
Excess Kurtosis        3.035950
Skewness              −0.283108
Maximum                0.060530
Minimum               −0.087054
We notice that the mean of the series is very close to 0 and the volatility as measured
by the standard deviation is 0.01428. The returns are negatively skewed, which is explained
by the index performing badly in recent years, and the excess kurtosis is close to that of
a normal distribution, indicating a lack of leptokurtosis (fat-tailedness) in this
particular dataset.
5 Benchmark Model Selection
Figure 3 below shows the serial auto-correlation in log returns of the Emerging Markets
in-sample data. We notice that the ACF dies out after a lag of around 5 in this particular
dataset. A popular econometric way of modeling volatility is through a Generalised
Autoregressive Conditionally Heteroscedastic, GARCH(p,q), model. We use GARCH as
benchmark to compare the performance of the Stochastic Volatility model.
Consider a time series, $y_t$. The GARCH(p, q) model, where $p$ is the order of the GARCH
terms $\sigma^2$ and $q$ is the order of the ARCH terms $y_t^2$, is given by the
conditional variance equation below.

$$
\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i y_{t-i}^2 + \sum_{i=1}^{p}\beta_i\sigma_{t-i}^2
\tag{14}
$$

where the series $y_t$ is assumed to have variance $\sigma_t^2$. The distribution of the
series is typically chosen to be either normal or Student-t depending on the statistical
properties of the series.
Figure 3: MSCI Emerging Markets Auto-correlations
The Akaike Information Criterion (AIC) rewards a model for goodness of fit and penalizes
the number of free parameters employed in achieving that fit. Lower AIC values are
preferred. It is defined as $\mathrm{AIC} = -2\ln L + 2k$, where $L$ is the maximized value
of the likelihood with $k$ parameters. Similarly, the Bayesian Information Criterion (BIC)
is also a goodness-of-fit measure of a model and is defined as
$\mathrm{BIC} = -2\ln L + k\ln(N)$, where $N$ is the number of observations.
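As a small illustration, the two criteria can be computed as follows, using the standard definitions $\mathrm{AIC} = -2\ln L + 2k$ and $\mathrm{BIC} = -2\ln L + k\ln(N)$. This is a Python sketch rather than the paper's R code. The parameter count `k = 4` (a mean term plus $\omega$, $\alpha_1$, $\beta_1$) is an assumption about the GARCH(1,1) fit and is not stated in the paper.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion from a maximized log likelihood."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian Information Criterion with n observations."""
    return -2.0 * log_lik + k * math.log(n)


# Per-observation gap between BIC and AIC; it does not depend on log_lik.
# k = 4 is an assumed parameter count, n = 1131 is the in-sample size.
gap_per_obs = (bic(0.0, 4, 1131) - aic(0.0, 4)) / 1131
```

Under that assumption the per-observation gap is about 0.0178, which is in line with the difference between the GARCH(1,1) entries of Tables 2 and 3 and suggests that the tabulated values are scaled by the number of observations.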
Table 2: AIC for GARCH(p,q) model

p/q   1          2          3          4          5
1     −5.90917   −5.90745   −5.90573   −5.90433   −5.90310
2     −5.90785   −5.90904   −5.90598   −5.90671   −5.90487
3     −5.90608   −5.90725   −5.90398   −5.90574   −5.90386
4     −5.90432   −5.90539   −5.90221   −5.90441   −5.90252
5     −5.90266   −5.90343   −5.90042   −5.90252   −5.90075

Table 3: BIC for GARCH(p,q) model

p/q   1          2          3          4          5
1     −5.89138   −5.88521   −5.87904   −5.87320   −5.86751
2     −5.88561   −5.88235   −5.87484   −5.87112   −5.86483
3     −5.87939   −5.87611   −5.86839   −5.86571   −5.85938
4     −5.87318   −5.86981   −5.86218   −5.85993   −5.85359
5     −5.86707   −5.86340   −5.85594   −5.85359   −5.84737
Even though reference [6] gives compelling evidence for selecting the GARCH(1,1) model,
we perform our own comparison and find that GARCH(1,1) is indeed the optimal model
according to the AIC and BIC values for different combinations of p and q, as shown
in Tables 2 and 3. Henceforth, we only discuss the GARCH(1,1) model as a benchmark to
evaluate the Stochastic Volatility model.
6 Model Estimation using Kalman Filter
In Section 2, we derived the Stochastic Volatility State Space model that is given by
the following measurement and update equations,

$$
\begin{aligned}
\ln y_t^2 &= \eta_0 - 1.27 + h_t + \xi_t \\
h_{t+\Delta} &= \phi + \beta h_t + \zeta_t
\end{aligned}
\tag{15}
$$
The Quasi-Maximum Likelihood (QML) approach for estimating the stochastic volatility
model using the Kalman Filter is proposed in [3] and [4]. Since $\ln(y_t^2)$ is not
Gaussian, the Kalman Filter returns minimum mean square linear estimators (MMSLE) for
$h_t$ as opposed to minimum mean square estimators (MMSE). For this purpose, we assume
that $\xi_t \sim N\left(0, \frac{\pi^2}{2}\right)$ and estimate the variance of $\zeta_t$,
denoted by $\theta^2$.
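To make the QML objective concrete, the sketch below evaluates the Gaussian negative log likelihood of the state space model through the prediction-error decomposition. It is a scalar Python illustration, not the FKF-based R code used in the paper. The function name, the parameter ordering and the choice to start the filter at the stationary mean and variance of $h_t$ (which requires $|\beta| < 1$) are assumptions.

```python
import math

LOG_CHI2_MEAN = -1.27            # E[ln U^2], rounded as in the text
LOG_CHI2_VAR = math.pi ** 2 / 2  # Var[ln U^2]


def sv_neg_log_likelihood(params, log_sq_returns):
    """Negative quasi log likelihood of the SV state space model.

    params = (eta0, phi, beta, theta); log_sq_returns holds ln(y_t^2).
    """
    eta0, phi, beta, theta = params
    d = eta0 + LOG_CHI2_MEAN       # measurement intercept
    H = LOG_CHI2_VAR               # measurement noise variance
    Q = theta ** 2                 # state noise variance
    # Assumed initialisation: stationary moments of the AR(1) state
    a_pred = phi / (1.0 - beta)
    P_pred = Q / (1.0 - beta ** 2)
    nll = 0.0
    for y in log_sq_returns:
        F = P_pred + H             # prediction error variance (Z = 1)
        v = y - d - a_pred         # prediction error
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(F) + v * v / F)
        a_filt = a_pred + P_pred / F * v
        P_filt = P_pred - P_pred ** 2 / F
        a_pred = phi + beta * a_filt
        P_pred = beta ** 2 * P_filt + Q
    return nll
```

A numerical optimizer can then minimize this function over `params`, which corresponds to maximizing the quasi likelihood.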
Using the R package for Kalman Filtering, FKF⁵, we get the negative log likelihood
of the stochastic volatility estimator with initial values of the parameters. We minimize
this objective function, which is equivalent to maximizing the likelihood, using the
quasi-Newton method BFGS [8] implemented in the R function optim from the stats package.
The in-sample data we use are the squared log returns ranging
from 11 April 2011 until 07 October 2015. The fitted parameters that are output from
the optimization function in R are given in Table 4.
Table 4: SV model fitted parameters

Parameter   Fit         95% Confidence Interval
η0           0.088506   (−1.046131, 1.223141)
φ           −0.164523   (−0.368023, 0.038977)
β            0.981534   (0.965822, 0.997245)
θ            0.124870   (0.071250, 0.178491)

Table 5: GARCH(1,1) model fitted parameters

Parameter   Fit            Std. Error     t value     Pr(>|t|)
ω           2.598018e-06   1.322631e-06   1.964281    4.949757e-02
α1          7.769311e-02   1.901114e-02   4.086714    4.375255e-05
β1          9.111031e-01   2.111014e-02   43.159494   0.000000e+00
We also fit the GARCH(1,1) model to the in-sample dataset. The fitted parameters of
GARCH(1,1) and their standard errors are given in Table 5. The small p-values for the ω
and α1 parameters indicate that we should reject, at the 5% level, the null hypothesis
that these coefficients are zero. Given log returns,
$y_t$, the GARCH(1,1) volatility equation as described before is given by,
$$
\sigma_t^2 = \omega + \alpha_1 y_{t-1}^2 + \beta_1\sigma_{t-1}^2
\tag{16}
$$
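The recursion in Equation (16) can be sketched as below. This is a Python illustration, not the paper's R code; starting the recursion at the unconditional variance $\omega / (1 - \alpha_1 - \beta_1)$ is an assumption, since the paper does not state how the recursion is initialised.

```python
def garch11_variances(returns, omega, alpha1, beta1):
    """Conditional variances from the GARCH(1,1) recursion.

    Element 0 is the assumed starting variance; the last element is the
    one-step-ahead variance forecast after the final return.
    """
    if alpha1 + beta1 >= 1.0:
        raise ValueError("alpha1 + beta1 must be below 1 for a finite variance")
    sigma2 = [omega / (1.0 - alpha1 - beta1)]  # unconditional variance
    for y in returns:
        sigma2.append(omega + alpha1 * y * y + beta1 * sigma2[-1])
    return sigma2
```

Calling the function with the in-sample returns and the fitted values of Table 5 would reproduce a conditional variance path of the kind plotted for the GARCH model.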
The log variance estimate, $h_t$, from the Stochastic Volatility model is converted to
annualized volatility using the formula $\sigma_t = \sqrt{252}\,e^{h_t/2}$. The annualized
volatility is compared with the GARCH(1,1) volatilities and the realized historical
volatility computed over a window of 20 business days. The plot in Figure 4 shows the three
estimates. We notice that during mid-2011 the GARCH and Realized volatilities move closely
together, whereas the Stochastic Volatility estimate does not display similar shocks.
Figure 4: MSCI EM Vols Fit Comparison

⁵ https://cran.r-project.org/web/packages/FKF/FKF.pdf
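The annualization of the filtered log variance and a 20-day realized volatility of the kind used in this comparison can be sketched in Python as follows (the paper's own code is in R). The function names are hypothetical, and the use of the sample standard deviation within each window is an assumption about how the realized volatility is computed.

```python
import math

TRADING_DAYS = 252


def annualize_from_log_variance(h):
    """Annualized volatility sqrt(252) * exp(h / 2) from a daily log variance."""
    return math.sqrt(TRADING_DAYS) * math.exp(h / 2.0)


def realized_volatility(returns, window=20):
    """Rolling annualized sample standard deviation of daily returns."""
    out = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        mean = sum(chunk) / window
        var = sum((r - mean) ** 2 for r in chunk) / (window - 1)
        out.append(math.sqrt(TRADING_DAYS * var))
    return out
```

For instance, a daily standard deviation of 1% corresponds to an annualized volatility of roughly 15.9%.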
The AIC and BIC scores of the Stochastic Volatility and GARCH(1,1) models are
given in Table 6 showing lower scores for the GARCH model.
Table 6: AIC and BIC scores of SV and GARCH(1,1) models

       GARCH(1,1)   Stochastic Vol
AIC    −5.909171    −4.460508
BIC    −5.891378    −4.442715
7 Volatility Forecasting
We now use the fitted parameters from the Stochastic Volatility model that are returned
by the QML estimator based on Kalman Filter to perform forecasting on the out-sample
dataset. The MSCI Emerging Markets Index ETF out-sample dataset ranges from 08
October 2015 until 02 March 2016.
The Kalman Filter lends itself nicely to forecasting of the state variable, which in this
case is the daily log variance. We use the time update equation of the filter, and in
particular the predicted state vector $a_{t|t-1} = T_t a_{t-1} + c_t$, as explained in
Section 3, to get the one-step-ahead volatility forecast for the MSCI EM Index. We
similarly compute the GARCH(1,1) forecast using the fitted parameters from the in-sample
data.
Figure 5 shows the comparison between the annualized volatility computed using the
GARCH and the Stochastic Volatility methods. It also shows the realized historical
annualized volatility computed using a 20-day window. The black dotted line shows the
one-step-ahead forecast using the SV model and the red dotted line shows the same
forecast using the GARCH(1,1) model.
Figure 5: MSCI EM Annualized Vols Forecast Comparison
At each iteration the Kalman Filter provides the forecast covariance matrix for the state
vector. This allows us to construct a confidence interval around the one-step-ahead
annualized volatilities. Figure 6 shows the 95% confidence interval
around the forecast volatility for the first 30 days. As we can see, the prediction error
stays relatively constant with each new observation.
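One way to obtain such an interval is sketched below in Python (not the paper's R code). The time update gives the predicted log variance and its variance, a normal interval is formed on the log variance scale, and its end points are mapped to annualized volatility. The paper does not state exactly how the interval in Figure 6 is built, so this construction, the function name and the 1.96 multiplier are assumptions.

```python
import math


def forecast_volatility_interval(a_filt, P_filt, phi, beta, theta,
                                 z=1.96, trading_days=252):
    """One-step-ahead annualized volatility forecast with an interval.

    a_filt, P_filt: filtered log variance and its variance at time t.
    Returns (lower, point, upper) in annualized volatility units.
    """
    h_pred = phi + beta * a_filt                  # predicted state a_{t+1|t}
    P_pred = beta ** 2 * P_filt + theta ** 2      # predicted variance P_{t+1|t}
    half_width = z * math.sqrt(P_pred)

    def to_vol(h):
        return math.sqrt(trading_days) * math.exp(h / 2.0)

    return to_vol(h_pred - half_width), to_vol(h_pred), to_vol(h_pred + half_width)
```

Because the exponential map is monotone, the interval on the volatility scale is asymmetric around the point forecast.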
Figure 6: Out Sample Volatility Forecast Confidence Intervals
8 Model Evaluation
Some popular metrics for measuring forecast accuracy are Mean Square Error (MSE),
Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Given estimates of
the one step ahead volatility at time t, σt, these can be defined as,
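$$
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{t=1}^{n}\left(\hat\sigma_t - \sigma_t\right)^2 \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat\sigma_t - \sigma_t\right)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{t=1}^{n}\left|\hat\sigma_t - \sigma_t\right|
\end{aligned}
\tag{17}
$$

where $\hat\sigma = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(r_t - \mu)^2}$ is the realized
volatility over a window of $T$ returns with mean $\mu$.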
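The three measures in Equation (17) can be computed as in the following Python sketch (the paper's own code is in R). The function name and the dictionary return type are illustrative choices.

```python
import math


def forecast_errors(forecast, realized):
    """MSE, RMSE and MAE between forecast and realized volatilities."""
    if len(forecast) != len(realized) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [f - r for f, r in zip(forecast, realized)]
    n = len(diffs)
    mse = sum(d * d for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}
```

By construction the RMSE is the square root of the MSE, which the values reported for each model should also satisfy.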
We compute the realized volatility on a 20 day window for the out-sample dataset
and compare the forecast measures between the Stochastic Volatility and the GARCH(1,1)
models. The model with least error is generally considered to be a better model. Table 7
shows the forecast statistics between the SV model and the GARCH(1,1) model.
Table 7: Model Evaluation (Equation 17)

        GARCH(1,1)   Stochastic Vol
MSE     0.017533     0.022913
RMSE    0.132412     0.151371
MAE     0.084495     0.091245
An alternate definition of the forecast errors is given by [9]. Since volatility is a latent
variable, they suggest that the forecast accuracy measures can also be defined in terms of
returns, rt, as,
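$$
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{t=1}^{n}\left((r_{t+1} - \bar r)^2 - \sigma_t^2\right)^2 \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left((r_{t+1} - \bar r)^2 - \sigma_t^2\right)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{t=1}^{n}\left|(r_{t+1} - \bar r)^2 - \sigma_t^2\right|
\end{aligned}
\tag{18}
$$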
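A sketch of the return-based measures in Equation (18) is given below in Python (not the paper's R code). Taking $\bar r$ as the mean of the supplied returns is an assumption, as are the function name and the input alignment: `returns_next[i]` is the return realized one step after `variance_forecasts[i]` was made.

```python
import math


def return_based_errors(returns_next, variance_forecasts):
    """MSE, RMSE and MAE of squared demeaned returns against variance forecasts."""
    if len(returns_next) != len(variance_forecasts) or not returns_next:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(returns_next)
    r_bar = sum(returns_next) / n  # assumed: mean over the evaluated returns
    errs = [(r - r_bar) ** 2 - s2
            for r, s2 in zip(returns_next, variance_forecasts)]
    mse = sum(e * e for e in errs) / n
    mae = sum(abs(e) for e in errs) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}
```

The squared demeaned return serves as a noisy proxy for the unobserved variance, so these measures tend to be dominated by the noise in the proxy.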
We show the output using this definition of the evaluation measures in Table 8. Using
this new definition of the forecast error we see that the Stochastic Volatility model
performs marginally better.
Table 8: Model Evaluation (Equation 18)

        GARCH(1,1)   Stochastic Vol
MSE     0.003788     0.002832
RMSE    0.061544     0.053215
MAE     0.0588245    0.052035
9 Conclusion
The Stochastic Volatility model described in this paper is derived from the
empirical properties, or stylized facts, of volatility and is commonly used for pricing
complex derivatives. However, its accuracy for forecasting the MSCI Emerging Markets
Index volatility is questionable.
We have seen that it performs either worse or marginally better than a GARCH(1,1)
model using the evaluation criteria in Section 8. Its AIC and BIC scores are also higher
than those of the GARCH(1,1) model. However, since volatility is an unobserved variable
only implied from index returns, we draw the conclusion that the selection of a forecasting
technique shouldn't be a binary decision and should take into consideration how the
volatility will be used in making a financial decision.
In summary, we don't have a strong indicator that would enable us to recommend a
complex Stochastic Volatility model for forecasting the volatility of the MSCI Emerging
Markets Index.
References
[1] Torben G. Andersen, Tim Bollerslev, Francis X. Diebold and Paul Labys. Modeling
and Forecasting Realized Volatility. Econometrica, 71, 529-626.
[2] Rama Cont. Volatility Clustering in Financial Markets: Empirical Facts and
Agent-Based Models. Long Memory in Economics, A. Kirman and G. Teyssiere (eds.),
Springer (2005).
[3] Andrew Harvey, Esther Ruiz and Neil Shephard. Multivariate Stochastic Variance Models.
The Review of Economic Studies, 61, 247-264.
[4] Andrew C. Harvey and Neil Shephard. Estimation of an Asymmetric Stochastic
Volatility Model for Asset Returns. Journal of Business & Economic Statistics,
14(4):429-434, 1996.
[5] Koopman, S. J., Shephard, N. and Doornik, J. A. Statistical algorithms for models in
state space using SsfPack 2.2. Econometrics Journal, Royal Economic Society, vol.
2(1), pages 107-160.
[6] Hansen, P. and Lunde, A. (2004). A Forecast Comparison of Volatility Models: Does
Anything Beat a GARCH(1,1) Model? Journal of Applied Econometrics, 20, 873-889.
[7] Fernando Tusell. Kalman Filtering in R. Journal of Statistical Software, 2011, Vol.
39, Issue 2.
[8] Nash, J. C. Compact Numerical Methods for Computers: Linear Algebra and Function
Minimisation. Adam Hilger.
[9] Awartani, B. M. A. and Corradi, V. (2005). Predicting the volatility of the S&P-500
stock index via GARCH models: the role of asymmetries. International Journal of
Forecasting, 21, 167-183.