Faculteit der Economische Wetenschappen en Bedrijfskunde
MSc. Econometrics
Forecasting Inflation in Ghana: Bayesian and
Frequentist Estimation in Cointegration models.
Author
Victor Amankwah
Supervisor
Dr. L.F. Hoogerheide
June 2015
Forecasting Inflation in Ghana: Bayesian and
Frequentist Estimation in Cointegration models.
Victor Amankwah∗
August 1, 2015
Abstract
In this paper an in-depth cointegration analysis of inflation (CPI) in Ghana is conducted. The Johansen approach is used to test for the presence of cointegration. Furthermore, Bayesian estimation of the VECM is performed. Here I consider prior distributions for which the properties of interest of the posterior distribution can be computed analytically: the natural conjugate prior, the noninformative prior and the Minnesota prior. The goal of this thesis is to determine which estimation method is best suited for forecasting inflation in Ghana. To that end I compare the estimation of the VECM under the assumption of cointegration in the variables, the VECM without the assumption of cointegration, and the Bayesian estimation using the natural conjugate prior, the noninformative prior and the Minnesota prior. The Bayesian estimation using the natural conjugate prior proved to be the best estimation method when predicting inflation in Ghana.
Keywords: Cointegration, VECM, Bayesian estimation, Ghana, Inflation.
∗ Master student in Econometrics at the Vrije Universiteit Amsterdam. E-mail: victoramank@live.nl, tel: +31 615 692 793.
Acknowledgements
I would like to thank Dr. Lennart Hoogerheide for supervising and guiding me throughout this thesis. It is not often that a student approaches him to write a thesis covering a developing country. Though critical, he never broke my spirit and was always encouraging. I also want to thank him for the great conversations that always started with an econometric subject but could diverge into all kinds of areas, whether politics, philosophy or even soccer. Furthermore, I want to thank Dr. Francisco Blasques for lending me his book, Applied Econometric Time Series, for a tremendous amount of time. Lastly, I want to thank the Bank of Ghana for making the data I needed available. After searching for the data for a very long time, I was quite relieved when I found it on their website.
Contents
1 Introduction
2 Inflation in Ghana
3 Data Analysis
  3.1 The Data
  3.2 Time Series Properties
  3.3 Augmented Dickey-Fuller Tests
    3.3.1 Lag Selection
    3.3.2 Model selection
    3.3.3 Test Results
  3.4 KPSS Test
4 Testing for Cointegration
  4.1 Methodology
    4.1.1 Vector Error Correction Model (VECM)
    4.1.2 Johansen Test Statistics
    4.1.3 Calculating the characteristic roots
  4.2 Model Selection
  4.3 Empirical Results
  4.4 Granger Causality
5 Bayesian Estimation
  5.1 Basic Principle
  5.2 Likelihood Function
  5.3 Priors
    5.3.1 Natural Conjugate Priors and The Noninformative Priors
    5.3.2 The Minnesota Prior
6 Forecasting
  6.1 Forecasts
  6.2 Predictive Accuracy
    6.2.1 MAE
    6.2.2 RMSE
    6.2.3 Diebold-Mariano test
  6.3 20-step Ahead Forecasts
7 Conclusion
A APPENDIX: Data Statistics
B APPENDIX: Scatterplots of the Data
C APPENDIX: Results from Cointegration Estimation
D APPENDIX: KPSS Test Results
1 Introduction
All central banks share a common goal: to preserve economic and financial stability and growth. Many share my opinion that monetary policy is the strongest tool central banks have to achieve this goal. Through the regulation of the money supply - which is essentially what monetary policy does - interest rates are affected, which in turn can be an effective way to control inflation. For monetary policy purposes, money is only useful if it contains information that can be used in forecasting future changes in the price level. One would then expect the monetary policy authorities to design a policy targeting monetary aggregates. Ghana, however, has chosen an inflation targeting arrangement. The country's economy is currently facing some challenges, as the inflation rate has been increasing rapidly since the beginning of 2013 and has remained persistent throughout 2014. The Bank of Ghana has tried to fight this unpleasant development by recently raising its policy rate. For monetary policy to have any impact here, I believe it is of extreme importance to have a reliable model of the inflation dynamics. Once these dynamics are clear, researchers can produce forecasts of future inflation movements.
After going through the literature, I have to conclude that a significant amount of research has been done on inflation dynamics in Ghana. One work worth mentioning is that of Jihad Dagher and Arto Kovanen (2011), who examined the long-term demand for money in Ghana by utilizing an alternative to the Johansen procedure to estimate the long-run demand for money, a procedure elaborated in Pesaran et al. (2001). Arto Kovanen (2011) went further and examined whether money was a valuable predictor of inflation. What I did not encounter often in the literature is a comparison of different models for predicting inflation in Ghana, which I believe could be of value to monetary policy in Ghana. Erasmus and Abdalla (2005) conducted valuable research in which they compared cointegration and ARIMA models in predicting inflation in Ghana. Inspired by their work I decided to conduct a study of my own, benefiting from more recent datasets and using more advanced econometric methods.
The objective of this thesis is to compare different models for predicting inflation (in the form of the consumer price index (CPI)) in Ghana. Like Erasmus and Abdalla (2005) I will examine the performance of a cointegration model in forecasting inflation in Ghana. As an econometrician I will go deeper into the dynamics of cointegration than Erasmus and Abdalla (2005) did. Cointegration models have become quite popular in studies concerning macroeconomic data. They have proven reliable for capturing the long-run dynamics of variables, and it is this long-run information that could help during the forecasting process. But such a long-run relationship among the variables under consideration is not always found. Fortunately, Johansen (1988, 1991) outlined a procedure that can help us determine whether there is a long-run relationship among the variables or not. This procedure requires us to work with multivariate models (vector autoregressions). At the same time I believe that such large models can be much more beneficial than univariate models when the right variables are chosen. Thus, in order to obtain reliable forecasts for inflation, we need to use variables in our analysis that indeed have a significant influence on the inflation dynamics and/or vice versa. I will examine whether including this long-run relationship in the model is beneficial for predicting inflation or not.
The estimation of these large models involves a large number of parameters. As a consequence, forecasts can become imprecise using classical estimation methods, for example when classical methods are used to estimate cointegration models. For this reason I will also conduct a Bayesian analysis as a comparison. The Bayesian approach assumes that the parameters are unknown and are described by a probability distribution. The question is whether this assumption can lead to more precise forecasts. Additionally, this approach allows for the incorporation of prior information, in the form of a prior distribution, into the estimation process. I will examine the performance of the natural conjugate prior, the noninformative prior and the Minnesota prior in predicting inflation in Ghana.
The structure of this thesis is as follows. In the next section I discuss economic developments concerning inflation in Ghana. In the third section I describe and investigate the data used for this study. Section four consists of an in-depth cointegration analysis. In section five I elaborate on the Bayesian estimation procedure and the different prior distributions. In section six I compare the performance of these methods when forecasting inflation. Finally, conclusions of the research are drawn in section seven.
2 Inflation in Ghana
For the last three decades Ghanaian authorities have been steering Ghana’s economy in
what seems to be the right direction. Figure 1 shows this growth in terms of GDP. GDP has
risen from approximately 5 billion US dollars in 1980 to almost 48 billion dollars in 2013. In
the early 1980’s the Ghanaian authorities understood that further economic growth was only
possible if strong financial markets were developed in the country. They took important steps
to achieve this goal. The first step was to liberalize the financial sector and the exchange and credit controls. Government intervention was thereby lessened, which allowed prices and the exchange rate to be determined through market forces.
Figure 1: Ghana’s GDP (in billions of dollars) from 1960 till 2013.
In the early 1990’s the Bank of Ghana moved to an indirect control of liquidity by initiating
open market operations, which made it possible for the central bank to issue treasury and central
bank bills to regulate the money supply. Many reforms in the financial system followed, which
led - among others - to renewed banking laws and a shift to a floating exchange rate construction.
As a result of these changes the 2002 Bank of Ghana Act was brought to life. This led to the
strengthening of the independence of the central bank.
During this period of financial transformation the Bank of Ghana upheld a money targeting regime. Under this regime, the Bank of Ghana used its instruments to control monetary
aggregates, which were considered the main determinants of inflation in the long run. Thus
steering monetary aggregates would be equivalent to stabilizing the inflation rate around the
target value. However the ability of monetary aggregates to function effectively as intermediate
targets is strongly based on the stability of their relationship to the goal variable, which is the
inflation rate. There were and still are different opinions on whether a money targeting regime
is effective on the long run or not.
With the establishment of the monetary policy committee (MPC) in the year 2002, price stability became the primary monetary policy objective. This new course pushed growth and exchange rate stability to the background. The MPC has full responsibility for monetary policy in the country. In its quest to achieve stable prices, the Bank of Ghana analyzed a broader set of variables and developed the institutional structures needed to implement an inflation targeting policy. An inflation targeting policy is a monetary policy in which the central bank tries to steer the inflation rate towards a target rate set beforehand, by making use of
different monetary policy tools. In May 2007 the Bank of Ghana formally adopted an inflation targeting policy. However, this new policy did not prove to be as satisfactory as was hoped. When inflation targeting was initiated in 2007, inflation was at approximately 10%, but it increased to around 20% in early 2009 (see Figure 2).
Figure 2: Ghana’s annual Inflation (%) from 1990 till 2014.
As of now, authorities still have their hands full controlling the ongoing rise of inflation in Ghana. Figure 2 shows how the inflation rate has been increasing rapidly since the beginning of 2013 and has remained persistent ever since. The setback in the Ghanaian economy can - according to the World Bank - be blamed for the most part on the large fiscal deficit (expenditures exceeding revenues). Further causes are the continued decrease of international commodity prices of gold, cocoa and oil, which account for more than 75% of Ghana's exports. The Bank of Ghana has tried to fight this unpleasant development by recently raising its policy rate. To further tackle this hardship, the Ghanaian government recently reached an agreement with the IMF concerning a new three-year funding deal. Under the terms of the agreement, inflation should fall to approximately 12% by the end of 2015. Taking this and other factors into consideration makes Ghana's growth prospects positive in the long run.
3 Data Analysis
3.1 The Data
It is known to be difficult to acquire financial or macroeconomic data for developing countries. After an intensive search I was able to find the data needed for this thesis on the website of the Bank of Ghana. Not all desired data could be found, as monthly observations of the GDP of Ghana are nowhere to be found. Erasmus and Abdalla (2005) had the same problem, but could still carry out their research. The rest of the data were fortunately found.
The data used in this thesis contain monthly observations from July 1990 till November 2013 of the Consumer Price Index (CPI), the Exchange Rate (Ex) - the monthly averages of the inter-bank exchange rates GHC/USD - the Interest Rate (IR) and the money supply (M2). This makes for a total of 281 in-sample data points.1 Six data points are stored as out-of-sample data, which I will be forecasting later on. These are observations of the previously stated macroeconomic indicators from December 2013 till May 2014. I chose to start from 1990 because data from preceding years are not very credible: as explained in the previous section, financial markets in Ghana were heavily regulated by the authorities in those periods.
First, some adjustments had to be made to the data, because the series were not always consistent or logical. A clear case of inconsistency was found in the CPI data, which were not adjusted for the change of base year from July 2013 onwards. A correction was needed here; otherwise our results would be unreliable. Another case worth mentioning is that some values of the M2 series were incorrectly multiplied by 10,000. After I made these corrections, the data were ready to be used.
3.2 Time Series Properties
Figure 3: Variables in log levels (monthly, 1990-2013; panels: Log CPI, Log Ex, Log M2, Log IR).
1 See Appendix A.
It is observable from Figure 3 that Log CPI, Log Ex and Log M2 contain a trend. Log Ex seems to be the most volatile of these three series, reflecting the possible presence of a stochastic trend. The graph of Log M2, on the other hand, nearly resembles a straight line, reflecting the possible presence of a deterministic trend in this variable; however, this series may also contain both a deterministic and a stochastic trend. In any case, all three time series contain an upward trend. Thus one may expect a positive relationship between each pair of these three variables, if any such relationship exists. One gets the same impression from the scatter plots displayed in Appendix B. The plot of Log IR in Figure 3 seems to have one or more structural breaks. In contrast to the other three variables, it is not immediately clear whether Log IR has a trend or not. But it is clearly not moving around one mean, indicating that the series as a whole is not stationary. When comparing Log IR to the other three variables, there is no clear relationship to be seen, as portrayed in Appendix B. The presence of a trend clearly indicates that we are dealing with non-stationary time series. Thus regressing these variables on each other could exhibit a spurious relationship, unless a cointegrating relationship can be found.
Figure 4: Variables in first differences (monthly, 1990-2013; panels: DLog CPI, DLog Ex, DLog M2, DLog IR).
The first differences of our variables show the expected result, namely that these macroeconomic variables are stationary after differencing. The graph of DLog Ex in Figure 4 is the most convincing one: DLog Ex clearly seems to wander around zero, indicating a constant mean. Though less convincingly, the same can be said of DLog M2 and DLog IR. DLog CPI also seems to be stationary but has the most 'non-standard' distribution, with many observations equal to zero compared to the other differenced series. As mentioned before, I suspect the presence of one or more structural changes in Log IR. Testing this variable for the presence of a unit root using a Dickey-Fuller test can lead to unreliable conclusions. The reason is that the Dickey-Fuller test statistics are biased towards non-rejection of the null of a unit root when structural breaks are present. So the Dickey-Fuller test often does not reject the null of a unit root, while the series with structural breaks may be a stationary process between the moments when the structural breaks take place. However, one can observe from Figure 4 that there may be no structural changes in DLog IR, which is the variable I am going to compute the Augmented Dickey-Fuller test with.
3.3 Augmented Dickey-Fuller Tests
Although Figures 3 and 4 suggest that the series are I(1), we need a formal test to determine the order of integration of our time series. Hence I will perform the Augmented Dickey-Fuller (ADF) test. To get an idea of what this unit root test is about, consider the model yt = α1yt−1 + εt with t = 1, 2, . . . , n, which is the most basic model to consider for an ADF test. After subtracting yt−1 from both sides we end up with ∆yt = γyt−1 + εt with γ = α1 − 1. In order to test the null hypothesis that a unit root is present, we can test the hypothesis γ = 0, which is equivalent to α1 = 1.
I used the most basic model to explain the essence of unit-root tests. However, for the ADF
test I will be considering the following less basic regression equations:
∆yt = γyt−1 + Σ_{i=2}^{p} βi∆yt−i+1 + εt.   (3.3.1)

∆yt = α0 + γyt−1 + Σ_{i=2}^{p} βi∆yt−i+1 + εt.   (3.3.2)

∆yt = α0 + γyt−1 + α2t + Σ_{i=2}^{p} βi∆yt−i+1 + εt.   (3.3.3)
The first equation contains neither an intercept nor a linear time trend. The second equation contains an intercept, and the last equation - the unrestricted ADF test - contains both an intercept and a linear time trend. Using the wrong regression equation will lead to bias in the parameter estimates, which in turn decreases the reliability and the power of the unit root test. This is the main reason I chose to investigate all three possible models. Before doing so, we need to determine the proper number of lags for these regression models.
3.3.1 Lag Selection
To determine the most suitable lag length for the models we inspect the Akaike information criterion (AIC) and the Bayesian (Schwarz) information criterion (BIC), which are defined as follows (under the assumption that the error terms εt are normally distributed with constant variance):

AIC(p) = log(σ̂p²) + 2p/n

and

BIC(p) = log(σ̂p²) + p log(n)/n,

where σ̂p² is the maximum likelihood estimator of the variance of the error term in the model with p parameters. Once we have obtained these values, the next step is to choose the model with the smallest AIC or BIC. Compared to the AIC, the BIC is more strict due to its larger penalty term (the second term on the right-hand side) when n > e², which holds for n ≥ 8. This is of course the case here, because we have n = 281. As a consequence the BIC prefers a smaller number of lags and thus chooses a smaller model than the AIC. The reasoning behind this is that with the AIC we assume that none of the models we are comparing is actually the true model; instead the AIC chooses the model that comes closest to the true model (in terms of the Kullback-Leibler divergence measure). Due to the complexity of the 'real world', the AIC is flexible in adding more explanatory variables. The BIC, on the other hand, assumes that one of the models we are comparing is the true model, thus answering a different question than the AIC does. Because the true Data Generating Process (DGP) of my variables is unknown, it may seem appropriate to choose the AIC as the decision criterion. However, I also report the values of the BIC. Due to the limited number of observations, the maximum number of lags is chosen to be 10.
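To make the procedure concrete, the sketch below fits the ADF regression with an intercept (model (3.3.2)) by OLS for 0 to 10 lagged difference terms and evaluates AIC(p) and BIC(p) exactly as defined above. It is a minimal illustration rather than the code used for the thesis; the function name and the assumption that a series such as log_cpi is available as a numpy array are mine.

import numpy as np

def adf_information_criteria(y, max_lags=10):
    # Fit the ADF regression with an intercept (model (3.3.2)) by OLS for each
    # number of differenced lag terms 0..max_lags and return AIC(p) and BIC(p).
    dy = np.diff(y)                     # Delta y_t
    start = max_lags                    # common estimation sample for all candidate models
    results = []
    for lags in range(max_lags + 1):
        rows = []
        for t in range(start, len(dy)):
            row = [1.0, y[t]]           # intercept and the level term y_{t-1}
            row += [dy[t - i] for i in range(1, lags + 1)]   # lagged differences
            rows.append(row)
        X = np.asarray(rows)
        d = dy[start:]
        beta, *_ = np.linalg.lstsq(X, d, rcond=None)
        resid = d - X @ beta
        n = len(d)
        sigma2_hat = np.mean(resid ** 2)     # ML estimate of the error variance
        p = X.shape[1]                       # number of estimated parameters
        results.append((lags, np.log(sigma2_hat) + 2 * p / n,
                        np.log(sigma2_hat) + p * np.log(n) / n))
    return results                           # list of (lags, AIC, BIC)

# Example (log_cpi is a hypothetical numpy array of the 281 in-sample log CPI values):
# adf_information_criteria(log_cpi)

Applied to each of the four log series in turn, the lag length with the smallest criterion value would be selected, as reported in Tables 1-3.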
Table 1: The AIC and BIC values for LogCPI, LogEx, LogM2 and LogIR in the model with
neither an intercept nor linear trend.
LogCPI LogEx LogM2 LogIR
Lags AIC BIC AIC BIC AIC BIC AIC BIC
0 -8.1004 -8.0874 -4.9215 -4.9085 -6.4174 -6.4044 -6.2484 -6.2354
1 -8.6849 -8.6589 -5.2537 -5.2276 -6.4148 -6.3888 -6.2489 -6.2228
2 -8.6881 -8.6490 -5.2435 -5.2044 -6.4229 -6.3838 -6.3039 -6.2648
3 -8.6793 -8.6270 -5.2534 -5.2011 -6.4179 -6.3655 -6.3222 -6.2698
4 -8.6864 -8.6208 -5.2452 -5.1797 -6.4220 -6.3564 -6.3585 -6.2929
5 -8.6851 -8.6062 -5.2401 -5.1612 -6.4171 -6.3382 -6.3475 -6.2686
6 -8.6742 -8.5819 -5.2383 -5.1460 -6.4063 -6.3140 -6.3734 -6.2811
7 -8.6679 -8.5621 -5.2351 -5.1293 -6.3954 -6.2896 -6.3640 -6.2583
8 -8.6859 -8.5666 -5.2259 -5.1066 -6.3911 -6.2718 -6.3564 -6.2371
9 -8.6983 -8.5653 -5.2212 -5.0883 -6.3913 -6.2584 -6.4056 -6.2727
10 -8.7598 -8.6132 -5.2112 -5.0646 -6.3834 -6.2368 -6.4171 -6.2705
Table 1 shows the results for the most restricted ADF model (3.3.1). According to the AIC values, lags 10, 1, 2 and 10 should be chosen for the ADF tests for LogCPI, LogEx, LogM2 and LogIR respectively.2 Comparing these AIC results with their BIC counterparts (lags 1, 1, 0 and 4 for LogCPI, LogEx, LogM2 and LogIR respectively), I conclude that the AIC and BIC methods both choose a larger number of lags for the variable LogIR than for the other variables, and the same number of lags for LogEx. It is also interesting to note that the BIC always chooses a smaller or equal number of lags compared to the AIC, which I expected.

2 Lag 0 means the model with no differenced lag term, lag 1 has one differenced lag term, and so on.
Table 2: The AIC and BIC values for LogCPI, LogEx, LogM2 and LogIR in the model with only
an intercept.
LogCPI LogEx LogM2 LogIR
Lags AIC BIC AIC BIC AIC BIC AIC BIC
0 -8.3272 -8.3013 -4.9218 -4.8959 -6.4567 -6.4307 -6.2425 -6.2165
1 -8.7407 -8.7017 -5.2801 -5.2411 -6.4647 -6.4256 -6.2435 -6.2045
2 -8.7348 -8.6826 -5.2709 -5.2187 -6.4660 -6.4138 -6.3008 -6.2486
3 -8.7331 -8.6677 -5.2723 -5.2069 -6.4688 -6.4034 -6.3214 -6.2560
4 -8.7528 -8.6741 -5.2622 -5.1835 -6.4885 -6.4098 -6.3594 -6.2807
5 -8.7438 -8.6517 -5.2540 -5.1619 -6.4787 -6.3867 -6.3483 -6.2562
6 -8.7374 -8.6319 -5.2486 -5.1431 -6.4751 -6.3696 -6.3746 -6.2692
7 -8.7265 -8.6075 -5.2437 -5.1247 -6.4732 -6.3542 -6.3649 -6.2459
8 -8.7294 -8.5968 -5.2343 -5.1017 -6.4947 -6.3622 -6.3566 -6.2241
9 -8.7318 -8.5856 -5.2317 -5.0855 -6.5397 -6.3935 -6.4021 -6.2559
10 -8.7824 -8.6225 -5.2215 -5.0616 -6.5461 -6.3862 -6.4121 -6.2521
Having the AIC as our decision criterion, I conclude from Table 2 that lags 10, 1, 10 and 10 should be chosen for the ADF tests for LogCPI, LogEx, LogM2 and LogIR respectively. The results from the BIC, on the other hand, did not change after a constant term was added (3.3.2). Again, according to the BIC I should choose lags 1, 1, 0 and 4 for LogCPI, LogEx, LogM2 and LogIR respectively. It seems that after adding the constant we need more regressors to explain ∆yt than was first the case, according to the AIC.
Table 3: The AIC and BIC values for LogCPI, LogEx, LogM2 and LogIR in the model with an
intercept and a linear trend.
LogCPI LogEx LogM2 LogIR
Lags AIC BIC AIC BIC AIC BIC AIC BIC
0 -8.3284 -8.2894 -4.9228 -4.8839 -6.4909 -6.4520 -6.2489 -6.2099
1 -8.7336 -8.6815 -5.2733 -5.2213 -6.4957 -6.4436 -6.2509 -6.1989
2 -8.7278 -8.6626 -5.2640 -5.1987 -6.4952 -6.4300 -6.3122 -6.2469
3 -8.7259 -8.6474 -5.2660 -5.1875 -6.4898 -6.4113 -6.3376 -6.2591
4 -8.7457 -8.6539 -5.2561 -5.1642 -6.4993 -6.4075 -6.3694 -6.2776
5 -8.7361 -8.6309 -5.2473 -5.1421 -6.4897 -6.3845 -6.3589 -6.2537
6 -8.7516 -8.6329 -5.2445 -5.1259 -6.4825 -6.3638 -6.3793 -6.2607
7 -8.7226 -8.5904 -5.2300 -5.0978 -6.4782 -6.3460 -6.3704 -6.2381
8 -8.6865 -8.5407 -5.2498 -5.1040 -6.4937 -6.3479 -6.3600 -6.2142
9 -8.6159 -8.4564 -5.2100 -5.0505 -6.5477 -6.3882 -6.4063 -6.2468
10 -8.6951 -8.5218 -5.1887 -5.0154 -6.4876 -6.3143 -6.4099 -6.2366
Now when looking at model (3.3.3), the AIC chooses lag 6, 1, 9, and 10 for the ADF test
for LogCPI, LogEx, LogM2 and LogIR respectively. Again the decision according to the BIC
remains the same. This means that the optimal number of lags according to the BIC is invariant
to the model we use.
3.3.2 Model selection
Now that we have determined the proper amount of lags, we need to select the most likely
model out of the three models. For this purpose I will be comparing the AIC values once more.
So for each variable the model I am going to use for the ADF test is the model with the smallest
AIC value.
Table 4: The AIC value and its designated amount of lags in parentheses.
LogCPI LogEx LogM2 LogIR
No constant and no trend -8.7598 (10) -5.2537 (1) -6.4229 (2) -6.4171 (10)
With constant -8.7824 (10) -5.2801 (1) -6.5461 (10) -6.4121 (10)
With constant and trend -8.7516 (6) -5.2733 (1) -6.5477 (9) -6.4099 (10)
Table 4 is basically a summary of the results I obtained in the previous section. This
framework makes it easier to see which model I should select. For LogCPI model (3.3.2) with
10 lags (p = 11) should be chosen. Model (3.3.2) should also be chosen for LogEx but with 1 lag
(p = 2). Model (3.3.3) should be chosen for LogM2 with 9 lags (p = 10). Finally model (3.3.1)
should be used for the ADF test for LogIR with 10 lags (p = 11).
11
3 DATA ANALYSIS
3.3.3 Test Results
Table 5: ADF test results with amount of lags in parentheses.
LogCPI LogEx LogM2 LogIR
H0 = I(1) and H1 = I(0)
DF critical values (5% significance) -2.88 -2.88 -3.43 -1.95
t-statistic -2.0124 -2.2888 -1.3734 -0.0598
γ̂ -0.0013 (10) -0.0077 (1) -0.0418 (9) -0.0007 (10)
H0 = I(2) and H1 = I(1)
DF critical values (5% significance) -2.88 -2.88 -3.43 -1.95
t-statistic -3.0189 -30.9775 -10.9548 -2.0401
γ̂ -0.2508 (9) -1.5497 (0) -1.4259 (3) -0.2307 (9)
The results from the ADF tests are depicted in Table 5. The upper half of Table 5 shows the results from testing whether the time series LogCPI, LogEx, LogM2 and LogIR are stationary or not. To be precise, I tested the null of the presence of a unit root (H0 = I(1)) against the alternative of the absence of a unit root (H1 = I(0)) in our series. As mentioned before, I used the AIC criterion to choose which model to use for the ADF tests. It is interesting to note that, apart from the number of observations we have and the significance level we are considering, the critical values of the ADF test differ between the types of model we choose. As can be seen from the upper half of Table 5, the absolute values of all t-statistics are below their corresponding critical values. Therefore we cannot reject the null hypothesis of a unit root in any of the time series.
To determine the order of differencing needed for the series to be stationary, I also conducted a second test, depicted in the lower half of Table 5. Here I tested the null of the series being I(2) against the alternative of the series being I(1) (which we could not reject in the previous test). So I am basically testing whether the series ∆LogCPI, ∆LogEx, ∆LogM2 and ∆LogIR are stationary or not. I used the same models as in the first test,3 but different lags (shown in parentheses), chosen according to the smallest AIC values in these new models. As can be seen from the lower half of Table 5, the absolute values of all t-statistics are well above their corresponding critical values. Thus we must reject the null hypothesis of the time series being I(2). According to the ADF tests I therefore conclude that my time series LogCPI, LogEx, LogM2 and LogIR are I(1).
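For reference, the same kind of test can be reproduced with an off-the-shelf routine. The sketch below uses statsmodels' adfuller, where (in recent versions) regression='n', 'c' and 'ct' correspond to models (3.3.1), (3.3.2) and (3.3.3); the series name and the fixed lag choices are illustrative assumptions, not necessarily the thesis' actual implementation.

import numpy as np
from statsmodels.tsa.stattools import adfuller

def adf_i1_check(series, regression, lags):
    # Upper half of Table 5: H0 = I(1) on the levels.
    stat_lvl, p_lvl, *_ = adfuller(series, maxlag=lags, regression=regression, autolag=None)
    # Lower half of Table 5: H0 = I(2), i.e. a unit root in the first differences
    # (in the thesis the lag length for this second test is re-selected by AIC).
    stat_dif, p_dif, *_ = adfuller(np.diff(series), maxlag=lags, regression=regression, autolag=None)
    return {"levels": (stat_lvl, p_lvl), "first differences": (stat_dif, p_dif)}

# Example (log_cpi is a hypothetical numpy array of the in-sample observations);
# regression="c" with 10 lags corresponds to model (3.3.2), as selected in Table 4:
# adf_i1_check(log_cpi, regression="c", lags=10)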
3.4 KPSS Test
To be more certain about the previous conclusion that the variables are I(1), I conducted the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test. This test can be seen as the complement of the previous unit root test: the KPSS test examines the null hypothesis that the variables are stationary, instead of the null of non-stationarity used in the ADF test. The results of these tests are depicted in Appendix D. Looking at the case where we test the null of stationarity and include only an intercept, it is clear that the null is rejected for all our variables.4 Now focusing on the case where an intercept and a trend are included in the model, the null could not be rejected for LogIR when considering the 1% significance level.5 For all other significance levels the null hypothesis of stationarity is rejected for all the variables. According to this first test I am able to conclude that the variables are not stationary. The second test, also depicted in Appendix D (second half of the table), examines the null hypothesis that the variables are I(1). Here I expected the test statistics to be small, because then we would not be able to reject the null of the variables being integrated of order one, which would strengthen the credibility of the results obtained from the ADF tests. However, the results are not entirely satisfactory. For LogCPI the null is rejected in the case where an intercept is included, but not rejected when we also include a trend. This contradicts the previous result that the model including only an intercept is the most appropriate model for LogCPI, which was concluded in the previous section. The null is clearly not rejected for LogEx when the model with intercept and trend is considered. The null hypothesis is rejected for LogM2 for both model types. Finally, the null is never rejected for LogIR. Although the null hypothesis of the variables being I(1) was rejected for LogM2, I still argue that it makes sense to assume that the variables are I(1). This is because I was very precise in finding the most appropriate model for the variables in the previous section where I computed the ADF tests, and because at the 5% significance level we have a probability of 5% that a true null hypothesis is rejected, so that (especially since we perform many tests) we may expect to make some Type I errors.

3 According to the same procedure done for LogCPI, LogEx, LogM2 and LogIR, the optimal models from the AIC values remained the same.
4 The critical values for the 10%, 5% and 1% significance levels are 0.347, 0.463 and 0.739 respectively (only intercept included).
5 The critical values for the 10%, 5% and 1% significance levels are 0.119, 0.146 and 0.216 respectively (intercept and trend included).
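A minimal sketch of the corresponding KPSS computations, assuming statsmodels is available: regression='c' tests stationarity around a level and 'ct' around a trend, matching the two cases discussed above; applying the same call to the differenced series gives the second test (the null that the series is I(1)). The function and series names are illustrative.

import numpy as np
from statsmodels.tsa.stattools import kpss

def kpss_summary(series, name):
    # KPSS tests of the null of stationarity around a level ('c') or a trend ('ct'),
    # as reported in Appendix D.
    for regression, label in [("c", "intercept only"), ("ct", "intercept and trend")]:
        stat, pvalue, nlags, crit = kpss(series, regression=regression, nlags="auto")
        print(f"{name} ({label}): statistic = {stat:.3f}, critical values = {crit}")

# Testing the null that a series is I(1) amounts to applying the same call to its
# first differences, e.g. kpss_summary(np.diff(log_cpi), "DLogCPI").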
4 Testing for Cointegration
There are several ways to test our variables for the existence of a cointegrating relationship. In the literature two methods are frequently discussed, namely the Johansen methodology and the Engle-Granger methodology. The Engle-Granger method basically performs a regression of one series on the other series (and a constant) and then tests the residuals for stationarity. This methodology, however, has some defects that I find problematic. First of all, it requires us to put one variable on the left-hand side and use the other variables as regressors. Asymptotically it does not matter which variable we place on the left- or right-hand side, as long as all variables are I(1) and all variables occur in the cointegrating relationship. Unfortunately, in practice it is not common to have a sample size large enough for this asymptotic property to hold. This means that the role of a variable - whether it is the dependent variable or a regressor - influences the values of the cointegrating vector and therefore also the cointegration test. This is of course a problem when we have three or more variables, as is the case in my research. Another drawback of the Engle-Granger approach is that it relies on a two-step estimator. To clarify, consider the equation yt = β0 + β1zt + β2wt + εt, with yt, zt and wt all I(1) variables. According to the Engle-Granger approach the linear combination of integrated variables yt − β0 − β1zt − β2wt is stationary if εt is stationary. This would mean that there is a cointegrating relationship amongst the variables with cointegrating vector β = (1, −β0, −β1, −β2). To ascertain whether εt is indeed I(0), the ADF test is performed on the estimated residuals using the equation ∆êt = α1êt−1 + . . . . Thus, to obtain an estimate of the coefficient α1 we need to estimate a regression which uses residuals from another regression. The problem here is that errors made in the first regression are passed on to the next regression. Although it was at first my intention to also perform the Engle-Granger test, I decided not to pursue this because of the difficulties mentioned. Instead I perform the Johansen approach for cointegration testing, which is somewhat more complicated than the Engle-Granger method but much more effective here.
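Although I do not pursue the Engle-Granger approach, a short sketch of the two-step procedure just described may clarify it. The variable names are illustrative, and the proper critical values for an ADF test on estimated residuals are the Engle-Granger/MacKinnon ones rather than the standard Dickey-Fuller values.

import numpy as np
from statsmodels.tsa.stattools import adfuller

def engle_granger_two_step(y, z, w):
    # Step 1: regress y_t on a constant, z_t and w_t by OLS.
    X = np.column_stack([np.ones_like(z), z, w])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    # Step 2: ADF test on the residuals (no constant, since OLS residuals have mean zero).
    # Caveat: the usual Dickey-Fuller critical values do not apply to estimated residuals.
    adf_stat, pvalue, *_ = adfuller(resid, regression="n")
    cointegrating_vector = np.concatenate([[1.0], -b])   # (1, -b0, -b1, -b2)
    return cointegrating_vector, adf_stat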
4.1 Methodology
4.1.1 Vector Error Correction Model (VECM)
To illustrate this approach let us consider the VAR(p) model:
yt = A0 + A1yt−1 + A2yt−2 + · · · + Apyt−p + εt t = 1, 2, . . . , T, (4.1.1)
with yt a (n × 1) vector containing n I(1) variables observed at time t, A0 a (n × 1) vector of
intercept terms, Ai a (n × n) matrix of coefficients with i = 1, 2, . . . , p and εt a (n × 1) vector of
error terms at time t. The error terms are independently, identically and normally distributed
with variance-covariance matrix Σ, εt ∼ IID(0, Σ).
Equation (4.1.1) can be re-written by adding and subtracting Apyt−p+1(= Apyt−(p−1)) from
the right-hand side to obtain:
yt = A0 + A1yt−1 + A2yt−2 + · · · + Ap−2yt−p+2 + (Ap−1 + Ap)yt−p+1 − Ap∆yt−p+1 + εt
and after adding and subtracting (Ap−1 + Ap)yt−p+2 we obtain
yt = A0 + A1yt−1 + A2yt−2 + · · · − (Ap−1 + Ap)∆yt−p+2 − Ap∆yt−p+1 + εt.
This procedure can be repeated to obtain the Vector Error-Correction Model (VECM):
∆yt = A0 + πyt−1 + Σ_{i=1}^{p−1} πi∆yt−i + εt,   (4.1.2)

with π = −(I − Σ_{i=1}^{p} Ai) and πi = −Σ_{j=i+1}^{p} Aj.
The left hand side of (4.1.2) consists of stationary variables. On the right hand side I have
a constant, the πyt−1 term and the rest are all stationary variables (including the error term).
Thus the term πyt−1 must also be stationary for the variables to be cointegrated, implying a
long-run equilibrium relationship amongst the variables. Because yt−1 is non-stationary (I(1)),
the stationarity of the term πyt−1 depends solely on the matrix of coefficients π. This matrix
contains cointegrating vectors as columns if indeed a long-run equilibrium exists amongst the
time series.
The number of cointegrating vectors is equal to the rank of matrix π. This rank can take
3 different forms. In the first extreme case where the rank of the coefficient matrix is equal
to zero - meaning that there are no linearly independent columns in π - we have that each
element of the coefficient matrix must be equal to zero. Analogous to the univariate situation
(γ = 0 in previous section) this means that all yit sequences are non-stationary and that no linear
combination of the yit processes can be found that is stationary. This results in the variables not
being cointegrated. In the second extreme case where the rank is equal to n, we have that the
matrix of coefficients is of full rank. This is only the case when all yit sequences are stationary,
which is obviously not the case. We are interested in the case where the matrix has reduced rank
1 ≤ r ≤ n − 1. In this case there exist r cointegrating relations. The idea is that we can write
our matrix of coefficients as two (n × r) matrices when dealing with reduced rank, as follows:
π = αβ′.   (4.1.3)

Here β is the (n × r) matrix whose columns are the cointegrating vectors (so that β′ is (r × n)) and α is the (n × r) matrix of weights. The latter can be interpreted as the matrix of speed-of-adjustment (to the long-run relations) parameters. The columns of β form a basis for the space of cointegrating vectors. Both matrices have rank r. If we substitute (4.1.3) into (4.1.2) we get:
∆yt = A0 + αβ′yt−1 + Σ_{i=1}^{p−1} πi∆yt−i + εt.   (4.1.4)

The VECM in (4.1.4) implies that each element of the r-dimensional vector β′yt−1 is stationary, meaning that there exist r - not necessarily unique6 - cointegrating relations.
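As a small numerical illustration of the reduced-rank structure in (4.1.3)-(4.1.4), the sketch below builds a rank-one π from an adjustment vector α and a cointegrating vector β (the numbers are made up) and shows that only one stationary linear combination of the levels, β′yt−1, enters each equation.

import numpy as np

# Illustrative rank-one decomposition pi = alpha @ beta' for n = 4 variables and r = 1.
# The numbers are arbitrary; they only demonstrate the reduced-rank structure in (4.1.3).
alpha = np.array([[-0.05], [0.02], [0.01], [-0.03]])   # (n x r) speed-of-adjustment weights
beta = np.array([[1.0], [-0.4], [-0.6], [0.2]])        # (n x r) cointegrating vector
pi = alpha @ beta.T                                     # (n x n) matrix multiplying y_{t-1}

print(np.linalg.matrix_rank(pi))    # 1: a single cointegrating relation
# The error-correction term in equation i is alpha_i * (beta' y_{t-1}), i.e. a scalar
# multiple of the same stationary linear combination of the levels.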
4.1.2 Johansen Test Statistics
After estimating π we can obtain the (estimated) characteristic roots of this matrix. We need these characteristic roots because, by testing their significance, we can determine the number of cointegrating vectors. To be more specific, we want to find out how many characteristic roots of π differ from zero, or equivalently we could try to determine the number of roots of π + I = Σ_{i=1}^{p} Ai that insignificantly differ from unity. This number is equal to the rank of the matrix π. Johansen proposed two test statistics for testing the number of characteristic roots that are insignificantly different from unity:

λtrace(r) = −T Σ_{i=r+1}^{n} ln(1 − λ̂i),   (4.1.5)

λmax(r, r + 1) = −T ln(1 − λ̂r+1),   (4.1.6)
6 If β′yt ∼ I(0), then for any scalar c the linear combination cβ′yt ∼ I(0).
where λ̂i, i = 1, 2, . . . , n, are the estimated values of the characteristic roots (eigenvalues of the matrix discussed below) obtained from the estimated π, and T is the number of observations used. λtrace tests the null hypothesis that the number of distinct cointegrating vectors is less than or equal to r against a general alternative. λmax, on the other hand, tests the null hypothesis that the number of cointegrating vectors is r against the alternative of r + 1 cointegrating vectors. Unlike conventional tests, the distributions of these statistics are not known 'standard' distributions. The critical values of both statistics are obtained using a Monte Carlo approach.
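Both statistics and their tabulated critical values are available in statsmodels; a minimal sketch is below. Here data would be the (T × 4) array of the four log series, det_order=0 includes a constant, and k_ar_diff = p − 1 lagged differences with p = 18 as selected later in this chapter. These option choices are my own illustration of the interface, not necessarily the exact settings behind Table 7.

import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def johansen_summary(data, p=18):
    # data: (T x 4) array with columns LogCPI, LogEx, LogM2, LogIR (hypothetical name)
    res = coint_johansen(data, det_order=0, k_ar_diff=p - 1)   # constant, p-1 lagged differences
    for r in range(data.shape[1]):
        print(f"r <= {r}: trace   = {res.lr1[r]:8.3f}  (90/95/99% cv: {res.cvt[r]})")
        print(f"r  = {r}: max-eig = {res.lr2[r]:8.3f}  (90/95/99% cv: {res.cvm[r]})")
    return res.eig, res.evec    # estimated characteristic roots and cointegrating vectors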
4.1.3 Calculating the characteristic roots
If we assume that the most appropriate lag length p is known, we can calculate the characteristic roots of π using the Frisch-Waugh (partial regression) method. The idea is to run a regression of e1t on e2t according to:

e1t = πe2t + ξt.

The two residual series e1t and e2t are obtained by estimating the VAR in first differences and by regressing yt−1 on its lagged differenced values, respectively. So the following two regressions are performed:

∆yt = B0 + B1∆yt−1 + B2∆yt−2 + · · · + Bp−1∆yt−p+1 + e1t,
yt−1 = C0 + C1∆yt−1 + C2∆yt−2 + · · · + Cp−1∆yt−p+1 + e2t.
The next step is to compute the squares of the canonical correlations between e1t and e2t. The canonical correlations in our case are the n values λi. They are the solutions to the equation

|λiS22 − S12S11⁻¹S12′| = 0,

where

Sii = T⁻¹ Σ_{t=1}^{T} eit eit′   and   S12 = T⁻¹ Σ_{t=1}^{T} e2t e1t′.

The last step is to find the n column vectors vi that are nontrivial solutions of

λiS22vi = S12S11⁻¹S12′vi.

These columns are the maximum likelihood estimates of the cointegrating vectors. Note that the λi are the eigenvalues of S22⁻¹S12S11⁻¹S12′ and that the vi are the corresponding eigenvectors.
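The steps above can also be coded directly. The following bare-bones sketch runs the two auxiliary regressions, forms S11, S22 and S12 and solves the eigenvalue problem; it omits the deterministic-term options and small-sample refinements of a full Johansen implementation, so it is only an illustration under those simplifying assumptions.

import numpy as np

def johansen_eigenvalues(Y, p):
    # Y: (N x n) array of the levels; p: lag length of the levels VAR.
    dY = np.diff(Y, axis=0)
    T, n = dY.shape
    rows = []
    for t in range(p - 1, T):
        # regressors: a constant and p-1 lagged differences
        rows.append(np.concatenate([[1.0]] + [dY[t - i] for i in range(1, p)]))
    Z = np.array(rows)
    dy = dY[p - 1:]                  # left-hand side Delta y_t
    ylag = Y[p - 1:-1]               # y_{t-1} aligned with Delta y_t
    proj = Z @ np.linalg.pinv(Z)     # projection onto the space spanned by the regressors
    e1 = dy - proj @ dy              # residuals of the VAR in differences
    e2 = ylag - proj @ ylag          # residuals of y_{t-1} on the lagged differences
    Teff = e1.shape[0]
    S11, S22 = e1.T @ e1 / Teff, e2.T @ e2 / Teff
    S12 = e2.T @ e1 / Teff           # cross-moment of e2t and e1t, as defined above
    M = np.linalg.solve(S22, S12 @ np.linalg.solve(S11, S12.T))
    # squared canonical correlations; imaginary parts are numerically zero
    return np.sort(np.linalg.eigvals(M).real)[::-1]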
4.2 Model Selection
Now that the method has been explained, let us start with the first step, which is to determine the number of lags needed in equation (4.1.1). To do so I estimate the restricted and unrestricted VARs (in which the error terms ε1t and ε2t are assumed to be identically and independently distributed with a multivariate normal distribution):

yt = A0 + Σ_{i=1}^{s} Ai yt−i + ε1t,   s = 1, 2, . . .   (4.2.1)

yt = A0 + Σ_{i=1}^{l} Ai yt−i + ε2t,   l = 2, 3, . . .   (4.2.2)

using undifferenced data. According to the literature this is the most common procedure. Here s and l are the numbers of lags in the restricted and unrestricted model respectively, with l > s. If we define the variance-covariance matrix of the residuals from model (4.2.1) as Σr and that of model (4.2.2) as Σu, we can compute the likelihood ratio test statistic recommended by Sims (1980):
(T − c)(log |Σr| − log |Σu|) (4.2.3)
where T is the number of observations (i.e., the number of periods), c is the number of parameters per equation in the unrestricted model and log |Σk| is the natural logarithm of the determinant of Σk, which I estimate by means of OLS. This alternative LR statistic can be viewed as a small-sample correction of the LR test (without the correction we would have the factor T instead of (T − c)). The distribution of this statistic is approximately a chi-squared distribution with degrees of freedom equal to the number of coefficient restrictions. Large values of this statistic indicate that the null hypothesis - that the restricted model should be used - should be rejected. The critical value is taken from the chi-squared distribution (with the degrees-of-freedom parameter equal to the number of coefficients that is set to zero in the restricted model, as compared with the unrestricted model).
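A minimal implementation of this corrected LR statistic might look as follows: each VAR is fitted equation by equation with OLS on a common sample, and the statistic (4.2.3) is compared with a chi-squared critical value. The function and variable names are my own illustration.

import numpy as np
from scipy.stats import chi2

def var_residual_cov(Y, lags, drop):
    # Fit a VAR(lags) with an intercept equation by equation via OLS, after dropping the
    # first `drop` observations so that competing models share the same sample. Returns the
    # residual covariance matrix, the sample size used and the parameters per equation.
    rows, lhs = [], []
    for t in range(drop, Y.shape[0]):
        rows.append(np.concatenate([[1.0]] + [Y[t - i] for i in range(1, lags + 1)]))
        lhs.append(Y[t])
    X, Yt = np.array(rows), np.array(lhs)
    B, *_ = np.linalg.lstsq(X, Yt, rcond=None)
    resid = Yt - X @ B
    return resid.T @ resid / resid.shape[0], resid.shape[0], X.shape[1]

def sims_lr_test(Y, s, l, alpha=0.05):
    # LR test (4.2.3) of a restricted VAR(s) against an unrestricted VAR(l), l > s.
    Sigma_r, T_used, _ = var_residual_cov(Y, s, drop=l)
    Sigma_u, _, c = var_residual_cov(Y, l, drop=l)
    stat = (T_used - c) * (np.log(np.linalg.det(Sigma_r)) - np.log(np.linalg.det(Sigma_u)))
    df = Y.shape[1] ** 2 * (l - s)      # number of coefficients set to zero
    return stat, chi2.ppf(1 - alpha, df)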
Table 6: Likelihood Ratio test results.
Hypothesis LR-Statistic (Degrees of Freedom) Critical value (5% significance) Decision
2 lags against 1 lag 206.1896 (16) 26.296 Reject 1 lag
3 lags against 2 lags 105.5431 (16) 26.296 Reject 2 lags
4 lags against 3 lags 60.6647 (16) 26.296 Reject 3 lags
5 lags against 4 lags 46.3607 (16) 26.296 Reject 4 lags
6 lags against 5 lags 11.1580 (16) 26.296 Do not reject 5 lags
7 lags against 5 lags 55.0583 (32) 46.194 Reject 5 lags
8 lags against 7 lags 33.2695 (16) 26.296 Reject 7 lags
9 lags against 8 lags 53.7635 (16) 26.296 Reject 8 lags
10 lags against 9 lags 34.6816 (16) 26.296 Reject 9 lags
11 lags against 10 lags 46.6505 (16) 26.296 Reject 10 lags
12 lags against 11 lags 8.7503 (16) 26.296 Do not reject 11 lags
13 lags against 11 lags 78.5158 (32) 46.194 Reject 11 lags
14 lags against 13 lags 25.3731 (16) 26.296 Do not reject 13 lags
15 lags against 13 lags 80.3953 (32) 46.194 Reject 13 lags
16 lags against 15 lags 28.7579 (16) 26.296 Reject 15 lags
17 lags against 16 lags 16.6836 (16) 26.296 Do not reject 16 lags
18 lags against 16 lags 51.1282 (32) 46.194 Reject 16 lags
19 lags against 18 lags 10.7015 (16) 26.296 Do not reject 18 lags
20 lags against 18 lags 26.7273 (32) 46.194 Do not reject 18 lags
The results from the likelihood ratio tests are displayed in table 6. The table shows that
the LR test could not reject the null hypothesis of using a restricted model with 5 lags against
the alternative of using an unrestricted model with 6 lags. Now testing for the null of 5 lags
compared to the alternative of 7 lags, the null of 5 lags is rejected. However the null of 7 lags is
rejected right away. The same goes for the null of 8, 9 and 10 lags. Testing the null of 11 lags
against the alternative of 12 lags, I had to conclude not to reject the null of 11 lags. The null
of 11 lags is however rejected when tested against the alternative of 13 lags. The null of 13 lags
is not rejected when tested against the alternative of 14 lags. The null of 13 lags is eventually
rejected when tested against the alternative of 15 lags. The null of 15 lags is rejected right away
when tested against the alternative of 16 lags. The null of 16 lags is not rejected against the
alternative of 17 lags but it is rejected when tested against the alternative of 18 lags. The null
of 18 lags is never rejected against any alternative with more than 18 lags. So according to this alternative LR test I conclude that 18 is the most appropriate number of lags to use in our analysis for this particular dataset.
4.3 Empirical Results
Table 7: The λtrace and λmax tests.
Null Hypothesis   Alternative Hypothesis   Statistic   1% critical value   5% critical value
λtrace tests
r = 0    r > 0    60.043    54.46    47.21
r ≤ 1    r > 1    19.266    35.65    29.68
r ≤ 2    r > 2     6.837    20.04    15.41
r ≤ 3    r > 3     5.436     6.65     3.76
λmax tests
r = 0    r = 1    40.777    32.24    27.07
r = 1    r = 2    12.430    25.52    20.97
r = 2    r = 3     1.400    18.63    14.07
r = 3    r = 4     5.436     6.65     3.76
After estimating the characteristic roots7 according to the procedure described previously, I computed the Johansen test statistics. Table 7 shows the results of the λtrace and λmax tests. Let us first focus on the test results at the 5% critical value. Since 60.043 is larger than the 5% critical value of the λtrace test, the test rejects the null hypothesis of no cointegration and therefore accepts the alternative of one or more cointegrating vectors. Looking at the λmax test we reach a similar conclusion, as 40.777 is larger than the 5% critical value of this test. This leads to the rejection of the null of no cointegration and the acceptance of the alternative of having one cointegrating vector. So both tests reject the notion of no cointegration in our variables at the 5% critical value.
Now looking at λtrace(1) - which tests the null of r ≤ 1 against the alternative of two, three or four cointegrating vectors - we cannot reject the null hypothesis at the 5% critical value, because the statistic 19.266 does not exceed the 5% critical value. So according to the λtrace(0) and λtrace(1) tests we have one cointegrating vector. The λmax(1) test cannot reject the null hypothesis of having exactly one cointegrating vector, because 12.430 does not exceed the 5% critical value. Combining the results from both tests up to this point, we can conclude that we have one cointegrating vector.

7 See Appendix C for the values.
Considering the λtrace(2) test - with the null hypothesis of two or fewer cointegrating vectors - we again cannot reject the null, because the statistic of 6.837 is smaller than the 5% critical value. So the notion of having more than two cointegrating vectors is not accepted. The λmax(2) statistic of 1.400 clearly does not exceed the 5% critical value either, so according to this test we cannot reject the null of having exactly two cointegrating vectors. Combining the results from both tests with the previous ones, we can confirm that there is one cointegrating vector.
The next tests - the λtrace(3) and λmax(3) tests - should logically also not reject the null of having three or fewer cointegrating vectors and exactly three cointegrating vectors respectively, against their alternatives. Combining this result with the previous ones would lead me to conclude that we have exactly one cointegrating vector. However, for both cases I get a statistic of 5.436, which clearly exceeds the corresponding 5% critical value. So according to this last test the null must be rejected, and thus the alternative of having four cointegrating vectors is accepted. This is quite an unexpected result because, as previously mentioned, the alternatives of having more than one and more than two cointegrating vectors were not accepted. Thus, considering the 5% critical values, the Johansen tests could not give a plausible result for the number of cointegrating vectors in our model.
I therefore considered the smaller 1% significance level. Its critical values are also displayed in Table 7. Considering this significance level, the null of no cointegration is rejected: both the λtrace(0) and λmax(0) statistics exceed the corresponding critical values. None of the subsequent tests could reject their null hypotheses. This means that the λmax test could not reject the notions of there being exactly one, two or three cointegrating vectors. Considering only this test would not be very insightful, but combining these results with the results from the λtrace test is. The λtrace test could not reject the null of having one or fewer, two or fewer, or three or fewer cointegrating vectors at the 1% significance level. So, combining all these results as before, I conclude that there is one cointegrating vector. Selecting r = 1, the estimated normalized8 cointegrating vector and speed-of-adjustment parameters are displayed in Appendix C.
8 Normalized with respect to the first element of β.
4.4 Granger Causality
The mechanism that binds cointegrated series together is called error correction, which can be considered a specific form of Granger causality for (some of) the involved variables (Granger 1988). When considering two series, Granger causality can be seen as the phenomenon that turning points in one series precede turning points in the other series. If taking into account the past of a series wt does not improve the forecasting performance for series zt (given the past of zt itself), then wt does not Granger cause zt. So basically Granger causality considers the effects of past values of wt on the current value of zt (in addition to the effects of the past values of zt on zt). It measures whether current and past values of wt help to forecast future values of zt. To bring it back to our case, I wish to find out whether the variables LogEx, LogM2 and LogIR Granger cause LogCPI. But first, let us rewrite the VECM in equation (4.1.2) with p = 18 as:

∆yt = a0 + A(1)yt−1 + A(2)∆yt−1 + A(3)∆yt−2 + · · · + A(18)∆yt−17 + εt,

where ∆yt = (∆y1t, ∆y2t, ∆y3t, ∆y4t)′, a0 = (a10, a20, a30, a40)′, εt = (ε1,t, ε2,t, ε3,t, ε4,t)′ and A(k) is the (4 × 4) matrix with elements aij(k), i, j = 1, . . . , 4 (so that A(1) contains the coefficients on the levels yt−1 and A(2), . . . , A(18) the coefficients on the lagged differences),
with y1t, y2t, y3t and y4t equal to LogCPI, LogEx, LogM2 and LogIR at time t respectively. In this form it is clear that at each time t we are dealing with four equations. The null hypothesis that lags of LogEx, LogM2 or LogIR do not Granger cause LogCPI is tested using the likelihood ratio test stated in (4.2.3). We need to estimate the ∆y1t (LogCPI) equation - the first of the four equations above - using all lagged values to calculate Σu. We then estimate the same equation, but now excluding the lagged values of the variable under consideration (y2t, y3t or y4t), to calculate Σr. Again this statistic has (asymptotically, under the null hypothesis of no Granger causality) a chi-squared distribution with degrees of freedom equal to p = 18 (the number of restricted coefficients in the equation of ∆y1t).
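A sketch of this exclusion test for the ∆y1t (LogCPI) equation is given below: the unrestricted regressor set contains an intercept, the four levels yj,t−1 and 17 lagged differences of all four variables, and the restricted set drops the 18 terms involving the excluded variable. Implementation details (names, sample alignment) are my own illustration.

import numpy as np
from scipy.stats import chi2

def granger_lr_test(Y, excluded, p=18, target=0):
    # LR test that all lags of variable `excluded` can be dropped from the Delta y_target
    # equation of the VECM above (columns of Y: LogCPI, LogEx, LogM2, LogIR).
    dY = np.diff(Y, axis=0)
    T, n = dY.shape
    obs = range(p - 1, T)
    def design(keep):
        rows = []
        for t in obs:
            row = [1.0]
            row += [Y[t][j] for j in range(n) if j in keep]              # levels y_{j,t-1}
            for i in range(1, p):
                row += [dY[t - i][j] for j in range(n) if j in keep]     # lagged differences
            rows.append(row)
        return np.array(rows)
    d = dY[p - 1:, target]
    def log_resid_var(X):
        b, *_ = np.linalg.lstsq(X, d, rcond=None)
        resid = d - X @ b
        return np.log(np.mean(resid ** 2)), X.shape[1]
    keep_all = set(range(n))
    log_var_u, c = log_resid_var(design(keep_all))
    log_var_r, _ = log_resid_var(design(keep_all - {excluded}))
    stat = (len(d) - c) * (log_var_r - log_var_u)
    return stat, chi2.ppf(0.95, p)       # p = 18 restrictions, 5% critical value 28.869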
Table 8: Likelihood Ratio test results for Granger Causality.
Null Hypothesis LR-Statistic Decision
LogEx does not Granger cause LogCPI 1291.65 Reject Null
LogM2 does not Granger cause LogCPI 1669.82 Reject Null
LogIR does not Granger cause LogCPI 1639.57 Reject Null
The results from the Granger causality tests are displayed in Table 8. With 18 degrees of freedom and a 5% significance level, the chi-squared critical value is 28.869. The obtained LR-statistics are clearly much larger than this critical value, leading me to reject all three null hypotheses. Thus, according to the tests, the notion that LogEx, LogM2 and LogIR do not Granger cause LogCPI is rejected. Although the LR-statistics are very large, the results are of course not surprising: these variables are all economically related to one another. The results from the Granger causality tests confirm that using these variables in my analysis, and later on in the forecasting exercise, is a valid decision.
5 Bayesian Estimation
Up till now the estimation of the VECM has been done using the method of maximum likelihood. This classical method assumes the parameters to be fixed but unknown. This section focuses on the Bayesian estimation method. In the Bayesian approach the data generating process of the population is treated as uncertain, and the parameters are described by a prior probability distribution instead of fixed values. This distribution is usually quite broad, to reflect the fact that the true values of the parameters are unknown. To better understand this approach, let us consider the VAR(p) model in equation (4.1.1). Classical estimation of this model can lead to a problem called overfitting. Overfitting may especially occur when the number of parameters (n + n²p) is large compared to the number of observations Tn. If this is the case, the estimates are influenced by noise rather than by signal. This may especially occur when the estimation method is designed to fit the data as closely as possible. To reduce the dimension of the parameter space, we could impose restrictions on this space. The problem is then to find restrictions that are as credible as possible. The Bayesian approach avoids overfitting not necessarily by imposing zero restrictions on the parameters, but by letting the parameters vary within a certain range. As such, the uncertainty about the exact values of the model's parameters can be expressed as a probability distribution for the parameter vector. So in a sense this distribution represents the degree of uncertainty about the parameters, and it is amended by the information contained in the data if the prior information differs from the information obtained from the data. As long as the prior information is not too obscure or noninformative, it should be amended only by the signal and not by the noise contained in the sample. One can imagine that the choice of the prior distribution is a crucial step in the computation of the Bayes estimates: this step summarizes the uncertainty we have about the model parameters. In this light I find it important to consider more than one prior distribution. Before doing so, let us take a better look at what the Bayesian treatment of a VAR entails.
5.1 Basic Principle
Let us consider the VAR(p) model in (4.1.1), assuming that the variables are stationary. So in practice we are considering the model

∆yt = A0 + A1∆yt−1 + A2∆yt−2 + · · · + Ap∆yt−p + εt,   t = 2, . . . , T,   (5.1.1)

but for notational convenience we will work with yt - instead of ∆yt - and assume it is an (n × 1) vector in the rest of this section. Note that I do not consider the Bayesian estimation of a VECM with a cointegration relation, even though there seems to be a cointegration relation in the data. The reason for this is that I consider only models and priors that lead to posteriors which can be evaluated analytically; I leave the Bayesian estimation of a VECM with a cointegration relation for these data as a topic for further research. Now define Xt = (1, y′t−1, y′t−2, . . . , y′t−p), which is a (1 × k) vector with k = 1 + np, and
X = (X1′, X2′, . . . , XT′)′,
which is a (T × k) matrix. Let us further define α = vec(A) which is a (nk × 1) vector with
A = (A0, A1, . . . , Ap)′ a (k × n) matrix of stacked VAR coefficients and intercepts. The full VAR
can then be written as:
\[
y = (I_n \otimes X)\alpha + \varepsilon, \tag{5.1.2}
\]
where ε ∼ N(0, Σ⊗IT ) and y is the (nT ×1) vector which stacks all T observations of the first
time series, then all T observations of the second series and so on.⁹ The unknown parameters
of the model are α and Σ. Before the data are observed the parameters are described by the
joint prior distribution p(α, Σ). Once the data are observed we use Bayes' rule to obtain the
joint posterior distribution of the parameters conditional on the data, which is defined as:
\[
p(\alpha, \Sigma \mid y) = \frac{p(\alpha, \Sigma, y)}{p(y)} = \frac{p(\alpha, \Sigma)\, p(y \mid \alpha, \Sigma)}{p(y)} = \frac{p(\alpha, \Sigma)\, L(y \mid \alpha, \Sigma)}{p(y)} \propto p(\alpha, \Sigma)\, L(y \mid \alpha, \Sigma), \tag{5.1.3}
\]
where the last relation¹⁰ follows from the fact that p(y) is a constant (in the sense that it
does not depend on α or Σ) and therefore has no significance during the estimation. Given the
joint posterior distribution p(α, Σ|y), the marginal posterior distributions p(α|y) and p(Σ|y) can
be obtained by integrating Σ and α, respectively, out of p(α, Σ|y). The analytical integration of
p(α, Σ|y) can be very cumbersome or even impossible to perform. To circumvent this difficulty,
numerical integration methods based on Monte Carlo simulation can be used. However, analytical
solutions are available for some prior specifications. I am going to focus the analysis on these
analytical approaches to computing the marginal posterior distributions. That is, a key difficulty
in implementing Bayesian estimation of model (5.1.2) is the choice of the prior distribution, and
I am going to discuss several alternatives for which analytical integration is possible. Once we
have p(α|y) and p(Σ|y), the final step is to analyze the location (mean) and dispersion
(variance) of these distributions, which yield point estimates of the parameters and posterior
standard deviations (the Bayesian counterpart of the classical standard errors).
⁹ If we define YT to be a (T × n) matrix which stacks the T observations of each series in columns next to one another, then y = vec(YT).
¹⁰ ∝ means 'proportional to', which means that the ratio of the left-hand side and the right-hand side is a constant (that does not depend on α or Σ).
5.2 Likelihood Function
Apart from the prior and the posterior distributions, the likelihood function is also of immense
importance, as it is used to transform the prior into the posterior. Understanding what its
functional form looks like will make it easier to understand the priors we will be considering for
the estimation and forecasting process later on.
Given the parameters α and Σ the likelihood is defined as:
\[
p(y \mid \alpha, \Sigma) \propto |\Sigma \otimes I_T|^{-1/2} \exp\Big\{ -\tfrac{1}{2}\,\big(y - (I_n \otimes X)\alpha\big)'\,\big(\Sigma^{-1} \otimes I_T\big)\,\big(y - (I_n \otimes X)\alpha\big) \Big\}. \tag{5.2.1}
\]
If we view the likelihood as a function of the parameters L(α, Σ), then we can derive a useful
decomposition breaking the likelihood into two parts:
\begin{align*}
&\big(y - (I_n \otimes X)\alpha\big)'\big(\Sigma^{-1} \otimes I_T\big)\big(y - (I_n \otimes X)\alpha\big) \\
&\quad = \big[(\Sigma^{-1/2} \otimes I_T)(y - (I_n \otimes X)\alpha)\big]'\big[(\Sigma^{-1/2} \otimes I_T)(y - (I_n \otimes X)\alpha)\big] \\
&\quad = \big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha\big]'\big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha\big],
\end{align*}
where in the last equation I used the Kronecker property (A ⊗ B)(C ⊗ D) = AC ⊗ BD with A,
B, C and D matrices of such sizes that these multiplications are possible.
We can rewrite:
\[
(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha = \big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols}\big] + (\Sigma^{-1/2} \otimes X)(\alpha_{ols} - \alpha),
\]
where
\[
\alpha_{ols} = \big(\Sigma^{-1} \otimes X'X\big)^{-1}\big(\Sigma^{-1} \otimes X\big)'\, y.
\]
Now substituting this into the last result we get:
\begin{align*}
&\big(y - (I_n \otimes X)\alpha\big)'\big(\Sigma^{-1} \otimes I_T\big)\big(y - (I_n \otimes X)\alpha\big) \\
&\quad = \big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols}\big]'\big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols}\big] + (\alpha_{ols} - \alpha)'\big(\Sigma^{-1} \otimes X'X\big)(\alpha_{ols} - \alpha),
\end{align*}
where the cross term vanishes by the definition of αols.
Now substituting this into the likelihood function we get:
\begin{align*}
L(\alpha, \Sigma) \propto{}& |\Sigma \otimes I_T|^{-1/2} \exp\Big\{ -\tfrac{1}{2}(\alpha - \alpha_{ols})'\big(\Sigma^{-1} \otimes X'X\big)(\alpha - \alpha_{ols}) \\
&\qquad -\tfrac{1}{2}\big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols}\big]'\big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols}\big] \Big\} \\
={}& |\Sigma|^{-\frac{k}{2}} \exp\Big\{ -\tfrac{1}{2}(\alpha - \alpha_{ols})'\big(\Sigma^{-1} \otimes X'X\big)(\alpha - \alpha_{ols}) \Big\} \\
&\times |\Sigma|^{-\frac{T-k}{2}} \exp\Big\{ -\tfrac{1}{2}\operatorname{Tr}\Big(\big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols}\big]'\big[(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols}\big]\Big) \Big\} \\
\propto{}& N(\alpha \mid \alpha_{ols}, \Sigma, X, y) \times W(\Sigma^{-1} \mid \alpha_{ols}, X, y, T - k - n - 1),
\end{align*}
where Tr(·) is the trace function. By decomposing the likelihood into two parts I showed it to be
proportional to the product of a Normal and a Wishart density, the latter being a matrix-variate extension of
the chi-squared distribution (for positive definite symmetric matrices instead of positive scalars).
That is:
\[
\alpha \mid \Sigma, y \sim N\big(\alpha_{ols},\, \Sigma \otimes (X'X)^{-1}\big) \tag{5.2.2}
\]
and
\[
\Sigma^{-1} \mid y \sim W\big(S^{-1},\, T - k - n - 1\big), \tag{5.2.3}
\]
with S = (YT − X Aols)′(YT − X Aols), the (n × n) matrix of OLS residual sums of squares and cross-products.
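The quantities αols and S appearing in this decomposition reduce to the familiar equation-by-equation OLS estimates and the residual cross-product matrix. A minimal sketch of their computation, assuming X and YT were built as in the earlier sketch (names are again my own):

```python
import numpy as np

def ols_quantities(X, Y_T):
    """OLS estimates and residual cross-product matrix for the VAR.

    X   : (T, k) stacked regressor matrix.
    Y_T : (T, n) matrix of left-hand-side observations.
    Returns A_ols (k, n), alpha_ols = vec(A_ols) (n*k, 1) and
    S = (Y_T - X A_ols)' (Y_T - X A_ols), the (n, n) residual matrix.
    """
    A_ols = np.linalg.solve(X.T @ X, X.T @ Y_T)   # equation-by-equation OLS
    resid = Y_T - X @ A_ols
    S = resid.T @ resid
    alpha_ols = A_ols.reshape(-1, 1, order="F")   # vec() stacks columns
    return A_ols, alpha_ols, S
```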
5.3 Priors
As stated before, the choice of the prior p(α, Σ) is of great importance. Due to the large
number of coefficients within a VAR framework it is cumbersome to obtain precise estimates.
This can lead to imprecise forecasts. Prior information is helpful in decreasing the predictive
standard deviations. There are several priors in the literature and each of them has its own
benefits and drawbacks, depending on the specification of the VAR. One big difference between
the different priors is whether they lead to analytical results for the posterior or whether
Markov chain Monte Carlo methods are required. Natural conjugate priors, noninformative
priors and Minnesota priors are three types of priors that can lead to analytical results. I
will be considering these three priors.
5.3.1 Natural Conjugate Priors and The Noninformative Priors
Natural conjugate priors are priors with the same functional form as the likelihood and the
posterior; that is, the prior, likelihood and posterior all belong to the same ’family’ of (density)
functions. The way the distribution of the likelihood looks (see previous section) suggests that
the natural conjugate prior has the following form:
\[
\alpha \mid \Sigma \sim N(\hat{\alpha},\, \Sigma \otimes \hat{V}) \tag{5.3.1}
\]
and
\[
\Sigma^{-1} \sim W(\hat{S}^{-1},\, \hat{\tau}), \tag{5.3.2}
\]
where ˆα, ˆV, ˆS and ˆτ are the parameters of the prior, the so-called hyperparameters. To
get the posterior we need to multiply this prior with the likelihood (according to (5.1.3)). So
the posterior becomes:
\[
\alpha \mid \Sigma, y \sim N(\tilde{\alpha},\, \Sigma \otimes \hat{W}) \tag{5.3.3}
\]
and
\[
\Sigma^{-1} \mid y \sim W(\hat{Z}^{-1},\, \hat{\nu}), \tag{5.3.4}
\]
where
\[
\hat{W} = (\hat{V}^{-1} + X'X)^{-1}
\]
and
\[
\hat{Z} = S + \hat{S} + A_{ols}'X'X A_{ols} + \hat{A}'\hat{V}^{-1}\hat{A} - \tilde{A}'(\hat{V}^{-1} + X'X)\tilde{A}.
\]
Furthermore
\[
\tilde{A} = \hat{W}\big(\hat{V}^{-1}\hat{A} + X'X A_{ols}\big) \qquad \text{and} \qquad \hat{\nu} = T + \hat{\tau}.
\]
The matrices Aols and ˆA are obtained by unstacking the (nk × 1) vectors αols and ˆα respectively.
The fact that the marginal posterior p(α|y) is a multivariate t-distribution makes analytical
posterior inference possible. This distribution is obtained after integrating Σ out of the
distribution stated in (5.3.3), and it has mean ˜α, degrees of freedom parameter ˆν and covariance matrix
\[
\operatorname{var}(\alpha \mid y) = \frac{1}{\hat{\nu} - n - 1}\, \hat{Z} \otimes \hat{W}.
\]
As a researcher I am free to choose any value for the hyperparameters ˆα, ˆV , ˆS and ˆτ. For the
noninformative prior I do not have this freedom. The difference between the natural conjugate
prior and the noninformative prior is that for the latter the hyperparameters are restricted to
be ˆτ = ˆS = ˆV −1 = cI where c is a constant with c → 0. According to the literature the
noninformative prior does not decrease the predictive standard deviations. This is of course a
huge drawback since in our case the whole point in using priors is to shrink these deviations so
forecasting can be more precise.
Using a prior from this conjugate family makes it unnecessary to use posterior simulation
algorithms, which is a big benefit. However, there is a possibly undesirable property of this
prior, caused by the way the prior covariance matrix (Σ ⊗ ˆV) is defined. If we denote the
individual elements of Σ by σij, then the prior covariance matrix of the coefficients in the
i-th equation equals σii ˆV. Due to the recurrence of ˆV in the prior covariance of the
coefficients of each equation, the prior covariances of the coefficients of every two equations
are proportional to each other. This is quite a restrictive property and could thus have a
negative effect on the forecasts.
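Translated into code, the posterior formulas of this subsection look as follows. This is a minimal sketch under the natural conjugate prior; the hyperparameters passed in are illustrative and not values used in the thesis, and the names are my own. The noninformative prior is obtained as the limiting case in which ˆV⁻¹, ˆS and ˆτ all go to zero, so that the posterior mean collapses to the OLS estimate.

```python
import numpy as np

def conjugate_posterior(X, Y_T, A_prior, V_prior, S_prior, tau_prior):
    """Posterior hyperparameters under the natural conjugate prior
    alpha | Sigma ~ N(vec(A_prior), Sigma (x) V_prior),
    Sigma^{-1}    ~ W(S_prior^{-1}, tau_prior).
    Returns the posterior mean A_post (k, n), W_post, Z_post, nu_post and
    the posterior covariance of alpha (the multivariate-t covariance).
    """
    T, n = Y_T.shape
    A_ols = np.linalg.solve(X.T @ X, X.T @ Y_T)
    S = (Y_T - X @ A_ols).T @ (Y_T - X @ A_ols)

    V_inv = np.linalg.inv(V_prior)
    W_post = np.linalg.inv(V_inv + X.T @ X)
    A_post = W_post @ (V_inv @ A_prior + X.T @ X @ A_ols)
    Z_post = (S + S_prior + A_ols.T @ X.T @ X @ A_ols
              + A_prior.T @ V_inv @ A_prior
              - A_post.T @ (V_inv + X.T @ X) @ A_post)
    nu_post = T + tau_prior
    var_alpha = np.kron(Z_post, W_post) / (nu_post - n - 1)
    return A_post, W_post, Z_post, nu_post, var_alpha
```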
5.3.2 The Minnesota Prior
Researchers at the University of Minnesota (Doan, Litterman and Sims, 1984 and Litterman,
1986) came up with priors that greatly simplify the computations. These so-called Minnesota
priors approximate Σ by replacing it with an estimate ˆΣ. Replacing Σ by an
estimate simplifies the prior in the sense that we now only need to consider α. So in line with
the previous case we can define the Minnesota prior as:
\[
\alpha \sim N(\hat{\alpha}_{ms},\, \hat{V}_{ms}). \tag{5.3.5}
\]
The elements of the prior mean ˆαms are set equal to zero to mitigate the risk of overfitting.
The prior covariance matrix ˆVms is assumed to be diagonal. If ˆVms is viewed as a block-diagonal
matrix with (k×k) block-matrix ˆVi with i = 1, 2 . . . n, then we can define ˆVi,jj to be the diagonal
elements of ˆVi according to:
\[
\hat{V}_{i,jj} =
\begin{cases}
\dfrac{a_1}{r^2} & \text{for coefficients on own lag } r, \; r = 1, \dots, p, \\[6pt]
\dfrac{a_2\,\sigma_{ii}}{r^2\,\sigma_{jj}} & \text{for coefficients on lag } r \text{ of variable } j \neq i, \; r = 1, \dots, p, \\[6pt]
a_3\,\sigma_{ii} & \text{for coefficients on deterministic variables.}
\end{cases}
\]
With this specification, we do not need to specify all elements of ˆVms. Instead, we
only need to choose the three scalars a1, a2 and a3. Imposing the restriction a1 > a2 leads to
the pleasant property that own lags of a variable are a priori given more predictive power than
lags of the other variables. Furthermore I choose σii = si, the OLS estimate of the residual
variance of equation i. This prior leads to a posterior which only requires the Normal distribution:
\[
\alpha \mid y \sim N(\tilde{\alpha}_{ms},\, \tilde{W}_{ms}) \tag{5.3.6}
\]
where
\[
\tilde{W}_{ms} = \Big[\hat{V}_{ms}^{-1} + \big(\hat{\Sigma}^{-1} \otimes X'X\big)\Big]^{-1}
\]
and
\[
\tilde{\alpha}_{ms} = \tilde{W}_{ms}\Big[\hat{V}_{ms}^{-1}\hat{\alpha}_{ms} + \big(\hat{\Sigma}^{-1} \otimes X\big)'\, y\Big].
\]
This simplification is of course a big advantage, yet the fact that Σ is replaced by an estimator
can be seen as a drawback. This drawback is the reason why I believe the natural conjugate
prior will perform better in forecasting. The Minnesota prior does not take any uncertainty in
the estimator ˆΣ into account. Thus it does not provide a full Bayesian treatment of Σ.
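A minimal sketch of the Minnesota prior variances and the resulting posterior, following the formulas above. The coefficient ordering (intercept first, then lag 1 of all variables, lag 2, and so on) matches the earlier sketches; the hyperparameters a1, a2, a3 and the plug-in estimate ˆΣ must be supplied by the user, and all names are my own.

```python
import numpy as np

def minnesota_prior_variance(sigma2, p, a1, a2, a3):
    """Diagonal of the Minnesota prior covariance, one (k,) vector per equation.

    sigma2 : length-n array of residual variance estimates s_i.
    Ordering of the k = 1 + n*p coefficients per equation:
    intercept first, then lag 1 of all variables, lag 2, ..., lag p.
    """
    n = len(sigma2)
    k = 1 + n * p
    V_diag = np.zeros((n, k))
    for i in range(n):
        V_diag[i, 0] = a3 * sigma2[i]                    # deterministic term
        for r in range(1, p + 1):
            for j in range(n):
                pos = 1 + (r - 1) * n + j
                if j == i:
                    V_diag[i, pos] = a1 / r**2           # own lag r
                else:
                    V_diag[i, pos] = a2 * sigma2[i] / (r**2 * sigma2[j])
    return V_diag

def minnesota_posterior(X, Y_T, Sigma_hat, V_diag):
    """Posterior mean and covariance of alpha under the Minnesota prior
    (prior mean zero), using the plug-in estimate Sigma_hat."""
    y = Y_T.reshape(-1, 1, order="F")                    # vec(Y_T)
    V_inv = np.diag(1.0 / V_diag.reshape(-1))            # equation blocks stacked
    Sigma_inv = np.linalg.inv(Sigma_hat)
    W_post = np.linalg.inv(V_inv + np.kron(Sigma_inv, X.T @ X))
    alpha_post = W_post @ (np.kron(Sigma_inv, X).T @ y)  # prior mean is zero
    return alpha_post, W_post
```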
6 Forecasting
Perhaps the most important feature of time series analysis is the forecasting part. The main
objective is to find out which estimation method of the VECM is best in forecasting the inflation
(LogCPI) in Ghana. Additionally the forecasting performance of the other variables will also be
touched on. For this purpose - as mentioned in section 3 - I have kept six monthly datapoints,
dating from December 2013 till May 2014, as out of sample data. Note that this is obviously a
very small number of out-of-sample observations. However, this choice has been made for two
reasons. First, the total number of observations that I have is not very large. Second, I am
mainly interested in the performance of the models in the most recent period, because it may be
expected that the best model for predicting the recent past is also the best model for predicting
the (nearby) future. I will be forecasting these values using the different estimation methods
discussed for the VECM model.
The one-step forecast based on information available at time T is:
\[
y_{T+1|T} = A_0 + A_1 y_T + A_2 y_{T-1} + \cdots + A_p y_{T-p+1}. \tag{6.0.7}
\]
The chain rule of forecasting can be used to obtain h-step forecasts according to:
\[
y_{T+h|T} = A_0 + A_1 y_{T+h-1|T} + A_2 y_{T+h-2|T} + \cdots + A_p y_{T+h-p|T}, \tag{6.0.8}
\]
where yT+i|T = yT+i for i ≤ 0. Clearly in our case we have h = 1, 2, . . . 6.
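A minimal sketch of this chain rule, assuming the intercept and the lag coefficient matrices have already been estimated by one of the methods above (the function and argument names are my own):

```python
import numpy as np

def forecast_chain(Y, A0, A_lags, h):
    """Iterative h-step forecasts via the chain rule (6.0.8).

    Y      : (T, n) observed data, most recent observation last.
    A0     : (n,) intercept vector.
    A_lags : list [A_1, ..., A_p] of (n, n) coefficient matrices.
    Returns an (h, n) array of forecasts y_{T+1|T}, ..., y_{T+h|T}.
    """
    p = len(A_lags)
    history = [Y[-r] for r in range(1, p + 1)]   # y_T, y_{T-1}, ..., y_{T-p+1}
    forecasts = []
    for _ in range(h):
        y_next = A0.copy()
        for r, A_r in enumerate(A_lags, start=1):
            y_next = y_next + A_r @ history[r - 1]
        forecasts.append(y_next)
        history = [y_next] + history[:-1]        # shift: newest value first
    return np.array(forecasts)
```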
6.1 Forecasts
Figure 5 shows the LogCPI datapoints from January 2012 till November 2013 attached to
the 6 consecutive (forecast) datapoints. The graph shows the forecast values using the different
estimation methods. As a recap the estimation methods used are from the cointegration model
(4.1.4), the first difference model (5.1.1) using OLS (stated as ’No Coint’ in the legend) and the
first difference model (5.1.1) using Bayesian estimation. Not surprisingly I used the noninforma-
tive, the natural conjugate and the Minnesota prior for the Bayesian estimation. Additionally
the actual datapoints are also displayed in the Figure as a comparison tool.
[Figure 5: Log CPI forecasts — actual data versus the 'Coint', 'No Coint', 'Non Inf', 'Minnes' and 'Nat Conj' forecasts.]
It is clear from Figure 5 that the Bayesian forecasts using the natural conjugate prior are the
closest to the actual datapoints. The noninformative prior seems to deliver the worst forecasts
from all the three priors used. It is also clear that the cointegration forecasts are a little bit closer
to the actual datapoints compared to the forecasts obtained without considering cointegration.
6.2 Predictive Accuracy
To be more certain about the accuracy of our forecasts let us have a look at the MAE
(Mean Absolute Error), the RMSE (Root Mean Squared Error) and the Diebold-Mariano test
results, which are used to evaluate whether the difference in performance between the methods
is significant.
6.2.1 MAE
The Mean Absolute Error (MAE) is defined as:
\[
\mathrm{MAE} = \frac{1}{6}\sum_{i=1}^{6} \left|\hat{y}_{T+i} - y_{T+i}\right|, \tag{6.2.1}
\]
with ŷT+i the forecast value of yT+i, which is the true value at time T + i.
Table 9: MAE values.
LogCPI LogEx LogM2 LogIR
Cointegration 20.8225 0.4046 583.0851 1.7481
No Cointegration 21.1034 0.4297 568.2084 1.5211
Non Informative 23.1752 0.3509 779.4893 2.4558
Minnesota 18.1238 0.3621 1228.5578 1.8556
Natural Conjugate 13.9855 0.3281 772.9586 2.0574
The MAE values are depicted in table 9. Focusing on the LogCPI we can conclude that
the Bayesian estimation method using a natural conjugate prior has the best forecasting per-
formance. Its MAE value (13.9855) is the smallest compared to the MAE values of the other
estimation methods. This is not surprising, because we already saw that its forecasts are the
closest to the actual data depicted in Figure 5. The Bayesian method with the Minnesota prior
is ranked second (18.1238) in best forecasting performance according to the MAE value. The
third place goes to the cointegration approach. The difference between the MAE value of the
cointegration approach (20.8225) and that without (21.1034) is quite small. So according to
the MAE the gain we get from considering the long term relationship in the VECM is not very
big. The worst inflation forecaster according to the MAE values is the Bayesian method using
noninformative priors. I already stated that the noninformative prior is known not to decrease
the predictive standard deviation, so this result is not surprising.
If we look at the other variables, we see that overall the Bayesian estimation using the natural
conjugate prior also has the lowest MAE value for LogEx. For the forecasting of LogM2 I must
conclude that using Bayesian estimation methods is not the best choice. The classical estimation
methods, with and without cointegration, seem to outperform the Bayesian methods. For LogIR,
again we see that the Bayesian approach using the noninformative prior does not perform well
in forecasting.
6.2.2 RMSE
Another loss function is the Root Mean Squared Error (RMSE), defined as:
\[
\mathrm{RMSE} = \sqrt{\frac{1}{6}\sum_{i=1}^{6} \left(\hat{y}_{T+i} - y_{T+i}\right)^2}. \tag{6.2.2}
\]
Table 10: RMSE values.
LogCPI LogEx LogM2 LogIR
Cointegration 22.8155 0.4491 646.3448 1.9948
No Cointegration 23.2204 0.4828 613.9482 1.7477
Non Informative 25.1429 0.4104 869.4263 2.8215
Minnesota 19.4639 0.4209 1326.6086 2.1682
Natural Conjugate 15.1183 0.3786 866.6952 2.3961
Again focusing on LogCPI, the RMSE results are not different from those of the MAE, as I
expected. The two loss functions do not differ very much, as can be seen from their specifications
in equations (6.2.1) and (6.2.2). Again we see that the Bayesian estimation method using the
natural conjugate prior outperforms the other methods. The ranking is the same as before. The
Bayesian estimation using the noninformative prior is again the worst predictor for inflation.
The same conclusions can be drawn for the other variables as before.
6.2.3 Diebold-Mariano test
From the two loss functions we have covered, it is clear how the different estimation
approaches perform when used to forecast. I am now interested in finding out whether the
differences in forecast quality between the different estimation methods are statistically
significant. To find that out I will be performing the Diebold-Mariano test. The Diebold-Mariano
test has the null hypothesis of equal predictive accuracy. The DM statistic is given by:
\[
DM = \frac{\bar{d}}{\sqrt{\widehat{\operatorname{var}}(\bar{d})}}, \tag{6.2.3}
\]
with
\[
d_t = \left|\hat{y}^{k}_{t} - y_t\right| - \left|\hat{y}^{l}_{t} - y_t\right| \qquad \text{and} \qquad \bar{d} = \frac{1}{6}\sum_{t=1}^{6} d_t.
\]
Here ŷk_t and ŷl_t are the forecast values of
the true value yt using estimation method k and l respectively. The DM statistic is approxi-
mately standard normally distributed. Since there are only 6 out-of-sample observations, I use
a significance level of 10%. In a two-sided test the null of equal predictive accuracy of two
methods is rejected if |DM| > 1.645. However, since I already expect the natural conjugate
prior to perform well, I also consider a one-sided test to test whether the natural conjugate prior
performs significantly better. In this case the null hypothesis (of equal or worse performance of
the natural conjugate prior) is rejected if DM > 1.282.
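A minimal sketch of the DM statistic for two competing forecast series. The thesis does not spell out how var(d̄) is estimated; this sketch uses the simple estimate var(dt)/h without a correction for autocorrelation in the loss differentials, which is an assumption on my part.

```python
import numpy as np

def diebold_mariano(y_hat_k, y_hat_l, y_true):
    """Diebold-Mariano statistic with absolute-error loss, eq. (6.2.3).

    Assumption: var(d_bar) is estimated as var(d_t)/h, i.e. without a
    HAC correction for autocorrelated loss differentials.
    """
    y_hat_k, y_hat_l, y_true = map(np.asarray, (y_hat_k, y_hat_l, y_true))
    d = np.abs(y_hat_k - y_true) - np.abs(y_hat_l - y_true)
    h = len(d)
    d_bar = d.mean()
    var_d_bar = d.var(ddof=1) / h
    return d_bar / np.sqrt(var_d_bar)

# Two-sided 10% test: reject equal accuracy if |DM| > 1.645;
# one-sided 10% test (method l better): reject if DM > 1.282.
```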
Table 11: Results from the Diebold-Mariano test for LogCPI.
Contest |DM| Test outcome (2-sided test, 10% sign.) Test outcome (1-sided test, 10% sign.)
Cointegration vs No Cointegration 0.2989 Do not reject null -
Cointegration vs Non Informative 0.5783 Do not reject null -
Cointegration vs Minnesota 0.8619 Do not reject null -
Cointegration vs Natural Conjugate 1.6035 Do not reject null Reject null
No Cointegration vs Non Informative 0.4222 Do not reject null -
No Cointegration vs Minnesota 0.7673 Do not reject null -
No Cointegration vs Natural Conjugate 1.4904 Do not reject null Reject null
Non Informative vs Minnesota 1.4098 Do not reject null -
Non Informative vs Natural Conjugate 1.6965 Reject null Reject null
Minnesota vs Natural Conjugate 2.2302 Reject null Reject null
Table 11 displays the results from the Diebold-Mariano tests. We saw that the VECM model
taking cointegration into account performed better during forecasting than the model without
considering cointegration. The Diebold-Mariano (two-sided) test however did not reject the null
of equal predictive accuracy. This means that the improvement the cointegration model delivers
is not statistically significant. Comparing the cointegration model with the model where we estimated
the VECM using the Bayesian approach with the noninformative prior, again I have to conclude
that the difference in predictive accuracy between the two methods is not significant. The same conclusion
can be drawn from the Diebold-Mariano test concerning the cointegration estimation approach
compared to the Bayesian estimation approach using the Minnesota prior. The DM statistic of
the test between the cointegration estimation compared to the Bayesian estimation using the
natural conjugate prior (1.6035) is quite big compared to the previous ones. It is large enough
to reject the null hypothesis in a one-sided test with 10% significance.
The same conclusion can be drawn when comparing the forecast power of the VECM not
considering cointegration with all other estimation methods, except for the Bayesian approach
under the natural conjugate prior (one-sided test). We saw this before in the test considering
cointegration in the VECM. We know from the MAE and RMSE that the Bayesian estimation
method using the natural conjugate prior is the best in forecasting LogCPI. Given this fact and
the Diebold-Mariano test results up till now I would recommend using the Bayesian estimation
method using the natural conjugate prior instead of the classical estimation approaches.
The Diebold-Mariano test comparing the Bayesian approach using the noninformative prior
with the natural conjugate prior resulted in rejecting the null hypothesis. Also in the comparison
of the Minnesota prior and the natural conjugate prior, the natural conjugate prior is significantly
better. To conclude, according to the (one-sided) tests the natural conjugate prior is indeed
significantly better than the four alternative methods.
6.3 20-step Ahead Forecasts
To obtain more results of the Diebold-Mariano tests I decided to also consider an alternative
out-of-sample period where I forecast twenty values of our monthly variables instead of six. To
be precise, I kept 20 monthly datapoints dating from October 2012 till May 2014 as out of
sample data. Furthermore I changed the number of lags to 12, which seemed reasonable given
that we are handling monthly datapoints.
[Figure 6: Log CPI forecasts for h = 20 — actual data versus the 'Coint', 'No Coint', 'Non Inf', 'Minnes' and 'Nat Conj' forecasts.]
Figure 6 shows the LogCPI datapoints from January 2012 till September 2012 attached to the
20 consecutive (forecast) datapoints. Again the Bayesian forecasts using the natural conjugate
prior seem to be the closest to the actual datapoints. Another striking observation is that the
Bayesian forecasts using the noninformative prior and the forecasts obtained without considering
cointegration seem to be identical.
Table 12: MAE values for h = 20.
LogCPI LogEx LogM2 LogIR
Cointegration 8.5722 0.1721 1157.3700 3.4503
No Cointegration 18.7909 0.1675 1239.8661 3.7385
Non Informative 18.7909 0.1675 1239.8661 3.7385
Minnesota 17.5946 0.1760 1441.5419 3.2820
Natural Conjugate 5.5489 0.2077 1230.8590 3.0093
Table 12 shows the MAE values. From this table we can observe that the Bayesian approach
using the natural conjugate prior is ranked number one in predicting LogCPI. This was also the
case in the previous section. However, the forecasts incorporating cointegration dynamics in the
model are now ranked second instead of third. Forecasting 20 datapoints suggests that including
these long-term dynamics is far more advantageous than excluding them. The Bayesian forecasts
using the Minnesota prior are now ranked third. Finally - as we already saw in Figure 6 - the
Bayesian forecasts using the noninformative prior and the forecasts without incorporating
cointegration dynamics are the same according to the MAE values. These two estimation methods
prove to be the worst methods to use in predicting inflation in Ghana.
Table 13: RMSE values for h = 20.
LogCPI LogEx LogM2 LogIR
Cointegration 10.5658 0.1882 1483.8178 3.9400
No Cointegration 23.1909 0.1883 1591.9582 4.2978
Non Informative 23.1909 0.1883 1591.9582 4.2978
Minnesota 21.1469 0.1932 1735.1487 3.7787
Natural Conjugate 7.2561 0.2250 1509.1141 3.5212
Table 13 shows the RMSE values. With respect to LogCPI, the same conclusions (ranking) can be
drawn from this table as were drawn from the MAE values.
Table 14: Results from the Diebold-Mariano test for LogCPI for h = 20.
Contest |DM| Test outcome (2-sided test, 5% sign.)
Cointegration vs No Cointegration 1.0956 Do not reject null
Cointegration vs Non Informative 1.0956 Do not reject null
Cointegration vs Minnesota 1.2435 Do not reject null
Cointegration vs Natural Conjugate 0.3035 Do not reject null
No Cointegration vs Non Informative 1.2918 Do not reject null
No Cointegration vs Minnesota 0.4369 Do not reject null
No Cointegration vs Natural Conjugate 0.8068 Do not reject null
Non Informative vs Minnesota 0.4369 Do not reject null
Non Informative vs Natural Conjugate 0.8068 Do not reject null
Minnesota vs Natural Conjugate 0.8048 Do not reject null
Since there are now 20 out-of-sample observations, I used a significance level of 5% for the
Diebold-Mariano test. In a two-sided test the null of equal predictive accuracy of two methods
is then rejected if |DM| > 1.96. The new Diebold-Mariano test results are displayed in table 14.
The tests were never able to reject the null hypothesis of equal predictive accuracy. Therefore
no strong conclusions can be drawn about the difference in predictive accuracy of the different
estimation methods. However it is interesting to note that the smallest DM statistic is obtained
when comparing the forecasts incorporating cointegration with the Bayesian forecasts using the
natural conjugate prior. This means that the predictive accuracy of these two methods is the
most similar compared to the other comparisons. This of course is not a surprise as we can
conclude that these two methods are best in forecasting inflation in Ghana. Furthermore the
test comparing the Bayesian forecasts using the noninformative prior and the model without
cointegration resulted in the largest DM statistic. Thus the predictive accuracy of these two
models is less similar than in the other comparisons. This may seem a surprising result,
since the MAE and RMSE are almost the same for these methods. Indeed this implies that the
numerator of the test statistic is very small. However, the denominator of the test statistic is
also very small in this case, so that the ratio can still be relatively large.
To end, forecasting 20 steps ahead instead of 6 steps ahead proved to lead to interesting
conclusions. In both cases the Bayesian method using the natural conjugate prior leads to the
best forecasts. The big difference between the two is that the 20-step ahead forecasts showed
that incorporating cointegration dynamics into the model does indeed prove to be valuable. This
shows that including long-term dynamics in the model can lead to substantial improvements
over a longer forecast horizon.
7 Conclusion
The purpose of this study was to compare different estimation methods in predicting inflation
(CPI) in Ghana. After going through the literature it struck me that there is not a lot of work on
predicting inflation in Ghana. In this light I found it important to conduct this research. Inspired
by the work of Erasmus and Abdalla (2005) this study tried to dig deeper in the dynamics
of cointegration for the variables under consideration (CPI, Ex, M2 and IR). A further step
was to implement Bayesian estimation procedures, with the goal of comparing the prediction
performance of these methods.
But the first step was to determine the order of integration of the variables. For this reason
the AIC criterion was used to determine the type of model and the number of lags that best
fitted the variables. After obtaining this insight the ADF test - testing the null hypothesis of the
presence of a unit root - could be performed. According to the ADF tests the variables LogCPI,
LogEx, LogM2 and LogIR are integrated of order one.
Knowing this, a VECM was set up. This was needed to test for cointegration using the
Johansen approach. The number of lags most appropriate for the VECM proved to be 18,
according to the LR test. Considering a significance level of 1% the Johansen cointegration tests
showed that our time series are cointegrated, containing one cointegration vector. Additionally
the Granger causality test gave reassurance on the use of the variables under consideration for
forecasting.
When forecasting six periods ahead, the Bayesian estimation approach using the natural
conjugate prior seems to best predict the inflation dynamics in Ghana. The MAE and RMSE
values were all in favor of this approach. The second best approach is the Bayesian method
using the Minnesota prior. This confirms my intuition that the natural conjugate prior would
outperform the Minnesota prior. The reason for this is that the Minnesota prior replaces
Σ by an estimator and therefore does not take any uncertainty in this estimator into account,
whereas the natural conjugate prior provides a full Bayesian treatment of Σ. This could be the
reason why the Diebold-Mariano test suggested that the predictive accuracy of estimations using
these two priors is not the same. The cointegration model is the third best inflation predictor. However the difference
between the performance of the cointegration model compared to the model without considering
cointegration is small. Thus the gain from considering the long term relationship in the VECM
- in this case - is small. The Bayesian estimation using the noninformative prior is the worst
predictor for inflation in Ghana. This did not come as a surprise since the noninformative prior
does not decrease the predictive standard deviations. Forecasting twenty values ahead proved
to lead to more reliable conclusions though. Again the Bayesian estimation approach using
the natural conjugate prior is best in predicting inflation dynamics in Ghana. Having a larger
forecast horizon led to the pleasant surprise that incorporating cointegration dynamics in the
model is indeed valuable in predicting inflation dynamics in Ghana. This method is now ranked
second place. The Bayesian estimation method using the Minnesota prior now ranks third.
Not surprisingly the Bayesian approach using the noninformative prior and the model excluding
cointegration dynamics are both ranked last.
There are several areas where further research could be useful. The first is considering non-
normally distributed disturbances. This would not be a very odd idea since the kurtosis of
the data - see Appendix A - is large, which indicates the presence of fat tails. Furthermore
different models could be combined. For example we could use a model which incorporates a
cointegrating vector and use estimates of the other coefficients obtained from Bayesian estimation
using the natural conjugate prior to predict future movements. Another possible extension is
to use priors for which the posterior can not be evaluated analytically, so that we need to make
use of a numerical integration method. Thus numerical integration methods based on Monte
Carlo simulations, like the Gibbs sampler and the Metropolis-Hastings sampler, could be used. As
mentioned before, the Bayesian estimation of a VECM with a cointegration relation (using a
Markov chain Monte Carlo (MCMC) simulation method) for our data is an interesting topic for
further research, especially for longer forecast horizons.
References
[1] Canova, Fabio (2007).
Chapter 10: Bayesian VARs.
2007, Book: Methods for Applied Macroeconomic Research.
[2] Ciccarelli, Matteo & Rebucci, Alessandro (2003).
Bayesian VARs: A Survey of The Recent Literature with an Application To The European
Monetary System.
2003, International Monetary Fund
[3] Dagher, Jihad & Kovanen, Arto (2011).
On the Stability of Money Demand in Ghana: A Bounds Testing Approach.
2011, International Monetary Fund
[4] Enders, Walter (2003).
Chapter 4: Models With Trend & Chapter 5: Multiequation Time-Series Models & Chapter
6: Cointegration And Error-Correction Models.
2003, Book: Applied Econometric Time Series, Second Edition
[5] Engle, Robert F. & Granger, Clive W.J. (1987).
Cointegration and Error-Correction: Representation, Estimation and Testing.
1987, Econometrica 55, 251-276
[6] Erasmus, Alnaa Samuel & Abdalla, Abdul-Mumuni (2005).
Predicting Inflation in Ghana: A Comparison of Cointegration and ARIMA Models.
2005, University of Skovde
[7] Havi, Emmanuel Dodzi K. & Enu, Patrick & Opoku, C.D.K. (2014).
Demand For Money And Long Run Stability In Ghana: Cointegration Approach.
2014, Department of Economics, Methodist University College Ghana
[8] Hendry, David F. & Juselius, Katarina (1999)
Explaining Cointegration Analysis: Part I.
1999, Nuffield College, Oxford - European University Institute, Florence
[9] Hendry, David F. & Juselius, Katarina (2000)
Explaining Cointegration Analysis: Part II.
2000, Nuffield College, Oxford - Department of Economics, University of Copenhagen
[10] Hjalmarsson, Erik & Österholm, Pär (2007)
Testing for Cointegration Using the Johansen Methodology when Variables are Near-
Integrated.
2007, International Monetary Fund
[11] Johansen, Søren (1988)
Statistical analysis of cointegration vectors.
1988, Journal of Economic Dynamics and Control 12, 231-254
[12] Johansen, Søren & Juselius, Katarina (1990)
Maximum Likelihood Estimation And Inference On Cointegration - With Applications To
The Demand For Money.
1990, Oxford Bulletin Of Economics And Statistics
[13] Koop, Gary & Korobilis, Dimitris (2010).
Bayesian Multivariate Time Series Methods for Empirical Macroeconomics.
2010, University of Strathclyde
[14] Kovanen, Arto (2011).
Does Money Matter for Inflation in Ghana?.
2011, International Monetary Fund
[15] Kwiatkowski, Denis & Phillips, Peter C.B. & Schmidt, Peter & Shin, Yongcheol (1991).
Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root.
1991, Elsevier Science Publishers B.V.
[16] Ni, Shawn & Sun, Dongchu (2005).
Bayesian Estimates for Vector-Autoregressive Models.
2005, University of Missouri, Columbia
[17] Sims, Christopher A. (1980).
Macroeconomics and Reality.
Jan. 1980, Econometrica, Vol. 48, No. 1, 1-48
[18] Sjö, Bo (2008).
Testing for Unit Roots and Cointegration.
2008, Linköping University
A APPENDIX: Data Statistics
Table 15: Data Statistics.
DLogCPI DLogEx DLogM2 DLogIR
Mean 0.0152 0.0148 0.0241 -0.0017
Median 0.0146 0.0070 0.0206 0
Maximum 0.1033 0.6326 0.2151 0.1671
Minimum -0.0259 -0.5569 -0.2354 -0.2231
Std. Dev. 0.0159 0.0853 0.0394 0.0439
Skewness 1.0352 1.0088 -0.3175 -0.3578
Kurtosis 6.8216 35.0430 10.9880 10.3630
Observations 280 280 280 280
B APPENDIX: Scatterplots of the Data
[Figure 7: Scatterplots of the data — LogCPI against LogEx, LogM2 and LogIR; LogEx against LogM2 and LogIR; and LogM2 against LogIR.]
C APPENDIX: Results from Cointegration Estimation
Table 16: The (normalized) cointegrating vector and speed of adjustment coefficients.
β α
LogCPI -1.0000 -0.0116
LogEx -120.4431 0.0001
LogM2 -0.0007 -1.4562
LogIR -2.7009 0.0067
Table 17: The Characteristic roots.
λ
0.1436
0.0462
0.0053
0.0205
D APPENDIX: KPSS Test Results
Table 18: The KPSS statistic values.
LogCPI LogEx LogM2 LogIR
H0 = I(0) and H1 = I(1)
With intercept 1.8450 1.8818 1.4315 1.3499
With intercept and trend 0.4909 0.2405 0.4289 0.1988
H0 = I(1) and H1 = I(2)
With intercept 1.9753 0.4504 2.1832 0.1053
With intercept and trend 0.0368 0.0600 0.3518 0.0800
Table 19: The critical values for the KPSS Test.
significance levels 10% 5% 1%
With intercept 0.347 0.463 0.739
With intercept and trend 0.119 0.146 0.216
Final version.
Any further copying or communication of this material should be done with
permission of Victor K. Amankwah or the Vrije Universiteit Amsterdam.
© 2015 Victor K. Amankwah (1818376). All rights reserved.
More Related Content

Viewers also liked

Organigrama institucional
Organigrama institucionalOrganigrama institucional
Organigrama institucional
albani silva
 
Prezentacja Oli
Prezentacja OliPrezentacja Oli
Prezentacja Oli
G2
 
Agua y p h [presentación]
Agua y p h [presentación]Agua y p h [presentación]
Agua y p h [presentación]
Armando del Río
 
Fa27assignment5
Fa27assignment5Fa27assignment5
Fa27assignment5
goldenj234
 
Intro to Social Media for Professionals
Intro to Social Media for ProfessionalsIntro to Social Media for Professionals
Intro to Social Media for Professionals
Natalia Quintero
 
Easy Dataweave transformations - Ashutosh
Easy Dataweave transformations - AshutoshEasy Dataweave transformations - Ashutosh
Easy Dataweave transformations - Ashutosh
StrawhatLuffy11
 
Presentation Magento OroCRM - MageConf 2014
Presentation Magento OroCRM - MageConf 2014Presentation Magento OroCRM - MageConf 2014
Presentation Magento OroCRM - MageConf 2014
Sylvain Rayé
 
General information oF the Kuwait international Boat show kibs-kw
General information oF the Kuwait international Boat show kibs-kwGeneral information oF the Kuwait international Boat show kibs-kw
General information oF the Kuwait international Boat show kibs-kw
John G. Hermanson
 
Patrycja Poland
Patrycja PolandPatrycja Poland
Patrycja Poland
G2
 
Is Our Faith A Fairy Tale
Is Our Faith A Fairy TaleIs Our Faith A Fairy Tale
Is Our Faith A Fairy Tale
Andrew Schmiedicke
 
Bristol EF End Technician
Bristol EF End TechnicianBristol EF End Technician
Bristol EF End Technician
Roger Martin
 
Independence&benefits
Independence&benefitsIndependence&benefits
Independence&benefits
xomxomxom
 
Juan león mera
Juan león meraJuan león mera
Juan león mera
Rothman Caluña
 
Молодое Имя Кубани Тлишев Харун Аминович
Молодое Имя Кубани Тлишев Харун АминовичМолодое Имя Кубани Тлишев Харун Аминович
Молодое Имя Кубани Тлишев Харун Аминович
POPOVA DIANA
 
Юровникова Антонина Николаевна
Юровникова Антонина НиколаевнаЮровникова Антонина Николаевна
Юровникова Антонина Николаевна
POPOVA DIANA
 
Kuwait Discover the opportunity
Kuwait   Discover the opportunityKuwait   Discover the opportunity
Kuwait Discover the opportunity
John G. Hermanson
 
Fa27 John Golden Assignment 2
Fa27 John Golden Assignment 2Fa27 John Golden Assignment 2
Fa27 John Golden Assignment 2
goldenj234
 
Flux Design - Portfolio Complete
Flux Design - Portfolio CompleteFlux Design - Portfolio Complete
Flux Design - Portfolio Complete
Yohanes Auri
 
anastasia
anastasiaanastasia
Aia rifiuti e reflui industriali 16.03.16
Aia rifiuti e reflui industriali 16.03.16Aia rifiuti e reflui industriali 16.03.16
Aia rifiuti e reflui industriali 16.03.16
Camillo Campioli
 

Viewers also liked (20)

Organigrama institucional
Organigrama institucionalOrganigrama institucional
Organigrama institucional
 
Prezentacja Oli
Prezentacja OliPrezentacja Oli
Prezentacja Oli
 
Agua y p h [presentación]
Agua y p h [presentación]Agua y p h [presentación]
Agua y p h [presentación]
 
Fa27assignment5
Fa27assignment5Fa27assignment5
Fa27assignment5
 
Intro to Social Media for Professionals
Intro to Social Media for ProfessionalsIntro to Social Media for Professionals
Intro to Social Media for Professionals
 
Easy Dataweave transformations - Ashutosh
Easy Dataweave transformations - AshutoshEasy Dataweave transformations - Ashutosh
Easy Dataweave transformations - Ashutosh
 
Presentation Magento OroCRM - MageConf 2014
Presentation Magento OroCRM - MageConf 2014Presentation Magento OroCRM - MageConf 2014
Presentation Magento OroCRM - MageConf 2014
 
General information oF the Kuwait international Boat show kibs-kw
General information oF the Kuwait international Boat show kibs-kwGeneral information oF the Kuwait international Boat show kibs-kw
General information oF the Kuwait international Boat show kibs-kw
 
Patrycja Poland
Patrycja PolandPatrycja Poland
Patrycja Poland
 
Is Our Faith A Fairy Tale
Is Our Faith A Fairy TaleIs Our Faith A Fairy Tale
Is Our Faith A Fairy Tale
 
Bristol EF End Technician
Bristol EF End TechnicianBristol EF End Technician
Bristol EF End Technician
 
Independence&benefits
Independence&benefitsIndependence&benefits
Independence&benefits
 
Juan león mera
Juan león meraJuan león mera
Juan león mera
 
Молодое Имя Кубани Тлишев Харун Аминович
Молодое Имя Кубани Тлишев Харун АминовичМолодое Имя Кубани Тлишев Харун Аминович
Молодое Имя Кубани Тлишев Харун Аминович
 
Юровникова Антонина Николаевна
Юровникова Антонина НиколаевнаЮровникова Антонина Николаевна
Юровникова Антонина Николаевна
 
Kuwait Discover the opportunity
Kuwait   Discover the opportunityKuwait   Discover the opportunity
Kuwait Discover the opportunity
 
Fa27 John Golden Assignment 2
Fa27 John Golden Assignment 2Fa27 John Golden Assignment 2
Fa27 John Golden Assignment 2
 
Flux Design - Portfolio Complete
Flux Design - Portfolio CompleteFlux Design - Portfolio Complete
Flux Design - Portfolio Complete
 
anastasia
anastasiaanastasia
anastasia
 
Aia rifiuti e reflui industriali 16.03.16
Aia rifiuti e reflui industriali 16.03.16Aia rifiuti e reflui industriali 16.03.16
Aia rifiuti e reflui industriali 16.03.16
 

Similar to Master Thesis

Seminar- Robust Regression Methods
Seminar- Robust Regression MethodsSeminar- Robust Regression Methods
Seminar- Robust Regression Methods
Sumon Sdb
 
Fill-us-in: Information Asymmetry, Signals and The Role of Updates in Crowdfu...
Fill-us-in: Information Asymmetry, Signals and The Role of Updates in Crowdfu...Fill-us-in: Information Asymmetry, Signals and The Role of Updates in Crowdfu...
Fill-us-in: Information Asymmetry, Signals and The Role of Updates in Crowdfu...
CamWebby
 
Thesis_Nazarova_Final(1)
Thesis_Nazarova_Final(1)Thesis_Nazarova_Final(1)
Thesis_Nazarova_Final(1)
Sardana Nazarova
 
Corporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniquesCorporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniques
Shantanu Deshpande
 
A systematic review_of_internet_banking
A systematic review_of_internet_bankingA systematic review_of_internet_banking
A systematic review_of_internet_banking
saali5984
 
Manual de tecnicas de bioestadística basica
Manual de tecnicas de bioestadística basica Manual de tecnicas de bioestadística basica
Manual de tecnicas de bioestadística basica
KristemKertzeif1
 
Thesis- Multibody Dynamics
Thesis- Multibody DynamicsThesis- Multibody Dynamics
Thesis- Multibody Dynamics
Guga Gugaratshan
 
eclampsia
eclampsiaeclampsia
eclampsia
Prabha Amandari
 
final_report_template
final_report_templatefinal_report_template
final_report_template
Panayiotis Charalampous
 
Booklet - GRA White Papers - Second Edition
Booklet - GRA White Papers - Second EditionBooklet - GRA White Papers - Second Edition
Booklet - GRA White Papers - Second Edition
Ziad Fares
 
Booklet_GRA_RISK MODELLING_Second Edition (002).compressed
Booklet_GRA_RISK  MODELLING_Second Edition (002).compressedBooklet_GRA_RISK  MODELLING_Second Edition (002).compressed
Booklet_GRA_RISK MODELLING_Second Edition (002).compressed
Genest Benoit
 
EvalInvStrats_web
EvalInvStrats_webEvalInvStrats_web
EvalInvStrats_web
Igor Vilček
 
my thesis
my thesismy thesis
Identifying and prioritizing stakeholder needs in neurodevelopmental conditio...
Identifying and prioritizing stakeholder needs in neurodevelopmental conditio...Identifying and prioritizing stakeholder needs in neurodevelopmental conditio...
Identifying and prioritizing stakeholder needs in neurodevelopmental conditio...
KBHN KT
 
20150324 Strategic Vision for Cancer
20150324 Strategic Vision for Cancer20150324 Strategic Vision for Cancer
20150324 Strategic Vision for Cancer
Sally Rickard
 
An Introduction to Statistical Learning R Fourth Printing.pdf
An Introduction to Statistical Learning R Fourth Printing.pdfAn Introduction to Statistical Learning R Fourth Printing.pdf
An Introduction to Statistical Learning R Fourth Printing.pdf
DanielMondragon15
 
EC331_a2
EC331_a2EC331_a2
tkacik_final
tkacik_finaltkacik_final
tkacik_final
Marcel Tkacik
 
Bma
BmaBma
quantitative-risk-analysis
quantitative-risk-analysisquantitative-risk-analysis
quantitative-risk-analysis
Duong Duy Nguyen
 

Similar to Master Thesis (20)

Seminar- Robust Regression Methods
Seminar- Robust Regression MethodsSeminar- Robust Regression Methods
Seminar- Robust Regression Methods
 
Fill-us-in: Information Asymmetry, Signals and The Role of Updates in Crowdfu...
Fill-us-in: Information Asymmetry, Signals and The Role of Updates in Crowdfu...Fill-us-in: Information Asymmetry, Signals and The Role of Updates in Crowdfu...
Fill-us-in: Information Asymmetry, Signals and The Role of Updates in Crowdfu...
 
Thesis_Nazarova_Final(1)
Thesis_Nazarova_Final(1)Thesis_Nazarova_Final(1)
Thesis_Nazarova_Final(1)
 
Corporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniquesCorporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniques
 
A systematic review_of_internet_banking
A systematic review_of_internet_bankingA systematic review_of_internet_banking
A systematic review_of_internet_banking
 
Manual de tecnicas de bioestadística basica
Manual de tecnicas de bioestadística basica Manual de tecnicas de bioestadística basica
Manual de tecnicas de bioestadística basica
 
Thesis- Multibody Dynamics
Thesis- Multibody DynamicsThesis- Multibody Dynamics
Thesis- Multibody Dynamics
 
eclampsia
eclampsiaeclampsia
eclampsia
 
final_report_template
final_report_templatefinal_report_template
final_report_template
 
Booklet - GRA White Papers - Second Edition
Booklet - GRA White Papers - Second EditionBooklet - GRA White Papers - Second Edition
Booklet - GRA White Papers - Second Edition
 
Booklet_GRA_RISK MODELLING_Second Edition (002).compressed
Booklet_GRA_RISK  MODELLING_Second Edition (002).compressedBooklet_GRA_RISK  MODELLING_Second Edition (002).compressed
Booklet_GRA_RISK MODELLING_Second Edition (002).compressed
 
EvalInvStrats_web
EvalInvStrats_webEvalInvStrats_web
EvalInvStrats_web
 
my thesis
my thesismy thesis
my thesis
 
Identifying and prioritizing stakeholder needs in neurodevelopmental conditio...
Identifying and prioritizing stakeholder needs in neurodevelopmental conditio...Identifying and prioritizing stakeholder needs in neurodevelopmental conditio...
Identifying and prioritizing stakeholder needs in neurodevelopmental conditio...
 
20150324 Strategic Vision for Cancer
20150324 Strategic Vision for Cancer20150324 Strategic Vision for Cancer
20150324 Strategic Vision for Cancer
 
An Introduction to Statistical Learning R Fourth Printing.pdf
An Introduction to Statistical Learning R Fourth Printing.pdfAn Introduction to Statistical Learning R Fourth Printing.pdf
An Introduction to Statistical Learning R Fourth Printing.pdf
 
EC331_a2
EC331_a2EC331_a2
EC331_a2
 
tkacik_final
tkacik_finaltkacik_final
tkacik_final
 
Bma
BmaBma
Bma
 
quantitative-risk-analysis
quantitative-risk-analysisquantitative-risk-analysis
quantitative-risk-analysis
 

Master Thesis

  • 1. Faculteit der Economische Wetenschappen en Bedrijfskunde MSc. Econometrics Forecasting Inflation in Ghana: Bayesian and Frequentist Estimation in Cointegration models. Author Victor Amankwah Supervisor Dr. L.F. Hoogerheide June 2015
  • 2.
  • 3. Forecasting Inflation in Ghana: Bayesian and Frequentist Estimation in Cointegration models. Victor Amankwah∗ August 1, 2015 Abstract In this paper an in depth cointegration analysis of the inflation (CPI) in Ghana is consid- ered. The Johansen approach for testing for the presence of cointegration is used. Further- more Bayesian estimation of the VECM is performed. Here I considered prior distributions for which the properties of interest of the posterior distribution can be analytically computed: the natural conjugate prior, the noninformative prior and the Minnesota prior. The goal of this thesis is to detect which estimation method is the best method to use for forecasting inflation in Ghana. Thus I compare the estimation of the VECM under the assumption of cointegration in the variables, the VECM without the assumption of cointegration, the Bayesian estimation using the natural conjugate prior, the Bayesian estimation using the noninformative prior and the Bayesian estimation using the Minnesota prior. The Bayesian estimation using the natural conjugate prior proved to be the best estimation method when predicting inflation in Ghana. Keywords: Cointegration, VECM, Bayesian estimation, Ghana, Inflation. 1 Master student in Econometrics at the Vrije Universiteit Amsterdam. e-mail: victoramank@live.nl tel: +31 615 692 793. iii
  • 4. Acknowledgements I would like to thank dr. Lennart Hoogerheide for supervising and guiding me throughout this thesis. It is not often a student approaches him in writing a thesis covering a developing country. Though critical, he never broke my spirit and was always encouraging. I also want to thank him for the great conversations that always started with an econometric subject but could diverge into all kinds of areas, whether it is politics, philosophy or even soccer. Furthermore I want to thank dr. Francisco Blasques for lending me his book - Applied Econometric Time Series - for a tremendous amount of time. And last I also want to thank the Bank of Ghana for making the data I needed available. After searching for the data for a very long time I was quite relieved when I found it on their website. iv
  • 5. CONTENTS Contents 1 Introduction 3 2 Inflation in Ghana 5 3 Data Analysis 7 3.1 The Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2 Time Series Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Augmented Dickey-Fuller Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.3.1 Lag Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.3.2 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.3.3 Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.4 KPSS Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 4 Testing for Cointegration 14 4.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 4.1.1 Vector Error Correction Model (VECM) . . . . . . . . . . . . . . . . . . . 14 4.1.2 Johansen Test Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.1.3 Calculating the characteristic roots . . . . . . . . . . . . . . . . . . . . . . 16 4.2 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 4.3 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4.4 Granger Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 5 Bayesian Estimation 21 5.1 Basic Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5.2 Likelihood Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 5.3 Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.3.1 Natural Conjugate Priors and The Noninformative Priors . . . . . . . . . 24 5.3.2 The Minnesota Prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 6 Forecasting 26 6.1 Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 6.2 Predictive Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 6.2.1 MAE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 6.2.2 RMSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 6.2.3 Diebold-Mariano test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 6.3 20-step Ahead Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 7 Conclusion 32 A APPENDIX: Data Statistics 36 B APPENDIX: Scatterplots of the Data 36 C APPENDIX: Results from Cointegration Estimation 37 D APPENDIX: KPSS Test Results 37 1
  • 6.
  • 7. 1 Introduction All central banks share a common goal i.e. to preserve economic and financial stability and growth. Many share my opinion that monetary policy is the strongest tool central banks have to achieve this goal. Through the regulation of the money supply - which is basically the objective of monetary policies - interest rates are affected which of course can be an effective way to control inflation. When implemented in monetary policies money is only useful if it contains information that can be used in forecasting future changes in the price level. One would then expect the monetary policy authorities to design a policy targeting money aggregates. Ghana chooses an inflation targeting arrangement. The countryâs economy is currently facing some challenges as the inflation rate has been increasing rapidly since the beginning of the year 2013 and has been persistent throughout the year 2014. The Bank of Ghana has tried to fight this unpleasant event by recently raising its policy rate. For monetary policy to have any impact here I believe it is of extreme importance to have a reliable model of the inflation dynamics. Once these dynamics are clear researchers can perform forecasts of future inflation movements. After going through the literature, I have to conclude that there is a significant amount of research done on inflation dynamics in Ghana. One worth mentioning is the work of Jihad Dagher and Arto Kovanen (2011), where they examined the long term demand for money in Ghana, by utilizing an alternative to the Johansen procedure to estimate the long run demand for money, a procedure elaborated in Pesaran et al. (2001). Arto Kovanen (2011) went so far to see if money was a valuable predictor for inflation. What I did not encounter often in the literature is the comparison of different models in predicting inflation in Ghana, which I believe could be of worth to the monetary policy in Ghana. Erasmus and Abdalla (2005) conducted a great research, where they compared cointegration and ARIMA models in predicting inflation in Ghana. Inspired by their work I decided to conduct a research of my own, benefiting from the more recent datasets and using more advanced econometric methods. The objective of this thesis is to compare different models in predicting inflation (in the form of the consumer price index (CPI)) in Ghana. Like Erasmus and Abdalla (2005) I will examine the performance of a cointegration model in forecasting inflation in Ghana. As an econometrician I will go deeper in the dynamics of cointegration compared to Erasmus and Abdalla (2005). Cointegration models have become quite popular in studies concerning macro-economic data. It has proven to be reliable to capture the long run dynamics of variables. It is this long run information that could help during the forecasting process. But this long run relationship in the variables under consideration is not always found. Luckily Johansen (1988, 1991) outlined a procedure that can help us determine if there is a long run relationship in the variables or not. This procedure necessitates us to work with multivariate models (vector autoregression). At the same time I believe that such large models can be much more beneficial compared to univariate models when the right variables are chosen. Thus in order to obtain reliable forecasts for inflation we need to use variables in our analysis that indeed have a significant influence on the inflation dynamics and/or vice versa. 
I will examine whether including this long run relationship in the model is more beneficial in predicting inflation or not. The estimation of these large models involve a large number of parameters. As a consequence forecasts can become imprecise using classical estimation methods, for example when classical methods are used to estimate cointegration models. For this reason I will also conduct a Bayesian analysis as a comparison. The Bayesian approach assumes that the parameters are unknown and are described by a probability distribution. The question is if this assumption can lead to more precise forecasts. Additionally this approach allows for the incorporation of prior information in the form of a prior distribution into the estimation process. I will examine the performance of the natural conjugate prior, the noninformative prior and the Minnesota prior in predicting inflation in Ghana. 3
  • 8. 1 INTRODUCTION The structure of this thesis is as follows. In the next section I will discuss economic devel- opments concerning inflation in Ghana. In the third section I will describe and investigate the data used for this study. Section four consist of deep cointegration analysis. In section five I will elaborate on the Bayesian estimation procedure and the different prior distributions. Finally, in section six I will compare the performance of these methods when forecasting inflation. To end, conclusions of the research are drawn in section seven. 4
  • 9. 2 Inflation in Ghana For the last three decades Ghanaian authorities have been steering Ghana’s economy in what seems to be the right direction. Figure 1 shows this growth in terms of GDP. GDP has risen from approximately 5 billion US dollars in 1980 to almost 48 billion dollars in 2013. In the early 1980’s the Ghanaian authorities understood that further economic growth was only possible if strong financial markets were developed in the country. They took important steps to achieve this goal. The first step was to liberalize the financial sector, exchange and credit controls. So government intervention was lessened, which allowed prices and the exchange rate to be determined through market forces. 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 0 5 10 15 20 25 30 35 40 45 50 Figure 1: Ghana’s GDP (in billions of dollars) from 1960 till 2013. In the early 1990’s the Bank of Ghana moved to an indirect control of liquidity by initiating open market operations, which made it possible for the central bank to issue treasury and central bank bills to regulate the money supply. Many reforms in the financial system followed, which led - among others - to renewed banking laws and a shift to a floating exchange rate construction. As a result of these changes the 2002 Bank of Ghana Act was brought to life. This led to the strengthening of the independence of the central bank. During this period of financial transformation the Bank of Ghana upheld a money target- ing regime. Under this regime, the Bank of Ghana used its instruments to control monetary aggregates, which were considered the main determinants of inflation in the long run. Thus steering monetary aggregates would be equivalent to stabilizing the inflation rate around the target value. However the ability of monetary aggregates to function effectively as intermediate targets is strongly based on the stability of their relationship to the goal variable, which is the inflation rate. There were and still are different opinions on whether a money targeting regime is effective on the long run or not. With the establishment of the monetary policy committee (MPC) in the year 2002, price stability became the primary monetary policy objective. This new course pushed growth and exchange rate stability to the background. The MPC has full responsibility over the monetary policy in the country. In their quest in achieving stable prices the Bank of Ghana analyzed a broader set of variables and developed the institutional structures needed to implement an inflation targeting policy. A inflation targeting policy is a monetary policy in which the central bank tries to steer the inflation rate towards a beforehand estimated target rate by making use of 5
  • 10. 2 INFLATION IN GHANA different monetary policy tools. On May 2007 the Bank of Ghana formally adopted an inflation targeting policy. However, this new policy did not prove to be as satisfactory as was hoped. When inflation targeting was initiated in 2007 inflation was approximately at a 10% level, but increased up to 20% around early 2009 (see Figure 2). 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 0 10 20 30 40 50 60 70 80 Figure 2: Ghana’s annual Inflation (%) from 1990 till 2014. As of now authorities are still having their hands full with controlling the ongoing rise of inflation in Ghana. Figure 2 shows how the inflation rate has been increasing rapidly since the beginning of the year 2013 and has been persistent ever since. The drawback in the Ghanaian economy can - according to the World Bank - be blamed for the most part on the large fiscal deficit (expenditures exceeding revenues). Further causes are the continued decrease of inter- national commodity prices of gold, cocoa and oil that account for more than 75% of Ghana’s exports. The Bank of Ghana has tried to fight this unpleasant event by recently raising its policy rate. To further tackle this hardship the Ghanaian government recently reached an agreement with the IMF concerning a new three-year funding deal. Under the terms of the agreement, inflation should fall to approximately 12% by the end of 2015. Taking this and other factors in consideration makes Ghana’s growth prospects positive in the long run. 6
3 Data Analysis

3.1 The Data

It is known to be difficult to acquire financial or macroeconomic data from developing countries. After an intensive search I was able to find the data needed for this thesis on the website of the Bank of Ghana. Not all desired data were found, as monthly observations of the GDP of Ghana are nowhere to be found. Erasmus and Abdalla (2005) had the same problem, but could still carry out their research. The rest of the data were fortunately found. The data used in this thesis contain monthly observations from July 1990 till November 2013 of the Consumer Price Index (CPI), the Exchange Rate (Ex) - the monthly averages of the inter-bank exchange rates GHC/USD - the Interest rate (IR) and the money supply (M2). This makes a total of 281 in-sample data points (see Appendix A). Six data points are stored as out-of-sample data, which I will be forecasting later on. These contain observations of the previously stated macroeconomic indicators from December 2013 till May 2014. I chose to start from 1990 because data from preceding years are not very credible. As explained in the previous section, financial markets in Ghana were extremely regulated by the authorities in those periods. First, some adjustments had to be made to the data, because the series were not always consistent or logical. A clear case of inconsistency was found in the CPI data, which were not adjusted for the change of base year from July 2013 onwards. A correction was needed here, otherwise our results would be unreliable. Another case worth mentioning is that some values of the M2 series were incorrectly multiplied by 10000. After I made these corrections, the data were ready to be used.

3.2 Time Series Properties

[Figure 3: Variables in log levels (panels: Log CPI, Log Ex, Log M2, Log IR).]
It is observable from Figure 3 that Log CPI, Log Ex and Log M2 contain a trend. Log Ex seems to be the most volatile series of these three, reflecting the possible presence of a stochastic trend. The graph of Log M2 on the other hand nearly resembles a straight line, reflecting the possible presence of a deterministic trend in this variable; however, this series may also contain both a deterministic and a stochastic trend. In any case all three time series contain an upward trend. Thus one may expect a positive relationship between each two of these three variables, if any such relationship exists. One may get the same expectation from the scatter plots displayed in Appendix B. The plot of Log IR in Figure 3 seems to have one or more structural breaks. In contrast to the other three variables, it is not immediately clear whether Log IR has a trend or not. But it is clearly not moving around one mean, indicating that the series as a whole is not stationary. When comparing Log IR to the other three variables there is no clear relationship to be seen, as is portrayed in Appendix B. The presence of a trend clearly indicates that we are dealing with non-stationary time series. Thus regressing these variables on each other could exhibit a spurious relationship, unless a cointegrating relationship can be found.

[Figure 4: Variables in log levels and first differences (panels: DLog CPI, DLog Ex, DLog M2, DLog IR).]

The first differences of our variables show an expected result, namely that these macroeconomic variables are stationary after differencing. The graph of DLog Ex in Figure 4 is the most convincing one. DLog Ex clearly seems to wander around zero, indicating a constant mean. Though less convincing, this can also be said of DLog M2 and DLog IR. DLog CPI also seems to be stationary, but has the most 'non-standard' distribution, with many observations equal to zero compared to the other differenced series. As mentioned before, I suspect the presence of one or more structural changes in Log IR. Testing this variable for the presence of a unit root using the Dickey-Fuller test can lead to unreliable conclusions. The reason for this is that the Dickey-Fuller test statistics are biased towards non-rejection of the null of a unit root when structural breaks are present. So the Dickey-Fuller test often does not reject the null of a unit root, while the series with structural breaks may be a stationary process between the moments when the structural breaks take place. However, one can observe from Figure 4 that there may be no structural changes in DLog IR, which is the variable on which I am going to compute the Augmented Dickey-Fuller test.
3.3 Augmented Dickey-Fuller Tests

Although Figure 3 and Figure 4 suggest that the series are I(1), we need a formal test to determine the order of integration of our time series. Hence I will be performing the Augmented Dickey-Fuller test (ADF). To get an idea of what this unit root test is about, let us consider the model y_t = \alpha_1 y_{t-1} + \varepsilon_t with t = 1, 2, \ldots, n, which is the most basic model to consider for an ADF test. After subtracting y_{t-1} from both sides we end up with \Delta y_t = \gamma y_{t-1} + \varepsilon_t with \gamma = \alpha_1 - 1. In order to test the null hypothesis that a unit root is present, we can test the hypothesis \gamma = 0, which is obviously equivalent to \alpha_1 = 1. I used the most basic model to explain the essence of unit root tests. However, for the ADF test I will be considering the following less basic regression equations:

\Delta y_t = \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t.   (3.3.1)

\Delta y_t = \alpha_0 + \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t.   (3.3.2)

\Delta y_t = \alpha_0 + \gamma y_{t-1} + \alpha_2 t + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t.   (3.3.3)

The first equation contains neither an intercept nor a linear time trend. The second equation contains an intercept and the last equation - which is the unrestricted ADF test - contains both an intercept and a linear time trend. Using the wrong regression equation will lead to bias in the estimates of the parameters, which in turn will decrease the reliability and the power of the unit root test. This is the main reason I chose to investigate all three possible models. Before doing so we need to know the proper amount of lags needed for these regression models.

3.3.1 Lag Selection

To determine the most suitable lag length for the models we will be inspecting the Akaike information criterion (AIC) and the Bayesian (Schwarz) information criterion (BIC), which are defined as follows (under the assumption that the error terms \varepsilon_t are normally distributed with constant variance):

AIC(p) = \log(\hat{\sigma}^2_p) + \frac{2p}{n}   and   BIC(p) = \log(\hat{\sigma}^2_p) + \frac{p \log(n)}{n},

where \hat{\sigma}^2_p is the maximum likelihood estimator of the variance of the error term in the model with p parameters. Once we have obtained these values the next step is to choose the model with the smallest AIC or BIC. Compared to the AIC, the BIC is more strict due to its larger penalty term (the second term on the right-hand side) when n > e^2, which holds for n \geq 8. This is of course the case, because we have n = 281. As a consequence the BIC prefers a smaller number of lags and thus chooses a smaller model than the AIC. The reasoning behind this is that with the AIC we assume that none of the models we are comparing actually is the true model, but instead the AIC method chooses the model that comes closest to the true model (in terms of the Kullback-Leibler divergence measure). Due to the complexity of the ’real
  • 14. 3 DATA ANALYSIS world’ the AIC is flexible in adding more explanatory variables. The BIC on the other hand assumes that one of the models we are comparing is the true model, thus answering a different question than AIC does. Because the true Data Generating Process (DGP) of my variables is unknown it may seem appropriate to choose the AIC as the decision criterion. However, we also report the values of the BIC. Due to the limited number of observations, the maximum number of lags is chosen to be 10. Table 1: The AIC and BIC values for LogCPI, LogEx, LogM2 and LogIR in the model with neither an intercept nor linear trend. LogCPI LogEx LogM2 LogIR Lags AIC BIC AIC BIC AIC BIC AIC BIC 0 -8.1004 -8.0874 -4.9215 -4.9085 -6.4174 -6.4044 -6.2484 -6.2354 1 -8.6849 -8.6589 -5.2537 -5.2276 -6.4148 -6.3888 -6.2489 -6.2228 2 -8.6881 -8.6490 -5.2435 -5.2044 -6.4229 -6.3838 -6.3039 -6.2648 3 -8.6793 -8.6270 -5.2534 -5.2011 -6.4179 -6.3655 -6.3222 -6.2698 4 -8.6864 -8.6208 -5.2452 -5.1797 -6.4220 -6.3564 -6.3585 -6.2929 5 -8.6851 -8.6062 -5.2401 -5.1612 -6.4171 -6.3382 -6.3475 -6.2686 6 -8.6742 -8.5819 -5.2383 -5.1460 -6.4063 -6.3140 -6.3734 -6.2811 7 -8.6679 -8.5621 -5.2351 -5.1293 -6.3954 -6.2896 -6.3640 -6.2583 8 -8.6859 -8.5666 -5.2259 -5.1066 -6.3911 -6.2718 -6.3564 -6.2371 9 -8.6983 -8.5653 -5.2212 -5.0883 -6.3913 -6.2584 -6.4056 -6.2727 10 -8.7598 -8.6132 -5.2112 -5.0646 -6.3834 -6.2368 -6.4171 -6.2705 Table 1 shows the results for the most restricted ADF model (3.3.1). According to the AIC values lag 10, 1, 2, and 10 2 are to be chosen for the ADF test for LogCPI, LogEx, LogM2 and LogIR respectively. Comparing these AIC results with the BIC counterpart (lag 1, 1, 0 and 4 for LogCPI, LogEx, LogM2 and LogIR respectively) I can conclude that the AIC and BIC methods both choose a larger amount of lags for the variable LogIR compared to the other variables and the same amount of lags for LogEx. It is also interesting to note that the BIC always chooses a smaller or equal amount of lags compared to the AIC, which I expected. Table 2: The AIC and BIC values for LogCPI, LogEx, LogM2 and LogIR in the model with only an intercept. LogCPI LogEx LogM2 LogIR Lags AIC BIC AIC BIC AIC BIC AIC BIC 0 -8.3272 -8.3013 -4.9218 -4.8959 -6.4567 -6.4307 -6.2425 -6.2165 1 -8.7407 -8.7017 -5.2801 -5.2411 -6.4647 -6.4256 -6.2435 -6.2045 2 -8.7348 -8.6826 -5.2709 -5.2187 -6.4660 -6.4138 -6.3008 -6.2486 3 -8.7331 -8.6677 -5.2723 -5.2069 -6.4688 -6.4034 -6.3214 -6.2560 4 -8.7528 -8.6741 -5.2622 -5.1835 -6.4885 -6.4098 -6.3594 -6.2807 5 -8.7438 -8.6517 -5.2540 -5.1619 -6.4787 -6.3867 -6.3483 -6.2562 6 -8.7374 -8.6319 -5.2486 -5.1431 -6.4751 -6.3696 -6.3746 -6.2692 7 -8.7265 -8.6075 -5.2437 -5.1247 -6.4732 -6.3542 -6.3649 -6.2459 8 -8.7294 -8.5968 -5.2343 -5.1017 -6.4947 -6.3622 -6.3566 -6.2241 9 -8.7318 -8.5856 -5.2317 -5.0855 -6.5397 -6.3935 -6.4021 -6.2559 10 -8.7824 -8.6225 -5.2215 -5.0616 -6.5461 -6.3862 -6.4121 -6.2521 Having the AIC as our decision criterion, I can conclude from table 2 that lags 10, 1, 10, and 10 are to be chosen for the ADF test for LogCPI, LogEx, LogM2 and LogIR respectively. The 2 Lag 0 means the model with no differenced lag term, lag 1 has 1 differenced lag term and so on. 10
  • 15. 3 DATA ANALYSIS results from the BIC on the other hand did not change after a constant term was added (3.3.2). Again according to the BIC I should choose lag 1, 1, 0 and 4 for LogCPI, LogEx, LogM2 and LogIR respectively. It seems that after adding the constant we need more regressors to explain ∆yt than was first the case, according to the AIC. Table 3: The AIC and BIC values for LogCPI, LogEx, LogM2 and LogIR in the model with an intercept and a linear trend. LogCPI LogEx LogM2 LogIR Lags AIC BIC AIC BIC AIC BIC AIC BIC 0 -8.3284 -8.2894 -4.9228 -4.8839 -6.4909 -6.4520 -6.2489 -6.2099 1 -8.7336 -8.6815 -5.2733 -5.2213 -6.4957 -6.4436 -6.2509 -6.1989 2 -8.7278 -8.6626 -5.2640 -5.1987 -6.4952 -6.4300 -6.3122 -6.2469 3 -8.7259 -8.6474 -5.2660 -5.1875 -6.4898 -6.4113 -6.3376 -6.2591 4 -8.7457 -8.6539 -5.2561 -5.1642 -6.4993 -6.4075 -6.3694 -6.2776 5 -8.7361 -8.6309 -5.2473 -5.1421 -6.4897 -6.3845 -6.3589 -6.2537 6 -8.7516 -8.6329 -5.2445 -5.1259 -6.4825 -6.3638 -6.3793 -6.2607 7 -8.7226 -8.5904 -5.2300 -5.0978 -6.4782 -6.3460 -6.3704 -6.2381 8 -8.6865 -8.5407 -5.2498 -5.1040 -6.4937 -6.3479 -6.3600 -6.2142 9 -8.6159 -8.4564 -5.2100 -5.0505 -6.5477 -6.3882 -6.4063 -6.2468 10 -8.6951 -8.5218 -5.1887 -5.0154 -6.4876 -6.3143 -6.4099 -6.2366 Now when looking at model (3.3.3), the AIC chooses lag 6, 1, 9, and 10 for the ADF test for LogCPI, LogEx, LogM2 and LogIR respectively. Again the decision according to the BIC remains the same. This means that the optimal number of lags according to the BIC is invariant to the model we use. 3.3.2 Model selection Now that we have determined the proper amount of lags, we need to select the most likely model out of the three models. For this purpose I will be comparing the AIC values once more. So for each variable the model I am going to use for the ADF test is the model with the smallest AIC value. Table 4: The AIC value and its designated amount of lags in parentheses. LogCPI LogEx LogM2 LogIR No constant and no trend -8.7598 (10) -5.2537 (1) -6.4229 (2) -6.4171 (10) With constant -8.7824 (10) -5.2801 (1) -6.5461 (10) -6.4121 (10) With constant and trend -8.7516 (6) -5.2733 (1) -6.5477 (9) -6.4099 (10) Table 4 is basically a summary of the results I obtained in the previous section. This framework makes it easier to see which model I should select. For LogCPI model (3.3.2) with 10 lags (p = 11) should be chosen. Model (3.3.2) should also be chosen for LogEx but with 1 lag (p = 2). Model (3.3.3) should be chosen for LogM2 with 9 lags (p = 10). Finally model (3.3.1) should be used for the ADF test for LogIR with 10 lags (p = 11). 11
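To make the lag-selection step concrete, the following is a minimal numpy sketch (not the code used for the thesis) that builds the ADF regressions (3.3.1)-(3.3.3) for a chosen number of lagged differences, estimates them by OLS and evaluates the AIC and BIC formulas of section 3.3.1. The series name `log_cpi` in the commented usage lines is a placeholder.

```python
import numpy as np

def adf_design(y, lags, trend="c"):
    """Design matrix for the ADF regression with `lags` lagged differences.

    trend: "n" (no deterministic terms), "c" (intercept) or "ct" (intercept and
    linear trend), mirroring equations (3.3.1)-(3.3.3)."""
    dy = np.diff(y)
    lhs = dy[lags:]                               # Delta y_t (sample shrinks with the lag order)
    cols = [y[lags:-1]]                           # y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:len(dy) - i])     # Delta y_{t-i}
    if trend in ("c", "ct"):
        cols.append(np.ones_like(lhs))
    if trend == "ct":
        cols.append(np.arange(1.0, lhs.size + 1))
    return lhs, np.column_stack(cols)

def info_criteria(y, lags, trend="c"):
    """AIC and BIC as defined in section 3.3.1, with p = number of parameters."""
    lhs, X = adf_design(np.asarray(y, dtype=float), lags, trend)
    beta, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / n                    # ML estimate of the error variance
    return np.log(sigma2) + 2 * p / n, np.log(sigma2) + p * np.log(n) / n

# hypothetical usage: `log_cpi` is assumed to be a 1-D array of the log CPI series
# for lag in range(11):
#     print(lag, info_criteria(log_cpi, lag, trend="c"))
```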
  • 16. 3 DATA ANALYSIS 3.3.3 Test Results Table 5: ADF test results with amount of lags in parentheses. LogCPI LogEx LogM2 LogIR H0 = I(1) and H1 = I(0) DF critical values (5% significance) -2.88 -2.88 -3.43 -1.95 t-statistic -2.0124 -2.2888 -1.3734 -0.0598 ˆγ -0.0013 (10) -0.0077 (1) -0.0418 (9) -0.0007 (10) H0 = I(2) and H1 = I(1) DF critical values (5% significance) -2.88 -2.88 -3.43 -1.95 t-statistic -3.0189 -30.9775 -10.9548 -2.0401 ˆγ -0.2508 (9) -1.5497 (0) -1.4259(3) -0.2307 (9) The results from the ADF test are depicted in table 5. The upper half of table 5 shows results from the test of the time series LogCPI, LogEx, LogM2 and LogIR being stationary or not. To be precise, I tested the null of the presence of a unit root (H0 = I(1)) compared to the alternative of the absence of a unit root (H1 = I(0)) in our series. I used the AIC criterion to choose which model to use for the ADF tests as mentioned before. Interesting to note is that apart from the amount of observations we have and the confidence interval we are considering, the critical values from the ADF test differ between the types of model we choose. As can be seen from the upper half of table 5 the absolute values of all t-statistics are below its corresponding critical value. Therefore we cannot reject the null hypothesis of a unit root in any of the time series. To determine the order of differentiation needed for the series to be stationary, I also con- ducted a second test depicted in the lower half of table 5. So I tested the null of the series being I(2) versus the alternative of the series being I(1) (which we could not reject in the previous test). So I am basically testing whether the series ∆LogCPI, ∆LogEx, ∆LogM2 and ∆LogIR are stationary or not. Here I used the same models as with the first test3 but different lags (shown in parentheses) according to the smallest AIC values in these new models. As can be seen from the lower half of table 5 the absolute values of all t-statistics are well above its corresponding critical value. Thus we must reject the null hypothesis of the time series being I(2). According to the ADF tests I therefore conclude that my time series LogCPI, LogEx, LogM2 and LogIR are I(1). 3.4 KPSS Test To be more certain about the previous conclusion of the variables being stationary I con- ducted the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test. This test can be seen as the com- plement of the previous unit root test. The KPSS test examines the null hypothesis of the variables being stationary instead of not being stationary, which was the case with the ADF test. The results of these tests are depicted in Appendix D. Looking at the case where we test the null of stationarity and including an intercept it is clear that the null is rejected for all our variables4. Now focusing on the case where an intercept and trend is included in the model, the null could not be rejected for LogIR when considering the 1% significance level5. For all 3 According to the same procedure done for LogCPI, LogEx, LogM2 and LogIR, the optimal models from the AIC values remained the same. 4 The critical values for the 10%, 5% and 1% significance levels are 0.347, 0.463 and 0.739 respectively (only intercept included). 5 The critical values for the 10%, 5% and 1% significance levels are 0.119, 0.146 and 0.216 respectively (intercept and trend included). 12
  • 17. 3 DATA ANALYSIS other significance levels the null hypothesis of the variables being stationary is rejected for all the variables. According to this first test I am able to conclude that the variables are not sta- tionary. The second test which is also depicted in Appendix D (second half of the table) tests the null hypothesis of the variables being I(1). Here I expected the critical values to be small because then we would not be able to reject the null of the variables being integrated of order one. This would strengthen the credibility of the obtained results from the ADF test. However the results are not entirely satisfactory. For LogCPI the null is rejected in the case an intercept is included but not rejected when we also include a trend. This contradicts the previous result that the model including only an intercept is the most appropriate model for LogCPI, which was concluded in the previous section. The null is clearly not rejected for LogEx when the model with intercept and trend is considered. The null hypothesis is rejected for LogM2 for both model types. Finally the null is never rejected for LogIR. Although the null hypothesis of the variables being I(1) was rejected for LogM2 I still argue that it makes sense to assume that the variables are I(1). This is due to the fact that I was very precise in finding the most appropriate model for the variables in the previous section where I computed the ADF test, and due to the fact that at 5% significance we have a probability of 5% that a true null hypothesis is rejected, so that (especially since we perform many tests) we may expect to make some Type 1 errors. 13
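Both unit-root checks can also be reproduced with standard library routines. The following is a hedged sketch using statsmodels' `adfuller` and `kpss` functions; the series names in the commented loop are placeholders, and the automatic lag choices may differ slightly from the hand-selected models above.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

def unit_root_summary(x, name):
    """ADF (H0: unit root) and KPSS (H0: stationarity) for a single series."""
    # ADF with an intercept, lag length chosen by AIC up to 10 lags
    adf_stat, adf_p, usedlag, nobs, crit, _ = adfuller(x, maxlag=10,
                                                       regression="c", autolag="AIC")
    # KPSS around a level ("c") and around a deterministic trend ("ct")
    kpss_c, p_c, _, _ = kpss(x, regression="c", nlags="auto")
    kpss_ct, p_ct, _, _ = kpss(x, regression="ct", nlags="auto")
    print(f"{name}: ADF={adf_stat:.3f} (p={adf_p:.3f}, lags={usedlag}), "
          f"KPSS level={kpss_c:.3f} (p={p_c:.3f}), KPSS trend={kpss_ct:.3f} (p={p_ct:.3f})")

# series names are placeholders for the four log-level series
# for name, x in [("LogCPI", log_cpi), ("LogEx", log_ex), ("LogM2", log_m2), ("LogIR", log_ir)]:
#     unit_root_summary(x, name)
#     unit_root_summary(np.diff(x), "D" + name)   # also test the first differences
```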
4 Testing for Cointegration

There are several ways to test our variables for the existence of a cointegration relationship. In the literature two methods are frequently discussed, namely the Johansen methodology and the Engle and Granger methodology. The Engle and Granger method basically performs a regression of one series on the other series (and a constant) and then tests the residuals for stationarity. This methodology however has some defects I find very disturbing. First of all it necessitates us to put one variable on the left-hand side and use the other variables as regressors. Asymptotically it does not matter which variable we place on the left- or right-hand side, as long as all variables are I(1) and all variables occur in the cointegration relationship. Unfortunately, sample sizes in practice are rarely large enough for this asymptotic property to hold. This means that the role of a variable - whether it is a dependent variable or a regressor - has influence on the values of the cointegrating vector and therefore also on the cointegration test. This is of course a problem when we have three or more variables, as is the case in my research. Another drawback of the Engle and Granger approach is that it relies on a two-step estimator. To clarify, consider the equation y_t = \beta_0 + \beta_1 z_t + \beta_2 w_t + \varepsilon_t, with y_t, z_t and w_t all I(1) variables. According to the Engle and Granger approach the linear combination of integrated variables y_t - \beta_0 - \beta_1 z_t - \beta_2 w_t is stationary if \varepsilon_t is stationary. This would mean that there is a cointegration relationship amongst the variables with cointegrating vector \beta = (1, -\beta_0, -\beta_1, -\beta_2). To ascertain whether \varepsilon_t is indeed I(0), the ADF test is performed using the equation \Delta \hat{e}_t = \alpha_1 \hat{e}_{t-1} + \ldots. Thus to obtain an estimate of the coefficient \alpha_1 we need to estimate a regression which uses residuals from another regression. The problem here is that errors made in the first regression are passed on to the next regression. Although it was at first my intention to perform the Engle and Granger test, I decided not to pursue this due to the difficulties mentioned. Instead I decided to perform the Johansen approach for cointegration testing, which is a bit more complicated compared to the Engle and Granger method but much more effective here.

4.1 Methodology

4.1.1 Vector Error Correction Model (VECM)

To illustrate this approach let us consider the VAR(p) model:

y_t = A_0 + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \varepsilon_t,   t = 1, 2, \ldots, T,   (4.1.1)

with y_t a (n \times 1) vector containing n I(1) variables observed at time t, A_0 a (n \times 1) vector of intercept terms, A_i a (n \times n) matrix of coefficients with i = 1, 2, \ldots, p and \varepsilon_t a (n \times 1) vector of error terms at time t. The error terms are independently, identically and normally distributed with variance-covariance matrix \Sigma, \varepsilon_t \sim IID(0, \Sigma). Equation (4.1.1) can be re-written by adding and subtracting A_p y_{t-p+1} (= A_p y_{t-(p-1)}) on the right-hand side to obtain

y_t = A_0 + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_{p-2} y_{t-p+2} + (A_{p-1} + A_p) y_{t-p+1} - A_p \Delta y_{t-p+1} + \varepsilon_t,

and after adding and subtracting (A_{p-1} + A_p) y_{t-p+2} we obtain

y_t = A_0 + A_1 y_{t-1} + A_2 y_{t-2} + \cdots - (A_{p-1} + A_p) \Delta y_{t-p+2} - A_p \Delta y_{t-p+1} + \varepsilon_t.

This procedure can be repeated to obtain the Vector Error-Correction Model (VECM):

\Delta y_t = A_0 + \pi y_{t-1} + \sum_{i=1}^{p-1} \pi_i \Delta y_{t-i} + \varepsilon_t,   (4.1.2)
with \pi = -(I - \sum_{i=1}^{p} A_i) and \pi_i = -\sum_{j=i+1}^{p} A_j. The left-hand side of (4.1.2) consists of stationary variables. On the right-hand side I have a constant, the \pi y_{t-1} term and the rest are all stationary variables (including the error term). Thus the term \pi y_{t-1} must also be stationary for the variables to be cointegrated, implying a long-run equilibrium relationship amongst the variables. Because y_{t-1} is non-stationary (I(1)), the stationarity of the term \pi y_{t-1} depends solely on the matrix of coefficients \pi. This matrix contains cointegrating vectors if indeed a long-run equilibrium exists amongst the time series. The number of cointegrating vectors is equal to the rank of the matrix \pi. This rank can take three different forms. In the first extreme case, where the rank of the coefficient matrix is equal to zero - meaning that there are no linearly independent columns in \pi - we have that each element of the coefficient matrix must be equal to zero. Analogous to the univariate situation (\gamma = 0 in the previous section) this means that all y_{it} sequences are non-stationary and that no linear combination of the y_{it} processes can be found that is stationary. This results in the variables not being cointegrated. In the second extreme case, where the rank is equal to n, we have that the matrix of coefficients is of full rank. This is only the case when all y_{it} sequences are stationary, which is obviously not the case. We are interested in the case where the matrix has reduced rank 1 \leq r \leq n - 1. In this case there exist r cointegrating relations. The idea is that, when dealing with reduced rank, we can write our matrix of coefficients as the product of two (n \times r) matrices, as follows:

\pi = \alpha \beta'.   (4.1.3)

Here \beta is the (n \times r) matrix of cointegration coefficients, so that \beta' is (r \times n), and \alpha is the (n \times r) matrix of weights. The latter can be interpreted as the matrix of speed-of-adjustment (to the long-run relations) parameters. The columns of \beta form a basis for the r cointegrating vectors. Both matrices have rank r. If we substitute (4.1.3) into (4.1.2) we get:

\Delta y_t = A_0 + \alpha \beta' y_{t-1} + \sum_{i=1}^{p-1} \pi_i \Delta y_{t-i} + \varepsilon_t.   (4.1.4)

The VECM in (4.1.4) implies that each element of the r-dimensional vector \beta' y_{t-1} is stationary, meaning that there exist r - not necessarily unique (if \beta' y_t \sim I(0), then for any scalar c the linear combination c\beta' y_t \sim I(0)) - cointegration relations.

4.1.2 Johansen Test Statistics

After estimating \pi we can get hold of the (estimated) characteristic roots of the matrix. We need these characteristic roots because, by testing their significance, we can determine the number of cointegrating vectors. To be more specific, we want to find out how many characteristic roots of \pi differ from zero, or equivalently we could try to determine the number of roots of \pi + I = \sum_{i=1}^{p} A_i that insignificantly differ from unity. This number is equal to the rank of the matrix \pi. Johansen proposed two test statistics testing the number of characteristic roots that are insignificantly different from one:

\lambda_{trace}(r) = -T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i),   (4.1.5)

\lambda_{max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1}),   (4.1.6)
where \hat{\lambda}_i, with i = 1, 2, \ldots, n, are the estimated values of the characteristic roots (eigenvalues of the matrix discussed below) obtained from the estimated \pi, and T is the number of used observations. \lambda_{trace} tests the null hypothesis that the number of different cointegrating vectors is less than or equal to r against a general alternative. \lambda_{max} on the other hand tests the null hypothesis that the number of cointegrating vectors is r against the alternative of r + 1 cointegrating vectors. Unlike conventional tests, the distribution of these statistics is not a known, 'standard' distribution. The critical values of both statistics are obtained using the Monte Carlo approach.

4.1.3 Calculating the characteristic roots

If we assume that the most appropriate lag length p is known, we can calculate the characteristic roots of \pi by means of the Frisch-Waugh (partial regression) method. The idea is to run a regression of e_{1t} on e_{2t} according to:

e_{1t} = \pi e_{2t} + \xi_t.

The two residual series e_{1t} and e_{2t} are obtained by estimating the VAR in first differences and by regressing y_{t-1} on its lagged differenced values, respectively. So the following two regressions are performed:

\Delta y_t = B_0 + B_1 \Delta y_{t-1} + B_2 \Delta y_{t-2} + \cdots + B_{p-1} \Delta y_{t-p+1} + e_{1t},

y_{t-1} = C_0 + C_1 \Delta y_{t-1} + C_2 \Delta y_{t-2} + \cdots + C_{p-1} \Delta y_{t-p+1} + e_{2t}.

The next step is to compute the squares of the canonical correlations between e_{1t} and e_{2t}. The canonical correlations in our case are the n values \lambda_i. They are the solutions to the equation

\left| \lambda_i S_{22} - S_{12} S_{11}^{-1} S_{12}' \right| = 0,

where S_{ii} = T^{-1} \sum_{t=1}^{T} e_{it} e_{it}' and S_{12} = T^{-1} \sum_{t=1}^{T} e_{2t} e_{1t}'. The last step is to find the n columns v_i that are nontrivial solutions of

\lambda_i S_{22} v_i = S_{12} S_{11}^{-1} S_{12}' v_i.

These columns are the maximum likelihood estimates of the cointegrating vectors. Note that the \lambda_i are the eigenvalues of S_{22}^{-1} S_{12} S_{11}^{-1} S_{12}', and that the v_i are the corresponding eigenvectors.
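For concreteness, here is a minimal numpy/scipy sketch (not the thesis code) of the procedure just described: the two auxiliary regressions, the moment matrices S_11, S_22 and S_12, the generalized eigenvalue problem, and the \lambda_{trace} and \lambda_{max} statistics of (4.1.5)-(4.1.6). The function name and the way the four series are stacked are assumptions; the critical values still have to come from tabulated (Monte Carlo) values.

```python
import numpy as np
from scipy.linalg import eigvals

def johansen_eigs(Y, p):
    """Characteristic roots and Johansen statistics, following section 4.1.3.

    Y : (T, n) array of the variables in log levels (column order is an
        assumption, e.g. LogCPI, LogEx, LogM2, LogIR); p : VAR lag order.
    Returns (eigenvalues sorted descending, lambda_trace(r), lambda_max(r))."""
    dY = np.diff(Y, axis=0)
    T_eff = dY.shape[0] - (p - 1)
    # common regressors: a constant and Delta y_{t-1}, ..., Delta y_{t-p+1}
    Z = np.hstack([np.ones((T_eff, 1))] +
                  [dY[p - 1 - i:dY.shape[0] - i] for i in range(1, p)])
    lhs = np.column_stack([dY[p - 1:], Y[p - 1:-1]])      # [Delta y_t , y_{t-1}]
    resid = lhs - Z @ np.linalg.lstsq(Z, lhs, rcond=None)[0]
    n = Y.shape[1]
    e1, e2 = resid[:, :n], resid[:, n:]                   # e_{1t}, e_{2t}
    S11 = e1.T @ e1 / T_eff
    S22 = e2.T @ e2 / T_eff
    S12 = e2.T @ e1 / T_eff                               # T^{-1} sum_t e_{2t} e_{1t}'
    lam = np.sort(np.real(eigvals(S12 @ np.linalg.inv(S11) @ S12.T, S22)))[::-1]
    lam_trace = np.array([-T_eff * np.log(1 - lam[r:]).sum() for r in range(n)])
    lam_max = -T_eff * np.log(1 - lam)
    return lam, lam_trace, lam_max

# hypothetical usage:
# lam, tr, mx = johansen_eigs(np.column_stack([log_cpi, log_ex, log_m2, log_ir]), p=18)
```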
  • 21. 4 TESTING FOR COINTEGRATION 4.2 Model Selection Now that the method has been explained, let us start with the first step, which is to determine the amount of lags needed in equation (4.1.1). To do so I am going to estimate the restricted and unrestricted VARs (in which the error terms ε1t or ε2t are assumed to be identically and independently distributed with a multivariate normal distribution): yt = A0 + s i=1 Aiyt−i + ε1t, s = 1, 2, . . . (4.2.1) yt = A0 + l i=1 Aiyt−i + ε2t, l = 2, 3, . . . (4.2.2) using undifferenced data, respectively. According to the literature this is the most common procedure. Here s and l are the number of lags in the restricted and unrestricted model re- spectively, with l > s. If we define the variance-covariance matrix of the residuals from model (4.2.1) as Σr and that of model (4.2.2) as Σu we can compute the likelihood ratio test statistic recommended by Sims (1980): (T − c)(log |Σr| − log |Σu|) (4.2.3) where T is the number of observations (i.e., the number of periods), c is the number of param- eters in the unrestricted model in each equation and log |Σk| is the natural logarithm of the determinant of Σk, which I estimate by means of OLS. This alternative LR statistic can be viewed as a small-sample correction of the LR test (where without a correction we would have the factor T instead of (T − c)). The probability distribution of this statistic is approximately a chi-squared distribution with degrees of freedom equal to the number of coefficient restric- tions. Large values of this statistic indicate that the null hypothesis that the restricted model should be used, should be rejected. The critical value is from the chi-squared distribution (with the degrees of freedom parameter equal to the number of coefficients that is set to zero in the restricted model, as compared with the unrestricted model). Table 6: Likelihood Ratio test results. Hypothesis LR-Statistic (Degrees of Freedom) Critical value (5% significance)) Decision 2 lags against 1 lags 206.1896 (16) 26.296 Reject 1 lag 3 lags against 2 lags 105.5431 (16) 26.296 Reject 2 lags 4 lags against 3 lags 60.6647 (16) 26.296 Reject 3 lags 5 lags against 4 lags 46.3607 (16) 26.296 Reject 4 lags 6 lags against 5 lags 11.1580 (16) 26.296 Do not reject 5 lags 7 lags against 5 lags 55.0583 (32) 46.194 Reject 5 lags 8 lags against 7 lags 33.2695 (16) 26.296 Reject 7 lags 9 lags against 8 lags 53.7635 (16) 26.296 Reject 8 lags 10 lags against 9 lags 34.6816 (16) 26.296 Reject 9 lags 11 lags against 10 lags 46.6505 (16) 26.296 Reject 10 lags 12 lags against 11 lags 8.7503 (16) 26.296 Do not reject 11 lags 13 lags against 11 lags 78.5158 (32) 46.194 Reject 11 lags 14 lags against 13 lags 25.3731 (16) 26.296 Do not reject 13 lags 15 lags against 13 lags 80.3953 (32) 46.194 Reject 13 lags 16 lags against 15 lags 28.7579 (16) 26.296 Reject 15 lags 17 lags against 16 lags 16.6836 (16) 26.296 Do not reject 16 lags 18 lags against 16 lags 51.1282 (32) 46.194 Reject 16 lags 19 lags against 18 lags 10.7015 (16) 26.296 Do not reject 18 lags 20 lags against 18 lags 26.7273 (32) 46.194 Do not reject 18 lags 17
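Before turning to the results in Table 6, here is a hedged sketch of the small-sample-corrected LR statistic (4.2.3). In this sketch I read T as the number of observations actually used and c as the number of parameters per equation of the unrestricted model - both readings are assumptions - and both models are fitted on the sample implied by the larger lag order so that their likelihoods are comparable. All names are placeholders.

```python
import numpy as np
from scipy.stats import chi2

def var_resid_cov(Y, lags, start):
    """OLS residual covariance of a VAR with intercept; `start` fixes the first
    usable observation so that both models are fitted on the same sample."""
    T = Y.shape[0]
    X = np.hstack([np.ones((T - start, 1))] +
                  [Y[start - i:T - i] for i in range(1, lags + 1)])
    U = Y[start:] - X @ np.linalg.lstsq(X, Y[start:], rcond=None)[0]
    return U.T @ U / (T - start)

def sims_lr_test(Y, s, l, alpha=0.05):
    """LR test of VAR(s) (restricted) against VAR(l) (unrestricted), l > s,
    with the (T - c) correction of equation (4.2.3)."""
    n = Y.shape[1]
    T_used = Y.shape[0] - l
    c = 1 + n * l                              # parameters per equation, unrestricted model
    stat = (T_used - c) * (np.log(np.linalg.det(var_resid_cov(Y, s, start=l)))
                           - np.log(np.linalg.det(var_resid_cov(Y, l, start=l))))
    df = n * n * (l - s)                       # e.g. 16 restrictions per extra lag when n = 4
    return stat, chi2.ppf(1 - alpha, df)

# hypothetical usage:
# stat, crit = sims_lr_test(np.column_stack([log_cpi, log_ex, log_m2, log_ir]), s=5, l=6)
```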
  • 22. 4 TESTING FOR COINTEGRATION The results from the likelihood ratio tests are displayed in table 6. The table shows that the LR test could not reject the null hypothesis of using a restricted model with 5 lags against the alternative of using an unrestricted model with 6 lags. Now testing for the null of 5 lags compared to the alternative of 7 lags, the null of 5 lags is rejected. However the null of 7 lags is rejected right away. The same goes for the null of 8, 9 and 10 lags. Testing the null of 11 lags against the alternative of 12 lags, I had to conclude not to reject the null of 11 lags. The null of 11 lags is however rejected when tested against the alternative of 13 lags. The null of 13 lags is not rejected when tested against the alternative of 14 lags. The null of 13 lags is eventually rejected when tested against the alternative of 15 lags. The null of 15 lags is rejected right away when tested against the alternative of 16 lags. The null of 16 lags is not rejected against the alternative of 17 lags but it is rejected when tested against the alternative of 18 lags. The null of 18 lags is never rejected compared to any alternative with more than 18 lags. So according to this alternative LR test I conclude that 18 lags is the most appropriate amount of lags to be used in our analysis for this particular dataset. 4.3 Empirical Results Table 7: The λtrace and λmax tests Null Hypothesis Alternative Hypothesis 1% critical value 5% critical value λtrace tests λtrace value r = 0 r > 0 60.043 54.46 47.21 r ≤ 1 r > 1 19.266 35.65 29.68 r ≤ 2 r > 2 6.837 20.04 15.41 r ≤ 3 r > 3 5.436 6.65 3.76 λmax tests λmax value r = 0 r = 1 40.777 32.24 27.07 r = 1 r = 2 12.430 25.52 20.97 r = 2 r = 3 1.400 18.63 14.07 r = 3 r = 4 5.436 6.65 3.76 After estimating the characteristic roots7according to the procedure mentioned previously, I computed the Johansen test statistics. Table 7 shows the results from the λtrace and λmax tests. Let us first focus on the test results concerning the 5% critical value. Since 60.043 is larger than the 5% critical value of the λtrace, the test rejects the null hypothesis of no cointegration and therefore accepts the alternative of one or more cointegrating vectors. Looking at the λmax test we have a similar conclusion as 40.777 is larger than the 5% critical value of this test. This leads to the rejection of the null of no cointegration and the acceptance of the alternative of having one cointegrating vector. So both tests reject the notion of no cointegration in our variables for the 5% critical value. Now looking at λtrace(1) - which tests the null of r ≤ 1 against the alternative of two, three or four cointegrating vectors - we can not reject the null hypothesis at the 5% critical value. Reason for this is the fact that the statistic 19.266 does not exceed the 5% critical value. So according to the λtrace(0) and λtrace(1) test we have one cointegrating vector. The λmax(1) test can not reject the null hypothesis of having exactly one cointegrating vector, because 12.430 does not exceed the 5% critical value. Combining the results from both tests up till now we can conclude that we have one cointegrating vector. 7 See Appendix C for the values 18
  • 23. 4 TESTING FOR COINTEGRATION Considering the λtrace(2) test - with the null hypothesis of 2 or less cointegrating vectors - again due to the fact that the statistic of 6.837 is smaller than the 5% critical value we can not reject the null. So the notion of having more than two cointegrating vectors is not accepted. The λmax(2) statistic of 1.400 clearly does not exceed the 5% critical value. So according to this test we can not reject the null of having exactly two cointegrating vectors. Combining the results from both tests and the previous ones we can confirm that there is one cointegrating vector. The next tests - which are the λtrace(3) and λmax(3) tests - should logically also not reject the null of having less than three cointegrating vectors and exactly three cointegrating vectors respectively, against the alternatives. Combining this result with the previous ones would lead me to conclude that we have exactly one cointegrating vector. However for both cases I get a statistic of 5.436 which clearly exceeds their corresponding 5% critical value. So according to this last test the null must be rejected and thus accepting the alternative of having four cointegrating vectors. This is quite an unexpected result because, as previously mentioned, the alternatives of having more than one and two cointegrating vectors were not accepted. Thus considering the 5% critical values the Johansen test could not give a plausible result for the number of cointegrating vectors in our model. I therefore considered the smaller 1% significance level. Its critical values are also displayed in table 7. Considering this significance level the null of no cointegration is rejected. Again both λtrace(0) and λmax(0) statistics exceed the corresponding critical values. The next tests computed all could not reject the null hypothesis. This means that the λmax test could not reject the notion of there being exactly one, two and three cointegrating vectors. Only considering this test would not be very insightful. But combining these results with the results from the λtrace test could be. The λtrace test could not reject the null of having one or less, two or less and three or less cointegrating vectors at the 1% significance level. So combining all these results as before, I conclude that there is one cointegrating vector. Selecting r = 1 the estimated normalized8 cointegrating vector and speed of adjustment parameters are displayed in Appendix C. 8 Normalized with respect to the first element of β. 19
4.4 Granger Causality

The mechanism that binds cointegrated series together is called error correction, which can be considered a specific form of Granger causality for (some of) the involved variables (Granger 1988). When considering two series, Granger causality can be seen as the event that turning points in one series precede turning points in the other series. If taking into account the past of a series w_t does not improve the forecasting performance of a series z_t (given the past of z_t itself), then w_t does not Granger cause z_t. So basically Granger causality considers the effects of past values of w_t on the current value of z_t (in addition to the effects of the past values of z_t on z_t). It measures whether current and past values of w_t help to forecast future values of z_t. To bring it back to our case, I wish to find out whether the variables LogEx, LogM2 and LogIR Granger cause LogCPI. But first let us rewrite the VECM in equation (4.1.2) with p = 18 as:

\begin{pmatrix} \Delta y_{1t} \\ \Delta y_{2t} \\ \Delta y_{3t} \\ \Delta y_{4t} \end{pmatrix} = \begin{pmatrix} a_{10} \\ a_{20} \\ a_{30} \\ a_{40} \end{pmatrix} + \begin{pmatrix} a_{11}(1) & a_{12}(1) & a_{13}(1) & a_{14}(1) \\ a_{21}(1) & a_{22}(1) & a_{23}(1) & a_{24}(1) \\ a_{31}(1) & a_{32}(1) & a_{33}(1) & a_{34}(1) \\ a_{41}(1) & a_{42}(1) & a_{43}(1) & a_{44}(1) \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \\ y_{4,t-1} \end{pmatrix} + \begin{pmatrix} a_{11}(2) & \cdots & a_{14}(2) \\ \vdots & \ddots & \vdots \\ a_{41}(2) & \cdots & a_{44}(2) \end{pmatrix} \begin{pmatrix} \Delta y_{1,t-1} \\ \vdots \\ \Delta y_{4,t-1} \end{pmatrix} + \cdots + \begin{pmatrix} a_{11}(18) & \cdots & a_{14}(18) \\ \vdots & \ddots & \vdots \\ a_{41}(18) & \cdots & a_{44}(18) \end{pmatrix} \begin{pmatrix} \Delta y_{1,t-17} \\ \vdots \\ \Delta y_{4,t-17} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \\ \varepsilon_{4t} \end{pmatrix},

with y_{1t}, y_{2t}, y_{3t} and y_{4t} equal to LogCPI, LogEx, LogM2 and LogIR at time t respectively. In this form it is clear that at each time t we are dealing with four equations. The null hypothesis that lags of LogEx, LogM2 or LogIR do not Granger cause LogCPI is properly tested using the likelihood ratio test stated in (4.2.3). We need to estimate the \Delta y_{1t} (LogCPI) equation - which is the first of the four equations above - using all lagged values, to calculate \Sigma_u. We then have to estimate the same equation but now excluding the lagged values of the variable in consideration (y_{2t}, y_{3t} or y_{4t}), to calculate \Sigma_r. Again this statistic has (asymptotically, under the null hypothesis of no Granger causality) a chi-squared distribution with degrees of freedom equal to p = 18 (this is the number of restricted variables in the equation of \Delta y_{1t}).

Table 8: Likelihood Ratio test results for Granger Causality.

Null Hypothesis                              LR-Statistic   Decision
LogEx does not Granger cause LogCPI          1291.65        Reject Null
LogM2 does not Granger cause LogCPI          1669.82        Reject Null
LogIR does not Granger cause LogCPI          1639.57        Reject Null

The results from the Granger causality tests are displayed in table 8. With 18 degrees of freedom and a 5% significance level the chi-squared critical value is 28.869. The obtained LR-statistics are clearly much larger than this critical value, leading me to reject all three null hypotheses. Thus, according to the test, the notion that LogEx, LogM2 and LogIR do not Granger cause LogCPI is rejected. Though the LR-statistics are very large, the results are of course not surprising: these variables are all economically related to each other.
The obtained results from the Granger causality tests confirm that using these variables in my analysis and later on during the forecasting segment is a valid decision. 20
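A minimal sketch of this single-equation Granger causality test, under the assumption that the unrestricted regression for ΔLogCPI contains a constant, the lagged levels of all variables and p−1 lagged differences of all variables, and that the restricted regression drops every term involving the variable under consideration (p coefficients in total). Names and the small-sample correction are assumptions, not necessarily the exact choices behind Table 8.

```python
import numpy as np
from scipy.stats import chi2

def granger_lr(Y, caused, causing, p=18):
    """LR test of 'column `causing` does not Granger cause column `caused`'.

    Y : (T, n) array of the variables in log levels; `caused`/`causing` are
    column indices (e.g. 0 for LogCPI, 1 for LogEx)."""
    dY = np.diff(Y, axis=0)
    T_eff = dY.shape[0] - (p - 1)
    lhs = dY[p - 1:, caused]

    def rss(exclude):
        keep = [j for j in range(Y.shape[1]) if j != exclude]
        cols = [np.ones((T_eff, 1)), Y[p - 1:-1][:, keep]]          # constant, levels y_{t-1}
        for i in range(1, p):
            cols.append(dY[p - 1 - i:dY.shape[0] - i][:, keep])     # Delta y_{t-i}
        X = np.hstack(cols)
        b = np.linalg.lstsq(X, lhs, rcond=None)[0]
        e = lhs - X @ b
        return e @ e, X.shape[1]

    rss_u, k_u = rss(exclude=None)
    rss_r, _ = rss(exclude=causing)
    # single-equation analogue of (4.2.3): |Sigma| reduces to the residual variance
    stat = (T_eff - k_u) * (np.log(rss_r / T_eff) - np.log(rss_u / T_eff))
    return stat, chi2.ppf(0.95, p)

# hypothetical usage:
# stat, crit = granger_lr(np.column_stack([log_cpi, log_ex, log_m2, log_ir]), caused=0, causing=1)
```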
5 Bayesian Estimation

Up till now the estimation of the VECM has been done using the method of maximum likelihood. This classical method assumes the parameters to be fixed but unknown. This section focuses on the Bayesian estimation method. In the Bayesian approach the data generating process of the population is assumed uncertain and the parameters are described by a prior probability distribution instead of fixed values. This distribution is usually very broad, to reflect the fact that the true values of the parameters are unknown. To better understand this approach, let us consider the VAR(p) model in equation (4.1.1). Classical estimation of this model can lead to a problem called overfitting. Overfitting may especially occur when the number of parameters (n + n^2 p) is large compared to the number of observations Tn. If this is the case, the estimates are influenced by noise rather than by signal. This may especially occur when the estimation method is designed to fit the data as closely as possible. To reduce the dimension of the parameter space, we could impose restrictions on this space. The problem is then to find restrictions that are as credible as possible. The Bayesian approach avoids overfitting not necessarily by imposing zero restrictions on the parameters, but by making the parameters vary within a certain range. As such, the uncertainty about the exact values of the model's parameters can be seen as a probability distribution for the parameter vector. So in a sense this distribution represents the degree of uncertainty about the parameters and is amended by the information contained in the data if the prior information is different from the information obtained from the data. As long as the prior information is not too vague or noninformative, it should be amended only by the signal and not by the noise contained in the sample. One can imagine that the choice of the prior distribution is a crucial step in the computation of the Bayes estimates. This step summarizes the uncertainty we have over the model parameters. In this light I find it important to consider more than one prior distribution. Before doing so, let us take a better look at what the Bayesian treatment of a VAR entails.

5.1 Basic Principle

Let us consider the VAR(p) model in (4.1.1), assuming that the variables are stationary. So in practice we are considering the model:

\Delta y_t = A_0 + A_1 \Delta y_{t-1} + A_2 \Delta y_{t-2} + \cdots + A_p \Delta y_{t-p} + \varepsilon_t,   t = 2, \ldots, T,   (5.1.1)

but for notational convenience we will be working with y_t - instead of \Delta y_t - and assume it is a (n \times 1) vector in the rest of this section. Note that I do not consider the Bayesian estimation of a VECM with a cointegration relation, even though there seems to be a cointegration relation in the data. The reason for this is that I consider only models and priors that lead to posteriors which can be analytically evaluated. I leave the Bayesian estimation of a VECM with a cointegration relation for these data as a topic for further research. Now define X_t = (1, y_{t-1}', y_{t-2}', \ldots, y_{t-p}'), which is a (1 \times k) vector with k = 1 + np, and

X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_T \end{pmatrix},

which is a (T \times k) matrix. Let us further define \alpha = vec(A), which is a (nk \times 1) vector, with A = (A_0, A_1, \ldots, A_p)' a (k \times n) matrix of stacked VAR coefficients and intercepts. The full VAR can then be written as:

y = (I_n \otimes X)\alpha + \varepsilon,   (5.1.2)
where \varepsilon \sim N(0, \Sigma \otimes I_T) and y is the (nT \times 1) vector which stacks all T observations of the first time series, then all T observations of the second series, and so on (if we define Y_T to be a (T \times n) matrix which stacks the T observations of each series in columns next to one another, then y = vec(Y_T)). The unknown parameters of the model are \alpha and \Sigma. Before the data are observed, the parameters are described by the joint prior distribution p(\alpha, \Sigma). Once the data are observed we use Bayes' rule to obtain the joint posterior distribution of the parameters conditional on the data, which is defined as:

p(\alpha, \Sigma \mid y) = \frac{p(\alpha, \Sigma, y)}{p(y)} = \frac{p(\alpha, \Sigma)\, p(y \mid \alpha, \Sigma)}{p(y)} = \frac{p(\alpha, \Sigma)\, L(y \mid \alpha, \Sigma)}{p(y)} \propto p(\alpha, \Sigma)\, L(y \mid \alpha, \Sigma),   (5.1.3)

where the last step follows from the fact that p(y) is a constant (in the sense that it does not depend on \alpha or \Sigma) and will not have any significance during the estimation; here \propto means 'proportional to', i.e. the ratio of the left-hand side and the right-hand side is a constant that does not depend on \alpha or \Sigma. Given the joint posterior distribution p(\alpha, \Sigma|y), the marginal posterior distributions p(\alpha|y) and p(\Sigma|y) can be obtained by integrating \Sigma and \alpha out of p(\alpha, \Sigma|y), respectively. The analytical integration of p(\alpha, \Sigma|y) can be very cumbersome or even impossible to perform. To circumvent this difficulty, numerical integration methods based on Monte Carlo simulation can be used. However, analytical solutions are available for some prior specifications. I am going to focus the analysis on these analytical approaches of computing the marginal posterior distributions. A difficulty faced in implementing Bayesian estimation of model (5.1.2) is the choice of the prior distribution; I am going to discuss several alternatives for which analytical integration is possible. Once we get hold of p(\alpha|y) and p(\Sigma|y), the final step is to analyze the location (mean) and dispersion (variance) of these distributions, which will yield point estimates of the parameters and posterior standard deviations (which are the Bayesian counterpart of the classical standard errors).
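A short numpy sketch (an assumed implementation, not the thesis code) of how Y_T, X and the OLS quantities in this notation can be built from the differenced data; column-major flattening reproduces the stacking of y described above.

```python
import numpy as np

def build_var_matrices(Y, p):
    """Y : (T_full, n) array of the (differenced) series used in (5.1.1).
    Returns Y_T, X (rows X_t = (1, y'_{t-1}, ..., y'_{t-p})), A_ols and alpha_ols = vec(A_ols)."""
    T_full = Y.shape[0]
    X = np.hstack([np.ones((T_full - p, 1))] +
                  [Y[p - i:T_full - i] for i in range(1, p + 1)])   # lags y_{t-1}, ..., y_{t-p}
    YT = Y[p:]
    A_ols = np.linalg.lstsq(X, YT, rcond=None)[0]                   # (k, n) with k = 1 + n p
    alpha_ols = A_ols.flatten(order="F")                            # vec(): one equation block after another
    return YT, X, A_ols, alpha_ols
```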
5.2 Likelihood Function

Apart from the prior and the posterior distributions, the likelihood function is also of immense importance, as it is used to transform the prior into the posterior. Understanding what its functional form looks like will make it easier to understand the priors we will be considering for the estimation and forecasting process later on. Given the parameters \alpha and \Sigma the likelihood is defined as:

p(y \mid \alpha, \Sigma) \propto |\Sigma \otimes I_T|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y - (I_n \otimes X)\alpha)' (\Sigma^{-1} \otimes I_T)(y - (I_n \otimes X)\alpha) \right\}.   (5.2.1)

If we view the likelihood as a function of the parameters, L(\alpha, \Sigma), then we can derive a useful decomposition breaking the likelihood into two parts:

(y - (I_n \otimes X)\alpha)' (\Sigma^{-1} \otimes I_T)(y - (I_n \otimes X)\alpha)
  = \left[ (\Sigma^{-1/2} \otimes I_T)(y - (I_n \otimes X)\alpha) \right]' \left[ (\Sigma^{-1/2} \otimes I_T)(y - (I_n \otimes X)\alpha) \right]
  = \left[ (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha \right]' \left[ (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha \right],

where in the last equation I used the Kronecker property (A \otimes B)(C \otimes D) = AC \otimes BD, with A, B, C and D matrices of such sizes that these multiplications are possible. We can rewrite:

(\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha = (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols} + (\Sigma^{-1/2} \otimes X)(\alpha_{ols} - \alpha),

where \alpha_{ols} = (\Sigma^{-1} \otimes X'X)^{-1} (\Sigma^{-1} \otimes X)' y. Now substituting this into the last result we get:

(y - (I_n \otimes X)\alpha)' (\Sigma^{-1} \otimes I_T)(y - (I_n \otimes X)\alpha)
  = \left[ (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols} \right]' \left[ (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols} \right] + (\alpha_{ols} - \alpha)' (\Sigma^{-1} \otimes X'X)(\alpha_{ols} - \alpha).

Now substituting this into the likelihood function we get:

L(\alpha, \Sigma) \propto |\Sigma \otimes I_T|^{-1/2} \exp\Big\{ -\tfrac{1}{2} (\alpha - \alpha_{ols})' (\Sigma^{-1} \otimes X'X)(\alpha - \alpha_{ols}) - \tfrac{1}{2} \left[ (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols} \right]' \left[ (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols} \right] \Big\}
  = |\Sigma|^{-\frac{1}{2}k} \exp\left\{ -\tfrac{1}{2} (\alpha - \alpha_{ols})' (\Sigma^{-1} \otimes X'X)(\alpha - \alpha_{ols}) \right\} \times |\Sigma|^{-\frac{1}{2}(T-k)} \exp\left\{ -\tfrac{1}{2} \mathrm{Tr}\left( \left[ (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols} \right]' \left[ (\Sigma^{-1/2} \otimes I_T)y - (\Sigma^{-1/2} \otimes X)\alpha_{ols} \right] \right) \right\}
  \propto N(\alpha \mid \alpha_{ols}, \Sigma, X, y) \times W(\Sigma^{-1} \mid \alpha_{ols}, X, y, T - k - n - 1),

where \mathrm{Tr}(\cdot) is the trace function. By decomposing the likelihood into two parts I showed it to be proportional to the product of a Normal and a Wishart density, which is the extended version of the chi-squared distribution (for positive definite symmetric matrices instead of positive scalars). That is:

\alpha \mid \Sigma, y \sim N(\alpha_{ols}, \Sigma \otimes (X'X)^{-1})   (5.2.2)

and

\Sigma^{-1} \mid y \sim W(S^{-1}, T - k - n - 1),   (5.2.3)

with S = (y - (I_n \otimes X)\alpha_{ols})' (y - (I_n \otimes X)\alpha_{ols}).
5.3 Priors

As stated before, the choice of the prior p(\alpha, \Sigma) is of great importance. Due to the large number of coefficients within a VAR framework it is cumbersome to obtain precise estimates. This can lead to forecasts being imprecise. Prior information is helpful in decreasing the predictive standard deviations. There are several priors in the literature and each of them has its own benefits and drawbacks, depending on the specification of the VAR. One big difference between the different priors is whether they lead to analytical results for the posterior or whether Markov Chain Monte Carlo methods are required. Natural conjugate priors, noninformative priors and Minnesota priors are three types of priors that can lead to analytical results. I will be considering these three priors.

5.3.1 Natural Conjugate Priors and The Noninformative Priors

Natural conjugate priors are priors with the same functional form as the likelihood and the posterior; that is, the prior, likelihood and posterior all belong to the same 'family' of (density) functions. The form of the likelihood (see previous section) suggests that the natural conjugate prior has the following form:

\alpha \mid \Sigma \sim N(\hat{\alpha}, \Sigma \otimes \hat{V})   (5.3.1)

and

\Sigma^{-1} \sim W(\hat{S}^{-1}, \hat{\tau}),   (5.3.2)

where \hat{\alpha}, \hat{V}, \hat{S} and \hat{\tau} are the parameters of the prior, the so-called hyperparameters. To get the posterior we need to multiply this prior with the likelihood (according to (5.1.3)). So the posterior becomes:

\alpha \mid \Sigma, y \sim N(\tilde{\alpha}, \Sigma \otimes \hat{W})   (5.3.3)

and

\Sigma^{-1} \mid y \sim W(\hat{Z}^{-1}, \hat{\nu}),   (5.3.4)

where \hat{W} = (\hat{V}^{-1} + X'X)^{-1} and \hat{Z} = S + \hat{S} + A_{ols}' X'X A_{ols} + \hat{A}' \hat{V}^{-1} \hat{A} - \tilde{A}' (\hat{V}^{-1} + X'X) \tilde{A}. Furthermore \tilde{A} = \hat{W} (\hat{V}^{-1} \hat{A} + X'X A_{ols}) and \hat{\nu} = T + \hat{\tau}. The matrices A_{ols} and \hat{A} are obtained by unstacking the (nk \times 1) vectors \alpha_{ols} and \hat{\alpha} respectively. The fact that the marginal posterior p(\alpha|y) is a multivariate t-distribution makes analytical posterior inference possible. This distribution is obtained after integrating \Sigma out of the joint posterior given by (5.3.3) and (5.3.4); it has mean \tilde{\alpha}, degrees-of-freedom parameter \hat{\nu} and covariance matrix defined as:

var(\alpha \mid y) = \frac{1}{\hat{\nu} - n - 1} \, \hat{Z} \otimes \hat{W}.

As a researcher I am free to choose any value for the hyperparameters \hat{\alpha}, \hat{V}, \hat{S} and \hat{\tau}. For the noninformative prior I do not have this freedom.
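As a concrete illustration, here is a minimal numpy sketch (an assumed implementation, not the thesis code) of the posterior moments (5.3.3)-(5.3.4) and of var(α|y). It presumes the matrices Y_T, X and A_ols from the earlier sketch; all hyperparameter names are placeholders.

```python
import numpy as np

def conjugate_posterior(YT, X, A_ols, A_prior, V_prior, S_prior, tau_prior):
    """Posterior moments under the natural conjugate prior, i.e. (5.3.3)-(5.3.4).

    A_prior is the (k, n) unstacked prior mean, V_prior (k, k), S_prior (n, n)
    and tau_prior a scalar."""
    T, k = X.shape
    n = YT.shape[1]
    V_inv = np.linalg.inv(V_prior)
    W = np.linalg.inv(V_inv + X.T @ X)                       # hat W
    A_tilde = W @ (V_inv @ A_prior + X.T @ X @ A_ols)        # tilde A
    S = (YT - X @ A_ols).T @ (YT - X @ A_ols)
    Z = (S + S_prior + A_ols.T @ X.T @ X @ A_ols
         + A_prior.T @ V_inv @ A_prior
         - A_tilde.T @ (V_inv + X.T @ X) @ A_tilde)          # hat Z
    nu = T + tau_prior                                       # hat nu
    alpha_mean = A_tilde.flatten(order="F")                  # tilde alpha
    alpha_cov = np.kron(Z, W) / (nu - n - 1)                 # var(alpha | y)
    return alpha_mean, alpha_cov, Z, nu
```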
The difference between the natural conjugate prior and the noninformative prior is that for the latter the hyperparameters are restricted to be \hat{\tau} = \hat{S} = \hat{V}^{-1} = cI, where c is a constant with c \to 0. According to the literature the noninformative prior does not decrease the predictive standard deviations. This is of course a big drawback, since in our case the whole point of using priors is to shrink these deviations so that forecasting can be more precise. Using this prior makes it unnecessary to use posterior simulation algorithms, which is a big benefit. However, there is a possibly undesirable property of this prior, which is caused by the way the prior covariance matrix (\Sigma \otimes \hat{V}) is defined. If we denote the individual elements of \Sigma by \sigma_{ij}, then the prior covariance of the coefficients in the i-th equation is equal to \sigma_{ii} \hat{V}. Due to the recurrence of \hat{V} in the prior covariance of the coefficients of each equation, the prior covariances of the coefficients of every two equations are proportional to each other. This is quite a restrictive property and thus could have a negative effect on the forecasts.

5.3.2 The Minnesota Prior

Researchers at the University of Minnesota (Doan, Litterman and Sims, 1984 and Litterman, 1986) came up with priors that greatly simplify the computation process. These so-called Minnesota priors approximate \Sigma by replacing it with an estimate \hat{\Sigma}. Replacing \Sigma by an estimate simplifies the prior in the sense that we now only need to consider \alpha. So, in line with the previous case, we can define the Minnesota prior as:

\alpha \sim N(\hat{\alpha}_{ms}, \hat{V}_{ms}).   (5.3.5)

The elements of the prior mean \hat{\alpha}_{ms} are set equal to zero to mitigate the risk of overfitting. The prior covariance matrix \hat{V}_{ms} is assumed to be diagonal. If \hat{V}_{ms} is viewed as a block-diagonal matrix with (k \times k) blocks \hat{V}_i, i = 1, 2, \ldots, n, then we can define the diagonal elements \hat{V}_{i,jj} of \hat{V}_i according to:

\hat{V}_{i,jj} = \begin{cases} \dfrac{a_1}{r^2} & \text{for coefficients on own lag } r, \; r = 1, \ldots, p \\[1ex] \dfrac{a_2 \sigma_{ii}}{r^2 \sigma_{jj}} & \text{for coefficients on lag } r \text{ of variable } j \neq i, \; r = 1, \ldots, p \\[1ex] a_3 \sigma_{ii} & \text{for coefficients on deterministic variables} \end{cases}

With this specification we do not need to specify all elements of \hat{V}_{ms}; instead we only need to choose the three scalars a_1, a_2 and a_3. Imposing the restriction a_1 > a_2 leads to the pleasant property that lags of the same variable a priori have larger predictive power than lags of other variables. Furthermore I choose \sigma_{ii} = s_i, the corresponding OLS estimate. This prior leads to a posterior which only requires the Normal distribution:

\alpha \mid y \sim N(\tilde{\alpha}_{ms}, \tilde{W}_{ms}),   (5.3.6)

where \tilde{W}_{ms} = \left( \hat{V}_{ms}^{-1} + \hat{\Sigma}^{-1} \otimes (X'X) \right)^{-1} and \tilde{\alpha}_{ms} = \tilde{W}_{ms} \left( \hat{V}_{ms}^{-1} \hat{\alpha}_{ms} + (\hat{\Sigma}^{-1} \otimes X)' y \right). This simplification is of course a big advantage, yet the fact that \Sigma is replaced by an estimator can be seen as a drawback. This drawback is the reason why I believe the natural conjugate prior will perform better in forecasting. The Minnesota prior does not take any uncertainty in the estimator \hat{\Sigma} into account. Thus it does not provide a full Bayesian treatment of \Sigma.
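A hedged sketch of how the Minnesota prior covariance and the posterior moments (5.3.6) could be coded, again reusing the Y_T and X matrices from the earlier sketch. The function names, the vector s of OLS residual variances and the scalars a1 > a2, a3 are placeholders for the hyperparameter choices.

```python
import numpy as np

def minnesota_V(s, p, a1, a2, a3):
    """Diagonal prior covariance hat V_ms; s is a length-n vector of OLS residual
    variances used for sigma_ii. The coefficient ordering per equation is
    (intercept, lag 1 of all variables, ..., lag p of all variables)."""
    n = len(s)
    k = 1 + n * p
    diag = np.zeros(n * k)
    for i in range(n):                                       # block for equation i
        diag[i * k] = a3 * s[i]                              # deterministic term
        for r in range(1, p + 1):
            for j in range(n):
                pos = i * k + 1 + (r - 1) * n + j
                diag[pos] = a1 / r**2 if j == i else a2 * s[i] / (r**2 * s[j])
    return np.diag(diag)

def minnesota_posterior(YT, X, Sigma_hat, alpha_prior, V_ms):
    """Posterior mean and covariance of (5.3.6), with Sigma replaced by hat Sigma."""
    Si = np.linalg.inv(Sigma_hat)
    V_inv = np.linalg.inv(V_ms)
    W = np.linalg.inv(V_inv + np.kron(Si, X.T @ X))          # tilde W_ms
    y = YT.flatten(order="F")                                # vec(Y_T)
    alpha_post = W @ (V_inv @ alpha_prior + np.kron(Si, X).T @ y)
    return alpha_post, W
```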
6 Forecasting

Perhaps the most important feature of time series analysis is the forecasting part. The main objective is to find out which estimation method of the VECM is best at forecasting inflation (LogCPI) in Ghana. Additionally, the forecasting performance for the other variables will also be touched on. For this purpose - as mentioned in section 3 - I have kept six monthly data points, dating from December 2013 till May 2014, as out-of-sample data. Note that this is obviously a very small number of out-of-sample observations. However, this choice has been made for two reasons. First, the total number of observations that I have is not very large. Second, I am mainly interested in the performance of the models in the most recent period, because it may be expected that the best model for predicting the recent past is also the best model for predicting the (nearby) future. I will be forecasting these values using the different estimation methods discussed for the VECM model. The one-step forecast based on information available at time T is:

y_{T+1|T} = A_0 + A_1 y_T + A_2 y_{T-1} + \cdots + A_p y_{T-p+1}.   (6.0.7)

The chain rule of forecasting can be used to obtain h-step forecasts according to:

y_{T+h|T} = A_0 + A_1 y_{T+h-1|T} + A_2 y_{T+h-2|T} + \cdots + A_p y_{T+h-p|T},   (6.0.8)

where y_{T+i|T} = y_{T+i} for i \leq 0. Clearly in our case we have h = 1, 2, \ldots, 6.

6.1 Forecasts

Figure 5 shows the LogCPI data points from January 2012 till November 2013, followed by the six consecutive (forecast) data points. The graph shows the forecast values using the different estimation methods. As a recap, the estimation methods used are from the cointegration model (4.1.4), the first-difference model (5.1.1) using OLS (labelled 'No Coint' in the legend) and the first-difference model (5.1.1) using Bayesian estimation, where I used the noninformative, the natural conjugate and the Minnesota prior. Additionally the actual data points are also displayed in the figure as a comparison tool.

[Figure 5: Log CPI forecasts (series: Coint, No Coint, Non Inf, Minnes, Nat Conj, Actual Data).]
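The iterated forecasts of (6.0.7)-(6.0.8) can be sketched in a few lines of numpy; this is a generic VAR forecaster, not the thesis code, and for the models estimated in first differences the forecast differences would still have to be cumulated back to (log) levels before comparison with the actual data.

```python
import numpy as np

def var_forecast(Y, A0, A, h):
    """Iterated h-step forecasts as in (6.0.7)-(6.0.8).

    Y : (T, n) array of the estimation sample, A0 : (n,) intercepts,
    A : list of p (n, n) coefficient matrices A_1, ..., A_p."""
    p = len(A)
    hist = [Y[-i] for i in range(1, p + 1)]           # y_T, y_{T-1}, ..., y_{T-p+1}
    forecasts = []
    for _ in range(h):
        y_next = A0 + sum(A[i] @ hist[i] for i in range(p))
        forecasts.append(y_next)
        hist = [y_next] + hist[:-1]                   # condition on the forecast just made
    return np.array(forecasts)
```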
It is clear from Figure 5 that the Bayesian forecasts using the natural conjugate prior are the closest to the actual data points. The noninformative prior seems to deliver the worst forecasts of the three priors used. It is also clear that the cointegration forecasts are a little bit closer to the actual data points than the forecasts obtained without considering cointegration.

6.2 Predictive Accuracy

To be more certain about the accuracy of our forecasts, let us have a look at the MAE (Mean Absolute Error), the RMSE (Root Mean Squared Error) and the Diebold-Mariano test results, which are used to evaluate whether the difference in performance between the methods is significant.

6.2.1 MAE

The Mean Absolute Error (MAE) is defined as:

MAE = \frac{1}{6} \sum_{i=1}^{6} |\hat{y}_{T+i} - y_{T+i}|,   (6.2.1)

with \hat{y}_{T+i} the forecast value of y_{T+i}, which is the true value at time T + i.

Table 9: MAE values.
                      LogCPI     LogEx     LogM2       LogIR
Cointegration         20.8225    0.4046    583.0851    1.7481
No Cointegration      21.1034    0.4297    568.2084    1.5211
Non Informative       23.1752    0.3509    779.4893    2.4558
Minnesota             18.1238    0.3621    1228.5578   1.8556
Natural Conjugate     13.9855    0.3281    772.9586    2.0574

The MAE values are depicted in table 9. Focusing on LogCPI, we can conclude that the Bayesian estimation method using a natural conjugate prior has the best forecasting performance. Its MAE value (13.9855) is the smallest compared to the MAE values of the other estimation methods. This is not surprising, because we already saw that its forecasts are the closest to the actual data depicted in Figure 5. The Bayesian method with the Minnesota prior is ranked second (18.1238) in forecasting performance according to the MAE value. The third place goes to the cointegration approach. The difference between the MAE value of the cointegration approach (20.8225) and that of the approach without cointegration (21.1034) is quite small. So according to the MAE, the gain we get from considering the long-term relationship in the VECM is not very big. The worst inflation forecaster according to the MAE values is the Bayesian method using the noninformative prior. I already stated that the noninformative prior is known not to decrease the predictive standard deviations, so this result is not surprising. If we look at the other variables, we see that overall the Bayesian estimation using the natural conjugate prior also has the lowest MAE value for LogEx. For the forecasting of LogM2 I must conclude that using Bayesian estimation methods is not the best choice: the classical estimation methods, with and without cointegration, seem to outperform the Bayesian methods. For LogIR, again we see that the Bayesian approach using the noninformative prior does not forecast well.
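A trivial but self-contained sketch of (6.2.1); the arrays of forecasts and actual out-of-sample values are placeholders.

```python
import numpy as np

def mae(forecast, actual):
    """Mean Absolute Error over the out-of-sample points, as in (6.2.1)."""
    forecast, actual = np.asarray(forecast, float), np.asarray(actual, float)
    return np.mean(np.abs(forecast - actual))
```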
6.2.2 RMSE

Another loss function is the Root Mean Squared Error (RMSE), defined as:

RMSE = \sqrt{\frac{1}{6} \sum_{i=1}^{6} (\hat{y}_{T+i} - y_{T+i})^2}. (6.2.2)

Table 10: RMSE values.
                    LogCPI    LogEx    LogM2       LogIR
Cointegration       22.8155   0.4491   646.3448    1.9948
No Cointegration    23.2204   0.4828   613.9482    1.7477
Non Informative     25.1429   0.4104   869.4263    2.8215
Minnesota           19.4639   0.4209   1326.6086   2.1682
Natural Conjugate   15.1183   0.3786   866.6952    2.3961

Again focusing on LogCPI, the RMSE results do not differ from those of the MAE, as I expected; the two loss functions do not differ very much, as can be seen from their definitions in equations (6.2.1) and (6.2.2). Again the Bayesian estimation method using the natural conjugate prior outperforms the other methods, and the ranking is the same as before. The Bayesian estimation using the noninformative prior is again the worst predictor of inflation. The same conclusions as before can be drawn for the other variables.

6.2.3 Diebold-Mariano test

The two loss functions covered so far show how the different estimation approaches perform when used to forecast. I am now interested in whether the differences in forecast quality between the estimation methods are statistically significant. To find that out, I perform the Diebold-Mariano test, which has the null hypothesis of equal predictive accuracy. The DM statistic is given by:

DM = \frac{\hat{d}}{\sqrt{\widehat{\mathrm{var}}(\hat{d})}}, (6.2.3)

with d_t = |\hat{y}^k_t - y_t| - |\hat{y}^l_t - y_t| and \hat{d} = \frac{1}{6}\sum_{t=1}^{6} d_t. Here \hat{y}^k_t and \hat{y}^l_t are the forecasts of the true value y_t using estimation methods k and l, respectively. The DM statistic is approximately standard normally distributed. Since there are only 6 out-of-sample observations, I use a significance level of 10%. In a two-sided test the null of equal predictive accuracy of two methods is rejected if |DM| > 1.645. However, since I already expect the natural conjugate prior to perform well, I also consider a one-sided test of whether the natural conjugate prior performs significantly better. In this case the null hypothesis (of equal or worse performance of the natural conjugate prior) is rejected if DM > 1.282.
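Before turning to the results, a minimal sketch of the test statistic in (6.2.3). As before this is Python used purely for exposition with illustrative names; in particular, the variance of \hat{d} is estimated here by the simple sample variance of the mean, which is one possible choice and not necessarily the one underlying the reported results (a HAC estimator is a common alternative for multi-step forecasts).

```python
import numpy as np
from scipy import stats

def diebold_mariano(y_true, f_k, f_l, one_sided=False):
    """Diebold-Mariano test for equal predictive accuracy, as in (6.2.3),
    with absolute-error loss differential d_t = |f_k - y| - |f_l - y|.

    Returns the DM statistic and its (approximate) normal p-value.
    With only 6 out-of-sample points the normal approximation is rough.
    """
    y_true, f_k, f_l = map(np.asarray, (y_true, f_k, f_l))
    d = np.abs(f_k - y_true) - np.abs(f_l - y_true)
    d_hat = d.mean()
    var_d_hat = d.var(ddof=1) / d.size      # simple estimate of var(d_hat)
    dm = d_hat / np.sqrt(var_d_hat)
    p = stats.norm.sf(dm) if one_sided else 2 * stats.norm.sf(abs(dm))
    return dm, p
```

The statistic is computed for every pair of estimation methods, giving the comparisons in Table 11.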
Table 11: Results from the Diebold-Mariano test for LogCPI.
Contest                                  |DM|     2-sided test (10% sign.)   1-sided test (10% sign.)
Cointegration vs No Cointegration        0.2989   Do not reject null         -
Cointegration vs Non Informative         0.5783   Do not reject null         -
Cointegration vs Minnesota               0.8619   Do not reject null         -
Cointegration vs Natural Conjugate       1.6035   Do not reject null         Reject null
No Cointegration vs Non Informative      0.4222   Do not reject null         -
No Cointegration vs Minnesota            0.7673   Do not reject null         -
No Cointegration vs Natural Conjugate    1.4904   Do not reject null         Reject null
Non Informative vs Minnesota             1.4098   Do not reject null         -
Non Informative vs Natural Conjugate     1.6965   Reject null                Reject null
Minnesota vs Natural Conjugate           2.2302   Reject null                Reject null

Table 11 displays the results of the Diebold-Mariano tests. We saw that the VECM taking cointegration into account forecast better than the model without cointegration. The (two-sided) Diebold-Mariano test, however, did not reject the null of equal predictive accuracy, which means that the improvement delivered by the cointegration model is not statistically significant. Comparing the cointegration model with the VECM estimated with the Bayesian approach under the noninformative prior, I again have to conclude that the difference in predictive accuracy between the two methods is not significant. The same conclusion follows from the Diebold-Mariano test comparing the cointegration approach with the Bayesian estimation using the Minnesota prior.

The DM statistic of the test comparing the cointegration estimation with the Bayesian estimation using the natural conjugate prior (1.6035) is quite large compared to the previous ones: it is large enough to reject the null hypothesis in a one-sided test at the 10% significance level. The same pattern appears when comparing the forecast performance of the VECM without cointegration with the other estimation methods: only the comparison with the Bayesian approach under the natural conjugate prior leads to a rejection (in the one-sided test), just as in the tests involving the cointegration VECM.

We know from the MAE and RMSE that the Bayesian estimation method using the natural conjugate prior is the best at forecasting LogCPI. Given this and the Diebold-Mariano test results so far, I would recommend using the Bayesian estimation method with the natural conjugate prior rather than the classical estimation approaches. The Diebold-Mariano test comparing the Bayesian approach using the noninformative prior with the natural conjugate prior results in rejecting the null hypothesis, and the same holds for the comparison of the Minnesota prior with the natural conjugate prior: in both cases the natural conjugate prior is significantly better. To conclude, the natural conjugate prior is indeed significantly better than the four alternative methods.
6.3 20-step Ahead Forecasts

To obtain more Diebold-Mariano test results I also consider an alternative out-of-sample period, in which I forecast twenty values of the monthly variables instead of six. To be precise, I kept 20 monthly datapoints, dating from October 2012 till May 2014, as out-of-sample data. Furthermore I changed the number of lags to 12, which seemed reasonable given that we are working with monthly data.

[Figure 6: Log CPI forecasts for h = 20. Series shown: Coint, No Coint, Non Inf, Minnes, Nat Conj, Actual Data.]

Figure 6 shows the LogCPI datapoints from January 2012 till September 2012, followed by the 20 consecutive (forecast) datapoints. Again the Bayesian forecasts using the natural conjugate prior seem to be the closest to the actual datapoints. Another striking observation is that the Bayesian forecasts using the noninformative prior and the forecasts obtained without considering cointegration appear to be identical.

Table 12: MAE values for h = 20.
                    LogCPI    LogEx    LogM2       LogIR
Cointegration       8.5722    0.1721   1157.3700   3.4503
No Cointegration    18.7909   0.1675   1239.8661   3.7385
Non Informative     18.7909   0.1675   1239.8661   3.7385
Minnesota           17.5946   0.1760   1441.5419   3.2820
Natural Conjugate   5.5489    0.2077   1230.8590   3.0093

Table 12 shows the MAE values. From this table we observe that the Bayesian approach using the natural conjugate prior is again ranked first in predicting LogCPI, as in the previous section. However, the forecasts incorporating the cointegration dynamics are now ranked second instead of third: over a 20-period horizon, including the long-term dynamics is clearly more advantageous than excluding it. The Bayesian forecasts using the Minnesota prior are now ranked third. Finally - as we already saw in Figure 6 - the Bayesian forecasts using the noninformative prior and the forecasts without cointegration dynamics are identical according to the MAE values. These two estimation methods prove to be the worst methods for predicting inflation in Ghana.
Table 13: RMSE values for h = 20.
                    LogCPI    LogEx    LogM2       LogIR
Cointegration       10.5658   0.1882   1483.8178   3.9400
No Cointegration    23.1909   0.1883   1591.9582   4.2978
Non Informative     23.1909   0.1883   1591.9582   4.2978
Minnesota           21.1469   0.1932   1735.1487   3.7787
Natural Conjugate   7.2561    0.2250   1509.1141   3.5212

Table 13 shows the RMSE values. With respect to LogCPI, the same conclusions (and the same ranking) can be drawn from this table as from the MAE values.

Table 14: Results from the Diebold-Mariano test for LogCPI for h = 20.
Contest                                  |DM|     Test outcome (2-sided test, 5% sign.)
Cointegration vs No Cointegration        1.0956   Do not reject null
Cointegration vs Non Informative         1.0956   Do not reject null
Cointegration vs Minnesota               1.2435   Do not reject null
Cointegration vs Natural Conjugate       0.3035   Do not reject null
No Cointegration vs Non Informative      1.2918   Do not reject null
No Cointegration vs Minnesota            0.4369   Do not reject null
No Cointegration vs Natural Conjugate    0.8068   Do not reject null
Non Informative vs Minnesota             0.4369   Do not reject null
Non Informative vs Natural Conjugate     0.8068   Do not reject null
Minnesota vs Natural Conjugate           0.8048   Do not reject null

Since there are now 20 out-of-sample observations, I use a significance level of 5% for the Diebold-Mariano test; in a two-sided test the null of equal predictive accuracy is then rejected if |DM| > 1.96. The new Diebold-Mariano test results are displayed in Table 14. The tests never reject the null hypothesis of equal predictive accuracy, so no strong conclusions can be drawn about the differences in predictive accuracy between the estimation methods. It is nevertheless interesting to note that the smallest DM statistic is obtained when comparing the forecasts incorporating cointegration with the Bayesian forecasts using the natural conjugate prior. This means that the predictive accuracy of these two methods is the most similar of all comparisons, which is no surprise since these two methods are the best at forecasting inflation in Ghana. Furthermore, the test comparing the Bayesian forecasts using the noninformative prior with the model without cointegration yields the largest DM statistic, so the predictive accuracy of these two models is the least similar of all comparisons. This may seem a surprising result, since the MAE and RMSE are almost the same for these methods, which implies that the numerator of the test statistic is very small. However, the denominator of the test statistic is also very small in this case, so that the ratio can still be relatively large.

To conclude, forecasting 20 steps ahead instead of 6 steps ahead leads to interesting conclusions. In both cases the Bayesian method using the natural conjugate prior produces the best forecasts. The big difference between the two exercises is that the 20-step-ahead forecasts show that incorporating cointegration dynamics into the model is indeed valuable: including the long-term dynamics can lead to substantial improvements over a larger forecast horizon.
7 Conclusion

The purpose of this study was to compare different estimation methods for predicting inflation (CPI) in Ghana. Going through the literature it struck me that there is not much work on predicting inflation in Ghana, which made this research all the more worthwhile. Inspired by the work of Erasmus and Abdalla (2005), this study tried to dig deeper into the cointegration dynamics of the variables under consideration (CPI, Ex, M2 and IR). A further step was to implement Bayesian estimation procedures, with the goal of comparing the prediction performance of these methods.

The first step was to determine the order of integration of the variables. The AIC criterion was used to determine the type of model and the number of lags that best fitted each variable. With this insight the ADF test - testing the null hypothesis of the presence of a unit root - could be performed. According to the ADF tests the variables LogCPI, LogEx, LogM2 and LogIR are integrated of order one. Knowing this, a VECM was set up, which was needed to test for cointegration using the Johansen approach. The most appropriate number of lags for the VECM proved to be 18, according to the LR test. At a significance level of 1% the Johansen cointegration tests showed that our time series are cointegrated, with one cointegration vector. Additionally, the Granger causality test gave reassurance on the use of the variables under consideration for forecasting.

When forecasting six periods ahead, the Bayesian estimation approach using the natural conjugate prior best predicts the inflation dynamics in Ghana; the MAE and RMSE values were all in favor of this approach. The second-best approach is the Bayesian method using the Minnesota prior. This confirms my intuition that the natural conjugate prior would outperform the Minnesota prior: the Minnesota prior replaces Σ by an estimate and therefore does not take any uncertainty about Σ into account, whereas the natural conjugate prior provides a full treatment of Σ. This could be the reason why the Diebold-Mariano test suggested that the predictive accuracy of the estimations under these two priors is not the same. The cointegration model is the third-best inflation predictor. However, the difference in performance between the cointegration model and the model without cointegration is small, so the gain from considering the long-term relationship in the VECM is - in this case - small. The Bayesian estimation using the noninformative prior is the worst predictor of inflation in Ghana, which did not come as a surprise since the noninformative prior does not decrease the predictive standard deviations.

Forecasting twenty values ahead proved to lead to more reliable conclusions. Again the Bayesian estimation approach using the natural conjugate prior is best at predicting the inflation dynamics in Ghana. The larger forecast horizon led to the pleasant surprise that incorporating cointegration dynamics into the model is indeed valuable for predicting inflation in Ghana; this method is now ranked second. The Bayesian estimation method using the Minnesota prior now ranks third. Not surprisingly, the Bayesian approach using the noninformative prior and the model excluding cointegration dynamics are both ranked last.

There are several areas where further research could be useful.
The first is considering non-normally distributed disturbances. This would not be an odd idea, since the kurtosis of the data - see Appendix A - is large, which indicates the presence of fat tails. Furthermore, different models could be combined: for example, a model that incorporates the cointegrating vector but uses estimates of the other coefficients obtained from Bayesian estimation with the natural conjugate prior could be used to predict future movements. Another possible extension is to use priors for which the posterior cannot be evaluated analytically, so that a numerical integration method is needed; Monte Carlo simulation methods such as the Gibbs sampler and the Metropolis-Hastings sampler could then be used.
As mentioned before, the Bayesian estimation of a VECM with a cointegration relation (using a Markov chain Monte Carlo (MCMC) simulation method) for our data is an interesting topic for further research, especially for longer forecast horizons.
References

[1] Canova, Fabio (2007). Methods for Applied Macroeconomic Research, Chapter 10: Bayesian VARs.

[2] Ciccarelli, Matteo & Rebucci, Alessandro (2003). Bayesian VARs: A Survey of the Recent Literature with an Application to the European Monetary System. International Monetary Fund.

[3] Dagher, Jihad & Kovanen, Arto (2011). On the Stability of Money Demand in Ghana: A Bounds Testing Approach. International Monetary Fund.

[4] Enders, Walter (2003). Applied Econometric Time Series, Second Edition, Chapter 4: Models With Trend; Chapter 5: Multiequation Time-Series Models; Chapter 6: Cointegration and Error-Correction Models.

[5] Engle, Robert F. & Granger, Clive W.J. (1987). Co-integration and Error Correction: Representation, Estimation and Testing. Econometrica 55, 251-276.

[6] Erasmus, Alnaa Samuel & Abdalla, Abdul-Mumuni (2005). Predicting Inflation in Ghana: A Comparison of Cointegration and ARIMA Models. University of Skövde.

[7] Havi, Emmanuel Dodzi K. & Enu, Patrick & Opoku, C.D.K. (2014). Demand for Money and Long Run Stability in Ghana: Cointegration Approach. Department of Economics, Methodist University College Ghana.

[8] Hendry, David F. & Juselius, Katarina (1999). Explaining Cointegration Analysis: Part I. Nuffield College, Oxford & European University Institute, Florence.

[9] Hendry, David F. & Juselius, Katarina (2000). Explaining Cointegration Analysis: Part II. Nuffield College, Oxford & Department of Economics, University of Copenhagen.

[10] Hjalmarsson, Erik & Österholm, Pär (2007). Testing for Cointegration Using the Johansen Methodology when Variables are Near-Integrated. International Monetary Fund.
[11] Johansen, Søren (1988). Statistical Analysis of Cointegration Vectors. Journal of Economic Dynamics and Control 12, 231-254.

[12] Johansen, Søren & Juselius, Katarina (1990). Maximum Likelihood Estimation and Inference on Cointegration - With Applications to the Demand for Money. Oxford Bulletin of Economics and Statistics.

[13] Koop, Gary & Korobilis, Dimitris (2010). Bayesian Multivariate Time Series Methods for Empirical Macroeconomics. University of Strathclyde.

[14] Kovanen, Arto (2011). Does Money Matter for Inflation in Ghana? International Monetary Fund.

[15] Kwiatkowski, Denis & Phillips, Peter C.B. & Schmidt, Peter & Shin, Yongcheol (1991). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root. Elsevier Science Publishers B.V.

[16] Ni, Shawn & Sun, Dongchu (2005). Bayesian Estimates for Vector Autoregressive Models. University of Missouri, Columbia.

[17] Sims, Christopher A. (1980). Macroeconomics and Reality. Econometrica 48(1), 1-48.

[18] Sjö, Bo (2008). Testing for Unit Roots and Cointegration. Linköping University.
A APPENDIX: Data Statistics

Table 15: Data Statistics.
              DLogCPI   DLogEx    DLogM2    DLogIR
Mean          0.0152    0.0148    0.0241    -0.0017
Median        0.0146    0.0070    0.0206    0
Maximum       0.1033    0.6326    0.2151    0.1671
Minimum       -0.0259   -0.5569   -0.2354   -0.2231
Std. Dev.     0.0159    0.0853    0.0394    0.0439
Skewness      1.0352    1.0088    -0.3175   -0.3578
Kurtosis      6.8216    35.0430   10.9880   10.3630
Observations  280       280       280       280

B APPENDIX: Scatterplots of the Data

[Figure 7: Scatterplots of the Data. Pairwise plots: LogCPI vs LogEx, LogCPI vs LogM2, LogCPI vs LogIR, LogEx vs LogM2, LogEx vs LogIR, LogM2 vs LogIR.]
C APPENDIX: Results from Cointegration Estimation

Table 16: The (normalized) cointegrating vector and speed of adjustment coefficients.
          β           α
LogCPI    -1.0000     -0.0116
LogEx     -120.4431   0.0001
LogM2     -0.0007     -1.4562
LogIR     -2.7009     0.0067

Table 17: The characteristic roots.
λ: 0.1436, 0.0462, 0.0053, 0.0205

D APPENDIX: KPSS Test Results

Table 18: The KPSS statistic values.
                              LogCPI   LogEx    LogM2    LogIR
H0 = I(0) and H1 = I(1)
  With intercept              1.8450   1.8818   1.4315   1.3499
  With intercept and trend    0.4909   0.2405   0.4289   0.1988
H0 = I(1) and H1 = I(2)
  With intercept              1.9753   0.4504   2.1832   0.1053
  With intercept and trend    0.0368   0.0600   0.3518   0.0800

Table 19: The critical values for the KPSS test.
Significance level            10%      5%       1%
With intercept                0.347    0.463    0.739
With intercept and trend      0.119    0.146    0.216
Final version. Any further copying or communication of this material should be done with permission of Victor K. Amankwah or the Vrije Universiteit Amsterdam. © 2015 Victor K. Amankwah (1818376). All rights reserved.