The document discusses analyzing multivariate time series of five energy futures (crude oil, ethanol, gasoline, heating oil, natural gas) using vector autoregressive (VAR) and vector error correction (VEC) models. It finds the futures are cointegrated using Johansen and Engle-Granger tests, indicating they share a common stochastic trend. A VAR(1) model is estimated and found stable. The VEC model captures the error correction behavior as futures return to their long-run equilibrium. Forecasts are generated and limitations of the Engle-Granger approach discussed.
Certificate in Quantitative Finance
Module 6 Assessed Assignment, 2012
Luigi Piva

Multi-Variate Time Series Analysis
A multivariate time series consists of several component series, so the concepts of vector and matrix are central to multivariate time series analysis.
Many of the models and methods used in univariate analysis generalize directly to the multivariate case, but in some situations the generalization requires care, and in others we need new models and methods to manage the complex relationships between the different series.
I decided to use five important energy futures, importing the closing prices into a spreadsheet.
The time series cover the period from 31/05/2007 to 16/07/2012:
Crude Oil
Ethanol
Gasoline
Heating Oil
Natural Gas
The graph below shows the series. Since each series trades at a different price level, to view them all together easily every series is rebased to start from the same point, one, and then moves proportionally.
To plot this chart, all series start at one. In each subsequent period, the value equals the previous value plus the daily variation, calculated as follows:
Value = (today's close - yesterday's close) / yesterday's close
The series then continues by adding each successive variation to the value accumulated up to that moment.
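The rebasing rule above can be sketched in Python (an illustrative translation; the original workbook does this in a spreadsheet):

```python
def normalize_to_one(closes):
    """Rebase a price series: start at 1, then add each day's simple
    return, (today's close - yesterday's close) / yesterday's close,
    to the running value."""
    value = 1.0
    indexed = [value]
    for prev, today in zip(closes, closes[1:]):
        value += (today - prev) / prev  # accumulate the daily variation
        indexed.append(value)
    return indexed

# A 10% rise followed by a fall back to the start gives roughly [1.0, 1.1, 1.0]
print(normalize_to_one([100.0, 110.0, 99.0]))
```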
On an initial visual inspection, the series appear to be trending. Gasoline and Ethanol show a positive trend, while Crude Oil and Heating Oil evolve in a more oscillatory fashion; Natural Gas shows a negative trend. The futures do not appear to be mean-reverting, i.e., they do not tend to move around a mean value. Again visually, Crude Oil and Heating Oil seem related, as do Ethanol and Gasoline.
[Figure: the five futures series rebased to start at one. Series: CL (Crude Oil), HO (Heating Oil), NG (Natural Gas), GS (Gasoline), ET (Ethanol); y-axis from about -1 to 3, x-axis trading days 1 to 1277.]
If we plot the daily returns we get the following chart; from top to bottom: Crude Oil, Ethanol, Gasoline, Heating Oil and Natural Gas.
The behaviour is completely different: the values move around zero. Some series (Crude Oil, Ethanol, Heating Oil) show larger daily variations than others (Gasoline, Natural Gas).
Flow Diagram
In the flow chart below we see the major steps we will follow in this project.
Augmented Dickey-Fuller Test
We may be interested in the individual time series first. A univariate time series is integrated if it can be brought to stationarity through differencing.
Using the Augmented Dickey-Fuller test we can check whether each individual time series is stationary.
The following table summarizes the results:

Series        H0   P-Value   Stat
Crude Oil     0    0.4513    -0.5477
Ethanol       0    0.8781     0.7621
Gasoline      0    0.7175     0.1787
Heating Oil   0    0.5804    -0.1955
Natural Gas   0    0.4215    -0.5561
For all the time series the null hypothesis of a unit root is not rejected: the price series are not stationary and are probably integrated. To make the series stationary we can take differences; the number of differences needed to reach stationarity is the order of integration.
For all five series the order of integration equals one, so they can be treated as I(1) variables.
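To illustrate why differencing once yields stationarity, a minimal (non-augmented) Dickey-Fuller regression can be coded with numpy on simulated data. This is a toy version of the test for intuition, not MATLAB's adftest:

```python
import numpy as np

def df_tstat(y):
    """t-statistic on rho in the Dickey-Fuller regression
    dy_t = alpha + rho * y_{t-1} + e_t (no augmentation lags)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(1300))   # an I(1) series, like the price levels
print(df_tstat(walk))            # typically small in magnitude: unit root not rejected
print(df_tstat(np.diff(walk)))   # strongly negative, like the large negative stats above
```

The differenced series produces a t-statistic deep in the rejection region, mirroring the pattern in the two ADF tables.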
We repeat the ADF test for the daily returns series:

Series        H0   P-Value   Stat
Crude Oil     1    <Min      -37.55
Ethanol       1    <Min      -36.28
Gasoline      1    <Min      -36.76
Heating Oil   1    <Min      -36.45
Natural Gas   1    <Min      -40.53
The result is obviously completely different: in all cases the null hypothesis is rejected, so the return series are stationary and not integrated.
AR models are normally used to study stationary time series; for multivariate time series the analogous models are VAR (Vector Autoregression) models.
We will now use VAR models to analyze the returns of the five energy futures.
Vector Autoregressive Models
VAR is a simple and useful model for our vector of returns. We will work with a model of the form:
Yt = a + A*Yt-1 + et
where Yt is an [n x 1] vector and A is an [n x n] matrix of coefficients on the lagged variable Yt-1. In this case the lag of the model equals 1.
Determining an appropriate number of lags
Among the various methods for choosing the number of lags, we will use the Akaike Information Criterion (AIC), which requires two quantities: the likelihood and the number of active parameters in the model.
In practice, we can quickly obtain these by fitting our VAR for different lags (1, 2, 3, 4, ...), keeping in mind that the smallest lags are preferred a priori. To obtain the log-likelihood in Matlab, simply read LLF after estimating the model parameters. To obtain the number of active parameters:
[NumParam,NumActive] = vgxcount(ModelName)
To calculate the Akaike Information Criterion:
AIC = aicbic([LLF1, ..., LLFn], [Np1, ..., Npn])
where LLFk indicates the log-likelihood and Npk the number of active parameters of the k-th model.
The lowest value of the AIC indicates the best lag.
VAR(p)   Likelihood    NumParam   AIC
1        1.5890e+004   5          -31770
2        1.5936e+004   5          -31862
3        1.6109e+004   5          -32208
4        1.6188e+004   5          -32366
We will choose a VAR(1) model, i.e., with lag equal to one.
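The AIC numbers in the table can be reproduced from the definition used by MATLAB's aicbic, AIC = -2*logL + 2*k. A quick Python check (not part of the original workflow):

```python
def aic(loglik, num_active_params):
    """Akaike Information Criterion: AIC = -2*logL + 2*k."""
    return -2.0 * loglik + 2.0 * num_active_params

# First and last rows of the table: LLF = 1.5890e4 and 1.6188e4, 5 active parameters
print(aic(1.5890e4, 5))  # -31770.0
print(aic(1.6188e4, 5))  # -32366.0
```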
VAR(1) Parameter Estimation
To estimate the model using Matlab we follow these steps:
1. Import the stationary time series, collected in a matrix in Excel with one return series in each column and one row per observation.
2. Create the VAR model
We want to build a VAR model with one lag, a constant and five series:
Model = vgxset('n',5,'nAR',1,'Constant',true)
3. Fit the model to the data
We also want the estimated constants, parameters and covariances of the innovations:
[EstSpec,EstStdErrors,LLF,W] = vgxvarx(Model, DataMatrix);
and obviously we want to see the results
vgxdisp(EstSpec,EstStdErrors)
We then obtain the estimates of the parameters and the covariance matrix of the residuals.
Stability Check
Once the model is fitted, we can check its stability. Since the model has no MA terms, being a pure AR model, it is invertible by definition.
[isStable, isInvertible] = vgxqual(Model);
The answer is a pair of logicals (0 or 1) representing rejection or acceptance of the hypotheses of stability and invertibility.
In our case the answer (ans) is (1,1): the model is stable and invertible.
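What vgxqual checks for stability can be illustrated directly: a VAR(1) is stable when every eigenvalue of the coefficient matrix A lies strictly inside the unit circle. A small numpy sketch, with made-up coefficient matrices rather than the estimated ones:

```python
import numpy as np

def var1_is_stable(A):
    """A VAR(1) y_t = a + A y_{t-1} + e_t is stable iff every
    eigenvalue of A has modulus strictly less than 1."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

A_stable = np.array([[0.5, 0.1],
                     [0.2, 0.3]])
A_explosive = np.array([[1.1, 0.0],
                        [0.0, 0.5]])
print(var1_is_stable(A_stable))     # True
print(var1_is_stable(A_explosive))  # False
```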
Forecasts using a VAR model
We can use the estimated VAR model to make predictions about future values of the series studied.
[ypred,ycov] = vgxpred(Model, [],5,[],[])
This iterative instruction uses the model we built and estimated to make 5 predictions about future changes in the futures prices.
Later in this project we could check whether these changes are consistent with the forecasts of our VEC models on closing prices.
Closing Prices Time Series
We have already seen, with the ADF tests, that the price time series are not stationary. We want a confirmation from the KPSS test, which evaluates the null hypothesis that a univariate time series y is trend stationary against the alternative that it contains a unit root. We expect these series to be integrated.
[h0,pVal0] = kpsstest(TimeSeries,'trend',false)
KPSS Test     H0   PValue
Crude Oil     1    >0.01
Ethanol       1    >0.01
Gasoline      1    >0.01
Heating Oil   1    >0.01
Natural Gas   1    >0.01
The results show that trend stationarity is rejected for all the derivative price series, consistent with integrated processes. We can determine the order of integration of each series by finding the number of differences required to make it stationary.
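The core of the KPSS statistic can be sketched in a few lines: it accumulates partial sums of the demeaned series and compares them with the series variance. This is a simplified Python illustration (level stationarity only, no HAC long-run variance correction, so it is not the full kpsstest procedure):

```python
import random

# Simplified KPSS-type statistic: sum(S_t^2) / (T^2 * s^2), where S_t are
# partial sums of the demeaned series. Large values point toward a unit root.
def kpss_stat(y):
    T = len(y)
    mean = sum(y) / T
    e = [v - mean for v in y]            # residuals from a constant
    s2 = sum(v * v for v in e) / T       # naive variance estimate (no HAC)
    stat, acc = 0.0, 0.0
    for v in e:
        acc += v                         # partial sum of residuals
        stat += acc * acc
    return stat / (T ** 2 * s2)

random.seed(0)
level, walk = 0.0, [0.0]
for _ in range(999):
    level += random.gauss(0, 1)
    walk.append(level)                   # integrated series (random walk)
noise = [random.gauss(0, 1) for _ in range(1000)]  # stationary series

print(kpss_stat(walk), kpss_stat(noise))  # walk's statistic is far larger
```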
Returning to the flowchart, rejecting the hypothesis of stationarity and having an indication of integrated processes, we continue in the right part of the scheme and apply a first test of cointegration.
Engle-Granger Test for Cointegration
To get information about the presence of a cointegration relationship, we run the Engle-Granger test and its t-statistic variant, this time over the entire matrix of futures prices.
The test has the form:
Y(:,1) = c0 + Y(:,2:end)*b + e
On the left side we have the regressand, the first series; on the right side of the equation we have the regressors, series 2 to 5 in our case. The key factor here is the residuals, or more precisely the estimated residuals: if the residual series is stationary, the linear combination of the variables is stationary.
[hEG,pValEG] = egcitest(DataMatrix,'test',{'t1'})
we obtain the following results:
                H0   PValue   t
Engle-Granger   1    0.0615   ---
t-statistic     1    ---      0.1
Both tests indicate the presence of cointegration in the matrix of the values of the derivatives. At this point, we want to identify the cointegration relationship.
We extract the parameter vector b and the intercept c0 returned by the function egcitest and form the linear combination from the regression:
c0=reg.coeff(1);
b=reg.coeff(2:5);
plot(Y*[1;-b]-c0,'LineWidth',2)
We obtain a new variable that is the linear combination of the five futures.
We can see how the series is relatively stationary, moving around zero, with different clusters of volatility. That's another indication that there is a cointegration relationship.
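The two steps just performed (an OLS regression of the first series on the others, then a stationarity check on the residuals) can be sketched on synthetic cointegrated data. This Python illustration replaces the formal unit-root test of step two with a simple fitted AR(1) coefficient on the residuals, so it is only a conceptual demonstration:

```python
import random

# Two series sharing a common stochastic trend are cointegrated:
# regressing one on the other should leave stationary residuals.
random.seed(1)
trend, y1, y2 = 0.0, [], []
for _ in range(2000):
    trend += random.gauss(0, 1)          # shared random-walk trend
    y1.append(2.0 * trend + random.gauss(0, 1))
    y2.append(trend + random.gauss(0, 1))

# Step 1: OLS of y1 on y2 (slope b, intercept c0)
n = len(y1)
mx, my = sum(y2) / n, sum(y1) / n
b = (sum((x - mx) * (y - my) for x, y in zip(y2, y1))
     / sum((x - mx) ** 2 for x in y2))
c0 = my - b * mx
resid = [y - (c0 + b * x) for x, y in zip(y2, y1)]

# Step 2: AR(1) coefficient of the residuals; a value well below 1
# suggests the residual series is stationary, i.e. cointegration.
num = sum(resid[t] * resid[t - 1] for t in range(1, n))
den = sum(r * r for r in resid[:-1])
phi = num / den
print(b, phi)  # b close to the true cointegrating slope 2, phi well below 1
```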
The models that are used for the cointegrated systems are the Vector Error Correction Models (or cointegrated VAR ).
Vector Error Correction Models
Once a cointegration relationship has been determined, the remaining coefficients of the VEC model can be estimated using Ordinary Least Squares. Cointegrated variables tend to restore their common stochastic trend, expressed in terms of error correction. The expression for a VEC(q) model, where q is the number of lags, is the following:
Dyt = A*B'*yt-1 + SUM(i=1..q) Gi*Dyt-i + D*xt + c + et
Estimating a VEC Model
We said that after finding the cointegration relationship, we can determine the coefficients of the model. The term with the summation in the VEC model is analogous to the terms of a VAR model.
The term that is really different is AB'yt-1: A represents the speed of adjustment to the imbalances of the model, while B' contains the cointegrating relations. D*xt represents exogenous variables (not present in our case).
The matrix product AB' gives our error correction coefficients.
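A one-step numerical sketch makes the error correction mechanism concrete. The adjustment speeds and cointegrating vector below are illustrative, not the estimated coefficients:

```python
# One error-correction step for two series with cointegrating relation
# y1 - 2*y2 = 0:  dy_t = A * (B' y_{t-1}), ignoring lagged differences.
A = [-0.3, 0.1]      # adjustment speeds (alpha)
B = [1.0, -2.0]      # cointegrating vector (beta)

y_prev = [5.0, 2.0]  # current disequilibrium: 5 - 2*2 = 1
ecm = sum(bi * yi for bi, yi in zip(B, y_prev))
dy = [a * ecm for a in A]
y_next = [y + d for y, d in zip(y_prev, dy)]

new_ecm = sum(bi * yi for bi, yi in zip(B, y_next))
print(ecm, y_next, new_ecm)  # disequilibrium shrinks from 1 toward 0
```

Note how the disequilibrium term shrinks after the step: the series are pulled back toward the long-run relation, which is exactly the behavior the AB' term encodes.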
Simulations and Forecasts
Once the model coefficients are estimated, the underlying data generation process can be simulated. For example, the code in the script generates a single Monte Carlo forecast path:
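The script's Monte Carlo code is not reproduced in the text; the idea, sketched below in Python with illustrative two-variable VAR(1) coefficients, is to iterate the conditional mean and add a random innovation at every step:

```python
import random

# One Monte Carlo forecast path: y_{t+1} = c + A1*y_t + e_{t+1},
# with Gaussian innovations drawn at each step. Coefficients and
# innovation scales are illustrative, not the estimated ones.
random.seed(42)
c = [0.01, -0.02]
A1 = [[0.5, 0.2],
      [0.1, 0.4]]
sigma = [0.05, 0.08]   # innovation standard deviations (diagonal)

y = [0.03, 0.01]
path = []
for _ in range(5):
    y = [c[i] + sum(A1[i][j] * y[j] for j in range(2))
         + random.gauss(0, sigma[i]) for i in range(2)]
    path.append(y)
print(path)  # 5 simulated two-element steps
```

Averaging many such paths recovers the point forecast, while their dispersion approximates the forecast uncertainty.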
We can now test the effectiveness of the forecast against the actual evolution of the markets, but it is clear that the predictions of such models lose value beyond one or two periods, especially with daily data and without seasonal or exogenous elements.
Limits of Engle-Granger Regression
The Engle-Granger method has several limitations. First, it identifies only a single cointegration relationship. This requires one of the variables to be identified as "first" among all the variables. This choice, which is usually arbitrary, will affect both test results and model estimation.
We go a bit further, permuting the five series and estimating the cointegration relationship for each choice of variable as regressand.
The table shows the results of the t statistic:
H0   PVal
1    0.0010
1    0.0010
1    0.0063
1    0.0010
1    0.0010
In our case, there is not much difference in choosing any of the five series as regressand and the other four as regressors.
Here we can see the five cointegration relationships detected by the permutation, the scale penalizes the different values but all relationships are stationary and mean-reverting.
Another limitation of the Engle-Granger method is that it is a two-step procedure: a first regression estimates the residual series, and a second regression tests it for a unit root. Errors in the initial estimate are necessarily carried into the second step.
Furthermore, the cointegrating relationships estimated by the Engle-Granger method feed into the VEC model definition. As a result, VEC model estimation also becomes a two-step procedure.
Johansen Test for Cointegration
The Johansen test for cointegration addresses many of the limitations of the Engle-Granger method. It avoids the two-step estimators and provides comprehensive tests in the presence of multiple cointegrating relationships.
This approach incorporates the maximum-likelihood test procedure into the model estimation itself, avoiding conditional estimates. Furthermore, the test provides a framework for testing restrictions on the cointegrating relationships.
The key point in the Johansen method is the rank of the impact matrix C = AB', determined from its eigenvalues. The eigenvalues depend on the form of the VEC model, and in particular on the composition of its deterministic terms. The method establishes the cointegration rank by testing how many eigenvalues are statistically different from zero.
We now run the test for the cointegration rank using the default H1 model, whose form is:
A(B'yt-1 + c0) + c1
[~,~,~,~,mles] = jcitest(Y,'model','H1','lags',2,'display','params');
The output mles refers to the fact that the test procedure is based on the Maximum Likelihood method.
The results of the Johansen test for cointegration give us more information than the Engle-Granger test. We will not take into account the case of rank equal to zero (a plain VAR) or the case of rank equal to 5 (the data stationary in levels).
************************
Results Summary (Test 1)
Data: Y
Effective sample size: 1294
Model: H1
Lags: 1
Statistic: trace
Significance level: 0.05
r h stat cValue pValue eigVal
========================================
0 1 198.0842 95.7541 0.0010 0.0988
1 0 63.4945 69.8187 0.1443 0.0247
2 0 31.1656 47.8564 0.6613 0.0134
3 0 13.7345 29.7976 0.8550 0.0080
4 0 3.3247 15.4948 0.9503 0.0026
5 0 0.0048 3.8415 0.9445 0.0000
As expected, the null hypothesis is rejected for rank equal to zero, while it is not rejected for the ranks from 1 to 5.
All the statistics (stat, cValue, pValue, eigenvalues) indicate the rank = 4 model as the most appropriate.
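The trace statistics in the table above can be reproduced directly from the reported eigenvalues, since stat(r) = -T * sum over i = r+1..n of ln(1 - lambda_i). A Python sketch using the table's (rounded) eigenvalues, which matches the displayed statistics up to that rounding:

```python
import math

# Johansen trace statistic from the table's eigenvalues, with
# effective sample size T = 1294 as reported in the summary.
T = 1294
eigvals = [0.0988, 0.0247, 0.0134, 0.0080, 0.0026, 0.0000]

trace = [-T * sum(math.log(1 - lam) for lam in eigvals[r:])
         for r in range(len(eigvals))]
for r, s in enumerate(trace):
    print(f"r = {r}: trace = {s:.2f}")
# Close to the reported 198.0842, 63.4945, 31.1656, ... given the
# four-decimal rounding of the eigenvalues.
```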
Parameters Estimation
In addition to the test for cointegration relationships, the procedure produces maximum likelihood estimates of the coefficients of the VEC model. We estimate the parameters of a VEC model with lag = 1 and rank = 4.
Comparing the Cointegration Analysis Strategies
Comparing the Engle-Granger and Johansen approaches can be difficult for several reasons. First of all, the two methods are essentially different and may disagree in their inferences from the same data.
The Engle-Granger two-step method for estimating the VEC model first estimates the cointegration relationship and then the coefficients of the model. This is very different from the Johansen maximum-likelihood method.
However, the two approaches should provide generally comparable results if both start from the same data and search for the same underlying relationships. Normalized cointegrating relations discovered by either method should reflect the mechanisms of the data-generating process, and VEC models constructed from those relations should have comparable predictive power.
In our case the cointegration relationships obtained with the Engle-Granger and Johansen tests are very similar. When the results converge we get an important confirmation, given that we are using different methodologies.
Conclusions
Even though the forecasting power of econometric models is questionable, there are possible practical uses for these models.
In particular, the VEC models obtained by the Johansen procedure can also be used to make predictions, and they seem to be more accurate.
We could, for example, study the effect of these five series on the price of energy, and we could study it at different timeframes (daily, hourly, high frequency). In doing so, we could insert exogenous variables, such as the dollar index or meteorological factors.
The use of genetic algorithms (included in Matlab and other software) could be very useful for improving the effectiveness of these models.