Comparison of Different Methods in Forecasting
Stocks’ Returns or Prices
Zhicheng Li/Sirui Zhang/Haoran Jiang
Abstract
In this paper, four models are built to explain stock behavior, and the corresponding methods are used to forecast stock returns or prices in the S&P 500 universe. All forecasts are compared with the realized values. We show that the traditional time series methods, both univariate (the AR model) and multivariate (the VAR model), offer little forecasting power. In contrast, the methods based on statistical arbitrage, i.e., the Pair Trading and Market Neutral models, perform much better. Along the way, we introduce some statistical techniques, such as Principal Component Analysis (PCA) and the concept of mean reversion. Finally, econometric and statistical analysis is used to give a reasonable interpretation of the results.
1 Introduction
Forecasting is a perennial topic in both economics and finance. In the stock market, the incentive to forecast well is particularly strong, since those with better predictions make more money. Consequently, a great deal of research has been done, and various models and methods have been proposed and used. Before the age of computers, people traded stocks and commodities mainly on intuition. As the level of trading and the technology grew, people searched for tools and methods that would increase their gains while minimizing their risk. Statistics, fundamental analysis, and linear/non-linear regressions are all attempts to predict and benefit from the market's direction [5]. In recent studies, newer techniques such as neural networks, Hidden Markov Models (HMM) and Genetic Algorithms (GA) have been used to forecast stock activity [9][10][13]. None of these techniques has proven to be as consistently correct as desired, and many skeptics question the utility of these approaches. Nevertheless, these methods are commonly used in practice.
In our paper, we present four models and apply them to the S&P 500 stock market. For each model, we state the concrete forecasting method. Given a particular time window in the S&P universe, we forecast stock prices or returns, and then compare the forecasts with the realized values by calculating correlations. Finally, we look at the performance of each method. The first model we start with is the autoregressive (AR)
model, which is broadly used in time series analysis [5][7]. It assumes that a stock behaves in an autocorrelated, stochastic way and is not correlated with other stocks or factors. Basically, this method models the series by a linear recurrence relation derived from past values. The recurrence relation can then be used to predict new values in the time series, which hopefully will be good approximations of the actual values. In the second model, we assume that two stocks, especially in the same industry, tend to be correlated, i.e., the prices of a pair of stocks may have a statistical relationship called cointegration. We exploit this property and implement pair trading in the second model. This model is the ancestor of statistical arbitrage, which is now a widely used method in the investment area [16][14]. In the third model, we extend the idea: an individual stock is very likely influenced by the whole market. We hope to find the common market factors that each stock may depend on. Therefore, a statistical method called Principal Component Analysis (PCA) is employed to extract these common market factors [17], i.e., the principal components (PCs). By regressing each stock on the PCs, we infer their relationship, and then, using the VAR model, which is a multivariate time series model, we forecast how the PCs evolve. We then put the predicted PCs back into the original regressions and forecast individual stocks. The last model we apply is the market neutral model, in which we form a portfolio whose expected return is unrelated to the market fundamentals. Regardless of how the market fluctuates, the portfolio's return is just a stationary mean-reverting process. By using mean reversion, which is a very important technique in statistical arbitrage [3], we look for opportunities that would give us large expected returns, and then compare these returns with the realized values.
The paper is organized as follows. In Section 2, we introduce the S&P 500 data that we use and examine some properties of this data set. Section 3 is divided into four parts, each of which sets forth a model of stock behavior and a method for forecasting stock prices/returns. In Section 4, we show the results of these four methods, compare their performance, and attempt a detailed analysis. Finally, we conclude in Section 5.
2 Data and Stylized Facts
In this paper, we use a database of S&P 500 (Standard & Poor's 500) stocks from 1989 to 2012. The data come from CRSP (Center for Research in Security Prices), which is part of the University of Chicago and renowned for its expertise in building and maintaining historical, academic research-quality stock market databases. We choose the S&P 500 because it comprises about 500 common stocks issued by large-cap companies and covers about 75 percent of the American equity market by capitalization. Moreover, the S&P 500 index is one of the most commonly followed equity indices, and many consider it one of the best representations of the U.S. stock market and a bellwether for the U.S. economy [1] (see Figure 1).
Figure 1: Historical S&P 500 Earnings and US Nominal GDP
The components of the S&P 500 are selected by a committee. This is similar to the Dow Jones Industrial Average, but different from indices such as the Russell 1000, which are strictly rule-based. When considering the eligibility of a new addition, the committee assesses the company's merit using eight primary criteria: market capitalization, liquidity, domicile, public float, sector classification, financial viability, length of time publicly traded and listing exchange [2]. The committee selects the companies in the S&P 500 so that they are representative of the industries in the United States economy. In order to be added to the index, a company must satisfy these liquidity-based size requirements: i) market capitalization greater than or equal to US$4.0 billion; ii) ratio of annual dollar value traded to float-adjusted market capitalization greater than 1.0; iii) minimum monthly trading volume of 250,000 shares in each of the six months leading up to the evaluation date. The composition of the S&P 500 is therefore not static: from time to time one company drops out of the list and another enters. That is why we see records for 1127 stocks in our data.
The stock prices in this data set are end-of-day prices. As there are roughly 252 business days a year, we have 5799 time records. In addition, these prices are adjusted for dividends and for changes in the number of shares outstanding. Thus, the trend of a stock's price reflects the market value of that company. Moreover, since price increments are normally thought to be proportional to the price itself, the trend of a stock price is exponential (see Figure 2), and the log-prices should be an I(1) process, which means that the log-returns (first differences of log-prices) are stationary (Stock and Watson (1988b)). Table 1 reports the results of ADF tests for all the stocks, which show that log-prices are essentially I(1) processes with a unit root and that log-returns are stationary processes.
Figure 2: Five S&P 500 Stocks Prices’ Evolution
As we have a long time series over a broad universe of U.S. equities, we can use back-testing to compare different methods for forecasting stock prices/returns. The principle is as follows: we set two parameters, the historical window and the forecasting window. Given the data in the historical window, we forecast the prices/returns in the forecasting window and then compare them with the actual data. The historical window moves over time, so we obtain a series of comparison results on which to base a judgment. Another issue is that, within a particular historical window, some companies do not belong to the S&P 500 or have no data, so we need to restrict the dataset to those stocks that existed continuously over that period. Standard & Poor's believes that turnover in index membership should be avoided whenever possible. Hence, companies that were added to the index usually stay in the index unless too many of the addition criteria have been violated or the company no longer exists due to mergers and acquisitions [2]. Thus, even though the data carry the selection bias mentioned above, within a historical window that is not too long we can assume that stocks behave naturally.
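The moving-window back-test described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, prices are assumed to be stored as a days-by-stocks array, and a missing record is assumed to be encoded as NaN.

```python
from typing import Iterator, Tuple

import numpy as np


def rolling_windows(n_obs: int, hist_len: int, fcst_len: int = 1,
                    step: int = 1) -> Iterator[Tuple[range, range]]:
    """Yield (historical, forecasting) row ranges that slide over n_obs records."""
    start = 0
    while start + hist_len + fcst_len <= n_obs:
        split = start + hist_len
        yield range(start, split), range(split, split + fcst_len)
        start += step


def stocks_with_full_history(prices: np.ndarray, hist: range) -> np.ndarray:
    """Column indices of stocks with no missing price (NaN) inside the window."""
    block = prices[hist.start:hist.stop]
    return np.flatnonzero(~np.isnan(block).any(axis=0))


# Toy data (assumption): 10 days, 3 stocks; stock 1 has no record on day 0.
prices = np.arange(30, dtype=float).reshape(10, 3) + 1.0
prices[0, 1] = np.nan
windows = list(rolling_windows(n_obs=len(prices), hist_len=6, fcst_len=2))
usable_first = stocks_with_full_history(prices, windows[0][0])
usable_second = stocks_with_full_history(prices, windows[1][0])
```

In the toy data, stock 1 is excluded from the first window and becomes usable once the window has moved past its missing record.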
Table 1: Results of ADF tests for log-prices and log-returns processes
H0: unit root      Ratio of stocks      Ratio of stocks
(5% level)         that accept H0       that reject H0
log-prices         95.08%               4.92%
log-returns        0%                   100%
3 Models and Methods
3.1 Simple Autoregression Model
We begin with a very simple model, the autoregressive (AR) model, which is widely used for single time series. Suppose we are interested in forecasting the value of a variable $Y_{t+1}$ based on a set of variables $X_t$ observed at date $t$. In this case, $X_t$ consists of a constant plus $Y_t, Y_{t-1}, \ldots, Y_{t-m+1}$. The common methodology is to choose the forecast $Y^*_{t+1|t}$ so as to minimize
$$E\left(Y_{t+1} - Y^*_{t+1|t}\right)^2 \tag{1}$$
which is the mean squared error. If $Y^*_{t+1|t}$ has the functional form $g(X_t)$ based on the current information, the problem is to find the function $g(X_t)$ that minimizes
$$E\left(Y_{t+1} - g(X_t)\right)^2 \tag{2}$$
When we use linear projection, i.e., $g(X_t)$ is a linear combination of $Y_t, \ldots, Y_{t-m+1}$, equation 2 leads to an AR model. In our paper, we choose just two lags and have the regression model:
$$Y_t - u = \phi_1 (Y_{t-1} - u) + \phi_2 (Y_{t-2} - u) + \varepsilon_t \tag{3}$$
We use a two-lag linear projection, rather than selecting the lag length by other methods (AIC/BIC) [8] or using non-linear models, because we think there is a trade-off between the sample size, the number of parameters to be estimated, and the credibility of the model. Estimating many parameters might cause a loss of precision in the estimation process. And because we do not have a 'true' model governing stock prices/returns (Black (1986)), as long as what we have built is effective to the extent we expect, we can use it.
Returning to equation 3, if we can assume $E(\varepsilon_t \mid Y_{t-1}, Y_{t-2}) = 0$ and the process $\{Y_t, [Y_{t-1}, Y_{t-2}]\}$ is covariance-stationary and ergodic for second moments, then the OLS regression yields consistent estimates of the coefficients (Hamilton (1994)). Alternatively, we can rewrite equation 3 in the form
$$\phi(L)(Y_t - u) = \varepsilon_t \tag{4}$$
where the autoregressive operator is $\phi(L) = 1 - \phi_1 L - \phi_2 L^2$. As long as all the roots of $\phi(z) = 0$ lie outside the unit circle, the autoregression satisfies the stationarity condition.
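The root condition on $\phi(z)$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name is hypothetical and the coefficient values are made up for illustration.

```python
import numpy as np


def ar2_is_stationary(phi1: float, phi2: float) -> bool:
    """True if all roots of phi(z) = 1 - phi1*z - phi2*z**2 lie outside the unit circle."""
    # np.roots expects coefficients ordered from the highest power down.
    roots = np.roots([-phi2, -phi1, 1.0])
    return bool(np.all(np.abs(roots) > 1.0))


stationary_case = ar2_is_stationary(0.5, 0.3)      # phi1 + phi2 < 1
nonstationary_case = ar2_is_stationary(0.7, 0.4)   # phi1 + phi2 > 1
```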
In this AR model, we choose log-returns, which are already stationary, as our forecasting object. Specifically, if we define $Y_{it}$ as the log-return of stock $i$ at time $t$, then equation 3 becomes
$$Y_{it} = \beta_{0i} + \beta_{1i} Y_{i,t-1} + \beta_{2i} Y_{i,t-2} + \varepsilon_{it} \tag{5}$$
If the previous assumptions hold, we can apply OLS to this regression and get consistent estimators $\hat\beta_{ki}$ ($k = 0, \ldots, 2$; $i = 1, \ldots, N$). Note that this is not a panel data regression: there is a separate regression for each stock, and the coefficients vary between stocks. Furthermore, we set the length of the moving historical window to 1000 days, and we forecast the next-day return $E(Y_{i,t+1})$ of stock $i$, which is
$$E(Y_{i,t+1}) = \hat\beta_{0i} + \hat\beta_{1i} Y_{it} + \hat\beta_{2i} Y_{i,t-1} \tag{6}$$
Finally, we compare the forecast returns with the realized returns; the results are shown in the next section.
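Equations 5 and 6 can be illustrated for a single stock as below. This is a sketch under assumptions rather than the code used for the reported results: the function names are hypothetical, and the series is simulated with made-up coefficients in place of real log-returns.

```python
import numpy as np


def fit_ar2(y: np.ndarray) -> np.ndarray:
    """OLS estimates (b0, b1, b2) of y_t = b0 + b1*y_{t-1} + b2*y_{t-2} + e_t."""
    X = np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]])
    beta, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return beta


def forecast_next(y: np.ndarray, beta: np.ndarray) -> float:
    """One-step-ahead forecast of equation 6 from the last two observations."""
    return float(beta[0] + beta[1] * y[-1] + beta[2] * y[-2])


# Simulated stand-in for one stock's 1000-day historical window (assumption).
rng = np.random.default_rng(0)
y = np.zeros(1000)
shocks = rng.normal(0.0, 1.0, size=len(y))
for t in range(2, len(y)):
    y[t] = 0.1 + 0.5 * y[t - 1] - 0.3 * y[t - 2] + shocks[t]

beta_hat = fit_ar2(y)
next_day_forecast = forecast_next(y, beta_hat)
```

In a back-test, this fit would be repeated for every stock and every position of the historical window.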
3.2 Pair Trading Model
The assumptions in the previous model are very strong. It is unlikely that a stock moves on its own, uncorrelated with others. It is more plausible that stocks are correlated, especially within the same industry. Figure 3 shows an example: the price evolution of two stocks in the same industry, 'Petroleum Refining' (SIC: 2911), from 1989 to 1990, which appear to be highly correlated. Hence, in this model, we adopt a relationship commonly used in time series analysis, namely cointegration. Rather than dealing with log-returns, which are stationary, we consider the log-prices, which are integrated of order 1. If stocks $i$ and $j$ are in the same industry or have similar characteristics, one expects to earn a positive profit by hedging one stock with the other (see Pole (2008)). In particular, denote by $P_{it}$ and $P_{jt}$ the corresponding price series, and suppose we can model them as
$$\ln(P_{it}) = \alpha t + \beta \ln(P_{jt}) + X_t \tag{7}$$
where $X_t$ is a stationary, mean-reverting process. The two log-price series, which are I(1), are then cointegrated. Taking first differences of equation 7, the log-returns satisfy
$$\ln(R_{it}) = \alpha\,dt + \beta \ln(R_{jt}) + dX_t \tag{8}$$
In many situations, the drift $\alpha$ is small compared to the fluctuations of $X_t$ and can be neglected. The mean reversion of $X_t$ then suggests forming a long-short portfolio in which we go long 1 dollar of stock $i$ and short $\beta$ dollars of stock $j$ if $X_t$ is small, and conversely go short stock $i$ and long stock $j$ if $X_t$ is large. Both positions are expected to earn positive returns. This mean-reversion paradigm is typically associated with market over-reaction: assets are temporarily under- or over-priced with respect to one or several reference securities (Lo and MacKinlay (1990)).
For our dataset, the concrete method is as follows. First, within one historical window, we find a pair of stocks in a given industry (we use the SIC code to identify industries) that are cointegrated without a deterministic trend. Denoting them as stocks $i$ and $j$ and regressing one on the other, we have
$$\ln(P_{it}) = \beta \ln(P_{jt}) + X_t \tag{9}$$
and correspondingly, for log-returns,
$$\ln(R_{it}) = \beta \ln(R_{jt}) + dX_t \tag{10}$$
Figure 3: Prices of two stocks in ‘Petroleum Refining’ industry from 1989 to
1990
As $X_t$ is a stationary process and we expect it to be mean-reverting, we use an AR(1) model to diagnose $X_t$:
$$X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t \tag{11}$$
Subtracting $X_{t-1}$ from both sides, we get
$$dX_t = \beta_0 + (\beta_1 - 1) X_{t-1} + \varepsilon_t \tag{12}$$
Mean reversion requires $(\beta_1 - 1) < 0$, and the more negative this coefficient, the stronger the mean reversion. Therefore, the next step is, within the particular historical window ($t = 1, \ldots, T$), to search all the stocks, find the top ten mean-reverting pairs, and denote them as $\{i^*, j^*\}_{10}$. For these ten pair-trading portfolios, we then need to forecast their next-day returns. Evaluating equation 10 at $T+1$ gives
$$\ln(R_{i^*,T+1}) - \beta \ln(R_{j^*,T+1}) = dX^*_{T+1} \tag{13}$$
which means that going long 1 dollar of stock $i^*$ and short $\beta$ dollars of stock $j^*$ gives an expected return $E_T(dX^*_{T+1})$. Moreover, from equation 12, it is easy to see that
$$dX^*_{T+1} = \beta^*_0 + (\beta^*_1 - 1) X^*_T + \varepsilon^*_{T+1} \tag{14}$$
If the assumption $E_T(\varepsilon^*_{T+1}) = 0$ is valid, which is also required for a consistent estimator in the AR(1) model, we can derive the result
$$E_T(dX^*_{T+1}) = E_T\{\ln(R_{i^*,T+1}) - \beta \ln(R_{j^*,T+1})\} = \beta^*_0 + (\beta^*_1 - 1) X^*_T \tag{15}$$
showing that the expected next-day ($T+1$) return of this pair trade is just $\beta^*_0 + (\beta^*_1 - 1) X^*_T$. We can then compare the forecast returns with the realized pair-trading returns, namely $\ln(R_{i^*,T+1}) - \beta \ln(R_{j^*,T+1})$ in the forecasting window. The comparison results are shown in the next section.
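For one candidate pair, the chain from equation 9 to equation 15 can be sketched as follows. The function name is hypothetical, the cointegration test itself is not shown, and the two log-price series are simulated with made-up parameters purely for illustration.

```python
import numpy as np


def pair_expected_return(log_pi: np.ndarray, log_pj: np.ndarray):
    """Return (beta, b0, b1, expected dX at T+1) for one pair of log-price series."""
    # Equation 9: regress ln(P_i) on ln(P_j) without intercept.
    beta = float(log_pj @ log_pi / (log_pj @ log_pj))
    x = log_pi - beta * log_pj                      # residual X_t
    # Equation 11: AR(1) regression of X_t on a constant and X_{t-1}.
    A = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (b0, b1), *_ = np.linalg.lstsq(A, x[1:], rcond=None)
    # Equation 15: expected next-day return of long 1 dollar i, short beta dollars j.
    expected_dx = b0 + (b1 - 1.0) * x[-1]
    return beta, float(b0), float(b1), float(expected_dx)


# Simulated cointegrated pair (assumption): ln(P_j) is a random walk, X_t is AR(1).
rng = np.random.default_rng(1)
T = 500
log_pj = 3.0 + np.cumsum(rng.normal(0.0, 0.01, T))
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.8 * x_true[t - 1] + rng.normal(0.0, 0.01)
log_pi = 1.2 * log_pj + x_true

beta, b0, b1, expected_dx = pair_expected_return(log_pi, log_pj)
```

Ranking all candidate pairs by $b_1 - 1$ (most negative first) would then give the top mean-reverting pairs described above.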
Moreover, if we want to form a strategy to make more money, among the ten pairs we have chosen we select the pair ($i^{**}$, $j^{**}$) whose absolute expected return equals $\max\{|\beta^*_0 + (\beta^*_1 - 1) X^*_T|\}$, and trade only that pair. If the expected return is positive, we go long 1 dollar of stock $i^{**}$ and short $\beta$ dollars of stock $j^{**}$. If it is negative, we instead short 1 dollar of stock $i^{**}$ and go long $\beta$ dollars of stock $j^{**}$. Both cases give a positive expected return, i.e., $\max\{|\beta^*_0 + (\beta^*_1 - 1) X^*_T|\}$.
3.3 VAR Model
From the previous model, we can see that cointegrated time series share at least one common trend. Both casual observation and economic theory suggest that many series might contain the same stochastic trends, so that they are cointegrated. If each of $n$ series is integrated of order 1 and they can be jointly characterized by $k < n$ stochastic trends, then the vector representation of these series has $k$ I(1) processes and $n - k$ distinct stationary linear combinations. Stock and Watson (1988a) propose a technique for extracting common stochastic trends by Principal Component Analysis (PCA). As we already know that log-prices are I(1) processes, we can regress each log-price process on these principal components (PCs), and the residuals we get should be stationary. Alternatively, we can directly use log-returns, which are already stationary, in which case the principal components and the regression residuals are all stationary.
Here we briefly introduce PCA. PCA is a statistical method that uses
an orthogonal transformation to convert a set of observations of possibly
correlated variables into a set of values of linearly uncorrelated variables
called principal components. This transformation is defined in such a way
that the first principal component has the largest possible variance, that
is, accounts for as much of the variability in the data as possible. And
each succeeding component in turn has the highest variance possible under
the constraint that it is orthogonal to the preceding components. Thus, we can preserve most of the information in the original data while reducing the dimension of the dataset, i.e., obtaining a small number of common stochastic trends.
The detailed procedure for our case is as follows. Within one historical window ($t = 1, \ldots, T$; $i = 1, \ldots, N$), we first standardize each stock's log-prices ($p_i$):
$$Y_{it} = \frac{p_{it} - \bar p_i}{\bar\sigma_i} \tag{16}$$
where
$$\bar p_i = \frac{1}{T}\sum_{t=1}^{T} p_{it}; \qquad \bar\sigma_i^2 = \frac{1}{T-1}\sum_{t=1}^{T} (p_{it} - \bar p_i)^2$$
Then we calculate the covariance matrix of $Y_{it}$ (which here is also the correlation matrix). It is denoted by $C$, with
$$C_{ij} = \frac{1}{T-1}\sum_{t=1}^{T} Y_{it} Y_{jt} \tag{17}$$
which is symmetric and non-negative definite. Notice that, for any stock $i$, we have $C_{ii} = 1$. The next step is to consider the eigenvectors and eigenvalues of the covariance matrix. Define $V$ as the matrix of eigenvectors and $\lambda$ as the corresponding eigenvalues, i.e.,
$$[V, \lambda] = \mathrm{Eig}(C) \tag{18}$$
As $V_i$ ($i = 1, \ldots, N$) are the eigenvectors of the covariance matrix, they are orthogonal to each other and form an orthogonal basis of a new space. We rank the eigenvalues in decreasing order,
$$N \ge \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \ldots \ge \lambda_N \ge 0$$
and define $V_1, V_2, V_3, \ldots, V_N$ as the corresponding eigenvectors. The spectrum of eigenvalues shows that only a few eigenvalues are large (see Figure 4). We can then choose the top $K$ eigenvectors, which correspond to the $K$ largest eigenvalues. From Jolliffe (2005), we know that the projection of the original data onto these top eigenvectors $V_1, V_2, V_3, \ldots, V_K$ (the principal basis of the new space) preserves most of the information.
Figure 4: Eigenvalues of the correlation matrix of stocks’ log-prices computed
on the first historical window (t=1. . . 100)
Thus, we project the log-price data in the historical window onto these top eigenvectors and get $K$ principal components ($F_k$, $k = 1, \ldots, K$):
$$F_{tj} = \sum_{i=1}^{N} \frac{V_{ji}}{\bar\sigma_i}\, p_{ti}, \qquad t = 1, \ldots, T; \quad j = 1, \ldots, K \tag{19}$$
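The steps from equation 16 to equation 19 can be sketched with NumPy as below. The function name is hypothetical, the log-prices are simulated random walks used only as stand-in data, and `numpy.linalg.eigh` is one possible choice for the eigen-decomposition in equation 18.

```python
import numpy as np


def principal_components(p: np.ndarray, k: int):
    """p is a T x N matrix of log-prices. Returns (F, V, top-k eigenvalues)."""
    p_bar = p.mean(axis=0)
    sigma = p.std(axis=0, ddof=1)
    Y = (p - p_bar) / sigma                     # equation 16
    C = Y.T @ Y / (len(p) - 1)                  # equation 17 (correlation matrix)
    eigvals, eigvecs = np.linalg.eigh(C)        # equation 18, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    V = eigvecs[:, order].T                     # K x N, one eigenvector per row
    F = (p / sigma) @ V.T                       # equation 19
    return F, V, eigvals[order]


# Simulated log-prices (assumption): 250 days, 6 stocks.
rng = np.random.default_rng(2)
log_prices = 4.0 + np.cumsum(rng.normal(0.0, 0.02, size=(250, 6)), axis=0)
F, V, top_eigvals = principal_components(log_prices, k=2)
```

Because $C_{ii} = 1$, the eigenvalues sum to $N$, so `top_eigvals.sum() / N` gives the share of variance captured by the retained components.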
For each stock's log-price process, we regress it on these common trends:
$$p_i = \theta_{i0} + \sum_{j=1}^{K} \theta_{ij} F_j + \delta_i, \qquad i = 1, 2, \ldots, N \tag{20}$$
As they are cointegrated, and if we can claim that the disturbance term is uncorrelated with the PCs, the OLS estimators $\hat\theta_{ij}$ ($i = 1, \ldots, N$; $j = 0, \ldots, K$) are consistent. The next step is that, rather than auto-regressing and forecasting each single log-price process, we use a Vector Autoregression (VAR) model to forecast these common trends (PCs) and then combine the forecasts to estimate each log-price process by putting them back into the original regression, equation 20. A VAR(p) model is written as a vector autoregression over the previous $p$ values of the series, in this case:
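$$\vec F_t = \vec c + \phi_1 \vec F_{t-1} + \cdots + \phi_p \vec F_{t-p} + \vec\varepsilon_t \tag{21}$$
where
$$\vec F_t = \begin{pmatrix} F_{1t} \\ \vdots \\ F_{Kt} \end{pmatrix}; \quad \vec c = \begin{pmatrix} c_{1t} \\ \vdots \\ c_{Kt} \end{pmatrix}; \quad \vec\varepsilon_t = \begin{pmatrix} \varepsilon_{1t} \\ \vdots \\ \varepsilon_{Kt} \end{pmatrix}; \quad \phi_s = \{\phi^s_{ij}\}_{K \times K} \tag{22}$$
Putting the forecast value of $\vec F_{t+1}$ into equation 20, we have
$$\hat p_{i,t+1} = \hat\theta_{i0} + \sum_{j=1}^{K} \hat\theta_{ij} F_{j,t+1} \tag{23}$$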
The principle of this method is that, rather than treating the evolution of a stock price as a spontaneous and endogenous process, we think it is highly correlated with the whole market. As it is impossible to regress each stock on the whole set of other stocks, we extract a small number of common stochastic trends which largely represent the whole market. Through the evolution of these trends, we capture more of the information that influences a single stock's behavior. Admittedly, we encounter econometric problems similar to those in the single-series autoregression, and it is hard to justify the validity of the assumptions. However, as long as this model increases forecasting power, it is effective to some extent.
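Equations 21 and 23 can be sketched for the simplest case. The lag order is not stated in the text, so the example assumes $p = 1$; the function names, the simulated factors and the coefficient vector `theta_i` are all hypothetical.

```python
import numpy as np


def fit_var1(F: np.ndarray):
    """OLS for F_t = c + Phi F_{t-1} + e_t, where F is a T x K matrix of factors."""
    X = np.column_stack([np.ones(len(F) - 1), F[:-1]])   # (T-1) x (K+1)
    B, *_ = np.linalg.lstsq(X, F[1:], rcond=None)        # (K+1) x K
    return B[0], B[1:].T                                 # c (K,), Phi (K x K)


def forecast_var1(F: np.ndarray, c: np.ndarray, Phi: np.ndarray) -> np.ndarray:
    """One-step-ahead factor forecast from the last observed factor vector."""
    return c + Phi @ F[-1]


def forecast_log_price(theta_i: np.ndarray, F_next: np.ndarray) -> float:
    """Equation 23: theta_i = (theta_i0, theta_i1, ..., theta_iK)."""
    return float(theta_i[0] + theta_i[1:] @ F_next)


# Simulated two-factor VAR(1) (assumption, made-up parameters).
rng = np.random.default_rng(3)
c_true = np.array([0.1, -0.2])
Phi_true = np.array([[0.5, 0.1], [0.0, 0.3]])
F = np.zeros((2000, 2))
for t in range(1, len(F)):
    F[t] = c_true + Phi_true @ F[t - 1] + rng.normal(0.0, 1.0, 2)

c_hat, Phi_hat = fit_var1(F)
F_next = forecast_var1(F, c_hat, Phi_hat)
theta_i = np.array([1.0, 0.5, -0.25])   # hypothetical regression coefficients
p_next = forecast_log_price(theta_i, F_next)
```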
3.4 Market Neutral Model
Stocks’ prices or returns are apparently influenced by market fundamen-
tals. However, it is hard to build a model and take all possible factors into
account for explaining and forecasting fundamentals. Therefore, in this
section, we consider a statistical arbitrage model, in which the portfolio’s
return is not impacted by market fundamentals. The common features of
statistical arbitrage are (i) trading signals are systematic or rules-based,
(ii) the trading portfolio is market-neutral, in the sense that it has zero
beta with the market, and (iii) the mechanism for generating excess returns is statistical. The idea is to make many bets with significant positive expected returns at appropriate times, and to produce a low-volatility investment strategy which is uncorrelated with the market.
Here we follow the paper by Avellaneda and Lee (2010) to build this model. First, we form principal components of the log-returns of S&P 500 stocks over a certain period. For example, if we are at time $T$ and need to forecast the next-period stock returns, we use the past 60 days of records, i.e., the historical window is chosen as 60 days. Following the same principle as in the last section, we choose the $K$ most significant eigenvectors, which correspond to the $K$ largest eigenvalues. Define these vectors as $V_i$ ($i = 1, \ldots, K$). Then we project the log-return matrix ($60 \times N$) onto these eigenvectors and form $K$ market factors:
$$F_{tj} = \sum_{i=1}^{N} \frac{V_{ji}}{\bar\sigma_i} R_{ti}, \qquad j = 1, \ldots, K; \quad t = (T-59), \ldots, T \tag{24}$$
where $F_{tj}$ is the $j$th market factor at time $t$. Note that these market factors are dynamic, because they change as the historical window moves forward.
Then we regress each stock's log-returns on these market factors:
$$R_i = m_i + \sum_{j=1}^{K} \beta_{ij} F_j + \tilde R_i, \qquad i = 1, 2, \ldots, N \tag{25}$$
Of course, the returns, the principal components and the residuals are all stationary, and we can assume $E(\tilde R_i) = 0$. The proposed strategy is to look for those regression residuals that exhibit the strongest mean reversion. Thus, we auto-regress each $\tilde R_i$ and find those residuals that have the most negative autoregressive coefficients:
$$\tilde R_{it} = \rho_i \tilde R_{i,t-1} + \epsilon_{it}, \qquad i = 1, 2, \ldots, N \tag{26}$$
Figure 5 shows the top five mean-reverting residuals in the first historical window.
Figure 5: The top 5 mean-reverting residuals in the first historical window
A trading portfolio containing $n$ stocks is said to be market-neutral if the dollar amounts $\{Q_i\}_{i=1}^{n}$ invested in each stock of the portfolio satisfy
$$\bar\beta_j = \sum_{i=1}^{n} \beta_{ij} Q_i = 0, \qquad j = 1, 2, \ldots, K \tag{27}$$
where $\beta_{ij}$ is the coefficient from regressing stock $i$ on factor $j$. In code, we use the null space to solve this linear system:
$$Q = \mathrm{Null}\{\beta_{[K] \times [n]}\} \tag{28}$$
To guarantee a non-zero solution for the portfolio, we need to choose $n = K + 1$ stocks, namely those with the $K + 1$ smallest autoregressive coefficients, as our portfolio members. Then we have
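$$\begin{aligned}
\sum_{i=1}^{K+1} Q_i R_i &= \sum_{i=1}^{K+1} Q_i m_i + \sum_{i=1}^{K+1} Q_i \sum_{j=1}^{K} \beta_{ij} F_j + \sum_{i=1}^{K+1} Q_i \tilde R_i \\
&= \sum_{i=1}^{K+1} Q_i m_i + \sum_{i=1}^{K+1} Q_i \tilde R_i + \sum_{j=1}^{K} \left( \sum_{i=1}^{K+1} \beta_{ij} Q_i \right) F_j \\
&= \sum_{i=1}^{K+1} Q_i (m_i + \tilde R_i)
\end{aligned} \tag{29}$$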
From this equation, it is clear that the portfolio return has nothing to do with the market environment. It depends only on the intrinsic factor $m_i$ and a random variable $\tilde R_i$, which is a mean-zero stationary process exhibiting mean reversion.
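The null-space step of equation 28 can be sketched with a singular value decomposition. The function name is hypothetical, the loadings are random stand-in numbers, and scaling the weights so that the absolute dollar amounts sum to one is an added assumption, since the null-space vector is only determined up to a scale factor.

```python
import numpy as np


def market_neutral_weights(beta: np.ndarray) -> np.ndarray:
    """beta is n x K with n = K + 1. Returns Q such that beta.T @ Q = 0 (equation 27)."""
    # For a K x (K+1) matrix of full rank K, the last right-singular vector
    # spans the one-dimensional null space.
    _, _, vt = np.linalg.svd(beta.T)
    q = vt[-1]
    return q / np.abs(q).sum()   # assumption: normalise gross exposure to 1 dollar


# Hypothetical factor loadings for K = 3 factors and K + 1 = 4 stocks.
rng = np.random.default_rng(4)
K = 3
beta = rng.normal(size=(K + 1, K))
Q = market_neutral_weights(beta)
factor_exposure = beta.T @ Q     # should vanish for every factor
```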
The next step is to generate signals for entering trades. Substituting the autoregression 26 into equation 29, we have
$$\sum_{i=1}^{K+1} Q_i R_{it} = \sum_{i=1}^{K+1} Q_i (m_i + \tilde R_{it}) = \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde R_{i,t-1} + \epsilon_{it}) \tag{30}$$
Suppose we are at the last period $T$ of the historical window. From the above equation, the expected portfolio return at $T+1$ is
$$E_T\left[\sum_{i=1}^{K+1} Q_i R_{i,T+1}\right] = E_T\left[\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde R_{iT} + \epsilon_{i,T+1})\right] = \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde R_{iT}) \tag{31}$$
When $\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde R_{iT})$ is very high (positive), we can buy this portfolio and expect a high return. When it is sufficiently negative, we can short this portfolio and still expect a high return. Thus, we can directly use $\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde R_{iT})$ as our trading signal, where each $\rho_i$ is a negative coefficient. Define the signal as $S_T$. In our strategy, the trade entry criteria are:
1. If $S_T - \mathrm{mean}\{S_t\} \ge 0.7\,(\max\{S_t\} - \mathrm{mean}\{S_t\})$, $t = (T-59), \ldots, T$: enter the trade; go long the stocks whose $Q_i$ are positive by the amount $|Q_i|$, and short the stocks whose $Q_i$ are negative by the amount $|Q_i|$. This gives an expected return of $\left|\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde R_{iT})\right|$.
2. If $S_T - \mathrm{mean}\{S_t\} \le 0.7\,(\min\{S_t\} - \mathrm{mean}\{S_t\})$, $t = (T-59), \ldots, T$: enter the trade; go long the stocks whose $Q_i$ are negative by the amount $|Q_i|$, and short the stocks whose $Q_i$ are positive by the amount $|Q_i|$. This gives an expected return of $\left|\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde R_{iT})\right|$.
Finally, as the historical window moves forward, we compare these expected portfolio returns with the realized portfolio returns and compute their correlation.
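The comparison step can be illustrated as below, assuming that the correlation in question is the ordinary Pearson correlation between the series of forecasts and the series of realized returns; the text does not name the estimator, and the function name and toy numbers are hypothetical.

```python
import numpy as np


def forecast_correlation(forecast, realized) -> float:
    """Pearson correlation between forecast and realized returns."""
    return float(np.corrcoef(forecast, realized)[0, 1])


# Toy numbers (assumption), one entry per position of the moving window.
aligned = forecast_correlation([0.01, 0.02, 0.03], [0.02, 0.04, 0.065])
opposed = forecast_correlation([0.01, 0.02, 0.03], [0.03, 0.02, 0.01])
```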
4 Comparison and Analysis
Table 2 shows the comparison results of the four methods for forecasting stock log-returns in the S&P 500 universe. Some parameters need clarification. In the time series models (AR and VAR), we use a historical window of 1000 days, while in the statistical arbitrage models, following Avellaneda and Lee (2010), we use the past 60 days of records as our information set for trading. 'Common factors' refers to the number of other time series used in forecasting. In the Pair Trading and Market Neutral models, it means the number of PCs that we used. As the last model uses a signal to decide whether or not to enter a trade, its number of forecasts is smaller than that of the others.
Table 2: Comparison between four types of forecasting methods
Method          Length of           Common         Forecasting   Correlation
                historical window   factors used   times         with real returns
AR              1000                NA             1000          3.17%
Pair Trading    60                  2              1000          13.89%
VAR             1000                5              1000          4.52%
Market Neutral  60                  15             503           17.2%
From the table we can see that both the AR and VAR models exhibit little forecasting power. In the AR model, we investigate only each stock's own log-returns. Individual log-price processes are close to random walks, in the sense that log-returns, i.e., the differences of log-prices, are almost white noise. Even though they are stationary, it is still hard to forecast their subsequent behavior. In the VAR model, we want to capture more of the market information that affects a stock's behavior, so we look instead at how the common market factors (PCs) evolve. Then, by putting the forecast values of the PCs into the original regressions, we get the predicted values for each stock. However, we see that the improvement in forecasting each individual stock's log-returns is marginal, so it would not much increase the opportunity to earn money. Moreover, when we extend our forecasting window to five days, we find that the accuracy of these two methods decreases as the forecasting period increases (see Table 3). Overall, time series forecasting provides some credibility over short periods of time, but its accuracy diminishes sharply as the prediction horizon lengthens.
Nonetheless, Table 2 shows that the second and last models improve forecasting power considerably. The methodology underlying both models is mean reversion, a mathematical concept sometimes used in stock investing. This concept suggests that prices and returns eventually move back towards their mean or average. Revisiting equation 9 in the Pair Trading model, we see that the pair of stocks' log-prices
Table 3: Time series methods to forecast different periods
Method   Length of           Days to    Correlation
         historical window   forecast   with real returns
AR       1000                1          3.17%
VAR      1000                1          4.52%
AR       1000                5          1.38%
VAR      1000                5          1.90%
are cointegrated and the regression residuals are supposed to move around their average. By mean reversion, we expect $dX_t$ to be negatively correlated with $X_t$. This is not only a property that we infer from the data, but is also supported by a theoretical model, the Ornstein-Uhlenbeck (O-U) process. In mathematics, the O-U process (see Gardiner (1985)) is a stochastic process that describes the velocity of a massive Brownian particle under the influence of friction. The process is stationary, Gaussian, and Markovian. Over time, the process tends to drift towards its long-term mean: such a process is called mean-reverting. Moreover, another important and widely used assumption in finance is that the stochastic movement of stock prices follows geometric Brownian motion. Thus, for the $X_t$ in equation 9, we can apply the O-U process and get:
$$dX_t = \kappa (m - X_t)\,dt + \sigma\,dW_t, \qquad \kappa > 0 \tag{32}$$
where $m$ is the mean of $X_t$, $dW_t$ is the increment of a Brownian motion ($W_t \sim N(0, t)$), $\sigma$ measures the volatility of the movement, and the parameter $\kappa$ is called the speed of mean reversion. This process is stationary and autoregressive with lag 1. In particular, the increment $dX_t$ has unconditional mean zero and conditional mean equal to
$$E\{dX_t \mid X_s, s \le t\} = \kappa (m - X_t)\,dt$$
When $X_t > m$, we expect $dX_t$ to be negative, while $X_t < m$ implies a positive expected $dX_t$. Rearranging equation 32, we get:
$$dX_t = \kappa m\,dt - \kappa\,dt \cdot X_t + \sigma\,dW_t \tag{33}$$
Comparing it with equation 12 in the Pair Trading model, we find that they have the same form, with
$$\beta_0 = \kappa m\,dt, \quad \beta_1 - 1 = -\kappa\,dt, \quad \varepsilon_t = \sigma\,dW_t \tag{34}$$
This in turn supports the AR(1) model which we used for the process $X_t$. And finding the most negative coefficient $\beta_1 - 1$ is equivalent to finding the process with the highest speed of mean reversion.
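The correspondence in equation 34 can be inverted to read O-U parameters off a fitted AR(1) model. The sketch below is an illustration under assumptions: the function name is hypothetical, the time step `dt` defaults to one trading day expressed in years (1/252), and $\sigma$ is recovered from the residual standard deviation using $\mathrm{Var}(dW_t) = dt$.

```python
def ou_parameters(beta0: float, beta1: float, resid_std: float,
                  dt: float = 1.0 / 252):
    """Map AR(1) estimates to (kappa, m, sigma) using equation 34."""
    if beta1 >= 1.0:
        raise ValueError("mean reversion requires beta1 - 1 < 0")
    kappa = (1.0 - beta1) / dt          # from beta1 - 1 = -kappa * dt
    m = beta0 / (1.0 - beta1)           # from beta0 = kappa * m * dt
    sigma = resid_std / dt ** 0.5       # from eps_t = sigma * dW_t
    return kappa, m, sigma


# Made-up AR(1) estimates, with dt = 1 so that the numbers are easy to check.
kappa, m, sigma = ou_parameters(beta0=0.002, beta1=0.9, resid_std=0.01, dt=1.0)
```

A smaller $\beta_1$ gives a larger `kappa`, which matches the ranking of pairs by speed of mean reversion.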
For the last model, i.e., Market Neutral model, we used another method
to identify mean-reverting process. In stead of studying cointegrated
log-prices, we directly regress log-returns which are already stationary
process on the common market factors (PCs). The residuals after re-
gression(include constant item) are mean zero. But there is no rigorous
model to support that the residual series are mean-reverting around zero.
14

The relationship in equation 26, i.e.,
$\tilde{R}_{it} = \rho_i \tilde{R}_{i,t-1} + \varepsilon_{it}$ with
$\rho_i < 0$, is basically an assumption. However, we examined all the
stocks and selected those most likely to obey this relationship (see
Figure 5). Therefore, for the stocks we have chosen, it is reasonable to
assume that the residuals $\tilde{R}_{it}$ after regression on the common
market factors oscillate near zero, and we can effectively apply the
mean-reversion method. Nevertheless, we need to keep in mind that not all
stationary processes are mean-reverting or can be used for mean-reversion.
Moreover, if a random walk (an I(1) process) has mean zero, it crosses
zero with probability one, but the expected time to cross zero is
infinite. Thus, we cannot apply mean-reversion directly to a random walk
process either.
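The selection step described above, picking the stocks whose residuals are most likely to obey the relationship in equation 26, can be sketched as follows. This is an illustration under stated assumptions rather than the code behind Figure 5: the residual series are simulated with known ρi instead of being obtained from a regression on the PCs, the ticker names AAA, BBB and CCC are made up, and ρi is estimated by a no-intercept least-squares regression of each residual on its own lag.

```python
import random


def simulate_ar1(rho, n, seed):
    """Zero-mean AR(1) series: r[t] = rho * r[t-1] + eps[t], eps ~ N(0, 1)."""
    rng = random.Random(seed)
    r = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        r.append(rho * r[-1] + rng.gauss(0.0, 1.0))
    return r


def estimate_rho(resid):
    """No-intercept OLS slope of resid[t] on resid[t-1]."""
    num = sum(a * b for a, b in zip(resid[:-1], resid[1:]))
    den = sum(a * a for a in resid[:-1])
    return num / den


# Hypothetical residual series for three made-up tickers with known rho.
true_rho = {"AAA": -0.30, "BBB": 0.00, "CCC": 0.20}
residuals = {
    name: simulate_ar1(rho, 5000, seed=i)
    for i, (name, rho) in enumerate(true_rho.items())
}

rho_hat = {name: estimate_rho(r) for name, r in residuals.items()}
ranked = sorted(rho_hat, key=rho_hat.get)  # most negative rho first

# One-step-ahead forecast of the next residual for the top-ranked series.
best = ranked[0]
forecast = rho_hat[best] * residuals[best][-1]
print(ranked, {k: round(v, 3) for k, v in rho_hat.items()})
```

A series with estimated ρi near zero or positive gives no basis for a mean-reversion trade under this criterion, so only the top of the ranking would be used.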
The second and last models perform relatively well because, instead of
focusing on forecasting the variables themselves, we pay attention to the
residuals. Using either existing theory or econometric analysis, we
extract more information about the properties of the residuals, which
exhibit more forecastability. As the saying in finance goes: “Profit
comes from residuals”. The other lesson from our research is that there
is no ‘real’ model explaining stocks’ prices or returns in finance. All
the existing theories are only partially right, and every model is valid
only when its assumptions are reasonable. For example, the fundamental
assumption for the O-U process or the famous Black-Scholes model is that
the underlying stock price St follows geometric Brownian motion
$$S_{t+1} - S_t = (r - q)\,S_t\,dt + \sigma S_t\,dW_t \;\Longrightarrow\; \frac{S_{t+1}}{S_t} = 1 + (r - q)\,dt + \sigma\sqrt{dt}\cdot Z, \quad Z \sim N(0, 1) \qquad (35)$$
which suggests that the log-price is an auto-regressive process driven
only by its own past and not affected by other variables. However, we
have already found that this does not hold most of the time. There are
too many variables and factors which could influence the stock markets.
Even if a model works well for a time, once many people begin to use it,
their trading and investment behavior would in turn affect the market and
may offset the usefulness of that model. Hence, unlike some problems in
economics, financial markets are full of noise (Black (1986)) and hard to
model. The job is to find a little useful information in this enormous
environment, catch the opportunity and make money.
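To make the implication of equation 35 concrete, the sketch below simulates the discretized geometric Brownian motion with assumed parameters (r = 0.03, q = 0.01, σ = 0.2, dt = 1/252, none of them taken from this paper) and measures the lag-1 autocorrelation of the resulting log-returns. The helper names simulate_gbm and lag1_autocorr are illustrative. Under this assumption each log-return is independent of the previous one, so past returns carry no forecasting information, which is the property the text argues real stocks often violate.

```python
import math
import random


def simulate_gbm(s0, r, q, sigma, dt, n_steps, seed=0):
    """Iterate S[t+1] / S[t] = 1 + (r - q) * dt + sigma * sqrt(dt) * Z."""
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # Z ~ N(0, 1)
        gross = 1.0 + (r - q) * dt + sigma * math.sqrt(dt) * z
        prices.append(prices[-1] * gross)
    return prices


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a series."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


# Assumed, purely illustrative parameters.
prices = simulate_gbm(s0=100.0, r=0.03, q=0.01, sigma=0.2,
                      dt=1.0 / 252, n_steps=20_000)
log_returns = [math.log(b / a) for a, b in zip(prices[:-1], prices[1:])]
ac = lag1_autocorr(log_returns)
print(f"lag-1 autocorrelation of simulated log-returns: {ac:.4f}")
```

Running the same lag1_autocorr on residual series that do mean-revert would give a clearly negative value, in contrast to the near-zero value expected here.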
5 Conclusion
Practical experiments and backtesting results illustrate that the
traditional time series methods do not work well. The AR and VAR models,
which belong to univariate and multivariate time series analysis
respectively, achieve less than 5 percent accuracy. As the forecasting
period increases, the accuracy decreases significantly. This suggests
that it is hard to derive a true recurrence relation that can be used to
predict new values. However, the Pair Trading and Market Neutral models,
which are based on the statistical arbitrage principle, improve the
forecastability to more than 10 percent. The idea is to form a pair or a
portfolio whose returns depend only on the values of the residuals; by
further exploiting the mean-reversion property of these residuals, we
gain more forecastability.
References
(2012). Standard & Poor’s 500 index - S&P 500. Investopedia.
(2013). S&P Indice Methodology. Standard And Poor’s.
Avellaneda, M. and J.-H. Lee (2010). Statistical arbitrage in the US
equities market. Quantitative Finance 10(7), 761–782.
Black, F. (1986). Noise. The Journal of Finance 41(3), 529–543.
Box, G. E., G. M. Jenkins, and G. C. Reinsel (2013). Time series analysis:
forecasting and control. John Wiley & Sons.
Gardiner, C. (1985). Stochastic methods. Springer-Verlag,
Berlin–Heidelberg–New York–Tokyo.
Hamilton, J. D. (1994). Time series analysis, Volume 2. Princeton
University Press, Princeton.
Hannan, E. J. and B. G. Quinn (1979). The determination of the order
of an autoregression. Journal of the Royal Statistical Society. Series B
(Methodological), 190–195.
Hassan, M. R. and B. Nath (2005). Stock market forecasting using hidden
Markov model: a new approach. In Intelligent Systems Design and
Applications, 2005. ISDA’05. Proceedings. 5th International Conference
on, pp. 192–196. IEEE.
Hassan, M. R., B. Nath, and M. Kirley (2007). A fusion model of HMM,
ANN and GA for stock market forecasting. Expert Systems with
Applications 33(1), 171–180.
Hirsa, A. (2012). Computational methods in finance. CRC Press.
Jolliffe, I. (2005). Principal component analysis. Wiley Online Library.
Lawrence, R. (1997). Using neural networks to forecast stock market
prices. University of Manitoba.
Lo, A. W. and A. C. MacKinlay (1990). When are contrarian profits
due to stock market overreaction? Review of Financial Studies 3(2),
175–205.
Miller, M. H., J. Muthuswamy, and R. E. Whaley (1994). Mean reversion
of Standard & Poor’s 500 index basis changes: Arbitrage-induced or
statistical illusion? The Journal of Finance 49(2), 479–513.
Pole, A. (2008). Statistical arbitrage: algorithmic trading insights and
techniques, Volume 411. John Wiley & Sons.
Stock, J. H. and M. W. Watson (1988a). Testing for common trends.
Journal of the American Statistical Association 83(404), 1097–1107.
Stock, J. H. and M. W. Watson (1988b). Variable trends in economic time
series. The Journal of Economic Perspectives, 147–174.