This document compares different methods for forecasting stock returns and prices in the S&P 500 universe. It presents four models: 1) an autoregression (AR) model that forecasts individual stock returns based on past returns, 2) a pair trading model that forecasts returns of cointegrated stock pairs based on their statistical relationship, 3) a principal component analysis (PCA) model that extracts common market factors and forecasts returns based on these factors, and 4) a market neutral model that forms portfolios with returns unrelated to market fundamentals. The models are tested on S&P 500 stock data from 1989-2012 and their forecasting results are compared to real returns.
Comparison of Different Methods in Forecasting Stocks' Returns or Prices
Zhicheng Li, Sirui Zhang, Haoran Jiang
Abstract
In this paper, four models are built to explain stock behavior, and the corresponding methods are used to forecast stock returns or prices in the S&P 500 universe. All forecasting results are compared with the real values. We show that the traditional time series methods, including univariate (AR model) and multivariate (VAR model) methods, offer little forecastability. In contrast, the methods based on statistical arbitrage, i.e., the Pair Trading and Market Neutral models, perform much better. Along the way, we introduce some statistical techniques, such as Principal Component Analysis (PCA) and the concept of mean reversion. Finally, we use econometric and statistical analysis to give a reasonable interpretation of the results.
1 Introduction
Forecasting is an enduring topic not only in economics but also in finance. In the stock market, the incentive to forecast well is particularly strong, since people with better predictions make more money. Therefore, much research has been done, and various models and methods have been proposed and used. Before the age of computers, people traded stocks and commodities mainly on intuition. As trading activity and technology grew, people searched for tools and methods that would increase their gains while minimizing their risk. Statistics, fundamental analysis, and linear/non-linear regressions are all attempts to predict and benefit from the market's direction [5]. In recent studies, newer techniques such as Neural Networks, Hidden Markov Models (HMM) and Genetic Algorithms (GA) have been used to forecast stock activity [9][10][13]. None of these techniques has proven to be as consistently correct as desired, and many skeptics question the utility of these approaches. However, these methods are commonly used in practice.
In our paper, we present four models and apply them to the S&P 500 stock market. For each model, we state a concrete forecasting method. Given a particular time window in the S&P universe, we forecast stock prices or returns and then compare the forecasts with the real values by calculating correlations. Finally, we look at the performance of each method. The first model is the autoregression (AR) model, which is broadly used in time series analysis [5][7]. It assumes that a stock behaves in an autocorrelated, stochastic way and is not correlated with other stocks or factors. Essentially, this method models a linear function through a recurrence relation derived from past values. The recurrence relation can then be used to predict new values in the time series, which will hopefully be good approximations of the actual values.
In the second model, we posit that two stocks, especially in a common industry, tend to be correlated, i.e., a pair of stock prices is likely to have a statistical relationship called cointegration. We exploit this property and implement pair trading. This model is the ancestor of statistical arbitrage, which is now a widely used method in the investment area [16][14]. In the third model, we extend the idea to the case where individual stocks are likely influenced by the whole market, and we look for the common market factors on which each stock may depend. A statistical method called Principal Component Analysis (PCA) is employed to extract these common market factors [17], i.e., the principal components (PCs). By regressing each stock on the PCs, we infer their relationship; then, using a VAR model, which is a multivariate time series model, we forecast how the PCs evolve. We then put the predicted PCs back into the original regressions and forecast individual stocks. The last model is the market neutral model, in which we form a portfolio whose expected returns are unrelated to market fundamentals. Regardless of how the market fluctuates, the portfolio's return is a stationary mean-reverting process. Using mean reversion, a very important technique in statistical arbitrage [3], we look for opportunities that would give large expected returns and then compare these returns with the real values.
The paper is organized as follows. In Section 2, we introduce the S&P 500 data that we use and examine some of its properties. Section 3 is divided into four parts; each part sets forth a model of stock behavior and a method for forecasting stock prices/returns in our setting. In Section 4, we show the results of these four methods, compare their performance, and attempt a detailed analysis. Finally, we conclude in the last section.
2 Data and Stylized Facts
In this paper, we use a database of S&P 500 (Standard & Poor's 500) stocks from 1989 to 2012. The data source is CRSP (Center for Research in Security Prices), which is part of the University of Chicago and renowned for its expertise in building and maintaining historical, research-quality stock market databases. We choose the S&P 500 because it comprises nearly 500 common stocks issued by large-cap companies and covers about 75 percent of the American equity market by capitalization. Moreover, the S&P 500 is one of the most commonly followed equity indices; many consider it one of the best representations of the U.S. stock market and a bellwether for the U.S. economy [1] (see Figure 1).

Figure 1: Historical S&P 500 Earnings and US Nominal GDP
The components of the S&P 500 are selected by a committee. This is similar to the Dow Jones Industrial Average, but different from indices such as the Russell 1000, which are strictly rule-based. When considering the eligibility of a new addition, the committee assesses the company's merit using eight primary criteria: market capitalization, liquidity, domicile, public float, sector classification, financial viability, length of time publicly traded, and listing exchange [2]. The committee selects the companies in the S&P 500 so that they are representative of the industries in the United States economy. To be added to the index, a company must satisfy these liquidity-based size requirements: (i) market capitalization greater than or equal to US$4.0 billion; (ii) a ratio of annual dollar value traded to float-adjusted market capitalization greater than 1.0; (iii) a minimum monthly trading volume of 250,000 shares in each of the six months leading up to the evaluation date. Therefore, S&P 500 membership is not static: companies sometimes drop out of the list, and new companies enter. This is why our data contain records for 1,127 stocks.
The stock prices in this data set are end-of-day prices. With roughly 252 business days a year, there are 5,799 time records. In addition, these prices are adjusted to include dividends and share expansions (such as stock splits), so the trajectory of a stock's price reflects the market value of that company. Moreover, since we normally think a price's increment is proportional to the price itself, the trend of a stock's price is exponential (see Figure 2), and log-prices are I(1) processes, which means that log-returns (first differences of log-prices) are stationary (Stock and Watson (1988b)). Table 1 reports the results of ADF tests for all the stocks, which clearly show that log-prices are essentially I(1) processes with a unit root, while log-returns are stationary.

Figure 2: Price Evolution of Five S&P 500 Stocks
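The unit-root check behind Table 1 can be illustrated on simulated data. The sketch below is a plain Dickey-Fuller regression written with NumPy (an ADF test with zero augmentation lags, which is a simplification of the paper's ADF test), and it uses the standard asymptotic 5% critical value of -2.86 for the constant-only case. The simulated random-walk prices are an assumption, not CRSP data.

```python
import numpy as np

# Asymptotic 5% critical value for the Dickey-Fuller test with a constant and
# no trend; the t-statistic does NOT follow Student's t under the null.
DF_CRIT_5PCT = -2.86

def df_tstat(y: np.ndarray) -> float:
    """Dickey-Fuller t-statistic from dy_t = c + rho * y_{t-1} + e_t.

    No lagged differences are included (plain DF rather than ADF)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1] / np.sqrt(cov[1, 1]))

rng = np.random.default_rng(0)
log_returns = 0.01 * rng.standard_normal(2000)         # i.i.d. log-returns
log_prices = np.log(100.0) + np.cumsum(log_returns)    # random-walk log-prices

for name, series in [("log-prices", log_prices), ("log-returns", log_returns)]:
    t = df_tstat(series)
    verdict = "reject unit root" if t < DF_CRIT_5PCT else "fail to reject unit root"
    print(f"{name}: DF t = {t:.2f} -> {verdict}")
```

Log-returns should reject the unit root by a wide margin, while the random-walk log-prices will usually (though, by construction of a 5% test, not always) fail to reject it.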
Since we have a long time series over a broad universe of U.S. equities, we can use back-testing to compare different methods for forecasting stock prices/returns. The procedure is as follows: we set two parameters, the historical window and the forecasting window. Given the data in the historical window, we forecast the prices/returns in the forecasting window and then compare them with the actual data. The historical window moves over time, so we obtain a series of comparison results on which to base a judgment. Another issue is that within a particular historical window, some companies do not belong to the S&P 500 or have no data, so we restrict the dataset to stocks that existed continuously over that period. Standard & Poor's believes that turnover in index membership should be avoided whenever possible. Hence, companies that have been added to the index usually stay in it unless too many of the addition criteria have been violated or the company no longer exists due to mergers and acquisitions [2]. Thus, despite the selection criteria described above, within a historical window that is not too long we can treat the stocks as behaving naturally.
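The moving-window back-test can be sketched as a generator over a (T x N) return matrix in which `np.nan` marks days when a stock is outside the index. This is a minimal sketch assuming a one-day forecasting window; `rolling_backtest` and `forecast_fn` are hypothetical names, and the historical-mean forecast in the usage lines is only a placeholder, not one of the paper's four models.

```python
import numpy as np

def rolling_backtest(returns, hist_window, forecast_fn, step=1):
    """Slide a historical window over a (T x N) return matrix and forecast the next day.

    returns:     array with np.nan where a stock is not in the index or has no data.
    forecast_fn: hypothetical callable mapping a (hist_window x n) block of complete
                 histories to n one-day-ahead forecasts.
    Yields (t, kept_columns, forecasts, realized) for each forecast date t.
    """
    T = returns.shape[0]
    for start in range(0, T - hist_window, step):
        end = start + hist_window                  # row `end` is the forecast date
        hist, target = returns[start:end], returns[end]
        # keep only stocks observed throughout the window and on the forecast date
        keep = ~np.isnan(hist).any(axis=0) & ~np.isnan(target)
        if keep.sum() < 2:
            continue
        yield end, np.flatnonzero(keep), forecast_fn(hist[:, keep]), target[keep]

# Simulated data (an assumption, not CRSP): stock 5 joins the index at day 100.
rng = np.random.default_rng(1)
R = 0.01 * rng.standard_normal((300, 20))
R[:100, 5] = np.nan
results = list(rolling_backtest(R, 250, lambda h: h.mean(axis=0)))  # placeholder forecast
corrs = [np.corrcoef(f, r)[0, 1] for _, _, f, r in results]
print(len(results), "windows; mean forecast/realized correlation:", np.mean(corrs))
```

Filtering per window, rather than once for the whole sample, mirrors the text: a stock only needs continuous data inside the window currently in use.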
Table 1: Results of ADF tests for log-prices and log-returns (H0: the series has a unit root; 5% level)

Series        Ratio of stocks that accept H0    Ratio of stocks that reject H0
log-prices    95.08%                            4.92%
log-returns   0%                                100%

3 Models and Methods
3.1 Simple Autoregression Model
We begin with a very simple model, the autoregression (AR) model, which is widely used for single time series problems. Suppose we are interested in forecasting the value of a variable $Y_{t+1}$ based on a set of variables $X_t$ observed at date $t$. In this case, $X_t$ consists of a constant plus $Y_t, Y_{t-1}, \ldots, Y_{t-m+1}$. The common methodology is to choose the forecast $Y^*_{t+1|t}$ so as to minimize the mean squared error

$$E\left(Y_{t+1} - Y^*_{t+1|t}\right)^2 \tag{1}$$

Since $Y^*_{t+1|t}$ is a function $g(X_t)$ of the current information, the problem is to find the function $g(X_t)$ that minimizes

$$E\left(Y_{t+1} - g(X_t)\right)^2 \tag{2}$$

When we use linear projection, i.e., $g(X_t)$ is a linear combination of a constant and $Y_t, \ldots, Y_{t-m+1}$, equation (2) leads to an AR model. In our paper, we simply choose two lags, which gives the regression model:
$$Y_t - \mu = \phi_1 (Y_{t-1} - \mu) + \phi_2 (Y_{t-2} - \mu) + \varepsilon_t \tag{3}$$

We use a two-lag linear projection, rather than selecting the number of lags with information criteria (AIC/BIC) [8] or using non-linear models, because we think there is a trade-off between the sample size, the number of parameters to be estimated, and the credibility of the model. Estimating many parameters can reduce precision. And because we have no "true" model governing stock prices/returns (Black (1986)), we can use whatever model we build as long as it is effective to some extent.

Returning to equation (3), if we can assume $E(\varepsilon_t \mid Y_{t-1}, Y_{t-2}) = 0$ and the process $\{Y_t, [Y_{t-1}, Y_{t-2}]\}$ is covariance-stationary and ergodic for second moments, then OLS regression yields consistent estimates of the coefficients (Hamilton (1994)). Alternatively, we can write equation (3) in the form

$$\phi(L)(Y_t - \mu) = \varepsilon_t \tag{4}$$

where the autoregressive operator is $\phi(L) = 1 - \phi_1 L - \phi_2 L^2$. As long as all the roots of $\phi(z) = 0$ lie outside the unit circle, the autoregression satisfies the stationarity condition.
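The root condition for equation (4) can be checked numerically. This minimal sketch (the function name `ar2_is_stationary` is hypothetical) finds the roots of $\phi(z) = 1 - \phi_1 z - \phi_2 z^2$ with `numpy.roots`, which expects coefficients ordered from the highest power down.

```python
import numpy as np

def ar2_is_stationary(phi1: float, phi2: float) -> bool:
    """True if every root of phi(z) = 1 - phi1*z - phi2*z**2 lies outside the unit circle."""
    # np.roots takes coefficients from the highest power down: -phi2 z^2 - phi1 z + 1.
    # A zero leading coefficient (phi2 = 0) is stripped, leaving the AR(1) root 1/phi1.
    roots = np.roots([-phi2, -phi1, 1.0])
    return bool(np.all(np.abs(roots) > 1.0))

print(ar2_is_stationary(0.5, 0.3))   # → True   (roots about 1.17 and -2.84)
print(ar2_is_stationary(0.6, 0.5))   # → False  (one root about 0.94)
```

A unit root ($\phi_1 = 1, \phi_2 = 0$) gives a root exactly on the unit circle and is therefore classified as non-stationary, which is the situation of log-prices in Table 1.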
In this AR model, we take log-returns, which are already stationary, as the forecasting target. Specifically, if we define $Y_{it}$ as the log-return of stock $i$ at time $t$, equation (3) becomes

$$Y_{it} = \beta_{0i} + \beta_{1i} Y_{i,t-1} + \beta_{2i} Y_{i,t-2} + \varepsilon_{it} \tag{5}$$

If the previous assumptions hold, we can apply OLS to this regression and obtain consistent estimators $\hat\beta_{ki}$ ($k = 0, \ldots, 2$; $i = 1, \ldots, N$). Note that this is not a panel data regression: there is a separate regression for each stock, and the coefficients vary across stocks. Furthermore, we set the length of the moving historical window to 1000 days, and we forecast the next-day return $E(Y_{i,t+1})$ of stock $i$ as

$$E(Y_{i,t+1}) = \hat\beta_{0i} + \hat\beta_{1i} Y_{it} + \hat\beta_{2i} Y_{i,t-1} \tag{6}$$

Finally, we compare the forecast returns with the real returns; the results are shown in the next section.
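Equations (5) and (6) for a single stock can be sketched with an OLS fit on the last 1000 observations. The function `ar2_forecast` is a hypothetical helper, and the simulated AR(2) series (with assumed coefficients 0.3 and -0.2) stands in for one stock's log-returns; in the paper's setup the function would be called once per stock, since each stock has its own regression.

```python
import numpy as np

def ar2_forecast(y, window=1000):
    """Fit Y_t = b0 + b1*Y_{t-1} + b2*Y_{t-2} + e_t (eq. 5) by OLS on the last `window`
    observations and return (one-step forecast from eq. 6, coefficient vector)."""
    y = np.asarray(y, dtype=float)[-window:]
    Y = y[2:]                                                  # Y_t
    X = np.column_stack([np.ones(len(Y)), y[1:-1], y[:-2]])    # 1, Y_{t-1}, Y_{t-2}
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    forecast = b[0] + b[1] * y[-1] + b[2] * y[-2]              # E(Y_{t+1})
    return float(forecast), b

# Simulated AR(2) log-returns with assumed coefficients (not estimated from CRSP).
rng = np.random.default_rng(2)
n, phi1, phi2 = 1500, 0.3, -0.2
eps = 0.01 * rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]

forecast, coefs = ar2_forecast(y)
print(coefs, forecast)
```

Only the last 1000 points enter the fit, so the first 500 simulated values act as a burn-in.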
3.2 Pair Trading Model
The assumptions in the previous model are very strong. It is unlikely that stocks move on their own and are uncorrelated with others; it is more plausible that stocks are correlated, especially within the same industry. Figure 3 shows an example: the price evolution of two stocks in the "Petroleum Refining" industry (SIC: 2911) from 1989 to 1990, which appear highly correlated. Hence, in this model we adopt a relationship commonly used in time series analysis, namely cointegration. Rather than working with log-returns, which are stationary, we consider log-prices, which are integrated of order 1. If stocks $i$ and $j$ are in the same industry or have similar characteristics, one expects to earn a positive profit by hedging one stock with the other (see Pole (2008)). In particular, denote by $P_{it}$ and $P_{jt}$ the corresponding price series, and suppose we can model them as

$$\ln(P_{it}) = \alpha t + \beta \ln(P_{jt}) + X_t \tag{7}$$

where $X_t$ is a stationary, or mean-reverting, process. The relation between these two I(1) log-price series is then cointegration. Taking first differences of equation (7), the log-returns satisfy

$$\ln(R_{it}) = \alpha\, dt + \beta \ln(R_{jt}) + dX_t \tag{8}$$
In many situations, the drift $\alpha$ is small compared to the fluctuations of $X_t$ and can be neglected. The mean reversion of $X_t$ then suggests forming a long-short portfolio: go long 1 dollar of stock $i$ and short $\beta$ dollars of stock $j$ when $X_t$ is small, and conversely, go short stock $i$ and long stock $j$ when $X_t$ is large. Both positions are expected to earn positive returns. This mean-reversion paradigm is typically associated with market over-reaction: assets are temporarily under- or over-priced with respect to one or several reference securities (Lo and MacKinlay (1990)).
For our dataset, the concrete method is as follows. First, within one historical window, we find pairs of stocks in the same industry (identified by SIC code in our data) that are cointegrated without a deterministic trend. Denoting them as stocks $i$ and $j$ and regressing one on the other, we have

$$\ln(P_{it}) = \beta \ln(P_{jt}) + X_t \tag{9}$$

and correspondingly, for log-returns,

$$\ln(R_{it}) = \beta \ln(R_{jt}) + dX_t \tag{10}$$

Figure 3: Prices of two stocks in the "Petroleum Refining" industry from 1989 to 1990
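Regression (9) has no intercept, so the hedge ratio is the no-intercept OLS slope and the spread $X_t$ is the residual. The sketch below uses a hypothetical helper `hedge_ratio_and_spread` on a simulated cointegrated pair (assumed $\beta = 1.2$ and an AR(1) spread); a complete pair search would also test the spread for a unit root before accepting the pair, which is omitted here.

```python
import numpy as np

def hedge_ratio_and_spread(log_p_i, log_p_j):
    """OLS of ln(P_i) on ln(P_j) with no intercept, as in eq. (9).

    Returns the hedge ratio beta and the spread X_t = ln(P_i) - beta * ln(P_j).
    A cointegration test on X_t (e.g. Engle-Granger) is not performed here."""
    log_p_i = np.asarray(log_p_i, dtype=float)
    log_p_j = np.asarray(log_p_j, dtype=float)
    beta = float(log_p_j @ log_p_i / (log_p_j @ log_p_j))   # no-intercept OLS slope
    return beta, log_p_i - beta * log_p_j

# Simulated cointegrated pair (assumed parameters): X_t is a stationary AR(1).
rng = np.random.default_rng(3)
T = 1000
log_p_j = np.log(50.0) + np.cumsum(0.01 * rng.standard_normal(T))
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + 0.005 * rng.standard_normal()
log_p_i = 1.2 * log_p_j + x

beta, spread = hedge_ratio_and_spread(log_p_i, log_p_j)
print(round(beta, 3))
```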
Since $X_t$ is a stationary process and we expect it to be mean-reverting, we use an AR(1) model to diagnose $X_t$:

$$X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t \tag{11}$$

Subtracting $X_{t-1}$ from both sides, we get

$$dX_t = \beta_0 + (\beta_1 - 1) X_{t-1} + \varepsilon_t \tag{12}$$
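Equation (12) is a regression of the spread's first difference on a constant and its lagged level, so the mean-reversion coefficient $\beta_1 - 1$ comes straight out of OLS. In this sketch `spread_ar1` and `simulate_ar1` are hypothetical helpers, and the two simulated spreads (AR coefficients 0.5 and 0.95) are assumptions chosen to show fast and slow mean reversion.

```python
import numpy as np

def spread_ar1(x):
    """OLS fit of dX_t = b0 + (b1 - 1) X_{t-1} + e_t, eq. (12).

    Returns (b0, b1 - 1); the more negative b1 - 1, the faster the spread mean-reverts."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    A = np.column_stack([np.ones(len(dx)), x[:-1]])
    (b0, kappa), *_ = np.linalg.lstsq(A, dx, rcond=None)
    return float(b0), float(kappa)

def simulate_ar1(phi, n, rng):
    """Hypothetical spread generator: X_t = phi * X_{t-1} + noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + 0.01 * rng.standard_normal()
    return x

rng = np.random.default_rng(4)
_, kappa_fast = spread_ar1(simulate_ar1(0.5, 2000, rng))    # roughly -0.5
_, kappa_slow = spread_ar1(simulate_ar1(0.95, 2000, rng))   # roughly -0.05
print(kappa_fast, kappa_slow)
```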
The mean-reversion requires (β1 − 1) < 0, and the more negative, the
more mean-reverting. Therefore, the next step is to, within the particular
historical window (t=1. . . T), search all the stocks, find the top ten mean-
reverting pairs, and denote them as {i∗
, j∗
}10. Then for these ten pair-
trading portfolio, we need forecast their next day returns. By putting
T+1 to the equation 10, it becomes
ln(Ri∗T +1) − βln(Rj∗T +1) = dX∗
T +1 (13)
which means that long 1 dollar stock i∗
and short β dollars j∗
would give
us a expected return ET(dX∗
T +1). What’s more, from equation 12, it is
easy to see
dX∗
T +1 = β∗
0 + (β∗
1 − 1)X∗
T + ε∗
T +1 (14)
If we have the valid assumption ET(ε∗
T +1) = 0, which is also the require-
ment for getting a consistent estimator in AR(1), we could derive the
result:
ET(dX∗
T +1) = ET{ln(Ri∗T +1) − βln(Rj∗T +1)}
= β∗
0 + (β∗
1 − 1)X∗
T
(15)
so the expected next-day ($T+1$) return of this pair trade is simply $\beta^*_0 + (\beta^*_1 - 1) X^*_T$. We can then compare these forecasts with the realized pair-trading returns $\ln R_{i^*,T+1} - \beta \ln R_{j^*,T+1}$ in the forecasting window. The results of this comparison are presented in Section 4.
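The ranking and forecasting steps can be sketched as follows, assuming the spreads $X_t$ of all candidate pairs have already been computed (for example with an OLS hedge ratio as in equation (9)). The helper names and the dictionary layout are illustrative assumptions; pairs are ordered by the estimated $\beta_1 - 1$, most negative first, and each forecast is equation (15).

```python
import numpy as np

def fit_ar1(x):
    """OLS fit of X_t = b0 + b1 X_{t-1} + e_t; returns (b0, b1 - 1)."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (b0, b1), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    return float(b0), float(b1 - 1.0)

def rank_pairs(spreads, top=10):
    """spreads maps a pair (i, j) to its spread series X_1..X_T.

    Returns up to `top` tuples (b1 - 1, pair, b0, X_T), most negative b1 - 1 first.
    """
    fitted = []
    for pair, x in spreads.items():
        b0, slope = fit_ar1(x)
        fitted.append((slope, pair, b0, float(np.asarray(x)[-1])))
    fitted.sort(key=lambda item: item[0])
    return fitted[:top]

def forecast_dx(b0, slope, x_T):
    """Eq. (15): E_T(dX_{T+1}) = b0 + (b1 - 1) * X_T."""
    return b0 + slope * x_T

def simulate_ar1(phi, n, rng):
    """Zero-mean AR(1) path used only to generate example spreads."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, 0.01)
    return x

# Hypothetical example: three spreads with AR coefficients 0.9, 0.5 and 0.2.
rng = np.random.default_rng(1)
spreads = {("A", "B"): simulate_ar1(0.9, 500, rng),
           ("C", "D"): simulate_ar1(0.5, 500, rng),
           ("E", "F"): simulate_ar1(0.2, 500, rng)}
ranked = rank_pairs(spreads, top=2)
forecasts = {pair: forecast_dx(b0, slope, x_T) for slope, pair, b0, x_T in ranked}
```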
Moreover, to turn the forecasts into a trading strategy, we select among the ten chosen pairs the pair $(i^{**}, j^{**})$ with the largest absolute expected return, $\max\{|\beta^*_0 + (\beta^*_1 - 1) X^*_T|\}$, and trade only that pair. If its expected return is positive, we go long 1 dollar of stock $i^{**}$ and short $\beta$ dollars of stock $j^{**}$; if it is negative, we do the opposite, shorting 1 dollar of $i^{**}$ and going long $\beta$ dollars of $j^{**}$. In both cases the expected return is positive and equal to $\max\{|\beta^*_0 + (\beta^*_1 - 1) X^*_T|\}$.
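The selection rule for the single traded pair can be written as a small function. The input format (pair, hedge ratio, forecast) is a hypothetical choice for illustration; positions are in dollars, following the long 1 dollar / short $\beta$ dollars convention above.

```python
def best_pair_trade(candidates):
    """candidates: list of (pair, beta, expected_dx) for the selected pairs.

    Picks the pair with the largest |E_T(dX_{T+1})| and returns
    (pair, dollar positions, expected return).
    """
    (i, j), beta, exp_dx = max(candidates, key=lambda c: abs(c[2]))
    # Positive forecast: long 1 dollar of i, short beta dollars of j; negative: reverse.
    sign = 1.0 if exp_dx > 0 else -1.0
    positions = {i: sign * 1.0, j: -sign * beta}
    return (i, j), positions, abs(exp_dx)

# Hypothetical inputs: hedge ratios and forecasts for two pairs.
candidates = [(("A", "B"), 1.2, 0.003), (("C", "D"), 0.8, -0.007)]
pair, positions, expected = best_pair_trade(candidates)
```

Here the second pair wins because $|-0.007| > 0.003$, so the sketch shorts 1 dollar of C and buys 0.8 dollars of D.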
3.3 VAR Model
The previous model shows that cointegrated time series share at least one common trend. Both casual observation and economic theory suggest that many series may contain the same stochastic trend and are therefore cointegrated. If each of $n$ series is integrated of order 1 and the series can be jointly characterized by $k < n$ stochastic trends, then their vector representation has $k$ I(1) processes and $n - k$ distinct stationary linear combinations. Stock and Watson (1988a) propose extracting these common stochastic trends by Principal Components Analysis (PCA). Since log-prices are I(1) processes, we can regress each log-price process on these principal components (PCs), with which it is cointegrated, and the residuals should then be stationary. Alternatively, we can work directly with log-returns, which are already stationary; the principal components and the regression residuals are then all stationary.
Here we briefly introduce PCA. PCA is a statistical method that uses
an orthogonal transformation to convert a set of observations of possibly
correlated variables into a set of values of linearly uncorrelated variables
called principal components. This transformation is defined in such a way
that the first principal component has the largest possible variance, that
is, it accounts for as much of the variability in the data as possible, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. We can thus preserve most of the information in the original data while reducing its dimension, i.e., obtain a small number of common stochastic trends.
The detailed procedure for our case is as follows. Within one historical window ($t = 1, \dots, T$; $i = 1, \dots, N$), we first standardize each stock's log-price $p_i$:
$$Y_{it} = \frac{p_{it} - \bar{p}_i}{\bar{\sigma}_i} \tag{16}$$
where
$$\bar{p}_i = \frac{1}{T} \sum_{t=1}^{T} p_{it}, \qquad \bar{\sigma}_i^2 = \frac{1}{T-1} \sum_{t=1}^{T} (p_{it} - \bar{p}_i)^2$$
We then calculate the covariance matrix $C$ of $Y_{it}$, which here is also the correlation matrix of the log-prices:
$$C_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} Y_{it} Y_{jt} \tag{17}$$
This matrix is symmetric and non-negative definite, and $C_{ii} = 1$ for every stock $i$. The next step is to compute the eigenvectors and eigenvalues of $C$. Denote by $V$ the matrix of eigenvectors and by $\lambda$ the corresponding eigenvalues:
$$[V, \lambda] = \mathrm{Eig}(C) \tag{18}$$
Because $C$ is symmetric, its eigenvectors $V_i$ ($i = 1, \dots, N$) can be chosen mutually orthogonal, so they form an orthogonal basis of a new space. We rank the eigenvalues in decreasing order,
$$N \ge \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_N \ge 0,$$
where the upper bound $N$ holds because the eigenvalues sum to the trace of $C$, which equals $N$, and denote the corresponding eigenvectors by $V_1, V_2, V_3, \dots, V_N$. The eigenvalue spectrum contains only a few large eigenvalues (see Figure 4), so we keep the top $K$ eigenvectors, corresponding to the $K$ largest eigenvalues. Following Jolliffe (2005), the projection of the original data onto these top eigenvectors $V_1, V_2, V_3, \dots, V_K$ (the principal bases of the new space) preserves most of the information.
Figure 4: Eigenvalues of the correlation matrix of stocks' log-prices computed on the first historical window ($t = 1, \dots, 100$)
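Equations (16) to (18) and the ranking of eigenvalues can be sketched with NumPy as below. This assumes a $T \times N$ array of log-prices for a single window; `numpy.linalg.eigh` is used because $C$ is symmetric, and since it returns eigenvalues in ascending order they are re-sorted. The simulated data and the choice $K = 5$ are illustrative only.

```python
import numpy as np

def correlation_eigen(log_prices):
    """log_prices: T x N array of log-prices p_ti for one historical window.

    Returns eigenvalues in decreasing order, the matching eigenvectors (as
    columns) and the per-stock standard deviations sigma_i.
    """
    p = np.asarray(log_prices, dtype=float)
    T = p.shape[0]
    sigma = p.std(axis=0, ddof=1)            # 1/(T-1) normalisation, as in eq. (16)
    Y = (p - p.mean(axis=0)) / sigma         # eq. (16)
    C = Y.T @ Y / (T - 1)                    # eq. (17); diagonal entries equal 1
    eigvals, eigvecs = np.linalg.eigh(C)     # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-rank in decreasing order
    return eigvals[order], eigvecs[:, order], sigma

# Hypothetical example: 100 days x 20 stocks sharing one random-walk trend.
rng = np.random.default_rng(2)
trend = np.cumsum(rng.normal(0.0, 0.01, 100))
log_prices = (4.0
              + np.outer(trend, rng.uniform(0.5, 1.5, 20))
              + np.cumsum(rng.normal(0.0, 0.005, (100, 20)), axis=0))
eigvals, eigvecs, sigma = correlation_eigen(log_prices)
K = 5
explained = eigvals[:K].sum() / eigvals.sum()  # share of total variance kept by top K
```

Since $C_{ii} = 1$, the eigenvalues sum to $N$; `explained` is the share of that total retained by the top $K$ components.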
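Thus, we project the log-price data in the historical window onto these top eigenvectors to obtain $K$ principal components $F_k$, $k = 1, \dots, K$:
$$F_{tj} = \sum_{i=1}^{N} \frac{V_{ji}}{\bar{\sigma}_i}\, p_{ti}, \qquad t = 1, \dots, T, \quad j = 1, \dots, K \tag{19}$$
where $V_{ji}$ denotes the $i$-th component of eigenvector $V_j$.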
For each stock's log-price process, we regress it on these common trends:
$$p_i = \theta_{i0} + \sum_{j=1}^{K} \theta_{ij} F_j + \delta_i, \qquad i = 1, 2, \dots, N \tag{20}$$
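The projection (19) and regression (20) can be sketched as follows. It assumes eigenvectors stored as columns in decreasing-eigenvalue order, so that $V_{ji}$ is `eigvecs[i, j]`; the example recomputes them with `numpy.corrcoef`, which yields the same matrix as (17). Function names and the simulated data are hypothetical.

```python
import numpy as np

def price_factors(log_prices, eigvecs, sigma, K):
    """Eq. (19): F_tj = sum_i V_ji / sigma_i * p_ti for the top K eigenvectors.

    eigvecs holds eigenvectors as columns sorted by decreasing eigenvalue,
    so V_ji (entry i of eigenvector j) is eigvecs[i, j].
    """
    weights = eigvecs[:, :K] / sigma[:, None]                # N x K
    return np.asarray(log_prices, dtype=float) @ weights     # T x K

def regress_on_factors(series, factors):
    """Eq. (20): OLS of every column of `series` (T x N) on [1, F_1, ..., F_K].

    Returns theta of shape (K+1) x N (row 0 holds the intercepts theta_i0)
    and the residuals delta (T x N).
    """
    X = np.column_stack([np.ones(len(factors)), factors])
    theta, *_ = np.linalg.lstsq(X, series, rcond=None)
    return theta, series - X @ theta

# Hypothetical example, recomputing the correlation eigenvectors here.
rng = np.random.default_rng(3)
log_prices = 4.0 + np.cumsum(rng.normal(0.0, 0.01, (100, 20)), axis=0)
sigma = log_prices.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(log_prices, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
K = 3
F = price_factors(log_prices, eigvecs, sigma, K)
theta, delta = regress_on_factors(log_prices, F)
```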
Since the log-prices and the PCs are cointegrated, the OLS estimators $\hat{\theta}_{ij}$ ($i = 1, \dots, N$; $j = 0, \dots, K$) are consistent, provided the disturbance $\delta_i$ is uncorrelated with the PCs. Next, rather than autoregressing and forecasting each log-price process separately, we use a Vector Autoregression (VAR) model to forecast the common trends (PCs), and then combine the forecasts to estimate each log-price by substituting them back into regression equation (20). A VAR($p$) model regresses the vector series on its previous $p$ values; in this case:
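$$\vec{F}_t = \vec{c} + \phi_1 \vec{F}_{t-1} + \cdots + \phi_p \vec{F}_{t-p} + \vec{\varepsilon}_t \tag{21}$$
where
$$\vec{F}_t = \begin{pmatrix} F_{1t} \\ \vdots \\ F_{Kt} \end{pmatrix}, \qquad \vec{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_K \end{pmatrix}, \qquad \vec{\varepsilon}_t = \begin{pmatrix} \varepsilon_{1t} \\ \vdots \\ \varepsilon_{Kt} \end{pmatrix}, \qquad \phi_s = \{\phi^s_{ij}\}_{K \times K} \tag{22}$$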
Substituting the forecast $\hat{\vec{F}}_{t+1}$ into equation (20), we obtain
$$\hat{p}_{i,t+1} = \hat{\theta}_{i0} + \sum_{j=1}^{K} \hat{\theta}_{ij} \hat{F}_{j,t+1} \tag{23}$$
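A VAR($p$) can be estimated equation by equation with OLS, which is one standard way to fit (21). The sketch below does this with NumPy, produces the one-step forecast $\hat{\vec{F}}_{T+1}$, and maps it back to log-prices via (23). It is an illustration, not the authors' implementation: the lag order, the simulated two-factor VAR(1) and the coefficient matrix `theta` are made-up assumptions.

```python
import numpy as np

def fit_var(F, p):
    """OLS estimate of F_t = c + phi_1 F_{t-1} + ... + phi_p F_{t-p} + eps_t (eq. 21).

    F is a T x K array. Returns c (length K) and phis (p x K x K), where
    phis[s-1][a, b] is the weight of F_{b, t-s} in the equation for F_{a, t}.
    """
    F = np.asarray(F, dtype=float)
    T, K = F.shape
    # Row r of the design corresponds to t = p + r and holds [1, F_{t-1}, ..., F_{t-p}].
    lagged = [F[p - s:T - s] for s in range(1, p + 1)]
    X = np.column_stack([np.ones(T - p)] + lagged)
    B, *_ = np.linalg.lstsq(X, F[p:], rcond=None)          # (1 + pK) x K
    phis = np.stack([B[1 + s * K:1 + (s + 1) * K].T for s in range(p)])
    return B[0], phis

def forecast_var(F, c, phis):
    """One-step forecast: F_hat_{T+1} = c + sum_s phi_s F_{T+1-s}."""
    F = np.asarray(F, dtype=float)
    return c + sum(phi @ F[-(s + 1)] for s, phi in enumerate(phis))

# Hypothetical example: simulate a two-factor VAR(1) and recover its parameters.
rng = np.random.default_rng(4)
c_true = np.array([0.1, -0.2])
phi_true = np.array([[0.5, 0.1],
                     [0.0, 0.3]])
F = np.zeros((20000, 2))
for t in range(1, len(F)):
    F[t] = c_true + phi_true @ F[t - 1] + rng.normal(0.0, 1.0, 2)
c_hat, phis_hat = fit_var(F, p=1)
F_next = forecast_var(F, c_hat, phis_hat)

# Eq. (23): map the factor forecast back to log-prices with a (K+1) x N
# coefficient matrix theta (hypothetical values, N = 3 stocks).
theta = np.array([[4.0, 3.5, 5.0],
                  [0.2, 0.1, -0.3],
                  [0.05, 0.0, 0.1]])
p_hat_next = theta[0] + F_next @ theta[1:]
```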
The principle of this method is that, rather than treating the evolution of a stock's price as a spontaneous, endogenous process, we regard it as highly correlated with the whole market. Since regressing each stock on the whole set of other stocks is impractical, we extract a small number of common stochastic trends that largely represent the market. By modeling the evolution of these trends, we capture more of the information that influences an individual stock's behavior. We do, however, face the same econometric problems as in single-series autoregression, and it is hard to justify the validity of the underlying assumptions. Nevertheless, as long as the model improves forecastability, it is effective to some extent.
3.4 Market Neutral Model
Stock prices and returns are clearly influenced by market fundamentals. However, it is hard to build a model that takes all possible factors into account to explain and forecast fundamentals. Therefore, in this section we consider a statistical arbitrage model in which the portfolio's return is not affected by market fundamentals. The common features of statistical arbitrage are (i) trading signals are systematic, or rules-based, (ii) the trading portfolio is market-neutral, in the sense that it has zero beta with the market, and (iii) the mechanism for generating excess returns is statistical. The idea is to make many bets with significant positive expected returns at the appropriate times and so produce a low-volatility investment strategy that is uncorrelated with the market.
We build this model following Avellaneda and Lee (2010). First, we form principal components of the log-returns of S&P 500 stocks over a given period. For example, if we are at time $T$ and need to forecast the next period's stock returns, we use the past 60 days of records, i.e., the historical window is 60 days. Following the same principle as in the previous section, we choose the $K$ most significant eigenvectors, corresponding to the $K$ largest eigenvalues, and denote them by $V_i$ ($i = 1, \dots, K$). We then project the log-return matrix ($60 \times N$) onto these eigenvectors to form $K$ market factors:
$$F_{tj} = \sum_{i=1}^{N} \frac{V_{ji}}{\bar{\sigma}_i}\, R_{ti}, \qquad j = 1, \dots, K; \quad t = T-59, \dots, T \tag{24}$$
where $F_{tj}$ is the $j$-th market factor at time $t$. Note that these market factors are dynamic: they change as the historical window moves forward.
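The rolling construction of the return factors in (24) might look like this. It assumes a days × stocks array of log-returns with 0-based day index `T`, so the slice `returns[T - 59:T + 1]` holds exactly the 60 days $t = T-59, \dots, T$; the choice $K = 3$ and the simulated one-factor market are illustrative assumptions.

```python
import numpy as np

def return_factors(window_returns, K):
    """Eq. (24): project a (60 x N) window of log-returns onto the top-K
    eigenvectors of its correlation matrix, dividing each stock's weight by
    its return volatility sigma_i."""
    R = np.asarray(window_returns, dtype=float)
    sigma = R.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(R, rowvar=False))
    top = np.argsort(eigvals)[::-1][:K]
    weights = eigvecs[:, top] / sigma[:, None]   # column j holds V_j / sigma
    return R @ weights                           # F_tj, shape (60, K)

# Hypothetical example: 300 days x 30 stocks driven by one market return.
rng = np.random.default_rng(5)
market = rng.normal(0.0, 0.01, 300)
returns = (np.outer(market, rng.uniform(0.5, 1.5, 30))
           + rng.normal(0.0, 0.01, (300, 30)))
T = 200                                          # current day (0-based index)
window = returns[T - 59:T + 1]                   # the 60 days t = T-59, ..., T
F = return_factors(window, K=3)
# Moving T forward and calling return_factors again re-estimates the factors.
```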
We then regress each stock's log-returns on these market factors:
$$R_i = m_i + \sum_{j=1}^{K} \beta_{ij} F_j + \tilde{R}_i, \qquad i = 1, 2, \dots, N \tag{25}$$
The returns, principal components, and residuals are all stationary, and we can assume $E(\tilde{R}_i) = 0$. The proposed strategy is to look for the regression residuals with the strongest mean reversion. We therefore autoregress each $\tilde{R}_i$ and select the residuals with the most negative autoregressive coefficients:
$$\tilde{R}_{it} = \rho_i \tilde{R}_{i,t-1} + \varepsilon_{it}, \qquad i = 1, 2, \dots, N \tag{26}$$
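Regression (25) and the no-intercept autoregression (26) can be sketched as follows, together with the selection of the $K + 1$ residual series with the most negative $\rho_i$ that enter the Market Neutral portfolio. The function names, the window length and the simulated residual dynamics are assumptions made for the example.

```python
import numpy as np

def factor_residuals(returns, factors):
    """Eq. (25): OLS of each stock's returns on [1, F_1, ..., F_K].

    Returns intercepts m (N,), loadings beta (N x K) and residuals (T x N).
    """
    X = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)   # (K+1) x N
    return coef[0], coef[1:].T, returns - X @ coef

def ar1_no_intercept(resid):
    """Eq. (26): rho_i = sum_t R~_t R~_{t-1} / sum_t R~_{t-1}^2, column by column."""
    lagged, current = resid[:-1], resid[1:]
    return (lagged * current).sum(axis=0) / (lagged ** 2).sum(axis=0)

# Hypothetical example: a 60-day window, K = 2 factors and 10 stocks whose
# residuals follow AR(1) with rho = -0.7 for stocks 0-2 and rho = 0.3 otherwise.
rng = np.random.default_rng(6)
n_days, K, N = 60, 2, 10
factors = rng.normal(0.0, 0.01, (n_days, K))
rho_true = np.array([-0.7] * 3 + [0.3] * 7)
resid_true = np.zeros((n_days, N))
for t in range(1, n_days):
    resid_true[t] = rho_true * resid_true[t - 1] + rng.normal(0.0, 0.01, N)
returns = 0.0005 + factors @ rng.normal(1.0, 0.3, (K, N)) + resid_true

m, beta, resid = factor_residuals(returns, factors)
rho = ar1_no_intercept(resid)
chosen = np.argsort(rho)[:K + 1]   # indices of the K + 1 most negative rho
```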
Figure 5 shows the top five mean-reverting residuals in the first historical
window.
Figure 5: The top 5 mean-reverting residuals in the first historical window
A trading portfolio containing $n$ stocks is said to be market-neutral if the dollar amounts $\{Q_i\}_{i=1}^{n}$ invested in its stocks satisfy
$$\bar{\beta}_j = \sum_{i=1}^{n} \beta_{ij} Q_i = 0, \qquad j = 1, 2, \dots, K \tag{27}$$
where $\beta_{ij}$ is the coefficient of stock $i$ on factor $j$ in regression (25). In code, we solve this linear system by computing a null space:
$$Q = \mathrm{Null}\{\beta_{[K] \times [n]}\} \tag{28}$$
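The null-space step (28) can be sketched with NumPy's SVD for $n = K + 1$ stocks: when the $K \times (K+1)$ matrix $\beta^\top$ has full row rank, its last right-singular vector spans the null space. The normalization to unit gross exposure is an illustrative assumption (any non-zero multiple of $Q$ also satisfies (27)), and the example checks numerically that the factor term of the portfolio return vanishes.

```python
import numpy as np

def market_neutral_weights(betas):
    """betas: (K+1) x K matrix of loadings beta_ij of the chosen stocks.

    Returns Q (length K+1) with sum_i beta_ij Q_i = 0 for every factor j (eq. 27),
    i.e. a vector spanning the null space of the K x (K+1) matrix beta^T (eq. 28).
    """
    A = np.asarray(betas, dtype=float).T     # K x (K+1)
    _, _, vt = np.linalg.svd(A)              # full_matrices=True: vt is (K+1) x (K+1)
    q = vt[-1]                               # right singular vector with zero singular value
    return q / np.abs(q).sum()               # illustrative scaling: sum |Q_i| = 1

# Hypothetical example with K = 3 factors and K + 1 = 4 stocks.
rng = np.random.default_rng(7)
K = 3
betas = rng.normal(1.0, 0.5, (K + 1, K))
Q = market_neutral_weights(betas)

# The factor exposure cancels: Q . (m + beta F + R~) equals Q . (m + R~).
m = rng.normal(0.0, 0.001, K + 1)
resid = rng.normal(0.0, 0.01, K + 1)
F_t = rng.normal(0.0, 0.02, K)
stock_returns = m + betas @ F_t + resid      # eq. (25) for one day
portfolio_return = Q @ stock_returns
without_factors = Q @ (m + resid)
```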
To guarantee a non-zero solution, we choose $n = K + 1$ stocks, namely those with the $K + 1$ smallest autoregressive coefficients, as the portfolio members; the system (27) then has $K$ equations in $K + 1$ unknowns, so its null space is non-trivial. We then have
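$$\begin{aligned}
\sum_{i=1}^{K+1} Q_i R_i &= \sum_{i=1}^{K+1} Q_i m_i + \sum_{i=1}^{K+1} Q_i \sum_{j=1}^{K} \beta_{ij} F_j + \sum_{i=1}^{K+1} Q_i \tilde{R}_i \\
&= \sum_{i=1}^{K+1} Q_i m_i + \sum_{i=1}^{K+1} Q_i \tilde{R}_i + \sum_{j=1}^{K} \Big( \sum_{i=1}^{K+1} \beta_{ij} Q_i \Big) F_j \\
&= \sum_{i=1}^{K+1} Q_i (m_i + \tilde{R}_i)
\end{aligned} \tag{29}$$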
This equation shows that the portfolio return does not depend on the market factors. It depends only on the intrinsic terms $m_i$ and the random residuals $\tilde{R}_i$, which are mean-zero, stationary, and mean-reverting.
The next step is to generate signals for entering trades. Substituting the autoregression (26) into equation (29), we have
$$\sum_{i=1}^{K+1} Q_i R_{it} = \sum_{i=1}^{K+1} Q_i (m_i + \tilde{R}_{it}) = \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{i,t-1} + \varepsilon_{it}) \tag{30}$$
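Suppose we are at the last period $T$ of the historical window. From the equation above, and assuming $E_T(\varepsilon_{i,T+1}) = 0$, the expected portfolio return at $T+1$ is
$$E_T\Big[\sum_{i=1}^{K+1} Q_i R_{i,T+1}\Big] = E_T\Big[\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT} + \varepsilon_{i,T+1})\Big] = \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT}) \tag{31}$$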
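The expected return in (31), which serves as the trading signal below, is a single weighted sum. A minimal sketch with a hypothetical function name and made-up inputs for a three-stock portfolio ($K = 2$):

```python
import numpy as np

def expected_portfolio_return(Q, m, rho, resid_T):
    """Eq. (31): E_T[sum_i Q_i R_{i,T+1}] = sum_i Q_i (m_i + rho_i * R~_{iT})."""
    Q, m, rho, resid_T = (np.asarray(a, dtype=float) for a in (Q, m, rho, resid_T))
    return float(Q @ (m + rho * resid_T))

# Hypothetical inputs.
S_T = expected_portfolio_return(Q=[0.5, -0.3, 0.2],
                                m=[0.001, 0.0, -0.002],
                                rho=[-0.5, -0.4, -0.6],
                                resid_T=[0.01, -0.02, 0.005])
```

With these inputs the per-stock terms are -0.004, 0.008 and -0.005, giving $S_T = -0.0054$.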
When $\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT})$ is large and positive, we can buy this portfolio and expect a high return; when it is sufficiently negative, we can short the portfolio and still expect a high return. We therefore use $S_T = \sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT})$, with negative coefficients $\rho_i$, directly as our trading signal. In our strategy, the trade-entry criteria are:
1. If $S_T - \mathrm{mean}\{S_t\} \ge 0.7\,(\max\{S_t\} - \mathrm{mean}\{S_t\})$, $t = T-59, \dots, T$:
   enter the trade, going long $|Q_i|$ dollars of each stock with $Q_i > 0$ and short $|Q_i|$ dollars of each stock with $Q_i < 0$. The expected return is $\left|\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT})\right|$.
2. If $S_T - \mathrm{mean}\{S_t\} \le 0.7\,(\min\{S_t\} - \mathrm{mean}\{S_t\})$, $t = T-59, \dots, T$:
   enter the trade, going long $|Q_i|$ dollars of each stock with $Q_i < 0$ and short $|Q_i|$ dollars of each stock with $Q_i > 0$. The expected return is again $\left|\sum_{i=1}^{K+1} Q_i (m_i + \rho_i \tilde{R}_{iT})\right|$.
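The two entry rules can be sketched as a function returning the trade direction, together with the resulting dollar positions. The 0.7 threshold and the 60-day signal history come from the text; the function names and the guard against a perfectly flat signal history are our own illustrative additions.

```python
import numpy as np

def entry_direction(signals, threshold=0.7):
    """signals: S_t for t = T-59, ..., T (the last entry is S_T).

    Returns +1 (rule 1: hold Q), -1 (rule 2: hold -Q) or 0 (no trade).
    """
    s = np.asarray(signals, dtype=float)
    if s.max() == s.min():              # flat history: no meaningful signal
        return 0
    s_T, mean = s[-1], s.mean()
    if s_T - mean >= threshold * (s.max() - mean):
        return 1
    if s_T - mean <= threshold * (s.min() - mean):
        return -1
    return 0

def dollar_positions(Q, direction):
    """Dollars per stock: direction +1 is long where Q_i > 0 and short where Q_i < 0."""
    return direction * np.asarray(Q, dtype=float)

up = entry_direction(np.linspace(-1.0, 1.0, 60))     # S_T is the window maximum
down = entry_direction(np.linspace(1.0, -1.0, 60))   # S_T is the window minimum
flat = entry_direction(np.r_[np.linspace(-1.0, 1.0, 59), 0.0])  # S_T near the mean
positions = dollar_positions([0.5, -0.3, 0.2], down)
```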
Finally, as the historical window moves forward, we compare these expected portfolio returns with the realized portfolio returns and compute their correlation.
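The evaluation step can be sketched as below. The text does not state which correlation coefficient is used, so Pearson correlation is assumed here; the optional mask reflects that the Market Neutral model only produces a forecast on days when a trade is entered. The numbers are hypothetical, not results from the paper.

```python
import numpy as np

def forecast_correlation(expected, realized, traded=None):
    """Pearson correlation between forecast and realized returns.

    `traded` is an optional boolean mask keeping only the periods in which
    a trade was entered.
    """
    expected = np.asarray(expected, dtype=float)
    realized = np.asarray(realized, dtype=float)
    if traded is not None:
        mask = np.asarray(traded, dtype=bool)
        expected, realized = expected[mask], realized[mask]
    return float(np.corrcoef(expected, realized)[0, 1])

# Hypothetical numbers only.
perfect = forecast_correlation([0.01, -0.02, 0.03], [0.02, -0.04, 0.06])
masked = forecast_correlation([0.01, 0.5, -0.02, 0.03], [0.02, -9.0, -0.04, 0.06],
                              traded=[True, False, True, True])
```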
4 Comparison and Analysis
Table 2 compares the four methods for forecasting stocks' log-returns in the S&P 500 universe. A few parameters need clarification. In the time series models (AR and VAR), the historical window spans 1000 days, while in the statistical arbitrage models, following Avellaneda and Lee (2010), we use the past 60 days of records as the information set for trading. 'Common factors' refers to the number of series used in forecasting: for Pair Trading, the two stocks of each pair; for the VAR and Market Neutral models, the number of PCs used. Because the last model uses a signal to decide whether to enter a trade, it produces fewer forecasts than the others.
Table 2: Comparison between four types of forecasting methods

| Model | Length of historical window (days) | Common factors used | Forecasting times | Correlation with real returns |
|---|---|---|---|---|
| AR | 1000 | NA | 1000 | 3.17% |
| Pair Trading | 60 | 2 | 1000 | 13.89% |
| VAR | 1000 | 5 | 1000 | 4.52% |
| Market Neutral | 60 | 15 | 503 | 17.2% |
The table shows that both the AR and VAR models exhibit little forecastability. The AR model considers only each stock's own log-returns. Individual log-price processes are close to random walks, in the sense that log-returns, the differences of log-prices, are close to white noise; even though they are stationary, their future behavior is hard to forecast. The VAR model tries to capture more of the market information that affects a stock's behavior, so we switch to modeling how the common market factors (PCs) evolve and obtain each stock's predicted value by substituting the forecast PCs into the original regressions. However, the improvement in forecasting individual log-returns is marginal (the correlation rises only from 3.17% to 4.52%), so it adds little opportunity to earn money. Moreover, when we extend the forecasting horizon to five days, the accuracy of both methods decreases (see Table 3). Overall, time series forecasting is most credible over short horizons, and its accuracy diminishes sharply as the prediction horizon lengthens.
Table 3: Time series methods to forecast different periods

| Model | Length of historical window (days) | Days to forecast | Correlation with real returns |
|---|---|---|---|
| AR | 1000 | 1 | 3.17% |
| VAR | 1000 | 1 | 4.52% |
| AR | 1000 | 5 | 1.38% |
| VAR | 1000 | 5 | 1.90% |

In contrast, Table 2 shows that the second and last models substantially improve forecastability. The methodology underlying both is mean reversion, a mathematical concept sometimes used in stock investing, which holds that prices and returns eventually move back toward their mean or average. Revisiting equation (9) of the Pair Trading model, the pair's log-prices are cointegrated and the regression residual is expected to move around its average. By mean reversion, we expect $dX_t$ to be negatively correlated with $X_{t-1}$. This is not only a property inferred from the data but is also supported by a theoretical model, the Ornstein-Uhlenbeck (O-U) process. In mathematics, the O-U process (see Gardiner (1985)) is a stochastic process that describes the velocity of a massive Brownian particle under the influence of friction. The process is stationary, Gaussian, and Markovian, and over time it tends to drift toward its long-term mean; such a process is called mean-reverting. Moreover, another important and widely used assumption in finance is that stock prices follow geometric Brownian motion. Thus, for $X_t$ in equation (9), we can apply the O-U process:
$$dX_t = \kappa (m - X_t)\, dt + \sigma\, dW_t, \qquad \kappa > 0 \tag{32}$$
where $m$ is the mean of $X_t$, $dW_t$ is the increment of a Brownian motion ($W_t \sim N(0, t)$), $\sigma$ measures the volatility of the movement, and the parameter $\kappa$ is called the speed of mean reversion. The process is stationary, and sampled at discrete times it is autoregressive with lag 1. In particular, the increment $dX_t$ has unconditional mean zero and conditional mean
$$E\{dX_t \mid X_s, s \le t\} = \kappa (m - X_t)\, dt$$
When $X_t > m$, we expect $dX_t$ to be negative, and $X_t < m$ implies a positive expected $dX_t$. Rearranging equation (32) gives
$$dX_t = \kappa m\, dt - \kappa\, dt \cdot X_t + \sigma\, dW_t \tag{33}$$
Comparing this with equation (12) of the Pair Trading model, we find that the two have the same form, with
$$\beta_0 = \kappa m\, dt, \qquad \beta_1 - 1 = -\kappa\, dt, \qquad \varepsilon_t = \sigma\, dW_t \tag{34}$$
This in turn supports the AR(1) model we used for $X_t$, and finding the most negative coefficient $\beta_1 - 1$ is equivalent to finding the process with the highest speed of mean reversion.
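The correspondence (34) can be checked numerically: simulate equation (32) with an Euler scheme, which is exactly an AR(1) with $\beta_1 - 1 = -\kappa\, dt$ and $\beta_0 = \kappa m\, dt$, then fit the AR(1) by OLS and invert the mapping. The parameter values and the daily step $dt = 1/252$ are assumptions for illustration.

```python
import numpy as np

def simulate_ou(kappa, m, sigma, dt, n, x0, rng):
    """Euler discretisation of dX = kappa (m - X) dt + sigma dW (eq. 32)."""
    x = np.empty(n)
    x[0] = x0
    shocks = rng.normal(0.0, np.sqrt(dt), n - 1)   # dW ~ N(0, dt)
    for t in range(1, n):
        x[t] = x[t - 1] + kappa * (m - x[t - 1]) * dt + sigma * shocks[t - 1]
    return x

def ou_from_ar1(x, dt):
    """Fit X_t = b0 + b1 X_{t-1} + e_t by OLS and invert eq. (34):
    kappa = -(b1 - 1) / dt and m = b0 / (kappa dt)."""
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (b0, b1), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    kappa = -(b1 - 1.0) / dt
    return float(kappa), float(b0 / (kappa * dt))

rng = np.random.default_rng(8)
dt = 1.0 / 252                                      # assumed: one trading day
x = simulate_ou(kappa=5.0, m=0.5, sigma=0.1, dt=dt, n=100_000, x0=0.5, rng=rng)
kappa_hat, m_hat = ou_from_ar1(x, dt)
```

With 100,000 simulated steps the recovered `kappa_hat` and `m_hat` should lie close to the true values 5 and 0.5.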
For the last model, the Market Neutral model, we used a different method to identify mean-reverting processes. Instead of studying cointegrated log-prices, we directly regress log-returns, which are already stationary, on the common market factors (PCs). The residuals of this regression (with a constant term) have mean zero, but no rigorous model guarantees that they mean-revert around zero.
The relationship in equation (26), $\tilde{R}_{it} = \rho_i \tilde{R}_{i,t-1} + \varepsilon_{it}$ with $\rho_i < 0$, is essentially an assumption. However, we searched all the stocks and selected those most likely to obey this relationship (see Figure 5). For the selected stocks, it is therefore reasonable to assume that the residuals $\tilde{R}_{it}$ from the regression on the common market factors oscillate near zero, and we can apply the mean-reversion method effectively.
Nevertheless, not all stationary processes are mean-reverting in a way that can be exploited for trading. Moreover, a driftless random walk (an I(1) process) started at zero returns to zero with probability one, but the expected time until it does so is infinite. Hence we cannot apply mean reversion directly to a random walk either.
The relatively good performance of the second and last models comes from focusing on the residuals instead of on forecasting the variables themselves. Whether through existing theory or econometric analysis, we extract more information about the properties of the residuals, which exhibit more forecastability. As a well-known saying in finance puts it, "Profit comes from residuals." Another lesson from our research is that there is no 'true' model explaining stock prices or returns. Every existing theory is partially right, and every model is valid only when its assumptions are reasonable. For example, the famous Black-Scholes model, like the price dynamics assumed above when motivating the O-U process, rests on the assumption that the underlying stock price $S_t$ follows geometric Brownian motion:
$$S_{t+1} - S_t = (r - q) S_t\, dt + \sigma S_t\, dW_t \;\Longrightarrow\; \frac{S_{t+1}}{S_t} = 1 + (r - q)\, dt + \sigma \sqrt{dt}\, Z, \qquad Z \sim N(0, 1) \tag{35}$$
which implies that each stock's log-price evolves on its own, driven only by its own drift and noise and not by other stocks; however, we have found that this is usually not the case. Too many variables and factors influence the stock market. Even if a model works well for a time, once many people begin to use it, their trading and investment behavior affects the market in turn and may offset the model's usefulness. Hence, unlike some economic problems, financial markets are full of noise (Black, 1986) and hard to model. The task is to find a little useful information in this enormous environment, catch opportunities, and make money.
5 Conclusion
Practical experiments and back-testing results show that traditional time series methods do not work well. The AR and VAR models, which belong to univariate and multivariate time series analysis respectively, achieve correlations with realized returns below 5 percent, and this accuracy decreases significantly as the forecasting period increases. This suggests that it is hard to derive a true recurrence relation that can be used to predict new values. In contrast, the Pair Trading and Market Neutral models, which are based on the statistical arbitrage principle, raise the correlation above 10 percent. The idea is to form a pair or a portfolio whose returns depend only on the values of residuals and then, by exploiting the mean-reversion property of these residuals, gain more forecastability.
References
(2012). Standard & Poor’s 500 index - S&P 500. Investopedia.
(2013). S&P Indice Methodology. Standard And Poor’s.
Avellaneda, M. and J.-H. Lee (2010). Statistical arbitrage in the US equities market. Quantitative Finance 10(7), 761–782.
Black, F. (1986). Noise. The Journal of Finance 41(3), 529–543.
Box, G. E., G. M. Jenkins, and G. C. Reinsel (2013). Time series analysis:
forecasting and control. John Wiley & Sons.
Gardiner, C. (1985). Stochastic methods. Springer-Verlag, Berlin–
Heidelberg–New York–Tokyo.
Hamilton, J. D. (1994). Time series analysis, Volume 2. Princeton University Press, Princeton.
Hannan, E. J. and B. G. Quinn (1979). The determination of the order
of an autoregression. Journal of the Royal Statistical Society. Series B
(Methodological), 190–195.
Hassan, M. R. and B. Nath (2005). Stock market forecasting using hidden Markov model: a new approach. In Intelligent Systems Design and Applications, 2005. ISDA'05. Proceedings. 5th International Conference on, pp. 192–196. IEEE.
Hassan, M. R., B. Nath, and M. Kirley (2007). A fusion model of HMM, ANN and GA for stock market forecasting. Expert Systems with Applications 33(1), 171–180.
Hirsa, A. (2012). Computational methods in finance. CRC Press.
Jolliffe, I. (2005). Principal component analysis. Wiley Online Library.
Lawrence, R. (1997). Using neural networks to forecast stock market
prices. University of Manitoba.
Lo, A. W. and A. C. MacKinlay (1990). When are contrarian profits due to stock market overreaction? Review of Financial Studies 3(2), 175–205.
Miller, M. H., J. Muthuswamy, and R. E. Whaley (1994). Mean reversion of Standard & Poor's 500 index basis changes: Arbitrage-induced or statistical illusion? The Journal of Finance 49(2), 479–513.
Pole, A. (2008). Statistical arbitrage: algorithmic trading insights and
techniques, Volume 411. John Wiley & Sons.
Stock, J. H. and M. W. Watson (1988a). Testing for common trends. Journal of the American Statistical Association 83(404), 1097–1107.
Stock, J. H. and M. W. Watson (1988b). Variable trends in economic time
series. The Journal of Economic Perspectives, 147–174.