1. Final Project
A Statistical Arbitrage Strategy for the S&P 500
Zhicheng Li/Sirui Zhang/Jian Wang
Stony Brook University
December 9, 2014
Zhicheng Li/Sirui Zhang/Jian Wang (Stony Brook University)Final Project December 9, 2014 1 / 18
2. Theory of Strategy
Follow the paper of Avellaneda et al. and use Principal Component Analysis (PCA)
Form a dynamic market-neutral portfolio and use statistical arbitrage to trade its stocks as a group
Use mean-reverting process to generate the trading signal
3. Method of Strategy: 1
Parameters (following Avellaneda et al.): in-sample window M = 60 days, out-of-sample window 1 day, K = 15.
Calculate each stock's log-return:
$$R_{it} = \log\left(\frac{P_{it}}{P_{i,t-1}}\right), \qquad t = 1, 2, \dots, M, \quad i = 1, 2, \dots, N \qquad (1)$$
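Standardized log-return:
$$Y_{it} = \frac{R_{it} - \bar{R}_i}{\bar{\sigma}_i} \qquad (2)$$
where
$$\bar{R}_i = \frac{1}{M}\sum_{t=1}^{M} R_{it}, \qquad \bar{\sigma}_i^2 = \frac{1}{M-1}\sum_{t=1}^{M}\left(R_{it} - \bar{R}_i\right)^2 \qquad (3)$$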
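A minimal sketch of Eqs. (1)-(3) for one stock in plain Python (standard library only). The function names `log_returns` and `standardize` and the sample prices are hypothetical; note that a window of M = 60 returns needs M + 1 = 61 closing prices.

```python
import math


def log_returns(prices):
    """Eq. (1): R_t = log(P_t / P_{t-1}) for one stock's price series."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices[:-1], prices[1:])]


def standardize(returns):
    """Eqs. (2)-(3): center on the window mean and divide by the sample
    standard deviation (M - 1 in the denominator)."""
    m = len(returns)
    mean = sum(returns) / m
    sd = math.sqrt(sum((r - mean) ** 2 for r in returns) / (m - 1))
    return [(r - mean) / sd for r in returns], mean, sd


# Hypothetical prices; a real window would hold M + 1 = 61 closes.
prices = [100.0, 101.0, 99.5, 100.2, 102.0]
r = log_returns(prices)
y, mean, sd = standardize(r)
```

The standardized series `y` has mean zero and sample variance one by construction.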
4. Method of Strategy: 2
Calculate the empirical correlation matrix of the data.
$$\rho_{ij} = \frac{1}{M-1}\sum_{t=1}^{M} Y_{it} Y_{jt} \qquad (4)$$
Calculate the principal components for each time window:
C = Cov(ρ); [V, D] = eig(C);   (5)
Choose the K most significant eigenvectors, i.e. those corresponding to the K largest eigenvalues:
V = V(:, NL-K+1:NL);   (6)
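A stdlib-only sketch of Eqs. (4)-(6). Two assumptions: it eigendecomposes the correlation matrix ρ directly, which is the choice made by Avellaneda et al. (the slide's code applies Cov(·) to ρ first), and a cyclic Jacobi rotation method stands in for MATLAB's `eig`. All function names are hypothetical.

```python
import math


def correlation(Y):
    """Eq. (4): rho_ij = 1/(M-1) * sum_t Y_it * Y_jt, with Y holding N rows of M standardized returns."""
    n, m = len(Y), len(Y[0])
    return [[sum(Y[i][t] * Y[j][t] for t in range(m)) / (m - 1) for j in range(n)]
            for i in range(n)]


def jacobi_eig(A, tol=1e-18, max_sweeps=100):
    """Cyclic Jacobi method for a symmetric matrix A.
    Returns (eigenvalues, V) where column k of V is the eigenvector for eigenvalue k."""
    n = len(A)
    A = [row[:] for row in A]                          # work on a copy
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                     # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                     # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                     # V <- V J accumulates eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def top_k_eigenvectors(C, k):
    """Eq. (6): keep the k eigenvectors with the largest eigenvalues, largest first."""
    vals, V = jacobi_eig(C)
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)[:k]
    return [vals[i] for i in order], [[row[i] for row in V] for i in order]
```

Sorting explicitly by eigenvalue avoids relying on the order in which an eigensolver happens to return its results.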
5. Method of Strategy: 3
Project the log-return matrix onto these eigenvectors to form K market factors:
$$F_{jt} = \sum_{i=1}^{N} \frac{v_i^{(j)}}{\bar{\sigma}_i} R_{it}, \qquad j = 1, 2, \dots, K \qquad (7)$$
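Eq. (7) turns each eigenvector into an eigenportfolio with weights $v_i^{(j)}/\bar{\sigma}_i$. A minimal sketch, with a hypothetical function name and toy inputs:

```python
def factor_returns(R, eigvecs, sigma):
    """Eq. (7): F_jt = sum_i v_i^(j) / sigma_i * R_it.

    R: N rows of M log-returns (one row per stock)
    eigvecs: K eigenvectors, each of length N
    sigma: N per-stock standard deviations from Eq. (3)
    Returns K rows of M factor returns."""
    n, m = len(R), len(R[0])
    weights = [[v[i] / sigma[i] for i in range(n)] for v in eigvecs]  # eigenportfolio weights
    return [[sum(w[i] * R[i][t] for i in range(n)) for t in range(m)] for w in weights]


R = [[0.01, 0.02], [0.03, -0.01]]      # hypothetical: 2 stocks, 2 days
F = factor_returns(R, eigvecs=[[1.0, 1.0]], sigma=[1.0, 2.0])
```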
Regress each stock's returns on these market factors:
$$R_i = m_i + \sum_{j=1}^{K} \beta_{ij} F_j + \tilde{R}_i, \qquad i = 1, 2, \dots, N \qquad (8)$$
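A sketch of the regression in Eq. (8) for a single stock, assuming ordinary least squares with an intercept solved through the normal equations (a small Gaussian elimination replaces a linear-algebra library). Names and the toy data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def regress_on_factors(r, F):
    """Eq. (8): OLS fit of r_t = m + sum_j beta_j * F_jt + resid_t for one stock.
    r: M returns; F: K rows of M factor returns. Returns (m, betas, residuals)."""
    T = len(r)
    X = [[1.0] + [f[t] for f in F] for t in range(T)]   # design matrix with intercept column
    p = len(X[0])
    XtX = [[sum(X[t][a] * X[t][c] for t in range(T)) for c in range(p)] for a in range(p)]
    Xty = [sum(X[t][a] * r[t] for t in range(T)) for a in range(p)]
    coef = solve(XtX, Xty)                               # normal equations
    resid = [r[t] - sum(coef[a] * X[t][a] for a in range(p)) for t in range(T)]
    return coef[0], coef[1:], resid


F = [[1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, 1.0, 0.0, 2.0]]      # hypothetical K = 2 factors
r = [0.5 + 2.0 * f1 - 1.0 * f2 for f1, f2 in zip(F[0], F[1])]  # noiseless toy returns
m_i, betas, resid = regress_on_factors(r, F)
```

The residuals `resid` are the series $\tilde{R}_i$ used in the next step.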
Since we can assume $E(\tilde{R}_i) = 0$, we fit an autoregression to each residual series $\tilde{R}_i$ and look for the residuals with the most negative autoregressive coefficient:
$$\tilde{R}_{it} = \rho_i \tilde{R}_{i,t-1} + \epsilon_{it} \qquad (9)$$
Choose $K+1$ (here 16) stocks as our portfolio members $PT_i$, $i = 1, 2, \dots, 16$.
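A minimal sketch of the selection step. It assumes that "highest negative" means "most negative" and fits Eq. (9) by least squares without an intercept, consistent with $E(\tilde{R}_i) = 0$; function names and data are hypothetical.

```python
def ar1_coefficient(x):
    """Eq. (9) without intercept: least-squares rho in x_t = rho * x_{t-1} + eps_t."""
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(b * b for b in x[:-1])
    return num / den


def select_members(residuals, count):
    """Indices of the `count` residual series with the most negative AR(1) coefficient."""
    rhos = [ar1_coefficient(x) for x in residuals]
    return sorted(range(len(rhos)), key=lambda i: rhos[i])[:count]


# Hypothetical residual series; in the strategy count would be K + 1 = 16.
residuals = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0], [1.0, 0.5, 0.25]]
chosen = select_members(residuals, 1)
```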
6. Method of Strategy: 4
A trading portfolio is market-neutral if the dollar amounts $\{Q_i\}_{i=1}^{K+1}$ invested in its stocks satisfy
$$\bar{\beta}_j = \sum_{i=1}^{K+1} \beta_{ij} Q_i = 0, \qquad j = 1, 2, \dots, K \qquad (10)$$
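where $\beta_{ij}$ is the coefficient from regressing stock $i$ on factor $j$. In code, we solve this linear system by computing the null space of the $K \times (K+1)$ matrix whose $(j, i)$ entry is $\beta_{ij}$:
$$Q = \operatorname{Null}\{\beta_{[K]\times[K+1]}\} \qquad (11)$$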
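Then we have
$$\begin{aligned}
\sum_{i=1}^{K+1} Q_i R_i &= \sum_{i=1}^{K+1} Q_i m_i + \sum_{i=1}^{K+1} Q_i \sum_{j=1}^{K} \beta_{ij} F_j + \sum_{i=1}^{K+1} Q_i \tilde{R}_i \\
&= \sum_{i=1}^{K+1} Q_i m_i + \sum_{i=1}^{K+1} Q_i \tilde{R}_i + \sum_{j=1}^{K} \left(\sum_{i=1}^{K+1} \beta_{ij} Q_i\right) F_j
\end{aligned} \qquad (12)$$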
7. Method of Strategy: 5
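By (10), each inner sum $\sum_{i=1}^{K+1} \beta_{ij} Q_i$ is zero, so the factor term drops out, which means
$$\sum_{i=1}^{K+1} Q_i R_i = \sum_{i=1}^{K+1} Q_i \left(m_i + \tilde{R}_i\right) \qquad (13)$$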
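The null-space computation in (11) and the cancellation in (13) can be checked numerically. This sketch uses Gauss-Jordan elimination in place of MATLAB's `null` (which returns an orthonormal basis, so the vector here is likewise scaled to unit length; its sign is arbitrary). The betas, means, residuals and factor returns are hypothetical numbers chosen only to exercise the identity.

```python
import math


def null_vector(B, tol=1e-12):
    """Nonzero unit vector q with B q = 0 for a K x (K+1) matrix B
    (rows: factors j, columns: stocks i). Uses Gauss-Jordan elimination;
    the first non-pivot column gives the null direction."""
    k, n = len(B), len(B[0])
    M = [list(map(float, row)) for row in B]
    pivots, row = [], 0
    for col in range(n):
        if row == k:
            break
        piv = max(range(row, k), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < tol:
            continue                                   # no pivot in this column
        M[row], M[piv] = M[piv], M[row]
        pv = M[row][col]
        M[row] = [v / pv for v in M[row]]
        for r in range(k):
            if r != row:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[row])]
        pivots.append(col)
        row += 1
    free = next(c for c in range(n) if c not in pivots)
    q = [0.0] * n
    q[free] = 1.0
    for r, c in enumerate(pivots):
        q[c] = -M[r][free]                             # solve each pivot row for its variable
    norm = math.sqrt(sum(v * v for v in q))
    return [v / norm for v in q]


# Hypothetical inputs: K + 1 = 3 stocks, K = 2 factors.
betas = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]             # betas[i][j] = beta_ij
B = [[betas[i][j] for i in range(3)] for j in range(2)]  # K x (K+1) matrix of Eq. (11)
Q = null_vector(B)
m = [0.001, -0.002, 0.0005]
resid = [0.01, -0.004, 0.002]
F = [0.02, -0.01]
R = [m[i] + sum(betas[i][j] * F[j] for j in range(2)) + resid[i] for i in range(3)]  # Eq. (8)
lhs = sum(Q[i] * R[i] for i in range(3))
rhs = sum(Q[i] * (m[i] + resid[i]) for i in range(3))   # Eq. (13): factor term drops out
```

`lhs` and `rhs` agree up to rounding, whatever values are chosen for the factor returns `F`.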
8. Mean-Reverting Process: 1
Assume that stock returns satisfy the system of stochastic differential
equations
$$\frac{dS_i(t)}{S_i(t)} = \alpha_i\,dt + \sum_{j=1}^{N} \beta_{ij} \frac{dI_j(t)}{I_j(t)} + dX_i(t) \qquad (14)$$
Here, the idiosyncratic component of the return is given by
$$\alpha_i\,dt + dX_i(t) \qquad (15)$$
Our model assumes (i) a drift $\alpha_i\,dt$, which measures systematic deviations from the sector, and (ii) a price fluctuation $dX_i(t)$ that mean-reverts to the overall industry level.
9. Mean-Reverting Process: 2
Based on these considerations, we introduce a parametric model for $X_i(t)$ that can be estimated easily, namely the Ornstein-Uhlenbeck process:
$$dX_i(t) = \kappa_i\left(m_i - X_i(t)\right)dt + \sigma_i\,dW_i(t) \qquad (16)$$
If we assume momentarily that the parameters of the model are constant,
we can write
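$$X_i(t_0+\Delta t) = e^{-\kappa_i \Delta t} X_i(t_0) + \left(1 - e^{-\kappa_i \Delta t}\right) m_i + \sigma_i \int_{t_0}^{t_0+\Delta t} e^{-\kappa_i (t_0+\Delta t-s)}\,dW_i(s) \qquad (17)$$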
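Because (17) is exact for constant parameters, it can be used to simulate the OU process without discretization error: the stochastic integral is Gaussian with mean zero and variance $\sigma_i^2(1 - e^{-2\kappa_i\Delta t})/(2\kappa_i)$. A sketch with hypothetical parameter values and $\Delta t = 1/252$ for daily steps:

```python
import math
import random


def simulate_ou(x0, kappa, m, sigma, dt, n_steps, rng):
    """Exact OU update from Eq. (17):
    X(t+dt) = b X(t) + (1 - b) m + noise, with b = exp(-kappa dt),
    where the noise is Gaussian with variance sigma^2 (1 - b^2) / (2 kappa)."""
    b = math.exp(-kappa * dt)
    noise_sd = sigma * math.sqrt((1.0 - b * b) / (2.0 * kappa))
    path = [x0]
    for _ in range(n_steps):
        path.append(b * path[-1] + (1.0 - b) * m + noise_sd * rng.gauss(0.0, 1.0))
    return path


# Hypothetical parameters: daily steps, mean-reversion time 1/kappa of about 8 trading days.
path = simulate_ou(x0=0.05, kappa=30.0, m=0.0, sigma=0.1, dt=1 / 252, n_steps=60,
                   rng=random.Random(0))
```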
The equilibrium probability distribution of the process $X_i(t)$ is normal with
$$E\{X_i(t)\} = m_i \quad\text{and}\quad \operatorname{Var}\{X_i(t)\} = \frac{\sigma_i^2}{2\kappa_i} \qquad (18)$$
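The moments in (18) can be read off (17). The sketch below assumes $\kappa_i > 0$ (not stated on the slide) and uses the Itô isometry for the variance of the stochastic integral; since that integral is Gaussian with mean zero, $X_i(t_0+\Delta t)$ is normal for every $\Delta t$, and so is the limit.

```latex
\begin{aligned}
E\{X_i(t_0+\Delta t) \mid X_i(t_0)\} &= e^{-\kappa_i\Delta t}X_i(t_0) + \left(1-e^{-\kappa_i\Delta t}\right)m_i
  \;\xrightarrow{\;\Delta t\to\infty\;}\; m_i,\\
\operatorname{Var}\{X_i(t_0+\Delta t) \mid X_i(t_0)\} &= \sigma_i^2\int_{t_0}^{t_0+\Delta t} e^{-2\kappa_i(t_0+\Delta t-s)}\,ds
  = \frac{\sigma_i^2}{2\kappa_i}\left(1-e^{-2\kappa_i\Delta t}\right)
  \;\xrightarrow{\;\Delta t\to\infty\;}\; \frac{\sigma_i^2}{2\kappa_i}.
\end{aligned}
```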
10. Mean-Reverting Process: 3
According to Equation (14), going long 1 dollar of the stock and short $\beta_{ij}$ dollars of the $j$-th principal component has an expected 1-day return of
$$\alpha_i\,dt + \kappa_i\left(m_i - X_i(t)\right)dt \qquad (19)$$
The second term is the model's prediction for the return based on the position of the stationary process $X_i(t)$: it forecasts a negative return if $X_i(t)$ is sufficiently high and a positive return if $X_i(t)$ is sufficiently low.
11. Signal Generation: 1
We focus only on the process $X_i(t)$, neglecting the drift $\alpha_i$. From (18), the equilibrium standard deviation is
$$\sigma_{eq,i} = \frac{\sigma_i}{\sqrt{2\kappa_i}} \qquad (20)$$
Accordingly, we define the dimensionless variable
$$s_i = \frac{X_i(t) - m_i}{\sigma_{eq,i}} \qquad (21)$$
We call this variable the s-score. Our basic mean-reversion trading signal is:
buy to open (buy one dollar of the stock and sell $\beta_{ij}$ dollars of its $j$-th principal components) if $s_i < -1.25$;
sell to open (sell one dollar of the stock and buy $\beta_{ij}$ dollars of its $j$-th principal components) if $s_i > +1.25$;
close short position (buy back the stock and sell the principal components) if $s_i < +0.75$;
close long position (sell the stock and buy back the principal components) if $s_i > -0.50$.
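The four rules above form a small state machine. A sketch, assuming one action per time step and that new positions are opened only from a flat position; the parameter names `s_bo`, `s_so`, `s_bc`, `s_sc` are hypothetical labels for the four thresholds:

```python
def next_position(position, s, s_bo=1.25, s_so=1.25, s_bc=0.75, s_sc=0.50):
    """Apply the s-score rules. position: +1 long, -1 short, 0 flat.
    Assumption: at most one action per step; positions open only from flat."""
    if position == 0:
        if s < -s_bo:
            return 1          # buy to open
        if s > s_so:
            return -1         # sell to open
        return 0
    if position == 1 and s > -s_sc:
        return 0              # close long
    if position == -1 and s < s_bc:
        return 0              # close short
    return position


# Hypothetical s-score path: open long, close it, open short, close it.
pos, history = 0, []
for s in [-1.4, -1.0, -0.6, -0.3, 0.9, 1.3, 1.0, 0.5]:
    pos = next_position(pos, s)
    history.append(pos)
```

Because the close thresholds sit inside the open thresholds, a long is always closed before the score can reach the sell-to-open level, so this sketch never flips directly from long to short.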
12. Signal Generation: 2
Here, we use the solution in the appendix to estimate the residual process and generate the signal.
First, estimate the regression
$$R^S_n = \beta_0 + \beta R^I_n + \epsilon_n, \qquad n = 1, 2, \dots, 60 \qquad (22)$$
We set
$$\alpha = \frac{\beta_0}{\Delta t} = 252\,\beta_0 \qquad (23)$$
since $\Delta t = 1/252$ year for daily data.
Next, we define the auxiliary process
$$X_k = \sum_{j=1}^{k} \epsilon_j, \qquad k = 1, 2, \dots, 60 \qquad (24)$$
which can be viewed as a discrete version of $X(t)$, the OU process that we are estimating.
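A sketch of Eqs. (22)-(24) with the single regressor of (22), assuming `rs` holds the stock's returns $R^S_n$ and `ri` the returns $R^I_n$ it is regressed on; function and variable names are hypothetical. Because the regression has an intercept, the residuals sum to zero and the last value of $X_k$ is zero up to rounding.

```python
from itertools import accumulate


def fit_residual_process(rs, ri, dt=1 / 252):
    """Eqs. (22)-(24): regress rs on ri with an intercept, annualize the
    intercept, and cumulate the residuals into X_k."""
    n = len(rs)
    mean_s, mean_i = sum(rs) / n, sum(ri) / n
    beta = (sum((a - mean_s) * (b - mean_i) for a, b in zip(rs, ri))
            / sum((b - mean_i) ** 2 for b in ri))
    beta0 = mean_s - beta * mean_i
    eps = [a - beta0 - beta * b for a, b in zip(rs, ri)]
    alpha = beta0 / dt                  # Eq. (23): beta0 * 252 for daily data
    X = list(accumulate(eps))           # Eq. (24): X_k = eps_1 + ... + eps_k
    return alpha, beta, X


# Hypothetical daily returns (a real window has 60 observations).
ri = [0.008, 0.015, -0.012, 0.004, -0.003]
rs = [0.010, 0.021, -0.011, 0.005, -0.006]
alpha, beta, X = fit_residual_process(rs, ri)
```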
13. Signal Generation: 3
Notice that the regression "forces" the residuals to have mean zero, so $X_{60} = 0$.
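The OU parameters are estimated by fitting the lag-1 regression model
$$X_{n+1} = a + b X_n + \zeta_{n+1}, \qquad n = 1, 2, \dots, 59 \qquad (25)$$
According to (17), we have
$$a = m\left(1 - e^{-\kappa \Delta t}\right), \qquad b = e^{-\kappa \Delta t}, \qquad \operatorname{Var}(\zeta) = \sigma^2\,\frac{1 - e^{-2\kappa \Delta t}}{2\kappa} \qquad (26)$$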
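A sketch of the lag-1 regression (25). It assumes $\operatorname{Var}(\zeta)$ is estimated by the sample variance of the fitted residuals (`statistics.variance`, denominator n - 1); the slide does not specify the estimator. The function name and the toy series are hypothetical.

```python
import statistics


def fit_lag1(X):
    """Eq. (25): OLS fit of X_{n+1} = a + b X_n + zeta_{n+1}.
    Returns (a, b, var_zeta), with var_zeta the sample variance of the residuals."""
    x, y = X[:-1], X[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    a = my - b * mx
    zeta = [v - a - b * u for u, v in zip(x, y)]
    return a, b, statistics.variance(zeta)


# Hypothetical noiseless AR(1) path with a = 0.1, b = 0.5.
X = [0.0]
for _ in range(5):
    X.append(0.1 + 0.5 * X[-1])
a, b, var_zeta = fit_lag1(X)
```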
14. Signal Generation: 4
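Whence
$$\kappa = -\log(b) \cdot 252, \qquad m = \frac{a}{1-b}, \qquad \sigma = \sqrt{\frac{\operatorname{Var}(\zeta) \cdot 2\kappa}{1-b^2}}, \qquad \sigma_{eq} = \sqrt{\frac{\operatorname{Var}(\zeta)}{1-b^2}} \qquad (27)$$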
The s-score is defined theoretically as
$$s = \frac{X(t) - m}{\sigma_{eq}} \qquad (28)$$
Since $X(t) = X_{60} = 0$,
$$s = \frac{-m}{\sigma_{eq}} = \frac{-a\sqrt{1-b^2}}{(1-b)\sqrt{\operatorname{Var}(\zeta)}} \qquad (29)$$
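Eqs. (27) and (29) as code: a minimal sketch with a hypothetical function name. It rejects fits with b outside (0, 1), where log(b) or $1 - b^2$ would not describe a mean-reverting process; that guard is an added assumption.

```python
import math


def ou_from_lag1(a, b, var_zeta, periods_per_year=252):
    """Invert Eq. (26) as in Eq. (27) and return the s-score of Eq. (29).
    Requires 0 < b < 1 and var_zeta > 0."""
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie in (0, 1) for a mean-reverting fit")
    kappa = -math.log(b) * periods_per_year
    m = a / (1.0 - b)
    sigma = math.sqrt(var_zeta * 2.0 * kappa / (1.0 - b * b))
    sigma_eq = math.sqrt(var_zeta / (1.0 - b * b))
    s = -m / sigma_eq                   # uses X(t) = X_60 = 0
    return kappa, m, sigma, sigma_eq, s


# Hypothetical lag-1 fit.
kappa, m, sigma, sigma_eq, s = ou_from_lag1(a=0.001, b=0.9, var_zeta=1e-4)
```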
15. Signal Generation: 5
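The last caveat is that we found centered means work better, so we set
$$m = \frac{a}{1-b} - \left\langle \frac{a}{1-b} \right\rangle \qquad (30)$$
where the angle brackets denote averaging over different stocks. The s-score is therefore
$$s = \frac{-m}{\sigma_{eq}} = \frac{-a\sqrt{1-b^2}}{(1-b)\sqrt{\operatorname{Var}(\zeta)}} + \left\langle \frac{a}{1-b} \right\rangle \sqrt{\frac{1-b^2}{\operatorname{Var}(\zeta)}} \qquad (31)$$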
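A sketch of the centered s-score (30)-(31) across a cross-section of stocks, assuming each stock's lag-1 fit is given as a hypothetical `(a, b, var_zeta)` tuple:

```python
import math


def centered_s_scores(fits):
    """Eqs. (30)-(31): fits is a list of (a, b, var_zeta) tuples, one per stock.
    Each m = a/(1-b) is centered on the cross-sectional average before scoring."""
    raw_m = [a / (1.0 - b) for a, b, _ in fits]
    avg_m = sum(raw_m) / len(raw_m)     # <a / (1 - b)>
    return [-(mi - avg_m) / math.sqrt(v / (1.0 - b * b))
            for mi, (_, b, v) in zip(raw_m, fits)]


# Hypothetical fits for two stocks.
fits = [(0.001, 0.9, 1e-4), (0.003, 0.8, 2e-4)]
scores = centered_s_scores(fits)
```

Centering removes any common level shared by all stocks, so a cross-section of identical fits scores zero everywhere.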
16. Signal Generation: 6
Since we cannot go long or short the principal components directly, we use the market-neutral portfolio to eliminate the principal-component part. According to (13), when we hold the portfolio Q, we only need to go long or short Q according to the signal. We therefore calculate the signal of the portfolio, where $S_i$ is the signal of the $i$-th stock in Q:
$$S_Q = \sum_{i=1}^{K+1} Q_i S_i \qquad (32)$$
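Eq. (32) is a weighted sum; a minimal sketch with hypothetical weights and scores follows. One caveat: the null-space solution of (11) is only defined up to sign and scale, so flipping Q flips the sign of $S_Q$ and rescaling Q rescales it, which means the thresholds 1.25, 0.75 and 0.50 implicitly depend on how Q is normalized.

```python
def portfolio_signal(Q, s):
    """Eq. (32): S_Q = sum_i Q_i * S_i over the K + 1 portfolio members."""
    if len(Q) != len(s):
        raise ValueError("Q and s must have one entry per portfolio member")
    return sum(q * si for q, si in zip(Q, s))


# Hypothetical portfolio weights and per-stock s-scores.
Q = [0.5, -0.5, 1.0]
S = [1.0, 2.0, -1.0]
S_Q = portfolio_signal(Q, S)
```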
17. Plot and Result: 1
Plot of the signals for the first 40 time steps:
18. Plot and Result: 2
Strategy over the first 40 time steps.