Playing with the Rubik's Cube: Principal Component Analysis Solving the Closed-End Fund Puzzle?
by
Ismael Torres-Pizarro, University of Puerto Rico
Ismaeltorres2002@yahoo.co, (787) 315-5636
A simple PCA model was used to find the direction of greatest variability for the closed-end fund (CEF) puzzle. Evidence was found that the MOM factor, as detailed by Carhart (1997), explains this puzzle. The data sets used are available for independent verification of the results.
I. Setting the game:
In his well-known undergraduate textbook, Madura (2003) states that stock prices are affected by economic factors (such as market yields and bond yields, which are proxy measurements for market risk, and changes in the bond markets that might cause investors to switch from bonds to stocks and vice versa), firm-specific factors (such as dividend policy, acquisitions, expectations, etc.) and market-related factors (such as investor sentiment). Fama & French (1992, 1993) discussed the use of two factors in addition to the firm beta to model stock returns1: a) SMB, which stands for "small (market capitalization) minus big", and b) HML, for "high (book-to-market ratio) minus low"; they measure the historic excess returns of small caps over big caps and of value stocks over growth stocks. These factors are calculated from combinations of portfolios composed of ranked stocks and
1 It should be noted that the most basic definition of a stock return is: Total Return = {(Value of investment at the end of the period - Value of investment at the beginning of the period) + Dividends received within the period} / Value of investment at the beginning of the period. That is, it is just another way to look at price changes.
available historical market data. Historical values were downloaded from French's web page. Carhart (1997) extended the Fama-French model with an additional momentum factor (MOM), which is long prior-month winners and short prior-month losers. We have monthly data points for all the variables (namely, the input and output factors in the PCA approach) for each of the nearly 300 funds, starting as early as 1987 in some cases:
1. Average Monthly Market Price Discount
a. High and low monthly CEF stock prices, counted as two separate input variables
2. Monthly NAV (net asset value: the book value of the firm's assets less the firm's book-value liabilities)
3. Fund Monthly Market Return
4. Dividend Distribution per Share
5. Fama & French Rm-Rf
6. Fama & French SMB (small [cap] minus big: a measure of the historic excess returns of small caps over the market as a whole)
7. Fama & French HML (high [book/price] minus low: historic excess returns of "value" stocks over the market as a whole)
8. Fama & French MOM
9. Market Yield, 1 year
10. Market Yield, 10 years
11. Corporate Bond Yields
a. AAA interest yield
b. BAA interest yield
The response (dependent) variable Discount is defined here as the difference between a fund's Average Monthly Market Price and its Monthly NAV. All the models here hypothesize that the response (output, or dependent) variable is a function of the other input (independent) variables (factors), namely: distribution, monthly return, Rm-Rf, SMB, HML, MOM, the market yields at 1-year and 10-year maturities, and the corporate bond yields (AAA and BAA).
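The definition of the response variable can be written as a trivial helper; a minimal sketch, with illustrative names:

```python
def discount(avg_monthly_price, monthly_nav):
    """Response variable: a fund's Average Monthly Market Price minus its Monthly NAV."""
    return avg_monthly_price - monthly_nav
```

A fund trading below its NAV therefore has a negative discount value under this sign convention.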
Neoclassical finance might expect the input variable "distribution" to carry great weight in the response models, while the other variables should either not be significant (Rm-Rf, SMB, HML, MOM) or have a significance related to whether the fund holds equities or bonds (which should give a positive or negative relationship to the market and corporate yields) in a regression analysis. That is, the neoclassical finance school would hypothesize that the first and most important principal component in a PCA analysis for the output variable would load on the dividend "distribution" variable, while the other input variables should have no component at all (or be insignificant for all practical purposes). Behavioral finance would expect the MOM factor to be of great significance: either among the main loadings of the first principal component, or one of the most important among a few other variables or their combinations.
The process is similar to the well-known Rubik's Cube toy. A numerical example clarifies it. Let us say we have two variables2 with the following observations:
f(x,y)T =
2 Numerical example taken and modified from Jackson (1991).
x: 10.0 10.4 9.7 9.7 11.7 11.0 8.7 9.5 10.1 9.6 10.5 9.2 11.3 10.1 8.5
y: 10.7 9.8 10.0 10.1 11.5 10.8 8.8 9.3 9.4 9.6 10.4 9.0 11.6 9.8 9.2
FIGURE 1. Scatter Plot of X vs. Y for the PCA Example.
Figure 1 shows both variables. The mean vector and covariance matrix are:

Mean(f(x,y)T) = [10, 10]

Covariance(f(x,y)T) = [0.79857142857143 0.67928571428571
                       0.67928571428571 0.73428571428571]

The corresponding eigenvalues and eigenvectors of that covariance matrix are:

eigenvalues = [1.44647433819575 0.08638280466139]

eigenvectors = [-0.72362480830445  0.69019355024975
                -0.69019355024975 -0.72362480830445]
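The author reports using Matlab for the full data set; the small example above can be reproduced with an equivalent NumPy sketch (the variable names are illustrative):

```python
import numpy as np

# The Jackson (1991) example data: rows are the two variables x and y.
data = np.array([
    [10.0, 10.4, 9.7, 9.7, 11.7, 11.0, 8.7, 9.5, 10.1, 9.6, 10.5, 9.2, 11.3, 10.1, 8.5],
    [10.7, 9.8, 10.0, 10.1, 11.5, 10.8, 8.8, 9.3, 9.4, 9.6, 10.4, 9.0, 11.6, 9.8, 9.2],
])

mean = data.mean(axis=1)                 # [10.0, 10.0]
cov = np.cov(data)                       # sample covariance, divides by n - 1 = 14

# For a symmetric matrix, eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
v1 = eigvecs[:, -1]                      # principal eigenvector (sign is arbitrary)
score1 = (data[:, 0] - mean) @ v1        # projection of the first observation
```

Up to the arbitrary sign of the eigenvectors, this reproduces the eigenvalues 1.4465 and 0.0864 and the first score of about 0.4831 quoted below.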
The eigenvector entries are direction cosines: taking their arccosines gives the angles of a new "principal" rotation of the original axes about the variable means. That is, the original (X, Y) axes are rotated "upward" and to the "left" so that the new set of axes aligns with the data set and points in the direction of the highest variation. In this case, we rotate3:

arccosine(eigenvectors) = [43.645432° 133.645432°
                           46.354568° 43.645432°]

That is, the new abscissa, E1, moved "up" and to the left 43.65° measured from the old abscissa, X (or moved "down" and to the right 46.35° measured from the old ordinate, Y). As the new ordinate, E2, must be orthogonal to E1, we have completed the process4 for this simple case.
Now, the first pair of values from the old set (X, Y) was (10.0, 10.7); subtracting the mean of each variable gives (0.0, 0.7), which is the same as moving the origin from the old (0, 0) to (10, 10). Performing the calculation:

X*Eigenvector(1,1) + Y*Eigenvector(1,2) = 0.0*0.7236 + 0.7*0.6902 = 0.4831;
X*Eigenvector(2,1) + Y*Eigenvector(2,2) = 0.0*(-0.6902) + 0.7*0.7236 = 0.5065;
3 It should be noted that the eigenvectors could also be represented as [0.72362480830445 -0.69019355024975; 0.69019355024975 0.72362480830445]; that is, the negative sign just reflects the fact that the line crosses over into the other quadrant.
4 Just add 90° to the angles; that is, the new ordinate is at 43.65° + 90° = 133.65° and 46.35° + 90° = 136.35°, which are nothing more than the angles from the second eigenvector.
Therefore, we have mapped (10.0, 10.7) into (0.48, 0.51) in the new set of coordinates. The process is just inverted5 if you need to get from the new set of variables (E1, E2) back to (X, Y):

E1*Eigenvector(1,1) + E2*Eigenvector(2,1) = 0.4831*0.7236 + 0.5065*(-0.6902) = 0.0
E1*Eigenvector(1,2) + E2*Eigenvector(2,2) = 0.4831*0.6902 + 0.5065*0.7236 = 0.7
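The point of footnote 5, that the inverse of the eigenvector matrix is just its transpose, can be checked with a short NumPy sketch (the matrix is arranged so that its rows act on the centered point, matching the hand calculation above):

```python
import numpy as np

# Eigenvector entries from the text, rows arranged to reproduce the forward mapping.
E = np.array([[ 0.72362480830445,  0.69019355024975],
              [-0.69019355024975,  0.72362480830445]])

point = np.array([0.0, 0.7])     # first observation, centered at the mean (10, 10)
scores = E @ point               # forward mapping into the (E1, E2) coordinates
recovered = E.T @ scores         # E is orthogonal, so its inverse is its transpose
```

The round trip recovers (0.0, 0.7) exactly up to floating-point error, with no matrix inversion required.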
Note that we have just transformed the original variables6, and that the sample variance of each transformed variable is none other than its eigenvalue. From the fact that the axes have moved almost 45° (and also from the comparison of the eigenvector entries, which are close to one another in magnitude: |-0.72362480830445| ≈ |-0.69019355024975|), we can observe that the pattern in the observed data might be modeled by the linear equation Y = X + Intercept. This is further supported by estimating a simple linear regression in an Excel spreadsheet, which gives the following summary output:
5 Usually this would require calculating an inverse matrix; however, the inverse of the eigenvector matrix is its transpose (the matrix is orthogonal). This should dismiss any claim that the original variables cannot be recovered, although it would be a cumbersome process for a multivariable space.
6 From the statistics world, this is the reason the numbers in the eigenvectors are also known as "scores", short for z scores.
FIGURE 2. Summary Output of X vs. Y for the PCA Example
From Figure 2 it can be seen that a very simple model could be conjectured: Y = 0.85X + 1.49; or, when taking the p-values (or confidence intervals) into consideration, we can also use Y = X (a 45° line7). This is what we referred to when we talked about the data-reduction feature of PCA and its pattern-discovery feature.
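The regression summarized in Figure 2 can be reproduced from the example data with an ordinary least-squares fit; a NumPy sketch (the text used an Excel spreadsheet):

```python
import numpy as np

# Example data from Figure 1.
x = np.array([10.0, 10.4, 9.7, 9.7, 11.7, 11.0, 8.7, 9.5, 10.1, 9.6, 10.5, 9.2, 11.3, 10.1, 8.5])
y = np.array([10.7, 9.8, 10.0, 10.1, 11.5, 10.8, 8.8, 9.3, 9.4, 9.6, 10.4, 9.0, 11.6, 9.8, 9.2])

slope, intercept = np.polyfit(x, y, 1)   # least-squares fit of Y = slope*X + intercept
```

This recovers the Y = 0.85X + 1.49 model conjectured from the summary output.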
II. Playing with the “Rubik’s cube”:
By simply transforming the original data in Matlab we obtained the most important eigenvalues and their cumulative weights:
7 Using the convenient fact that the true variable coefficient should lie within (-1.17, 4.16), setting it to 1 (a value inside the interval) is more revealing about the possible true relationship between the variables.
TABLE 1. Eigenvalues' relative weights in this case
From Table 1 we can affirm that with only five (5) eigenvalues about 99.54% of the variability of the data can be modeled well. Thus a significant data compression has been achieved. The PCA transformation matrix is now a (15, 5) matrix, where the convention is (variable, eigenvalue) and the order of the variables is: 1) hi_price; 2) lo_price; 3) NAV; 4) Monthly_Return; 5) AVE_Discount; 6) DPS; 7) Rm_Rf; 8) SMB; 9) HML; 10) MOM; 11) Rf; 12) Yield_OneYear; 13) Yield_TenYear; 14) AAA; 15) BAA. It looks as follows:
TABLE 2. Details of the five most important eigenvalues and the fifteen variables
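The truncation step that produces this (15, 5) matrix can be sketched as follows. The original fund data set is not reproduced in the paper, so a random stand-in matrix (rows = monthly observations, columns = the 15 variables) is used here only to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))     # stand-in for the real (observations, variables) data

eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]                 # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

cum_weight = np.cumsum(eigvals) / eigvals.sum()   # cumulative relative weight, as in Table 1
k = int(np.searchsorted(cum_weight, 0.9954)) + 1  # components needed for ~99.54% of variance
W = eigvecs[:, :k]                                # the (15, k) transformation matrix
```

With the real data, k comes out to 5 per Table 1; with the random stand-in it will generally differ, since random data spreads variance more evenly across components.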
We know this eigenvector matrix is also the matrix whose arccosines give the rotation angles. That is, an original data point must be moved, as directed, from its original 15-dimensional world to a new mapping with only 5 dimensions. We can observe from Table 2 that some of the angles have cosines so small that they only move the data point at almost right angles. We set such angles to 90° (cosine 90° = 0), which is the same as setting their cosines to zero. This is shown in Table 3 below.
TABLE 3. Details of the five most important eigenvalues and the fifteen variables, where non-significant values were set to zero
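The zeroing of small loadings described above can be sketched as follows. The 0.05 cutoff is an assumption, since the text does not state the threshold used, and the small matrix fragment (built from two loadings quoted in the text) is only illustrative:

```python
import numpy as np

def zero_small_loadings(W, threshold=0.05):
    """Set near-zero direction cosines to 0, i.e., set their angles to 90 degrees."""
    W = W.copy()
    W[np.abs(W) < threshold] = 0.0
    return W

# Illustrative fragment mixing large and small loadings from the text.
W = np.array([[0.59227071437245, -0.02408744663071],
              [0.03521378405278,  0.53641494107392]])
```

Applying the function leaves the large loadings untouched while the near-zero entries become exactly zero, as in Table 3.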
From Table 3, it is quite obvious that the most important variables8 for the first eigenvalue, which happens to be the one associated with the most variability, are 1 (high price), 2 (low price), 3 (NAV), 5 (average discount) and 10 (MOM). The most important variables for the second eigenvalue are 3 (NAV), 8 (SMB), 9 (HML) and 10 (MOM); for the third eigenvalue, 3 (NAV), 9 (HML) and 10 (MOM); for the fourth eigenvalue, 5 (average discount), 8 (SMB), 9 (HML) and 10 (MOM); and for the fifth and last eigenvalue, 1 (high price), 2 (low price), 3 (NAV), 5 (average discount) and 9 (HML).

The first eigenvalue's variable list is illustrative, as we posit average discount = average price - NAV. It stands out that the MOM variable has a substantial effect in all five eigenvalues; in particular, MOM is the only variable outside the linear equation that defines the puzzling discount to load on the first eigenvalue, which is the principal direction of the system's variability; therefore, it should be an important variable affecting the discount.
As a validation exercise, fifty-six (56) funds with 11,327 data points were used to validate the models. This represents an 18.98% sample by fund count (56 out of 295 funds; a mix of new and previously used funds) and 21.59% by data point (11,327/52,462), close to the general 20% normally used as a guideline for model-validation purposes. The interpolation9 dates range from
8 Important in the sense that moving the system according to the angle associated with that variable moves the system more quickly to the new position.
9 Interpolation: a method of constructing new data points within the range of a discrete set of known data points. It can be used to validate the model, as the model will predict values inside its range that should be in close agreement with the actual values observed. Great departures from such values indicate a poor model.
August 1987 to December 2010, and the extrapolation10 dates range from January 2011 to October 2011.

The Euclidean distance between the 11,327 discount values of the validation sample and their estimates, as obtained by the PCA transformation matrix, was a total of 18.26358994273340 separation units, equivalent to an MSE ≈ 0.02944810784818.
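The reported distance and MSE are consistent with each other, since the squared Euclidean distance between the actual and estimated vectors is the sum of squared errors; a quick check:

```python
# Validation figures reported in the text.
distance = 18.26358994273340   # total Euclidean distance, in "separation units"
n = 11327                      # number of validation data points

mse = distance ** 2 / n        # mean squared error implied by the distance
```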
III. I think I can solve the puzzle now:
The puzzling behavior of the CEF discount seems to be caused mostly by the MOM factor. The linear equation average discount = average price - NAV becomes, for the first eigenvalue:

Abs{[arccosine(0.59227071437245) + arccosine(0.53641494107392)]/2 - arccosine(0.59955661177596)} ≈ arccosine(0.03521378405278):

abs{[33.93461226273970° + 30.73431219129440°]/2 - 34.35206343392610°} = abs{-2.01760120690908°} = 2.01760120690908°
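The degree arithmetic of this first-eigenvalue identity can be checked directly; the angle values are taken as reported in the text:

```python
# Angles (degrees) reported for high price, low price, and average discount.
hi_price_angle = 33.93461226273970
lo_price_angle = 30.73431219129440
discount_angle = 34.35206343392610

residual = abs((hi_price_angle + lo_price_angle) / 2 - discount_angle)
```

The residual of about 2.0176 degrees is the part of the discount identity not explained by the price and NAV angles alone.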
10 Extrapolation: the process of constructing new data points outside the range of a discrete set of known data points. It is similar to interpolation, but the results of extrapolation are subject to greater uncertainty. It can be used to validate the model, as the model will predict values outside its range that should be in close agreement with the actual values observed. Close agreement with such values indicates a good model.
Any difference in the actual discount not covered by this equation must come from the MOM angle, arccosine(-0.02408744663071) = -1.38010903118630°, and from statistical error.