Playing with the Rubik cube:
Principal Component Analysis
Solving the
Close End Funds Puzzle?
by
Ismael Torres-Pizarro, University of Puerto Rico
Ismaeltorres2002@yahoo.co, (787) 315-5636
A simple PCA model was used to find the direction of greatest variability for the
closed-end fund (CEF) puzzle. Evidence was found that the MOM factor, as detailed by
Carhart (1997), explains this puzzle. The data sets used are available for
independent verification of the results.
I. Setting the game:
In his well-known undergraduate textbook, Madura (2003) states that
stock prices are affected by economic factors (such as market yields and bond
yields, which are proxy measurements for market risk and for changes in the bond
markets that might cause investors to switch from bonds to stocks and vice versa),
firm-specific factors (such as dividend policy, acquisitions, expectations, etc.) and
market-related factors (such as investor sentiment, etc.). Fama & French (1992,
1993) discussed the use of two factors in addition to the firm beta to model stock
returns1: a) SMB, which stands for "small (market capitalization) minus big", and
b) HML, for "high (book-to-market ratio) minus low"; they measure the historic
excess returns of small caps over big caps and of value stocks over growth stocks.
These factors are calculated with combinations of portfolios composed of ranked
stocks and available historical market data. Historical values were downloaded
from French's web page. Carhart (1997) extended the Fama-French model with an
additional momentum factor (MOM), which is long prior-month winners and short
prior-month losers. We have monthly data points for all the variables (namely, the
input and output factors in the PCA approach) for each of the close to 300 funds,
starting in 1987 in some cases.
1 It should be noted that the most basic definition of a stock return is Total
Return = {(Value of investment at the end of the period – Value of investment at
beginning of the period) + Dividends received within the period} / Value of
investment at beginning of the period. That is, it is just another way to look at
price changes.
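The total return definition in footnote 1 can be written as a small function. This is only a minimal sketch in Python; the function name and the sample values are hypothetical and serve to illustrate the arithmetic, not to reproduce any fund data.

```python
def total_return(begin_value, end_value, dividends=0.0):
    """Total return over one period, following the definition in footnote 1."""
    if begin_value == 0:
        raise ValueError("begin_value must be non-zero")
    return ((end_value - begin_value) + dividends) / begin_value


# Hypothetical example: bought at 100, worth 105 at period end, 2 received in dividends.
example = total_return(100.0, 105.0, dividends=2.0)
print(f"total return = {example:.2%}")
```

With no dividends the expression reduces to the plain percentage price change, which is the sense in which a return is "just another way" to look at price changes.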
1. Average Monthly Market Price Discount
a. High and low monthly CEF stock prices, counted as two separate input
variables
2. Monthly NAV (net asset value: the book value of the firm's assets less the
book value of the firm's liabilities)
3. Fund Monthly Market Return
4. Dividend Distribution per share
5. Fama & French Rm-Rf
6. Fama & French SMB (small [cap] minus big: a measure of the historic excess
returns of small caps over the market as a whole)
7. Fama & French HML (high [book/price] minus low: the historic excess returns of
"value" stocks over the market as a whole)
8. Fama & French MOM
9. Market Yield 1 year
10. Market Yield 10 years
11. Corporate Bond Yields
a. AAA interest yield
b. BAA interest yield
The response (dependent) variable, Discount, is defined here as the
difference between the fund's Average Monthly Market Price and its Monthly
NAV. All the models here hypothesize that the response (output or dependent)
variable is a function of the other input (independent) variables (factors),
namely: distribution, monthly return, Rm-Rf, SMB, HML, MOM, market yields
for 1-year and 10-year maturities, and corporate bond yields for the AAA and BAA
ratings.
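The definition of the response variable can be sketched as follows. This is an illustration only: it assumes the average monthly market price is the midpoint of the month's high and low prices (consistent with the two price inputs listed above), and the function name and sample values are hypothetical.

```python
def average_discount(high_price, low_price, nav):
    """Discount = average monthly market price - monthly NAV.

    Assumption: the average price is taken as the midpoint of the
    monthly high and low prices.
    """
    average_price = (high_price + low_price) / 2.0
    return average_price - nav


# Hypothetical month: high 10.50, low 9.50, NAV 11.00.
# A negative value means the fund trades below its NAV.
print(average_discount(10.50, 9.50, 11.00))
```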
Neoclassical finance might expect the input variable "distribution" to carry
great weight in the response models, while the other variables (Rm-Rf, SMB, HML,
MOM) should not be significant, or their significance should be related to the
fund's nature as an equity or bond fund (which should give a positive or negative
relationship to market and corporate yields) in a regression analysis. That is, the
neoclassical finance school would hypothesize that the first and most important
principal component in a PCA for the output variable would be dominated by the
"distribution" of dividends variable, while the other input variables should have
no component at all (or perhaps be insignificant for all practical purposes).
Behavioral finance would expect the MOM factor to be of great significance:
either among the main contributors to the first principal component or one of
the most important among a few other variables or their combinations.
The process has a similarity to the well-known Rubik’s cube toy. A
numerical example clarifies it:
Let us say we have two variables2, X and Y, with fifteen observations each:
f(x,y)T =
X: 10.0 10.4 9.7 9.7 11.7 11.0 8.7 9.5 10.1 9.6 10.5 9.2 11.3 10.1 8.5
Y: 10.7 9.8 10.0 10.1 11.5 10.8 8.8 9.3 9.4 9.6 10.4 9.0 11.6 9.8 9.2
2 Numerical example taken and modified from Jackson (1991).
FIGURE 1. Scatter Plot of X vs. Y for the PCA Example.
Figure 1 shows both variables. Their mean vector and covariance matrix are:
Mean(f(x,y)T) = [10, 10]
Covariance(f(x,y)T) = [0.79857142857143 0.67928571428571;
                       0.67928571428571 0.7342857142857]
The corresponding eigenvalues and eigenvectors of that covariance
matrix are:
eigenvalues = [1.44647433819575 0.08638280466139]
eigenvectors = [-0.72362480830445 0.69019355024975;
                -0.69019355024975 -0.72362480830445]
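The figures above can be reproduced with a few lines of code. The following is a minimal sketch in Python using only the standard library; it relies on the closed-form solution for a symmetric 2x2 matrix rather than on the Matlab routines used for the paper, and all function names are illustrative. The sign of each eigenvector is arbitrary, so the output may differ from the matrix above by a factor of -1 per column.

```python
import math

X = [10.0, 10.4, 9.7, 9.7, 11.7, 11.0, 8.7, 9.5, 10.1, 9.6, 10.5, 9.2, 11.3, 10.1, 8.5]
Y = [10.7, 9.8, 10.0, 10.1, 11.5, 10.8, 8.8, 9.3, 9.4, 9.6, 10.4, 9.0, 11.6, 9.8, 9.2]


def sample_cov(a, b):
    """Sample covariance (divisor n - 1) of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def eig_sym_2x2(sxx, sxy, syy):
    """Eigenvalues (largest first) and unit eigenvectors of [[sxx, sxy], [sxy, syy]].

    Assumes sxy != 0, which holds for this data set.
    """
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    values = (half_trace + radius, half_trace - radius)
    vectors = []
    for lam in values:
        vx, vy = sxy, lam - sxx  # solves (S - lam*I) v = 0
        norm = math.hypot(vx, vy)
        vectors.append((vx / norm, vy / norm))
    return values, vectors


mean_vector = (sum(X) / len(X), sum(Y) / len(Y))
sxx, sxy, syy = sample_cov(X, X), sample_cov(X, Y), sample_cov(Y, Y)
eigenvalues, eigenvectors = eig_sym_2x2(sxx, sxy, syy)

print("mean        :", mean_vector)
print("covariance  :", [[sxx, sxy], [sxy, syy]])
print("eigenvalues :", eigenvalues)
print("eigenvectors:", eigenvectors)
```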
The eigenvector entries are just the direction cosines of a new "principal"
rotation of the original variables about their means relative to the original set
of axes (X, Y); their arccosines are the rotation angles. That is, the original axes
are moved "upward" and to the "left" to align a new set of axes with the data set,
so that it also points in the direction of the highest variation. In this case
(using the sign convention of footnote 3), we move3:
arccosine(eigenvectors) = [43.645432° 133.645432°;
                           46.354568° 43.645432°]
That is, the new abscissa, E1, moved "up" and to the left 43.65° measured
from the old abscissa, X (or moved "down" and to the right 46.35° measured from
the old ordinate, Y). As the new ordinate, E2, must be orthogonal to E1, we have
completed the process4 for this simple case.
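The rotation angles can be checked by taking the arccosine of each eigenvector entry. A minimal sketch, assuming the sign convention given in footnote 3; the variable names are illustrative.

```python
import math

# Eigenvector matrix in the sign convention of footnote 3.
eigenvectors = [
    [0.72362480830445, -0.69019355024975],
    [0.69019355024975, 0.72362480830445],
]

# Each entry is a direction cosine; its arccosine is the angle in degrees.
angles_deg = [[math.degrees(math.acos(c)) for c in row] for row in eigenvectors]

for row in angles_deg:
    print("  ".join(f"{angle:10.6f}" for angle in row))
```

Because the two entries of a unit eigenvector satisfy cos²(a) + cos²(b) = 1, the angles in the first column are complementary (they add up to 90°), which is why the rotation is fully described by the single angle of about 43.65°.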
Now, the first pair of values from the old set (X, Y) was (10, 10.7). Subtracting
the mean from each variable gives (0.0, 0.7), which is the same as moving the
origin from the old (0, 0) to (10, 10). Performing the calculation:
X*Eigenvector(1,1) + Y*Eigenvector(1,2) = 0.0*0.7236 + 0.7*0.6902 = 0.4831;
X*Eigenvector(2,1) + Y*Eigenvector(2,2) = 0.0*(-0.6902) + 0.7*0.7236 = 0.5065;
3 It should be noted that the eigenvectors could also be represented as
[0.72362480830445 -0.69019355024975; 0.69019355024975 0.72362480830445];
that is, the negative sign just reflects the fact that the line crosses over to the
other quadrant.
4 Just by adding 90° to the angles; that is, the new ordinate is at 43.65° + 90° =
133.65° and 46.35° + 90° = 136.35°, which are nothing more than the angles from the
second eigenvector.
Therefore, we have mapped (10.0, 10.7) into (0.48, 0.51) in the new set of
coordinates. The process is just the inverse5 if you need to get from the new set of
variables (E1, E2) back to (X, Y):
E1*Eigenvector(1,1) + E2*Eigenvector(2,1) = 0.4831*0.7236 + 0.5065*(-0.6902) = 0.0
E1*Eigenvector(1,2) + E2*Eigenvector(2,2) = 0.4831*0.6902 + 0.5065*0.7236 = 0.7
Adding the means back gives the original point (10.0, 10.7).
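The forward and inverse mappings above can be sketched as two small functions. This is an illustration in Python under the same sign convention used in the worked calculation; the function names are hypothetical.

```python
MEAN = (10.0, 10.0)

# Unit eigenvectors as used in the worked calculation.
E1 = (0.72362480830445, 0.69019355024975)   # direction of greatest variation
E2 = (-0.69019355024975, 0.72362480830445)  # orthogonal direction


def to_components(x, y):
    """Center the point on the means, then project it onto E1 and E2."""
    dx, dy = x - MEAN[0], y - MEAN[1]
    return (dx * E1[0] + dy * E1[1], dx * E2[0] + dy * E2[1])


def from_components(e1, e2):
    """Inverse mapping: the transpose undoes the rotation, then the means are restored."""
    dx = e1 * E1[0] + e2 * E2[0]
    dy = e1 * E1[1] + e2 * E2[1]
    return (dx + MEAN[0], dy + MEAN[1])


scores = to_components(10.0, 10.7)
print("scores   :", scores)
print("recovered:", from_components(*scores))
```

No matrix inversion is needed for the return trip because the eigenvector matrix is orthonormal, so its inverse is simply its transpose.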
Note that we just transformed (centered and rotated) the original variables6
and that the sample variance of each transformed variable is none other than its
eigenvalue. We can observe from the fact that the axes have moved almost 45°
(also, from the comparison of the components of the first eigenvector, which are
close to one another in magnitude; that is, 0.72362480830445 ≈
0.69019355024975) that the pattern in the observed data might be modeled by the
linear equation Y = X + Intercept. This is further supported by a simple linear
regression estimated in an Excel spreadsheet, which gives the following summary
output:
5 Usually this would require calculating the inverse matrix; however, the inverse
of the eigenvector matrix is its transpose. This should dismiss any claim that the
original variables cannot be recovered, although it would be a cumbersome process
for a multivariable space.
6 From the statistics world, this is the reason the transformed values are also
known as "scores", short for z scores.
FIGURE 2. Summary Output of X vs. Y for the PCA Example
From Figure 2 it can be seen that a very simple model for these data could be
conjectured to be Y = 0.85X + 1.49; or, when taking the p-values (or confidence
intervals) into consideration, we can also use Y = X (a 45° line7). This
is what we referred to when we talked about the data reduction feature of PCA and
its pattern discovery feature.
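The regression in Figure 2 can be re-estimated from the fifteen observations of the example. The sketch below uses ordinary least squares written out with the Python standard library instead of the Excel tool used for the paper. The critical value T_CRIT is an assumption (the two-sided 95% Student t value for 13 degrees of freedom, taken from a standard table). Under these assumptions the fit reproduces Y = 0.85X + 1.49, and the intercept interval reproduces the (-1.17, 4.16) range quoted in footnote 7.

```python
import math

X = [10.0, 10.4, 9.7, 9.7, 11.7, 11.0, 8.7, 9.5, 10.1, 9.6, 10.5, 9.2, 11.3, 10.1, 8.5]
Y = [10.7, 9.8, 10.0, 10.1, 11.5, 10.8, 8.8, 9.3, 9.4, 9.6, 10.4, 9.0, 11.6, 9.8, 9.2]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n
sxx = sum((x - mean_x) ** 2 for x in X)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
syy = sum((y - mean_y) ** 2 for y in Y)

# Ordinary least squares estimates.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual variance and standard errors of the two coefficients.
residual_variance = (syy - slope * sxy) / (n - 2)
se_slope = math.sqrt(residual_variance / sxx)
se_intercept = math.sqrt(residual_variance * (1.0 / n + mean_x ** 2 / sxx))

T_CRIT = 2.160  # assumed: two-sided 95% Student t value for n - 2 = 13 d.o.f.
slope_ci = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)
intercept_ci = (intercept - T_CRIT * se_intercept, intercept + T_CRIT * se_intercept)

print(f"Y = {slope:.2f} X + {intercept:.2f}")
print("95% interval for the slope    :", slope_ci)
print("95% interval for the intercept:", intercept_ci)
```

If the slope interval contains 1 and the intercept interval contains 0, the data do not contradict the simpler model Y = X.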
7 Using the convenient fact that the 95% confidence interval of the intercept,
(-1.17, 4.16), contains 0, and that the interval of the slope (approximately 0.59 to
1.12 when recomputed from the data above) contains 1, setting the intercept to 0
and the slope to 1 is more revealing about the possible true relationship between
the variables.
II. Playing with the “Rubik’s cube”:
By simply transforming the original data with Matlab, we obtained the
most important eigenvalues and their cumulative weight:
TABLE 1. Relative weight of the eigenvalues in this case
From Table 1 we can affirm that with only five (5) eigenvalues about
99.54% of the variability of the data can be modeled. Thus, a significant
data compression has been achieved. The PCA transformation matrix is now a
(15,5) matrix, where the convention is (variable, eigenvalue) and the order of
the variables is: 1) hi_price; 2) lo_price; 3) NAV; 4) Monthly_Return; 5)
AVE_Discount; 6) DPS; 7) Rm_Rf; 8) SMB; 9) HML; 10) MOM; 11) Rf; 12)
Yield_OneYear; 13) Yield_TenYear; 14) AAA; 15) BAA. It looks as follows:
TABLE 2. Details of the five principal eigenvalues and the fifteen variables
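The selection of the number of components from the cumulative weight of the eigenvalues can be sketched as follows. The eigenvalues in this example are hypothetical placeholders, not the values of Table 1, and the function names are illustrative.

```python
def cumulative_weights(eigenvalues):
    """Cumulative share of total variance, with eigenvalues sorted largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    shares, running = [], 0.0
    for value in ordered:
        running += value
        shares.append(running / total)
    return shares


def components_needed(eigenvalues, threshold):
    """Smallest number of components whose cumulative share reaches the threshold."""
    for count, share in enumerate(cumulative_weights(eigenvalues), start=1):
        if share >= threshold:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues, for illustration only.
hypothetical = [6.0, 2.0, 1.0, 0.5, 0.3, 0.15, 0.05]
print(cumulative_weights(hypothetical))
print(components_needed(hypothetical, threshold=0.97))
```

Keeping only the leading columns of the eigenvector matrix is what turns the original square matrix into the reduced (15,5) transformation matrix described in the text.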
We know this eigenvector matrix is also a matrix of direction cosines. That is, an
original data point must be moved as directed from its original 15-dimensional
world to a new mapping with only 5 dimensions. We can observe from Table 2 that
some of the angles have cosines so small that they will only cause the data point
to move at almost right angles. We set such angles to 90° (cosine 90° = 0), which is
the same as setting their cosines to zero. This is shown in Table 3 below.
TABLE 3. Details of the five principal eigenvalues and the fifteen variables,
where non-significant values were set to zero
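Setting the near-right-angle entries to zero can be sketched as a simple threshold on the direction cosines. The cutoff and the sample matrix below are hypothetical; the text does not state the threshold that was actually used to build Table 3.

```python
import math


def zero_small_loadings(loadings, cutoff):
    """Replace direction cosines smaller than the cutoff (in magnitude) with 0.0.

    A cosine of 0 corresponds to an angle of exactly 90 degrees.
    """
    return [[value if abs(value) >= cutoff else 0.0 for value in row] for row in loadings]


# Hypothetical 3-variable, 2-component matrix of direction cosines.
hypothetical_loadings = [
    [0.60, 0.002],
    [-0.001, 0.75],
    [0.03, -0.40],
]
HYPOTHETICAL_CUTOFF = 0.01

cleaned = zero_small_loadings(hypothetical_loadings, HYPOTHETICAL_CUTOFF)
print(cleaned)

# A cosine of 0.002 is already within about a tenth of a degree of a right angle.
print(math.degrees(math.acos(0.002)))
```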
From Table 3, it is quite clear that the most important variables8 are 1 (high
price), 2 (low price), 3 (NAV), 5 (average discount) and 10 (MOM) for the first
eigenvalue, which happens to be the one with which the most variability is
associated. The most important variables for the second eigenvalue are 3 (NAV),
8 (SMB), 9 (HML) and 10 (MOM); for the third eigenvalue, 3 (NAV), 9 (HML) and
10 (MOM); for the fourth eigenvalue, 5 (average discount), 8 (SMB), 9 (HML) and
10 (MOM); and for the fifth and last eigenvalue, 1 (high price), 2 (low price),
3 (NAV), 5 (average discount) and 9 (HML).
The variable list of the first eigenvalue is illustrative, as we posit that average
discount = average price – NAV. It stands out that the MOM variable has a
substantial effect in four of the five eigenvalues; in particular, MOM is the only
other variable that is not part of the linear equation that defines the puzzling
discount in the first eigenvalue, which is the principal direction of the system
variability; therefore, it should be an important variable affecting the discount.
As a validation exercise, fifty-six (56) funds with 11,327 data points were used
to validate the models. This represents an 18.98% sample of the funds (56 out of
295 funds; a mix of new and previously used funds) and 21.59% of the data points
(11,327/52,462), close to the 20% normally used as a guideline for model
validation purposes. The interpolation9 dates range from August 1987 to December
2010 and the extrapolation10 dates from January 2011 to October 2011.
8 Important in the sense that moving the system according to the angle associated
with that variable moves the system more quickly to the new position.
9 Interpolation: A method of constructing new data points within the range of a
discrete set of known data points. It could be used as a validation of the model, as
the model will predict values inside its range that should be in close agreement
with the actual values observed. Great departures from such values indicate a poor
model.
The Euclidean distance between the validating sample of 11,327
discount data values and their estimates, as obtained with the PCA transformation
matrix, was a total of 18.26358994273340 separation units, equivalent to an MSE
≈ 0.02944810784818 (the squared distance divided by the 11,327 data points).
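The link between the two validation figures can be verified with a short calculation: the mean squared error is the squared Euclidean distance divided by the number of data points. A minimal sketch in Python; the function names are illustrative and the short lists in the last lines are hypothetical values used only to show the general case.

```python
import math


def euclidean_distance(actual, estimated):
    """Euclidean distance between observed values and their estimates."""
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)))


def mse_from_distance(distance, n_points):
    """Mean squared error implied by a Euclidean distance over n_points values."""
    return distance ** 2 / n_points


# Figures reported in the text.
reported_mse = mse_from_distance(18.26358994273340, 11327)
print(reported_mse)

# Hypothetical values showing the same relation on a small sample.
actual = [1.0, 2.0, 3.0]
estimated = [1.0, 2.5, 2.0]
print(mse_from_distance(euclidean_distance(actual, estimated), len(actual)))
```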
III. I think I can solve the puzzle now:
The puzzling behavior of the CEF discount seems to be mostly caused by
the MOM. For the first eigenvalue, the linear equation average discount = average
price – NAV becomes, in terms of the loadings of the two prices (0.59227071437245
and 0.53641494107392), the NAV (0.59955661177596) and the average discount
(0.03521378405278):
Abs{[0.59227071437245 + 0.53641494107392]/2 – 0.59955661177596} ≈ 0.03521378405278
or, expressing each loading in degrees (loading × 180°/π; note that these are
rescaled loadings rather than their arccosines):
Abs{[33.93461226273970° + 30.73431219129440°]/2 – 34.35206343392610°}
= Abs{-2.01760120690908°}
≈ 2.01760120690938°
Any difference in the actual discount not covered by this equation
must come from the MOM loading (-0.02408744663071, or -1.38010903118630° on the
same scale) and the statistical error.
10 Extrapolation: The process of constructing new data points outside the range of
a discrete set of known data points. It is similar to the process of interpolation,
but the results of extrapolations are subject to greater uncertainty. It could be
used as a validation of the model, as the model will predict values outside its
range that should be in close agreement with the actual values observed. Close
agreement with such values indicates a good model.
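The arithmetic behind this relation can be checked directly. The sketch below assumes that the four numbers quoted in the equation are the first-component entries for the two prices, the NAV and the average discount, in that order; the variable names are illustrative. It also shows that the degree values quoted in the text equal those entries multiplied by 180/π, whereas a true arccosine gives a different angle.

```python
import math

# Values quoted in the text for the first component (assumed order:
# the two price variables, NAV, average discount, MOM).
price_a = 0.59227071437245
price_b = 0.53641494107392
nav = 0.59955661177596
discount = 0.03521378405278
mom = -0.02408744663071

# average price - NAV, compared in magnitude with the discount entry.
implied = abs((price_a + price_b) / 2.0 - nav)
print("implied :", implied)
print("reported:", discount)

# The degree values in the text are the entries rescaled by 180/pi.
for name, value in [("price_a", price_a), ("price_b", price_b),
                    ("nav", nav), ("discount", discount), ("mom", mom)]:
    print(f"{name:8s} rescaled: {math.degrees(value):.6f}  arccosine: "
          f"{math.degrees(math.acos(value)):.6f}")
```

Because the rescaling is a plain multiplication, the linear relation holds equally for the raw entries and for the degree values; it would not be preserved by the arccosine, which is not a linear function.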

Malad Call Girl in Services 9892124323 | ₹,4500 With Room Free Delivery
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
 
Quantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector CompaniesQuantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector Companies
 
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
 

Playing with the Rubik cube: Principal Component Analysis Solving the Close End Funds Puzzle?

  • 1. Playing with the Rubik cube: Principal Component Analysis Solving the Close End Funds Puzzle? by Ismael Torres-Pizarro, University of Puerto Rico, Ismaeltorres2002@yahoo.co, (787) 315-5636

     A simple PCA model was used to find the direction of greatest variability for the CEF puzzle. Evidence was found that the MOM factor, as detailed by Carhart (1997), explains this puzzle. The data sets used are available for independent verification of the results.

     I. Setting the game:

     In his well-known undergraduate textbook, Madura (2003) states that stock prices are affected by economic factors (such as market yields and bond yields, which are proxy measurements for market risk and for changes in the bond markets that might cause investors to switch from bonds to stocks and vice versa), firm-specific factors (such as dividend policy, acquisitions, expectations, etc.) and market-related factors (such as investor sentiment, etc.). Fama & French (1992, 1993) discussed the use of two factors in addition to the firm beta to model stock returns[1]: a) SMB, which stands for "small (market capitalization) minus big", and b) HML, for "high (book-to-market ratio) minus low"; they measure the historic excess returns of small caps over big caps and of value stocks over growth stocks. These factors are calculated with combinations of portfolios composed of ranked stocks and

     [1] It should be noted that the most basic definition of a stock return is Total Return = {(Value of investment at the end of the period - Value of investment at the beginning of the period) + Dividends received within the period} / Value of investment at the beginning of the period. That is, it is just another way to look at price changes.
  • 2. available historical market data. Historical values were downloaded from French's web page. Carhart (1997) extended the Fama-French model with an additional momentum factor (MOM), which is long prior-month winners and short prior-month losers. We have monthly data points for all the variables (namely, the input and output factors in the PCA approach) for each of the close to 300 funds, starting in 1987 in some cases:

     1. Average Monthly Market Price Discount
        a. High and low monthly CEF stock price, counted as two separate input variables
     2. Monthly NAV (net asset value: the book value of the firm's assets less the book value of the firm's liabilities)
     3. Fund Monthly Market Return
     4. Dividend Distribution per share
     5. Fama & French Rm-Rf
     6. Fama & French SMB (small [cap] minus big: a measure of the historic excess returns of small caps over big caps)
     7. Fama & French HML (high [book/price] minus low: a measure of the historic excess returns of "value" stocks over growth stocks)
     8. Fama & French MOM
     9. Market Yield 1 year
     10. Market Yield 10 years
     11. Corporate Bond Yields
        a. AAA interest yield
        b. BAA interest yield
  • 3. The response (dependent) variable, Discount, is defined here as the difference between the fund's Average Monthly Market Price and its Monthly NAV. All the models here hypothesize that the response (output or dependent) variable is a function of the other input (independent) variables (factors), namely: distribution, monthly return, Rm-Rf, SMB, HML, MOM, market yields for 1 year and 10 years, and corporate bond yields (AAA and BAA).

     Neoclassical finance might expect the input variable "distribution" to have a great weight in both response models, while the other variables (Rm-Rf, SMB, HML, MOM) should not be significant, or their significance should be related to the nature of the fund, equity or bond (which should give a positive or negative relationship to market and corporate yields) in a regression analysis. That is, the neoclassical finance school would hypothesize that the first and most important principal component in a PCA analysis for the output variable would be the "distribution" of dividends variable, while the other input variables should not have any component at all (or perhaps be insignificant for all practical purposes). Behavioral finance would expect the MOM factor to be of great significance: either among the main components of the first principal component, or one of the most important among a few other variables or their combinations.

     The process is similar to playing with the well-known Rubik's cube toy. A numerical example clarifies it. Let us say we have two variables[2], X and Y, with observations f(x,y)^T =

     [2] Numerical example taken and modified from Jackson (1991).
  • 4. X:  10.0  10.4   9.7   9.7  11.7  11.0   8.7   9.5  10.1   9.6  10.5   9.2  11.3  10.1   8.5
     Y:  10.7   9.8  10.0  10.1  11.5  10.8   8.8   9.3   9.4   9.6  10.4   9.0  11.6   9.8   9.2

     FIGURE 1. Scatter Plot of X vs. Y for the PCA Example.

     Figure 1 shows both variables. Their mean vector and covariance matrix are:

     Mean(f(x,y)^T) = [10, 10]

     Covariance(f(x,y)^T) = [0.79857142857143  0.67928571428571
                             0.67928571428571  0.7342857142857]

     The corresponding eigenvalues and eigenvectors of that covariance matrix are:

     eigenvalues = [1.44647433819575  0.08638280466139]

     eigenvectors = [-0.72362480830445   0.69019355024975
                     -0.69019355024975  -0.72362480830445]
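As a quick check, the mean vector, covariance matrix and eigen-decomposition quoted above can be reproduced from the 15 data pairs. The sketch below is only an illustration in plain Python, not the Matlab workflow used in the paper. The helper names (`mean`, `sample_cov`, `eig_sym_2x2`) are hypothetical, and it assumes that the first fifteen listed values are X, the last fifteen are Y, and that the sample covariance uses n - 1 in the denominator.

```python
import math

# The 15 paired observations of the example (assumed split: first 15 = X, last 15 = Y).
X = [10.0, 10.4, 9.7, 9.7, 11.7, 11.0, 8.7, 9.5, 10.1, 9.6, 10.5, 9.2, 11.3, 10.1, 8.5]
Y = [10.7, 9.8, 10.0, 10.1, 11.5, 10.8, 8.8, 9.3, 9.4, 9.6, 10.4, 9.0, 11.6, 9.8, 9.2]


def mean(values):
    return sum(values) / len(values)


def sample_cov(a, b):
    """Sample covariance with n - 1 in the denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def eig_sym_2x2(sxx, sxy, syy):
    """Eigenvalues (largest first) and unit eigenvectors of [[sxx, sxy], [sxy, syy]].

    Closed form for a symmetric 2x2 matrix; assumes sxy != 0.
    """
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    l1, l2 = half_trace + radius, half_trace - radius
    norm = math.hypot(sxy, l1 - sxx)
    v1 = (sxy / norm, (l1 - sxx) / norm)   # direction of greatest variance
    v2 = (-v1[1], v1[0])                   # orthogonal to v1
    return (l1, l2), (v1, v2)


sxx, sxy, syy = sample_cov(X, X), sample_cov(X, Y), sample_cov(Y, Y)
(l1, l2), (v1, v2) = eig_sym_2x2(sxx, sxy, syy)

print("mean:", mean(X), mean(Y))
print("covariance:", [[sxx, sxy], [sxy, syy]])
print("eigenvalues:", l1, l2)
print("eigenvectors:", v1, v2)
```

The eigenvectors come out in the positive-sign form; multiplying an eigenvector by -1 gives an equally valid eigenvector, which is why the signs may differ from those printed by other software.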
  • 5. The eigenvector values are just the cosines of the angles of a new "principal" rotation of the original variables about their means, measured from the original set of axes (X, Y). That is, the original axes are rotated "upward" and to the "left" so that the new set of axes is aligned with the data set and also points in the direction of the highest variation. In this case, we move[3]:

     arccosine(eigenvectors) = [43.645432°  133.645432°
                                46.354568°   43.645432°]

     That is, the new abscissa, E1, moved "up" and to the left 43.65° measured from the old abscissa, X (or moved "down" and to the right 46.35° measured from the old ordinate, Y). As the new ordinate, E2, must be orthogonal to E1, we have completed the process[4] for this simple case.

     Now, the first pair of values from the old set (X, Y) was (10, 10.7); subtracting the mean from each variable gives (0.0, 0.7), which is the same as moving the origin from the old (0, 0) to (10, 10). Now, performing the calculation:

     E1 = X*Eigenvector(1,1) + Y*Eigenvector(1,2) = 0.0*0.7236 + 0.7*0.6902 = 0.4831
     E2 = X*Eigenvector(2,1) + Y*Eigenvector(2,2) = 0.0*(-0.6902) + 0.7*0.7236 = 0.5065

     [3] It should be noted that the eigenvector matrix could also be represented as [0.72362480830445 -0.69019355024975; 0.69019355024975 0.72362480830445]; that is, the negative sign just shows that the line crosses over to the other quadrant.
     [4] Just by adding 90° to the angles; that is, the new ordinate is at 43.65° + 90° = 133.65° and 46.35° + 90° = 136.35°, which are nothing more than the angles from the second eigenvector.
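The rotation angles and the mapping of the first observation can be checked with a short script. This is a minimal sketch, assuming the eigenvectors are used row by row in their positive-sign form, exactly as in the hand calculation above. The names `E`, `MEAN`, `angles_deg`, `to_scores` and `from_scores` are hypothetical. Because the rows of `E` are orthonormal, the inverse mapping only needs the transpose.

```python
import math

# Unit eigenvectors from the text, one per row, in the positive-sign form.
E = [[0.72362480830445, 0.69019355024975],
     [-0.69019355024975, 0.72362480830445]]
MEAN = (10.0, 10.0)


def angles_deg(matrix):
    """Angle (in degrees) whose cosine is each eigenvector entry."""
    return [[math.degrees(math.acos(c)) for c in row] for row in matrix]


def to_scores(point):
    """Centre the point on the mean, then project it on each eigenvector."""
    dx, dy = point[0] - MEAN[0], point[1] - MEAN[1]
    return tuple(row[0] * dx + row[1] * dy for row in E)


def from_scores(scores):
    """Inverse mapping: multiply by the transpose of E and add the mean back."""
    x = E[0][0] * scores[0] + E[1][0] * scores[1] + MEAN[0]
    y = E[0][1] * scores[0] + E[1][1] * scores[1] + MEAN[1]
    return (x, y)


angles = angles_deg(E)          # rows here correspond to columns of the printed angle matrix
scores = to_scores((10.0, 10.7))
restored = from_scores(scores)

print("angles:", angles)
print("scores:", scores)
print("restored point:", restored)
```

The round trip through `from_scores` is only a sanity check that no information is lost when all components are kept.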
  • 6. Therefore, we have mapped (10.0, 10.7) into (0.48, 0.51) in the new set of coordinates. The process is just the inverse[5] if you need to get from the new set of variables (E1, E2) back to (X, Y), expressed as deviations from the mean:

     X = E1*Eigenvector(1,1) + E2*Eigenvector(2,1) = 0.4831*0.7236 + 0.5065*(-0.6902) = 0.0
     Y = E1*Eigenvector(1,2) + E2*Eigenvector(2,2) = 0.4831*0.6902 + 0.5065*0.7236 = 0.7

     Note that we have just transformed the original variables[6], and that the sample variance of each transformed variable is none other than its eigenvalue. We can observe from the fact that the axes have moved almost 45° (and also from the comparison of the eigenvector components, which are close to one another in magnitude: 0.72362480830445 ≈ 0.69019355024975) that the pattern in the observed data might be modeled by the linear equation Y = X + Intercept. This is further supported by fitting a simple linear regression in an Excel spreadsheet, which gives us the following summary output:

     [5] Usually this would require calculating the inverse matrix; however, the inverse of the eigenvector matrix is its transpose. This should dismiss any claim that the original variables cannot be recovered, although it would be a cumbersome process for a multivariable space.
     [6] In the statistics world, this is the reason the values of the transformed variables are also known as "scores", short for z-scores.
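The regression can also be reproduced without a spreadsheet. The following is a minimal sketch in plain Python, not the Excel analysis itself. The function name `ols` is hypothetical, the data split into X and Y is assumed as in the example, and the hard-coded `T_CRIT` assumes two-sided 95% intervals with n - 2 = 13 degrees of freedom.

```python
import math

X = [10.0, 10.4, 9.7, 9.7, 11.7, 11.0, 8.7, 9.5, 10.1, 9.6, 10.5, 9.2, 11.3, 10.1, 8.5]
Y = [10.7, 9.8, 10.0, 10.1, 11.5, 10.8, 8.8, 9.3, 9.4, 9.6, 10.4, 9.0, 11.6, 9.8, 9.2]

# Assumption: 95% two-sided critical value of Student's t with 13 degrees of freedom.
T_CRIT = 2.160


def ols(x, y):
    """Simple least-squares fit y = intercept + slope * x with standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))                      # residual standard error
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1.0 / n + mx ** 2 / sxx)
    return slope, intercept, se_slope, se_intercept


slope, intercept, se_slope, se_intercept = ols(X, Y)
slope_ci = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)
intercept_ci = (intercept - T_CRIT * se_intercept, intercept + T_CRIT * se_intercept)

print("slope:", slope, "95% CI:", slope_ci)
print("intercept:", intercept, "95% CI:", intercept_ci)
```

Printing both intervals makes it easy to see whether a slope of 1 and an intercept of 0 are compatible with the data.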
  • 7. FIGURE 2. Summary Output of X vs. Y for the PCA Example

     From Figure 2 it can be seen that a very simple model for these data could be conjectured to be Y = 0.85X + 1.49; or, when taking the p-values (or confidence intervals) into consideration, we can also use Y = X (a 45° line[7]). This is what we referred to when we talked about the data reduction feature of PCA and its pattern discovery feature.

     II. Playing with the "Rubik's cube":

     By simply transforming the original data with Matlab we obtained the most important eigenvalues and their cumulative weight:

     [7] Using the convenient fact that the confidence interval for the intercept, (-1.17, 4.16), contains 0 and that the interval for the slope contains 1; setting the intercept to 0 and the slope to 1 (values inside those intervals) is more revealing about the possible true relationship between the variables.
  • 8. TABLE 1. Eigenvalues relative weight in this case

     From Table 1 we can affirm that with only five (5) eigenvalues about 99.54% of the variability of the data could be well modeled. Thus, a significant data compression has been achieved. The PCA transformation matrix is now a (15,5) matrix, where the convention is (variable, eigenvalue) and the order of the variables is: 1) hi_price; 2) lo_price; 3) NAV; 4) Monthly_Return; 5) AVE_Discount; 6) DPS; 7) Rm_Rf; 8) SMB; 9) HML; 10) MOM; 11) Rf; 12) Yield_OneYear; 13) Yield_TenYear; 14) AAA; 15) BAA. It looks as follows:
  • 9. TABLE 2. Details of the five most principal Eigenvalues and the fifteen variables

     We know this eigenvector matrix is also the matrix of direction cosines. That is, an original data point must be moved as directed from its original 15-dimensional world to a new mapping with only 5 dimensions. We can observe from Table 2 that some of the angles have cosines so small that they will only cause the data point to move at almost right angles. We set such angles to 90° (cosine 90° = 0), which is the same as setting their cosines to zero. This is shown in Table 3 below.

     TABLE 3. Details of the five most principal Eigenvalues and the fifteen variables where non-significant values were set to zero
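The two data-reduction steps described above, keeping the leading eigenvalues by cumulative weight and zeroing the near-zero direction cosines, can be sketched as follows. This is an illustration only: the function names, the 3 x 2 loading matrix and the cut-off `TOL` are hypothetical (the paper does not state the cut-off it used, and the entries of Table 2 are not reproduced here). The eigenvalues are those of the two-variable example.

```python
import math


def cumulative_weight(eigenvalues):
    """Cumulative share of total variance, with eigenvalues taken largest first."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        shares.append(running / total)
    return shares


def zero_small_loadings(loadings, tol):
    """Set direction cosines with |value| < tol to 0.0, i.e. an angle of 90 degrees."""
    return [[0.0 if abs(c) < tol else c for c in row] for row in loadings]


# Eigenvalues of the two-variable example in the text.
weights = cumulative_weight([1.44647433819575, 0.08638280466139])

# Hypothetical loading matrix (variables in rows, components in columns).
example = [[0.71, 0.004],
           [0.70, -0.002],
           [0.009, 0.99]]
TOL = 0.01  # hypothetical cut-off, chosen only for this illustration
cleaned = zero_small_loadings(example, TOL)
angles = [[math.degrees(math.acos(c)) for c in row] for row in cleaned]

print("cumulative weights:", weights)
print("cleaned loadings:", cleaned)
print("angles in degrees:", angles)
```

Zeroing a loading changes the matrix slightly, so in practice the choice of cut-off is a trade-off between readability and fidelity to the original transformation.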
  • 10. From Table 3, it is quite obvious that the most important variables[8] for the first eigenvalue, which happens to be the one with which the most variability is associated, are: 1 (high price), 2 (low price), 3 (NAV), 5 (average discount) and 10 (MOM). The most important variables for the second eigenvalue are: 3 (NAV), 8 (SMB), 9 (HML) and 10 (MOM); for the third eigenvalue: 3 (NAV), 9 (HML) and 10 (MOM); for the fourth eigenvalue: 5 (average discount), 8 (SMB), 9 (HML) and 10 (MOM); and for the fifth and last eigenvalue: 1 (high price), 2 (low price), 3 (NAV), 5 (average discount) and 9 (HML). The variable list of the first eigenvalue is illustrative, as we posit that average discount = average price - NAV. It stands out that the MOM variable has a substantial effect in all five eigenvalues; in particular, MOM is the only other variable that is not part of the linear equation that defines the puzzling discount in the first eigenvalue, which is the principal direction of the system variability; therefore, it should be an important variable affecting the discount.

     As a validation exercise, fifty-six (56) funds with 11,327 data points were used to validate the models. This represents an 18.98% sample of the funds (56 out of 295 funds; a mix of new and previously used funds) and 21.59% of the data points (11,327/52,462), close to the general 20% normally used as a guideline for model validation purposes. The interpolation[9] dates range from

     [8] Important in the sense that moving the system according to the angle associated with that variable moves the system quicker to the new position.
     [9] Interpolation: A method of constructing new data points within the range of a discrete set of known data points. It can be used as a validation of the model, as the model will predict values inside its range that should be in close agreement with the actual values observed. Great departures from such values indicate a poor model.
  • 11. August 1987 to December 2010, and the extrapolation[10] dates from January 2011 to October 2011. The Euclidean distance between the 11,327 discount data values of the validating sample and their estimates, as obtained with the PCA transformation matrix, was a total of 18.26358994273340 separation units, equivalent to an MSE ≈ 0.02944810784818.

     III. I think I can solve the puzzle now:

     The puzzling behavior of the CEF discount seems to be mostly caused by the MOM. The linear equation average discount = average price - NAV becomes, for the loadings of the first eigenvalue (high price, low price, NAV and average discount):

     abs{[0.59227071437245 + 0.53641494107392]/2 - 0.59955661177596} ≈ 0.03521378405278

     or, with each loading expressed as an angle in degrees (loading × 180°/π):

     abs{[33.93461226273970° + 30.73431219129440°]/2 - 34.35206343392610°} = abs{-2.01760120690908°} ≈ 2.01760120690938°

     [10] Extrapolation: The process of constructing new data points outside the range of a discrete set of known data points. It is similar to the process of interpolation, but the results of extrapolations are subject to greater uncertainty. It can be used as a validation of the model, as the model will predict values outside its range that should be in close agreement with the actual values observed. Close agreement with such values indicates a good model.
  • 12. Any difference in the observed discount that is not covered by this equation must come from the MOM angle (loading -0.02408744663071, that is, -1.38010903118630°) and from the statistical error.
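Two of the numerical claims in this section can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: that the MSE is the squared Euclidean distance divided by the number of validation points, and that the degree values quoted for the first eigenvalue are the loadings multiplied by 180°/π. Both assumptions reproduce the quoted figures. The variable names are hypothetical, and the assignment of loadings to variables follows the equation average discount = average price - NAV.

```python
import math

# Validation figures quoted in the text.
N_POINTS = 11327
DISTANCE = 18.26358994273340

# Assumption: MSE = (Euclidean distance)^2 / number of points.
mse = DISTANCE ** 2 / N_POINTS

# First-eigenvalue loadings quoted in the text.
HI_PRICE = 0.59227071437245
LO_PRICE = 0.53641494107392
NAV = 0.59955661177596
DISCOUNT = 0.03521378405278
MOM = -0.02408744663071

# Average price minus NAV, in loading units and in degrees.
residual = (HI_PRICE + LO_PRICE) / 2.0 - NAV
residual_deg = math.degrees(residual)

print("MSE:", mse)
print("average price - NAV (loading):", residual, "vs discount loading:", DISCOUNT)
print("average price - NAV (degrees):", residual_deg)
print("MOM (degrees):", math.degrees(MOM))
```

The combination of the price and NAV loadings matches the discount loading in absolute value, which is the relationship the section relies on; the MOM term is what remains outside that equation.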