Application of Graphic LASSO in Portfolio Optimization
Yixuan Chen 158000258
Mengxi Jiang 160005688
Abstract
We use the graphical lasso to estimate the precision matrix of stocks in the US stock
market and then apply portfolio optimization based on that estimate. The graphical
lasso is compared with two other estimation methods, the sample covariance and a
shrinkage estimator, and we find that the graphical lasso performs best on the test data.
I. Introduction
The Least Absolute Shrinkage and Selection Operator (LASSO) is a regression method
that adds a constraint to ordinary least squares: the sum of the absolute values of the
coefficients must be less than a tuning parameter t. When t is large, the constraint may
have no influence on the regression. By penalizing the absolute size of the regression
coefficients, however, some of the parameter estimates are pushed to zero when t is
small. The smaller t is, the stronger the constraint.
The graphical lasso is an extension of the lasso. It is not used for regression; instead,
it is used to estimate the precision matrix. Sharing the same penalization idea as the
lasso, the graphical lasso places a constraint on the likelihood minimization. When the
tuning parameter t is large, the constraint may have no influence; when t is small, some
elements c_ij of the precision matrix C = Σ⁻¹ may be pushed to zero, so the precision
matrix becomes sparse. The graphical lasso is therefore usually used to estimate a
sparse inverse covariance matrix. In addition, with a coordinate descent procedure, the
graphical lasso can be computed significantly faster.
In the past few years, many scholars have studied the graphical lasso.
Meinshausen and Buhlmann (2006) suggested a simple method for covariance
selection that can be used for very large Gaussian graphs; with their method, the
non-zero elements of the precision matrix can be estimated. Other articles proposed
attractive algorithms for computing the Glasso estimate. Yuan and Lin (2007),
Banerjee and others (2007), and Dahl and others (2007) dealt with maximization of
the penalized log-likelihood. Specifically, Yuan and Lin (2007) used an l1 penalty on
the c_ij, much as the lasso does in linear regression; the l1 penalty pushes the
concentration matrix to be sparse, and the sparse matrix is then estimated. Building on
this earlier work, Friedman, Hastie and Tibshirani (2008) proposed another method,
the blockwise coordinate descent approach, which makes the calculation faster and at
the same time closes the "conceptual gap". In addition, Witten, Friedman and Simon
(2011) presented a condition, often neglected before, for identifying related elements
in the graphical lasso.
In our project, we use the algorithm Yuan and Lin proposed in 2007. The reason
we use it over the others is that it is simpler, faster and more efficient. Their algorithm
combines the positive definite constraint with the likelihood. At present, most research
on the lasso is not in the field of finance. We therefore try to apply the graphical lasso
to find the concentration matrix estimator Σ̂⁻¹ and use it in a portfolio optimization
setting. After evaluating the performance of the graphical lasso in the portfolio, we try
to find a trading strategy and make money.
We want to estimate the concentration matrix C = Σ⁻¹ and then use C to optimize
the portfolio. The interesting part is that we try to identify some entries of C as exactly
zero.
II. Technical Description
We assume X = (X_1, X_2, ..., X_n) is an n-dimensional random vector that follows a
multivariate normal distribution N(μ, Σ), where μ is the mean and Σ is the covariance
matrix. Given a random sample x = (x_1, x_2, ..., x_n), we want to estimate the
concentration matrix C = Σ⁻¹, whose elements are c_ij. If some c_ij = 0, then X_i and
X_j are conditionally independent given the other variables.
2.1 Why we estimate the precision matrix
The graphical lasso estimates the precision matrix instead of the covariance matrix.
When we estimate the true covariance matrix of a stock market, the dimension is very
large, so estimating the covariance matrix directly is neither easy nor efficient.
Moreover, we cannot push the elements of the covariance matrix to zero: the
covariance matrix of a real stock market cannot be sparse, because no two stocks in
the real world are independent.
If we estimate the precision matrix instead of the covariance matrix, we can
overcome these problems. Elements of the precision matrix can be pushed to zero;
when c_ij = 0, stock i and stock j are conditionally independent, which is a sensible
notion. Besides, when the data are high-dimensional, pushing some elements to zero
decreases the number of elements we need to estimate, which makes the estimation
much simpler and faster. Therefore, estimating the precision matrix instead of the
covariance matrix is a good choice.
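As a small illustration of this point, the following R sketch (with made-up numbers, not part of the paper's analysis) builds a sparse 3x3 precision matrix and shows that its inverse, the covariance matrix, is dense:

# Toy example: a sparse precision matrix whose inverse (the covariance) is dense.
C <- matrix(c( 2.0, -0.8,  0.0,
              -0.8,  2.0, -0.8,
               0.0, -0.8,  2.0), nrow = 3, byrow = TRUE)
Sigma <- solve(C)   # covariance matrix: every entry is non-zero
round(Sigma, 3)
# c_13 = 0 means variables 1 and 3 are conditionally independent given variable 2,
# even though their covariance Sigma[1, 3] is not zero.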
2.2 Computation of Graphical Lasso
The maximum likelihood estimator of (μ, Σ) is (X̄, Ā), where

\bar{A} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})'
Because the matrix should be sparse, in order to get a better estimate we add a
constraint to the minimization:

\min_{C} \; -\log|C| + \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)\, C\, (X_i - \mu)' \quad \text{subject to} \quad \sum_{i \neq j} |c_{ij}| \le t
where t is a tuning parameter.
If the sample observations are centered, μ = 0, and the minimization above becomes

\min_{C} \; -\log|C| + \frac{1}{n}\sum_{i=1}^{n} X_i\, C\, X_i' \quad \text{subject to} \quad \sum_{i \neq j} |c_{ij}| \le t
where \frac{1}{n}\sum_{i=1}^{n} X_i C X_i' can be expressed as \mathrm{tr}(C\bar{A}). To solve the minimization, we write it in Lagrangian form:

-\log|C| + \mathrm{tr}(C\bar{A}) + \lambda \sum_{i \neq j} |c_{ij}|

Here, λ is the tuning parameter.
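As a minimal sketch of how this penalized problem can be solved in practice, assuming the R package "glasso" (the package used later in this paper), where S plays the role of Ā and rho the role of λ:

library(glasso)

set.seed(1)
X <- matrix(rnorm(200 * 5), nrow = 200)   # placeholder data: 200 observations, 5 variables
S <- cov(X)                               # sample covariance A-bar
fit <- glasso(S, rho = 0.1)               # rho is the penalty (tuning) parameter
C_hat     <- fit$wi                       # estimated sparse precision matrix C
Sigma_hat <- fit$w                        # corresponding covariance estimate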
2.3 Computation of the Tuning Parameter
Choosing the tuning parameter is an important step in the calculation of the graphical
lasso, since it determines the degree of sparsity of the concentration matrix. We
therefore need to choose an optimal tuning parameter λ*. There are two criteria for
checking whether a given λ is good:

\mathrm{BIC}(t) = -\log|\hat{C}(t)| + \mathrm{tr}(\hat{C}(t)\bar{A}) + \frac{\log n}{n} \sum_{i \le j} \hat{e}_{ij}(t)

where \hat{e}_{ij} = 0 if \hat{c}_{ij} = 0, and \hat{e}_{ij} = 1 otherwise.

\mathrm{KL} = -\log|\hat{C}| + \mathrm{tr}(\hat{C}\Sigma) - \left( -\log|\Sigma^{-1}| + p \right)

where p is the dimension of the matrix.
KL is the Kullback–Leibler loss. It measures the distance between the true precision
matrix Σ⁻¹ and the estimator Ĉ; the smaller the distance, the better the tuning
parameter λ. BIC(t) is an approximation to this distance between the true precision
matrix Σ⁻¹ and the estimator Ĉ.
We try λ from 0.01 to 0.99, compute the KL loss or BIC for every λ, and then take
the λ with the minimum BIC or KL.
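A sketch of this grid search is given below, assuming the "glasso" package; the KL loss requires the true Σ and is therefore only usable in simulations, while BIC needs only the sample information:

library(glasso)

# KL = -log|C_hat| + tr(C_hat Sigma) - (-log|Sigma^{-1}| + p)
kl_loss <- function(C_hat, Sigma) {
  p <- nrow(Sigma)
  as.numeric(-determinant(C_hat, logarithm = TRUE)$modulus +
             sum(diag(C_hat %*% Sigma)) -
             determinant(Sigma, logarithm = TRUE)$modulus - p)
}

# BIC(t) = -log|C_hat| + tr(C_hat A_bar) + (log n / n) * #{non-zero entries with i <= j}
bic_score <- function(C_hat, A_bar, n) {
  as.numeric(-determinant(C_hat, logarithm = TRUE)$modulus +
             sum(diag(C_hat %*% A_bar)) +
             log(n) / n * sum(C_hat[upper.tri(C_hat, diag = TRUE)] != 0))
}

select_lambda <- function(A_bar, n, Sigma = NULL, grid = seq(0.01, 0.99, by = 0.01)) {
  scores <- sapply(grid, function(lam) {
    C_hat <- glasso(A_bar, rho = lam)$wi
    if (is.null(Sigma)) bic_score(C_hat, A_bar, n) else kl_loss(C_hat, Sigma)
  })
  grid[which.min(scores)]    # lambda with the smallest BIC or KL
}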
III. Simulation
In our simulation we tried the eight models from the paper of Yuan and Lin. Among
these models there are data with both sparse and dense precision matrices.
Model 1. Heterogeneous model with Σ = diag(1, 2, ..., n).
Model 2. An AR(1) model with c_ii = 1 and c_{i,i-1} = c_{i-1,i} = 0.5.
Model 3. An AR(2) model with c_ii = 1, c_{i,i-1} = c_{i-1,i} = 0.5 and c_{i,i-2} = c_{i-2,i} = 0.25.
Model 4. An AR(3) model with c_ii = 1, c_{i,i-1} = c_{i-1,i} = 0.4 and c_{i,i-2} = c_{i-2,i} = c_{i,i-3} = c_{i-3,i} = 0.2.
Model 5. An AR(4) model with c_ii = 1, c_{i,i-1} = c_{i-1,i} = 0.4, c_{i,i-2} = c_{i-2,i} = c_{i,i-3} = c_{i-3,i} = 0.2 and c_{i,i-4} = c_{i-4,i} = 0.1.
Model 6. Full model with c_ij = 2 if i = j and c_ij = 1 otherwise.
Model 7. Star model with every node connected to the first node, with c_ii = 1, c_{1,i} = c_{i,1} = 0.2 and c_ij = 0 otherwise.
Model 8. Circle model with c_ii = 1, c_{i,i-1} = c_{i-1,i} = 0.5 and c_{1n} = c_{n1} = 0.4.
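As an illustration of the data generation (a sketch, assuming the MASS package for multivariate normal sampling), Model 2, the AR(1) precision matrix, with the dimensions described below could be simulated as:

library(MASS)   # for mvrnorm

p <- 15         # dimension
n <- 1000       # sample size
C <- diag(1, p)                         # c_ii = 1
C[abs(row(C) - col(C)) == 1] <- 0.5     # c_{i,i-1} = c_{i-1,i} = 0.5
Sigma_true <- solve(C)                  # true covariance is the inverse of the precision
train <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma_true)
test  <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma_true)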
For each model, we generated sample data of size n = 1000 and dimension p = 15, and
then used Glasso to estimate the precision matrix. We use the inverse of the estimated
precision matrix, i.e. the estimated covariance matrix, in the portfolio optimization to
find the global minimum variance portfolio and the optimal portfolio considering
return. The "portfolio optimization" in the simulation treats the 15 series in each model
as 15 component stocks, each with 1000 daily returns. For the global minimum
variance optimization, we look for portfolio weights that minimize the variance of the
portfolio,

\arg\min_{\omega} \; \omega^{T} \Sigma \omega, \qquad \sum_{i} \omega_i = 1.
For the optimization considering return, we add a target return ρ to the problem,

\arg\min_{\omega} \; \omega^{T} \Sigma \omega, \qquad \omega^{T}\mu = \rho, \qquad \sum_{i} \omega_i = 1.

Using a Lagrange multiplier, we change the optimization objective to

\arg\min_{\omega} \; \omega^{T} \Sigma \omega - \lambda(\omega^{T}\mu - \rho), \qquad \sum_{i} \omega_i = 1.

Here μ is the vector of mean returns of the portfolio components. There are two
unknown parameters in the optimization objective, the covariance matrix and the mean
of the component stocks, Σ and μ. To estimate the ω that optimizes the portfolio, we
use the estimated covariance matrix and the sample mean in the optimization:

\arg\min_{\omega} \; \omega^{T} \hat{\Sigma} \omega - \lambda(\omega^{T}\hat{\mu} - \rho), \qquad \sum_{i} \omega_i = 1.
Therefore, we get ω̂ from this optimization. The solution of this optimization is

\omega = \frac{1}{2}\, \hat{\Sigma}^{-1} \hat{\mu}\, \lambda, \qquad \lambda = -2\rho\, \hat{\mu}^{-1} \hat{\Sigma} (\hat{\mu}^{T})^{-1}.
As we can see, λ is determined by the return target we set for the portfolio. Since we
do not actually have a specific return target, we set λ = 1 in the optimization objective,
without loss of generality, for convenience of computation and comparison. The
optimization of the portfolio considering return therefore finally becomes

\arg\min_{\omega} \; \omega^{T} \hat{\Sigma} \omega - \omega^{T} \hat{\mu}, \qquad \sum_{i} \omega_i = 1.
The objective is intuitive in that it balances the goals of a small variance and a large
return.
We use the optimized component weights and a different test dataset generated by the
same model to simulate prediction. We compare the results on both the train dataset
and the test dataset for the Glasso estimator and two other estimators of the covariance
matrix, the sample covariance and the shrinkage estimator. The Glasso and shrinkage
estimators were obtained using the "glasso" and "corpcor" packages in R. The glasso
method in R requires the tuning parameter lambda when it is used for estimation. As
mentioned above, there are two criteria for evaluating the estimator, BIC and KL loss.
In the simulation we use the KL loss to select the appropriate tuning parameter λ for
each model, because BIC depends on the information in the sample data (the sample
covariance) and is actually an approximate estimate of the KL loss. Since we use
generated data with known Σ in the simulation, the KL loss is applicable and is the
more accurate criterion. In other words, KL measures the true distance between the
estimator and the true value, while BIC is an approximation to that distance.
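A sketch of how the three covariance estimates could be computed with these packages (with λ chosen beforehand by KL loss or BIC as described above):

library(glasso)
library(corpcor)

estimate_covariances <- function(X, lambda) {
  S <- cov(X)                                # sample covariance
  Sigma_glasso <- glasso(S, rho = lambda)$w  # graphical lasso covariance estimate
  Sigma_shrink <- cov.shrink(X)              # shrinkage estimate from corpcor
  list(sample = S, glasso = Sigma_glasso, shrinkage = Sigma_shrink)
}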
Figures 1-4 compare the optimization results for Model 3.¹ The sample covariance has
the best performance on the train dataset for both optimizations, while the Glasso
estimator has the best performance on the test dataset. This is actually a desirable
result, since the goodness of the estimated covariance matrix should be judged by how
well the resulting portfolio weight structure performs when applied to future data.
What's more, it is reasonable that the sample covariance performs best on the train
data, because the sample covariance is exactly the covariance of the train data.
¹ Figures for the other models can be found in the appendix.
Figure 1
Figure 2
Figure 3
Figure 4
To assess the significance of Glasso's better performance on the test data, we use
t-tests to examine the portfolio optimization results of the Glasso estimator against
each of the other two estimators separately. The null hypothesis is that the optimized
result of Glasso equals that of the sample covariance or shrinkage estimator; the
alternative hypothesis is that the Glasso result is smaller. As shown in Table 1, Glasso's
advantage over the other two estimators is most significant in Models 3, 4 and 8.
Another important fact to notice in the table is that the sample covariance is not
significantly worse than Glasso only in Model 7, the full model in which the variables
are highly correlated with each other. This corresponds with the theory that Glasso
performs better on data with a sparse precision matrix.
                 Global minimum variance               Optimization considering return
                 Sample covariance   Shrinkage est.    Sample covariance   Shrinkage est.
Model 1          7.529993e-16        1                 2.190922e-08        1
Model 2          2.101272e-25        0.3208471         9.987082e-22        0.2344433
Model 3          1.001844e-22        5.520879e-17      6.237072e-09        0.02483197
Model 4          7.915417e-19        7.118669e-08      1.788595e-12        0.01158764
Model 5          8.280318e-21        0.6967956         1.226117e-06        0.9867212
Model 6          6.868502e-16        0.05461308        1.04845e-05         0.9993311
Model 7          0.9766911           3.561808e-06      0.4037951           8.45485e-05
Model 8          1.891239e-24        5.771165e-24      2.622082e-10        2.007245e-05
Table 1. p-values of the t-tests
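As a sketch of the test itself (with glasso_var and sample_var standing for hypothetical vectors of realized test-set results over repeated simulation runs; the pairing assumption is ours, not stated in the paper):

# H0: the Glasso result equals the competitor's; H1: the Glasso result is smaller.
t.test(glasso_var, sample_var, alternative = "less", paired = TRUE)$p.value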
IV. Real World Examples
We consider two simple real-world datasets: the component stock returns of the DJI
and of the NASDAQ 100. The first dataset contains 30 component stocks and the
second contains 85 component stocks.² We use the daily data from 2010 to 2012 as the
train data and the data from 2013 as the test data, so there are 754 samples in the train
data and 252 samples in the test data. We use the holding period return (HPR) because
it is assumed to approximately follow a normal distribution and is more stable than the
stock price; the HPR also contains the dividend, which reduces large jumps in the data.
To choose the tuning parameter λ for Glasso, we use BIC as the criterion on the real
data, because the true covariance matrix is unavailable.
However, simply applying Glasso to the sample covariance of the returns might not
work well, because the precision matrix of the returns is not necessarily sparse. We can
instead assume that the residuals of the returns under a factor model have a sparse
precision matrix. So we apply the Fama-French model to the stock returns and
estimate the covariance matrices separately. All component stocks are affected by the
Fama-French factors, so the covariance between them can be partly explained by the
FF model; the remaining part comes from the internal relationships between the stocks
themselves. In order to optimize the portfolio while the Fama-French factors keep
fluctuating, we need to separate these two sources of correlation.
² The actual number of components of the NASDAQ 100 is 103. We filtered out some
of the stocks because of the unavailability of data in the time period we chose to study.

X_i = \beta_i F + \varepsilon_i,
\mathrm{cov}(X) = \mathrm{cov}(X_1, X_2, \ldots, X_n),
\mathrm{cov}(X_i, X_j) = \mathrm{cov}(\beta_i F + \varepsilon_i, \beta_j F + \varepsilon_j) = \mathrm{cov}(\beta_i F, \beta_j F) + \mathrm{cov}(\varepsilon_i, \varepsilon_j)

X_i is the HPR of each component stock, F is the vector of the four Fama-French
factors and ε_i is the residual. We use the Fama-French model to separate the original
data into two parts, the factor-model part and the cleaned (residual) data. Then we use
the three methods to estimate the covariance matrix of the two parts separately and add
them up to get the estimated covariance matrix of the original data. This is
theoretically supported because the factors and the residuals are assumed to be
independent in the linear regression model.
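A rough sketch of this two-part estimation in R is given below; it assumes a return matrix X (n x p) and a factor matrix FF (n x k) are available and that lambda has already been chosen by BIC as above, takes the factor part of the covariance as β cov(F) βᵀ, and applies Glasso only to the residual part (the paper estimates both parts with each of the three methods):

fit  <- lm(X ~ FF)                             # one regression per stock (column of X)
beta <- t(coef(fit)[-1, , drop = FALSE])       # p x k matrix of factor loadings
eps  <- residuals(fit)                         # n x p matrix of residuals ("cleaned" data)

Sigma_factor   <- beta %*% cov(FF) %*% t(beta)              # cov(beta_i F, beta_j F)
Sigma_residual <- glasso::glasso(cov(eps), rho = lambda)$w  # Glasso estimate for the residuals
Sigma_hat      <- Sigma_factor + Sigma_residual             # estimated covariance of the returns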
                                              Sample           Glasso           Shrinkage
                                              Covariance       Estimate         Estimate
Global minimum variance (train)               3.89 x 10^-5     1.37 x 10^-5     4.2 x 10^-5
Global minimum variance (test)                5.20 x 10^-5     3.91 x 10^-5     5.08 x 10^-5
Minimum objective considering return (train)  -5.07 x 10^-3    -1.89 x 10^-3    -5 x 10^-3
Minimum variance (train)                      4.78 x 10^-3     6.46 x 10^-4     4.65 x 10^-3
Minimum objective considering return (test)   9.08 x 10^-3     8.73 x 10^-4     8.47 x 10^-3
Minimum variance (test)                       4.68 x 10^-3     3.72 x 10^-4     4.45 x 10^-3
Return                                        -4.39 x 10^-3    1.25 x 10^-3     -4.02 x 10^-3
Table 2. DJI, independent model
                                              Sample           Glasso           Shrinkage
                                              Covariance       Estimate         Estimate
Global minimum variance (train)               3.89 x 10^-5     2.11 x 10^-5     3.99 x 10^-5
Global minimum variance (test)                5.20 x 10^-5     3.69 x 10^-5     5.05 x 10^-5
Minimum objective considering return (train)  3.89 x 10^-5     2.11 x 10^-5     3.99 x 10^-5
Minimum variance (train)                      3.89 x 10^-5     7.18 x 10^-5     3.92 x 10^-5
Minimum objective considering return (test)   -4.96 x 10^-4    -9.56 x 10^-4    -5.02 x 10^-4
Minimum variance (test)                       5.20 x 10^-5     3.69 x 10^-5     5.06 x 10^-5
Return (test)                                 5.48 x 10^-4     9.94 x 10^-4     5.53 x 10^-4
Table 3. DJI, Fama-French model
                                              Sample           Glasso           Shrinkage
                                              Covariance       Estimate         Estimate
Global minimum variance (train)               8.20 x 10^-5     2.19 x 10^-5     9.2 x 10^-5
Global minimum variance (test)                7.33 x 10^-5     6.58 x 10^-5     7.34 x 10^-5
Minimum objective considering return (train)  -8.59 x 10^-3    -3.69 x 10^-3    -8.61 x 10^-3
Minimum variance (train)                      8.23 x 10^-3     1.74 x 10^-3     8.31 x 10^-3
Minimum objective considering return (test)   1.68 x 10^-2     2.46 x 10^-3     1.67 x 10^-2
Minimum variance (test)                       6.85 x 10^-3     1.24 x 10^-3     6.83 x 10^-3
Return                                        -9.98 x 10^-3    -1.22 x 10^-3    -9.90 x 10^-3
Table 4. NASDAQ, independent model
                                              Sample           Glasso           Shrinkage
                                              Covariance       Estimate         Estimate
Global minimum variance (train)               8.20 x 10^-5     24.99 x 10^-5    8.84 x 10^-5
Global minimum variance (test)                7.33 x 10^-5     5.98 x 10^-5     7.16 x 10^-5
Minimum objective considering return (train)  8.20 x 10^-5     4.99 x 10^-5     8.84 x 10^-5
Minimum variance (train)                      8.08 x 10^-3     1.72 x 10^-3     4.05 x 10^-3
Minimum objective considering return (test)   -1.25 x 10^-5    -1.18 x 10^-5    -1.17 x 10^-5
Minimum variance (test)                       7.33 x 10^-5     5.98 x 10^-5     7.16 x 10^-5
Return (test)                                 1.33 x 10^-3     1.25 x 10^-3     1.25 x 10^-3
Table 5. NASDAQ, Fama-French model
As shown in Tables 2-5, the Glasso estimator gives the best portfolio, with the smallest
variance and the largest return, on the test data for both the DJI and the NASDAQ
component stock returns. What's more, we get an even smaller variance with the
Fama-French model than with the independent model. For the DJI, the optimal
portfolio with the Fama-French model has a variance of 3.69 x 10^-5, roughly ten
times smaller than the 3.72 x 10^-4 obtained with the independent model. For the
NASDAQ, the improvement is more significant: the variance is reduced from
1.24 x 10^-3 to 5.98 x 10^-5 while the return also increases.
V. Conclusion
Through the simulation and the real-data application above, we find that the graphical
LASSO performs well when applied to portfolio optimization, and that using a factor
model such as the Fama-French model can improve the optimization result
significantly. Therefore, it is a useful and practical method for estimating the
covariance matrix of stock returns and carrying out the optimization.
VI. Problems and Future Plan
When applying Glasso to the real data with the Fama-French model in this paper, we
estimate X − βF directly. This may produce an estimate that is not positive definite.
What's more, simply adding the two estimated covariance matrices does not exploit
the main strength of Glasso, namely estimating a sparse precision matrix. Therefore,
we plan to redo the Fama-French step using the Woodbury formula. Following the
formula, the precision matrix of the returns should be

\Sigma^{-1} = (\Sigma_0 + \beta \Psi \beta^{T})^{-1} = \Sigma_0^{-1} - \Sigma_0^{-1}\beta(\Psi^{-1} + \beta^{T}\Sigma_0^{-1}\beta)^{-1}\beta^{T}\Sigma_0^{-1}

where Ψ⁻¹ is the precision matrix of the Fama-French factors and Σ₀⁻¹ is the precision
matrix of the residuals. In this way, we might expect a better estimate and can
guarantee that the estimated covariance matrix is positive definite.
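A minimal sketch of the planned computation (assuming the loading matrix β, the factor precision Ψ⁻¹ and the residual precision Σ₀⁻¹ have already been estimated):

woodbury_precision <- function(Sigma0_inv, Psi_inv, beta) {
  # (Sigma0 + beta Psi beta')^{-1} via the Woodbury formula; only a small k x k inverse is needed.
  middle <- solve(Psi_inv + t(beta) %*% Sigma0_inv %*% beta)
  Sigma0_inv - Sigma0_inv %*% beta %*% middle %*% t(beta) %*% Sigma0_inv
}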
References
Yuan, M., and Lin, Y. (2007), "Model Selection and Estimation in the Gaussian Graphical Model," Biometrika, 94, 19-35.
Banerjee, O., El Ghaoui, L., and d'Aspremont, A. (2008), "Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data," Journal of Machine Learning Research, 9, 485-516.
Friedman, J., Hastie, T., and Tibshirani, R. (2008), "Sparse Inverse Covariance Estimation With the Graphical Lasso," Biostatistics, 9, 432-441.
Meinshausen, N., and Bühlmann, P. (2006), "High-Dimensional Graphs and Variable Selection With the Lasso," The Annals of Statistics, 34, 1436-1462.
Tibshirani, R. (1996), "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Ser. B, 58, 267-288.
Witten, D. M., Friedman, J. H., and Simon, N. (2011), "New Insights and Faster Computations for the Graphical Lasso," Journal of Computational and Graphical Statistics, 892-900.
Woodbury, M. A. (1950), Inverting Modified Matrices, Memorandum Rept. 42, Statistical Research Group, Princeton University, Princeton, NJ.
Woodbury, M. A. (1949), The Stability of Out-Input Matrices, Chicago, IL.
Appendix