We propose a strategy for Portfolio Diversification in Financial Economics. The task can be cast as a clustering process carried out with Non-Negative Matrix Factorization techniques.
4. Introduction
In Market Trading, the trader needs to predict future stock prices
to determine a self-financing trading strategy that maximizes
the portfolio return.
Problem: creating and managing successful portfolios of
financial assets is a difficult practice.
Solution: Portfolio Diversification, which attempts to minimize the
risk for a given level of return.
This problem can be seen as a clustering process:
group data (e.g., stocks) into subgroups of similar behavior
(e.g., the same market trend).
5. Motivation
With K-Means it is not possible to establish the effectiveness
and coherence of the clusters when dealing with stock data [1]:
it tends to find spherical clusters, and centroid-based clustering
does not handle noise;
a weighted Euclidean distance must be introduced in place of the
standard Euclidean distance to re-evaluate centroid-based
clusters.
Proposal: Non-negative Matrix Factorization (NMF) to cluster
underlying stock trends.
[1] F. Cai, N. Le-Khac, and M. Kechadi. Clustering Approaches for Financial
Data Analysis: A Survey. Proceedings of DMIN 2012, pp. 1-7, 2012.
7. Problem formulation
Market made up of m stocks S1, S2, . . . , Sm, each stored as a row
vector whose entries are n daily closing prices.
Suppose there are k latent bases, W1, W2, . . . , Wk; each Wj is an
n-dimensional row vector, thought of as a Brownian motion.
Express each stock as a linear combination of these bases, with a
nonnegative real number Hij indicating the degree of association
of the i-th stock with the basis Wj.
Using matrix notation,
$$S_+ \approx H_+ W_\pm,$$
where $S \in \mathbb{R}_+^{m \times n}$, $H \in \mathbb{R}_+^{m \times k}$ and $W \in \mathbb{R}_\pm^{k \times n}$.
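The shapes and sign conventions of this formulation can be sketched on toy data. The snippet below is only an illustration under stated assumptions: the function name `make_toy_market`, the sizes, and the random seed are hypothetical, and the bases are discretized Brownian motions (cumulative sums of Gaussian increments). Unlike real closing prices, the toy matrix S built this way is not guaranteed to be nonnegative.

```python
import numpy as np

def make_toy_market(m=6, n=250, k=3, seed=0):
    """Toy data with the shapes of the formulation: S (m x n) = H (m x k) @ W (k x n)."""
    rng = np.random.default_rng(seed)
    # Each basis W_j is a discretized Brownian motion:
    # the cumulative sum of independent Gaussian increments (mixed signs allowed).
    W = np.cumsum(rng.normal(size=(k, n)), axis=1)
    # Nonnegative association degrees H_ij between stock i and basis W_j.
    H = rng.uniform(0.0, 1.0, size=(m, k))
    # Each stock (row of S) is a nonnegative linear combination of the bases.
    S = H @ W
    return S, H, W

S, H, W = make_toy_market()
print(S.shape, H.shape, W.shape)
```

Row i of S is the i-th stock, and row i of H lists how strongly that stock is associated with each of the k bases.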
8. NMFs
Standard definition:
$$S \approx H W,$$
where $S \in \mathbb{R}_+^{m \times n}$, $H \in \mathbb{R}_+^{m \times k}$ and $W \in \mathbb{R}_+^{k \times n}$, and $k \le m$.
Role of k: it forces a compact representation of the data that
captures its underlying regularities.
Matrices H and W are found by solving the optimization
problem
$$\min_{H \ge 0,\; W \ge 0} \|S - HW\|_F^2,$$
where $\|\cdot\|_F$ is the Frobenius norm.
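One common way to attack this problem is the multiplicative update scheme of Lee and Seung. The slides do not state which solver was used, so the following is a minimal sketch under that assumption; the function name `nmf_multiplicative`, the iteration count, and the small constant `eps` are illustrative choices.

```python
import numpy as np

def nmf_multiplicative(S, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative S (m x n) as H (m x k) @ W (k x n), with H, W >= 0."""
    rng = np.random.default_rng(seed)
    m, n = S.shape
    # Strictly positive initialization keeps every entry updatable.
    H = rng.uniform(0.1, 1.0, size=(m, k))
    W = rng.uniform(0.1, 1.0, size=(k, n))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates preserve nonnegativity because every
        # factor in the ratios is nonnegative.
        W *= (H.T @ S) / (H.T @ H @ W + eps)
        H *= (S @ W.T) / (H @ W @ W.T + eps)
        # Track the squared Frobenius reconstruction error.
        errors.append(np.linalg.norm(S - H @ W, "fro") ** 2)
    return H, W, errors

# Usage on a small synthetic nonnegative matrix of exact rank 3.
rng = np.random.default_rng(1)
S = rng.uniform(size=(8, 3)) @ rng.uniform(size=(3, 50))
H, W, errors = nmf_multiplicative(S, k=3)
print(errors[0], errors[-1])
```

The error history gives a simple way to inspect convergence for a chosen k.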
9. Convex NMF
Convex NMF (C-NMF) allows the data matrix S to have mixed
signs. It solves
$$\min_{\|H_i\|_1 = 1,\; H \ge 0} \|S - S H W\|_F^2.$$
Advantage of the convexity constraint imposed on H:
the rows of H can be interpreted as weighted sums of certain
data points, and therefore as centroids.
10. Convex-Hull NMF
Convex-Hull NMF (CH-NMF) is a fast technique and scales
extremely well.
The task now is to solve the following optimization problem
$$\min \|S - S H W\|_F^2,$$
subject to the convexity constraints
$$\|H_i\|_1 = 1,\; H \ge 0,$$
$$\|W_j\|_1 = 1,\; W \ge 0.$$
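The effect of such convexity constraints can be seen on a toy example. The sketch below is not the CH-NMF algorithm: it only normalizes nonnegative weights so that each row sums to 1, and shows that the resulting points are convex combinations of a set of vertices. The helper `normalize_rows_l1` and the small matrices are hypothetical.

```python
import numpy as np

def normalize_rows_l1(H):
    """Scale each nonnegative row of H so that it sums to 1 (||H_i||_1 = 1)."""
    H = np.asarray(H, dtype=float)
    if np.any(H < 0):
        raise ValueError("H must be nonnegative")
    sums = H.sum(axis=1, keepdims=True)
    if np.any(sums == 0):
        raise ValueError("every row of H needs at least one positive entry")
    return H / sums

# Two vertices (rows of W) in a 3-dimensional space.
W = np.array([[0.0, 2.0, 4.0],
              [1.0, 1.0, 1.0]])
# Nonnegative weights, rescaled to satisfy the convexity constraint.
H = normalize_rows_l1([[1.0, 3.0],
                       [2.0, 2.0]])
# Each row of `points` is a convex combination of the rows of W,
# so it lies inside their convex hull.
points = H @ W
print(points)
```

Because the weights are nonnegative and sum to 1, every coordinate of a combined point stays between the smallest and largest value that the vertices take in that coordinate.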
This optimization problem is equivalent to projecting the
solution onto the convex hull of S.
Advantage: new opportunities for data interpretation.
12. Experiments
Data gathered:
NASDAQ Stock Market
28 stocks belonging to 8 different sectors
10 years of closing prices (2518 working days)
Clustering methods applied:
NMF
C-NMF
CH-NMF
K-Means
13. Experiments
Tried different numbers of clusters:
all methods were run for each k ∈ {3, 4, . . . , 8}.
For each k, clustering evaluated in terms of:
1. plots of reconstruction of matrix S by matrix multiplication H W
2. plots of trend matrix W
3. analysis of colormaps for matrix H
4. analysis of convergence iterations, Frobenius error and number
of attracted clusters for each method
5. qualitative grouping of recurrent subgroups of stocks for each
method
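Some of the quantities listed above can be computed directly from a factorization. The sketch below makes two assumptions that the slides do not state: a stock is assigned to the basis with the largest entry in its row of H, and the "number of attracted clusters" is read as the number of clusters that receive at least one stock. The function name `evaluate_factorization` and the toy matrices are illustrative.

```python
import numpy as np

def evaluate_factorization(S, H, W):
    """Return the Frobenius error, hard cluster labels, and non-empty cluster count."""
    S, H, W = (np.asarray(a, dtype=float) for a in (S, H, W))
    # Reconstruction error ||S - HW||_F.
    frob_error = np.linalg.norm(S - H @ W, "fro")
    # Assumption: each stock joins the cluster with the largest association degree.
    labels = H.argmax(axis=1)
    # Assumption: a cluster is "attracted" if at least one stock is assigned to it.
    n_nonempty = np.unique(labels).size
    return frob_error, labels, n_nonempty

# Toy example: 3 stocks, k = 3 bases, 2 time steps.
H = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.0],
              [0.1, 0.9, 0.0]])
W = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [5.0, 5.0]])
S = H @ W
frob_error, labels, n_nonempty = evaluate_factorization(S, H, W)
print(frob_error, labels, n_nonempty)
```

In this toy case the third basis attracts no stock, so only two of the three clusters are non-empty.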
29. Conclusions
Portfolio diversification is the financial process of allocating
capital in a way that reduces the exposure to risk by investing in
a variety of assets (i.e., stocks).
This is equivalent to clustering stocks that follow similar trends.
K-Means is not effective on this task. Hence, we applied NMF.
Adding convexity constraints in the transformation improves
the exploitation of similar stock trends.
In particular
CH-NMF is a very fast and scalable convex NMF technique
that compares favorably for large data sets, both in terms of
speed and reconstruction quality
30. Conclusions
Extensive experimental evaluation on real-world NASDAQ
stock data shows that, compared to K-Means, NMF techniques:
better point out the clustering properties,
yield very low error in Frobenius norm,
are highly efficient in terms of convergence time.
Future work:
use more datasets from different markets
investigate further decomposition techniques to improve
the effectiveness of clustering stock data
impose other penalty constraints in order to achieve a
better portfolio diversification strategy