CLUSTERING UNDERLYING STOCK
TRENDS VIA NON-NEGATIVE MATRIX
FACTORIZATION
MIDAS WORKSHOP @ ECML-PKDD 2016
Andrea Pazienza, Sabrina Francesca Pellegrino,
Stefano Ferilli, Floriana Esposito
19/09/2016 - Riva del Garda, Italy
Overview
1. Introduction
2. Clustering with NMFs
3. Experiments on Financial Data
4. Conclusions
INTRODUCTION
Introduction
In Market Trading, the trader needs to predict future stock prices
to determine a self-financing trading strategy that maximizes
the portfolio return.
Problem: creating and managing successful portfolios of
financial assets is a difficult practice.
Solution: portfolio diversification, which attempts to minimize
the risk for a given level of return.
Diversification can be seen as a clustering process:
group data (e.g., stocks) into subgroups with similar behavior
(e.g., the same market trend).
Motivation
With K-Means it is not possible to establish the effectiveness
and coherence of the clusters when dealing with stock data [1]:
it tends to find spherical clusters, and centroid-based clustering
does not handle noise;
it needs a weighted Euclidean distance in place of the
standard Euclidean distance to re-evaluate centroid-based
clusters.
Proposal: Non-negative Matrix Factorization (NMF) to cluster
underlying stock trends.
[1] F. Cai, N. Le-Khac, and M. Kechadi. Clustering Approaches for Financial
Data Analysis: A Survey. Proceedings of DMIN 2012, pp. 1-7, 2012.
CLUSTERING WITH NMFS
Problem formulation
The market is made up of m stocks $S_1, S_2, \dots, S_m$, each stored as a
row vector whose entries are n daily closing prices.
Suppose there are k latent bases $W_1, W_2, \dots, W_k$; each $W_j$ is an
n-dimensional row vector, thought of as a Brownian motion.
Express each stock as a linear combination of these bases, with a
non-negative real number $H_{ij}$ indicating the degree of association
of the i-th stock with the basis $W_j$.
In matrix notation,
$$ S_+ \approx H_+ W_\pm, $$
where $S \in \mathbb{R}_+^{m \times n}$, $H \in \mathbb{R}_+^{m \times k}$
and $W \in \mathbb{R}_\pm^{k \times n}$.
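Read row by row, the matrix relation says that each stock is approximated by a non-negative mix of the k bases. The block below only restates the slide's formulation for a single stock; the numeric weights in the second line are hypothetical, chosen purely to illustrate the case k = 2.

```latex
S_i \approx \sum_{j=1}^{k} H_{ij}\, W_j, \qquad H_{ij} \ge 0, \quad i = 1, \dots, m
% Hypothetical example with k = 2 and H_i = (0.7, 0.3):
S_i \approx 0.7\, W_1 + 0.3\, W_2
```

With weights like these, stock i would be associated mainly with the trend W_1.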
NMFs
Standard definition:
$$ S \approx H W, $$
where $S \in \mathbb{R}_+^{m \times n}$, $H \in \mathbb{R}_+^{m \times k}$
and $W \in \mathbb{R}_+^{k \times n}$, and $k \le m$.
Role of k: it forces a compact representation of the data that
captures its underlying regularities.
Matrices H and W are found by solving the optimization
problem
$$ \min_{H \ge 0,\; W \ge 0} \| S - H W \|_F^2, $$
where $\| \cdot \|_F$ is the Frobenius norm.
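The slides do not say which solver was used for this problem. As a minimal sketch, the code below assumes the classic multiplicative update rules of Lee and Seung, which keep H and W non-negative and never increase the Frobenius objective. It is written in plain Python to stay dependency-free; the function names and the toy price matrix are illustrative and not taken from the talk.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def squared_frobenius_error(S, H, W):
    """Squared Frobenius norm of the residual S - H W."""
    R = matmul(H, W)
    return sum((s - r) ** 2 for srow, rrow in zip(S, R) for s, r in zip(srow, rrow))

def nmf(S, k, iters=300, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix S into H (m x k) and W (k x n)."""
    rng = random.Random(seed)
    m, n = len(S), len(S[0])
    H = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    W = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (S W^T) / (H W W^T), element-wise
        Wt = transpose(W)
        num, den = matmul(S, Wt), matmul(H, matmul(W, Wt))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
        # W <- W * (H^T S) / (H^T H W), element-wise
        Ht = transpose(H)
        num, den = matmul(Ht, S), matmul(matmul(Ht, H), W)
        W = [[W[j][t] * num[j][t] / (den[j][t] + eps) for t in range(n)]
             for j in range(k)]
    return H, W

# Toy data (hypothetical): 4 stocks x 5 days built from a rising and a falling trend.
S = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [2.0, 4.0, 6.0, 8.0, 10.0],
     [5.0, 4.0, 3.0, 2.0, 1.0],
     [6.0, 6.0, 6.0, 6.0, 6.0]]

H_early, W_early = nmf(S, k=2, iters=1)
H, W = nmf(S, k=2, iters=300)
err_early = squared_frobenius_error(S, H_early, W_early)
err_late = squared_frobenius_error(S, H, W)
```

Because both runs start from the same seed, `err_late` can be compared directly with `err_early` to see the objective decrease as iterations accumulate.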
Convex NMF
Convex NMF (C-NMF) allows the data matrix S to have mixed
signs. It solves
$$ \min_{\|H_i\|_1 = 1,\; H \ge 0} \| S - S H W \|_F^2. $$
Advantage of the convex constraint imposed on H:
the rows of H can be interpreted as weighted sums of certain
data points, and hence as centroids.
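To make the convexity constraint concrete, the sketch below shows what non-negative weights with unit L1 norm do: they turn a weighted sum of data points into a convex combination, which is why the result can be read as a centroid. This illustrates the constraint only and is not the C-NMF solver; the helper names and the three toy price rows are assumptions made for the example.

```python
def normalize_rows(H):
    """Rescale each non-negative row so that its entries sum to 1."""
    out = []
    for row in H:
        total = sum(row)
        out.append([h / total for h in row] if total > 0 else list(row))
    return out

def convex_combination(weights, points):
    """Weighted sum of data points (rows), one weight per point."""
    n = len(points[0])
    return [sum(w * p[t] for w, p in zip(weights, points)) for t in range(n)]

# Toy data (hypothetical): 3 stocks x 3 days.
prices = [[10.0, 11.0, 12.0],
          [20.0, 19.0, 18.0],
          [30.0, 30.0, 30.0]]

weights = normalize_rows([[2.0, 1.0, 1.0]])[0]   # non-negative, sums to 1
centroid = convex_combination(weights, prices)   # a price series "between" the stocks
```

On every day the centroid lies between the lowest and the highest of the three prices, which is the property that makes it interpretable as a cluster representative.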
Convex-Hull NMF
Convex-Hull NMF (CH-NMF) is a fast technique and scales
extremely well.
The task now is to solve the following optimization problem
$$ \min \| S - S H W \|_F^2, $$
subject to the convexity constraints
$$ \|H_i\|_1 = 1,\; H \ge 0, \qquad \|W_j\|_1 = 1,\; W \ge 0. $$
This optimization problem is equivalent to projecting the
solution onto the convex hull of S.
Advantage: new opportunities for data interpretation.
EXPERIMENTS ON FINANCIAL DATA
Experiments
Data gathered:
NASDAQ Stock Market
28 stocks belonging to 8 different sectors
10 years of closing prices (2518 working days)
Clustering methods applied:
NMF
C-NMF
CH-NMF
K-Means
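For reference, the K-Means baseline can be sketched as Lloyd's algorithm with the standard (unweighted) Euclidean distance applied to the rows of the price matrix. The talk does not describe its K-Means implementation, so the initialization, the stopping rule and the toy series below are assumptions for illustration only.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=100, seed=0):
    """Lloyd's algorithm; returns one cluster label per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as seeds
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid for every row.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for j in range(k):
            members = [r for r, lab in zip(rows, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Toy data (hypothetical): two rising and two falling 4-day price series.
toy_prices = [[1.0, 2.0, 3.0, 4.0],
              [1.1, 2.1, 3.0, 4.2],
              [4.0, 3.0, 2.0, 1.0],
              [4.2, 3.1, 2.0, 0.9]]
toy_labels = kmeans(toy_prices, k=2)
```

On this toy input the two rising series end up in one cluster and the two falling series in the other; which cluster receives label 0 depends on the random seeds.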
Experiments
Tried different numbers of clusters:
all methods were run for each k ∈ {3, 4, ..., 8}.
For each k, the clustering was evaluated in terms of:
1. plots of the reconstruction of matrix S by the matrix product H W
2. plots of the trend matrix W
3. analysis of colormaps for matrix H
4. analysis of convergence iterations, Frobenius error and number
of attracted clusters for each method
5. qualitative grouping of recurrent subgroups of stocks for each
method
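The slides do not spell out how a stock is assigned to a cluster or how "attracted clusters" are counted. A common convention, assumed in the sketch below, is to assign each stock to the basis with the largest coefficient in its row of H and to count the bases that receive at least one stock. Under that reading the count can be smaller than k. The function names and the matrix `H_example` are hypothetical.

```python
def assign_clusters(H):
    """Label each stock (row of H) with the index of its largest coefficient."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]

def count_attracted_clusters(labels):
    """Number of bases that attract at least one stock."""
    return len(set(labels))

# Hypothetical coefficient matrix for 4 stocks and k = 3 bases.
H_example = [[0.8, 0.1, 0.1],
             [0.6, 0.3, 0.1],
             [0.2, 0.7, 0.1],
             [0.1, 0.5, 0.4]]

labels = assign_clusters(H_example)              # third basis never dominates
n_attracted = count_attracted_clusters(labels)   # fewer than k = 3
```

Here no stock has its largest coefficient on the third basis, so only two of the three clusters are attracted.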
Stock price trends for k = 4
Figure: Trends for NMF
Figure: Trends for C-NMF
Figure: Trends for CH-NMF
Table: Numerical results for NMF
k   # iter   error     # clusters
3   1528     33.7703   3
4   2355     27.1966   4
5   3358     21.0838   5
6   2523     16.9987   4
7   5000     14.6706   6
8   5000     13.5482   7
Table: Numerical results for C-NMF
k   # iter   error     # clusters
3   500      45.7185   2
4   500      42.4148   2
5   500      40.2502   2
6   500      33.5761   2
7   500      38.6675   2
8   500      32.1786   2
Table: Numerical results for CH-NMF
k   # iter   error     # clusters
3   1        47.5844   2
4   1        43.9824   4
5   1        38.6585   4
6   1        56.4050   4
7   1        32.5755   5
8   1        46.2535   5
Figure: Colormap for NMF
Figure: Colormap for C-NMF
Figure: Colormap for CH-NMF
Table: NMF Clusters
(for each k, one line per cluster; entries are stock numbers)
k = 3
  3 4 5 6 7 8 9 10 11 14 15 16 17 19 20 21 22 23 24 25 26 28
  12 13 18
  1 2 27
k = 4
  3 5 8 10 11 15 16 19 20 21 22 23 24 25 26 28
  6 7 12 13 14 17 18
  27
  1 2 4 9
k = 5
  12 14 15 16 17 18 19 20 23 24 25 26
  4 5 8 10 21 22 28
  7 11
  3 6 13
  1 2 9 27
k = 6
  3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 24 25 28
  23 26
  1 2 9 27
  12
k = 7
  3 4 5 6 7 8 14 16 19 20 24
  12 15 23 26
  21 28
  10 13 17 18 22 25
  27
  1 2 9 11
k = 8
  11 15 17 18 20 21 22 23 24 25 26 28
  2 4 5 8 9 10
  27
  16
  3 13 14
  1 6 7 12
  19
Table: C-NMF Clusters
(for each k, one line per cluster; entries are stock numbers)
k = 3
  1 2 12 27
  3 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28
k = 4
  1 2 9 12 27
  3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28
k = 5
  1 2 9 12 27
  3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28
k = 6
  1 2 27
  3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28
k = 7
  1 2 6 9 12 23 27
  3 4 5 7 8 10 11 13 14 15 16 17 18 19 20 21 22 24 25 26 28
k = 8
  1 2 9 12 27
  3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28
Table: CH-NMF Clusters
(for each k, one line per cluster; entries are stock numbers)
k = 3
  1 2 9 12 27
  3 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28
k = 4
  1 2 9 27
  3 4 5 7 8 10 11 14 15 16 17 18 19 20 21 22 23 24 25 26 28
  6 13
  12
k = 5
  1 2 9 27
  3 5 7 8 10 11 18 21 22 24 25
  4 6 12 14 15 16 17 19 20 23 26 28
  13
k = 6
  1 2 9 12 27
  3 4 5 7 8 11 14 15 16 17 20 22 23 24 25 26 28
  6 13
  10 18 19 21
k = 7
  1 2 9
  3 4 6 7 13 14 17 20 23 24 25 26
  5 8 10 11 15 16 18 19 21 22 28
  12
  27
k = 8
  2 27
  4 8 10 11 13 15 16 17 18 20 21 22 23 24 25 26 28
  1 5 9 12 14 19
  6 7
  3
Table: K-Means Clusters
(for each k, one line per cluster; entries are stock numbers)
k = 3
  3 4 5 6 7 8 10 11 13 15 16 18 20 23 24 25 26 28
  1 9 12 14 17 19 21 22
  2 27
k = 4
  3 4 5 6 7 8 10 11 13 15 16 18 20 23 24 26 28
  1 9 12 14 17 19 21 22 25
  2
  27
k = 5
  3 4 7 8 10 16 17 19 21 25
  5 6 11 13 15 18 20 23 24 26 28
  1 9 12 14 22
  27
  2
k = 6
  3 4 7 8 10 11 16 25 26
  5 6 13 15 18 20 23 24 28
  9 14 17 19 21 22
  27
  2
  1 12
k = 7
  3 4 7 8 10 11 16 25
  5 6 15 18 20 23 24 26 28
  9 14 17 19 21 22
  1 12
  27
  2
  13
k = 8
  5 6 15 18 20 23 24 26 28
  9 14 17 19 21 22
  4 7 8 10 11 16 25
  27
  2
  1 12
  3
  13
CONCLUSIONS
Conclusions
Portfolio diversification is the financial process of allocating
capital in a way that reduces the exposure to risk by investing in
a variety of assets (i.e., stocks).
This amounts to clustering stocks that follow similar trends.
K-Means is not effective on this task. Hence, we applied NMF.
Adding convexity constraints to the factorization improves
the exploitation of similar stock trends.
In particular, CH-NMF is a very fast and scalable convex NMF
technique that compares favorably on large data sets, both in
terms of speed and reconstruction quality.
Conclusions
Extensive experimental evaluation on real-world NASDAQ
stock data shows that, compared to K-Means, NMF techniques:
better point out the clustering properties,
yield very low error in Frobenius norm,
are highly efficient in terms of convergence time.
Future work:
use more datasets from different markets
investigate further decomposition techniques to improve
the effectiveness of clustering stock data
impose other penalty constraints in order to achieve a
better portfolio diversification strategy
