A review of two decades of correlations, hierarchies,
networks and clustering in financial markets
Ton Duc Thang University, Ho Chi Minh City, Vietnam
Gautier Marti, Frank Nielsen, Mikolaj Bińkowski, Philippe Donnat
Ecole Polytechnique, Imperial College London, Hellebore Capital Ltd.
10 August 2018
Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 1 / 64
Table of contents
1 Introduction
2 Correlation networks
The standard and widely adopted methodology
Concerns about the standard methodology
Contributions for improving the methodology
On algorithms
On distances
On other methodological aspects
3 Other networks
4 Dynamics of networks
5 Applications
6 Opinionated views on research directions
Section 1
Introduction
Introduction
Motivation: A better understanding of financial markets using a scientific
approach.
Empirical studies use data to test hypotheses and discover stylized
facts. Examples of datasets:
price, volume, returns, turnover time series
supply chain networks
market (OTC, exchange) transaction data
retail transactional data (credit cards)
corporate payments networks
international trade (import/export) networks,
...
Introduction
Several research fields are tackling the problem with their own tools:
statistical physics, econophysics:
Minimum Spanning Tree (MST)
Random Matrix Theory (RMT)
linear correlations
statistics, data mining, machine learning:
graph theory
community detection
clustering algorithms
non-linear dependence
alternative distances
statistical significance and robustness checks via bootstrapping
economics, finance, accounting, behavioural finance:
standard industry and fundamental classifications vs. statistical and
text-based classifications
networks of trades, suppliers, consumers, competitors, investors
linear regressions on network statistics, statistical significance through
t-stats
Section 2
Correlation networks
The standard and widely adopted methodology
(Mantegna, 1999)
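Let $N$ be the number of assets.
Let $P_i(t)$ be the price at time $t$ of asset $i$, $1 \le i \le N$.
Let $r_i(t)$ be the log-return at time $t$ of asset $i$:
$$r_i(t) = \log P_i(t) - \log P_i(t-1).$$
For each pair $i, j$ of assets, compute their correlation:
$$\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^2 \rangle - \langle r_i \rangle^2\right)\left(\langle r_j^2 \rangle - \langle r_j \rangle^2\right)}},$$
where $\langle \cdot \rangle$ denotes the average over time.
Convert the correlation coefficients $\rho_{ij}$ into distances:
$$d_{ij} = \sqrt{2(1 - \rho_{ij})}.$$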
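A minimal NumPy sketch of the three steps above, assuming prices are stored in a (T + 1) × N array with one column per asset; the function name `mantegna_distances` and the simulated prices are hypothetical, for illustration only.

```python
import numpy as np


def mantegna_distances(prices: np.ndarray) -> np.ndarray:
    """Return the N x N matrix d_ij = sqrt(2 (1 - rho_ij)).

    prices: array of shape (T + 1, N), one column per asset.
    """
    # Log-returns r_i(t) = log P_i(t) - log P_i(t - 1), shape (T, N)
    returns = np.diff(np.log(prices), axis=0)
    # Pearson correlation between columns, i.e. between assets
    rho = np.corrcoef(returns, rowvar=False)
    # Clip tiny negative values that floating-point rounding can produce
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))


rng = np.random.default_rng(0)
# Hypothetical prices: geometric random walks for 5 assets over 250 days
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal((251, 5)), axis=0))
D = mantegna_distances(prices)
print(D.shape)  # (5, 5)
```

The distance ranges from 0 ($\rho_{ij} = 1$) to 2 ($\rho_{ij} = -1$); because it is the Euclidean distance between standardized return vectors (up to a constant factor), it satisfies the triangle inequality, which $1 - \rho_{ij}$ alone does not.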
The standard and widely adopted methodology
From all the distances $d_{ij}$, compute a minimum spanning tree (MST)
using, for example, Algorithm 1:
Algorithm 1 Kruskal's algorithm
procedure BuildMST({d_ij : 1 ≤ i, j ≤ N})
    ▷ Start with a fully disconnected graph G = (V, E)
    E ← ∅
    V ← {i : 1 ≤ i ≤ N}
    ▷ Try to add edges by increasing distances
    for (i, j) ∈ V² ordered by increasing d_ij do
        ▷ Verify that i and j are not already connected by a path
        if not connected(i, j) then
            ▷ Add the edge (i, j) to connect i and j
            E ← E ∪ {(i, j)}
    ▷ G is the resulting MST
    return G = (V, E)
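Algorithm 1 can be turned into a short runnable Python sketch. The `connected(i, j)` test is implemented here with a union-find (disjoint-set) structure, which is our choice rather than something the slide specifies, and the function name `kruskal_mst` is hypothetical. For production use, SciPy's `scipy.sparse.csgraph.minimum_spanning_tree` is an existing alternative.

```python
import itertools

import numpy as np


def kruskal_mst(D: np.ndarray) -> list[tuple[int, int]]:
    """Kruskal's algorithm on a dense symmetric distance matrix D (N x N).

    Returns the N - 1 edges (i, j) of a minimum spanning tree.
    """
    n = D.shape[0]
    parent = list(range(n))  # union-find forest: every node starts alone

    def find(i: int) -> int:
        # Follow parents up to the root, halving the path along the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    # Candidate edges (i, j) with i < j, sorted by increasing distance d_ij
    for i, j in sorted(itertools.combinations(range(n), 2), key=lambda e: D[e]):
        ri, rj = find(i), find(j)
        if ri != rj:  # i and j are not yet connected by a path
            parent[ri] = rj  # merge the two components
            edges.append((i, j))
            if len(edges) == n - 1:  # a spanning tree has N - 1 edges
                break
    return edges


D = np.array([[0.0, 1.0, 4.0], [1.0, 0.0, 2.0], [4.0, 2.0, 0.0]])
print(kruskal_mst(D))  # [(0, 1), (1, 2)]
```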
Concerns about the standard methodology
The clusters obtained from the MST (or equivalently, the Single
Linkage Clustering Algorithm (SLCA)) are known to be unstable
(small perturbations of the input data may cause big differences in
the resulting clusters) [MVDN15].
The clustering instability may be partly due to the algorithm
(MST/Single Linkage are known for the chaining phenomenon
[CM10]).
The clustering instability may be partly due to the correlation
coefficient (Pearson linear correlation) defining the distance, which
is known for being brittle to outliers and, more generally, not well
suited to distributions other than Gaussian ones [DMV16].
Single Linkage chaining problem...
...makes it brittle to small perturbations in the input distances.
Clusters and hierarchies are skewed: Single Linkage does not take into
account any notion of density.
Pearson linear correlation...
is too sensitive to outliers.
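A small simulation makes this sensitivity concrete. Assuming two independent, hypothetical return series, a single joint extreme observation (think of one crash day) is enough to manufacture a large Pearson correlation, while Spearman's rank correlation, shown here only as a point of comparison, barely moves.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
T = 500
# Two independent hypothetical return series: the true correlation is zero
x = rng.standard_normal(T)
y = rng.standard_normal(T)

pearson_clean = stats.pearsonr(x, y)[0]
spearman_clean = stats.spearmanr(x, y)[0]

# Inject a single joint extreme move in both series
x_out, y_out = x.copy(), y.copy()
x_out[0], y_out[0] = -50.0, -50.0

pearson_out = stats.pearsonr(x_out, y_out)[0]
spearman_out = stats.spearmanr(x_out, y_out)[0]
print(f"Pearson : {pearson_clean:+.3f} -> {pearson_out:+.3f}")
print(f"Spearman: {spearman_clean:+.3f} -> {spearman_out:+.3f}")
```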
Concerns about the standard methodology
Theoretical results establishing the statistical reliability of hierarchical
trees and correlation-based networks are still not available [TLM10].
One might expect that the higher the correlation associated with a link
in a correlation-based network, the higher the reliability of this link.
In [TCL+07], the authors show that this is not always observed
empirically.
Changes affecting specific links (and clusters) during prominent crises
are difficult to interpret due to the high level of statistical
uncertainty associated with the correlation estimation [STZM11].
The standard method is somewhat arbitrary: A change in the
method (e.g. using a different clustering algorithm or a different
correlation coefficient) may yield a huge change in the clustering
results [LRW+14, MVDN15]. As a consequence, it implies huge
variability in portfolio formation and perceived risk [LRW+14].
Variance of the Pearson correlation estimator
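The content of this slide and the next did not survive extraction. As a hedged illustration, not a reconstruction of the slides, the sketch below checks by Monte Carlo the classical large-sample result for bivariate Gaussian data, $\mathrm{Var}(\hat\rho) \approx (1-\rho^2)^2 / T$: the estimator is noisiest near $\rho = 0$ and its variance shrinks as $|\rho| \to 1$. The function name and parameters are hypothetical.

```python
import numpy as np


def pearson_variance_mc(rho: float, T: int, n_trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo variance of the sample Pearson correlation on Gaussian pairs."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = np.empty(n_trials)
    for k in range(n_trials):
        # T i.i.d. draws from a bivariate normal with correlation rho
        xy = rng.multivariate_normal(np.zeros(2), cov, size=T)
        estimates[k] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
    return float(estimates.var())


rho, T = 0.5, 500
empirical = pearson_variance_mc(rho, T)
asymptotic = (1 - rho**2) ** 2 / T
print(f"Monte Carlo: {empirical:.2e}, asymptotic (1 - rho^2)^2 / T: {asymptotic:.2e}")
```

With $\rho = 0.5$ and $T = 500$, the standard deviation is about $0.75/\sqrt{500} \approx 0.034$, which is of the same order as many of the correlation differences that decide whether a link enters the MST.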
CRLB of the Pearson correlation estimator - Proof
Random Matrix Theory & Empirical correlation matrices
Let $X$ be the $N \times T$ matrix storing the standardized returns of $N = 560$ assets
(credit default swaps) over a period of $T = 2500$ trading days.
Then, the empirical correlation matrix of the returns is
$$C = \frac{1}{T} X X^\top.$$
We can compute the empirical density of its eigenvalues
$$\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda},$$
where $n(\lambda)$ counts the number of eigenvalues of $C$ less than $\lambda$.
From random matrix theory, the Marchenko-Pastur distribution gives
the limit distribution as $N \to \infty$, $T \to \infty$ and $T/N$ fixed. It reads:
$$\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda},$$
where $\lambda_{\max/\min} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.
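The formulas above translate directly into code. This sketch, assuming unit-variance noise and $N < T$, evaluates the Marchenko-Pastur density and its edges for the slide's dimensions, then compares them with the spectrum of a purely random (hypothetical) return matrix; the function name `marchenko_pastur` is ours.

```python
import numpy as np


def marchenko_pastur(N: int, T: int, lam: np.ndarray) -> np.ndarray:
    """Marchenko-Pastur density for unit-variance noise, assuming N < T."""
    q = N / T
    lam_min, lam_max = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
    density = np.zeros_like(lam, dtype=float)
    inside = (lam > lam_min) & (lam < lam_max)  # zero outside the support
    lam_in = lam[inside]
    density[inside] = (T / N) / (2 * np.pi) * np.sqrt(
        (lam_max - lam_in) * (lam_in - lam_min)) / lam_in
    return density


N, T = 560, 2500  # the dimensions quoted on the slide
q = N / T
lam_min, lam_max = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2  # = 1 + q -/+ 2 sqrt(q)

# Pure-noise benchmark: i.i.d. standard normal "returns", shape (N, T)
rng = np.random.default_rng(0)
X = rng.standard_normal((N, T))
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
eigenvalues = np.linalg.eigvalsh(X @ X.T / T)
print(f"lambda_min = {lam_min:.3f}, lambda_max = {lam_max:.3f}")
print("eigenvalues above lambda_max:", int((eigenvalues > lam_max).sum()))
```

For pure noise, essentially the whole spectrum falls inside $[\lambda_{\min}, \lambda_{\max}]$; eigenvalues of real return data that escape this band are the candidates for genuine structure.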
Random Matrix Theory & Empirical correlation matrices
Notice that the Marchenko-Pastur density fits the empirical density well,
meaning that most of the information contained in the empirical
correlation matrix amounts to noise: only 26 eigenvalues are greater than
$\lambda_{\max}$. The highest eigenvalue corresponds to the 'market'; the 25 others
can be associated with 'industrial sectors'.
It is a known stylized fact of empirical correlation matrices between
financial returns: Only ≈ 5% of their eigenvalues are greater than $\lambda_{\max}$.
A somewhat arbitrary choice of methodology
The standard method is somewhat arbitrary. Adopting another one may
yield strongly different results. Which ones should we trust? Are both useful?
Clusters obtained differ markedly from one method to another
Contributions on algorithms
Several alternative algorithms have been proposed to replace the minimum
spanning tree and its corresponding clusters:
Average Linkage Minimum Spanning Tree (ALMST) [TCL+07]; the
authors introduce a spanning tree associated with the Average Linkage
Clustering Algorithm (ALCA), designed to remedy the unwanted
chaining phenomenon of MST/SLCA.
Planar Maximally Filtered Graph (PMFG) [ADMH05, TADMM05],
which strictly contains the Minimum Spanning Tree (MST) but
encodes a larger amount of information in its internal structure.
Directed Bubble Hierarchical Tree (DBHT) [SDMA11, SDMA12],
which is designed to extract, without parameters, the deterministic
clusters from the PMFG.
Triangulated Maximally Filtered Graph (TMFG) [MDMA16]; the
authors introduce another filtered graph, more suitable for big
datasets.
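SciPy exposes both clustering algorithms through `scipy.cluster.hierarchy.linkage`: `method="single"` is SLCA and `method="average"` is ALCA (UPGMA). The sketch below runs both on Mantegna distances from hypothetical simulated returns and checks the MST/SLCA equivalence mentioned earlier: single-linkage merge heights coincide with the MST edge lengths. It does not build the ALMST itself, which is a spanning tree derived from ALCA.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import squareform

rng = np.random.default_rng(7)
# Hypothetical returns: 30 assets over 250 days, three groups of 10 sharing a factor
groups = np.repeat([0, 1, 2], 10)
returns = rng.standard_normal((3, 250))[groups] + 1.5 * rng.standard_normal((30, 250))

# Mantegna distances, stored in SciPy's condensed (upper-triangle) form
rho = np.corrcoef(returns)
condensed = squareform(np.sqrt(np.clip(2 * (1 - rho), 0, None)), checks=False)
D = squareform(condensed)  # symmetric N x N matrix with a zero diagonal

Z_single = linkage(condensed, method="single")    # SLCA
Z_average = linkage(condensed, method="average")  # ALCA
labels_single = fcluster(Z_single, t=3, criterion="maxclust")
labels_average = fcluster(Z_average, t=3, criterion="maxclust")

# Single-linkage merge heights coincide with the MST edge lengths
mst_lengths = np.sort(minimum_spanning_tree(D).data)
print(np.allclose(np.sort(Z_single[:, 2]), mst_lengths))  # True
print(labels_single)
print(labels_average)
```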
Contributions on algorithms (cont'd)
Clustering using Potts super-paramagnetic transitions [KKM00];
when anti-correlations occur, the model creates repulsion between
the stocks, which modifies their clustering structure.
Clustering using maximum likelihood [GM01, GM02]; the authors
define the likelihood of a clustering based on a simple 1-factor model,
then devise parameter-free methods to find a clustering with high
likelihood.
Clustering using Random Matrix Theory (RMT) [PGR+00];
eigenvalues help to determine the number of clusters, and
eigenvectors their composition.
[MG15] proposes network-based community detection methods whose
null hypothesis is consistent with RMT results on cross-correlation
matrices for financial time series data, unlike existing community
detection algorithms.
Clustering using the p-median problem [KBP14]; with this
construction, every cluster is a star, i.e. a tree with one central node.
Planar Maximally Filtered Graph (PMFG)
The PMFG is a compelling alternative to the MST.
PMFG nodes are colored according to the clusters obtained from DBHT
Implementation of the PMFG in Python:
https://gmarti.gitlab.io/networks/2018/06/03/pmfg-algorithm.html
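The linked post gives the author's implementation. As an independent illustration, here is a minimal greedy sketch using networkx's `check_planarity`: candidate edges are scanned by increasing distance $d_{ij}$ (i.e. decreasing correlation) and kept only if the graph stays planar, stopping once it has $3(N-2)$ edges, the edge count of a maximal planar graph. The function name `pmfg` and the random distances are hypothetical, and re-running a full planarity test at each step makes this practical only for small $N$.

```python
import itertools

import networkx as nx
import numpy as np


def pmfg(D: np.ndarray) -> nx.Graph:
    """Greedy PMFG sketch: add edges by increasing distance while planar (N >= 3)."""
    n = D.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    target = 3 * (n - 2)  # number of edges of a maximal planar graph
    for i, j in sorted(itertools.combinations(range(n), 2), key=lambda e: D[e]):
        G.add_edge(i, j, weight=D[i, j])
        is_planar, _ = nx.check_planarity(G)
        if not is_planar:
            G.remove_edge(i, j)  # this edge would break planarity
        elif G.number_of_edges() == target:
            break
    return G


rng = np.random.default_rng(3)
A = rng.random((12, 12))
D = (A + A.T) / 2  # hypothetical symmetric distances
np.fill_diagonal(D, 0.0)
G = pmfg(D)
print(G.number_of_edges())  # 30
```

Because an edge that would join two separate components never breaks planarity, every edge Kruskal's algorithm accepts is also accepted here, which is why the PMFG contains the MST.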
Contributions on distances
At the heart of clustering algorithms is the fundamental notion of distance
that can be defined upon a proper representation of data. It is thus an
obvious direction to explore. We list below what has been proposed in the
literature so far:
Distances that try to quantify how one financial instrument provides
information about another instrument:
Distance using Granger causality [BGLP12],
Distance using partial correlation [KTM+10],
Study of asynchronous, lead-lag relationships by using mutual
information instead of Pearson's correlation coefficient
[Fie14a, RTS16],
The correlation matrix is normalized using the affinity transformation:
the correlation between each pair of stocks is normalized according to
the correlations of each of the two stocks with all other stocks
[KSM+10].
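One common way to compute partial correlations, shown here as a sketch, conditions each pair on all other assets through the precision matrix $P = C^{-1}$: $\rho_{ij \cdot \mathrm{rest}} = -P_{ij} / \sqrt{P_{ii} P_{jj}}$. As we understand it, [KTM+10] instead conditions on one mediating stock at a time, so treat this as a related variant rather than their exact measure. The toy chain $x \to y \to z$ is hypothetical: $x$ and $z$ are clearly correlated, yet their partial correlation given $y$ is close to zero.

```python
import numpy as np


def partial_correlation(returns: np.ndarray) -> np.ndarray:
    """Partial correlation of each pair given all other assets.

    returns: array of shape (T, N). Uses the precision matrix P = C^{-1}.
    """
    C = np.corrcoef(returns, rowvar=False)
    P = np.linalg.inv(C)
    d = np.sqrt(np.diag(P))
    pcorr = -P / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


rng = np.random.default_rng(8)
T = 2000
x = rng.standard_normal(T)
y = x + rng.standard_normal(T)  # y depends on x
z = y + rng.standard_normal(T)  # z depends on x only through y
pc = partial_correlation(np.column_stack([x, y, z]))
print("corr(x, z) =", round(np.corrcoef(x, z)[0, 1], 2))
print("pcorr(x, z | y) =", round(pc[0, 2], 2))
```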
Contributions on distances (cont'd)
Distances that aim at including non-linear relationships in the
analysis:
Distances using mutual information, mutual information rate, and
other information-theoretic distances
[Fie14b, RTS16, BP17a, BP17b, GHA18, GZT18],
The Brownian distance [ZPKS14],
Copula-based [MND16, DP15, B+13] and tail dependence
[DFPW15] distances.
Distances that aim at taking into account multivariate dependence:
Each stock is represented by a bivariate time series: its returns and
traded volumes [BR08]; a distance is then applied to an ad hoc
transform of the two time series into a symbolic sequence,
Each stock is represented by a multivariate time series, for example the
daily (high, low, open, close) [LD13]; the authors use Escoufier's RV
coefficient (a multivariate extension of Pearson's correlation
coefficient).
A distance taking into account both the correlation between returns
and their distributions [DMV16].
Contributions on distances (contā€™d)
Unlike recent studies which claim that the existence of nonlinear
dependence between stock returns have eļ¬€ects on network
characteristics, [HH18] documents that ā€œmost of the apparent
nonlinearity is due to univariate non-Gaussianity. Further, strong
non-stationarity in a few speciļ¬c stocks may play a role. In particular,
the sharp decrease of some stocks during the global ļ¬nancial crisis in
2008ā€ gives rise to apparent negative tail dependence among stocks.
When constructing unweighted stock networks, they suggest to use
linear correlation ā€œon marginally normalized dataā€, that is Spearmanā€™s
rank correlation. In fact, this is similar to the idea of splitting apart
the dependence information from the distribution one as in [DMV16],
where Spearmanā€™s rank correlation stems from using a Euclidean
distance between the uniform margins of the underlying bivariate
copula. Following previous studies, and unlike in [DMV16], the
distribution information is discarded when constructing the network.
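The link between "marginally normalized data" and Spearman's rank correlation can be checked numerically. In this sketch on hypothetical heavy-tailed returns, each series is mapped to its empirical uniform margin (rank divided by $T + 1$); Pearson's correlation on these pseudo-observations equals Spearman's coefficient, and a strictly increasing transform of one margin leaves it unchanged, since only the copula matters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T = 1000
# Hypothetical heavy-tailed returns (Student t with 3 degrees of freedom)
x = rng.standard_t(3, size=T)
y = 0.6 * x + 0.8 * rng.standard_t(3, size=T)

# Marginally normalized data: pseudo-observations on the uniform margins
u = stats.rankdata(x) / (T + 1)
v = stats.rankdata(y) / (T + 1)

spearman = stats.spearmanr(x, y)[0]
pearson_on_margins = np.corrcoef(u, v)[0, 1]
print(np.isclose(pearson_on_margins, spearman))  # True
# A strictly increasing transform of a margin keeps the ranks, hence Spearman's rho
print(np.isclose(stats.spearmanr(x**3, y)[0], spearman))  # True
```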
Dependence and marginal distribution of the returns
Theorem (Sklar's theorem, 1959)
For any random vector $X = (X_1, \dots, X_N)$ having continuous marginal
cumulative distribution functions $F_i$, its joint cumulative distribution $F$ is
uniquely expressed as
$$F(X_1, \dots, X_N) = C(F_1(X_1), \dots, F_N(X_N)),$$
where $C$, the multivariate distribution of uniform marginals, is known as
the copula of $X$.
Information-theoretic distances vs. Copula-based ones?
Copula entropy:
$$H_c(x) = -\int_u c(u) \log c(u) \, du$$
Mutual information:
$$I(x) = \int_x p(x) \log \frac{p(x)}{\prod_i p_i(x_i)} \, dx = \int_x c(u_x) \prod_i p_i(x_i) \log c(u_x) \, dx = \int_u c(u_x) \log c(u_x) \, du_x = -H_c(x)$$
Entropy:
$$H(x) = \sum_i H(x_i) + H_c(x)$$
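A worked case, assuming a bivariate Gaussian copula with correlation $\rho$ (a choice made for illustration). Its density, with $z_k = \Phi^{-1}(u_k)$, and the resulting mutual information, using $\mathbb{E}[z_k^2] = 1$ and $\mathbb{E}[z_1 z_2] = \rho$, are:

$$c_\rho(u_1, u_2) = \frac{1}{\sqrt{1-\rho^2}} \exp\left( -\frac{\rho^2 (z_1^2 + z_2^2) - 2\rho z_1 z_2}{2(1-\rho^2)} \right)$$

$$I = -H_c = \mathbb{E}\left[\log c_\rho(U_1, U_2)\right] = -\frac{1}{2}\log(1-\rho^2) - \frac{2\rho^2 - 2\rho \cdot \rho}{2(1-\rho^2)} = -\frac{1}{2}\log(1-\rho^2)$$

The result depends only on the copula, not on the margins, and diverges as $|\rho| \to 1$; for example, $\rho = 0.5$ gives $I = -\tfrac{1}{2}\log 0.75 \approx 0.144$ nats.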
Contributions on other methodological aspects
Reliability and statistical uncertainty of the methods:
A bootstrap approach is used to estimate the statistical reliability of
both hierarchical trees [TLM07a, MAND16] and correlation-based
networks [TCL+07, MMMM18],
Consistency proof of clustering algorithms for recovering clusters
defined by nested block correlation matrices; study of empirical
convergence rates [MAND16],
Kullback-Leibler divergence is used to estimate the amount of
information filtered out when going from the sample correlation matrix
to the filtered one [TLM07b],
Cophenetic correlation is used between the original correlation
distances and the hierarchical cluster representation [PS15],
Several measures between successive (in time) clusters, dendrograms and
networks are used to estimate the stability of the methods, e.g. cophenetic
correlation between dendrograms in [PLJ76], adjusted Rand index
(ARI) between clusters in [MVDN15], mutual information (MI) of link
co-occurrence between networks in [STZM11].
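SciPy's `scipy.cluster.hierarchy.cophenet` returns the cophenetic correlation between a dendrogram and the condensed distances it was built from. This sketch compares single and average linkage on Mantegna distances from hypothetical simulated returns; the cited works may use other distances, data and linkage choices.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(11)
# Hypothetical returns: 20 assets over 500 days, two groups of 10 sharing a factor
groups = np.repeat([0, 1], 10)
returns = rng.standard_normal((2, 500))[groups] + rng.standard_normal((20, 500))

rho = np.corrcoef(returns)
condensed = squareform(np.sqrt(np.clip(2 * (1 - rho), 0, None)), checks=False)

coph = {}
for method in ("single", "average"):
    Z = linkage(condensed, method=method)
    # Correlation between the original distances and the dendrogram heights
    coph[method], _ = cophenet(Z, condensed)
    print(f"{method:>7}: cophenetic correlation = {coph[method]:.3f}")
```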
Contributions on other methodological aspects (cont'd)
Preprocessing of the time series:
Subtract the market mode before performing a cluster or network
analysis on the returns [BMM07],
Encode both rank statistics and a distribution histogram of the
returns into a representative vector [DMV16],
Fit an ARMA(p,q)-FIEGARCH(1,d,1)-cDCC process (econometric
preprocessing) to obtain dynamic correlations instead of the common
approach of rolling-window Pearson correlations [ST14],
Use a clustering of successive correlation matrices to infer a market
state [PS15].
Use of other types of networks: threshold networks [OKK04],
influence networks [GZC15], partial-correlation networks
[KTM+10, KPGGBJ12], Granger causality networks
[BGLP12, VLB15], cointegration-based networks [Tu14], bipartite
networks [TML+11], etc.
Consistency and empirical convergence rates [MAND16]
Model selection: The faster the (empirical) convergence, the better.
Statistical & practical stability
One can use bootstrap, block bootstrap or other common sense and
practical perturbations of the data as presented in [MVDN15].
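A minimal sketch of such a stability check, assuming an i.i.d. bootstrap over trading days and the adjusted Rand index (ARI) between the reference clustering and each bootstrap clustering, in the spirit of [MVDN15]. The helper `cluster_returns`, the choice of average linkage and the simulated data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score


def cluster_returns(returns: np.ndarray, k: int, method: str = "average") -> np.ndarray:
    """Cluster the assets (columns of a T x N array) into k groups."""
    rho = np.corrcoef(returns, rowvar=False)
    D = np.sqrt(np.clip(2 * (1 - rho), 0, None))
    Z = linkage(squareform(D, checks=False), method=method)
    return fcluster(Z, t=k, criterion="maxclust")


rng = np.random.default_rng(2)
T, N, k = 500, 30, 3
groups = np.repeat(np.arange(k), N // k)
returns = rng.standard_normal((T, k))[:, groups] + 1.5 * rng.standard_normal((T, N))

reference = cluster_returns(returns, k)
scores = []
for _ in range(50):
    # i.i.d. bootstrap of trading days; a block bootstrap would instead resample
    # contiguous blocks of days to preserve serial dependence
    idx = rng.integers(0, T, size=T)
    scores.append(adjusted_rand_score(reference, cluster_returns(returns[idx], k)))
print(f"mean ARI = {np.mean(scores):.2f}, min ARI = {np.min(scores):.2f}")
```

An ARI close to 1 across resamples indicates a stable clustering; swapping `method="average"` for `"single"` is a quick way to compare algorithms under the same perturbations.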
Effect of a basic preprocessing: Subtract the market mode
Visualization of the Planar Maximally Filtered Graph (PMFG) and DBHT clusters,
for both non-detrended (left) and detrended (right) log-returns [MADM15].
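One simple way to subtract a market mode, sketched under stated assumptions: the "market" is taken to be the equally weighted cross-sectional mean return, and each asset is regressed on it by ordinary least squares, keeping the residuals. [BMM07] and [MADM15] may define and remove the market mode differently (for instance through the top eigenvector), and all names and data here are hypothetical.

```python
import numpy as np


def subtract_market_mode(returns: np.ndarray) -> np.ndarray:
    """Regress each asset on the equally weighted market return; keep residuals.

    returns: array of shape (T, N). The "market" is the cross-sectional mean
    return, an assumption made for this sketch.
    """
    market = returns.mean(axis=1)
    X = np.column_stack([np.ones_like(market), market])  # intercept + market
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)   # shape (2, N)
    return returns - X @ coef


def mean_offdiag_corr(returns: np.ndarray) -> float:
    """Average pairwise correlation between the columns of a (T, N) array."""
    c = np.corrcoef(returns, rowvar=False)
    return float(c[~np.eye(c.shape[0], dtype=bool)].mean())


rng = np.random.default_rng(4)
T, N = 500, 40
sectors = np.repeat([0, 1], N // 2)
returns = (
    rng.standard_normal((T, 1))                       # common market factor
    + 0.5 * rng.standard_normal((T, 2))[:, sectors]   # two sector factors
    + rng.standard_normal((T, N))                     # idiosyncratic noise
)
residuals = subtract_market_mode(returns)
print(mean_offdiag_corr(returns), mean_offdiag_corr(residuals))
```

After detrending, the dominant market correlation disappears and the weaker sector structure is what remains for the clustering or network to pick up.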
Section 3
Other networks
Examples of other financial networks
supply chain networks [Wu15]
investor (security holdings and trading behaviour) networks [BKES18]
corporate board and director networks [BC04]
international trade networks [BFG10]
transaction networks [LL18]
sovereign debt (quarterly public debt-to-GDP ratio) networks [MO15]
interbank (exposures between banks) networks [SVLG13]
These networks are built from alternative data which are often:
confidential
hard or costly to obtain
Most often these studies are done in collaboration with a commercial or
regulatory organization. Some of these datasets may contain significant
alphas, so results are often not publicly advertised: papers are relatively
few compared with those on the correlation of asset returns, which are
more oriented toward understanding risk.
Section 4
Dynamics of networks
Studying the dynamics of networks
Comparing and ļ¬nding the diļ¬€erences in a sequence of large graphs is a
computationally diļ¬ƒcult problem. In the literature, one often studies the
following statistics:
(for networks) the normalized tree length [OCK+03], the mean
occupation layer [OCK+03], the tree half-life [OCK+03], a survival
ratio of the edges [OCKK02, JMS+05, ST14], node degree, strength
[ST14], eigenvector, betweenness, closeness centrality [ST14], the
agglomerative coeļ¬ƒcient [MO15]
(for clusters) the merging, splitting, birth, death, contraction, and
growth of the clusters in time [PS15]
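Two of these tree statistics are easy to sketch. As we understand the definitions in [OCKK02, OCK+03], the single-step survival ratio is the fraction of the $N-1$ edges of one tree still present in the next, and the normalized tree length is the mean distance $d_{ij}$ over the tree's edges. The function names and the two toy trees are hypothetical.

```python
import numpy as np


def survival_ratio(edges_prev, edges_next) -> float:
    """Fraction of the edges of one tree that are still present in the next one."""
    prev = {frozenset(e) for e in edges_prev}  # undirected: (i, j) == (j, i)
    nxt = {frozenset(e) for e in edges_next}
    return len(prev & nxt) / len(prev)


def normalized_tree_length(edges, D: np.ndarray) -> float:
    """Mean distance d_ij over the edges (i, j) of a tree."""
    return float(sum(D[i, j] for i, j in edges) / len(edges))


# Two hypothetical consecutive MSTs on 4 assets
tree_t0 = [(0, 1), (1, 2), (2, 3)]
tree_t1 = [(1, 0), (2, 3), (1, 3)]
print(survival_ratio(tree_t0, tree_t1))  # 2 of the 3 edges survive
```

Applied to MSTs built on rolling windows, these give time series whose drops can be compared with market events.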
Remark. To the best of my knowledge, graph embeddings into vector spaces (cf.
the recent deep learning literature, or the survey [GF18]) have not been used to
study time series of financial networks. Such a vector representation would open
the field to the toolbox of standard machine learning algorithms: cluster networks
and find those which are associated with some events (e.g. a crisis); predict the
future networks in a sequence of networks with an LSTM (stat arb?); detect a
structural break, etc.
Section 5
Applications
Portfolio optimization
[OCK+03] finds that the Markowitz portfolio layer in the MST is higher
than the mean layer at all times.
As the stocks of the minimum risk portfolio are found on the outskirts
of the tree [PDMA13, OCK+03], the authors expect larger trees to have
greater diversification potential.
In [TLGM08, PLJ76], the authors compare the Markowitz portfolios obtained
from empirical correlation matrices filtered with the clustering approach,
the RMT approach and the shrinkage approach.
[RLL+16, PZ16] propose to invest in different parts of the MST
depending on the estimated market conditions.
[HMM18] shows that there is no inner-mathematical relationship between
the minimum variance portfolio from Markowitz theory and the
portfolios designed from the minimum spanning tree.
Empirical evidence of such relations found by previous studies is
essentially a stylized fact of financial return correlations and time
series, not a general property of correlation matrices.
[DFPW15] introduces a procedure to design portfolios which are
diversified in their tail behavior by selecting only a single asset in each
cluster.
Trading strategy
Earnings per share forecasts prepared on the basis of statistically
grouped data (clusters) outperform forecasts made on data grouped on
traditional industrial criteria as well as forecasts prepared by mechanical
extrapolation techniques [EG71].
One can build a simple mean-reversion statistical arbitrage strategy
which assumes that stocks in a given industry move together: it
cross-sectionally demeans stock returns within said industry, shorts
stocks with positive residual returns and goes long stocks with negative
residual returns [KY16].
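The [KY16] recipe above can be sketched for a single period. The industry labels could be fundamental classifications or statistical clusters; scaling the weights to unit gross exposure is our assumption, and the sketch ignores transaction costs, holding periods and risk controls. The function name and the toy inputs are hypothetical.

```python
import numpy as np


def industry_mean_reversion_weights(returns: np.ndarray, industry: np.ndarray) -> np.ndarray:
    """Mean-reversion weights from returns demeaned within each industry.

    returns: length-N vector of one period's stock returns.
    industry: length-N integer labels (e.g. statistical clusters).
    Stocks above their industry mean are shorted, those below are bought.
    """
    residual = returns.astype(float).copy()
    for g in np.unique(industry):
        members = industry == g
        residual[members] -= returns[members].mean()  # demean within industry
    weights = -residual  # short positive residuals, long negative ones
    gross = np.abs(weights).sum()
    return weights / gross if gross > 0 else weights  # unit gross exposure


r = np.array([0.02, 0.00, -0.01, 0.03, 0.01])
industry = np.array([0, 0, 0, 1, 1])
w = industry_mean_reversion_weights(r, industry)
print(w)
```

Because residuals sum to zero within each industry, the resulting book is dollar-neutral industry by industry, so it carries little exposure to industry-wide moves.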
[PS15] suggests that tracking the merging, splitting, birth, and
death of the clusters in time could be the basis for pairs-like reversal
trading strategies, but with pairs corresponding to clusters.
The paper [DC05] describes methods for index tracking and enhanced
index tracking based on clusters of financial time series.
[MADM16] finds significant relations between past changes in the
market correlation structure and future changes in the market
volatility.
In [KLT12], the authors claim that long-short strategies exploiting
mispricing due to the industry categorization bias generate statistically
significant and economically sizable risk-adjusted excess returns.
Risk
In [DPT14], the authors design clusters that tend to be comonotonic in
their extreme low values: to avoid contagion in the portfolio during
risky scenarios, an investor should diversify over these clusters.
In [MDMA14], the authors postulate the existence of a hierarchical
structure of risks which can be deemed responsible for both the
multivariate dependency structure of stocks and their univariate
multifractal behaviour, and then propose a model that reproduces the
empirical observations (entanglement of univariate multi-scaling and
multivariate cross-correlation properties of financial time series). The
interplay between multi-scaling and average cross-correlation is confirmed
in [BMDM18].
Clusters (statistical industry classifications) can be an alternative to
"fundamental" industry classifications when those are unavailable (e.g. in
emerging or small markets) [KY16].
[HZYU16] finds that financial institutions which have, in the
correlation networks, greater node strength, larger node betweenness
centrality, larger node closeness centrality and larger node clustering
coefficient tend to be associated with larger systemic risk contributions.
Financial policy making
Clusters and networks can help design financial policies. Several
papers propose to leverage them to detect risky market environments,
develop indicators that can predict forthcoming crises or economic
recovery [ZLW+11], improve economic nowcasting [EFC17], or find key
markets and assets that drive a whole region, and on which stimulus
can be applied effectively.
The authors of [HSBYBY10] claim that "separation prevents failure
propagation and connections increase risks of global crises", whereas the
prevailing view in favor of deregulation is that banks, by investing in
diverse sectors, would have greater stability. To support their argument,
they use financial networks to study the aftermath of the repeal of the
Glass-Steagall Act (1933) by the Clinton administration in 1999. They find
that the erosion of the Glass-Steagall Act and cross-sector investments
eliminated "firewalls" that could have prevented the housing sector
decline from triggering a wider financial and economic crisis:
Our analysis implies that the investment across economic
sectors itself creates increased cross-linking of otherwise
much more weakly coupled parts of the economy, causing
dependencies that increase, rather than decrease, risk.
Section 6
Opinionated views on research directions
Opinionated views on research directions
What's missing for "financial networks" to become a mature research field?
Some inspiration from the booming deep learning era:
lack of reproducibility
provide code and data (at least synthetic datasets)
difficulty comparing methods, re-implementation bias
build open source libraries (standardized API, optimized code)
open source software helps to engage more with practitioners
confidential data
provide synthetic datasets encoding stylized facts
propose generative models (cf. the GAN literature applied to graphs)
lack of evaluation metrics / no end-to-end approach
define common tasks (e.g. evaluate the clustering or network
methodology on portfolio optimization, crisis detection, mean reversion
strategy) where all the details are specified (e.g. a well-chosen artificial
dataset, or samples from a generative model, or public financial data)
Thank you for your attention. Questions?
Co-authorship network (left) and its MST (right)
References I
Tomaso Aste, Tiziana Di Matteo, and ST Hyde, Complex networks on
hyperbolic surfaces, Physica A: Statistical Mechanics and its
Applications 346 (2005), no. 1, 20–26.
Eike Christian Brechmann et al., Hierarchical Kendall copulas and the
modeling of systemic and operational risk, Ph.D. thesis,
Universitätsbibliothek der TU München, 2013.
Stefano Battiston and Michele Catanzaro, Statistical properties of
corporate board and director networks, The European Physical Journal
B 38 (2004), no. 2, 345–352.
Matteo Barigozzi, Giorgio Fagiolo, and Diego Garlaschelli,
Multinetwork of international trade: A commodity-specific analysis,
Physical Review E 81 (2010), no. 4, 046104.
References II
Monica Billio, Mila Getmansky, Andrew W Lo, and Loriana Pelizzon,
Econometric measures of connectedness and systemic risk in the
finance and insurance sectors, Journal of Financial Economics 104
(2012), no. 3, 535–559.
Kestutis Baltakys, Juho Kanniainen, and Frank Emmert-Streib,
Multilayer aggregation with statistical validation: Application to
investor networks, Scientific Reports 8 (2018), no. 1, 8198.
RJ Buonocore, RN Mantegna, and T Di Matteo, On the interplay
between multiscaling and average cross-correlation, arXiv preprint
arXiv:1802.01113 (2018).
Christian Borghesi, Matteo Marsili, and Salvatore Miccichè,
Emergence of time-horizon invariant correlation structure in financial
returns by subtraction of the market mode, Physical Review E 76
(2007), no. 2, 026104.
References III
Eduard Baitinger and Jochen Papenbrock, Interconnectedness risk and
active portfolio management: The information-theoretic perspective.
AQ Barbi and GA Prataviera, Nonlinear dependencies on Brazilian
equity network from mutual information minimum spanning trees,
arXiv preprint arXiv:1711.06185 (2017).
Juan Gabriel Brida and Wiston Adrián Risso, Multidimensional
minimal spanning tree: The Dow Jones case, Physica A: Statistical
Mechanics and its Applications 387 (2008), no. 21, 5205–5210.
Gunnar Carlsson and Facundo Mémoli, Characterization, stability
and convergence of hierarchical clustering methods, Journal of
Machine Learning Research 11 (2010), no. Apr, 1425–1470.
References IV
Christian Dose and Silvano Cincotti, Clustering of financial time series
with application to index and enhanced index tracking portfolio,
Physica A: Statistical Mechanics and its Applications 355 (2005),
no. 1, 145–151.
Fabrizio Durante, Enrico Foscolo, Roberta Pappadà, and Hao Wang,
A portfolio diversification strategy via tail dependence measures.
Philippe Donnat, Gautier Marti, and Philippe Very, Toward a generic
representation of random variables for machine learning, Pattern
Recognition Letters 70 (2016), 24–31.
Fabrizio Durante and Roberta Pappadà, Cluster analysis of time series
via Kendall distribution, Strengthening Links Between Data Analysis
and Soft Computing, Springer, 2015, pp. 209–216.
References V
Fabrizio Durante, Roberta Pappadà, and Nicola Torelli, Clustering of
financial time series in risky scenarios, Advances in Data Analysis and
Classification 8 (2014), no. 4, 359–376.
Mohammed Elshendy and Andrea Fronzetti Colladon, Big data
analysis of economic news: Hints to forecast macroeconomic
indicators, International Journal of Engineering Business Management
9 (2017), 1847979017720040.
Edwin J Elton and Martin J Gruber, Improved forecasting through the
design of homogeneous groups, The Journal of Business 44 (1971),
no. 4, 432–450.
Pawel Fiedor, Information-theoretic approach to lead-lag effect on
financial markets, The European Physical Journal B 87 (2014), no. 8,
1–9.
References VI
Pawel Fiedor, Networks in financial markets based on the mutual
information rate, Physical Review E 89 (2014), no. 5, 052801.
Palash Goyal and Emilio Ferrara, Graph embedding techniques,
applications, and performance: A survey, Knowledge-Based Systems
151 (2018), 78–94.
Yong Kheng Goh, Haslifah M Hasim, and Chris G Antonopoulos,
Inference of financial networks using the normalised mutual
information rate, PLoS ONE 13 (2018), no. 2, e0192160.
Lorenzo Giada and Matteo Marsili, Data clustering and noise
undressing of correlation matrices, Physical Review E 63 (2001),
no. 6, 061101.
Lorenzo Giada and Matteo Marsili, Algorithms of maximum likelihood data clustering with
applications, Physica A: Statistical Mechanics and its Applications
315 (2002), no. 3, 650–664.
References VII
Ya-Chun Gao, Yong Zeng, and Shi-Min Cai, Influence network in the
Chinese stock market, Journal of Statistical Mechanics: Theory and
Experiment 2015 (2015), no. 3, P03017.
Xue Guo, Hu Zhang, and Tianhai Tian, Development of stock
correlation networks using mutual information and financial big data,
PLoS ONE 13 (2018), no. 4, e0195941.
David Hartman and Jaroslav Hlinka, Nonlinearity in stock networks,
arXiv preprint arXiv:1804.10264 (2018).
Amelie Hüttner, Jan-Frederik Mai, and Stefano Mineo, Portfolio
selection based on graphs: Does it align with Markowitz-optimal
portfolios?, Dependence Modeling (2018).
References VIII
Dion Harmon, Blake Stacey, Yavni Bar-Yam, and Yaneer Bar-Yam,
Networks of economic market interdependence and systemic risk,
arXiv preprint arXiv:1011.3707 (2010).
Wei-Qiang Huang, Xin-Tian Zhuang, Shuang Yao, and Stan Uryasev,
A ļ¬nancial network perspective of ļ¬nancial institutionsā€™ systemic risk
contributions, Physica A: Statistical Mechanics and its Applications
456 (2016), 183ā€“196.
Neil F Johnson, Mark McDonald, Omer Suleman, Stacy Williams, and
Sam Howison, What shakes the FX tree? Understanding currency
dominance, dependence, and dynamics (keynote address), SPIE Third
International Symposium on Fluctuations and Noise, International
Society for Optics and Photonics, 2005, pp. 86–99.
References IX
Anton Kocheturov, Mikhail Batsyn, and Panos M Pardalos, Dynamics
of cluster structures in a financial market network, Physica A:
Statistical Mechanics and its Applications 413 (2014), 523–533.
L Kullmann, J Kertesz, and RN Mantegna, Identification of clusters of
companies in stock indices via Potts super-paramagnetic transitions,
Physica A: Statistical Mechanics and its Applications 287 (2000),
no. 3, 412–419.
Philipp Krüger, Augustin Landier, and David Thesmar, Categorization
bias in the stock market, Available at SSRN 2034204 (2012).
Dror Y Kenett, Tobias Preis, Gitit Gur-Gershgoren, and Eshel
Ben-Jacob, Dependency network and node influence: application to
the study of ļ¬nancial markets, International Journal of Bifurcation and
Chaos 22 (2012), no. 07, 1250181.
References X
Dror Y Kenett, Yoash Shapira, Asaf Madi, Sharron Bransburg-Zabary,
Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dynamics of stock market
correlations, AUCO Czech Economic Review 4 (2010), no. 3, 330–341.
Dror Y Kenett, Michele Tumminello, Asaf Madi, Gitit Gur-Gershgoren,
Rosario N Mantegna, and Eshel Ben-Jacob, Dominating clasp of the
ļ¬nancial sector revealed by partial correlation analysis of the stock
market, PloS one 5 (2010), no. 12, e15032.
Zura Kakushadze and Willie Yu, Statistical industry classiļ¬cation.
Gan Siew Lee and Maman A Djauhari, Multidimensional stock
network analysis: An Escouļ¬erā€™s RV coeļ¬ƒcient approach, AIP
Conference Proceedings, vol. 1, 2013, pp. 550ā€“555.
Elisa Letizia and Fabrizio Lillo, Corporate payments networks and
credit risk rating.
References XI
Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and
Mark Flood, Clustering techniques and their effect on portfolio
formation and risk analysis, Proceedings of the International
Workshop on Data Science for Macro-Modeling, ACM, 2014, pp. 1–6.
Nicolò Musmeci, Tomaso Aste, and Tiziana Di Matteo, Relation
between financial market structure and the real economy: comparison
between clustering methods, PLoS ONE 10 (2015), no. 3, e0116201.
Nicolò Musmeci, Tomaso Aste, and T Di Matteo, Interplay between
past market correlation structure changes and future volatility
outbursts, Scientific Reports 6 (2016).
References XII
Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat,
Clustering financial time series: How long is enough?, Proceedings of
the Twenty-Fifth International Joint Conference on Artificial
Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, 2016,
pp. 2583–2589.
Raffaello Morales, T Di Matteo, and Tomaso Aste, Dependency
structure and scaling properties of financial time series are related,
Scientific Reports 4 (2014), no. 4589.
Guido Previde Massara, Tiziana Di Matteo, and Tomaso Aste,
Network filtering for big data: triangulated maximally filtered graph,
Journal of Complex Networks 5 (2016), no. 2, 161–178.
Mel MacMahon and Diego Garlaschelli, Community detection for
correlation matrices, Phys. Rev. X 5 (2015), 021006.
References XIII
Federico Musciotto, Luca Marotta, Salvatore Miccichè, and Rosario N
Mantegna, Bootstrap validation of links of a minimum spanning tree,
arXiv preprint arXiv:1802.03395 (2018).
Gautier Marti, Frank Nielsen, and Philippe Donnat, Optimal copula
transport for clustering multivariate time series, 2016 IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), IEEE, 2016, pp. 2379–2383.
David Matesanz and Guillermo J Ortega, Sovereign public debt crisis
in Europe. A network analysis, Physica A: Statistical Mechanics and its
Applications 436 (2015), 756–766.
References XIV
Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen, A
proposal of a methodological framework with experimental guidelines
to investigate clustering stability on financial time series, 14th IEEE
International Conference on Machine Learning and Applications,
ICMLA 2015, Miami, FL, USA, December 9-11, 2015, 2015,
pp. 32–37.
J-P Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertesz, and
Antti Kanto, Dynamics of market correlations: Taxonomy and
portfolio analysis, Physical Review E 68 (2003), no. 5, 056110.
J-P Onnela, A Chakraborti, K Kaski, and J Kertész, Dynamic asset
trees and portfolio analysis, The European Physical Journal
B-Condensed Matter and Complex Systems 30 (2002), no. 3,
285–288.
References XV
J-P Onnela, Kimmo Kaski, and Janos Kertész, Clustering and
information in correlation based financial networks, The European
Physical Journal B-Condensed Matter and Complex Systems 38
(2004), no. 2, 353–362.
Francesco Pozzi, Tiziana Di Matteo, and Tomaso Aste, Spread of risk
across financial markets: better to invest in the peripheries, Scientific
Reports 3 (2013).
Vasiliki Plerou, P Gopikrishnan, Bernd Rosenow, LA Nunes Amaral,
and H Eugene Stanley, A random matrix theory approach to financial
cross-correlations, Physica A: Statistical Mechanics and its
Applications 287 (2000), no. 3, 374–382.
Don B Panton, V Parker Lessig, and O Maurice Joy, Comovement of
international equity markets: a taxonomic approach, Journal of
Financial and Quantitative Analysis 11 (1976), no. 03, 415–432.
References XVI
Jochen Papenbrock and Peter Schwendner, Handling risk-on/risk-off
dynamics with correlation regimes and correlation networks, Financial
Markets and Portfolio Management 29 (2015), no. 2, 125–147.
Gustavo Peralta and Abalfazl Zareei, A network approach to portfolio
selection, Journal of Empirical Finance (2016).
Fei Ren, Ya-Nan Lu, Sai-Ping Li, Xiong-Fei Jiang, Li-Xin Zhong, and
Tian Qiu, Dynamic portfolio strategy using clustering approach, arXiv
preprint arXiv:1608.03058 (2016).
Jacopo Rocchi, Enoch Yan Lok Tsui, and David Saad, Emerging
interdependence between stock values during financial crashes, arXiv
preprint arXiv:1611.02549 (2016).
References XVII
Won-Min Song, Tiziana Di Matteo, and Tomaso Aste, Nested
hierarchies in planar graphs, Discrete Applied Mathematics 159
(2011), no. 17, 2135–2146.
Won-Min Song, T Di Matteo, and Tomaso Aste, Hierarchical
information clustering by means of topologically embedded graphs,
PLoS ONE 7 (2012), no. 3, e31929.
Ahmet Sensoy and Benjamin M Tabak, Dynamic spanning trees in
stock market networks: The case of Asia-Pacific, Physica A:
Statistical Mechanics and its Applications 414 (2014), 387–402.
Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and
Rosario N Mantegna, Evolution of worldwide stock markets,
correlation structure, and correlation-based graphs, Physical Review E
84 (2011), no. 2, 026108.
References XVIII
Tiziano Squartini, Iman Van Lelyveld, and Diego Garlaschelli,
Early-warning signals of topological collapse in interbank networks,
Scientific Reports 3 (2013).
Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N
Mantegna, A tool for filtering information in complex systems,
Proceedings of the National Academy of Sciences of the United States
of America 102 (2005), no. 30, 10421–10426.
Michele Tumminello, Claudia Coronnello, Fabrizio Lillo, Salvatore
Miccichè, and Rosario N Mantegna, Spanning trees and bootstrap
reliability estimation in correlation-based networks, International
Journal of Bifurcation and Chaos 17 (2007), no. 07, 2319–2329.
Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N
Mantegna, Cluster analysis for portfolio optimization, Journal of
Economic Dynamics and Control 32 (2008), no. 1, 235–258.
References XIX
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna,
Hierarchically nested factor model from multivariate data, EPL
(Europhysics Letters) 78 (2007), no. 3, 30006.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Kullback-Leibler distance as a measure of the information
filtered from multivariate data, Physical Review E 76 (2007), no. 3,
031123.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Correlation, hierarchies, and networks in financial markets,
Journal of Economic Behavior & Organization 75 (2010), no. 1,
40–58.
Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo,
and Rosario N Mantegna, Statistically validated networks in bipartite
complex systems, PLoS ONE 6 (2011), no. 3, e17994.
References XX
Chengyi Tu, Cointegration-based financial networks study in Chinese
stock market, Physica A: Statistical Mechanics and its Applications
402 (2014), 245–254.
Tomáš Výrost, Štefan Lyócsa, and Eduard Baumöhl, Granger causality
stock market networks: Temporal proximity and preferential
attachment, Physica A: Statistical Mechanics and its Applications 427
(2015), 262–276.
Liuren Wu, Centrality of the supply chain network.
Yiting Zhang, Gladys Hui Ting Lee, Jian Cheng Wong, Jun Liang
Kok, Manamohan Prusty, and Siew Ann Cheong, Will the US economy
recover in 2010? A minimal spanning tree study, Physica A: Statistical
Mechanics and its Applications 390 (2011), no. 11, 2020–2050.
References XXI
Xin Zhang, Boris Podobnik, Dror Y Kenett, and H Eugene Stanley,
Systemic risk and causality dynamics of the world international
shipping market, Physica A: Statistical Mechanics and its Applications
415 (2014), 43–53.
A review of two decades of correlations, hierarchies, networks and clustering in financial markets

  • 1. A review of two decades of correlations, hierarchies, networks and clustering in ļ¬nancial markets Ton Duc Thang University, Ho Chi Minh City, Vietnam Gautier Marti, Frank Nielsen, Mikolaj BiĀ“nkowski, Philippe Donnat Ecole Polytechnique, Imperial College London, Hellebore Capital Ltd. 10 August 2018 HELLEBORECAPITAL Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 1 / 64
  • 2. Table of contents 1 Introduction 2 Correlation networks The standard and widely adopted methodology Concerns about the standard methodology Contributions for improving the methodology On algorithms On distances On other methodological aspects 3 Other networks 4 Dynamics of networks 5 Applications 6 Opinionated views on research directions Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 2 / 64
  • 3. Section 1 Introduction Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 3 / 64
  • 4. Introduction Motivation: A better understanding of ļ¬nancial markets using a scientiļ¬c approach. Empirical studies are using data to verify hypotheses and discover stylized facts. Example of datasets: price, volume, returns, turnover time series supply chain networks market (OTC, exchange) transaction data retail transactional data (credit cards) corporate payments networks international trade (import/export) networks, ... Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 4 / 64
  • 5. Introduction. Several research fields tackle the problem with their own tools. Statistical physics, econophysics: Minimum Spanning Tree (MST), Random Matrix Theory (RMT), linear correlations. Statistics, data mining, machine learning: graph theory, community detection, clustering algorithms, non-linear dependence, alternative distances, statistical significance and robustness checks via bootstrapping. Economics, finance, accounting, behavioural finance: standard industry and fundamental classifications vs. statistical and text-based classifications; networks of trades, suppliers, consumers, competitors, investors; linear regressions on network statistics, statistical significance through t-stats.
  • 6. Section 2: Correlation networks
  • 7. The standard and widely adopted methodology (Mantegna, 1999). Let $N$ be the number of assets, $P_i(t)$ the price at time $t$ of asset $i$, $1 \le i \le N$, and $r_i(t)$ the log-return at time $t$ of asset $i$: $r_i(t) = \log P_i(t) - \log P_i(t-1)$. For each pair $i, j$ of assets, compute their correlation $\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{(\langle r_i^2 \rangle - \langle r_i \rangle^2)(\langle r_j^2 \rangle - \langle r_j \rangle^2)}}$, where $\langle \cdot \rangle$ denotes the average over time. Convert the correlation coefficients $\rho_{ij}$ into distances: $d_{ij} = \sqrt{2(1 - \rho_{ij})}$.
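The three steps above (log-returns, Pearson correlation, distance) can be sketched with NumPy as follows. The function name `mantegna_distances` and the simulated prices are illustrative assumptions, not the authors' code. The last lines check a useful property: $d_{ij}$ equals the Euclidean distance between the standardized return series of $i$ and $j$ divided by $\sqrt{T}$, which is why it is a proper metric.

```python
import numpy as np

def mantegna_distances(prices):
    """prices: array of shape (T + 1, N), one column per asset, strictly positive.
    Returns log-returns (T, N), Pearson correlations (N, N) and distances (N, N)."""
    log_returns = np.diff(np.log(prices), axis=0)   # r_i(t) = log P_i(t) - log P_i(t-1)
    rho = np.corrcoef(log_returns, rowvar=False)     # rho_ij computed over the T dates
    # d_ij = sqrt(2 (1 - rho_ij)); clip tiny negatives caused by rounding when rho_ij ~ 1.
    dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))
    return log_returns, rho, dist

# Hypothetical prices: 3 assets sharing a common factor, 250 daily returns.
rng = np.random.default_rng(0)
shocks = rng.normal(0, 0.01, size=(250, 1)) + rng.normal(0, 0.01, size=(250, 3))
prices = 100 * np.exp(np.vstack([np.zeros((1, 3)), np.cumsum(shocks, axis=0)]))

r, rho, d = mantegna_distances(prices)
print(np.round(d, 3))

# d_ij is the Euclidean distance between standardized return series, divided by sqrt(T).
z = (r - r.mean(axis=0)) / r.std(axis=0)
d_alt = np.linalg.norm(z[:, 0] - z[:, 1]) / np.sqrt(len(r))
print(np.isclose(d[0, 1], d_alt))   # True
```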
  • 8. The standard and widely adopted methodology. From all the distances d_ij, compute a minimum spanning tree (MST) using, for example, Algorithm 1 (Kruskal's algorithm):
      procedure BuildMST({d_ij}, 1 ≤ i, j ≤ N)
          Start with a fully disconnected graph G = (V, E):
          E ← ∅
          V ← {i}, 1 ≤ i ≤ N
          Try to add edges by increasing distances:
          for (i, j) ∈ V² ordered by increasing d_ij do
              Verify that i and j are not already connected by a path:
              if not connected(i, j) then
                  Add the edge (i, j) to connect i and j: E ← E ∪ {(i, j)}
          G is the resulting MST: return G = (V, E)
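A runnable Python version of Algorithm 1 is sketched below. Instead of a generic connected(i, j) path search, it uses a union-find structure, a standard way to test in near-constant time whether i and j already lie in the same component. The function name `kruskal_mst` and the 4-asset distance matrix are hypothetical.

```python
def kruskal_mst(n, dist):
    """Minimum spanning tree of the complete graph on nodes 0..n-1.

    dist: n x n symmetric matrix (list of lists or array) of distances d_ij.
    Returns the MST as a list of edges (i, j, d_ij).
    """
    parent = list(range(n))  # union-find forest: every node starts as its own component

    def find(x):
        # Follow parent links to the component root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairs i < j, ordered by increasing distance.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    mst = []
    for d_ij, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # i and j are not yet connected by a path
            parent[root_i] = root_j   # merge the two components
            mst.append((i, j, d_ij))
            if len(mst) == n - 1:     # a spanning tree on n nodes has n - 1 edges
                break
    return mst

d = [[0.0, 0.5, 1.2, 1.4],
     [0.5, 0.0, 1.1, 1.3],
     [1.2, 1.1, 0.0, 0.7],
     [1.4, 1.3, 0.7, 0.0]]
print(kruskal_mst(4, d))  # [(0, 1, 0.5), (2, 3, 0.7), (1, 2, 1.1)]
```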
  • 9. The standard and widely adopted methodology
  • 10. Concerns about the standard methodology. The clusters obtained from the MST (or, equivalently, from the Single Linkage Clustering Algorithm (SLCA)) are known to be unstable: small perturbations of the input data may cause big differences in the resulting clusters [MVDN15]. The clustering instability may be partly due to the algorithm (MST/Single Linkage are known for the chaining phenomenon [CM10]). It may also be partly due to the correlation coefficient (Pearson linear correlation) defining the distance, which is known to be brittle to outliers and, more generally, not well suited to non-Gaussian distributions [DMV16].
  • 11. The Single Linkage chaining problem... makes it brittle to small perturbations in the input distances. Clusters and hierarchies are skewed: single linkage does not take into account any notion of density.
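The chaining phenomenon can be reproduced with a tiny, deliberately naive agglomerative clustering in pure Python (quadratic cost per merge, for illustration only; `merge_heights` and the points are hypothetical). Two tight groups whose closest members are 4.8 apart are joined by a bridge of evenly spaced points: single linkage never merges at a height above the largest gap along the chain (1.0), whereas average linkage, the criterion behind ALCA, merges the two sides only at a much larger height.

```python
from itertools import combinations

def merge_heights(points, linkage):
    """Naive agglomerative clustering of 1-D points; returns the merge heights in order.
    linkage: "single" (minimum pairwise distance) or "average" (mean pairwise distance)."""
    def link(a, b):
        dists = [abs(p - q) for p in a for q in b]
        return min(dists) if linkage == "single" else sum(dists) / len(dists)

    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Merge the pair of clusters that is closest under the chosen linkage.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        heights.append(link(clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return heights

# Hypothetical data: two tight groups 4.8 apart, joined by a "bridge" of points.
points = [0.0, 0.1, 0.2, 1.0, 2.0, 3.0, 4.0, 5.0, 5.1, 5.2]
print(max(merge_heights(points, "single")))   # 1.0: the largest gap along the chain
print(max(merge_heights(points, "average")))  # much larger final merge height
```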
  • 12. Pearson linear correlation... is too sensitive to outliers.
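A quick illustration on simulated data (helper names and data are assumptions): adding a single extreme, opposite-signed observation to 500 correlated Gaussian pairs is enough to turn the Pearson coefficient negative, while Spearman's rank correlation, computed as Pearson on the ranks, barely moves.

```python
import numpy as np

def ranks(x):
    # Ranks 0..n-1 (no ties assumed, which holds almost surely for continuous data).
    return np.argsort(np.argsort(x)).astype(float)

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    # Spearman's rho is Pearson's correlation computed on the ranks.
    return pearson(ranks(x), ranks(y))

rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 0.8 * x + 0.6 * rng.normal(size=500)   # population correlation 0.8

x_out, y_out = x.copy(), y.copy()
x_out[0], y_out[0] = 50.0, -50.0            # one extreme, opposite-signed observation

print(pearson(x, y), pearson(x_out, y_out))    # the single outlier flips the sign
print(spearman(x, y), spearman(x_out, y_out))  # rank correlation barely moves
```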
  • 13. Concerns about the standard methodology. Theoretical results establishing the statistical reliability of hierarchical trees and correlation-based networks are still not available [TLM10]. One might expect that the higher the correlation associated with a link in a correlation-based network, the more reliable that link is. In [TCL+07], the authors show that this is not always observed empirically. Changes affecting specific links (and clusters) during prominent crises are difficult to interpret due to the high level of statistical uncertainty associated with correlation estimation [STZM11]. The standard method is somewhat arbitrary: a change in the method (e.g. using a different clustering algorithm or a different correlation coefficient) may yield a huge change in the clustering results [LRW+14, MVDN15]. As a consequence, it implies huge variability in portfolio formation and perceived risk [LRW+14].
  • 14. Variance of the Pearson correlation estimator
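For i.i.d. bivariate Gaussian returns with correlation $\rho$, a classical asymptotic result gives $\operatorname{Var}(\hat\rho) \approx (1-\rho^2)^2/T$ for the Pearson estimator on $T$ observations. The Monte Carlo sketch below compares it with simulations; the sample sizes and the helper name `pearson_estimates` are illustrative choices. Under these idealized assumptions, with $T = 500$ the standard error is about 0.044 for $\rho = 0.1$ but only about 0.0085 for $\rho = 0.9$.

```python
import numpy as np

def pearson_estimates(rho, T, n_rep, seed=0):
    """Pearson correlation estimated on n_rep independent Gaussian samples of length T."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=(n_rep, T))  # shape (n_rep, T, 2)
    x = xy[..., 0] - xy[..., 0].mean(axis=1, keepdims=True)
    y = xy[..., 1] - xy[..., 1].mean(axis=1, keepdims=True)
    return (x * y).sum(axis=1) / np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum(axis=1))

T = 500
for rho in (0.1, 0.5, 0.9):
    est = pearson_estimates(rho, T, n_rep=2000)
    print(rho, est.var(), (1 - rho ** 2) ** 2 / T)  # Monte Carlo vs asymptotic variance
```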
  • 15. Cramér-Rao lower bound (CRLB) of the Pearson correlation estimator - Proof
  • 16. Random Matrix Theory & Empirical correlation matrices. Let $X$ be the $N \times T$ matrix storing the standardized returns of $N = 560$ assets (credit default swaps) over a period of $T = 2500$ trading days. Then the empirical correlation matrix of the returns is $C = \frac{1}{T} X X^\top$. We can compute the empirical density of its eigenvalues, $\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda}$, where $n(\lambda)$ counts the number of eigenvalues of $C$ less than $\lambda$. From random matrix theory, the Marchenko-Pastur distribution gives the limit distribution as $N \to \infty$, $T \to \infty$ with $T/N$ fixed. It reads $\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}$, where $\lambda_{\max,\min} = 1 + N/T \pm 2\sqrt{N/T}$ and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.
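The Marchenko-Pastur edges and density can be checked numerically on pure noise with the slide's sizes, N = 560 and T = 2500 (simulated Gaussian data, not the CDS dataset). With q = N/T, the edges $1 + q \pm 2\sqrt{q}$ equal $(1 \pm \sqrt{q})^2$; the density integrates to 1, and the sample eigenvalues stay close to $[\lambda_{\min}, \lambda_{\max}]$ when there is no genuine correlation. The helper name `marchenko_pastur` is hypothetical.

```python
import numpy as np

def marchenko_pastur(lam, q):
    """Marchenko-Pastur density for q = N/T < 1 (unit-variance entries).
    Returns the density evaluated at lam and the edges (lam_min, lam_max)."""
    lam = np.asarray(lam, dtype=float)
    lam_min, lam_max = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2  # = 1 + q -/+ 2 sqrt(q)
    dens = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    dens[inside] = ((1 / q) / (2 * np.pi)
                    * np.sqrt((lam_max - lam[inside]) * (lam[inside] - lam_min))
                    / lam[inside])
    return dens, lam_min, lam_max

N, T = 560, 2500                         # sizes quoted on the slide
rng = np.random.default_rng(0)
X = rng.normal(size=(N, T))              # pure noise: no genuine correlation
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
C = X @ X.T / T                          # empirical correlation matrix, N x N
eigvals = np.linalg.eigvalsh(C)

grid = np.linspace(0.0, 3.0, 30001)
dens, lam_min, lam_max = marchenko_pastur(grid, N / T)
print(round(lam_min, 3), round(lam_max, 3))   # 0.277 2.171
print(dens.sum() * (grid[1] - grid[0]))       # Riemann sum, close to 1
print(eigvals.min(), eigvals.max())           # close to lam_min and lam_max
```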
  • 17. Random Matrix Theory & Empirical correlation matrices. Notice that the Marchenko-Pastur density fits the empirical density well, meaning that most of the information contained in the empirical correlation matrix amounts to noise: only 26 eigenvalues are greater than $\lambda_{\max}$. The highest eigenvalue corresponds to the 'market'; the 25 others can be associated with 'industrial sectors'. This is a known stylized fact of empirical correlation matrices of financial returns: only ≈ 5% of their eigenvalues are greater than $\lambda_{\max}$.
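To see how a 'market' eigenvalue and 'sector' eigenvalues separate from the noise band, here is a toy market-plus-sectors model; the loadings, sizes and number of sectors are assumptions chosen for illustration. With K equal-sized sectors, the population correlation matrix has exactly K eigenvalues far from the bulk (one market mode and K − 1 directions contrasting sectors), so K sample eigenvalues end up above $\lambda_{\max}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 100, 1000, 4                     # hypothetical: 100 assets, 1000 days, 4 sectors
sector = np.repeat(np.arange(K), N // K)   # 25 assets per sector

market = rng.normal(size=T)
sector_factors = rng.normal(size=(K, T))
noise = rng.normal(size=(N, T))
# Toy model (assumption): market factor + sector factor + idiosyncratic noise.
returns = 0.5 * market + 0.5 * sector_factors[sector] + noise   # shape (N, T)

X = (returns - returns.mean(axis=1, keepdims=True)) / returns.std(axis=1, keepdims=True)
eigvals = np.linalg.eigvalsh(X @ X.T / T)
lam_max = (1 + np.sqrt(N / T)) ** 2        # Marchenko-Pastur upper edge, about 1.73

above = np.sort(eigvals[eigvals > lam_max])[::-1]
print(len(above))                           # 4
print(np.round(above, 2))                   # one large 'market' value, then K - 1 smaller ones
```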
  • 18. A somewhat arbitrary choice of methodology. The standard method is somewhat arbitrary. Adopting another one may yield strongly different results. Which ones to trust? Are they both useful? The clusters obtained differ greatly from one method to another.
  • 19. Contributions on algorithms. Several alternative algorithms have been proposed to replace the minimum spanning tree and its corresponding clusters: Average Linkage Minimum Spanning Tree (ALMST) [TCL+07], a spanning tree associated with the Average Linkage Clustering Algorithm (ALCA), designed to remedy the unwanted chaining phenomenon of MST/SLCA; Planar Maximally Filtered Graph (PMFG) [ADMH05, TADMM05], which strictly contains the Minimum Spanning Tree (MST) but encodes a larger amount of information in its internal structure; Directed Bubble Hierarchical Tree (DBHT) [SDMA11, SDMA12], designed to extract, without parameters, the deterministic clusters from the PMFG; Triangulated Maximally Filtered Graph (TMFG) [MDMA16], another filtered graph, more suitable for big datasets.
  • 20. Contributions on algorithms (cont'd). Clustering using Potts super-paramagnetic transitions [KKM00]: when anti-correlations occur, the model creates repulsion between the stocks, which modifies their clustering structure. Clustering using maximum likelihood [GM01, GM02]: the authors define the likelihood of a clustering based on a simple 1-factor model, then devise parameter-free methods to find a clustering with high likelihood. Clustering using Random Matrix Theory (RMT) [PGR+00]: eigenvalues help to determine the number of clusters, and eigenvectors their composition. [MG15] proposes network-based community detection methods whose null hypothesis is consistent with RMT results on cross-correlation matrices for financial time series data, unlike existing community detection algorithms. Clustering using the p-median problem [KBP14]: with this construction, every cluster is a star, i.e. a tree with one central node.
  • 21. Planar Maximally Filtered Graph (PMFG). The PMFG is a compelling alternative to the MST. (Figure: PMFG whose nodes are colored according to the clusters obtained from DBHT.) Implementation of the PMFG in Python: https://gmarti.gitlab.io/networks/2018/06/03/pmfg-algorithm.html
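A compact sketch of the PMFG construction, assuming the third-party NetworkX library (`nx.check_planarity`, `nx.maximum_spanning_tree`): scan pairs by decreasing correlation and keep an edge only if the graph remains planar, until it reaches 3(N − 2) edges. This naive version re-tests planarity for every candidate edge; it is not the linked implementation, which may differ in details and performance. The last check illustrates that the MST, built on $d_{ij} = \sqrt{2(1-\rho_{ij})}$ and hence equal to the maximum spanning tree on $\rho_{ij}$, is contained in the PMFG. The data and the name `pmfg` are hypothetical.

```python
import itertools

import networkx as nx
import numpy as np

def pmfg(corr):
    """Greedy PMFG sketch: scan edges by decreasing correlation and keep an edge only
    if the graph stays planar, until the graph has 3(N - 2) edges."""
    n = corr.shape[0]
    edges = sorted(itertools.combinations(range(n), 2),
                   key=lambda e: corr[e[0], e[1]], reverse=True)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i, j in edges:
        G.add_edge(i, j, weight=corr[i, j])
        is_planar, _ = nx.check_planarity(G)
        if not is_planar:
            G.remove_edge(i, j)             # this edge would break planarity: reject it
        if G.number_of_edges() == 3 * (n - 2):
            break                           # a maximal planar graph on n nodes has 3n - 6 edges
    return G

# Hypothetical data: 15 assets sharing a common factor, 300 observations.
rng = np.random.default_rng(3)
returns = rng.normal(size=(300, 15)) + rng.normal(size=(300, 1))
corr = np.corrcoef(returns, rowvar=False)
G = pmfg(corr)

# The MST on d_ij = sqrt(2(1 - rho_ij)) is the maximum spanning tree on rho_ij.
complete = nx.Graph()
for i, j in itertools.combinations(range(15), 2):
    complete.add_edge(i, j, weight=corr[i, j])
mst = nx.maximum_spanning_tree(complete)
print(G.number_of_edges())                                                   # 39 = 3 * (15 - 2)
print(set(map(frozenset, mst.edges())) <= set(map(frozenset, G.edges())))   # MST inside PMFG
```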
  • 22. Contributions on distances. At the heart of clustering algorithms is the fundamental notion of distance, which can be defined upon a proper representation of the data. It is thus an obvious direction to explore. We list below what has been proposed in the literature so far. Distances that try to quantify how much information one financial instrument provides about another: distance using Granger causality [BGLP12]; distance using partial correlation [KTM+10]; study of asynchronous, lead-lag relationships using mutual information instead of Pearson's correlation coefficient [Fie14a, RTS16]; the correlation matrix normalized using the affinity transformation, where the correlation between each pair of stocks is normalized according to the correlations of each of the two stocks with all other stocks [KSM+10].
  • 23. Contributions on distances (cont'd). Distances that aim at including non-linear relationships in the analysis: distances using mutual information, mutual information rate, and other information-theoretic distances [Fie14b, RTS16, BP17a, BP17b, GHA18, GZT18]; the Brownian distance [ZPKS14]; copula-based [MND16, DP15, B+13] and tail dependence [DFPW15] distances. Distances that aim at taking into account multivariate dependence: each stock is represented by a bivariate time series of its returns and traded volumes [BR08], and a distance is then applied to an ad hoc transform of the two time series into a symbolic sequence; each stock is represented by a multivariate time series, for example the daily (high, low, open, close) [LD13], and the authors use Escoufier's RV coefficient (a multivariate extension of Pearson's correlation coefficient); a distance taking into account both the correlation between returns and their distributions [DMV16].
  • 24. Contributions on distances (cont'd). Unlike recent studies which claim that nonlinear dependence between stock returns has effects on network characteristics, [HH18] documents that "most of the apparent nonlinearity is due to univariate non-Gaussianity. Further, strong non-stationarity in a few specific stocks may play a role. In particular, the sharp decrease of some stocks during the global financial crisis in 2008" gives rise to apparent negative tail dependence among stocks. When constructing unweighted stock networks, they suggest using linear correlation "on marginally normalized data", that is, Spearman's rank correlation. This is similar to the idea of splitting the dependence information apart from the distribution information as in [DMV16], where Spearman's rank correlation stems from using a Euclidean distance between the uniform margins of the underlying bivariate copula. Following previous studies, and unlike in [DMV16], the distribution information is discarded when constructing the network.
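The link between Spearman's rank correlation and a Euclidean distance between uniform margins can be made concrete. With empirical margins $u_t = R_t/n$ and $v_t = S_t/n$ (ranks divided by the sample size, no ties), $\rho_S = 1 - \frac{6n}{n^2-1}\sum_t (u_t - v_t)^2$. The sketch below (simulated data; helper names are hypothetical) checks this identity against Pearson on ranks and shows that a strictly increasing distortion of one margin leaves $\rho_S$ unchanged, while Pearson's coefficient changes.

```python
import numpy as np

def uniform_margins(x):
    """Empirical probability integral transform: ranks 1..n divided by n (assumes no ties)."""
    return (np.argsort(np.argsort(x)) + 1) / len(x)

def spearman_from_margins(u, v):
    """Spearman's rho from the squared Euclidean distance between uniform margins."""
    n = len(u)
    return 1 - 6 * n * np.sum((u - v) ** 2) / (n ** 2 - 1)

rng = np.random.default_rng(7)
n = 1000
x = rng.standard_t(df=3, size=n)           # heavy-tailed returns (hypothetical)
z = 0.7 * x + 0.5 * rng.normal(size=n)
y = np.exp(z)                              # strictly increasing distortion of z

u, v = uniform_margins(x), uniform_margins(y)
rho_s = spearman_from_margins(u, v)
print(np.isclose(rho_s, np.corrcoef(u, v)[0, 1]))                        # Spearman = Pearson on ranks
print(np.isclose(rho_s, spearman_from_margins(u, uniform_margins(z))))  # margin-free
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x, z)[0, 1])                 # Pearson is not
```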
  • 25. Dependence and marginal distribution of the returns. Theorem (Sklar's theorem, 1959). For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cumulative distribution functions $F_i$, its joint cumulative distribution $F$ is uniquely expressed as $F(x_1, \ldots, x_N) = C(F_1(x_1), \ldots, F_N(x_N))$, where $C$, a multivariate distribution with uniform marginals, is known as the copula of $X$.
  • 26. Information-theoretic distances vs. copula-based ones? Copula entropy: $H_c(x) = -\int_u c(u) \log c(u) \, du$. Mutual information, using $p(x) = c(u_x) \prod_i p_i(x_i)$ with $u_x = (F_1(x_1), \ldots, F_N(x_N))$ and $du_x = \prod_i p_i(x_i) \, dx$: $I(x) = \int_x p(x) \log \frac{p(x)}{\prod_i p_i(x_i)} \, dx = \int_x c(u_x) \prod_i p_i(x_i) \log c(u_x) \, dx = \int_u c(u_x) \log c(u_x) \, du_x = -H_c(x)$. Entropy: $H(x) = \sum_i H(x_i) + H_c(x)$.
  • 27. Contributions on other methodological aspects. Reliability and statistical uncertainty of the methods: a bootstrap approach is used to estimate the statistical reliability of both hierarchical trees [TLM07a, MAND16] and correlation-based networks [TCL+07, MMMM18]; consistency proof of clustering algorithms for recovering clusters defined by nested block correlation matrices, and study of empirical convergence rates [MAND16]; the Kullback-Leibler divergence is used to estimate the amount of filtered information between the sample correlation matrix and the filtered one [TLM07b]; the cophenetic correlation is used between the original correlation distances and the hierarchical cluster representation [PS15]; several measures between successive (in time) clusters, dendrograms and networks are used to estimate the stability of the methods, e.g. cophenetic correlation between dendrograms in [PLJ76], adjusted Rand index (ARI) between clusters in [MVDN15], mutual information (MI) of link co-occurrence between networks in [STZM11].
  • 28. Contributions on other methodological aspects (cont'd). Preprocessing of the time series: subtract the market mode before performing a cluster or network analysis on the returns [BMM07]; encode both rank statistics and a distribution histogram of the returns into a representative vector [DMV16]; fit an ARMA(p,q)-FIEGARCH(1,d,1)-cDCC process (econometric preprocessing) to obtain dynamic correlations instead of the common approach of rolling-window Pearson correlations [ST14]; use a clustering of successive correlation matrices to infer a market state [PS15]. Use of other types of networks: threshold networks [OKK04], influence networks [GZC15], partial-correlation networks [KTM+10, KPGGBJ12], Granger causality networks [BGLP12, VLB15], cointegration-based networks [Tu14], bipartite networks [TML+11], etc.
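One simple way to subtract the market mode, assumed here for illustration (the cited works may define the market mode differently, e.g. via the top eigenvector of the correlation matrix), is to regress each asset's returns on the equal-weighted market return with an intercept and keep the residuals. On simulated data with a strong common factor, the average pairwise correlation drops from about 0.5 to roughly zero. The function names are hypothetical.

```python
import numpy as np

def remove_market_mode(returns):
    """Regress each asset on the equal-weighted market return (with intercept)
    and return the residuals. returns: array of shape (T, N)."""
    market = returns.mean(axis=1)
    design = np.column_stack([np.ones_like(market), market])   # (T, 2)
    beta, *_ = np.linalg.lstsq(design, returns, rcond=None)    # (2, N)
    return returns - design @ beta

def mean_offdiag_corr(r):
    # Average pairwise correlation, excluding the unit diagonal.
    c = np.corrcoef(r, rowvar=False)
    return c[~np.eye(c.shape[0], dtype=bool)].mean()

rng = np.random.default_rng(5)
T, N = 500, 30
returns = rng.normal(size=(T, 1)) + rng.normal(size=(T, N))   # strong common mode
resid = remove_market_mode(returns)

print(mean_offdiag_corr(returns), mean_offdiag_corr(resid))
```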
  • 29. Consistency and empirical convergence rates [MAND16]. Model selection: the faster the (empirical) convergence, the better.
  • 30. Statistical & practical stability. One can use the bootstrap, the block bootstrap, or other common-sense, practical perturbations of the data, as presented in [MVDN15].
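As a sketch of such a stability check (hypothetical helper names, simulated data), one can resample days, rebuild the MST each time and record how often each edge reappears, in the spirit of the bootstrap validation of trees and networks cited earlier. The `block` option implements a moving-block bootstrap, which keeps runs of consecutive days together to preserve some serial dependence; without it, days are resampled independently.

```python
import numpy as np

def mst_edges(dist):
    """Prim's algorithm on a dense distance matrix; returns MST edges as (i, j) with i < j."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from each node to the tree
    parent = np.zeros(n, dtype=int)  # tree node achieving that cheapest link
    edges = set()
    for _ in range(n - 1):
        k = int(np.argmin(np.where(in_tree, np.inf, best)))
        p = int(parent[k])
        edges.add((min(k, p), max(k, p)))
        in_tree[k] = True
        closer = dist[k] < best
        best = np.where(closer, dist[k], best)
        parent = np.where(closer, k, parent)
    return edges

def bootstrap_edge_frequency(returns, n_boot=200, block=None, seed=0):
    """Fraction of bootstrap replicas in which each MST edge appears. returns: (T, N)."""
    T = returns.shape[0]
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_boot):
        if block is None:
            idx = rng.integers(0, T, size=T)                     # i.i.d. resampling of days
        else:
            starts = rng.integers(0, T - block + 1, size=-(-T // block))
            idx = np.concatenate([np.arange(s, s + block) for s in starts])[:T]
        rho = np.corrcoef(returns[idx], rowvar=False)
        dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))
        for edge in mst_edges(dist):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_boot for edge, c in counts.items()}

# Hypothetical data: assets 0-1 and 2-3 share factors, assets 4-5 are pure noise.
rng = np.random.default_rng(11)
T = 250
f1, f2 = rng.normal(size=(2, T))
noise = 0.3 * rng.normal(size=(T, 6))
returns = np.column_stack([f1, f1, f2, f2, np.zeros(T), np.zeros(T)]) + noise
freq = bootstrap_edge_frequency(returns, block=20)
print(sorted(freq.items(), key=lambda kv: -kv[1])[:3])
```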
  • 31. Effect of a basic preprocessing step: subtracting the market mode. Visualization of the Planar Maximally Filtered Graph (PMFG) and DBHT clusters, for both non-detrended (left) and detrended (right) log-returns [MADM15].
  • 32. Section 3: Other networks
  • 33. Examples of other financial networks: supply chain networks [Wu15]; investor (security holdings and trading behaviour) networks [BKES18]; corporate board and director networks [BC04]; international trade networks [BFG10]; transaction networks [LL18]; sovereign debt (quarterly public debt-to-GDP ratio) networks [MO15]; interbank (exposures between banks) networks [SVLG13]. These networks are built from alternative data, which are often confidential, and hard or costly to obtain. Most often these studies are done in collaboration with a commercial or regulatory organization. Some of these datasets may contain significant alpha, and thus results are not publicly advertised: papers are relatively few in contrast to those on the correlation of asset returns, which are more oriented toward understanding risk.
  • 34. Section 4: Dynamics of networks
  • 35. Studying the dynamics of networks. Comparing and finding the differences in a sequence of large graphs is a computationally difficult problem. In the literature, one often studies the following statistics. For networks: the normalized tree length [OCK+03], the mean occupation layer [OCK+03], the tree half-life [OCK+03], a survival ratio of the edges [OCKK02, JMS+05, ST14], node degree and strength [ST14], eigenvector, betweenness and closeness centrality [ST14], the agglomerative coefficient [MO15]. For clusters: the merging, splitting, birth, death, contraction and growth of the clusters in time [PS15]. Remark. To the best of my knowledge, graph embeddings into vector spaces (cf. the recent deep learning literature, or the survey [GF18]) have not been used to study time series of financial networks. Such a vector representation would open the field to the toolbox of standard machine learning algorithms: cluster networks and find those associated with some events (e.g. a crisis); predict the future networks in a sequence of networks with an LSTM (stat arb?); detect a structural break, etc.
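Two of these statistics are easy to compute once the trees are available. Following our reading of [OCK+03, OCKK02], the normalized tree length is the mean distance over the N − 1 tree edges, $L(t) = \frac{1}{N-1}\sum_{(i,j) \in E(t)} d_{ij}(t)$, and the single-step survival ratio is the fraction of edges shared by two consecutive trees, $\sigma(t) = \frac{1}{N-1}|E(t) \cap E(t-1)|$. The trees, distances and function names below are made up for illustration.

```python
def normalized_tree_length(tree_edges, dist):
    """Mean distance over the N - 1 edges of a spanning tree."""
    return sum(dist[i][j] for i, j in tree_edges) / len(tree_edges)

def survival_ratio(edges_prev, edges_next):
    """Fraction of the N - 1 tree edges present at both consecutive dates."""
    def as_set(edges):
        return {frozenset(e) for e in edges}  # (i, j) and (j, i) are the same edge
    prev, nxt = as_set(edges_prev), as_set(edges_next)
    return len(prev & nxt) / len(prev)

# Hypothetical MSTs on 5 assets at two consecutive dates.
tree_t0 = [(0, 1), (1, 2), (2, 3), (3, 4)]
tree_t1 = [(1, 0), (1, 2), (2, 4), (4, 3)]
dist_t0 = [[0.0, 0.6, 1.0, 1.2, 1.3],
           [0.6, 0.0, 0.8, 1.1, 1.2],
           [1.0, 0.8, 0.0, 0.9, 1.0],
           [1.2, 1.1, 0.9, 0.0, 0.7],
           [1.3, 1.2, 1.0, 0.7, 0.0]]
print(normalized_tree_length(tree_t0, dist_t0))  # mean of 0.6, 0.8, 0.9 and 0.7
print(survival_ratio(tree_t0, tree_t1))          # edges {0,1}, {1,2}, {3,4} survive: 0.75
```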
  • 36. Section 5: Applications
  • 37. Portfolio optimization. [OCK+03] finds that the layer of the Markowitz portfolio in the MST is higher than the mean layer at all times. As the stocks of the minimum risk portfolio are found on the outskirts of the tree [PDMA13, OCK+03], the authors expect larger trees to have greater diversification potential. In [TLGM08, PLJ76], the authors compare Markowitz portfolios built from filtered empirical correlation matrices using the clustering approach, the RMT approach and the shrinkage approach. [RLL+16, PZ16] propose to invest in different parts of the MST depending on the estimated market conditions. The authors of [HMM18] show that there is no inner-mathematical relationship between the minimum variance portfolio from Markowitz theory and the portfolios designed from the minimum spanning tree. Empirical evidence of such relations found by previous studies is essentially a stylized fact of financial return correlations and time series, not a general property of correlation matrices. [DFPW15] introduces a procedure to design portfolios that are diversified in their tail behavior by selecting only a single asset in each cluster.
  • 38. Trading strategy. Earnings per share forecasts prepared on the basis of statistically grouped data (clusters) outperform forecasts made on data grouped by traditional industrial criteria, as well as forecasts prepared by mechanical extrapolation techniques [EG71]. One can build a simple mean-reversion statistical arbitrage strategy that assumes stocks in a given industry move together: cross-sectionally demean stock returns within each industry, short stocks with positive residual returns and go long stocks with negative residual returns [KY16]. In [PS15], the authors suggest that tracking the merging, splitting, birth and death of clusters in time could be the basis for pairs-like reversal trading strategies, with pairs corresponding to clusters. The paper [DC05] describes methods for index tracking and enhanced index tracking based on clusters of financial time series. [MADM16] finds significant relations between past changes in the market correlation structure and future changes in market volatility. In [KLT12], the authors claim that long-short strategies exploiting mispricing due to the industry categorization bias generate statistically significant and economically sizable risk-adjusted excess returns.
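The within-industry mean-reversion rule described above can be sketched in a few lines of NumPy. The scaling to unit gross exposure, the function name and the toy returns are assumptions for illustration; [KY16] may size positions differently. By construction, the weights are dollar-neutral within each industry.

```python
import numpy as np

def industry_mean_reversion_weights(returns, industry):
    """returns: (N,) last-period returns; industry: (N,) integer industry labels.
    Demean returns within each industry, go long negative residuals and short
    positive ones, then scale to unit gross exposure."""
    residual = returns.astype(float).copy()
    for g in np.unique(industry):
        mask = industry == g
        residual[mask] -= residual[mask].mean()   # cross-sectional demeaning per industry
    weights = -residual                           # bet on reversal of the residual move
    gross = np.abs(weights).sum()
    return weights / gross if gross > 0 else weights

r = np.array([0.02, -0.01, 0.005, 0.03, 0.01, -0.02])
ind = np.array([0, 0, 0, 1, 1, 1])
w = industry_mean_reversion_weights(r, ind)
print(np.round(w, 3))
```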
  • 39. Risk. In [DPT14], the authors design clusters that tend to be comonotonic in their extreme low values: to avoid contagion in the portfolio during risky scenarios, an investor should diversify over these clusters. In [MDMA14], the authors postulate the existence of a hierarchical structure of risks which can be deemed responsible for both the multivariate dependency structure of stocks and their univariate multifractal behaviour, and then propose a model that reproduces the empirical observations (entanglement of the univariate multi-scaling and multivariate cross-correlation properties of financial time series). The interplay between multi-scaling and average cross-correlation is confirmed in [BMDM18]. Clusters (statistical industry classification) can be an alternative to sometimes unavailable "fundamental" industry classifications (e.g. in emerging or small markets) [KY16]. [HZYU16] finds that financial institutions which have, in the correlation networks, greater node strength, larger node betweenness centrality, larger node closeness centrality and a larger node clustering coefficient tend to be associated with larger systemic risk contributions.
  • 40. Financial policy making. Clusters and networks can help design financial policies. Several papers propose to leverage them to detect risky market environments, develop indicators that can predict forthcoming crises or economic recovery [ZLW+11], improve economic nowcasting [EFC17], or find key markets and assets that drive a whole region and on which stimulus can be applied effectively. The authors of [HSBYBY10] claim that "separation prevents failure propagation and connections increase risks of global crises", whereas the prevailing view in favor of deregulation is that banks, by investing in diverse sectors, would have greater stability. To support their argument, they use financial networks to study the aftermath of the 1999 repeal of the Glass-Steagall Act (1933) by the Clinton administration. They find that the erosion of the Glass-Steagall Act and cross-sector investments eliminated "firewalls" that could have prevented the housing sector decline from triggering a wider financial and economic crisis: "Our analysis implies that the investment across economic sectors itself creates increased cross-linking of otherwise much more weakly coupled parts of the economy, causing dependencies that increase, rather than decrease, risk."
  • 41. Section 6: Opinionated views on research directions
  • 42. Opinionated views on research directions. What's missing for "financial networks" to become a mature research field? Some inspiration from the booming deep learning era. Lack of reproducibility: provide code and data (at least synthetic datasets). Difficulty comparing methods, re-implementation bias: build open source libraries (standardized API, optimized code); open source software also helps to engage more with practitioners. Confidential data: provide synthetic datasets encoding stylized facts; propose generative models (cf. the GAN literature applied to graphs). Lack of evaluation metrics / no end-to-end approach: define common tasks (e.g. evaluate the clustering or network methodology on portfolio optimization, crisis detection, or a mean-reversion strategy) where all the details are specified (e.g. a well-chosen artificial dataset, samples from a generative model, or public financial data).
  • 43. Thank you for your attention. Questions? (Figure: co-authorship network (left) and its MST (right).)
  • 44. References I. Tomaso Aste, Tiziana Di Matteo, and ST Hyde, Complex networks on hyperbolic surfaces, Physica A: Statistical Mechanics and its Applications 346 (2005), no. 1, 20–26. Eike Christian Brechmann et al., Hierarchical Kendall copulas and the modeling of systemic and operational risk, Ph.D. thesis, Universitätsbibliothek der TU München, 2013. Stefano Battiston and Michele Catanzaro, Statistical properties of corporate board and director networks, The European Physical Journal B 38 (2004), no. 2, 345–352. Matteo Barigozzi, Giorgio Fagiolo, and Diego Garlaschelli, Multinetwork of international trade: A commodity-specific analysis, Physical Review E 81 (2010), no. 4, 046104.
  • 45. References II. Monica Billio, Mila Getmansky, Andrew W Lo, and Loriana Pelizzon, Econometric measures of connectedness and systemic risk in the finance and insurance sectors, Journal of Financial Economics 104 (2012), no. 3, 535–559. Kestutis Baltakys, Juho Kanniainen, and Frank Emmert-Streib, Multilayer aggregation with statistical validation: Application to investor networks, Scientific Reports 8 (2018), no. 1, 8198. RJ Buonocore, RN Mantegna, and T Di Matteo, On the interplay between multiscaling and average cross-correlation, arXiv preprint arXiv:1802.01113 (2018). Christian Borghesi, Matteo Marsili, and Salvatore Miccichè, Emergence of time-horizon invariant correlation structure in financial returns by subtraction of the market mode, Physical Review E 76 (2007), no. 2, 026104.
  • 46. References III. Eduard Baitinger and Jochen Papenbrock, Interconnectedness risk and active portfolio management: The information-theoretic perspective. AQ Barbi and GA Prataviera, Nonlinear dependencies on Brazilian equity network from mutual information minimum spanning trees, arXiv preprint arXiv:1711.06185 (2017). Juan Gabriel Brida and Wiston Adrián Risso, Multidimensional minimal spanning tree: The Dow Jones case, Physica A: Statistical Mechanics and its Applications 387 (2008), no. 21, 5205–5210. Gunnar Carlsson and Facundo Mémoli, Characterization, stability and convergence of hierarchical clustering methods, Journal of Machine Learning Research 11 (2010), no. Apr, 1425–1470.
  • 47. References IV. Christian Dose and Silvano Cincotti, Clustering of financial time series with application to index and enhanced index tracking portfolio, Physica A: Statistical Mechanics and its Applications 355 (2005), no. 1, 145–151. Fabrizio Durante, Enrico Foscolo, Roberta Pappadà, and Hao Wang, A portfolio diversification strategy via tail dependence measures. Philippe Donnat, Gautier Marti, and Philippe Very, Toward a generic representation of random variables for machine learning, Pattern Recognition Letters 70 (2016), 24–31. Fabrizio Durante and Roberta Pappadà, Cluster analysis of time series via Kendall distribution, Strengthening Links Between Data Analysis and Soft Computing, Springer, 2015, pp. 209–216.
  • 48. References V. Fabrizio Durante, Roberta Pappadà, and Nicola Torelli, Clustering of financial time series in risky scenarios, Advances in Data Analysis and Classification 8 (2014), no. 4, 359–376. Mohammed Elshendy and Andrea Fronzetti Colladon, Big data analysis of economic news: Hints to forecast macroeconomic indicators, International Journal of Engineering Business Management 9 (2017), 1847979017720040. Edwin J Elton and Martin J Gruber, Improved forecasting through the design of homogeneous groups, The Journal of Business 44 (1971), no. 4, 432–450. Pawel Fiedor, Information-theoretic approach to lead-lag effect on financial markets, The European Physical Journal B 87 (2014), no. 8, 1–9.
  • 49. References VI. Pawel Fiedor, Networks in financial markets based on the mutual information rate, Physical Review E 89 (2014), no. 5, 052801. Palash Goyal and Emilio Ferrara, Graph embedding techniques, applications, and performance: A survey, Knowledge-Based Systems 151 (2018), 78–94. Yong Kheng Goh, Haslifah M Hasim, and Chris G Antonopoulos, Inference of financial networks using the normalised mutual information rate, PLoS ONE 13 (2018), no. 2, e0192160. Lorenzo Giada and Matteo Marsili, Data clustering and noise undressing of correlation matrices, Physical Review E 63 (2001), no. 6, 061101. Lorenzo Giada and Matteo Marsili, Algorithms of maximum likelihood data clustering with applications, Physica A: Statistical Mechanics and its Applications 315 (2002), no. 3, 650–664.
  • 50. References VII. Ya-Chun Gao, Yong Zeng, and Shi-Min Cai, Influence network in the Chinese stock market, Journal of Statistical Mechanics: Theory and Experiment 2015 (2015), no. 3, P03017. Xue Guo, Hu Zhang, and Tianhai Tian, Development of stock correlation networks using mutual information and financial big data, PLoS ONE 13 (2018), no. 4, e0195941. David Hartman and Jaroslav Hlinka, Nonlinearity in stock networks, arXiv preprint arXiv:1804.10264 (2018). Amelie Hüttner, Jan-Frederik Mai, and Stefano Mineo, Portfolio selection based on graphs: Does it align with Markowitz-optimal portfolios?, Dependence Modeling (2018).
  • 51. References VIII. Dion Harmon, Blake Stacey, Yavni Bar-Yam, and Yaneer Bar-Yam, Networks of economic market interdependence and systemic risk, arXiv preprint arXiv:1011.3707 (2010). Wei-Qiang Huang, Xin-Tian Zhuang, Shuang Yao, and Stan Uryasev, A financial network perspective of financial institutions' systemic risk contributions, Physica A: Statistical Mechanics and its Applications 456 (2016), 183–196. Neil F Johnson, Mark McDonald, Omer Suleman, Stacy Williams, and Sam Howison, What shakes the FX tree? Understanding currency dominance, dependence, and dynamics (keynote address), SPIE Third International Symposium on Fluctuations and Noise, International Society for Optics and Photonics, 2005, pp. 86–99.
  • 52. References IX. Anton Kocheturov, Mikhail Batsyn, and Panos M Pardalos, Dynamics of cluster structures in a financial market network, Physica A: Statistical Mechanics and its Applications 413 (2014), 523–533. L Kullmann, J Kertesz, and RN Mantegna, Identification of clusters of companies in stock indices via Potts super-paramagnetic transitions, Physica A: Statistical Mechanics and its Applications 287 (2000), no. 3, 412–419. Philipp Krüger, Augustin Landier, and David Thesmar, Categorization bias in the stock market, available at SSRN 2034204 (2012). Dror Y Kenett, Tobias Preis, Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dependency network and node influence: application to the study of financial markets, International Journal of Bifurcation and Chaos 22 (2012), no. 07, 1250181.
References X
Dror Y Kenett, Yoash Shapira, Asaf Madi, Sharron Bransburg-Zabary, Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dynamics of stock market correlations, AUCO Czech Economic Review 4 (2010), no. 3, 330–341.
Dror Y Kenett, Michele Tumminello, Asaf Madi, Gitit Gur-Gershgoren, Rosario N Mantegna, and Eshel Ben-Jacob, Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market, PloS one 5 (2010), no. 12, e15032.
Zura Kakushadze and Willie Yu, Statistical industry classification.
Gan Siew Lee and Maman A Djauhari, Multidimensional stock network analysis: An Escoufier's RV coefficient approach, AIP Conference Proceedings, vol. 1, 2013, pp. 550–555.
Elisa Letizia and Fabrizio Lillo, Corporate payments networks and credit risk rating.
References XI
Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and Mark Flood, Clustering techniques and their effect on portfolio formation and risk analysis, Proceedings of the International Workshop on Data Science for Macro-Modeling, ACM, 2014, pp. 1–6.
Nicolò Musmeci, Tomaso Aste, and Tiziana Di Matteo, Relation between financial market structure and the real economy: comparison between clustering methods, PloS one 10 (2015), no. 3, e0116201.
Nicolò Musmeci, Tomaso Aste, and T Di Matteo, Interplay between past market correlation structure changes and future volatility outbursts, Scientific reports 6 (2016).
References XII
Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat, Clustering financial time series: How long is enough?, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, 2016, pp. 2583–2589.
Raffaello Morales, T Di Matteo, and Tomaso Aste, Dependency structure and scaling properties of financial time series are related, Scientific Reports 4 (2014), no. 4589.
Guido Previde Massara, Tiziana Di Matteo, and Tomaso Aste, Network filtering for big data: triangulated maximally filtered graph, Journal of Complex Networks 5 (2016), no. 2, 161–178.
Mel MacMahon and Diego Garlaschelli, Community detection for correlation matrices, Phys. Rev. X 5 (2015), 021006.
References XIII
Federico Musciotto, Luca Marotta, Salvatore Miccichè, and Rosario N Mantegna, Bootstrap validation of links of a minimum spanning tree, arXiv preprint arXiv:1802.03395 (2018).
Gautier Marti, Frank Nielsen, and Philippe Donnat, Optimal copula transport for clustering multivariate time series, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016, pp. 2379–2383.
David Matesanz and Guillermo J Ortega, Sovereign public debt crisis in Europe. A network analysis, Physica A: Statistical Mechanics and its Applications 436 (2015), 756–766.
References XIV
Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen, A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series, 14th IEEE International Conference on Machine Learning and Applications, ICMLA 2015, Miami, FL, USA, December 9-11, 2015, 2015, pp. 32–37.
J-P Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertesz, and Antti Kanto, Dynamics of market correlations: Taxonomy and portfolio analysis, Physical Review E 68 (2003), no. 5, 056110.
J-P Onnela, A Chakraborti, K Kaski, and J Kertész, Dynamic asset trees and portfolio analysis, The European Physical Journal B-Condensed Matter and Complex Systems 30 (2002), no. 3, 285–288.
References XV
J-P Onnela, Kimmo Kaski, and Janos Kertész, Clustering and information in correlation based financial networks, The European Physical Journal B-Condensed Matter and Complex Systems 38 (2004), no. 2, 353–362.
Francesco Pozzi, Tiziana Di Matteo, and Tomaso Aste, Spread of risk across financial markets: better to invest in the peripheries, Scientific reports 3 (2013).
Vasiliki Plerou, P Gopikrishnan, Bernd Rosenow, LA Nunes Amaral, and H Eugene Stanley, A random matrix theory approach to financial cross-correlations, Physica A: Statistical Mechanics and its Applications 287 (2000), no. 3, 374–382.
Don B Panton, V Parker Lessig, and O Maurice Joy, Comovement of international equity markets: a taxonomic approach, Journal of Financial and Quantitative Analysis 11 (1976), no. 03, 415–432.
References XVI
Jochen Papenbrock and Peter Schwendner, Handling risk-on/risk-off dynamics with correlation regimes and correlation networks, Financial Markets and Portfolio Management 29 (2015), no. 2, 125–147.
Gustavo Peralta and Abalfazl Zareei, A network approach to portfolio selection, Journal of Empirical Finance (2016).
Fei Ren, Ya-Nan Lu, Sai-Ping Li, Xiong-Fei Jiang, Li-Xin Zhong, and Tian Qiu, Dynamic portfolio strategy using clustering approach, arXiv preprint arXiv:1608.03058 (2016).
Jacopo Rocchi, Enoch Yan Lok Tsui, and David Saad, Emerging interdependence between stock values during financial crashes, arXiv preprint arXiv:1611.02549 (2016).
References XVII
Won-Min Song, Tiziana Di Matteo, and Tomaso Aste, Nested hierarchies in planar graphs, Discrete Applied Mathematics 159 (2011), no. 17, 2135–2146.
Won-Min Song, T Di Matteo, and Tomaso Aste, Hierarchical information clustering by means of topologically embedded graphs, PLoS One 7 (2012), no. 3, e31929.
Ahmet Sensoy and Benjamin M Tabak, Dynamic spanning trees in stock market networks: The case of Asia-Pacific, Physica A: Statistical Mechanics and its Applications 414 (2014), 387–402.
Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and Rosario N Mantegna, Evolution of worldwide stock markets, correlation structure, and correlation-based graphs, Physical Review E 84 (2011), no. 2, 026108.
References XVIII
Tiziano Squartini, Iman Van Lelyveld, and Diego Garlaschelli, Early-warning signals of topological collapse in interbank networks, Scientific reports 3 (2013).
Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N Mantegna, A tool for filtering information in complex systems, Proceedings of the National Academy of Sciences of the United States of America 102 (2005), no. 30, 10421–10426.
Michele Tumminello, Claudia Coronnello, Fabrizio Lillo, Salvatore Micciche, and Rosario N Mantegna, Spanning trees and bootstrap reliability estimation in correlation-based networks, International Journal of Bifurcation and Chaos 17 (2007), no. 07, 2319–2329.
Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N Mantegna, Cluster analysis for portfolio optimization, Journal of Economic Dynamics and Control 32 (2008), no. 1, 235–258.
References XIX
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Hierarchically nested factor model from multivariate data, EPL (Europhysics Letters) 78 (2007), no. 3, 30006.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Kullback-Leibler distance as a measure of the information filtered from multivariate data, Physical Review E 76 (2007), no. 3, 031123.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Correlation, hierarchies, and networks in financial markets, Journal of Economic Behavior & Organization 75 (2010), no. 1, 40–58.
Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo, and Rosario N Mantegna, Statistically validated networks in bipartite complex systems, PloS one 6 (2011), no. 3, e17994.
References XX
Chengyi Tu, Cointegration-based financial networks study in Chinese stock market, Physica A: Statistical Mechanics and its Applications 402 (2014), 245–254.
Tomáš Výrost, Štefan Lyócsa, and Eduard Baumöhl, Granger causality stock market networks: Temporal proximity and preferential attachment, Physica A: Statistical Mechanics and its Applications 427 (2015), 262–276.
Liuren Wu, Centrality of the supply chain network.
Yiting Zhang, Gladys Hui Ting Lee, Jian Cheng Wong, Jun Liang Kok, Manamohan Prusty, and Siew Ann Cheong, Will the US economy recover in 2010? A minimal spanning tree study, Physica A: Statistical Mechanics and its Applications 390 (2011), no. 11, 2020–2050.
References XXI
Xin Zhang, Boris Podobnik, Dror Y Kenett, and H Eugene Stanley, Systemic risk and causality dynamics of the world international shipping market, Physica A: Statistical Mechanics and its Applications 415 (2014), 43–53.