Opinionated review of two decades of correlations, hierarchies,
networks and clustering in financial markets, presented at Ton Duc Thang University in Ho Chi Minh City, Vietnam.
1. A review of two decades of correlations, hierarchies,
networks and clustering in financial markets
Ton Duc Thang University, Ho Chi Minh City, Vietnam
Gautier Marti, Frank Nielsen, Mikolaj Bińkowski, Philippe Donnat
Ecole Polytechnique, Imperial College London, Hellebore Capital Ltd.
10 August 2018
Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 1 / 64
2. Table of contents
1 Introduction
2 Correlation networks
The standard and widely adopted methodology
Concerns about the standard methodology
Contributions for improving the methodology
On algorithms
On distances
On other methodological aspects
3 Other networks
4 Dynamics of networks
5 Applications
6 Opinionated views on research directions
4. Introduction
Motivation: A better understanding of financial markets using a scientific
approach.
Empirical studies use data to verify hypotheses and discover stylized
facts. Examples of datasets:
price, volume, returns, turnover time series
supply chain networks
market (OTC, exchange) transaction data
retail transactional data (credit cards)
corporate payments networks
international trade (import/export) networks,
...
5. Introduction
Several research fields are tackling the problem with their own tools:
statistical physics, econophysics:
Minimum Spanning Tree (MST)
Random Matrix Theory (RMT)
linear correlations
statistics, data mining, machine learning:
graph theory
community detection
clustering algorithms
non-linear dependence
alternative distances
statistical significance and robustness checks via bootstrapping
economics, ļ¬nance, accounting, behavioural ļ¬nance:
standard industry and fundamental classifications vs. statistical and
text-based classifications
networks of trades, suppliers, consumers, competitors, investors
linear regressions on network statistics, statistical significance through
t-stats
7. The standard and widely adopted methodology
(Mantegna, 1999)
Let N be the number of assets.
Let $P_i(t)$ be the price at time $t$ of asset $i$, $1 \le i \le N$.
Let $r_i(t)$ be the log-return at time $t$ of asset $i$:
$$r_i(t) = \log P_i(t) - \log P_i(t-1).$$
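For each pair $i, j$ of assets, compute their correlation:
$$\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^2 \rangle - \langle r_i \rangle^2\right)\left(\langle r_j^2 \rangle - \langle r_j \rangle^2\right)}},$$
where $\langle \cdot \rangle$ denotes the average over time $t$.
Convert the correlation coefficients $\rho_{ij}$ into distances:
$$d_{ij} = \sqrt{2(1 - \rho_{ij})}.$$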
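The steps above fit in a few lines of NumPy. The sketch below is illustrative only: it assumes a `prices` array of shape (T + 1, N) with strictly positive entries, uses simulated data, and the helper names `log_returns` and `correlation_distance` are hypothetical.

```python
import numpy as np

def log_returns(prices: np.ndarray) -> np.ndarray:
    """r_i(t) = log P_i(t) - log P_i(t - 1); prices has shape (T + 1, N)."""
    return np.diff(np.log(prices), axis=0)

def correlation_distance(returns: np.ndarray) -> np.ndarray:
    """Pearson correlation between columns, mapped to d_ij = sqrt(2 (1 - rho_ij))."""
    rho = np.corrcoef(returns, rowvar=False)
    # Clip tiny negative values caused by floating-point round-off when rho_ij is ~1.
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))

# Hypothetical example: 3 assets, the first two sharing a common driver.
rng = np.random.default_rng(0)
common = rng.normal(size=500)
shocks = np.column_stack([common + 0.3 * rng.normal(size=500),
                          common + 0.3 * rng.normal(size=500),
                          rng.normal(size=500)])
log_prices = np.vstack([np.zeros(3), np.cumsum(0.01 * shocks, axis=0)])
prices = 100.0 * np.exp(log_prices)
D = correlation_distance(log_returns(prices))
print(np.round(D, 2))
```

Strongly correlated assets end up close to 0, uncorrelated ones close to $\sqrt{2}$, and perfectly anti-correlated ones at 2.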
8. The standard and widely adopted methodology
From all the distances $d_{ij}$, compute a minimum spanning tree (MST)
using, for example, Algorithm 1:
Algorithm 1 Kruskal's algorithm
procedure BuildMST({d_ij}, 1 ≤ i, j ≤ N)
    // Start with a fully disconnected graph G = (V, E)
    E ← ∅
    V ← {i}, 1 ≤ i ≤ N
    // Try to add edges by increasing distances
    for (i, j) ∈ V² ordered by increasing d_ij do
        // Verify that i and j are not already connected by a path
        if not connected(i, j) then
            // Add the edge (i, j) to connect i and j
            E ← E ∪ {(i, j)}
    // G is the resulting MST
    return G = (V, E)
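A runnable Python sketch of Algorithm 1, assuming the distances are given as a dense symmetric NumPy matrix. The `connected(i, j)` test is implemented with a union-find (disjoint-set) structure, the usual choice for Kruskal's algorithm but not something the slide prescribes; `build_mst` is a hypothetical name.

```python
import numpy as np

def build_mst(D: np.ndarray) -> list[tuple[int, int, float]]:
    """Kruskal's algorithm on a dense symmetric distance matrix D (N x N)."""
    n = D.shape[0]
    parent = list(range(n))  # union-find forest: each node starts as its own root

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Candidate edges (i, j), i < j, sorted by increasing distance d_ij.
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    mst = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # i and j are not yet connected by a path
            parent[root_i] = root_j  # merge the two components
            mst.append((i, j, float(d)))
            if len(mst) == n - 1:  # a spanning tree on n nodes has n - 1 edges
                break
    return mst

D = np.array([[0.0, 0.2, 0.9, 1.0],
              [0.2, 0.0, 0.5, 0.8],
              [0.9, 0.5, 0.0, 0.3],
              [1.0, 0.8, 0.3, 0.0]])
print(build_mst(D))  # [(0, 1, 0.2), (2, 3, 0.3), (1, 2, 0.5)]
```

Sorting the N(N-1)/2 pairs dominates the cost, so this runs in O(N² log N) time, which is comfortable for a few thousand assets.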
9. The standard and widely adopted methodology
10. Concerns about the standard methodology
The clusters obtained from the MST (or equivalently, the Single
Linkage Clustering Algorithm (SLCA)) are known to be unstable
(small perturbations of the input data may cause big differences in
the resulting clusters) [MVDN15].
The clustering instability may be partly due to the algorithm
(MST/Single Linkage are known for the chaining phenomenon
[CM10]).
The clustering instability may be partly due to the correlation
coefficient (Pearson linear correlation) defining the distance, which
is known to be brittle to outliers and, more generally, not well
suited to non-Gaussian distributions [DMV16].
11. Single Linkage chaining problem...
makes it brittle to small perturbations in the input distances.
Clusters and hierarchies are skewed: it does not take any notion of
density into account.
12. Pearson linear correlation...
is too sensitive to outliers.
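A toy illustration on simulated data (hypothetical, not the data behind the original slide): one extreme joint observation is enough to make two independent series look strongly correlated to Pearson's coefficient, while Spearman's rank correlation, computed here as Pearson's coefficient on ranks, barely moves.

```python
import numpy as np

def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y) -> float:
    """Spearman's rho = Pearson's correlation of the ranks (assumes no ties)."""
    return pearson(np.argsort(np.argsort(x)), np.argsort(np.argsort(y)))

rng = np.random.default_rng(42)
x, y = rng.normal(size=(2, 250))  # two independent "return" series
print("clean:  ", pearson(x, y), spearman(x, y))

x_out, y_out = x.copy(), y.copy()
x_out[0] = y_out[0] = 20.0  # a single joint outlier, e.g. a bad tick
print("outlier:", pearson(x_out, y_out), spearman(x_out, y_out))
```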
13. Concerns about the standard methodology
Theoretical results establishing the statistical reliability of hierarchical
trees and correlation-based networks are still not available [TLM10].
One might expect that the higher the correlation associated with a link
in a correlation-based network, the higher the reliability of this link.
In [TCL+07], the authors show that this is not always observed
empirically.
Changes affecting specific links (and clusters) during prominent crises
are difficult to interpret due to the high level of statistical
uncertainty associated with the correlation estimation [STZM11].
The standard method is somewhat arbitrary: a change in the
method (e.g. using a different clustering algorithm or a different
correlation coefficient) may yield a huge change in the clustering
results [LRW+14, MVDN15]. As a consequence, it implies huge
variability in portfolio formation and perceived risk [LRW+14].
14. Variance of the Pearson correlation estimator
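For i.i.d. bivariate Gaussian samples of length T, the sample Pearson coefficient is known to have asymptotic variance $(1-\rho^2)^2/T$, so the estimate is noisiest for weakly correlated pairs. A Monte Carlo sketch comparing this approximation with simulated estimates, assuming Gaussian data (real returns are heavier-tailed, which changes the variance) and a hypothetical helper `pearson_variance_mc`:

```python
import numpy as np

def pearson_variance_mc(rho: float, T: int, n_trials: int = 2000, seed: int = 0) -> float:
    """Variance of the sample Pearson correlation over n_trials simulated Gaussian samples."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=(n_trials, T))  # shape (n_trials, T, 2)
    a = x[..., 0] - x[..., 0].mean(axis=1, keepdims=True)
    b = x[..., 1] - x[..., 1].mean(axis=1, keepdims=True)
    r = (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return float(r.var())

T = 250  # roughly one year of daily returns
for rho in (0.0, 0.5, 0.9):
    print(rho, pearson_variance_mc(rho, T), (1 - rho ** 2) ** 2 / T)
```

With T = 250 and $\rho = 0$, the standard deviation of the estimate is about $1/\sqrt{250} \approx 0.06$, comparable to many of the correlation differences one would like to interpret.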
15. CRLB of the Pearson correlation estimator - Proof
16. Random Matrix Theory & Empirical correlation matrices
Let $X$ be the $N \times T$ matrix storing the standardized returns of $N = 560$ assets
(credit default swaps) over a period of $T = 2500$ trading days.
Then, the empirical correlation matrix of the returns is
$$C = \frac{1}{T} X X^\top.$$
We can compute the empirical density of its eigenvalues
$$\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda},$$
where $n(\lambda)$ counts the number of eigenvalues of $C$ less than $\lambda$.
From random matrix theory, the Marchenko-Pastur distribution gives
the limit distribution as $N \to \infty$, $T \to \infty$ with $T/N$ fixed. It reads:
$$\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda},$$
where $\lambda_{\max,\min} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.
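The bounds and the density translate directly into NumPy. A sketch, assuming unit-variance (standardized) returns as above and using the slide's N = 560 and T = 2500; note that $1 + N/T \pm 2\sqrt{N/T} = (1 \pm \sqrt{N/T})^2$. The function names are hypothetical.

```python
import numpy as np

def mp_bounds(N: int, T: int) -> tuple[float, float]:
    """lambda_min, lambda_max = 1 + N/T -/+ 2 sqrt(N/T) = (1 -/+ sqrt(N/T))^2."""
    q = N / T
    return float((1 - np.sqrt(q)) ** 2), float((1 + np.sqrt(q)) ** 2)

def mp_density(lam: np.ndarray, N: int, T: int) -> np.ndarray:
    """Marchenko-Pastur density for unit-variance noise, zero outside [lambda_min, lambda_max]."""
    lam_min, lam_max = mp_bounds(N, T)
    lam = np.asarray(lam, dtype=float)
    dens = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    x = lam[inside]
    dens[inside] = (T / N) / (2 * np.pi) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return dens

N, T = 560, 2500  # values of the CDS example above
lam_min, lam_max = mp_bounds(N, T)
print(round(lam_min, 3), round(lam_max, 3))  # 0.277 2.171

# Sanity check: the density integrates to ~1 (trapezoidal rule on a fine grid).
grid = np.linspace(lam_min, lam_max, 200_001)
dens = mp_density(grid, N, T)
print(np.sum((dens[1:] + dens[:-1]) / 2 * np.diff(grid)))
```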
17. Random Matrix Theory & Empirical correlation matrices
Notice that the Marchenko-Pastur density fits the empirical density well,
meaning that most of the information contained in the empirical
correlation matrix amounts to noise: only 26 eigenvalues are greater than
$\lambda_{\max}$. The highest eigenvalue corresponds to the "market"; the 25 others
can be associated with "industrial sectors".
This is a known stylized fact of empirical correlation matrices of
financial returns: only ≈ 5% of their eigenvalues are greater than $\lambda_{\max}$.
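The separation between a "market" eigenvalue and the noise bulk can be reproduced on synthetic data. This sketch assumes a hypothetical one-factor model (every asset loads on a common market factor plus independent noise) rather than the CDS data above, and counts the eigenvalues of C above $\lambda_{\max}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 100, 500
market = rng.normal(size=T)
# Hypothetical one-factor model: each asset = 0.5 * market + unit-variance noise.
X = 0.5 * market[None, :] + rng.normal(size=(N, T))
# Standardize each asset (row) so that C = X X^T / T is the correlation matrix.
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
C = X @ X.T / T
eigenvalues = np.linalg.eigvalsh(C)  # ascending order

lam_max = (1 + np.sqrt(N / T)) ** 2  # Marchenko-Pastur upper edge for pure noise
print("lambda_max:", round(lam_max, 3))
print("eigenvalues above lambda_max:", eigenvalues[eigenvalues > lam_max])
```

With a single common factor, one eigenvalue is expected to escape the bulk; adding sector factors to the simulation would produce further outliers, in the spirit of the "industrial sector" eigenvalues mentioned above.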
18. A somewhat arbitrary choice of methodology
The standard method is somewhat arbitrary: adopting another one may
yield strongly different results. Which ones to trust? Are they both useful?
The clusters obtained differ greatly from one method to another.
19. Contributions on algorithms
Several alternative algorithms have been proposed to replace the minimum
spanning tree and its corresponding clusters:
Average Linkage Minimum Spanning Tree (ALMST) [TCL+07];
the authors introduce a spanning tree associated with the Average Linkage
Clustering Algorithm (ALCA), designed to remedy the unwanted
chaining phenomenon of MST/SLCA.
Planar Maximally Filtered Graph (PMFG) [ADMH05, TADMM05],
which strictly contains the Minimum Spanning Tree (MST) but
encodes a larger amount of information in its internal structure.
Directed Bubble Hierarchical Tree (DBHT) [SDMA11, SDMA12],
which is designed to extract, without parameters, the deterministic
clusters from the PMFG.
Triangulated Maximally Filtered Graph (TMFG) [MDMA16];
the authors introduce another filtered graph, more suitable for large
datasets.
20. Contributions on algorithms (cont'd)
Clustering using Potts super-paramagnetic transitions [KKM00];
when anti-correlations occur, the model creates repulsion between
the stocks, which modifies their clustering structure.
Clustering using maximum likelihood [GM01, GM02]; the authors
define the likelihood of a clustering based on a simple 1-factor model,
then devise parameter-free methods to find a clustering with high
likelihood.
Clustering using Random Matrix Theory (RMT) [PGR+00];
eigenvalues help to determine the number of clusters, and
eigenvectors their composition.
[MG15] proposes network-based community detection methods whose
null hypothesis is consistent with RMT results on cross-correlation
matrices for financial time series data, unlike existing community
detection algorithms.
Clustering using the p-median problem [KBP14]; with this
construction, every cluster is a star, i.e. a tree with one central node.
21. Planar Maximally Filtered Graph (PMFG)
The PMFG is a compelling alternative to the MST.
PMFG nodes are colored according to the clusters obtained from DBHT
Implementation of the PMFG in Python: https://gmarti.gitlab.io/networks/2018/06/03/pmfg-algorithm.html
22. Contributions on distances
At the heart of clustering algorithms is the fundamental notion of distance,
which can be defined upon a proper representation of the data. It is thus an
obvious direction to explore. We list below what has been proposed in the
literature so far:
Distances that try to quantify how one financial instrument provides
information about another instrument:
Distance using Granger causality [BGLP12],
Distance using partial correlation [KTM+10],
Study of asynchronous, lead-lag relationships by using mutual
information instead of Pearson's correlation coefficient
[Fie14a, RTS16],
The correlation matrix is normalized using the affinity transformation:
the correlation between each pair of stocks is normalized according to
the correlations of each of the two stocks with all other stocks
[KSM+10].
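Partial correlations, as used in the partial-correlation distances and networks cited above, can be read off the inverse of the correlation matrix: with $P = C^{-1}$, the partial correlation of $i$ and $j$ given all other assets is $-P_{ij}/\sqrt{P_{ii}P_{jj}}$. This is a standard identity conditioning on all remaining assets jointly and may differ from the exact construction of [KTM+10]; the helper name and data are hypothetical.

```python
import numpy as np

def partial_correlation(returns: np.ndarray) -> np.ndarray:
    """Partial correlation of each pair of columns given all the other columns."""
    P = np.linalg.inv(np.corrcoef(returns, rowvar=False))  # precision matrix
    d = np.sqrt(np.diag(P))
    pcorr = -P / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

rng = np.random.default_rng(7)
z = rng.normal(size=2000)             # hypothetical common driver (asset 2)
a = z + 0.5 * rng.normal(size=2000)   # asset 0 depends on asset 2 only
b = z + 0.5 * rng.normal(size=2000)   # asset 1 depends on asset 2 only
R = np.column_stack([a, b, z])
print(np.round(np.corrcoef(R, rowvar=False)[0, 1], 2))  # high raw correlation
print(np.round(partial_correlation(R)[0, 1], 2))        # close to 0 once asset 2 is accounted for
```

Assets 0 and 1 look strongly linked, but the link vanishes once the information carried by asset 2 is removed, which is the kind of "information provided by another instrument" these distances aim to capture.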
23. Contributions on distances (cont'd)
Distances that aim at including non-linear relationships in the
analysis:
Distances using mutual information, mutual information rate, and
other information-theoretic distances
[Fie14b, RTS16, BP17a, BP17b, GHA18, GZT18],
The Brownian distance [ZPKS14],
Copula-based [MND16, DP15, B+13] and tail dependence
[DFPW15] distances.
Distances that aim at taking into account multivariate dependence:
Each stock is represented by a bivariate time series: its returns and
traded volumes [BR08]; a distance is then applied to an ad hoc
transform of the two time series into a symbolic sequence,
Each stock is represented by a multivariate time series, for example the
daily (high, low, open, close) [LD13]; the authors use Escoufier's RV
coefficient (a multivariate extension of Pearson's correlation
coefficient).
A distance taking into account both the correlation between returns
and their distributions [DMV16].
24. Contributions on distances (cont'd)
Unlike recent studies which claim that the existence of nonlinear
dependence between stock returns has effects on network
characteristics, [HH18] documents that "most of the apparent
nonlinearity is due to univariate non-Gaussianity. Further, strong
non-stationarity in a few specific stocks may play a role. In particular,
the sharp decrease of some stocks during the global financial crisis in
2008" gives rise to apparent negative tail dependence among stocks.
When constructing unweighted stock networks, they suggest using
linear correlation "on marginally normalized data", that is, Spearman's
rank correlation. In fact, this is similar to the idea of splitting apart
the dependence information from the distributional one as in [DMV16],
where Spearman's rank correlation stems from using a Euclidean
distance between the uniform margins of the underlying bivariate
copula. Following previous studies, and unlike in [DMV16], the
distribution information is discarded when constructing the network.
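The link between Spearman's rank correlation and uniform margins can be made concrete: mapping each series to its normalized ranks (an empirical estimate of $F_i(X_i)$) and applying Pearson's formula gives Spearman's coefficient. A sketch, assuming continuous data without ties; the helper names and the simulated data are hypothetical.

```python
import numpy as np

def uniform_margins(x: np.ndarray) -> np.ndarray:
    """Empirical F(x): ranks rescaled to (0, 1); assumes no ties (continuous data)."""
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman's rho as Pearson's correlation of the (empirical) uniform margins."""
    return float(np.corrcoef(uniform_margins(x), uniform_margins(y))[0, 1])

rng = np.random.default_rng(3)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=1000)
x, y = z[:, 0], z[:, 1]
x2, y2 = np.exp(3 * x), y ** 3  # strictly increasing transforms of the margins

# Pearson's coefficient depends on the margins...
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x2, y2)[0, 1])
# ...whereas the rank correlation only depends on the copula.
print(spearman(x, y), spearman(x2, y2))
```

This invariance is precisely why using Spearman's correlation amounts to discarding the distributional information.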
25. Dependence and marginal distribution of the returns
Theorem (Sklar's theorem, 1959)
For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal
cumulative distribution functions $F_i$, its joint cumulative distribution $F$ is
uniquely expressed as
$$F(X_1, \ldots, X_N) = C(F_1(X_1), \ldots, F_N(X_N)),$$
where $C$, the multivariate distribution of uniform marginals, is known as
the copula of $X$.
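Sklar's theorem can also be read in reverse: a copula combined with arbitrary continuous margins defines a joint distribution. A sketch, assuming a Gaussian copula with correlation 0.7 and two hypothetical margins (exponential and Cauchy) obtained through their inverse CDFs. Because every transform is strictly increasing, the pseudo-observations (normalized ranks, an empirical version of $(F_1(X_1), F_2(X_2))$) are unchanged: the dependence lives entirely in $C$.

```python
import math

import numpy as np

rng = np.random.default_rng(11)
T = 2000
# Step 1: draw from a Gaussian copula with correlation 0.7 (an assumed choice of C).
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=T)
norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
u = norm_cdf(z)  # uniform margins: a sample of the copula C

# Step 2: plug the uniforms into the inverse CDFs of two hypothetical margins.
x1 = -np.log1p(-u[:, 0])              # exponential(1) margin
x2 = np.tan(np.pi * (u[:, 1] - 0.5))  # Cauchy margin (heavy tails)

def pseudo_observations(v: np.ndarray) -> np.ndarray:
    """Empirical copula sample: ranks rescaled to (0, 1), computed column by column."""
    ranks = np.argsort(np.argsort(v, axis=0), axis=0) + 1
    return ranks / (len(v) + 1)

# The joint law changed, but the empirical copula did not.
same = np.array_equal(pseudo_observations(np.column_stack([x1, x2])), pseudo_observations(z))
print(same)  # True
```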
26. Information-theoretic distances vs. Copula-based ones?
Copula entropy:
$$H_c(x) = -\int_u c(u) \log c(u) \, du$$
Mutual information:
$$I(x) = \int_x p(x) \log \frac{p(x)}{\prod_i p_i(x_i)} \, dx = \int_x c(u_x) \prod_i p_i(x_i) \log c(u_x) \, dx = \int_u c(u_x) \log c(u_x) \, du_x = -H_c(x)$$
Entropy:
$$H(x) = \sum_i H(x_i) + H_c(x)$$
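As a worked check of these identities, take a bivariate Gaussian $x = (x_1, x_2)$ with unit variances and correlation $\rho$, a textbook case chosen here for illustration. Its differential entropies are known in closed form, so the copula entropy follows from the last identity:

```latex
$$H(x) = \tfrac{1}{2}\log\big((2\pi e)^2 (1-\rho^2)\big), \qquad H(x_i) = \tfrac{1}{2}\log(2\pi e),$$
$$H_c(x) = H(x) - \sum_i H(x_i) = \tfrac{1}{2}\log(1-\rho^2), \qquad I(x) = -H_c(x) = -\tfrac{1}{2}\log(1-\rho^2).$$
```

For instance, $\rho = 0.5$ gives $I(x) = -\tfrac{1}{2}\log 0.75 \approx 0.144$ nats. The result depends on $\rho$ only, not on the margins, and $H_c(x) \le 0$ is consistent with $I(x) \ge 0$.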
27. Contributions on other methodological aspects
Reliability and statistical uncertainty of the methods:
A bootstrap approach is used to estimate the statistical reliability of
both hierarchical trees [TLM07a, MAND16] and correlation-based
networks [TCL+07, MMMM18],
Consistency proof of clustering algorithms for recovering clusters
defined by nested block correlation matrices; study of empirical
convergence rates [MAND16],
Kullback-Leibler divergence is used to estimate the amount of
filtered information between the sample correlation matrix and the
filtered one [TLM07b],
Cophenetic correlation is used between the original correlation
distances and the hierarchical cluster representation [PS15],
Several measures between successive (in time) clusters, dendrograms,
networks are used to estimate stability of the methods, e.g. cophenetic
correlation between dendrograms in [PLJ76], adjusted Rand index
(ARI) between clusters in [MVDN15], mutual information (MI) of link
co-occurrence between networks in [STZM11].
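Cophenetic correlation, for instance, is available in SciPy: `scipy.cluster.hierarchy.cophenet(Z, Y)` returns the correlation between the original condensed distances `Y` and the ultrametric distances implied by the dendrogram `Z`. A sketch on hypothetical block-structured returns, assuming the correlation distance $\sqrt{2(1-\rho_{ij})}$ of the standard methodology:

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(5)
T, n_groups, group_size = 1000, 3, 10
factors = rng.normal(size=(T, n_groups))
# Hypothetical returns: three groups of 10 assets, each driven by its own factor.
returns = np.repeat(factors, group_size, axis=1) + rng.normal(size=(T, n_groups * group_size))

rho = np.corrcoef(returns, rowvar=False)
D = np.sqrt(np.clip(2 * (1 - rho), 0, None))
d_condensed = squareform(D, checks=False)  # the N(N-1)/2 pairwise distances

for method in ("single", "average", "complete"):
    Z = linkage(d_condensed, method=method)
    c, _ = cophenet(Z, d_condensed)  # cophenetic correlation, cophenetic distances
    print(method, round(float(c), 3))
```

A value close to 1 means the hierarchy faithfully preserves the original distances; comparing it across linkage methods is one way to choose among them.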
28. Contributions on other methodological aspects (cont'd)
Preprocessing of the time series:
Subtract the market mode before performing a cluster or network
analysis on the returns [BMM07],
Encode both rank statistics and a distribution histogram of the
returns into a representative vector [DMV16],
Fit an ARMA(p,q)-FIEGARCH(1,d,1)-cDCC process (econometric
preprocessing) to obtain dynamic correlations instead of the common
approach of rolling window Pearson correlations [ST14],
Use a clustering of successive correlation matrices to infer a market
state [PS15].
Use of other types of networks: threshold networks [OKK04],
influence networks [GZC15], partial-correlation networks
[KTM+10, KPGGBJ12], Granger causality networks
[BGLP12, VLB15], cointegration-based networks [Tu14], bipartite
networks [TML+11], etc.
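The first preprocessing step listed above, subtracting the market mode [BMM07], can be sketched as follows. This is one simple variant, assuming the equal-weighted average return as a proxy for the market mode, removed from each asset by an ordinary least-squares regression; [BMM07] may define and remove the market mode differently, and all names and data here are hypothetical.

```python
import numpy as np

def remove_market_mode(returns: np.ndarray) -> np.ndarray:
    """Residuals of each column regressed on the cross-sectional mean return."""
    market = returns.mean(axis=1)
    m = market - market.mean()
    r = returns - returns.mean(axis=0)
    betas = (m @ r) / (m @ m)  # OLS slope of each asset on the market proxy
    return r - np.outer(m, betas)

def mean_offdiag_corr(returns: np.ndarray) -> float:
    rho = np.corrcoef(returns, rowvar=False)
    n = rho.shape[0]
    return float((rho.sum() - n) / (n * (n - 1)))

rng = np.random.default_rng(2)
T, N = 750, 40
market = rng.normal(size=T)
sector = np.repeat(rng.normal(size=(T, 2)), N // 2, axis=1)  # two hypothetical sectors
returns = market[:, None] + 0.7 * sector + rng.normal(size=(T, N))
print(mean_offdiag_corr(returns), mean_offdiag_corr(remove_market_mode(returns)))
```

The average correlation, dominated by the market, drops to roughly zero, while the sector structure remains visible in the residual correlations (positive within sectors, negative across them).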
29. Consistency and empirical convergence rates [MAND16]
Model selection: The faster the (empirical) convergence, the better.
30. Statistical & practical stability
One can use the bootstrap, the block bootstrap, or other common-sense,
practical perturbations of the data, as presented in [MVDN15].
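A sketch of such a stability check, assuming i.i.d. resampling of the dates (a block bootstrap would resample contiguous blocks instead), average linkage on the correlation distance, a cut into a hypothetical k = 3 clusters, and the adjusted Rand index (ARI) between each replicate's clusters and those of the full sample. SciPy is used for the clustering; the ARI is written out with NumPy, and all helper names and data are hypothetical.

```python
from math import comb

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_labels(returns: np.ndarray, k: int) -> np.ndarray:
    """Average linkage on sqrt(2 (1 - rho)), cut into at most k clusters."""
    rho = np.corrcoef(returns, rowvar=False)
    D = np.sqrt(np.clip(2 * (1 - rho), 0, None))
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")

def adjusted_rand_index(a, b) -> float:
    """ARI between two labelings, computed from their contingency table."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=int)
    np.add.at(table, (a_idx, b_idx), 1)
    pairs = sum(comb(int(n), 2) for n in table.ravel())
    pairs_a = sum(comb(int(n), 2) for n in table.sum(axis=1))
    pairs_b = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = pairs_a * pairs_b / comb(len(a), 2)
    max_pairs = (pairs_a + pairs_b) / 2
    return (pairs - expected) / (max_pairs - expected)

rng = np.random.default_rng(9)
T, n_groups, group_size = 500, 3, 8
factors = rng.normal(size=(T, n_groups))
# Hypothetical returns: each asset = its group factor + idiosyncratic noise.
returns = np.repeat(factors, group_size, axis=1) + rng.normal(size=(T, n_groups * group_size))

reference = cluster_labels(returns, k=3)
scores = []
for _ in range(50):
    dates = rng.integers(0, T, size=T)  # i.i.d. bootstrap of the dates
    scores.append(adjusted_rand_index(reference, cluster_labels(returns[dates], k=3)))
print(np.mean(scores), np.min(scores))
```

An ARI close to 1 across replicates indicates clusters that are stable under resampling; on real returns, the distribution of scores is what allows comparing methodologies.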
31. Effect of a basic preprocessing: Subtract the market mode
Visualization of the Planar Maximally Filtered Graph (PMFG) and DBHT clusters,
for both non-detrended (left) and detrended (right) log-returns [MADM15].
33. Examples of other financial networks
supply chain networks [Wu15]
investor (security holdings and trading behaviour) networks [BKES18]
corporate board and director networks [BC04]
international trade networks [BFG10]
transaction networks [LL18]
sovereign debt (quarterly public debt-to-GDP ratio) networks [MO15]
interbank (exposures between banks) networks [SVLG13]
These networks are built from alternative data, which are often:
confidential
hard or costly to obtain
Most often, these studies are done in collaboration with a commercial or
regulatory organization. Some of these datasets may contain significant
alpha, and thus results are not publicly advertised: papers are relatively
few compared with those on the correlation of asset returns, which are
more oriented toward understanding risk.
34. Section 4
Dynamics of networks
35. Studying the dynamics of networks
Comparing and finding the differences in a sequence of large graphs is a
computationally difficult problem. In the literature, one often studies the
following statistics:
(for networks) the normalized tree length [OCK+03], the mean
occupation layer [OCK+03], the tree half-life [OCK+03], a survival
ratio of the edges [OCKK02, JMS+05, ST14], node degree, strength
[ST14], eigenvector, betweenness, closeness centrality [ST14], the
agglomerative coefficient [MO15]
(for clusters) the merging, splitting, birth, death, contraction, and
growth of the clusters in time [PS15]
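Two of the network statistics above are easy to compute from trees built on successive windows. A sketch, assuming non-overlapping windows, SciPy's `scipy.sparse.csgraph.minimum_spanning_tree` (which treats zero entries as missing edges), the normalized tree length taken as the mean MST edge distance, and the single-step survival ratio taken as the fraction of edges shared by two consecutive trees; data and helper names are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_edges(returns: np.ndarray) -> dict[tuple[int, int], float]:
    """MST of the correlation distances, as {(i, j): d_ij} with i < j."""
    rho = np.corrcoef(returns, rowvar=False)
    D = np.sqrt(np.clip(2 * (1 - rho), 0, None))
    D = (D + D.T) / 2         # enforce exact symmetry
    np.fill_diagonal(D, 0.0)  # zero entries are treated as "no edge"
    tree = minimum_spanning_tree(D).tocoo()
    return {(int(min(i, j)), int(max(i, j))): float(d)
            for i, j, d in zip(tree.row, tree.col, tree.data)}

def normalized_tree_length(edges: dict) -> float:
    """Mean distance over the N - 1 edges of the tree."""
    return float(np.mean(list(edges.values())))

def survival_ratio(edges_prev: dict, edges_next: dict) -> float:
    """Fraction of the edges of one tree that are still present in the next one."""
    return len(edges_prev.keys() & edges_next.keys()) / len(edges_prev)

rng = np.random.default_rng(4)
T, N, window = 1000, 20, 250
market = rng.normal(size=(T, 1))
returns = 0.8 * market + rng.normal(size=(T, N))  # hypothetical one-factor returns
trees = [mst_edges(returns[s:s + window]) for s in range(0, T, window)]
print([round(normalized_tree_length(e), 3) for e in trees])
print([round(survival_ratio(a, b), 2) for a, b in zip(trees, trees[1:])])
```

A shrinking tree length signals a market that becomes more tightly correlated (as typically observed during crises), while a low survival ratio signals an unstable topology.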
Remark. To the best of my knowledge, graph embeddings into vector spaces (cf.
the recent deep learning literature, or this survey [GF18]) have not been used to
study time series of financial networks. Such a vector representation would open
the field to the toolbox of standard machine learning algorithms: cluster networks
and find those associated with some events (e.g. a crisis); predict the
future networks in a sequence of networks with an LSTM (stat arb?); detect a
structural break, etc.
37. Portfolio optimization
[OCK+03] finds that the Markowitz portfolio layer in the MST is higher
than the mean layer at all times.
As the stocks of the minimum risk portfolio are found on the outskirts
of the tree [PDMA13, OCK+03], the authors expect larger trees to have
greater diversification potential.
In [TLGM08, PLJ76], the authors compare the Markowitz portfolios obtained from
the filtered empirical correlation matrices using the clustering approach,
the RMT approach and the shrinkage approach.
[RLL+16, PZ16] propose to invest in different parts of the MST
depending on the estimated market conditions.
The authors of [HMM18] show that there is no intrinsic mathematical relationship
between the minimum variance portfolio from Markowitz theory and the
portfolios designed from the minimum spanning tree.
Empirical evidence of such relations found by previous studies is
essentially a stylized fact of financial return correlations and time
series, not a general property of correlation matrices.
[DFPW15] introduces a procedure to design portfolios which are
diversified in their tail behavior by selecting only a single asset in each
cluster.
38. Trading strategy
Earnings per share forecasts prepared on the basis of statistically
grouped data (clusters) outperform forecasts made on data grouped on
traditional industrial criteria as well as forecasts prepared by mechanical
extrapolation techniques [EG71].
One can build a simple mean-reversion statistical arbitrage strategy
whereby one assumes that stocks in a given industry move together,
cross-sectionally demeans stock returns within said industry, shorts
stocks with positive residual returns and goes long stocks with negative
residual returns [KY16].
In [PS15], the authors suggest that tracking the merging, splitting, birth, and
death of the clusters in time could be the basis for pairs-like reversal
trading strategies, but with pairs corresponding to clusters.
The paper [DC05] describes methods for index tracking and enhanced
index tracking based on clusters of financial time series.
[MADM16] finds significant relations between past
changes in the market correlation structure and future changes in the
market volatility.
In [KLT12], the authors claim that long-short strategies exploiting
mispricing due to the industry categorization bias generate statistically
significant and economically sizable risk-adjusted excess returns.
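The mean-reversion strategy attributed to [KY16] above can be sketched in a few lines. A minimal version, assuming one period of returns, an industry (or cluster) label per stock, and weights proportional to minus the within-industry demeaned returns, scaled to unit gross exposure; this illustrates the idea, not the exact construction of [KY16], and all names are hypothetical.

```python
import numpy as np

def mean_reversion_weights(returns: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Short stocks above their industry mean return, long those below it."""
    residuals = returns.astype(float)  # astype returns a copy
    for g in np.unique(labels):
        members = labels == g
        residuals[members] -= returns[members].mean()  # demean within the industry
    weights = -residuals
    gross = np.abs(weights).sum()
    return weights / gross if gross > 0 else weights  # unit gross exposure

r = np.array([0.02, 0.00, -0.01, 0.03, 0.01])
industry = np.array(["tech", "tech", "tech", "bank", "bank"])
w = mean_reversion_weights(r, industry)
print(np.round(w, 3))
```

Since the residuals sum to zero within each industry, the book is dollar-neutral within each industry as well as overall; replacing the industry labels by statistical clusters gives the cluster-based variant discussed in this section.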
39. Risk
In [DPT14], the authors design clusters that tend to be comonotonic in
their extreme low values: to avoid contagion in the portfolio during
risky scenarios, an investor should diversify over these clusters.
In [MDMA14], the authors postulate the existence of a hierarchical
structure of risks which can be deemed responsible for both the
multivariate dependency structure of stocks and their univariate multifractal
behaviour, and then propose a model that reproduces the empirical
observations (entanglement of univariate multi-scaling and multivariate
cross-correlation properties of financial time series). The interplay
between multi-scaling and average cross-correlation is confirmed in
[BMDM18].
Clusters (statistical industry classification) can be an alternative to
"fundamental" industry classifications, which are sometimes unavailable (e.g. in
emerging or small markets) [KY16].
[HZYU16] finds that financial institutions which have, in the
correlation networks, greater node strength, larger node betweenness
centrality, larger node closeness centrality and larger node clustering
coefficient tend to be associated with larger systemic risk contributions.
40. Financial policy making
Clusters and networks can help design financial policies. Several
papers propose to leverage them to detect risky market environments,
develop indicators that can predict forthcoming crises or economic
recovery [ZLW+11], improve economic nowcasting [EFC17], or find key
markets and assets that drive a whole region, and to which stimulus
can be applied effectively.
The authors of [HSBYBY10] claim that "separation prevents failure
propagation and connections increase risks of global crises", whereas the
prevailing view in favor of deregulation is that banks, by investing in
diverse sectors, would have greater stability. To support their argument,
they use financial networks to study the aftermath of the 1999 repeal of the
Glass-Steagall Act (1933) under the Clinton administration. They find that
the erosion of the Glass-Steagall Act and cross-sector investments
eliminated "firewalls" that could have prevented the housing sector
decline from triggering a wider financial and economic crisis:
Our analysis implies that the investment across economic
sectors itself creates increased cross-linking of otherwise
much more weakly coupled parts of the economy, causing
dependencies that increase, rather than decrease, risk.
41. Section 6
Opinionated views on research directions
42. Opinionated views on research directions
What's missing for "financial networks" to become a mature research field?
Some inspiration from the booming deep learning era:
lack of reproducibility
provide code and data (at least synthetic datasets)
difficulty comparing methods, re-implementation bias
build open source libraries (standardized API, optimized code)
open source software helps to engage more with practitioners
confidential data
provide synthetic datasets encoding stylized facts
propose generative models (cf. the GAN literature applied to graphs)
lack of evaluation metrics / no end-to-end approach
define common tasks (e.g. evaluate the clustering or network
methodology on portfolio optimization, crisis detection, mean reversion
strategy) where all the details are specified (e.g. a well-chosen artificial
dataset, or samples from a generative model, or public financial data)
43. Thank you for your attention. Questions?
Co-authorship network (left) and its MST (right)
44. References I
Tomaso Aste, Tiziana Di Matteo, and ST Hyde, Complex networks on
hyperbolic surfaces, Physica A: Statistical Mechanics and its
Applications 346 (2005), no. 1, 20–26.
Eike Christian Brechmann et al., Hierarchical Kendall copulas and the
modeling of systemic and operational risk, Ph.D. thesis,
Universitätsbibliothek der TU München, 2013.
Stefano Battiston and Michele Catanzaro, Statistical properties of
corporate board and director networks, The European Physical Journal
B 38 (2004), no. 2, 345–352.
Matteo Barigozzi, Giorgio Fagiolo, and Diego Garlaschelli,
Multinetwork of international trade: A commodity-specific analysis,
Physical Review E 81 (2010), no. 4, 046104.
45. References II
Monica Billio, Mila Getmansky, Andrew W Lo, and Loriana Pelizzon,
Econometric measures of connectedness and systemic risk in the
finance and insurance sectors, Journal of Financial Economics 104
(2012), no. 3, 535–559.
Kestutis Baltakys, Juho Kanniainen, and Frank Emmert-Streib,
Multilayer aggregation with statistical validation: Application to
investor networks, Scientific Reports 8 (2018), no. 1, 8198.
RJ Buonocore, RN Mantegna, and T Di Matteo, On the interplay
between multiscaling and average cross-correlation, arXiv preprint
arXiv:1802.01113 (2018).
Christian Borghesi, Matteo Marsili, and Salvatore Miccichè,
Emergence of time-horizon invariant correlation structure in financial
returns by subtraction of the market mode, Physical Review E 76
(2007), no. 2, 026104.
46. References III
Eduard Baitinger and Jochen Papenbrock, Interconnectedness risk and
active portfolio management: The information-theoretic perspective.
AQ Barbi and GA Prataviera, Nonlinear dependencies on brazilian
equity network from mutual information minimum spanning trees,
arXiv preprint arXiv:1711.06185 (2017).
Juan Gabriel Brida and Wiston Adrián Risso, Multidimensional
minimal spanning tree: The Dow Jones case, Physica A: Statistical
Mechanics and its Applications 387 (2008), no. 21, 5205–5210.
Gunnar Carlsson and Facundo Mémoli, Characterization, stability
and convergence of hierarchical clustering methods, Journal of
machine learning research 11 (2010), no. Apr, 1425–1470.
47. References IV
Christian Dose and Silvano Cincotti, Clustering of financial time series
with application to index and enhanced index tracking portfolio,
Physica A: Statistical Mechanics and its Applications 355 (2005),
no. 1, 145–151.
Fabrizio Durante, Enrico Foscolo, Roberta Pappadà, and Hao Wang,
A portfolio diversification strategy via tail dependence measures.
Philippe Donnat, Gautier Marti, and Philippe Very, Toward a generic
representation of random variables for machine learning, Pattern
Recognition Letters 70 (2016), 24–31.
Fabrizio Durante and Roberta Pappada, Cluster analysis of time series
via Kendall distribution, Strengthening Links Between Data Analysis
and Soft Computing, Springer, 2015, pp. 209–216.
48. References V
Fabrizio Durante, Roberta Pappadà, and Nicola Torelli, Clustering of
financial time series in risky scenarios, Advances in Data Analysis and
Classification 8 (2014), no. 4, 359–376.
Mohammed Elshendy and Andrea Fronzetti Colladon, Big data
analysis of economic news: Hints to forecast macroeconomic
indicators, International Journal of Engineering Business Management
9 (2017), 1847979017720040.
Edwin J Elton and Martin J Gruber, Improved forecasting through the
design of homogeneous groups, The Journal of Business 44 (1971),
no. 4, 432–450.
Pawel Fiedor, Information-theoretic approach to lead-lag effect on
financial markets, The European Physical Journal B 87 (2014), no. 8,
1–9.
49. References VI
Pawel Fiedor, Networks in financial markets based on the mutual
information rate, Physical Review E 89 (2014), no. 5, 052801.
Palash Goyal and Emilio Ferrara, Graph embedding techniques,
applications, and performance: A survey, Knowledge-Based Systems
151 (2018), 78–94.
Yong Kheng Goh, Haslifah M Hasim, and Chris G Antonopoulos,
Inference of financial networks using the normalised mutual
information rate, PloS one 13 (2018), no. 2, e0192160.
Lorenzo Giada and Matteo Marsili, Data clustering and noise
undressing of correlation matrices, Physical Review E 63 (2001),
no. 6, 061101.
Lorenzo Giada and Matteo Marsili, Algorithms of maximum likelihood data clustering with
applications, Physica A: Statistical Mechanics and its Applications
315 (2002), no. 3, 650–664.
50. References VII
Ya-Chun Gao, Yong Zeng, and Shi-Min Cai, Influence network in the
Chinese stock market, Journal of Statistical Mechanics: Theory and
Experiment 2015 (2015), no. 3, P03017.
Xue Guo, Hu Zhang, and Tianhai Tian, Development of stock
correlation networks using mutual information and financial big data,
PloS one 13 (2018), no. 4, e0195941.
David Hartman and Jaroslav Hlinka, Nonlinearity in stock networks,
arXiv preprint arXiv:1804.10264 (2018).
Amelie Hüttner, Jan-Frederik Mai, and Stefano Mineo, Portfolio
selection based on graphs: Does it align with Markowitz-optimal
portfolios?, Dependence Modeling (2018).
51. References VIII
Dion Harmon, Blake Stacey, Yavni Bar-Yam, and Yaneer Bar-Yam,
Networks of economic market interdependence and systemic risk,
arXiv preprint arXiv:1011.3707 (2010).
Wei-Qiang Huang, Xin-Tian Zhuang, Shuang Yao, and Stan Uryasev,
A financial network perspective of financial institutions' systemic risk
contributions, Physica A: Statistical Mechanics and its Applications
456 (2016), 183–196.
Neil F Johnson, Mark McDonald, Omer Suleman, Stacy Williams, and
Sam Howison, What shakes the FX tree? Understanding currency
dominance, dependence, and dynamics (keynote address), SPIE Third
International Symposium on Fluctuations and Noise, International
Society for Optics and Photonics, 2005, pp. 86–99.
52. References IX
Anton Kocheturov, Mikhail Batsyn, and Panos M Pardalos, Dynamics
of cluster structures in a financial market network, Physica A:
Statistical Mechanics and its Applications 413 (2014), 523–533.
L Kullmann, J Kertesz, and RN Mantegna, Identification of clusters of
companies in stock indices via Potts super-paramagnetic transitions,
Physica A: Statistical Mechanics and its Applications 287 (2000),
no. 3, 412–419.
Philipp Krüger, Augustin Landier, and David Thesmar, Categorization
bias in the stock market, Available at SSRN 2034204 (2012).
Dror Y Kenett, Tobias Preis, Gitit Gur-Gershgoren, and Eshel
Ben-Jacob, Dependency network and node influence: application to
the study of ļ¬nancial markets, International Journal of Bifurcation and
Chaos 22 (2012), no. 07, 1250181.
53. References X
Dror Y Kenett, Yoash Shapira, Asaf Madi, Sharron Bransburg-Zabary,
Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dynamics of stock market
correlations, AUCO Czech Economic Review 4 (2010), no. 3, 330–341.
Dror Y Kenett, Michele Tumminello, Asaf Madi, Gitit Gur-Gershgoren,
Rosario N Mantegna, and Eshel Ben-Jacob, Dominating clasp of the
financial sector revealed by partial correlation analysis of the stock
market, PLoS ONE 5 (2010), no. 12, e15032.
Zura Kakushadze and Willie Yu, Statistical industry classification.
Gan Siew Lee and Maman A Djauhari, Multidimensional stock
network analysis: An Escoufier's RV coefficient approach, AIP
Conference Proceedings, vol. 1, 2013, pp. 550–555.
Elisa Letizia and Fabrizio Lillo, Corporate payments networks and
credit risk rating.
54. References XI
Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and
Mark Flood, Clustering techniques and their effect on portfolio
formation and risk analysis, Proceedings of the International
Workshop on Data Science for Macro-Modeling, ACM, 2014, pp. 1–6.
Nicoló Musmeci, Tomaso Aste, and Tiziana Di Matteo, Relation
between financial market structure and the real economy: comparison
between clustering methods, PLoS ONE 10 (2015), no. 3, e0116201.
Nicoló Musmeci, Tomaso Aste, and T Di Matteo, Interplay between
past market correlation structure changes and future volatility
outbursts, Scientific Reports 6 (2016).
55. References XII
Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat,
Clustering financial time series: How long is enough?, Proceedings of
the Twenty-Fifth International Joint Conference on Artificial
Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016,
pp. 2583–2589.
Raffaello Morales, T Di Matteo, and Tomaso Aste, Dependency
structure and scaling properties of financial time series are related,
Scientific Reports 4 (2014), no. 4589.
Guido Previde Massara, Tiziana Di Matteo, and Tomaso Aste,
Network filtering for big data: triangulated maximally filtered graph,
Journal of Complex Networks 5 (2016), no. 2, 161–178.
Mel MacMahon and Diego Garlaschelli, Community detection for
correlation matrices, Phys. Rev. X 5 (2015), 021006.
56. References XIII
Federico Musciotto, Luca Marotta, Salvatore Miccichè, and Rosario N
Mantegna, Bootstrap validation of links of a minimum spanning tree,
arXiv preprint arXiv:1802.03395 (2018).
Gautier Marti, Frank Nielsen, and Philippe Donnat, Optimal copula
transport for clustering multivariate time series, 2016 IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), IEEE, 2016, pp. 2379–2383.
David Matesanz and Guillermo J Ortega, Sovereign public debt crisis
in Europe. A network analysis, Physica A: Statistical Mechanics and its
Applications 436 (2015), 756–766.
57. References XIV
Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen, A
proposal of a methodological framework with experimental guidelines
to investigate clustering stability on financial time series, 14th IEEE
International Conference on Machine Learning and Applications,
ICMLA 2015, Miami, FL, USA, December 9-11, 2015,
pp. 32–37.
J-P Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertesz, and
Antti Kanto, Dynamics of market correlations: Taxonomy and
portfolio analysis, Physical Review E 68 (2003), no. 5, 056110.
J-P Onnela, A Chakraborti, K Kaski, and J Kertész, Dynamic asset
trees and portfolio analysis, The European Physical Journal
B-Condensed Matter and Complex Systems 30 (2002), no. 3,
285–288.
58. References XV
J-P Onnela, Kimmo Kaski, and Janos Kertész, Clustering and
information in correlation based financial networks, The European
Physical Journal B-Condensed Matter and Complex Systems 38
(2004), no. 2, 353–362.
Francesco Pozzi, Tiziana Di Matteo, and Tomaso Aste, Spread of risk
across financial markets: better to invest in the peripheries, Scientific
Reports 3 (2013).
Vasiliki Plerou, P Gopikrishnan, Bernd Rosenow, LA Nunes Amaral,
and H Eugene Stanley, A random matrix theory approach to financial
cross-correlations, Physica A: Statistical Mechanics and its
Applications 287 (2000), no. 3, 374–382.
Don B Panton, V Parker Lessig, and O Maurice Joy, Comovement of
international equity markets: a taxonomic approach, Journal of
Financial and Quantitative Analysis 11 (1976), no. 03, 415–432.
59. References XVI
Jochen Papenbrock and Peter Schwendner, Handling risk-on/risk-off
dynamics with correlation regimes and correlation networks, Financial
Markets and Portfolio Management 29 (2015), no. 2, 125–147.
Gustavo Peralta and Abalfazl Zareei, A network approach to portfolio
selection, Journal of Empirical Finance (2016).
Fei Ren, Ya-Nan Lu, Sai-Ping Li, Xiong-Fei Jiang, Li-Xin Zhong, and
Tian Qiu, Dynamic portfolio strategy using clustering approach, arXiv
preprint arXiv:1608.03058 (2016).
Jacopo Rocchi, Enoch Yan Lok Tsui, and David Saad, Emerging
interdependence between stock values during financial crashes, arXiv
preprint arXiv:1611.02549 (2016).
60. References XVII
Won-Min Song, Tiziana Di Matteo, and Tomaso Aste, Nested
hierarchies in planar graphs, Discrete Applied Mathematics 159
(2011), no. 17, 2135–2146.
Won-Min Song, T Di Matteo, and Tomaso Aste, Hierarchical
information clustering by means of topologically embedded graphs,
PLoS ONE 7 (2012), no. 3, e31929.
Ahmet Sensoy and Benjamin M Tabak, Dynamic spanning trees in
stock market networks: The case of Asia-Pacific, Physica A:
Statistical Mechanics and its Applications 414 (2014), 387–402.
Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and
Rosario N Mantegna, Evolution of worldwide stock markets,
correlation structure, and correlation-based graphs, Physical Review E
84 (2011), no. 2, 026108.
61. References XVIII
Tiziano Squartini, Iman Van Lelyveld, and Diego Garlaschelli,
Early-warning signals of topological collapse in interbank networks,
Scientific Reports 3 (2013).
Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N
Mantegna, A tool for filtering information in complex systems,
Proceedings of the National Academy of Sciences of the United States
of America 102 (2005), no. 30, 10421–10426.
Michele Tumminello, Claudia Coronnello, Fabrizio Lillo, Salvatore
Micciche, and Rosario N Mantegna, Spanning trees and bootstrap
reliability estimation in correlation-based networks, International
Journal of Bifurcation and Chaos 17 (2007), no. 07, 2319–2329.
Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N
Mantegna, Cluster analysis for portfolio optimization, Journal of
Economic Dynamics and Control 32 (2008), no. 1, 235–258.
62. References XIX
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna,
Hierarchically nested factor model from multivariate data, EPL
(Europhysics Letters) 78 (2007), no. 3, 30006.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna,
Kullback-Leibler distance as a measure of the information
filtered from multivariate data, Physical Review E 76 (2007), no. 3,
031123.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna,
Correlation, hierarchies, and networks in financial markets,
Journal of Economic Behavior & Organization 75 (2010), no. 1,
40–58.
Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo,
and Rosario N Mantegna, Statistically validated networks in bipartite
complex systems, PLoS ONE 6 (2011), no. 3, e17994.
63. References XX
Chengyi Tu, Cointegration-based financial networks study in Chinese
stock market, Physica A: Statistical Mechanics and its Applications
402 (2014), 245–254.
Tomáš Výrost, Štefan Lyócsa, and Eduard Baumöhl, Granger causality
stock market networks: Temporal proximity and preferential
attachment, Physica A: Statistical Mechanics and its Applications 427
(2015), 262–276.
Liuren Wu, Centrality of the supply chain network.
Yiting Zhang, Gladys Hui Ting Lee, Jian Cheng Wong, Jun Liang
Kok, Manamohan Prusty, and Siew Ann Cheong, Will the US economy
recover in 2010? A minimal spanning tree study, Physica A: Statistical
Mechanics and its Applications 390 (2011), no. 11, 2020–2050.
64. References XXI
Xin Zhang, Boris Podobnik, Dror Y Kenett, and H Eugene Stanley,
Systemic risk and causality dynamics of the world international
shipping market, Physica A: Statistical Mechanics and its Applications
415 (2014), 43–53.