A review of two decades of correlations, hierarchies,
networks and clustering in financial markets
Ton Duc Thang University, Ho Chi Minh City, Vietnam
Gautier Marti, Frank Nielsen, Mikolaj Bińkowski, Philippe Donnat
Ecole Polytechnique, Imperial College London, Hellebore Capital Ltd.
10 August 2018
Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 1 / 64
Table of contents
1 Introduction
2 Correlation networks
The standard and widely adopted methodology
Concerns about the standard methodology
Contributions for improving the methodology
On algorithms
On distances
On other methodological aspects
3 Other networks
4 Dynamics of networks
5 Applications
6 Opinionated views on research directions
Section 1
Introduction
Introduction
Motivation: A better understanding of financial markets using a scientific
approach.
Empirical studies use data to verify hypotheses and discover stylized
facts. Examples of datasets:
price, volume, returns, turnover time series
supply chain networks
market (OTC, exchange) transaction data
retail transactional data (credit cards)
corporate payments networks
international trade (import/export) networks,
...
Introduction
Several research fields are tackling the problem with their own tools:
statistical physics, econophysics:
Minimum Spanning Tree (MST)
Random Matrix Theory (RMT)
linear correlations
statistics, data mining, machine learning:
graph theory
community detection
clustering algorithms
non-linear dependence
alternative distances
statistical significance and robustness check via bootstrapping
economics, finance, accounting, behavioural finance:
standard industry and fundamental classifications vs. statistical and
text-based classifications
networks of trades, suppliers, consumers, competitors, investors
linear regressions on network statistics, statistical significance through
t-stats
Section 2
Correlation networks
The standard and widely adopted methodology
(Mantegna, 1999)
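Let $N$ be the number of assets.
Let $P_i(t)$ be the price at time $t$ of asset $i$, $1 \le i \le N$.
Let $r_i(t)$ be the log-return at time $t$ of asset $i$:
\[ r_i(t) = \log P_i(t) - \log P_i(t-1). \]
For each pair $i, j$ of assets, compute their correlation:
\[ \rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^2 \rangle - \langle r_i \rangle^2\right)\left(\langle r_j^2 \rangle - \langle r_j \rangle^2\right)}}, \]
where $\langle \cdot \rangle$ denotes the average over time.
Convert the correlation coefficients $\rho_{ij}$ into distances:
\[ d_{ij} = \sqrt{2(1 - \rho_{ij})}. \]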
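The three steps above (log-returns, pairwise correlation, distance) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names and the toy price series are hypothetical and not part of the original slides, and a real study would use a numerical library on much longer series.

```python
import math


def log_returns(prices):
    """r(t) = log P(t) - log P(t-1) for one price series."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


def pearson(x, y):
    """Sample Pearson correlation, written with time averages as in the formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    mxy = sum(a * b for a, b in zip(x, y)) / n
    mxx = sum(a * a for a in x) / n
    myy = sum(b * b for b in y) / n
    return (mxy - mx * my) / math.sqrt((mxx - mx ** 2) * (myy - my ** 2))


def correlation_distance(rho):
    """d = sqrt(2 (1 - rho)); the max() guards against tiny negative round-off."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def distance_matrix(price_series):
    """Pairwise distances between assets, one price series per asset."""
    returns = [log_returns(p) for p in price_series]
    n = len(returns)
    return [
        [0.0 if i == j else correlation_distance(pearson(returns[i], returns[j]))
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy prices for three assets (illustration only).
prices = [
    [100.0, 101.0, 100.5, 102.0, 103.0],
    [50.0, 50.6, 50.2, 51.1, 51.5],
    [20.0, 19.8, 20.1, 19.7, 19.6],
]
D = distance_matrix(prices)
```

With this mapping, perfectly correlated assets are at distance 0, uncorrelated assets at distance sqrt(2), and perfectly anti-correlated assets at distance 2.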
The standard and widely adopted methodology
From all the distances $d_{ij}$, compute a minimum spanning tree (MST)
using, for example, Algorithm 1:
Algorithm 1 Kruskal’s algorithm
procedure BuildMST({d_ij}, 1 ≤ i, j ≤ N)
    // Start with a fully disconnected graph G = (V, E)
    E ← ∅
    V ← {i}, 1 ≤ i ≤ N
    // Try to add edges by increasing distances
    for (i, j) ∈ V² ordered by increasing d_ij do
        // Verify that i and j are not already connected by a path
        if not connected(i, j) then
            // Add the edge (i, j) to connect i and j
            E ← E ∪ {(i, j)}
    // G is the resulting MST
    return G = (V, E)
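A runnable counterpart of Kruskal's algorithm might look as follows. It is a minimal sketch, not the authors' implementation: the connectivity test is done with a union-find structure (a common choice, assumed here), and the function name `build_mst` and the toy distance matrix are hypothetical.

```python
def build_mst(dist):
    """Kruskal's algorithm on a symmetric distance matrix (list of lists).

    Returns the MST as a list of (i, j, distance) triples.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest: each node starts alone

    def find(i):
        # Root of the component containing i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs (i < j), visited by increasing distance.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    mst = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # i and j are not yet connected by a path
            parent[root_i] = root_j   # merge the two components
            mst.append((i, j, d))
            if len(mst) == n - 1:     # a spanning tree has N - 1 edges
                break
    return mst


# Hypothetical 4-asset distance matrix (illustration only).
toy = [
    [0.0, 0.3, 1.2, 1.5],
    [0.3, 0.0, 0.9, 1.4],
    [1.2, 0.9, 0.0, 0.4],
    [1.5, 1.4, 0.4, 0.0],
]
tree = build_mst(toy)
```

Sorting all N(N-1)/2 pairs dominates the cost, so the sketch runs in O(N² log N) time on a dense distance matrix.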
Concerns about the standard methodology
The clusters obtained from the MST (or equivalently, the Single
Linkage Clustering Algorithm (SLCA)) are known to be unstable
(small perturbations of the input data may cause big differences in
the resulting clusters) [MVDN15].
The clustering instability may be partly due to the algorithm
(MST/Single Linkage are known for the chaining phenomenon
[CM10]).
The clustering instability may be partly due to the correlation
coefficient (Pearson linear correlation) defining the distance which
is known for being brittle to outliers, and, more generally, not well
suited to distributions other than the Gaussian ones [DMV16].
Single Linkage chaining problem...
makes it brittle to small perturbations in the input distances.
Clusters and hierarchies are skewed: Single Linkage does not take any
notion of density into account.
Pearson linear correlation...
is too sensitive to outliers.
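The sensitivity to outliers can be illustrated with a small simulation. The sketch below is an assumption-laden toy example (synthetic Gaussian data, one artificial outlier, hypothetical function names); it compares Pearson's coefficient with a rank-based (Spearman) coefficient computed as the Pearson correlation of the ranks.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(x):
    """Rank of each observation (0 = smallest); ties are not handled."""
    order = sorted(range(len(x)), key=lambda k: x[k])
    r = [0] * len(x)
    for rank, k in enumerate(order):
        r[k] = rank
    return r


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x]  # population correlation 0.8

# Append one extreme observation that contradicts the positive dependence.
x_out, y_out = x + [10.0], y + [-10.0]

pearson_shift = pearson(x, y) - pearson(x_out, y_out)
spearman_shift = spearman(x, y) - spearman(x_out, y_out)
```

In this toy setting a single point out of 201 moves the Pearson estimate far more than the rank-based one, which is the behaviour the slide warns about.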
Concerns about the standard methodology
Theoretical results providing the statistical reliability of hierarchical
trees and correlation-based networks are still not available [TLM10].
One might expect that the higher the correlation associated with a link
in a correlation-based network, the more reliable this link is.
In [TCL+07], the authors show that this is not always observed
empirically.
Changes affecting specific links (and clusters) during prominent crises
are difficult to interpret because of the high level of statistical
uncertainty associated with the correlation estimation [STZM11].
The standard method is somewhat arbitrary: A change in the
method (e.g. using a different clustering algorithm or a different
correlation coefficient) may yield a huge change in the clustering
results [LRW+14, MVDN15]. As a consequence, it implies huge
variability in portfolio formation and perceived risk [LRW+14].
Variance of the Pearson correlation estimator
CRLB of the Pearson correlation estimator - Proof
Random Matrix Theory & Empirical correlation matrices
Let $X$ be the matrix storing the standardized returns of $N = 560$ assets
(credit default swaps) over a period of $T = 2500$ trading days.
Then, the empirical correlation matrix of the returns is
\[ C = \frac{1}{T} X X^\top. \]
We can compute the empirical density of its eigenvalues
\[ \rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda}, \]
where $n(\lambda)$ counts the number of eigenvalues of $C$ less than $\lambda$.
From random matrix theory, the Marchenko-Pastur distribution gives
the limit distribution as $N \to \infty$, $T \to \infty$ and $T/N$ fixed. It reads:
\[ \rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}, \]
where $\lambda_{\min}^{\max} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.
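To make the Marchenko-Pastur formula concrete, the sketch below evaluates the spectrum edges for the N = 560, T = 2500 example and checks numerically that the density integrates to one. The function names are hypothetical, and the midpoint-rule integration is just a sanity check, not part of the original analysis.

```python
import math


def mp_edges(n_assets, n_obs):
    """Lower and upper edges of the Marchenko-Pastur spectrum."""
    q = n_assets / n_obs
    return 1.0 + q - 2.0 * math.sqrt(q), 1.0 + q + 2.0 * math.sqrt(q)


def mp_density(lam, n_assets, n_obs):
    """Marchenko-Pastur density at lam (zero outside the spectrum edges)."""
    lam_min, lam_max = mp_edges(n_assets, n_obs)
    if lam <= lam_min or lam >= lam_max:
        return 0.0
    return ((n_obs / n_assets) / (2.0 * math.pi)
            * math.sqrt((lam_max - lam) * (lam - lam_min)) / lam)


N, T = 560, 2500
lam_min, lam_max = mp_edges(N, T)

# Midpoint-rule integration of the density over [lam_min, lam_max].
steps = 100_000
h = (lam_max - lam_min) / steps
mass = sum(mp_density(lam_min + (k + 0.5) * h, N, T) for k in range(steps)) * h
```

Eigenvalues of an empirical correlation matrix that exceed `lam_max` are the ones usually read as carrying information beyond noise.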
Random Matrix Theory & Empirical correlation matrices
Notice that the Marchenko-Pastur density fits the empirical density well,
meaning that most of the information contained in the empirical
correlation matrix amounts to noise: only 26 eigenvalues are greater than
$\lambda_{\max}$. The highest eigenvalue corresponds to the ‘market’, the 25 others
can be associated with ‘industrial sectors’.
This is a known stylized fact of empirical correlation matrices between
financial returns: Only ≈ 5% of their eigenvalues are greater than $\lambda_{\max}$.
A somewhat arbitrary choice of methodology
The standard method is somewhat arbitrary. Adopting another one may
yield strongly different results. Which ones to trust? Are they both useful?
The clusters obtained differ substantially from one method to another.
Contributions on algorithms
Several alternative algorithms have been proposed to replace the minimum
spanning tree and its corresponding clusters:
Average Linkage Minimum Spanning Tree (ALMST) [TCL+07]:
the authors introduce a spanning tree associated with the Average Linkage
Clustering Algorithm (ALCA); it is designed to remedy the unwanted
chaining phenomenon of MST/SLCA.
Planar Maximally Filtered Graph (PMFG) [ADMH05, TADMM05],
which strictly contains the Minimum Spanning Tree (MST) but
encodes a larger amount of information in its internal structure.
Directed Bubble Hierarchical Tree (DBHT) [SDMA11, SDMA12],
which is designed to extract, without parameters, the deterministic
clusters from the PMFG.
Triangulated Maximally Filtered Graph (TMFG) [MDMA16]:
the authors introduce another filtered graph more suitable for big
datasets.
Contributions on algorithms (cont’d)
Clustering using Potts super-paramagnetic transitions [KKM00];
When anti-correlations occur, the model creates repulsion between
the stocks, which modifies their clustering structure.
Clustering using maximum likelihood [GM01, GM02]; Authors
define the likelihood of a clustering based on a simple 1-factor model,
then devise parameter-free methods to find a clustering with high
likelihood.
Clustering using Random Matrix Theory (RMT) [PGR+00];
Eigenvalues help to determine the number of clusters, and
eigenvectors their composition.
[MG15] proposes network-based community detection methods whose
null hypothesis is consistent with RMT results on cross-correlation
matrices for financial time series data, unlike existing community
detection algorithms.
Clustering using the p-median problem [KBP14]; With this
construction, every cluster is a star, i.e. a tree with one central node.
Planar Maximally Filtered Graph (PMFG)
The PMFG is a compelling alternative to the MST.
PMFG nodes are colored according to the clusters obtained from DBHT
Implementation of the PMFG in Python:
https://gmarti.gitlab.io/networks/2018/06/03/pmfg-algorithm.html
Contributions on distances
At the heart of clustering algorithms is the fundamental notion of distance
that can be defined upon a proper representation of data. It is thus an
obvious direction to explore. We list below what has been proposed in the
literature so far:
Distances that try to quantify how one financial instrument provides
information about another instrument:
Distance using Granger causality [BGLP12],
Distance using partial correlation [KTM+10],
Study of asynchronous, lead-lag relationships by using mutual
information instead of Pearson’s correlation coefficient
[Fie14a, RTS16],
The correlation matrix is normalized using the affinity transformation:
the correlation between each pair of stocks is normalized according to
the correlations of each of the two stocks with all other stocks
[KSM+10].
Contributions on distances (cont’d)
Distances that aim at including non-linear relationships in the
analysis:
Distances using mutual information, mutual information rate, and
other information-theoretic distances
[Fie14b, RTS16, BP17a, BP17b, GHA18, GZT18],
The Brownian distance [ZPKS14],
Copula-based [MND16, DP15, B+13] and tail dependence
[DFPW15] distances.
Distances that aim at taking into account multivariate dependence:
Each stock is represented by a bivariate time series: its returns and
traded volumes [BR08]; a distance is then applied to an ad hoc
transform of the two time series into a symbolic sequence,
Each stock is represented by a multivariate time series, for example the
daily (high, low, open, close) [LD13]; the authors use Escoufier’s RV
coefficient (a multivariate extension of Pearson’s correlation
coefficient).
A distance taking into account both the correlation between returns
and their distributions [DMV16].
Contributions on distances (cont’d)
Unlike recent studies which claim that the existence of nonlinear
dependence between stock returns has effects on network
characteristics, [HH18] documents that “most of the apparent
nonlinearity is due to univariate non-Gaussianity. Further, strong
non-stationarity in a few specific stocks may play a role. In particular,
the sharp decrease of some stocks during the global financial crisis in
2008” gives rise to apparent negative tail dependence among stocks.
When constructing unweighted stock networks, the authors suggest using
linear correlation “on marginally normalized data”, that is Spearman’s
rank correlation. In fact, this is similar to the idea of splitting apart
the dependence information from the distribution one as in [DMV16],
where Spearman’s rank correlation stems from using a Euclidean
distance between the uniform margins of the underlying bivariate
copula. Following previous studies, and unlike in [DMV16], the
distribution information is discarded when constructing the network.
Dependence and marginal distribution of the returns
Theorem (Sklar’s theorem, 1959)
For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal
cumulative distribution functions $F_i$, its joint cumulative distribution $F$ is
uniquely expressed as
\[ F(X_1, \ldots, X_N) = C(F_1(X_1), \ldots, F_N(X_N)), \]
where $C$, the multivariate distribution of uniform marginals, is known as
the copula of $X$.
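In practice the marginals are unknown, and the copula is usually approached through ranks. The sketch below is a toy illustration under stated assumptions (synthetic data, hypothetical function names, the common rank/(n+1) convention for pseudo-observations). It shows that the rank transform is unchanged by strictly increasing transformations of the margins, and that the mean squared Euclidean distance between the uniform margins is an affine function of Spearman's rank correlation.

```python
import math
import random


def pseudo_observations(x):
    """Empirical CDF transform to (0, 1): rank / (n + 1); ties are not handled."""
    n = len(x)
    order = sorted(range(n), key=lambda k: x[k])
    u = [0.0] * n
    for rank, k in enumerate(order, start=1):
        u[k] = rank / (n + 1)
    return u


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.5 * a + rng.gauss(0.0, 1.0) for a in x]

u, v = pseudo_observations(x), pseudo_observations(y)
rho_s = pearson(u, v)  # Spearman's rank correlation

# Strictly increasing transforms change the margins but not the ranks.
u2 = pseudo_observations([math.exp(a) for a in x])
v2 = pseudo_observations([b ** 3 for b in y])

# Mean squared Euclidean distance between the uniform margins.
msd = sum((a - b) ** 2 for a, b in zip(u, v)) / n
```

With this convention, `msd` equals `(1 - rho_s) * (n - 1) / (6 * (n + 1))`, so ordering pairs by this distance is the same as ordering them by decreasing Spearman correlation.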
Information-theoretic distances vs. Copula-based ones?
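Copula entropy:
\[ H_c(x) = -\int_u c(u) \log c(u) \, du \]
Mutual information:
\begin{align*}
I(x) &= \int_x p(x) \log \frac{p(x)}{\prod_i p_i(x_i)} \, dx \\
     &= \int_x c(u_x) \prod_i p_i(x_i) \log c(u_x) \, dx \\
     &= \int_u c(u_x) \log c(u_x) \, du_x \\
     &= -H_c(x)
\end{align*}
Entropy:
\[ H(x) = \sum_i H(x_i) + H_c(x) \]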
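As a worked instance of the identity $I(x) = -H_c(x)$, consider two jointly Gaussian variables with correlation $\rho$. This bivariate Gaussian case is an assumption made here for illustration; it is not discussed on the slide. The mutual information has a well-known closed form, which directly gives the copula entropy:

```latex
\begin{align*}
I(x_1, x_2) &= -\tfrac{1}{2} \log\left(1 - \rho^2\right), \\
H_c(x_1, x_2) &= -I(x_1, x_2) = \tfrac{1}{2} \log\left(1 - \rho^2\right) \le 0, \\
\rho = 0.5 &\;\Rightarrow\; I(x_1, x_2) = -\tfrac{1}{2} \log(0.75) \approx 0.144 \text{ nats}.
\end{align*}
```

The copula entropy is therefore zero only under independence ($\rho = 0$) and decreases without bound as $|\rho| \to 1$, regardless of the marginal variances.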
Contributions on other methodological aspects
Reliability and statistical uncertainty of the methods:
A bootstrap approach is used to estimate the statistical reliability of
both hierarchical trees [TLM07a, MAND16] and correlation-based
networks [TCL+07, MMMM18],
Consistency proof of clustering algorithms for recovering clusters
defined by nested block correlation matrices; Study of empirical
convergence rates [MAND16],
Kullback-Leibler divergence is used to estimate the amount of
filtered information between the sample correlation matrix and the
filtered one [TLM07b],
Cophenetic correlation is used between the original correlation
distances and the hierarchical cluster representation [PS15],
Several measures between successive (in time) clusters, dendrograms,
networks are used to estimate stability of the methods, e.g. cophenetic
correlation between dendrograms in [PLJ76], adjusted Rand index
(ARI) between clusters in [MVDN15], mutual information (MI) of link
co-occurrence between networks in [STZM11].
Contributions on other methodological aspects (cont’d)
Preprocessing of the time series:
Subtract the market mode before performing a cluster or network
analysis on the returns [BMM07],
Encode both rank statistics and a distribution histogram of the
returns into a representative vector [DMV16],
Fit an ARMA(p,q)-FIEGARCH(1,d,1)-cDCC process (econometric
preprocessing) to obtain dynamic correlations instead of the common
approach of rolling window Pearson correlations [ST14],
Use a clustering of successive correlation matrices to infer a market
state [PS15].
Use of other types of networks: threshold networks [OKK04],
influence networks [GZC15], partial-correlation networks
[KTM+10, KPGGBJ12], Granger causality networks
[BGLP12, VLB15], cointegration-based networks [Tu14], bipartite
networks [TML+11], etc.
Consistency and empirical convergence rates [MAND16]
Model selection: The faster the (empirical) convergence, the better.
Statistical & practical stability
One can use bootstrap, block bootstrap or other common sense and
practical perturbations of the data as presented in [MVDN15].
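A plain bootstrap of a single correlation coefficient gives a feel for the statistical uncertainty involved. The sketch below is only an assumption-based illustration: it uses synthetic i.i.d. data, hypothetical function names, and a simple percentile interval. For real return series with serial dependence, a block bootstrap as mentioned above would be the more appropriate choice.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bootstrap_correlation(x, y, n_boot=1000, seed=0):
    """95% percentile interval of the correlation under i.i.d. resampling of dates."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample dates with replacement
        estimates.append(pearson([x[k] for k in idx], [y[k] for k in idx]))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Synthetic returns for two assets over 250 "days" (illustration only).
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(250)]
y = [0.3 * a + rng.gauss(0.0, 1.0) for a in x]

rho_hat = pearson(x, y)
low, high = bootstrap_correlation(x, y)
```

Repeating the same resampling on the full return matrix and rebuilding the tree or clusters each time yields, for each link or cluster, the fraction of replicas in which it reappears.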
Effect of a basic preprocessing: Subtract the market mode
Visualization of the Planar Maximally Filtered Graph (PMFG) and DBHT clusters,
for both non-detrended (left) and detrended (right) log-returns [MADM15].
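One simple way to subtract the market mode is to regress each asset's returns on the cross-sectional average return and keep the residuals. The sketch below assumes this particular detrending (the cited papers may define the market mode differently), and the function name and synthetic data are hypothetical.

```python
import random


def subtract_market_mode(returns):
    """Remove a one-factor market component from each return series.

    returns[i][t] is the return of asset i at time t. Each series is regressed
    (ordinary least squares with intercept) on the cross-sectional mean return.
    Returns the residual series and the market series.
    """
    n_assets, n_obs = len(returns), len(returns[0])
    market = [sum(r[t] for r in returns) / n_assets for t in range(n_obs)]
    m_mean = sum(market) / n_obs
    m_var = sum((m - m_mean) ** 2 for m in market) / n_obs
    residuals = []
    for r in returns:
        r_mean = sum(r) / n_obs
        cov = sum((a - r_mean) * (m - m_mean) for a, m in zip(r, market)) / n_obs
        beta = cov / m_var                 # sensitivity to the market mode
        alpha = r_mean - beta * m_mean     # intercept
        residuals.append([a - alpha - beta * m for a, m in zip(r, market)])
    return residuals, market


# Synthetic returns: a common factor plus idiosyncratic noise (illustration only).
rng = random.Random(3)
common = [rng.gauss(0.0, 0.01) for _ in range(300)]
returns = [[c + rng.gauss(0.0, 0.01) for c in common] for _ in range(5)]
residuals, market = subtract_market_mode(returns)
```

The residual series are uncorrelated with the market series by construction, so a correlation network built on them reflects the structure that remains once the common mode is removed.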
Section 3
Other networks
Examples of other financial networks
supply chain networks [Wu15]
investor (security holdings and trading behaviour) networks [BKES18]
corporate board and director networks [BC04]
international trade networks [BFG10]
transaction networks [LL18]
sovereign debt (quarterly public debt-to-GDP ratio) networks [MO15]
interbank (exposures between banks) networks [SVLG13]
These networks are built from alternative data which are often:
confidential
hard or costly to obtain
Most often these studies are done in collaboration with a commercial or
regulatory organization. Some of these datasets may contain significant
alphas, and thus results are not publicly advertised: Papers are relatively
few in contrast to the ones on the correlation of asset returns which are
more oriented toward risk understanding.
Section 4
Dynamics of networks
Studying the dynamics of networks
Comparing and finding the differences in a sequence of large graphs is a
computationally difficult problem. In the literature, one often studies the
following statistics:
(for networks) the normalized tree length [OCK+03], the mean
occupation layer [OCK+03], the tree half-life [OCK+03], a survival
ratio of the edges [OCKK02, JMS+05, ST14], node degree, strength
[ST14], eigenvector, betweenness, closeness centrality [ST14], the
agglomerative coefficient [MO15]
(for clusters) the merging, splitting, birth, death, contraction, and
growth of the clusters in time [PS15]
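Two of the network statistics listed above are easy to compute once the trees are available. The sketch below assumes the usual definitions (normalized tree length as the mean edge distance of the tree, single-step survival ratio as the fraction of edges shared by two consecutive trees); the exact normalisation in the cited papers is not given on the slide, and the trees used here are hypothetical.

```python
def normalized_tree_length(tree):
    """Mean edge distance of a spanning tree given as (i, j, distance) triples."""
    return sum(d for _, _, d in tree) / len(tree)


def survival_ratio(tree_prev, tree_next):
    """Fraction of the edges of the earlier tree still present in the later tree."""
    prev_edges = {frozenset((i, j)) for i, j, _ in tree_prev}
    next_edges = {frozenset((i, j)) for i, j, _ in tree_next}
    return len(prev_edges & next_edges) / len(prev_edges)


# Hypothetical MSTs of the same four assets at two consecutive dates.
tree_t0 = [(0, 1, 0.30), (1, 2, 0.90), (2, 3, 0.40)]
tree_t1 = [(0, 1, 0.35), (0, 2, 1.00), (2, 3, 0.45)]

length_t0 = normalized_tree_length(tree_t0)
survival = survival_ratio(tree_t0, tree_t1)
```

Edges are stored as unordered pairs so that (i, j) and (j, i) count as the same link when two trees are compared.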
Remark. To the best of my knowledge, graph embeddings into vector spaces (cf.
the recent Deep Learning literature, or this survey [GF18]) have not been used to
study time series of financial networks. Such a vector representation would open
the field to the toolbox of standard machine learning algorithms: Cluster networks
and find those which are associated with some events (e.g. a crisis); Predict the
future networks in a sequence of networks with an LSTM (stat arb?); Detect a
structural break, etc.
Section 5
Applications
Portfolio optimization
[OCK+03] finds that the Markowitz portfolio layer in the MST is higher
than the mean layer at all times.
As the stocks of the minimum risk portfolio are found on the outskirts
of the tree [PDMA13, OCK+03], the authors expect larger trees to have
greater diversification potential.
In [TLGM08, PLJ76], authors compare the Markowitz portfolios from
the filtered empirical correlation matrices using the clustering approach,
the RMT approach and the shrinkage approach.
[RLL+16, PZ16] propose to invest in different parts of the MST
depending on the estimated market conditions.
Authors show that there is no inner-mathematical relationship between
the minimum variance portfolio from Markowitz theory and the
portfolios designed from the minimum spanning tree [HMM18].
Empirical evidence of such relations found by previous studies is
essentially a stylized fact of financial returns correlations and time
series, not a general property of correlation matrices.
[DFPW15] introduces a procedure to design portfolios which are
diversified in their tail behavior by selecting only a single asset in each
cluster.
Trading strategy
Earnings per share forecasts prepared on the basis of statistically
grouped data (clusters) outperform forecasts made on data grouped on
traditional industrial criteria as well as forecasts prepared by mechanical
extrapolation techniques [EG71].
One can build a simple mean-reversion statistical arbitrage strategy
whereby one assumes that stocks in a given industry move together,
cross-sectionally demeans stock returns within said industry, shorts
stocks with positive residual returns and goes long stocks with negative
residual returns [KY16].
In [PS15], the authors suggest that tracking the merging, splitting, birth, and
death of the clusters in time could be the basis for pairs-like reversal
trading strategies but with pairs corresponding to clusters.
The paper [DC05] describes methods for index tracking and enhanced
index tracking based on clusters of financial time series.
[MADM16] finds the existence of significant relations between past
changes in the market correlation structure and future changes in the
market volatility.
In [KLT12], authors claim that long-short strategies exploiting
mispricing due to the industry categorization bias generate statistically
significant and economically sizable risk-adjusted excess returns.
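The mean-reversion strategy described above ([KY16]) reduces to a short computation for a single rebalancing date. The sketch below is a simplified illustration, not the cited authors' implementation: the tickers, returns, group labels, and the normalisation to unit gross exposure are all assumptions. The groups could equally come from a statistical clustering instead of an industry classification.

```python
from collections import defaultdict


def mean_reversion_weights(returns, industry):
    """Dollar-neutral weights: short positive residuals, long negative ones.

    returns: {ticker: most recent return}; industry: {ticker: group label}.
    Residuals are returns demeaned cross-sectionally within each group.
    """
    groups = defaultdict(list)
    for ticker, label in industry.items():
        groups[label].append(ticker)

    residual = {}
    for members in groups.values():
        group_mean = sum(returns[t] for t in members) / len(members)
        for t in members:
            residual[t] = returns[t] - group_mean

    gross = sum(abs(r) for r in residual.values())
    if gross == 0.0:
        return {t: 0.0 for t in residual}
    # Negative sign: bet on reversal of the residual; unit gross exposure.
    return {t: -r / gross for t, r in residual.items()}


# Hypothetical one-day returns and groups (illustration only).
returns = {"A": 0.02, "B": -0.01, "C": 0.005, "D": 0.03, "E": 0.01}
industry = {"A": "tech", "B": "tech", "C": "tech", "D": "energy", "E": "energy"}
weights = mean_reversion_weights(returns, industry)
```

Because residuals sum to zero within each group, the resulting portfolio is neutral to every group as well as to the market as a whole.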
Risk
In [DPT14], authors design clusters that tend to be comonotonic in
their extreme low values: To avoid contagion in the portfolio during
risky scenarios, an investor should diversify over these clusters.
In [MDMA14], authors postulate the existence of a hierarchical
structure of risks which can be deemed responsible for both stock
multivariate dependency structure and univariate multifractal
behaviour, and then propose a model that reproduces the empirical
observations (entanglement of univariate multi-scaling and multivariate
cross-correlation properties of financial time series). The interplay
between multi-scaling and average cross-correlation is confirmed in
[BMDM18].
Clusters (statistical industry classification) can be an alternative to
sometimes unavailable “fundamental” industry classifications (e.g. in
emerging or small markets) [KY16].
[HZYU16] finds that financial institutions which have, in the
correlation networks, greater node strength, larger node betweenness
centrality, larger node closeness centrality and larger node clustering
coefficient tend to be associated with larger systemic risk contributions.
Financial policy making
Clusters and networks can help design financial policies. Several
papers propose to leverage them to detect risky market environments,
develop indicators that can predict forthcoming crises or economic
recovery [ZLW+11], improve economic nowcasting [EFC17], or find key
markets and assets that drive a whole region, and on which stimulus
can be applied effectively.
Authors of [HSBYBY10] claim that “separation prevents failure
propagation and connections increase risks of global crises” whereas the
prevailing view in favor of deregulation is that banks, by investing in
diverse sectors, would have greater stability. To support their argument,
using financial networks, they study the aftermath of the repeal of the Glass-Steagall
Act (1933) under the Clinton administration in 1999. They find that
erosion of the Glass–Steagall Act, and cross sector investments
eliminated “firewalls” that could have prevented the housing sector
decline from triggering a wider financial and economic crisis:
Our analysis implies that the investment across economic
sectors itself creates increased cross-linking of otherwise
much more weakly coupled parts of the economy, causing
dependencies that increase, rather than decrease, risk.
Section 6
Opinionated views on research directions
Opinionated views on research directions
What’s missing for “financial networks” to become a mature research field?
Some inspiration from the booming deep learning era:
lack of reproducibility
provide code and data (at least synthetic datasets)
difficulty in comparing methods, re-implementation bias
build open source libraries (standardized API, optimized code)
open source software helps to engage more with practitioners
confidential data
provide synthetic datasets encoding stylized facts
propose generative models (cf. the GAN literature applied to graphs)
lack of evaluation metrics / no end-to-end approach
define common tasks (e.g. evaluate the clustering or network
methodology on portfolio optimization, crisis detection, mean reversion
strategy) where all the details are specified (e.g. a well-chosen artificial
dataset, or samples from a generative model, or public financial data)
Thank you for your attention. Questions?
Co-authorship network (left) and its MST (right)
References I
Tomaso Aste, Tiziana Di Matteo, and ST Hyde, Complex networks on
hyperbolic surfaces, Physica A: Statistical Mechanics and its
Applications 346 (2005), no. 1, 20–26.
Eike Christian Brechmann et al., Hierarchical Kendall copulas and the
modeling of systemic and operational risk, Ph.D. thesis,
Universitätsbibliothek der TU München, 2013.
Stefano Battiston and Michele Catanzaro, Statistical properties of
corporate board and director networks, The European Physical Journal
B 38 (2004), no. 2, 345–352.
Matteo Barigozzi, Giorgio Fagiolo, and Diego Garlaschelli,
Multinetwork of international trade: A commodity-specific analysis,
Physical Review E 81 (2010), no. 4, 046104.
References II
Monica Billio, Mila Getmansky, Andrew W Lo, and Loriana Pelizzon,
Econometric measures of connectedness and systemic risk in the
finance and insurance sectors, Journal of Financial Economics 104
(2012), no. 3, 535–559.
Kestutis Baltakys, Juho Kanniainen, and Frank Emmert-Streib,
Multilayer aggregation with statistical validation: Application to
investor networks, Scientific reports 8 (2018), no. 1, 8198.
RJ Buonocore, RN Mantegna, and T Di Matteo, On the interplay
between multiscaling and average cross-correlation, arXiv preprint
arXiv:1802.01113 (2018).
Christian Borghesi, Matteo Marsili, and Salvatore Miccichè,
Emergence of time-horizon invariant correlation structure in financial
returns by subtraction of the market mode, Physical Review E 76
(2007), no. 2, 026104.
References III
Eduard Baitinger and Jochen Papenbrock, Interconnectedness risk and
active portfolio management: The information-theoretic perspective.
AQ Barbi and GA Prataviera, Nonlinear dependencies on Brazilian
equity network from mutual information minimum spanning trees,
arXiv preprint arXiv:1711.06185 (2017).
Juan Gabriel Brida and Wiston Adrián Risso, Multidimensional
minimal spanning tree: The Dow Jones case, Physica A: Statistical
Mechanics and its Applications 387 (2008), no. 21, 5205–5210.
Gunnar Carlsson and Facundo Mémoli, Characterization, stability
and convergence of hierarchical clustering methods, Journal of
machine learning research 11 (2010), no. Apr, 1425–1470.
References IV
Christian Dose and Silvano Cincotti, Clustering of financial time series
with application to index and enhanced index tracking portfolio,
Physica A: Statistical Mechanics and its Applications 355 (2005),
no. 1, 145–151.
Fabrizio Durante, Enrico Foscolo, Roberta Pappadà, and Hao Wang,
A portfolio diversification strategy via tail dependence measures.
Philippe Donnat, Gautier Marti, and Philippe Very, Toward a generic
representation of random variables for machine learning, Pattern
Recognition Letters 70 (2016), 24–31.
Fabrizio Durante and Roberta Pappadà, Cluster analysis of time series
via Kendall distribution, Strengthening Links Between Data Analysis
and Soft Computing, Springer, 2015, pp. 209–216.
References V
Fabrizio Durante, Roberta Pappadà, and Nicola Torelli, Clustering of
financial time series in risky scenarios, Advances in Data Analysis and
Classification 8 (2014), no. 4, 359–376.
Mohammed Elshendy and Andrea Fronzetti Colladon, Big data
analysis of economic news: Hints to forecast macroeconomic
indicators, International Journal of Engineering Business Management
9 (2017), 1847979017720040.
Edwin J Elton and Martin J Gruber, Improved forecasting through the
design of homogeneous groups, The Journal of Business 44 (1971),
no. 4, 432–450.
Pawel Fiedor, Information-theoretic approach to lead-lag effect on
financial markets, The European Physical Journal B 87 (2014), no. 8,
1–9.
References VI
Pawel Fiedor, Networks in financial markets based on the mutual
information rate, Physical Review E 89 (2014), no. 5, 052801.
Palash Goyal and Emilio Ferrara, Graph embedding techniques,
applications, and performance: A survey, Knowledge-Based Systems
151 (2018), 78–94.
Yong Kheng Goh, Haslifah M Hasim, and Chris G Antonopoulos,
Inference of financial networks using the normalised mutual
information rate, PloS one 13 (2018), no. 2, e0192160.
Lorenzo Giada and Matteo Marsili, Data clustering and noise
undressing of correlation matrices, Physical Review E 63 (2001),
no. 6, 061101.
Lorenzo Giada and Matteo Marsili, Algorithms of maximum likelihood data
clustering with applications, Physica A: Statistical Mechanics and its
Applications 315 (2002), no. 3, 650–664.
References VII
Ya-Chun Gao, Yong Zeng, and Shi-Min Cai, Influence network in the
Chinese stock market, Journal of Statistical Mechanics: Theory and
Experiment 2015 (2015), no. 3, P03017.
Xue Guo, Hu Zhang, and Tianhai Tian, Development of stock
correlation networks using mutual information and financial big data,
PloS one 13 (2018), no. 4, e0195941.
David Hartman and Jaroslav Hlinka, Nonlinearity in stock networks,
arXiv preprint arXiv:1804.10264 (2018).
Amelie Hüttner, Jan-Frederik Mai, and Stefano Mineo, Portfolio
selection based on graphs: Does it align with Markowitz-optimal
portfolios?, Dependence Modeling (2018).
References VIII
Dion Harmon, Blake Stacey, Yavni Bar-Yam, and Yaneer Bar-Yam,
Networks of economic market interdependence and systemic risk,
arXiv preprint arXiv:1011.3707 (2010).
Wei-Qiang Huang, Xin-Tian Zhuang, Shuang Yao, and Stan Uryasev,
A financial network perspective of financial institutions’ systemic risk
contributions, Physica A: Statistical Mechanics and its Applications
456 (2016), 183–196.
Neil F Johnson, Mark McDonald, Omer Suleman, Stacy Williams, and
Sam Howison, What shakes the FX tree? understanding currency
dominance, dependence, and dynamics (keynote address), SPIE Third
International Symposium on Fluctuations and Noise, International
Society for Optics and Photonics, 2005, pp. 86–99.
References IX
Anton Kocheturov, Mikhail Batsyn, and Panos M Pardalos, Dynamics
of cluster structures in a financial market network, Physica A:
Statistical Mechanics and its Applications 413 (2014), 523–533.
L Kullmann, J Kertész, and RN Mantegna, Identification of clusters of
companies in stock indices via Potts super-paramagnetic transitions,
Physica A: Statistical Mechanics and its Applications 287 (2000),
no. 3, 412–419.
Philipp Krüger, Augustin Landier, and David Thesmar, Categorization
bias in the stock market, Available at SSRN 2034204 (2012).
Dror Y Kenett, Tobias Preis, Gitit Gur-Gershgoren, and Eshel
Ben-Jacob, Dependency network and node influence: application to
the study of financial markets, International Journal of Bifurcation and
Chaos 22 (2012), no. 07, 1250181.
References X
Dror Y Kenett, Yoash Shapira, Asaf Madi, Sharron Bransburg-Zabary,
Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dynamics of stock market
correlations, AUCO Czech Economic Review 4 (2010), no. 3, 330–341.
Dror Y Kenett, Michele Tumminello, Asaf Madi, Gitit Gur-Gershgoren,
Rosario N Mantegna, and Eshel Ben-Jacob, Dominating clasp of the
financial sector revealed by partial correlation analysis of the stock
market, PloS one 5 (2010), no. 12, e15032.
Zura Kakushadze and Willie Yu, Statistical industry classification.
Gan Siew Lee and Maman A Djauhari, Multidimensional stock
network analysis: An Escoufier’s RV coefficient approach, AIP
Conference Proceedings, vol. 1, 2013, pp. 550–555.
Elisa Letizia and Fabrizio Lillo, Corporate payments networks and
credit risk rating.
References XI
Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and
Mark Flood, Clustering techniques and their effect on portfolio
formation and risk analysis, Proceedings of the International
Workshop on Data Science for Macro-Modeling, ACM, 2014, pp. 1–6.
Nicolò Musmeci, Tomaso Aste, and Tiziana Di Matteo, Relation
between financial market structure and the real economy: comparison
between clustering methods, PloS one 10 (2015), no. 3, e0116201.
Nicolò Musmeci, Tomaso Aste, and T Di Matteo, Interplay between
past market correlation structure changes and future volatility
outbursts, Scientific reports 6 (2016).
References XII
Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat,
Clustering financial time series: How long is enough?, Proceedings of
the Twenty-Fifth International Joint Conference on Artificial
Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, 2016,
pp. 2583–2589.
Raffaello Morales, T Di Matteo, and Tomaso Aste, Dependency
structure and scaling properties of financial time series are related,
Scientific Reports 4 (2014), no. 4589.
Guido Previde Massara, Tiziana Di Matteo, and Tomaso Aste,
Network filtering for big data: triangulated maximally filtered graph,
Journal of Complex Networks 5 (2016), no. 2, 161–178.
Mel MacMahon and Diego Garlaschelli, Community detection for
correlation matrices, Phys. Rev. X 5 (2015), 021006.
References XIII
Federico Musciotto, Luca Marotta, Salvatore Miccichè, and Rosario N
Mantegna, Bootstrap validation of links of a minimum spanning tree,
arXiv preprint arXiv:1802.03395 (2018).
Gautier Marti, Frank Nielsen, and Philippe Donnat, Optimal copula
transport for clustering multivariate time series, 2016 IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), IEEE, 2016, pp. 2379–2383.
David Matesanz and Guillermo J Ortega, Sovereign public debt crisis
in Europe. A network analysis, Physica A: Statistical Mechanics and its
Applications 436 (2015), 756–766.
References XIV
Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen, A
proposal of a methodological framework with experimental guidelines
to investigate clustering stability on financial time series, 14th IEEE
International Conference on Machine Learning and Applications,
ICMLA 2015, Miami, FL, USA, December 9-11, 2015, 2015,
pp. 32–37.
J-P Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertész, and
Antti Kanto, Dynamics of market correlations: Taxonomy and
portfolio analysis, Physical Review E 68 (2003), no. 5, 056110.
J-P Onnela, A Chakraborti, K Kaski, and J Kertész, Dynamic asset
trees and portfolio analysis, The European Physical Journal
B-Condensed Matter and Complex Systems 30 (2002), no. 3,
285–288.
References XV
J-P Onnela, Kimmo Kaski, and Janos Kertész, Clustering and
information in correlation based financial networks, The European
Physical Journal B-Condensed Matter and Complex Systems 38
(2004), no. 2, 353–362.
Francesco Pozzi, Tiziana Di Matteo, and Tomaso Aste, Spread of risk
across financial markets: better to invest in the peripheries, Scientific
reports 3 (2013).
Vasiliki Plerou, P Gopikrishnan, Bernd Rosenow, LA Nunes Amaral,
and H Eugene Stanley, A random matrix theory approach to financial
cross-correlations, Physica A: Statistical Mechanics and its
Applications 287 (2000), no. 3, 374–382.
Don B Panton, V Parker Lessig, and O Maurice Joy, Comovement of
international equity markets: a taxonomic approach, Journal of
Financial and Quantitative Analysis 11 (1976), no. 03, 415–432.
References XVI
Jochen Papenbrock and Peter Schwendner, Handling risk-on/risk-off
dynamics with correlation regimes and correlation networks, Financial
Markets and Portfolio Management 29 (2015), no. 2, 125–147.
Gustavo Peralta and Abalfazl Zareei, A network approach to portfolio
selection, Journal of Empirical Finance (2016).
Fei Ren, Ya-Nan Lu, Sai-Ping Li, Xiong-Fei Jiang, Li-Xin Zhong, and
Tian Qiu, Dynamic portfolio strategy using clustering approach, arXiv
preprint arXiv:1608.03058 (2016).
Jacopo Rocchi, Enoch Yan Lok Tsui, and David Saad, Emerging
interdependence between stock values during financial crashes, arXiv
preprint arXiv:1611.02549 (2016).
References XVII
Won-Min Song, Tiziana Di Matteo, and Tomaso Aste, Nested
hierarchies in planar graphs, Discrete Applied Mathematics 159
(2011), no. 17, 2135–2146.
Won-Min Song, T Di Matteo, and Tomaso Aste, Hierarchical
information clustering by means of topologically embedded graphs,
PLoS One 7 (2012), no. 3, e31929.
Ahmet Sensoy and Benjamin M Tabak, Dynamic spanning trees in
stock market networks: The case of Asia-Pacific, Physica A:
Statistical Mechanics and its Applications 414 (2014), 387–402.
Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and
Rosario N Mantegna, Evolution of worldwide stock markets,
correlation structure, and correlation-based graphs, Physical Review E
84 (2011), no. 2, 026108.
References XVIII
Tiziano Squartini, Iman Van Lelyveld, and Diego Garlaschelli,
Early-warning signals of topological collapse in interbank networks,
Scientific reports 3 (2013).
Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N
Mantegna, A tool for filtering information in complex systems,
Proceedings of the National Academy of Sciences of the United States
of America 102 (2005), no. 30, 10421–10426.
Michele Tumminello, Claudia Coronnello, Fabrizio Lillo, Salvatore
Miccichè, and Rosario N Mantegna, Spanning trees and bootstrap
reliability estimation in correlation-based networks, International
Journal of Bifurcation and Chaos 17 (2007), no. 07, 2319–2329.
Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N
Mantegna, Cluster analysis for portfolio optimization, Journal of
Economic Dynamics and Control 32 (2008), no. 1, 235–258.
References XIX
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna,
Hierarchically nested factor model from multivariate data, EPL
(Europhysics Letters) 78 (2007), no. 3, 30006.
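Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Kullback-Leibler distance as a measure of the information
filtered from multivariate data, Physical Review E 76 (2007), no. 3,
031123.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Correlation, hierarchies, and networks in financial markets,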
Journal of Economic Behavior & Organization 75 (2010), no. 1,
40–58.
Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo,
and Rosario N Mantegna, Statistically validated networks in bipartite
complex systems, PloS one 6 (2011), no. 3, e17994.
References XX
Chengyi Tu, Cointegration-based financial networks study in Chinese
stock market, Physica A: Statistical Mechanics and its Applications
402 (2014), 245–254.
Tomáš Výrost, Štefan Lyócsa, and Eduard Baumöhl, Granger causality
stock market networks: Temporal proximity and preferential
attachment, Physica A: Statistical Mechanics and its Applications 427
(2015), 262–276.
Liuren Wu, Centrality of the supply chain network.
Yiting Zhang, Gladys Hui Ting Lee, Jian Cheng Wong, Jun Liang
Kok, Manamohan Prusty, and Siew Ann Cheong, Will the US economy
recover in 2010? A minimal spanning tree study, Physica A: Statistical
Mechanics and its Applications 390 (2011), no. 11, 2020–2050.
References XXI
Xin Zhang, Boris Podobnik, Dror Y Kenett, and H Eugene Stanley,
Systemic risk and causality dynamics of the world international
shipping market, Physica A: Statistical Mechanics and its Applications
415 (2014), 43–53.
A review of two decades of correlations, hierarchies, networks and clustering in financial markets

  • 6. Section 2 Correlation networks
  • 7. The standard and widely adopted methodology (Mantegna, 1999) [add the proper biblio ref] Let $N$ be the number of assets. Let $P_i(t)$ be the price at time $t$ of asset $i$, $1 \le i \le N$. Let $r_i(t)$ be the log-return at time $t$ of asset $i$: $r_i(t) = \log P_i(t) - \log P_i(t-1)$. For each pair $i, j$ of assets, compute their correlation: $\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^2 \rangle - \langle r_i \rangle^2\right)\left(\langle r_j^2 \rangle - \langle r_j \rangle^2\right)}}$, where $\langle \cdot \rangle$ denotes the average over time. Convert the correlation coefficients $\rho_{ij}$ into distances: $d_{ij} = \sqrt{2(1 - \rho_{ij})}$.
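The three steps above (log-returns, Pearson correlation, conversion to a distance) can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names (`log_returns`, `pearson`, `correlation_distance`, `distance_matrix`) are hypothetical and not taken from the slides, and the `max(0.0, ...)` guard against tiny negative values caused by rounding is an added assumption.

```python
import math

def log_returns(prices):
    """r(t) = log P(t) - log P(t-1) for one price series."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation written with time averages, as on the slide."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    vx = sum(a * a for a in x) / n - mx * mx
    vy = sum(b * b for b in y) / n - my * my
    return cov / math.sqrt(vx * vy)

def correlation_distance(rho):
    """d = sqrt(2 (1 - rho)); ranges from 0 (rho = 1) to 2 (rho = -1)."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))

def distance_matrix(price_series):
    """Pairwise distances d_ij from a list of N price series of equal length."""
    returns = [log_returns(p) for p in price_series]
    n = len(returns)
    return [[0.0 if i == j
             else correlation_distance(pearson(returns[i], returns[j]))
             for j in range(n)] for i in range(n)]
```

With this distance, perfectly correlated assets sit at distance 0 and perfectly anti-correlated assets at distance 2.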
  • 8. The standard and widely adopted methodology From all the distances d_ij, compute a minimum spanning tree (MST) using, for example, Algorithm 1:
      Algorithm 1 Kruskal's algorithm
      procedure BuildMST({d_ij}, 1 ≤ i, j ≤ N)
          // Start with a fully disconnected graph G = (V, E)
          E ← ∅
          V ← {i}, 1 ≤ i ≤ N
          // Try to add edges by increasing distances
          for (i, j) ∈ V² ordered by increasing d_ij do
              // Verify that i and j are not already connected by a path
              if not connected(i, j) then
                  // Add the edge (i, j) to connect i and j
                  E ← E ∪ {(i, j)}
          // G is the resulting MST
          return G = (V, E)
  • 9. The standard and widely adopted methodology
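Kruskal's algorithm as described above can be written as runnable code. The sketch below is one possible implementation, not the authors' own: it assumes the input is a symmetric N × N list of lists, and it implements the "not connected(i, j)" check with a union-find structure, which is a standard choice that the slide does not specify. The name `build_mst` is illustrative.

```python
def build_mst(dist):
    """Return the MST edges as (i, j, d_ij) tuples from a symmetric distance matrix."""
    n = len(dist)
    parent = list(range(n))  # union-find forest: every node starts alone

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs i < j, ordered by increasing distance.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))

    mst = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # i and j are not yet connected by a path
            parent[root_i] = root_j   # merge the two components
            mst.append((i, j, d))
            if len(mst) == n - 1:     # a spanning tree has exactly N - 1 edges
                break
    return mst
```

Sorting the N(N-1)/2 candidate edges dominates the cost, so the whole procedure runs in O(N² log N) time for a dense distance matrix.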
  • 10. Concerns about the standard methodology The clusters obtained from the MST (or equivalently, the Single Linkage Clustering Algorithm (SLCA)) are known to be unstable (small perturbations of the input data may cause big differences in the resulting clusters) [MVDN15]. The clustering instability may be partly due to the algorithm (MST/Single Linkage are known for the chaining phenomenon [CM10]). The clustering instability may be partly due to the correlation coefficient (Pearson linear correlation) defining the distance which is known for being brittle to outliers, and, more generally, not well suited to distributions other than the Gaussian ones [DMV16]. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 10 / 64
  • 11. Single Linkage chaining problem... makes it brittle to small perturbations in the input distances. Clusters and hierarchies are skewed: It does not take into account some notion of density. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 11 / 64
  • 12. Pearson linear correlation... is too sensitive to outliers. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 12 / 64
  • 13. Concerns about the standard methodology Theoretical results establishing the statistical reliability of hierarchical trees and correlation-based networks are still not available [TLM10]. One might expect that the higher the correlation associated with a link in a correlation-based network, the more reliable the link. In [TCL+07], the authors show that this is not always observed empirically. Changes affecting specific links (and clusters) during prominent crises are difficult to interpret due to the high level of statistical uncertainty associated with the correlation estimation [STZM11]. The standard method is somewhat arbitrary: A change in the method (e.g. using a different clustering algorithm or a different correlation coefficient) may yield a huge change in the clustering results [LRW+14, MVDN15]. As a consequence, it implies huge variability in portfolio formation and perceived risk [LRW+14].
  • 14. Variance of the Pearson correlation estimator
  • 15. CRLB of the Pearson correlation estimator - Proof
  • 16. Random Matrix Theory & Empirical correlation matrices Let $X$ be the matrix storing the standardized returns of $N = 560$ assets (credit default swaps) over a period of $T = 2500$ trading days. Then, the empirical correlation matrix of the returns is $C = \frac{1}{T} X X^\top$. We can compute the empirical density of its eigenvalues $\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda}$, where $n(\lambda)$ counts the number of eigenvalues of $C$ less than $\lambda$. From random matrix theory, the Marchenko-Pastur distribution gives the limit distribution as $N \to \infty$, $T \to \infty$ and $T/N$ fixed. It reads: $\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}$, where $\lambda_{\min}^{\max} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.
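To make the Marchenko-Pastur formula concrete, the following sketch evaluates the spectrum edges and the density for the values quoted on the slide (N = 560, T = 2500). The function names `mp_bounds` and `mp_density` are hypothetical; the code only restates the formula above and does not reproduce the authors' empirical analysis.

```python
import math

def mp_bounds(n_assets, n_obs):
    """Edges (lambda_min, lambda_max) of the Marchenko-Pastur spectrum."""
    q = n_assets / n_obs  # N / T
    return 1 + q - 2 * math.sqrt(q), 1 + q + 2 * math.sqrt(q)

def mp_density(lam, n_assets, n_obs):
    """Marchenko-Pastur density at lam; zero outside [lambda_min, lambda_max]."""
    lam_min, lam_max = mp_bounds(n_assets, n_obs)
    if lam <= lam_min or lam >= lam_max:
        return 0.0
    return ((n_obs / n_assets) / (2 * math.pi)
            * math.sqrt((lam_max - lam) * (lam - lam_min)) / lam)

# Values from the slide: N = 560 assets, T = 2500 trading days.
lam_min, lam_max = mp_bounds(560, 2500)
```

For these values the upper edge is roughly 2.17, so eigenvalues of the empirical correlation matrix above that level are the ones not explained by the pure-noise null model.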
  • 17. Random Matrix Theory & Empirical correlation matrices Notice that the Marchenko-Pastur density fits the empirical density well, meaning that most of the information contained in the empirical correlation matrix amounts to noise: only 26 eigenvalues are greater than λmax. The highest eigenvalue corresponds to the ‘market’, the 25 others can be associated with ‘industrial sectors’. It is a known stylized fact of empirical correlation matrices between financial returns: Only ≈ 5% of their eigenvalues are greater than λmax.
  • 18. A somewhat arbitrary choice of methodology The standard method is somewhat arbitrary. Adopting another one may yield strongly different results. Which ones to trust? Are they both useful? Clusters obtained are much different from one method to another Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 18 / 64
  • 19. Contributions on algorithms Several alternative algorithms have been proposed to replace the minimum spanning tree and its corresponding clusters: Average Linkage Minimum Spanning Tree (ALMST) [TCL+07]; the authors introduce a spanning tree associated with the Average Linkage Clustering Algorithm (ALCA); it is designed to remedy the unwanted chaining phenomenon of MST/SLCA. Planar Maximally Filtered Graph (PMFG) [ADMH05, TADMM05], which strictly contains the Minimum Spanning Tree (MST) but encodes a larger amount of information in its internal structure. Directed Bubble Hierarchical Tree (DBHT) [SDMA11, SDMA12], which is designed to extract, without parameters, the deterministic clusters from the PMFG. Triangulated Maximally Filtered Graph (TMFG) [MDMA16]; the authors introduce another filtered graph more suitable for big datasets.
  • 20. Contributions on algorithms (cont’d) Clustering using Potts super-paramagnetic transitions [KKM00]; When anti-correlations occur, the model creates repulsion between the stocks which modify their clustering structure. Clustering using maximum likelihood [GM01, GM02]; Authors define the likelihood of a clustering based on a simple 1-factor model, then devise parameter-free methods to find a clustering with high likelihood. Clustering using Random Matrix Theory (RMT) [PGR+00]; Eigenvalues help to determine the number of clusters, and eigenvectors their composition. [MG15] proposes network-based community detection methods whose null hypothesis is consistent with RMT results on cross-correlation matrices for financial time series data, unlike existing community detection algorithms. Clustering using the p-median problem [KBP14]; With this construction, every cluster is a star, i.e. a tree with one central node. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 20 / 64
  • 21. Planar Maximally Filtered Graph (PMFG) The PMFG is a compelling alternative to the MST. PMFG nodes are colored according to the clusters obtained from DBHT. Implementation of the PMFG in Python: https://gmarti.gitlab.io/networks/2018/06/03/pmfg-algorithm.html
  • 22. Contributions on distances At the heart of clustering algorithms is the fundamental notion of distance that can be defined upon a proper representation of data. It is thus an obvious direction to explore. We list below what has been proposed in the literature so far: Distances that try to quantify how one financial instrument provides information about another instrument: Distance using Granger causality [BGLP12], Distance using partial correlation [KTM+ 10], Study of asynchronous, lead-lag relationships by using mutual information instead of Pearson’s correlation coefficient [Fie14a, RTS16], The correlation matrix is normalized using the affinity transformation: the correlation between each pair of stocks is normalized according to the correlations of each of the two stocks with all other stocks [KSM+ 10]. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 22 / 64
  • 23. Contributions on distances (cont’d) Distances that aim at including non-linear relationships in the analysis: Distances using mutual information, mutual information rate, and other information-theoretic distances [Fie14b, RTS16, BP17a, BP17b, GHA18, GZT18], The Brownian distance [ZPKS14], Copula-based [MND16, DP15, B+ 13] and tail dependence [DFPW15] distances. Distances that aim at taking into account multivariate dependence: Each stock is represented by a bivariate time series: its returns and traded volumes [BR08]; a distance is then applied to an ad hoc transform of the two time series into a symbolic sequence, Each stock is represented by a multivariate time series, for example the daily (high, low, open, close) [LD13]; Authors use the Escoufier’s RV coefficient (a multivariate extension of the Pearson’s correlation coefficient). A distance taking into account both the correlation between returns and their distributions [DMV16]. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 23 / 64
  • 24. Contributions on distances (cont’d) Unlike recent studies which claim that the existence of nonlinear dependence between stock returns has effects on network characteristics, [HH18] documents that “most of the apparent nonlinearity is due to univariate non-Gaussianity. Further, strong non-stationarity in a few specific stocks may play a role. In particular, the sharp decrease of some stocks during the global financial crisis in 2008” gives rise to apparent negative tail dependence among stocks. When constructing unweighted stock networks, they suggest using linear correlation “on marginally normalized data”, that is Spearman’s rank correlation. In fact, this is similar to the idea of separating the dependence information from the distribution information as in [DMV16], where Spearman’s rank correlation stems from using a Euclidean distance between the uniform margins of the underlying bivariate copula. Following previous studies, and unlike in [DMV16], the distribution information is discarded when constructing the network.
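The link between rank correlation and marginal normalization can be illustrated directly: Spearman's coefficient is the Pearson coefficient computed on the ranks of the observations, so it depends only on the ordering of the data and not on the marginal distributions. The sketch below is a plain standard-library illustration; the helper names `ranks`, `pearson` and `spearman` are hypothetical, and ties are handled with average ranks, which is a common convention assumed here.

```python
import math

def ranks(x):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(x)), key=lambda k: x[k])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A monotone but strongly non-linear relation (for example, one series with a single extreme value) gets a Spearman coefficient of 1 while its Pearson coefficient can be much lower, which is the sensitivity to outliers discussed earlier.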
  • 25. Dependence and marginal distribution of the returns Theorem (Sklar’s theorem, 1959) For any random vector X = (X1, . . . , XN) having continuous marginal cumulative distribution functions Fi , its joint cumulative distribution F is uniquely expressed as F(X1, . . . , XN) = C(F1(X1), . . . , FN(XN)), where C, the multivariate distribution of uniform marginals, is known as the copula of X. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 25 / 64
  • 26. Information-theoretic distances vs. Copula-based ones? Copula entropy: $H_c(x) = -\int_u c(u) \log c(u)\, du$. Mutual information: $I(x) = \int_x p(x) \log \frac{p(x)}{\prod_i p_i(x_i)}\, dx = \int_x c(u_x) \prod_i p_i(x_i) \log c(u_x)\, dx = \int_u c(u_x) \log c(u_x)\, du_x = -H_c(x)$. Entropy: $H(x) = \sum_i H(x_i) + H_c(x)$.
  • 27. Contributions on other methodological aspects Reliability and statistical uncertainty of the methods: A bootstrap approach is used to estimate the statistical reliability of both hierarchical trees [TLM07a, MAND16] and correlation-based networks [TCL+ 07, MMMM18], Consistency proof of clustering algorithms for recovering clusters defined by nested block correlation matrices; Study of empirical convergence rates [MAND16], Kullback-Leibler divergence is used to estimate the amount of filtered information between the sample correlation matrix and the filtered one [TLM07b], Cophenetic correlation is used between the original correlation distances and the hierarchical cluster representation [PS15], Several measures between successive (in time) clusters, dendrograms, networks are used to estimate stability of the methods, e.g. cophenetic correlation between dendrograms in [PLJ76], adjusted Rand index (ARI) between clusters in [MVDN15], mutual information (MI) of link co-occurrence between networks in [STZM11]. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 27 / 64
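Among the stability measures listed above, the adjusted Rand index (ARI) between two clusterings is easy to compute from the contingency counts. The sketch below uses the standard pair-counting definition with the usual chance correction; it is an illustration with a hypothetical function name, not the code used in [MVDN15]. It requires Python 3.8 or later for `math.comb`.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two clusterings given as equal-length label sequences."""
    n = len(labels_a)
    # Pairs of points placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)   # value expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:                           # degenerate case, e.g. one single cluster
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

The index equals 1 when the two partitions are identical up to relabeling and is close to 0 for independent partitions; to assess stability, one would compare the clustering of the original data with clusterings obtained from perturbed or bootstrapped samples.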
  • 28. Contributions on other methodological aspects (cont’d) Preprocessing of the time series: Subtract the market mode before performing a cluster or network analysis on the returns [BMM07], Encode both rank statistics and a distribution histogram of the returns into a representative vector [DMV16], Fit an ARMA(p,q)-FIEGARCH(1,d,1)-cDCC process (econometric preprocessing) to obtain dynamic correlations instead of the common approach of rolling window Pearson correlations [ST14], Use a clustering of successive correlation matrices to infer a market state [PS15]. Use of other types of networks: threshold networks [OKK04], influence networks [GZC15], partial-correlation networks [KTM+10, KPGGBJ12], Granger causality networks [BGLP12, VLB15], cointegration-based networks [Tu14], bipartite networks [TML+11], etc. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 28 / 64
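As an illustration of the first preprocessing step, here is one simple way to subtract a market mode. It is a sketch under stated assumptions and not necessarily the procedure of [BMM07]: the market mode is approximated by the equally weighted cross-sectional mean return, and each asset is regressed on it by ordinary least squares, keeping the residuals. The function name `subtract_market_mode` is hypothetical.

```python
def subtract_market_mode(returns):
    """returns: list of N return series, each of length T.
    Gives back the N residual series after removing the market component."""
    n, t = len(returns), len(returns[0])
    # Assumption: the market mode is the equally weighted mean return at each date.
    market = [sum(r[k] for r in returns) / n for k in range(t)]
    m_mean = sum(market) / t
    m_var = sum((m - m_mean) ** 2 for m in market)

    residuals = []
    for r in returns:
        r_mean = sum(r) / t
        # OLS slope and intercept of the asset's returns on the market mode.
        beta = sum((a - r_mean) * (m - m_mean) for a, m in zip(r, market)) / m_var
        alpha = r_mean - beta * m_mean
        residuals.append([a - alpha - beta * m for a, m in zip(r, market)])
    return residuals
```

The residual series are uncorrelated with the market mode by construction, so a clustering or network built from them reflects the structure left once the common factor is removed.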
  • 29. Consistency and empirical convergence rates [MAND16] Model selection: The faster the (empirical) convergence, the better. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 29 / 64
  • 30. Statistical & practical stability One can use bootstrap, block bootstrap or other common sense and practical perturbations of the data as presented in [MVDN15]. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 30 / 64
  • 31. Effect of a basic preprocessing: Subtract the market mode Visualization of the Planar Maximally Filtered Graph (PMFG) and DBHT clusters, for both non-detrended (left) and detrended (right) log-returns [MADM15]. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 31 / 64
  • 32. Section 3 Other networks
  • 33. Examples of other financial networks supply chain networks [Wu15] investor (security holdings and trading behaviour) networks [BKES18] corporate board and director networks [BC04] international trade networks [BFG10] transaction networks [LL18] sovereign debt (quarterly public debt-to-GDP ratio) networks [MO15] interbank (exposures between banks) networks [SVLG13] These networks are built from alternative data which are often: confidential hard or costly to obtain Most often these studies are done in collaboration with a commercial or regulatory organization. Some of these datasets may contain significant alphas, and thus results are not publicly advertised: Papers are relatively few in contrast to the ones on the correlation of asset returns which are more oriented toward risk understanding. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 33 / 64
  • 34. Section 4 Dynamics of networks
  • 35. Studying the dynamics of networks Comparing and finding the differences in a sequence of large graphs is a computationally difficult problem. In the literature, one often studies the following statistics: (for networks) the normalized tree length [OCK+03], the mean occupation layer [OCK+03], the tree half-life [OCK+03], a survival ratio of the edges [OCKK02, JMS+05, ST14], node degree and strength [ST14], eigenvector, betweenness and closeness centrality [ST14], the agglomerative coefficient [MO15]; (for clusters) the merging, splitting, birth, death, contraction, and growth of the clusters in time [PS15]. Remark. To the best of my knowledge, graph embeddings into vector spaces (cf. the recent Deep Learning literature, or the survey [GF18]) have not been used to study time series of financial networks. Such a vector representation would open the field to the toolbox of standard machine learning algorithms: cluster networks and find those which are associated with some events (e.g. a crisis); predict the future networks in a sequence of networks with an LSTM (stat arb?); detect a structural break, etc.
  • 36. Section 5 Applications
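Two of the network statistics listed above are simple to compute once the MST edges are available. The sketch below assumes edges are given as (i, j, d_ij) tuples and uses the usual definitions: the normalized tree length is the average edge distance over the N − 1 edges, and the single-step survival ratio is the fraction of edges of the previous tree still present in the current one. The function names are illustrative.

```python
def normalized_tree_length(mst_edges):
    """Average distance over the N - 1 edges of a spanning tree."""
    return sum(d for _, _, d in mst_edges) / len(mst_edges)

def survival_ratio(edges_prev, edges_curr):
    """Fraction of the previous tree's edges that survive in the current tree."""
    # frozenset makes the edge (i, j) equal to the edge (j, i).
    prev = {frozenset((i, j)) for i, j, _ in edges_prev}
    curr = {frozenset((i, j)) for i, j, _ in edges_curr}
    return len(prev & curr) / len(prev)
```

Tracking both quantities over rolling windows gives a time series summary of how tight and how stable the tree is.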
  • 37. Portfolio optimization [OCK+03] finds that the Markowitz portfolio layer in the MST is higher than the mean layer at all times. As the stocks of the minimum risk portfolio are found on the outskirts of the tree [PDMA13, OCK+03], the authors expect larger trees to have greater diversification potential. In [TLGM08, PLJ76], the authors compare the Markowitz portfolios from the filtered empirical correlation matrices using the clustering approach, the RMT approach and the shrinkage approach. [RLL+16, PZ16] propose to invest in different parts of the MST depending on the estimated market conditions. The authors of [HMM18] show that there is no inner-mathematical relationship between the minimum variance portfolio from Markowitz theory and the portfolios designed from the minimum spanning tree. Empirical evidence of such relations found by previous studies is essentially a stylized fact of financial returns correlations and time series, not a general property of correlation matrices. [DFPW15] introduces a procedure to design portfolios which are diversified in their tail behavior by selecting only a single asset in each cluster.
  • 38. Trading strategy Earnings per share forecasts prepared on the basis of statistically grouped data (clusters) outperform forecasts made on data grouped on traditional industrial criteria as well as forecasts prepared by mechanical extrapolation techniques [EG71]. One can build a simple mean-reversion statistical arbitrage strategy whereby one assumes that stocks in a given industry move together, cross-sectionally demeans stock returns within said industry, shorts stocks with positive residual returns and goes long stocks with negative residual returns [KY16]. In [PS15], they suggest that tracking the merging, splitting, birth, and death of the clusters in time could be the basis for pairs-like reversal trading strategies but with pairs corresponding to clusters. The paper [DC05] describes methods for index tracking and enhanced index tracking based on clusters of financial time series. [MADM16] finds the existence of significant relations between past changes in the market correlation structure and future changes in the market volatility. In [KLT12], authors claim that long-short strategies exploiting mispricing due to the industry categorization bias generate statistically significant and economically sizable risk-adjusted excess returns. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 38 / 64
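The mean-reversion idea attributed to [KY16] above (demean returns within each group, short the positive residuals, go long the negative ones) can be sketched as follows. This is a toy illustration and not the strategy as implemented in that paper: the function name, the dictionary inputs, and the normalization to unit gross exposure are all assumptions made for the example.

```python
from collections import defaultdict

def mean_reversion_weights(returns, clusters):
    """returns: dict ticker -> last-period return.
    clusters: dict ticker -> cluster (or industry) label.
    Gives back dollar-neutral weights normalized to unit gross exposure."""
    groups = defaultdict(list)
    for ticker, label in clusters.items():
        groups[label].append(ticker)

    raw = {}
    for members in groups.values():
        group_mean = sum(returns[t] for t in members) / len(members)
        for t in members:
            # Short stocks above their group mean, go long stocks below it.
            raw[t] = -(returns[t] - group_mean)

    gross = sum(abs(w) for w in raw.values())
    if gross == 0:
        return {t: 0.0 for t in raw}
    return {t: w / gross for t, w in raw.items()}
```

Because the residuals sum to zero within each group, the resulting portfolio is neutral with respect to every cluster, so replacing industry labels with statistical clusters changes only the `clusters` argument.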
  • 39. Risk In [DPT14], authors design clusters that tend to be comonotonic in their extreme low values: To avoid contagion in the portfolio during risky scenarios, an investor should diversify over these clusters. In [MDMA14], authors postulate the existence of a hierarchical structure of risks which can be deemed responsible for both stock multivariate dependency structure and univariate multifractal behaviour, and then propose a model that reproduces the empirical observations (entanglement of univariate multi-scaling and multivariate cross-correlation properties of financial time series). The interplay between multi-scaling and average cross-correlation is confirmed in [BMDM18]. Clusters (statistical industry classification) can be an alternative to sometimes unavailable “fundamental” industry classifications (e.g. in emerging or small markets) [KY16]. [HZYU16] finds that financial institutions which have, in the correlation networks, greater node strength, larger node betweenness centrality, larger node closeness centrality and larger node clustering coefficient tend to be associated with larger systemic risk contributions. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 39 / 64
  • 40. Financial policy making Clusters and networks can help design financial policies. Several papers propose to leverage them to detect risky market environments, develop indicators that can predict forthcoming crises or economic recovery [ZLW+11], improve economic nowcasting [EFC17], or find key markets and assets that drive a whole region, and on which stimulus can be applied effectively. The authors of [HSBYBY10] claim that “separation prevents failure propagation and connections increase risks of global crises”, whereas the prevailing view in favor of deregulation is that banks, by investing in diverse sectors, would have greater stability. To support their argument, using financial networks, they study the aftermath of the repeal of the Glass-Steagall Act (1933) under the Clinton administration in 1999. They find that the erosion of the Glass-Steagall Act and cross-sector investments eliminated “firewalls” that could have prevented the housing sector decline from triggering a wider financial and economic crisis: “Our analysis implies that the investment across economic sectors itself creates increased cross-linking of otherwise much more weakly coupled parts of the economy, causing dependencies that increase, rather than decrease, risk.”
  • 41. Section 6 Opinionated views on research directions
  • 42. Opinionated views on research directions What’s missing for “financial networks” to become a mature research field? Some inspiration from the booming deep learning era:
      lack of reproducibility: provide code and data (at least synthetic datasets);
      difficulty to compare methods, re-implementation bias: build open source libraries (standardized API, optimized code); open source software helps to engage more with practitioners;
      confidential data: provide synthetic datasets encoding stylized facts; propose generative models (cf. the GAN literature applied to graphs);
      lack of evaluation metrics / no end-to-end approach: define common tasks (e.g. evaluate the clustering or network methodology on portfolio optimization, crisis detection, mean reversion strategy) where all the details are specified (e.g. a well-chosen artificial dataset, or samples from a generative model, or public financial data).
  • 43. Thank you for the attention. Questions? Co-authorship network (left) and its MST (right) Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 43 / 64
  • 44. References I Tomaso Aste, Tiziana Di Matteo, and ST Hyde, Complex networks on hyperbolic surfaces, Physica A: Statistical Mechanics and its Applications 346 (2005), no. 1, 20–26. Eike Christian Brechmann et al., Hierarchical Kendall copulas and the modeling of systemic and operational risk, Ph.D. thesis, Universitätsbibliothek der TU München, 2013. Stefano Battiston and Michele Catanzaro, Statistical properties of corporate board and director networks, The European Physical Journal B 38 (2004), no. 2, 345–352. Matteo Barigozzi, Giorgio Fagiolo, and Diego Garlaschelli, Multinetwork of international trade: A commodity-specific analysis, Physical Review E 81 (2010), no. 4, 046104.
  • 45. References II Monica Billio, Mila Getmansky, Andrew W Lo, and Loriana Pelizzon, Econometric measures of connectedness and systemic risk in the finance and insurance sectors, Journal of Financial Economics 104 (2012), no. 3, 535–559. Kestutis Baltakys, Juho Kanniainen, and Frank Emmert-Streib, Multilayer aggregation with statistical validation: Application to investor networks, Scientific Reports 8 (2018), no. 1, 8198. RJ Buonocore, RN Mantegna, and T Di Matteo, On the interplay between multiscaling and average cross-correlation, arXiv preprint arXiv:1802.01113 (2018). Christian Borghesi, Matteo Marsili, and Salvatore Miccichè, Emergence of time-horizon invariant correlation structure in financial returns by subtraction of the market mode, Physical Review E 76 (2007), no. 2, 026104.
  • 46. References III Eduard Baitinger and Jochen Papenbrock, Interconnectedness risk and active portfolio management: The information-theoretic perspective. AQ Barbi and GA Prataviera, Nonlinear dependencies on Brazilian equity network from mutual information minimum spanning trees, arXiv preprint arXiv:1711.06185 (2017). Juan Gabriel Brida and Wiston Adrián Risso, Multidimensional minimal spanning tree: The Dow Jones case, Physica A: Statistical Mechanics and its Applications 387 (2008), no. 21, 5205–5210. Gunnar Carlsson and Facundo Mémoli, Characterization, stability and convergence of hierarchical clustering methods, Journal of Machine Learning Research 11 (2010), no. Apr, 1425–1470.
  • 47. References IV Christian Dose and Silvano Cincotti, Clustering of financial time series with application to index and enhanced index tracking portfolio, Physica A: Statistical Mechanics and its Applications 355 (2005), no. 1, 145–151. Fabrizio Durante, Enrico Foscolo, Roberta Pappadà, and Hao Wang, A portfolio diversification strategy via tail dependence measures. Philippe Donnat, Gautier Marti, and Philippe Very, Toward a generic representation of random variables for machine learning, Pattern Recognition Letters 70 (2016), 24–31. Fabrizio Durante and Roberta Pappadà, Cluster analysis of time series via Kendall distribution, Strengthening Links Between Data Analysis and Soft Computing, Springer, 2015, pp. 209–216.
  • 48. References V Fabrizio Durante, Roberta Pappadà, and Nicola Torelli, Clustering of financial time series in risky scenarios, Advances in Data Analysis and Classification 8 (2014), no. 4, 359–376. Mohammed Elshendy and Andrea Fronzetti Colladon, Big data analysis of economic news: Hints to forecast macroeconomic indicators, International Journal of Engineering Business Management 9 (2017), 1847979017720040. Edwin J Elton and Martin J Gruber, Improved forecasting through the design of homogeneous groups, The Journal of Business 44 (1971), no. 4, 432–450. Pawel Fiedor, Information-theoretic approach to lead-lag effect on financial markets, The European Physical Journal B 87 (2014), no. 8, 1–9.
  • 49. References VI Pawel Fiedor, Networks in financial markets based on the mutual information rate, Physical Review E 89 (2014), no. 5, 052801. Palash Goyal and Emilio Ferrara, Graph embedding techniques, applications, and performance: A survey, Knowledge-Based Systems 151 (2018), 78–94. Yong Kheng Goh, Haslifah M Hasim, and Chris G Antonopoulos, Inference of financial networks using the normalised mutual information rate, PloS one 13 (2018), no. 2, e0192160. Lorenzo Giada and Matteo Marsili, Data clustering and noise undressing of correlation matrices, Physical Review E 63 (2001), no. 6, 061101. Lorenzo Giada and Matteo Marsili, Algorithms of maximum likelihood data clustering with applications, Physica A: Statistical Mechanics and its Applications 315 (2002), no. 3, 650–664.
  • 50. References VII Ya-Chun Gao, Yong Zeng, and Shi-Min Cai, Influence network in the Chinese stock market, Journal of Statistical Mechanics: Theory and Experiment 2015 (2015), no. 3, P03017. Xue Guo, Hu Zhang, and Tianhai Tian, Development of stock correlation networks using mutual information and financial big data, PloS one 13 (2018), no. 4, e0195941. David Hartman and Jaroslav Hlinka, Nonlinearity in stock networks, arXiv preprint arXiv:1804.10264 (2018). Amelie Hüttner, Jan-Frederik Mai, and Stefano Mineo, Portfolio selection based on graphs: Does it align with Markowitz-optimal portfolios?, Dependence Modeling (2018).
  • 51. References VIII Dion Harmon, Blake Stacey, Yavni Bar-Yam, and Yaneer Bar-Yam, Networks of economic market interdependence and systemic risk, arXiv preprint arXiv:1011.3707 (2010). Wei-Qiang Huang, Xin-Tian Zhuang, Shuang Yao, and Stan Uryasev, A financial network perspective of financial institutions’ systemic risk contributions, Physica A: Statistical Mechanics and its Applications 456 (2016), 183–196. Neil F Johnson, Mark McDonald, Omer Suleman, Stacy Williams, and Sam Howison, What shakes the FX tree? understanding currency dominance, dependence, and dynamics (keynote address), SPIE Third International Symposium on Fluctuations and Noise, International Society for Optics and Photonics, 2005, pp. 86–99. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 51 / 64
  • 52. References IX Anton Kocheturov, Mikhail Batsyn, and Panos M Pardalos, Dynamics of cluster structures in a financial market network, Physica A: Statistical Mechanics and its Applications 413 (2014), 523–533. L Kullmann, J Kertesz, and RN Mantegna, Identification of clusters of companies in stock indices via Potts super-paramagnetic transitions, Physica A: Statistical Mechanics and its Applications 287 (2000), no. 3, 412–419. Philipp Krüger, Augustin Landier, and David Thesmar, Categorization bias in the stock market, Available SSRN 2034204 (2012). Dror Y Kenett, Tobias Preis, Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dependency network and node influence: application to the study of financial markets, International Journal of Bifurcation and Chaos 22 (2012), no. 07, 1250181.
  • 53. References X Dror Y Kenett, Yoash Shapira, Asaf Madi, Sharron Bransburg-Zabary, Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dynamics of stock market correlations, AUCO Czech Economic Review 4 (2010), no. 3, 330–341. Dror Y Kenett, Michele Tumminello, Asaf Madi, Gitit Gur-Gershgoren, Rosario N Mantegna, and Eshel Ben-Jacob, Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market, PloS one 5 (2010), no. 12, e15032. Zura Kakushadze and Willie Yu, Statistical industry classification. Gan Siew Lee and Maman A Djauhari, Multidimensional stock network analysis: An Escoufier’s RV coefficient approach, AIP Conference Proceedings, vol. 1, 2013, pp. 550–555. Elisa Letizia and Fabrizio Lillo, Corporate payments networks and credit risk rating. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 53 / 64
  • 54. References XI Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and Mark Flood, Clustering techniques and their effect on portfolio formation and risk analysis, Proceedings of the International Workshop on Data Science for Macro-Modeling, ACM, 2014, pp. 1–6. Nicoló Musmeci, Tomaso Aste, and Tiziana Di Matteo, Relation between financial market structure and the real economy: comparison between clustering methods, PLoS ONE 10 (2015), no. 3, e0116201. Nicoló Musmeci, Tomaso Aste, and T Di Matteo, Interplay between past market correlation structure changes and future volatility outbursts, Scientific Reports 6 (2016).
  • 55. References XII Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat, Clustering financial time series: How long is enough?, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, 2016, pp. 2583–2589. Raffaello Morales, T Di Matteo, and Tomaso Aste, Dependency structure and scaling properties of financial time series are related, Scientific Reports 4 (2014), no. 4589. Guido Previde Massara, Tiziana Di Matteo, and Tomaso Aste, Network filtering for big data: triangulated maximally filtered graph, Journal of Complex Networks 5 (2016), no. 2, 161–178. Mel MacMahon and Diego Garlaschelli, Community detection for correlation matrices, Phys. Rev. X 5 (2015), 021006.
  • 56. References XIII Federico Musciotto, Luca Marotta, Salvatore Miccichè, and Rosario N Mantegna, Bootstrap validation of links of a minimum spanning tree, arXiv preprint arXiv:1802.03395 (2018). Gautier Marti, Frank Nielsen, and Philippe Donnat, Optimal copula transport for clustering multivariate time series, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016, pp. 2379–2383. David Matesanz and Guillermo J Ortega, Sovereign public debt crisis in Europe. A network analysis, Physica A: Statistical Mechanics and its Applications 436 (2015), 756–766.
  • 57. References XIV Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen, A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series, 14th IEEE International Conference on Machine Learning and Applications, ICMLA 2015, Miami, FL, USA, December 9-11, 2015, 2015, pp. 32–37. J-P Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertesz, and Antti Kanto, Dynamics of market correlations: Taxonomy and portfolio analysis, Physical Review E 68 (2003), no. 5, 056110. J-P Onnela, A Chakraborti, K Kaski, and J Kertész, Dynamic asset trees and portfolio analysis, The European Physical Journal B-Condensed Matter and Complex Systems 30 (2002), no. 3, 285–288.
  • 58. References XV J-P Onnela, Kimmo Kaski, and Janos Kertész, Clustering and information in correlation based financial networks, The European Physical Journal B-Condensed Matter and Complex Systems 38 (2004), no. 2, 353–362. Francesco Pozzi, Tiziana Di Matteo, and Tomaso Aste, Spread of risk across financial markets: better to invest in the peripheries, Scientific Reports 3 (2013). Vasiliki Plerou, P Gopikrishnan, Bernd Rosenow, LA Nunes Amaral, and H Eugene Stanley, A random matrix theory approach to financial cross-correlations, Physica A: Statistical Mechanics and its Applications 287 (2000), no. 3, 374–382. Don B Panton, V Parker Lessig, and O Maurice Joy, Comovement of international equity markets: a taxonomic approach, Journal of Financial and Quantitative Analysis 11 (1976), no. 03, 415–432.
  • 59. References XVI Jochen Papenbrock and Peter Schwendner, Handling risk-on/risk-off dynamics with correlation regimes and correlation networks, Financial Markets and Portfolio Management 29 (2015), no. 2, 125–147. Gustavo Peralta and Abalfazl Zareei, A network approach to portfolio selection, Journal of Empirical Finance (2016). Fei Ren, Ya-Nan Lu, Sai-Ping Li, Xiong-Fei Jiang, Li-Xin Zhong, and Tian Qiu, Dynamic portfolio strategy using clustering approach, arXiv preprint arXiv:1608.03058 (2016). Jacopo Rocchi, Enoch Yan Lok Tsui, and David Saad, Emerging interdependence between stock values during financial crashes, arXiv preprint arXiv:1611.02549 (2016).
  • 60. References XVII Won-Min Song, Tiziana Di Matteo, and Tomaso Aste, Nested hierarchies in planar graphs, Discrete Applied Mathematics 159 (2011), no. 17, 2135–2146. Won-Min Song, T Di Matteo, and Tomaso Aste, Hierarchical information clustering by means of topologically embedded graphs, PLoS ONE 7 (2012), no. 3, e31929. Ahmet Sensoy and Benjamin M Tabak, Dynamic spanning trees in stock market networks: The case of Asia-Pacific, Physica A: Statistical Mechanics and its Applications 414 (2014), 387–402. Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and Rosario N Mantegna, Evolution of worldwide stock markets, correlation structure, and correlation-based graphs, Physical Review E 84 (2011), no. 2, 026108.
  • 61. References XVIII Tiziano Squartini, Iman Van Lelyveld, and Diego Garlaschelli, Early-warning signals of topological collapse in interbank networks, Scientific Reports 3 (2013). Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N Mantegna, A tool for filtering information in complex systems, Proceedings of the National Academy of Sciences of the United States of America 102 (2005), no. 30, 10421–10426. Michele Tumminello, Claudia Coronnello, Fabrizio Lillo, Salvatore Miccichè, and Rosario N Mantegna, Spanning trees and bootstrap reliability estimation in correlation-based networks, International Journal of Bifurcation and Chaos 17 (2007), no. 07, 2319–2329. Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N Mantegna, Cluster analysis for portfolio optimization, Journal of Economic Dynamics and Control 32 (2008), no. 1, 235–258.
  • 62. References XIX Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Hierarchically nested factor model from multivariate data, EPL (Europhysics Letters) 78 (2007), no. 3, 30006. Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Kullback-Leibler distance as a measure of the information filtered from multivariate data, Physical Review E 76 (2007), no. 3, 031123. Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Correlation, hierarchies, and networks in financial markets, Journal of Economic Behavior & Organization 75 (2010), no. 1, 40–58. Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo, and Rosario N Mantegna, Statistically validated networks in bipartite complex systems, PLoS ONE 6 (2011), no. 3, e17994.
  • 63. References XX Chengyi Tu, Cointegration-based financial networks study in Chinese stock market, Physica A: Statistical Mechanics and its Applications 402 (2014), 245–254. Tomáš Výrost, Štefan Lyócsa, and Eduard Baumöhl, Granger causality stock market networks: Temporal proximity and preferential attachment, Physica A: Statistical Mechanics and its Applications 427 (2015), 262–276. Liuren Wu, Centrality of the supply chain network. Yiting Zhang, Gladys Hui Ting Lee, Jian Cheng Wong, Jun Liang Kok, Manamohan Prusty, and Siew Ann Cheong, Will the US economy recover in 2010? A minimal spanning tree study, Physica A: Statistical Mechanics and its Applications 390 (2011), no. 11, 2020–2050.
  • 64. References XXI Xin Zhang, Boris Podobnik, Dror Y Kenett, and H Eugene Stanley, Systemic risk and causality dynamics of the world international shipping market, Physica A: Statistical Mechanics and its Applications 415 (2014), 43–53.