A review of two decades of correlations, hierarchies, networks and clustering in financial markets

Opinionated review of two decades of correlations, hierarchies,
networks and clustering in financial markets presented at Ton Duc Thang University in Ho Chi Minh City, Vietnam.

Published in: Economy & Finance

  1. A review of two decades of correlations, hierarchies, networks and clustering in financial markets. Ton Duc Thang University, Ho Chi Minh City, Vietnam. Gautier Marti, Frank Nielsen, Mikolaj Bińkowski, Philippe Donnat (Ecole Polytechnique, Imperial College London, Hellebore Capital Ltd.). 10 August 2018. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 1 / 64
  2. Table of contents
     1 Introduction
     2 Correlation networks
         The standard and widely adopted methodology
         Concerns about the standard methodology
         Contributions for improving the methodology
             On algorithms
             On distances
             On other methodological aspects
     3 Other networks
     4 Dynamics of networks
     5 Applications
     6 Opinionated views on research directions
  3. Section 1: Introduction
  4. Introduction
     Motivation: A better understanding of financial markets using a scientific approach.
     Empirical studies use data to verify hypotheses and discover stylized facts.
     Examples of datasets:
     • price, volume, returns, turnover time series
     • supply chain networks
     • market (OTC, exchange) transaction data
     • retail transactional data (credit cards)
     • corporate payments networks
     • international trade (import/export) networks, ...
  5. Introduction
     Several research fields are tackling the problem with their own tools:
     • statistical physics, econophysics:
         Minimum Spanning Tree (MST)
         Random Matrix Theory (RMT)
         linear correlations
     • statistics, data mining, machine learning:
         graph theory
         community detection
         clustering algorithms
         non-linear dependence
         alternative distances
         statistical significance and robustness checks via bootstrapping
     • economics, finance, accounting, behavioural finance:
         standard industry and fundamental classifications vs. statistical and text-based classifications
         networks of trades, suppliers, consumers, competitors, investors
         linear regressions on network statistics, statistical significance through t-stats
  6. Section 2: Correlation networks
  7. The standard and widely adopted methodology (Mantegna, 1999)
     Let $N$ be the number of assets.
     Let $P_i(t)$ be the price at time $t$ of asset $i$, $1 \le i \le N$.
     Let $r_i(t)$ be the log-return at time $t$ of asset $i$: $r_i(t) = \log P_i(t) - \log P_i(t-1)$.
     For each pair $i, j$ of assets, compute their correlation:
     $\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^2 \rangle - \langle r_i \rangle^2\right)\left(\langle r_j^2 \rangle - \langle r_j \rangle^2\right)}}$, where $\langle \cdot \rangle$ denotes the average over time.
     Convert the correlation coefficients $\rho_{ij}$ into distances: $d_{ij} = \sqrt{2(1 - \rho_{ij})}$.
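The steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names (`log_returns`, `pearson`, `correlation_distance`) and the toy price series are assumptions made for the example, and the distance uses the usual square-root form $d_{ij} = \sqrt{2(1 - \rho_{ij})}$.

```python
import math

def log_returns(prices):
    """r(t) = log P(t) - log P(t-1) for a list of positive prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation written with time averages, as on the slide."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(rho):
    """Map a correlation in [-1, 1] to a distance in [0, 2]."""
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))

# Hypothetical toy prices for three assets (illustrative only).
prices = {
    "A": [100.0, 101.0, 103.0, 102.0, 105.0],
    "B": [50.0, 50.6, 51.4, 51.0, 52.6],
    "C": [80.0, 79.0, 78.5, 79.5, 78.0],
}
returns = {name: log_returns(series) for name, series in prices.items()}
names = sorted(returns)
distances = {
    (i, j): correlation_distance(pearson(returns[i], returns[j]))
    for k, i in enumerate(names)
    for j in names[k + 1:]
}
```

The resulting `distances` dictionary, keyed by asset pairs, is the input a spanning-tree or clustering routine would consume.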
  8. The standard and widely adopted methodology
     From all the distances $d_{ij}$, compute a minimum spanning tree (MST) using, for example, Algorithm 1:
     Algorithm 1: Kruskal's algorithm
       procedure BuildMST({d_ij}, 1 ≤ i, j ≤ N)
         # Start with a fully disconnected graph G = (V, E)
         E ← ∅
         V ← {i}, 1 ≤ i ≤ N
         # Try to add edges by increasing distances
         for (i, j) ∈ V² ordered by increasing d_ij do
           # Verify that i and j are not already connected by a path
           if not connected(i, j) then
             # Add the edge (i, j) to connect i and j
             E ← E ∪ {(i, j)}
         # G is the resulting MST
         return G = (V, E)
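Kruskal's procedure above translates almost line by line into Python. The sketch below is one possible rendering under stated assumptions: the function name `build_mst`, the dictionary-of-pairs input format and the toy distances are illustrative, and the `connected(i, j)` test is implemented with a union-find structure, which is a common choice rather than something the slide prescribes.

```python
def build_mst(dist):
    """Kruskal's algorithm.

    dist maps unordered asset pairs (i, j) to their distance d_ij.
    Returns the MST as a list of (i, j, d_ij) edges in insertion order.
    """
    nodes = sorted({node for pair in dist for node in pair})
    parent = {node: node for node in nodes}  # union-find forest

    def find(node):
        # Follow parent pointers to the component root (with path halving).
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    edges = []
    # Try to add edges by increasing distance.
    for (i, j), d in sorted(dist.items(), key=lambda item: item[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # i and j are not yet connected
            parent[root_i] = root_j   # merge the two components
            edges.append((i, j, d))
            if len(edges) == len(nodes) - 1:
                break                 # a spanning tree has N - 1 edges
    return edges

# Hypothetical distances between four assets (illustrative only).
toy_distances = {
    ("A", "B"): 0.3, ("A", "C"): 0.9, ("A", "D"): 1.2,
    ("B", "C"): 0.5, ("B", "D"): 1.1, ("C", "D"): 0.4,
}
mst_edges = build_mst(toy_distances)
```

Sorting the pairs dominates the cost, so the routine runs in O(N² log N) time for N assets when all pairwise distances are supplied.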
  9. The standard and widely adopted methodology
  10. Concerns about the standard methodology
      • The clusters obtained from the MST (or equivalently, the Single Linkage Clustering Algorithm (SLCA)) are known to be unstable (small perturbations of the input data may cause big differences in the resulting clusters) [MVDN15].
      • The clustering instability may be partly due to the algorithm (MST/Single Linkage are known for the chaining phenomenon [CM10]).
      • The clustering instability may be partly due to the correlation coefficient (Pearson linear correlation) defining the distance, which is known for being brittle to outliers and, more generally, not well suited to distributions other than the Gaussian ones [DMV16].
  11. Single Linkage chaining problem...
      • makes it brittle to small perturbations in the input distances.
      • Clusters and hierarchies are skewed: the algorithm does not take into account any notion of density.
  12. Pearson linear correlation... is too sensitive to outliers.
  13. Concerns about the standard methodology
      • Theoretical results providing the statistical reliability of hierarchical trees and correlation-based networks are still not available [TLM10].
      • One might expect that the higher the correlation associated with a link in a correlation-based network, the higher the reliability of this link. In [TCL+07], the authors show that this is not always observed empirically.
      • Changes affecting specific links (and clusters) during prominent crises are difficult to interpret due to the high level of statistical uncertainty associated with the correlation estimation [STZM11].
      • The standard method is somewhat arbitrary: a change in the method (e.g. using a different clustering algorithm or a different correlation coefficient) may yield a huge change in the clustering results [LRW+14, MVDN15]. As a consequence, it implies huge variability in portfolio formation and perceived risk [LRW+14].
  14. Variance of the Pearson correlation estimator
  15. CRLB of the Pearson correlation estimator - Proof
  16. Random Matrix Theory & Empirical correlation matrices
      Let $X$ be the $N \times T$ matrix storing the standardized returns of $N = 560$ assets (credit default swaps) over a period of $T = 2500$ trading days. Then, the empirical correlation matrix of the returns is $C = \frac{1}{T} X X^\top$.
      We can compute the empirical density of its eigenvalues $\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda}$, where $n(\lambda)$ counts the number of eigenvalues of $C$ less than $\lambda$.
      From random matrix theory, the Marchenko-Pastur distribution gives the limit distribution as $N \to \infty$, $T \to \infty$ with $T/N$ fixed. It reads:
      $\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}$,
      where $\lambda_{\max/\min} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.
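To make the Marchenko-Pastur formulas concrete, the following sketch evaluates the spectrum edges and the density for the N = 560, T = 2500 setting quoted on the slide. The function names are illustrative assumptions, and the code only evaluates the theoretical curve; it does not reproduce the empirical eigenvalue density of the credit default swap data.

```python
import math

def marchenko_pastur_bounds(n_assets, n_obs):
    """Edges of the spectrum: 1 + N/T -/+ 2 * sqrt(N/T)."""
    q = n_assets / n_obs
    return 1.0 + q - 2.0 * math.sqrt(q), 1.0 + q + 2.0 * math.sqrt(q)

def marchenko_pastur_density(lam, n_assets, n_obs):
    """Limit eigenvalue density of a pure-noise correlation matrix."""
    lam_min, lam_max = marchenko_pastur_bounds(n_assets, n_obs)
    if not lam_min < lam < lam_max:
        return 0.0  # the density vanishes outside [lam_min, lam_max]
    prefactor = (n_obs / n_assets) / (2.0 * math.pi)
    return prefactor * math.sqrt((lam_max - lam) * (lam - lam_min)) / lam

# Setting quoted on the slide: N = 560 assets, T = 2500 trading days.
lam_min, lam_max = marchenko_pastur_bounds(560, 2500)

def integrate_density(n_assets, n_obs, steps=20000):
    """Midpoint-rule check that the density integrates to about 1 (N < T)."""
    low, high = marchenko_pastur_bounds(n_assets, n_obs)
    width = (high - low) / steps
    return sum(
        marchenko_pastur_density(low + (k + 0.5) * width, n_assets, n_obs)
        for k in range(steps)
    ) * width
```

Eigenvalues of an empirical correlation matrix that exceed `lam_max` are the ones usually read as carrying information beyond noise.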
  17. Random Matrix Theory & Empirical correlation matrices
      Notice that the Marchenko-Pastur density fits the empirical density well, meaning that most of the information contained in the empirical correlation matrix amounts to noise: only 26 eigenvalues are greater than $\lambda_{\max}$.
      The highest eigenvalue corresponds to the ‘market’; the 25 others can be associated with ‘industrial sectors’.
      It is a known stylized fact of empirical correlation matrices between financial returns: only ≈ 5% of their eigenvalues are greater than $\lambda_{\max}$.
  18. A somewhat arbitrary choice of methodology
      The standard method is somewhat arbitrary. Adopting another one may yield strongly different results. Which ones to trust? Are they both useful?
      Figure: the clusters obtained differ markedly from one method to another.
  19. Contributions on algorithms
      Several alternative algorithms have been proposed to replace the minimum spanning tree and its corresponding clusters:
      • Average Linkage Minimum Spanning Tree (ALMST) [TCL+07]: the authors introduce a spanning tree associated with the Average Linkage Clustering Algorithm (ALCA); it is designed to remedy the unwanted chaining phenomenon of MST/SLCA.
      • Planar Maximally Filtered Graph (PMFG) [ADMH05, TADMM05], which strictly contains the Minimum Spanning Tree (MST) but encodes a larger amount of information in its internal structure.
      • Directed Bubble Hierarchical Tree (DBHT) [SDMA11, SDMA12], which is designed to extract, without parameters, the deterministic clusters from the PMFG.
      • Triangulated Maximally Filtered Graph (TMFG) [MDMA16]: the authors introduce another filtered graph more suitable for big datasets.
  20. Contributions on algorithms (cont’d)
      • Clustering using Potts super-paramagnetic transitions [KKM00]: when anti-correlations occur, the model creates repulsion between the stocks, which modifies their clustering structure.
      • Clustering using maximum likelihood [GM01, GM02]: the authors define the likelihood of a clustering based on a simple 1-factor model, then devise parameter-free methods to find a clustering with high likelihood.
      • Clustering using Random Matrix Theory (RMT) [PGR+00]: eigenvalues help to determine the number of clusters, and eigenvectors their composition. [MG15] proposes network-based community detection methods whose null hypothesis is consistent with RMT results on cross-correlation matrices for financial time series data, unlike existing community detection algorithms.
      • Clustering using the p-median problem [KBP14]: with this construction, every cluster is a star, i.e. a tree with one central node.
  21. Planar Maximally Filtered Graph (PMFG)
      The PMFG is a compelling alternative to the MST.
      Figure: PMFG nodes are colored according to the clusters obtained from DBHT.
      Implementation of the PMFG in Python: https://gmarti.gitlab.io/networks/2018/06/03/pmfg-algorithm.html
  22. Contributions on distances
      At the heart of clustering algorithms is the fundamental notion of distance that can be defined upon a proper representation of data. It is thus an obvious direction to explore. We list below what has been proposed in the literature so far:
      Distances that try to quantify how one financial instrument provides information about another instrument:
      • Distance using Granger causality [BGLP12],
      • Distance using partial correlation [KTM+10],
      • Study of asynchronous, lead-lag relationships by using mutual information instead of Pearson’s correlation coefficient [Fie14a, RTS16],
      • The correlation matrix is normalized using the affinity transformation: the correlation between each pair of stocks is normalized according to the correlations of each of the two stocks with all other stocks [KSM+10].
  23. Contributions on distances (cont’d)
      Distances that aim at including non-linear relationships in the analysis:
      • Distances using mutual information, mutual information rate, and other information-theoretic distances [Fie14b, RTS16, BP17a, BP17b, GHA18, GZT18],
      • The Brownian distance [ZPKS14],
      • Copula-based [MND16, DP15, B+13] and tail dependence [DFPW15] distances.
      Distances that aim at taking into account multivariate dependence:
      • Each stock is represented by a bivariate time series: its returns and traded volumes [BR08]; a distance is then applied to an ad hoc transform of the two time series into a symbolic sequence,
      • Each stock is represented by a multivariate time series, for example the daily (high, low, open, close) [LD13]; the authors use Escoufier’s RV coefficient (a multivariate extension of Pearson’s correlation coefficient).
      A distance taking into account both the correlation between returns and their distributions [DMV16].
  24. Contributions on distances (cont’d)
      Unlike recent studies which claim that the existence of nonlinear dependence between stock returns has effects on network characteristics, [HH18] documents that “most of the apparent nonlinearity is due to univariate non-Gaussianity. Further, strong non-stationarity in a few specific stocks may play a role. In particular, the sharp decrease of some stocks during the global financial crisis in 2008” gives rise to apparent negative tail dependence among stocks.
      When constructing unweighted stock networks, they suggest using linear correlation “on marginally normalized data”, that is, Spearman’s rank correlation.
      In fact, this is similar to the idea of separating the dependence information from the distribution information as in [DMV16], where Spearman’s rank correlation stems from using a Euclidean distance between the uniform margins of the underlying bivariate copula. Following previous studies, and unlike in [DMV16], the distribution information is discarded when constructing the network.
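The remark that Spearman's rank correlation is a linear correlation computed on marginally normalized data can be illustrated directly: rank each series, then apply Pearson's formula to the ranks. The sketch below is a plain-Python illustration with hypothetical helper names (`ranks`, `pearson`, `spearman`) and made-up data; it also shows, on one toy example, how a single outlier moves Pearson's coefficient while leaving the rank-based one unchanged.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2.0 + 1.0
        for k in order[pos:end + 1]:
            result[k] = average_rank
        pos = end + 1
    return result

def pearson(x, y):
    """Pearson linear correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))

# Toy data: y increases with x, but its last value is an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 100.0]
linear_rho = pearson(x, y)   # pulled well below 1 by the outlier
rank_rho = spearman(x, y)    # depends only on the ordering of the values
```

Because ranks depend only on ordering, any strictly increasing transformation of a margin leaves `spearman` unchanged, which is the sense in which the distribution information is discarded.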
  25. Dependence and marginal distribution of the returns
      Theorem (Sklar’s theorem, 1959). For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cumulative distribution functions $F_i$, its joint cumulative distribution $F$ is uniquely expressed as
      $F(X_1, \ldots, X_N) = C(F_1(X_1), \ldots, F_N(X_N))$,
      where $C$, the multivariate distribution of uniform marginals, is known as the copula of $X$.
  26. Information-theoretic distances vs. Copula-based ones?
      Copula entropy: $H_c(x) = -\int_u c(u) \log c(u)\, du$
      Mutual information: $I(x) = \int_x p(x) \log \frac{p(x)}{\prod_i p_i(x_i)}\, dx = \int_x c(u_x) \prod_i p_i(x_i) \log c(u_x)\, dx = \int_u c(u_x) \log c(u_x)\, du_x = -H_c(x)$
      Entropy: $H(x) = \sum_i H(x_i) + H_c(x)$
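As a quick sanity check of the identity between mutual information and negative copula entropy, consider a bivariate Gaussian vector with correlation $\rho$. This is a standard textbook case added here for illustration; it is not taken from the slides.

```latex
I(x_1, x_2) = -\tfrac{1}{2} \log\left(1 - \rho^2\right) \;\ge\; 0,
\qquad
H_c(x_1, x_2) = -I(x_1, x_2) = \tfrac{1}{2} \log\left(1 - \rho^2\right) \;\le\; 0 .
```

The copula entropy is zero for independent variables ($\rho = 0$) and decreases without bound as $|\rho| \to 1$, so the dependence information is carried entirely by the copula term, whatever the margins.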
  27. Contributions on other methodological aspects
      Reliability and statistical uncertainty of the methods:
      • A bootstrap approach is used to estimate the statistical reliability of both hierarchical trees [TLM07a, MAND16] and correlation-based networks [TCL+07, MMMM18],
      • Consistency proof of clustering algorithms for recovering clusters defined by nested block correlation matrices; study of empirical convergence rates [MAND16],
      • Kullback-Leibler divergence is used to estimate the amount of filtered information between the sample correlation matrix and the filtered one [TLM07b],
      • Cophenetic correlation is used between the original correlation distances and the hierarchical cluster representation [PS15],
      • Several measures between successive (in time) clusters, dendrograms, networks are used to estimate stability of the methods, e.g. cophenetic correlation between dendrograms in [PLJ76], adjusted Rand index (ARI) between clusters in [MVDN15], mutual information (MI) of link co-occurrence between networks in [STZM11].
  28. Contributions on other methodological aspects (cont’d)
      Preprocessing of the time series:
      • Subtract the market mode before performing a cluster or network analysis on the returns [BMM07],
      • Encode both rank statistics and a distribution histogram of the returns into a representative vector [DMV16],
      • Fit an ARMA(p,q)-FIEGARCH(1,d,1)-cDCC process (econometric preprocessing) to obtain dynamic correlations instead of the common approach of rolling window Pearson correlations [ST14],
      • Use a clustering of successive correlation matrices to infer a market state [PS15].
      Use of other types of networks: threshold networks [OKK04], influence networks [GZC15], partial-correlation networks [KTM+10, KPGGBJ12], Granger causality networks [BGLP12, VLB15], cointegration-based networks [Tu14], bipartite networks [TML+11], etc.
  29. Consistency and empirical convergence rates [MAND16]
      Model selection: The faster the (empirical) convergence, the better.
  30. Statistical & practical stability
      One can use bootstrap, block bootstrap or other common-sense, practical perturbations of the data as presented in [MVDN15].
  31. Effect of a basic preprocessing: Subtract the market mode
      Visualization of the Planar Maximally Filtered Graph (PMFG) and DBHT clusters, for both non-detrended (left) and detrended (right) log-returns [MADM15].
  32. Section 3: Other networks
  33. Examples of other financial networks
      • supply chain networks [Wu15]
      • investor (security holdings and trading behaviour) networks [BKES18]
      • corporate board and director networks [BC04]
      • international trade networks [BFG10]
      • transaction networks [LL18]
      • sovereign debt (quarterly public debt-to-GDP ratio) networks [MO15]
      • interbank (exposures between banks) networks [SVLG13]
      These networks are built from alternative data which are often:
      • confidential
      • hard or costly to obtain
      Most often these studies are done in collaboration with a commercial or regulatory organization.
      Some of these datasets may contain significant alpha, and thus results are not publicly advertised: papers are relatively few in contrast to the ones on the correlation of asset returns, which are more oriented toward risk understanding.
  34. Section 4: Dynamics of networks
  35. Studying the dynamics of networks
      Comparing and finding the differences in a sequence of large graphs is a computationally difficult problem. In the literature, one often studies the following statistics:
      • (for networks) the normalized tree length [OCK+03], the mean occupation layer [OCK+03], the tree half-life [OCK+03], a survival ratio of the edges [OCKK02, JMS+05, ST14], node degree, strength [ST14], eigenvector, betweenness, closeness centrality [ST14], the agglomerative coefficient [MO15]
      • (for clusters) the merging, splitting, birth, death, contraction, and growth of the clusters in time [PS15]
      Remark. To the best of my knowledge, graph embeddings into vector spaces (cf. the recent Deep Learning literature, or this survey [GF18]) have not been used to study time series of financial networks. Such a vector representation would open the field to the toolbox of standard machine learning algorithms:
      • Cluster networks and find those which are associated with some events (e.g. a crisis);
      • Predict the future networks in a sequence of networks with an LSTM (stat arb?);
      • Detect a structural break, etc.
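Among the network statistics listed on this slide, the survival ratio of edges is simple enough to sketch. The version below assumes the single-step definition, i.e. the fraction of edges of the previous network that are still present in the current one; the function name and the toy edge lists are illustrative, and individual papers may normalize differently (for two spanning trees on the same N nodes the denominator equals N - 1).

```python
def survival_ratio(edges_prev, edges_curr):
    """Fraction of the previous network's edges that survive in the current one.

    Edges are undirected pairs, so (a, b) and (b, a) count as the same edge.
    """
    prev = {frozenset(edge) for edge in edges_prev}
    curr = {frozenset(edge) for edge in edges_curr}
    if not prev:
        raise ValueError("the previous network has no edges")
    return len(prev & curr) / len(prev)

# Hypothetical spanning trees at two consecutive dates (illustrative only).
tree_t0 = [("A", "B"), ("B", "C"), ("C", "D")]
tree_t1 = [("B", "A"), ("B", "D"), ("C", "D")]
ratio = survival_ratio(tree_t0, tree_t1)  # two of the three edges survive
```

Applied along a rolling window, the sequence of ratios gives a one-dimensional summary of how quickly the tree rewires over time.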
  36. Section 5: Applications
  37. Portfolio optimization
      • [OCK+03] finds that the Markowitz portfolio layer in the MST is higher than the mean layer at all times. As the stocks of the minimum risk portfolio are found on the outskirts of the tree [PDMA13, OCK+03], the authors expect larger trees to have greater diversification potential.
      • In [TLGM08, PLJ76], the authors compare the Markowitz portfolios from the filtered empirical correlation matrices using the clustering approach, the RMT approach and the shrinkage approach.
      • [RLL+16, PZ16] propose to invest in different parts of the MST depending on the estimated market conditions.
      • The authors of [HMM18] show that there is no inner-mathematical relationship between the minimum variance portfolio from Markowitz theory and the portfolios designed from the minimum spanning tree. Empirical evidence of such relations found by previous studies is essentially a stylized fact of financial returns correlations and time series, not a general property of correlation matrices.
      • [DFPW15] introduces a procedure to design portfolios which are diversified in their tail behavior by selecting only a single asset in each cluster.
  38. Trading strategy
      • Earnings per share forecasts prepared on the basis of statistically grouped data (clusters) outperform forecasts made on data grouped on traditional industrial criteria as well as forecasts prepared by mechanical extrapolation techniques [EG71].
      • One can build a simple mean-reversion statistical arbitrage strategy whereby one assumes that stocks in a given industry move together, cross-sectionally demeans stock returns within said industry, shorts stocks with positive residual returns and goes long stocks with negative residual returns [KY16].
      • In [PS15], the authors suggest that tracking the merging, splitting, birth, and death of the clusters in time could be the basis for pairs-like reversal trading strategies but with pairs corresponding to clusters.
      • The paper [DC05] describes methods for index tracking and enhanced index tracking based on clusters of financial time series.
      • [MADM16] finds the existence of significant relations between past changes in the market correlation structure and future changes in the market volatility.
      • In [KLT12], the authors claim that long-short strategies exploiting mispricing due to the industry categorization bias generate statistically significant and economically sizable risk-adjusted excess returns.
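The mean-reversion recipe attributed to [KY16] on this slide (demean returns within each industry or cluster, short the positive residuals, go long the negative ones) can be sketched as follows. This is a simplified illustration, not the construction used in that paper: the function name, the normalization to unit gross exposure, and the toy returns and industry labels are all assumptions made for the example.

```python
from collections import defaultdict

def mean_reversion_weights(returns, industry):
    """Cross-sectionally demean returns within each industry, then bet on reversal.

    returns:  dict ticker -> last-period return
    industry: dict ticker -> industry (or statistical cluster) label
    Positive residuals get negative weights (short), negative residuals get
    positive weights (long); weights are scaled to unit gross exposure.
    """
    grouped = defaultdict(list)
    for ticker, value in returns.items():
        grouped[industry[ticker]].append(value)
    group_mean = {label: sum(vals) / len(vals) for label, vals in grouped.items()}

    residuals = {t: r - group_mean[industry[t]] for t, r in returns.items()}
    gross = sum(abs(e) for e in residuals.values())
    if gross == 0.0:
        return {t: 0.0 for t in returns}  # no dispersion, no position
    return {t: -e / gross for t, e in residuals.items()}

# Hypothetical one-period returns and industry labels (illustrative only).
toy_returns = {"A": 0.03, "B": -0.01, "C": 0.02, "D": 0.00}
toy_industry = {"A": "tech", "B": "tech", "C": "energy", "D": "energy"}
weights = mean_reversion_weights(toy_returns, toy_industry)
```

By construction the weights sum to zero within each group, so the portfolio is neutral to each industry's average move; replacing the industry labels with statistical clusters changes only the `industry` mapping.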
  39. Risk
      • In [DPT14], the authors design clusters that tend to be comonotonic in their extreme low values: to avoid contagion in the portfolio during risky scenarios, an investor should diversify over these clusters.
      • In [MDMA14], the authors postulate the existence of a hierarchical structure of risks which can be deemed responsible for both stock multivariate dependency structure and univariate multifractal behaviour, and then propose a model that reproduces the empirical observations (entanglement of univariate multi-scaling and multivariate cross-correlation properties of financial time series). The interplay between multi-scaling and average cross-correlation is confirmed in [BMDM18].
      • Clusters (statistical industry classification) can be an alternative to sometimes unavailable “fundamental” industry classifications (e.g. in emerging or small markets) [KY16].
      • [HZYU16] finds that financial institutions which have, in the correlation networks, greater node strength, larger node betweenness centrality, larger node closeness centrality and larger node clustering coefficient tend to be associated with larger systemic risk contributions.
  40. Financial policy making
      Clusters and networks can help design financial policies. Several papers propose to leverage them to detect risky market environments, develop indicators that can predict forthcoming crises or economic recovery [ZLW+11], improve economic nowcasting [EFC17], or find key markets and assets that drive a whole region, and on which stimulus can be applied effectively.
      The authors of [HSBYBY10] claim that “separation prevents failure propagation and connections increase risks of global crises”, whereas the prevailing view in favor of deregulation is that banks, by investing in diverse sectors, would have greater stability. To support their argument, using financial networks, they study the aftermath of the repeal of the Glass-Steagall Act (1933) under the Clinton administration in 1999. They find that the erosion of the Glass-Steagall Act and cross-sector investments eliminated “firewalls” that could have prevented the housing sector decline from triggering a wider financial and economic crisis:
      “Our analysis implies that the investment across economic sectors itself creates increased cross-linking of otherwise much more weakly coupled parts of the economy, causing dependencies that increase, rather than decrease, risk.”
  41. Section 6: Opinionated views on research directions
  42. Opinionated views on research directions
      What’s missing for “financial networks” to become a mature research field? Some inspiration from the booming deep learning era:
      • lack of reproducibility
          provide code and data (at least synthetic datasets)
      • difficulty to compare methods, re-implementation bias
          build open source libraries (standardized API, optimized code)
          open source software helps to engage more with practitioners
      • confidential data
          provide synthetic datasets encoding stylized facts
          propose generative models (cf. the GAN literature applied to graphs)
      • lack of evaluation metrics / no end-to-end approach
          define common tasks (e.g. evaluate the clustering or network methodology on portfolio optimization, crisis detection, mean reversion strategy) where all the details are specified (e.g. a well-chosen artificial dataset, or samples from a generative model, or public financial data)
  43. Thank you for the attention. Questions?
      Figure: Co-authorship network (left) and its MST (right)
  44. References I
      • Tomaso Aste, Tiziana Di Matteo, and ST Hyde, Complex networks on hyperbolic surfaces, Physica A: Statistical Mechanics and its Applications 346 (2005), no. 1, 20–26.
      • Eike Christian Brechmann et al., Hierarchical Kendall copulas and the modeling of systemic and operational risk, Ph.D. thesis, Universitätsbibliothek der TU München, 2013.
      • Stefano Battiston and Michele Catanzaro, Statistical properties of corporate board and director networks, The European Physical Journal B 38 (2004), no. 2, 345–352.
      • Matteo Barigozzi, Giorgio Fagiolo, and Diego Garlaschelli, Multinetwork of international trade: A commodity-specific analysis, Physical Review E 81 (2010), no. 4, 046104.
  45. References II
      • Monica Billio, Mila Getmansky, Andrew W Lo, and Loriana Pelizzon, Econometric measures of connectedness and systemic risk in the finance and insurance sectors, Journal of Financial Economics 104 (2012), no. 3, 535–559.
      • Kestutis Baltakys, Juho Kanniainen, and Frank Emmert-Streib, Multilayer aggregation with statistical validation: Application to investor networks, Scientific Reports 8 (2018), no. 1, 8198.
      • RJ Buonocore, RN Mantegna, and T Di Matteo, On the interplay between multiscaling and average cross-correlation, arXiv preprint arXiv:1802.01113 (2018).
      • Christian Borghesi, Matteo Marsili, and Salvatore Miccichè, Emergence of time-horizon invariant correlation structure in financial returns by subtraction of the market mode, Physical Review E 76 (2007), no. 2, 026104.
  46. References III
      • Eduard Baitinger and Jochen Papenbrock, Interconnectedness risk and active portfolio management: The information-theoretic perspective.
      • AQ Barbi and GA Prataviera, Nonlinear dependencies on Brazilian equity network from mutual information minimum spanning trees, arXiv preprint arXiv:1711.06185 (2017).
      • Juan Gabriel Brida and Wiston Adrián Risso, Multidimensional minimal spanning tree: The Dow Jones case, Physica A: Statistical Mechanics and its Applications 387 (2008), no. 21, 5205–5210.
      • Gunnar Carlsson and Facundo Mémoli, Characterization, stability and convergence of hierarchical clustering methods, Journal of Machine Learning Research 11 (2010), no. Apr, 1425–1470.
  47. References IV
      • Christian Dose and Silvano Cincotti, Clustering of financial time series with application to index and enhanced index tracking portfolio, Physica A: Statistical Mechanics and its Applications 355 (2005), no. 1, 145–151.
      • Fabrizio Durante, Enrico Foscolo, Roberta Pappadà, and Hao Wang, A portfolio diversification strategy via tail dependence measures.
      • Philippe Donnat, Gautier Marti, and Philippe Very, Toward a generic representation of random variables for machine learning, Pattern Recognition Letters 70 (2016), 24–31.
      • Fabrizio Durante and Roberta Pappadà, Cluster analysis of time series via Kendall distribution, Strengthening Links Between Data Analysis and Soft Computing, Springer, 2015, pp. 209–216.
  48. References V
      • Fabrizio Durante, Roberta Pappadà, and Nicola Torelli, Clustering of financial time series in risky scenarios, Advances in Data Analysis and Classification 8 (2014), no. 4, 359–376.
      • Mohammed Elshendy and Andrea Fronzetti Colladon, Big data analysis of economic news: Hints to forecast macroeconomic indicators, International Journal of Engineering Business Management 9 (2017), 1847979017720040.
      • Edwin J Elton and Martin J Gruber, Improved forecasting through the design of homogeneous groups, The Journal of Business 44 (1971), no. 4, 432–450.
      • Pawel Fiedor, Information-theoretic approach to lead-lag effect on financial markets, The European Physical Journal B 87 (2014), no. 8, 1–9.
  49. References VI
      • Pawel Fiedor, Networks in financial markets based on the mutual information rate, Physical Review E 89 (2014), no. 5, 052801.
      • Palash Goyal and Emilio Ferrara, Graph embedding techniques, applications, and performance: A survey, Knowledge-Based Systems 151 (2018), 78–94.
      • Yong Kheng Goh, Haslifah M Hasim, and Chris G Antonopoulos, Inference of financial networks using the normalised mutual information rate, PloS one 13 (2018), no. 2, e0192160.
      • Lorenzo Giada and Matteo Marsili, Data clustering and noise undressing of correlation matrices, Physical Review E 63 (2001), no. 6, 061101.
      • Lorenzo Giada and Matteo Marsili, Algorithms of maximum likelihood data clustering with applications, Physica A: Statistical Mechanics and its Applications 315 (2002), no. 3, 650–664.
  50. 50. References VII Ya-Chun Gao, Yong Zeng, and Shi-Min Cai, Influence network in the Chinese stock market, Journal of Statistical Mechanics: Theory and Experiment 2015 (2015), no. 3, P03017. Xue Guo, Hu Zhang, and Tianhai Tian, Development of stock correlation networks using mutual information and financial big data, PloS one 13 (2018), no. 4, e0195941. David Hartman and Jaroslav Hlinka, Nonlinearity in stock networks, arXiv preprint arXiv:1804.10264 (2018). Amelie Hüttner, Jan-Frederik Mai, and Stefano Mineo, Portfolio selection based on graphs: Does it align with Markowitz-optimal portfolios?, Dependence Modeling (2018).
  51. 51. References VIII Dion Harmon, Blake Stacey, Yavni Bar-Yam, and Yaneer Bar-Yam, Networks of economic market interdependence and systemic risk, arXiv preprint arXiv:1011.3707 (2010). Wei-Qiang Huang, Xin-Tian Zhuang, Shuang Yao, and Stan Uryasev, A financial network perspective of financial institutions’ systemic risk contributions, Physica A: Statistical Mechanics and its Applications 456 (2016), 183–196. Neil F Johnson, Mark McDonald, Omer Suleman, Stacy Williams, and Sam Howison, What shakes the FX tree? understanding currency dominance, dependence, and dynamics (keynote address), SPIE Third International Symposium on Fluctuations and Noise, International Society for Optics and Photonics, 2005, pp. 86–99.
  52. 52. References IX Anton Kocheturov, Mikhail Batsyn, and Panos M Pardalos, Dynamics of cluster structures in a financial market network, Physica A: Statistical Mechanics and its Applications 413 (2014), 523–533. L Kullmann, J Kertesz, and RN Mantegna, Identification of clusters of companies in stock indices via Potts super-paramagnetic transitions, Physica A: Statistical Mechanics and its Applications 287 (2000), no. 3, 412–419. Philipp Krüger, Augustin Landier, and David Thesmar, Categorization bias in the stock market, Available at SSRN 2034204 (2012). Dror Y Kenett, Tobias Preis, Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dependency network and node influence: application to the study of financial markets, International Journal of Bifurcation and Chaos 22 (2012), no. 07, 1250181.
  53. 53. References X Dror Y Kenett, Yoash Shapira, Asaf Madi, Sharron Bransburg-Zabary, Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dynamics of stock market correlations, AUCO Czech Economic Review 4 (2010), no. 3, 330–341. Dror Y Kenett, Michele Tumminello, Asaf Madi, Gitit Gur-Gershgoren, Rosario N Mantegna, and Eshel Ben-Jacob, Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market, PloS one 5 (2010), no. 12, e15032. Zura Kakushadze and Willie Yu, Statistical industry classification. Gan Siew Lee and Maman A Djauhari, Multidimensional stock network analysis: An Escoufier’s RV coefficient approach, AIP Conference Proceedings, vol. 1, 2013, pp. 550–555. Elisa Letizia and Fabrizio Lillo, Corporate payments networks and credit risk rating.
  54. 54. References XI Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and Mark Flood, Clustering techniques and their effect on portfolio formation and risk analysis, Proceedings of the International Workshop on Data Science for Macro-Modeling, ACM, 2014, pp. 1–6. Nicolò Musmeci, Tomaso Aste, and Tiziana Di Matteo, Relation between financial market structure and the real economy: comparison between clustering methods, PloS one 10 (2015), no. 3, e0116201. Nicolò Musmeci, Tomaso Aste, and T Di Matteo, Interplay between past market correlation structure changes and future volatility outbursts, Scientific reports 6 (2016).
  55. 55. References XII Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat, Clustering financial time series: How long is enough?, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, 2016, pp. 2583–2589. Raffaello Morales, T Di Matteo, and Tomaso Aste, Dependency structure and scaling properties of financial time series are related, Scientific Reports 4 (2014), no. 4589. Guido Previde Massara, Tiziana Di Matteo, and Tomaso Aste, Network filtering for big data: triangulated maximally filtered graph, Journal of Complex Networks 5 (2016), no. 2, 161–178. Mel MacMahon and Diego Garlaschelli, Community detection for correlation matrices, Phys. Rev. X 5 (2015), 021006.
  56. 56. References XIII Federico Musciotto, Luca Marotta, Salvatore Miccichè, and Rosario N Mantegna, Bootstrap validation of links of a minimum spanning tree, arXiv preprint arXiv:1802.03395 (2018). Gautier Marti, Frank Nielsen, and Philippe Donnat, Optimal copula transport for clustering multivariate time series, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016, pp. 2379–2383. David Matesanz and Guillermo J Ortega, Sovereign public debt crisis in Europe. A network analysis, Physica A: Statistical Mechanics and its Applications 436 (2015), 756–766.
  57. 57. References XIV Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen, A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series, 14th IEEE International Conference on Machine Learning and Applications, ICMLA 2015, Miami, FL, USA, December 9-11, 2015, 2015, pp. 32–37. J-P Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertesz, and Antti Kanto, Dynamics of market correlations: Taxonomy and portfolio analysis, Physical Review E 68 (2003), no. 5, 056110. J-P Onnela, A Chakraborti, K Kaski, and J Kertész, Dynamic asset trees and portfolio analysis, The European Physical Journal B-Condensed Matter and Complex Systems 30 (2002), no. 3, 285–288.
  58. 58. References XV J-P Onnela, Kimmo Kaski, and Janos Kertész, Clustering and information in correlation based financial networks, The European Physical Journal B-Condensed Matter and Complex Systems 38 (2004), no. 2, 353–362. Francesco Pozzi, Tiziana Di Matteo, and Tomaso Aste, Spread of risk across financial markets: better to invest in the peripheries, Scientific reports 3 (2013). Vasiliki Plerou, P Gopikrishnan, Bernd Rosenow, LA Nunes Amaral, and H Eugene Stanley, A random matrix theory approach to financial cross-correlations, Physica A: Statistical Mechanics and its Applications 287 (2000), no. 3, 374–382. Don B Panton, V Parker Lessig, and O Maurice Joy, Comovement of international equity markets: a taxonomic approach, Journal of Financial and Quantitative Analysis 11 (1976), no. 03, 415–432.
  59. 59. References XVI Jochen Papenbrock and Peter Schwendner, Handling risk-on/risk-off dynamics with correlation regimes and correlation networks, Financial Markets and Portfolio Management 29 (2015), no. 2, 125–147. Gustavo Peralta and Abalfazl Zareei, A network approach to portfolio selection, Journal of Empirical Finance (2016). Fei Ren, Ya-Nan Lu, Sai-Ping Li, Xiong-Fei Jiang, Li-Xin Zhong, and Tian Qiu, Dynamic portfolio strategy using clustering approach, arXiv preprint arXiv:1608.03058 (2016). Jacopo Rocchi, Enoch Yan Lok Tsui, and David Saad, Emerging interdependence between stock values during financial crashes, arXiv preprint arXiv:1611.02549 (2016).
  60. 60. References XVII Won-Min Song, Tiziana Di Matteo, and Tomaso Aste, Nested hierarchies in planar graphs, Discrete Applied Mathematics 159 (2011), no. 17, 2135–2146. Won-Min Song, T Di Matteo, and Tomaso Aste, Hierarchical information clustering by means of topologically embedded graphs, PLoS One 7 (2012), no. 3, e31929. Ahmet Sensoy and Benjamin M Tabak, Dynamic spanning trees in stock market networks: The case of Asia-Pacific, Physica A: Statistical Mechanics and its Applications 414 (2014), 387–402. Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and Rosario N Mantegna, Evolution of worldwide stock markets, correlation structure, and correlation-based graphs, Physical Review E 84 (2011), no. 2, 026108.
  61. 61. References XVIII Tiziano Squartini, Iman Van Lelyveld, and Diego Garlaschelli, Early-warning signals of topological collapse in interbank networks, Scientific reports 3 (2013). Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N Mantegna, A tool for filtering information in complex systems, Proceedings of the National Academy of Sciences of the United States of America 102 (2005), no. 30, 10421–10426. Michele Tumminello, Claudia Coronnello, Fabrizio Lillo, Salvatore Miccichè, and Rosario N Mantegna, Spanning trees and bootstrap reliability estimation in correlation-based networks, International Journal of Bifurcation and Chaos 17 (2007), no. 07, 2319–2329. Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N Mantegna, Cluster analysis for portfolio optimization, Journal of Economic Dynamics and Control 32 (2008), no. 1, 235–258.
  62. 62. References XIX Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Hierarchically nested factor model from multivariate data, EPL (Europhysics Letters) 78 (2007), no. 3, 30006. Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Kullback-Leibler distance as a measure of the information filtered from multivariate data, Physical Review E 76 (2007), no. 3, 031123. Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Correlation, hierarchies, and networks in financial markets, Journal of Economic Behavior & Organization 75 (2010), no. 1, 40–58. Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo, and Rosario N Mantegna, Statistically validated networks in bipartite complex systems, PloS one 6 (2011), no. 3, e17994.
  63. 63. References XX Chengyi Tu, Cointegration-based financial networks study in Chinese stock market, Physica A: Statistical Mechanics and its Applications 402 (2014), 245–254. Tomáš Výrost, Štefan Lyócsa, and Eduard Baumöhl, Granger causality stock market networks: Temporal proximity and preferential attachment, Physica A: Statistical Mechanics and its Applications 427 (2015), 262–276. Liuren Wu, Centrality of the supply chain network. Yiting Zhang, Gladys Hui Ting Lee, Jian Cheng Wong, Jun Liang Kok, Manamohan Prusty, and Siew Ann Cheong, Will the US economy recover in 2010? A minimal spanning tree study, Physica A: Statistical Mechanics and its Applications 390 (2011), no. 11, 2020–2050.
  64. 64. References XXI Xin Zhang, Boris Podobnik, Dror Y Kenett, and H Eugene Stanley, Systemic risk and causality dynamics of the world international shipping market, Physica A: Statistical Mechanics and its Applications 415 (2014), 43–53.
