Introduction
Dependence and Distribution
Toward an extension to the multivariate case
On clustering financial time series
A need for distances between dependent random variables
Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat
24 September 2015
Gautier Marti, Frank Nielsen On clustering financial time series
1 Introduction
2 Dependence and Distribution
3 Toward an extension to the multivariate case
Motivations: Why clustering?
Motivations:
Mathematical finance: use of variance-covariance matrices (e.g., Markowitz, Value-at-Risk)
Stylized fact: empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory; "Noise Dressing of Financial Correlation Matrices", Laloux et al., 1999)
Figure: Marchenko-Pastur distribution vs. eigenvalues of the empirical correlation matrix
How to filter these variance-covariance matrices?
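The noise level mentioned above can be illustrated on synthetic data. The following is a minimal sketch, not the authors' code: it builds the empirical correlation matrix of N uncorrelated Gaussian series of length T and compares its largest eigenvalue (estimated by power iteration) with the Marchenko-Pastur upper edge (1 + sqrt(N/T))². The sizes, the seed and the helper names are assumptions chosen for illustration.

```python
import math
import random


def correlation_matrix(series):
    """Empirical Pearson correlation matrix of a list of equal-length series."""
    t = len(series[0])
    standardized = []
    for s in series:
        mean = sum(s) / t
        std = math.sqrt(sum((v - mean) ** 2 for v in s) / t)
        standardized.append([(v - mean) / std for v in s])
    n = len(series)
    return [[sum(a * b for a, b in zip(standardized[i], standardized[j])) / t
             for j in range(n)] for i in range(n)]


def largest_eigenvalue(matrix, iterations=500):
    """Power iteration; returns the Rayleigh quotient of the final vector."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(row[k] * v[k] for k in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(row[k] * v[k] for k in range(n)) for row in matrix]
    return sum(a * b for a, b in zip(v, mv))


random.seed(0)
N, T = 30, 150  # hypothetical sizes: N assets, T observations
# Pure noise: the true correlation matrix is the identity.
noise = [[random.gauss(0.0, 1.0) for _ in range(T)] for _ in range(N)]
corr = correlation_matrix(noise)
lambda_max = largest_eigenvalue(corr)
mp_upper_edge = (1.0 + math.sqrt(N / T)) ** 2
print(f"largest eigenvalue: {lambda_max:.3f}, MP upper edge: {mp_upper_edge:.3f}")
```

Although the true eigenvalues are all equal to 1, the empirical spectrum spreads out to roughly the Marchenko-Pastur edge, which is why eigenvalues below that edge are hard to distinguish from noise.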
Information filtering? Clustering!
Mantegna (1999) et al.'s work:
Limits: the focus is on ρij (Pearson correlation), which is not robust to outliers or heavy tails and could therefore lead to spurious clusters
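The sensitivity of Pearson correlation to outliers can be checked on toy data. The sketch below is an illustration under stated assumptions, not part of the original slides: two independent Gaussian samples receive a single shared extreme observation, and Pearson correlation is compared with Spearman rank correlation, which is used here only as a convenient robust reference. Sample size, seed and outlier value are arbitrary.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """0-based rank of each value; ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [random.gauss(0.0, 1.0) for _ in range(200)]  # independent of x

# One joint outlier, e.g. a single extreme move shared by both series.
x_out, y_out = x + [50.0], y + [50.0]

print("Pearson  without / with outlier:", pearson(x, y), pearson(x_out, y_out))
print("Spearman without / with outlier:", spearman(x, y), spearman(x_out, y_out))
```

A single point is enough to push the Pearson coefficient of two independent series close to 1, while the rank-based coefficient barely moves.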
Modelling
The variations (returns) of asset $i$ follow a random variable $X_i$
The variations (returns) of different assets are "correlated"
i.i.d. observations:
\[
\begin{array}{ll}
X_1: & X_1^1, X_1^2, \ldots, X_1^T \\
X_2: & X_2^1, X_2^2, \ldots, X_2^T \\
\ldots & \ldots \\
X_N: & X_N^1, X_N^2, \ldots, X_N^T
\end{array}
\]
Which distances $d(X_i, X_j)$ between dependent random variables?
Pitfalls of a basic distance
Let $(X, Y)$ be a bivariate Gaussian vector, with $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ and whose correlation is $\rho(X, Y) \in [-1, 1]$.
\[ E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2 + 2 \sigma_X \sigma_Y (1 - \rho(X, Y)) \]
Now, consider the following values for correlation:
$\rho(X, Y) = 0$, so $E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2$. Assume $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$. For $\sigma_X = \sigma_Y \gg 1$, we obtain $E[(X - Y)^2] \gg 1$ instead of the distance 0 expected from comparing two equal Gaussians.
$\rho(X, Y) = 1$, so $E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2$.
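The closed form for E[(X − Y)²] can be checked numerically. The following is a small sketch with hypothetical function names and parameter values: it compares the formula with a Monte Carlo estimate for two identical Gaussians N(0, 10²), once uncorrelated and once perfectly correlated.

```python
import math
import random


def expected_squared_gap(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Closed form of E[(X - Y)^2] for a bivariate Gaussian (X, Y)."""
    return ((mu_x - mu_y) ** 2 + (sigma_x - sigma_y) ** 2
            + 2.0 * sigma_x * sigma_y * (1.0 - rho))


def simulated_squared_gap(mu_x, mu_y, sigma_x, sigma_y, rho, n, rng):
    """Monte Carlo estimate of E[(X - Y)^2] from n correlated Gaussian draws."""
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu_x + sigma_x * z1
        # Correlation rho is induced by mixing z1 with an independent z2.
        y = mu_y + sigma_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        total += (x - y) ** 2
    return total / n


rng = random.Random(0)
# Same marginal distribution for X and Y: the value depends only on rho.
for rho in (0.0, 1.0):
    exact = expected_squared_gap(0.0, 0.0, 10.0, 10.0, rho)
    approx = simulated_squared_gap(0.0, 0.0, 10.0, 10.0, rho, 100_000, rng)
    print(f"rho={rho}: closed form {exact:.1f}, Monte Carlo {approx:.1f}")
```

With equal marginals the quantity is 2σ²(1 − ρ), so it mixes dependence and scale: two independent copies of the same Gaussian look far apart as soon as σ is large.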
Pitfalls of a basic distance
(Marti, Nielsen, Very, Donnat, ICMLA 2015)
The Financial Engineer Bias: Correlation
correlation patterns are blatant
Mantegna et al. aim at filtering information from the correlation matrix using clustering
O(N²) (correlation) vs. O(N) (distribution) parameters
Information Geometry and its statistical distances
original poster: http://www.sonycsl.co.jp/person/nielsen/FrankNielsen-distances-figs.pdf
Sklar’s Theorem and the Copula Transform
Theorem (Sklar's Theorem, 1959)
For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cdfs $P_i$, $1 \le i \le N$, its joint cumulative distribution $P$ is uniquely expressed as
\[ P(X_1, \ldots, X_N) = C(P_1(X_1), \ldots, P_N(X_N)), \]
where $C$, a multivariate distribution with uniform marginals, is known as the copula of $X$.
Sklar’s Theorem and the Copula Transform
Definition (The Copula Transform)
Let $X = (X_1, \ldots, X_N)$ be a random vector with continuous marginal cumulative distribution functions (cdfs) $P_i$, $1 \le i \le N$. The random vector
\[ U = (U_1, \ldots, U_N) := P(X) = (P_1(X_1), \ldots, P_N(X_N)) \]
is known as the copula transform.
The $U_i$, $1 \le i \le N$, are uniformly distributed on $[0, 1]$ (the probability integral transform): for $P_i$ the cdf of $X_i$, we have
\[ x = P_i(P_i^{-1}(x)) = \Pr(X_i \le P_i^{-1}(x)) = \Pr(P_i(X_i) \le x), \]
thus $P_i(X_i) \sim \mathcal{U}[0, 1]$.
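In practice the cdfs are unknown, and a common way to approximate the copula transform is to replace each cdf by the empirical cdf of the sample, which amounts to computing normalized ranks. The sketch below assumes this rank-based estimator; the function name and the toy data are hypothetical.

```python
import random


def empirical_copula_transform(sample):
    """Map each observation to its normalized rank in (0, 1].

    The unknown cdf is replaced by the empirical cdf of the sample
    (an assumption of this sketch); ties are broken by order of appearance.
    """
    t = len(sample)
    order = sorted(range(t), key=lambda k: sample[k])
    u = [0.0] * t
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / t
    return u


random.seed(0)
# Heavy-tailed toy returns: the transform discards the marginal shape.
returns = [random.gauss(0.0, 1.0) ** 3 for _ in range(1000)]
u = empirical_copula_transform(returns)
print("min, max, mean of U:", min(u), max(u), sum(u) / len(u))
```

Whatever the marginal distribution of the input, the output values are spread evenly over (0, 1], so only the ordering of the observations (the dependence information) is kept.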
Distance Design
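\[ d_\theta^2(X_i, X_j) = \theta \, 3 \, E\left[ |P_i(X_i) - P_j(X_j)|^2 \right] + (1 - \theta) \, \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda \]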
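A possible empirical version of this distance is sketched below. It is only an illustration under explicit assumptions: the dependence term is estimated with normalized ranks in place of the cdfs, and the distribution term is read as a squared Hellinger distance between the marginal densities, estimated here with histograms on a common grid. The number of bins, the function names and the toy data are hypothetical choices, not the authors' implementation.

```python
import math
import random


def normalized_ranks(sample):
    """Empirical cdf values of the sample at its own points."""
    t = len(sample)
    order = sorted(range(t), key=lambda k: sample[k])
    u = [0.0] * t
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / t
    return u


def histogram(sample, low, high, bins):
    """Bin probabilities of the sample on [low, high] cut into equal bins."""
    counts = [0] * bins
    width = (high - low) / bins  # assumes high > low
    for v in sample:
        k = min(int((v - low) / width), bins - 1)
        counts[k] += 1
    return [c / len(sample) for c in counts]


def gnpr_squared(x, y, theta, bins=20):
    """Empirical d_theta^2: rank-based dependence term plus Hellinger term."""
    ux, uy = normalized_ranks(x), normalized_ranks(y)
    dependence = 3.0 * sum((a - b) ** 2 for a, b in zip(ux, uy)) / len(x)
    low, high = min(min(x), min(y)), max(max(x), max(y))
    p, q = histogram(x, low, high, bins), histogram(y, low, high, bins)
    hellinger = 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return theta * dependence + (1.0 - theta) * hellinger


random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(2000)]
same_law_independent = [random.gauss(0.0, 1.0) for _ in range(2000)]
comonotone_wider = [5.0 * v for v in z]  # perfectly dependent, different scale

for theta in (0.0, 0.5, 1.0):
    print(theta, gnpr_squared(z, same_law_independent, theta),
          gnpr_squared(z, comonotone_wider, theta))
```

With θ = 1 only the dependence matters, so the rescaled copy of z is at distance 0; with θ = 0 only the marginals matter, so the independent sample with the same law is the close one.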
Results: Data from Hierarchical Block Model
Adjusted Rand Index
Algo.   Distance         A             B             C
HC-AL   (1 − ρ)/2        0.00 ±0.01    0.99 ±0.01    0.56 ±0.01
HC-AL   E[(X − Y)²]      0.00 ±0.00    0.09 ±0.12    0.55 ±0.05
HC-AL   GPR θ = 0        0.34 ±0.01    0.01 ±0.01    0.06 ±0.02
HC-AL   GPR θ = 1        0.00 ±0.01    0.99 ±0.01    0.56 ±0.01
HC-AL   GPR θ = .5       0.34 ±0.01    0.59 ±0.12    0.57 ±0.01
HC-AL   GNPR θ = 0       1             0.00 ±0.00    0.17 ±0.00
HC-AL   GNPR θ = 1       0.00 ±0.00    1             0.57 ±0.00
HC-AL   GNPR θ = .5      0.99 ±0.01    0.25 ±0.20    0.95 ±0.08
AP      (1 − ρ)/2        0.00 ±0.00    0.99 ±0.07    0.48 ±0.02
AP      E[(X − Y)²]      0.14 ±0.03    0.94 ±0.02    0.59 ±0.00
AP      GPR θ = 0        0.25 ±0.08    0.01 ±0.01    0.05 ±0.02
AP      GPR θ = 1        0.00 ±0.01    0.99 ±0.01    0.48 ±0.02
AP      GPR θ = .5       0.06 ±0.00    0.80 ±0.10    0.52 ±0.02
AP      GNPR θ = 0       1             0.00 ±0.00    0.18 ±0.01
AP      GNPR θ = 1       0.00 ±0.01    1             0.59 ±0.00
AP      GNPR θ = .5      0.39 ±0.02    0.39 ±0.11    1
Results: Data from CDS market
(Marti, Nielsen, Very, Donnat, ICMLA 2015)
Limits and questions
Why a convex combination? No a priori support from geometry.
In practice:
no real control on the weight of correlation and on the weight of distribution
stability methods are still prone to overfitting for selecting parameters
θ actually depends on the convergence rate of the estimators: correlation measures converge faster than distribution estimation
Overview
Multivariate dependence
What is the state of the art on multivariate dependence?
multivariate mutual information: In information theory there have been various attempts over the years to extend the definition of mutual information to more than two random variables. These attempts have met with a great deal of confusion and a realization that interactions among many random variables are poorly understood.
Optimal Copula Transport for intra-dependence
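\[ D_{\mathrm{intra}}(X_1, X_2) := \mathrm{EMD}(s_1, s_2), \]
\[ \mathrm{EMD}(s_1, s_2) := \min_f \sum_{1 \le i, j \le n} \|p_i - q_j\| \, f_{ij} \]
subject to
\[ f_{ij} \ge 0, \quad 1 \le i, j \le n, \]
\[ \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \quad 1 \le i \le n, \]
\[ \sum_{i=1}^{n} f_{ij} \le w_{q_j}, \quad 1 \le j \le n, \]
\[ \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} = 1. \]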
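The transport problem above is a linear program and would normally be handed to an LP or optimal-transport solver. As a toy illustration only, the sketch below treats the special case where both signatures have n points with equal weights 1/n: an optimal flow can then be taken to be a permutation scaled by 1/n, so tiny instances can be solved by enumeration. The function name and the example signatures are hypothetical, and the enumeration is a swapped-in shortcut, not the method used in the slides.

```python
import itertools
import math


def emd_uniform(points_p, points_q):
    """EMD between two signatures of n points each, all with weight 1/n.

    Brute force over all assignments; only usable for very small n.
    """
    n = len(points_p)
    assert n == len(points_q)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Each unit of mass 1/n travels the Euclidean distance to its target.
        cost = sum(math.dist(points_p[i], points_q[perm[i]]) for i in range(n))
        best = min(best, cost / n)
    return best


# Hypothetical signatures: cluster centers in the unit square [0, 1]^2.
diagonal = [(0.25, 0.25), (0.75, 0.75)]        # mass along the diagonal
anti_diagonal = [(0.25, 0.75), (0.75, 0.25)]   # mass along the anti-diagonal
print(emd_uniform(diagonal, diagonal))
print(emd_uniform(diagonal, anti_diagonal))
```

Identical signatures are at distance 0, while moving the mass from the diagonal to the anti-diagonal of the unit square has a strictly positive cost.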
Optimal Copula Transport for inter-dependence
Limits and questions
does not scale well with even moderate dimensionality:
density estimation
computing cost
full parametric approach?
how to connect with the (copula,margins) representation?
information geometry?
(approximate) optimal transport?
kernel embedding of distributions?
contact: gautier.marti@helleborecapital.com
Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas
Popat.
NP-hardness of Euclidean sum-of-squares clustering.
Machine Learning, 75(2):245–248, 2009.
Luigi Ambrosio and Nicola Gigli.
A user’s guide to optimal transport.
In Modelling and optimisation of flows on networks, pages
1–155. Springer, 2013.
David Applegate, Tamraparni Dasu, Shankar Krishnan, and
Simon Urbanek.
Unsupervised clustering of multidimensional distributions using
earth mover distance.
In Proceedings of the 17th ACM SIGKDD international
conference on Knowledge discovery and data mining, pages
636–644. ACM, 2011.
Shai Ben-David, Ulrike Von Luxburg, and Dávid Pál.
A sober look at clustering stability.
In Learning theory, pages 5–19. Springer, 2006.
Petro Borysov, Jan Hannig, and JS Marron.
Asymptotics of hierarchical clustering for growing dimension.
Journal of Multivariate Analysis, 124:465–479, 2014.
Leo Breiman and Jerome H Friedman.
Estimating optimal transformations for multiple regression and
correlation.
Journal of the American statistical Association, 80(391):
580–598, 1985.
Joël Bun, Romain Allez, Jean-Philippe Bouchaud, and Marc
Potters.
Rotational invariant estimator for general noisy matrices.
arXiv preprint arXiv:1502.06736, 2015.
Gunnar Carlsson and Facundo Mémoli.
Characterization, stability and convergence of hierarchical
clustering methods.
The Journal of Machine Learning Research, 11:1425–1470,
2010.
Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum,
Anthony Bagnall, Abdullah Mueen, and Gustavo Batista.
The UCR time series classification archive, July 2015.
www.cs.ucr.edu/~eamonn/time_series_data/.
Tamraparni Dasu, Deborah F Swayne, and David Poole.
Grouping multivariate time series: A case study.
In Proceedings of the IEEE Workshop on Temporal Data
Mining: Algorithms, Theory and Applications, in conjunction
with the Conference on Data Mining, Houston, pages 25–32,
2005.
Paul Deheuvels.
La fonction de dépendance empirique et ses propriétés. Un test
non paramétrique d'indépendance.
Acad. Roy. Belg. Bull. Cl. Sci.(5), 65(6):274–292, 1979.
Paul Deheuvels.
An asymptotic decomposition for multivariate distribution-free
tests of independence.
Journal of Multivariate Analysis, 11(1):102–113, 1981.
T Di Matteo, T Aste, ST Hyde, and S Ramsden.
Interest rates hierarchical structure.
Physica A: Statistical Mechanics and its Applications, 355(1):
21–33, 2005.
T Di Matteo, Francesca Pozzi, and Tomaso Aste.
The use of dynamical networks to detect the hierarchical
organization of financial market sectors.
The European Physical Journal B-Condensed Matter and
Complex Systems, 73(1):3–11, 2010.
Francis X Diebold and Canlin Li.
Forecasting the term structure of government bond yields.
Journal of econometrics, 130(2):337–364, 2006.
A Adam Ding and Yi Li.
Copula correlation: An equitable dependence measure and
extension of pearson’s correlation.
arXiv preprint arXiv:1312.7214, 2013.
Bradley Efron.
Bootstrap methods: another look at the jackknife.
The annals of Statistics, pages 1–26, 1979.
Gal Elidan.
Copulas in machine learning.
In Copulae in Mathematical and Quantitative Finance, pages
39–60. Springer, 2013.
Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyré,
and Jean-François Aujol.
Regularized discrete optimal transport.
Springer, 2013.
Hans Gebelein.
Das statistische problem der korrelation als variations-und
eigenwertproblem und sein zusammenhang mit der
ausgleichsrechnung.
ZAMM-Journal of Applied Mathematics and
Mechanics/Zeitschrift für Angewandte Mathematik und
Mechanik, 21(6):364–379, 1941.
Cyril Goutte, Peter Toft, Egill Rostrup, Finn Å Nielsen, and
Lars Kai Hansen.
On clustering fMRI time series.
NeuroImage, 9(3):298–310, 1999.
Clive WJ Granger and Paul Newbold.
Spurious regressions in econometrics.
Journal of econometrics, 2(2):111–120, 1974.
Isabelle Guyon, Ulrike Von Luxburg, and Robert C Williamson.
Clustering: Science or art.
In NIPS 2009 Workshop on Clustering Theory, 2009.
Jiang Hangjin and Ding Yiming.
Equitability of dependence measure.
stat, 1050:9, 2015.
Keith Henderson, Brian Gallagher, and Tina Eliassi-Rad.
EP-MEANS: An efficient nonparametric clustering of empirical
probability distributions.
2015.
Weiming Hu, Tieniu Tan, Liang Wang, and Steve Maybank.
A survey on visual surveillance of object motion and behaviors.
Systems, Man, and Cybernetics, Part C: Applications and
Reviews, IEEE Transactions on, 34(3):334–352, 2004.
John C Hull.
Options, futures, and other derivatives.
Pearson Education, 2006.
Anil K Jain.
Data clustering: 50 years beyond k-means.
Pattern recognition letters, 31(8):651–666, 2010.
Konstantinos Kalpakis, Dhiral Gada, and Vasundhara
Puttagunta.
Distance measures for effective clustering of ARIMA
time-series.
In Data Mining, 2001. ICDM 2001, Proceedings IEEE
International Conference on, pages 273–280. IEEE, 2001.
M Kanevski, V Timonin, A Pozdnoukhov, and M Maignan.
Evolution of interest rate curve: empirical analysis of patterns
using nonlinear clustering tools.
In European Symposium on Time Series Prediction, 2008.
Leonid Vitalievich Kantorovich.
On the translocation of masses.
In Dokl. Akad. Nauk SSSR, volume 37, pages 199–201, 1942.
Justin B Kinney and Gurinder S Atwal.
Equitability, mutual information, and the maximal information
coefficient.
Proceedings of the National Academy of Sciences, 111(9):
3354–3359, 2014.
Jon M. Kleinberg.
An impossibility theorem for clustering.
In S. Thrun and K. Obermayer, editors, Advances in Neural
Information Processing Systems 15, pages 446–453. MIT
Press, Cambridge, MA, 2002.
URL
http://books.nips.cc/papers/files/nips15/LT17.pdf.
Laurent Laloux, Pierre Cizeau, Marc Potters, and
Jean-Philippe Bouchaud.
Random matrix theory and financial correlations.
International Journal of Theoretical and Applied Finance, 3
(03):391–397, 2000.
Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong,
and Mark Flood.
Clustering techniques and their effect on portfolio formation
and risk analysis.
In Proceedings of the International Workshop on Data Science
for Macro-Modeling, pages 1–6. ACM, 2014.
Erel Levine and Eytan Domany.
Resampling method for unsupervised estimation of cluster
validity.
Neural computation, 13(11):2573–2593, 2001.
T Warren Liao.
Clustering of time series data—a survey.
Pattern recognition, 38(11):1857–1874, 2005.
Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu.
A symbolic representation of time series, with implications for
streaming algorithms.
In Proceedings of the 8th ACM SIGMOD workshop on
Research issues in data mining and knowledge discovery, pages
2–11. ACM, 2003.
Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios
Gunopulos.
Iterative incremental clustering of time series.
In Advances in Database Technology-EDBT 2004, pages
106–122. Springer, 2004.
Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi.
Experiencing SAX: a novel symbolic representation of time
series.
Data Mining and knowledge discovery, 15(2):107–144, 2007.
David Lopez-Paz, Philipp Hennig, and Bernhard Schölkopf.
The randomized dependence coefficient.
arXiv preprint arXiv:1304.7717, 2013.
Rosario N Mantegna.
Hierarchical structure in financial markets.
The European Physical Journal B-Condensed Matter and
Complex Systems, 11(1):193–197, 1999.
Martin Martens and Ser-Huang Poon.
Returns synchronization and daily correlation dynamics
between international stock markets.
Journal of Banking & Finance, 25(10):1805–1827, 2001.
Gautier Marti, Philippe Donnat, Frank Nielsen, and Philippe
Very.
HCMapper: An interactive visualization tool to compare
partition-based flat clustering extracted from pairs of
dendrograms.
arXiv preprint arXiv:1507.08137, 2015a.
Gautier Marti, Philippe Very, and Philippe Donnat.
Toward a generic representation of random variables for
machine learning.
arXiv preprint arXiv:1506.00976, 2015b.
Sergio Mayordomo, Juan Ignacio Peña, and Eduardo S
Schwartz.
Are all credit default swap databases equal?
Technical report, National Bureau of Economic Research,
2010.
Sergio Mayordomo, Juan Ignacio Peña, and Eduardo S
Schwartz.
Are all credit default swap databases equal?
European Financial Management, 20(4):677–713, 2014.
Gaspard Monge.
Mémoire sur la théorie des déblais et des remblais.
De l’Imprimerie Royale, 1781.
James Munkres.
Algorithms for the assignment and transportation problems.
Journal of the Society for Industrial and Applied Mathematics,
5(1):32–38, 1957.
Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo.
Relation between financial market structure and the real
economy: Comparison between clustering methods.
Available at SSRN 2525291, 2014.
Nicoló Musmeci, Tomaso Aste, and Tiziana Di Matteo.
Relation between financial market structure and the real
economy: comparison between clustering methods.
2015.
Roger B Nelsen.
An introduction to copulas, volume 139.
Springer Science & Business Media, 2013.
Dominic O’Kane.
Modelling single-name and multi-name credit derivatives,
volume 573.
John Wiley & Sons, 2011.
Barnabás Póczos, Zoubin Ghahramani, and Jeff Schneider.
Copula-based kernel dependency measures.
arXiv preprint arXiv:1206.4682, 2012.
David N Reshef, Yakir A Reshef, Hilary K Finucane, Sharon R
Grossman, Gilean McVean, Peter J Turnbaugh, Eric S Lander,
Michael Mitzenmacher, and Pardis C Sabeti.
Detecting novel associations in large data sets.
science, 334(6062):1518–1524, 2011.
David N Reshef, Yakir A Reshef, Pardis C Sabeti, and
Michael M Mitzenmacher.
An empirical study of leading measures of dependence.
arXiv preprint arXiv:1505.02214, 2015a.
Yakir A Reshef, David N Reshef, Hilary K Finucane, Pardis C
Sabeti, and Michael M Mitzenmacher.
Measuring dependence powerfully and equitably.
arXiv preprint arXiv:1505.02213, 2015b.
Yakir A Reshef, David N Reshef, Pardis C Sabeti, and
Michael M Mitzenmacher.
Equitability, interval estimation, and statistical power.
arXiv preprint arXiv:1505.02212, 2015c.
Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas.
The earth mover’s distance as a metric for image retrieval.
International journal of computer vision, 40(2):99–121, 2000.
Daniil Ryabko.
Clustering processes.
arXiv preprint arXiv:1004.5194, 2010.
Ohad Shamir and Naftali Tishby.
Cluster stability for finite samples.
In NIPS, 2007.
Robert H Shumway.
Time-frequency clustering and discriminant analysis.
Statistics & probability letters, 63(3):307–314, 2003.
Noah Simon and Robert Tibshirani.
Comment on "Detecting novel associations in large data sets"
by Reshef et al, Science Dec 16, 2011.
arXiv preprint arXiv:1401.7645, 2014.
Ashish Singhal and Dale E Seborg.
Clustering of multivariate time-series data.
Journal of Chemometrics, 19:427–438, 2005.
A Sklar.
Fonctions de répartition à n dimensions et leurs marges.
Université Paris 8, 1959.
Won-Min Song, T Di Matteo, and Tomaso Aste.
Hierarchical information clustering by means of topologically
embedded graphs.
PLoS One, 7(3):e31929, 2012.
Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and
Philip S Yu.
Graphscope: parameter-free mining of large time-evolving
graphs.
In Proceedings of the 13th ACM SIGKDD international
conference on Knowledge discovery and data mining, pages
687–696. ACM, 2007.
Gábor J Székely, Maria L Rizzo, Nail K Bakirov, et al.
Measuring and testing dependence by correlation of distances.
The Annals of Statistics, 35(6):2769–2794, 2007.
Chayant Tantipathananandh and Tanya Y Berger-Wolf.
Finding communities in dynamic social networks.
In Data Mining (ICDM), 2011 IEEE 11th International
Conference on, pages 1236–1241. IEEE, 2011.
Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N
Mantegna.
Cluster analysis for portfolio optimization.
Journal of Economic Dynamics and Control, 32(1):235–258,
2008.
Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and
Rosario N Mantegna.
A tool for filtering information in complex systems.
Proceedings of the National Academy of Sciences of the
United States of America, 102(30):10421–10426, 2005.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna.
Correlation, hierarchies, and networks in financial markets.
Journal of Economic Behavior & Organization, 75(1):40–58,
2010.
Cédric Villani.
Optimal transport: old and new, volume 338.
Springer Science & Business Media, 2008.
Kiyoung Yang and Cyrus Shahabi.
A pca-based similarity measure for multivariate time series.
In Proceedings of the 2nd ACM international workshop on
Multimedia databases, pages 65–74. ACM, 2004.
Kiyoung Yang and Cyrus Shahabi.
On the stationarity of multivariate time series for
correlation-based data analysis.
In Data Mining, Fifth IEEE International Conference on, pages
4–pp. IEEE, 2005.
Introduction to advanced Monte Carlo methods
 
Parameter Uncertainty and Learning in Dynamic Financial Decisions
Parameter Uncertainty and Learning in Dynamic Financial DecisionsParameter Uncertainty and Learning in Dynamic Financial Decisions
Parameter Uncertainty and Learning in Dynamic Financial Decisions
 

On clustering financial time series - A need for distances between dependent random variables

  • 1. On clustering financial time series: A need for distances between dependent random variables. Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat. 24 September 2015
  • 2. 1 Introduction; 2 Dependence and Distribution; 3 Toward an extension to the multivariate case
  • 3. Motivations: Why clustering? Mathematical finance relies on variance-covariance matrices (e.g., Markowitz, Value-at-Risk). Stylized fact: empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory; Laloux et al., "Noise Dressing of Financial Correlation Matrices", 1999). Figure: Marchenko-Pastur distribution vs. eigenvalues of the empirical correlation matrix. How to filter these variance-covariance matrices?
  • 4. Information filtering? Clustering! Mantegna (1999) et al.'s work. Limits: focus on $\rho_{ij}$ (Pearson correlation), which is not robust to outliers or heavy tails and could therefore lead to spurious clusters.
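The lack of robustness of Pearson correlation can be illustrated with a small experiment. The sketch below is an assumption-laden toy example, not taken from the slides: two independent Gaussian samples receive one shared extreme observation, and a rank-based (Spearman) correlation is computed as the Pearson correlation of the ranks for comparison.

```python
import numpy as np
from scipy.stats import rankdata


def pearson(a, b):
    """Pearson linear correlation coefficient."""
    return np.corrcoef(a, b)[0, 1]


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return np.corrcoef(rankdata(a), rankdata(b))[0, 1]


rng = np.random.default_rng(42)
x = rng.standard_normal(500)
y = rng.standard_normal(500)  # independent of x by construction

# Add one joint outlier, e.g. a single extreme market move.
x_out = np.append(x, 50.0)
y_out = np.append(y, 50.0)

pearson_clean = pearson(x, y)
pearson_dirty = pearson(x_out, y_out)
spearman_dirty = spearman(x_out, y_out)
```

A single point out of 501 is enough to push the Pearson estimate far from zero, while the rank-based estimate barely moves. A clustering built on the contaminated Pearson matrix would group these two independent series together.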
  • 5. Modelling. The variations or returns of asset $i$ follow a random variable $X_i$. Asset variations or returns are "correlated". i.i.d. observations: $X_1: X_1^1, X_1^2, \ldots, X_1^T$; $X_2: X_2^1, X_2^2, \ldots, X_2^T$; $\ldots$; $X_N: X_N^1, X_N^2, \ldots, X_N^T$. Which distances $d(X_i, X_j)$ between dependent random variables?
  • 6. Outline: (1) Introduction; (2) Dependence and Distribution; (3) Toward an extension to the multivariate case
  • 7. Pitfalls of a basic distance. Let $(X, Y)$ be a bivariate Gaussian vector, with $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ and whose correlation is $\rho(X, Y) \in [-1, 1]$. Then $\mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2 + 2\sigma_X\sigma_Y(1 - \rho(X, Y))$. Now, consider the following values for correlation. (i) $\rho(X, Y) = 0$, so $\mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2$. Assume $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$. For $\sigma_X = \sigma_Y \gg 1$, we obtain $\mathbb{E}[(X - Y)^2] \gg 1$ instead of the distance 0 expected from comparing two equal Gaussians. (ii) $\rho(X, Y) = 1$, so $\mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2$.
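The closed form for $\mathbb{E}[(X - Y)^2]$ can be checked by simulation. The sketch below is illustrative only: the function names, the sample size and the value $\sigma = 10$ are assumptions made here. The correlated pair is built from two independent standard normals, which also works in the degenerate case $\rho = 1$.

```python
import numpy as np


def expected_sq_diff(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Closed form of E[(X - Y)^2] for a bivariate Gaussian (X, Y)."""
    return ((mu_x - mu_y) ** 2
            + (sigma_x - sigma_y) ** 2
            + 2.0 * sigma_x * sigma_y * (1.0 - rho))


def simulated_sq_diff(mu_x, mu_y, sigma_x, sigma_y, rho, n=200_000, seed=0):
    """Monte Carlo estimate of E[(X - Y)^2]."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x = mu_x + sigma_x * z1
    # Y has correlation rho with X by construction.
    y = mu_y + sigma_y * (rho * z1 + np.sqrt(1.0 - rho ** 2) * z2)
    return np.mean((x - y) ** 2)


# Two identically distributed Gaussians with a large standard deviation.
exact_indep = expected_sq_diff(0.0, 0.0, 10.0, 10.0, rho=0.0)
approx_indep = simulated_sq_diff(0.0, 0.0, 10.0, 10.0, rho=0.0)
exact_comonotone = expected_sq_diff(0.0, 0.0, 10.0, 10.0, rho=1.0)
approx_comonotone = simulated_sq_diff(0.0, 0.0, 10.0, 10.0, rho=1.0)
```

With equal marginals the value is $2\sigma^2 = 200$ when $\rho = 0$ and 0 when $\rho = 1$, so this quantity mixes dependence and distribution information and cannot be read as a distance between distributions alone.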
  • 8. Pitfalls of a basic distance (Marti, Nielsen, Very, Donnat, ICMLA 2015)
  • 9. The Financial Engineer Bias: Correlation. Correlation patterns are blatant. Mantegna et al. aim at filtering information from the correlation matrix using clustering. $O(N^2)$ (correlation) vs. $O(N)$ (distribution) parameters.
  • 10. Information Geometry and its statistical distances. Original poster: http://www.sonycsl.co.jp/person/nielsen/FrankNielsen-distances-figs.pdf
  • 11. Sklar’s Theorem and the Copula Transform. Theorem (Sklar’s Theorem (1959)): For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cdfs $P_i$, $1 \le i \le N$, its joint cumulative distribution $P$ is uniquely expressed as $P(X_1, \ldots, X_N) = C(P_1(X_1), \ldots, P_N(X_N))$, where $C$, the multivariate distribution of uniform marginals, is known as the copula of $X$.
  • 12. Sklar’s Theorem and the Copula Transform. Definition (The Copula Transform): Let $X = (X_1, \ldots, X_N)$ be a random vector with continuous marginal cumulative distribution functions (cdfs) $P_i$, $1 \le i \le N$. The random vector $U = (U_1, \ldots, U_N) := P(X) = (P_1(X_1), \ldots, P_N(X_N))$ is known as the copula transform. The $U_i$, $1 \le i \le N$, are uniformly distributed on $[0, 1]$ (the probability integral transform): for $P_i$ the cdf of $X_i$, we have $x = P_i(P_i^{-1}(x)) = \Pr(X_i \le P_i^{-1}(x)) = \Pr(P_i(X_i) \le x)$, thus $P_i(X_i) \sim \mathcal{U}[0, 1]$.
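In practice the marginal cdfs are unknown, and a common empirical stand-in for the copula transform is to replace each observation by its normalized rank. The following sketch assumes that estimator; the function name is chosen here and is not part of the slides.

```python
import numpy as np
from scipy.stats import rankdata


def empirical_copula_transform(samples):
    """Map each column of a (T, N) sample to normalized ranks in (0, 1].

    Each column is treated as T observations of one variable, and the
    empirical cdf evaluated at the observations is rank / T.
    """
    samples = np.asarray(samples, dtype=float)
    n_obs = samples.shape[0]
    return rankdata(samples, axis=0) / n_obs


rng = np.random.default_rng(1)
x = rng.standard_normal((200, 3))
u = empirical_copula_transform(x)

# Strictly increasing transformations of the margins leave U unchanged.
u_exp = empirical_copula_transform(np.exp(x))
```

Because only ranks are used, the transformed sample is invariant to any strictly increasing rescaling of the margins, which is what lets dependence be studied separately from the marginal distributions.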
  • 13. Distance Design. $d_\theta^2(X_i, X_j) = \theta \, 3\,\mathbb{E}\left[|P_i(X_i) - P_j(X_j)|^2\right] + (1 - \theta) \, \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda$
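A sample-based version of this distance can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the cdfs are replaced by normalized ranks, the second term is taken to be the squared Hellinger distance between the marginal densities, and those densities are approximated by histograms on a shared grid. The function name and the `bins` parameter are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def gnpr_distance_sq(x, y, theta=0.5, bins=50):
    """Plug-in estimate of d_theta^2 between two samples of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_obs = len(x)

    # Dependence term: 3 * E|P_i(X_i) - P_j(X_j)|^2 with rank-based cdfs.
    u = rankdata(x) / n_obs
    v = rankdata(y) / n_obs
    dependence_sq = 3.0 * np.mean((u - v) ** 2)

    # Distribution term: squared Hellinger distance from histograms
    # built on a common range (assumed estimator).
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    distribution_sq = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

    return theta * dependence_sq + (1.0 - theta) * distribution_sq


rng = np.random.default_rng(0)
a = rng.standard_normal(2000)
b = rng.standard_normal(2000)  # same distribution as a, independent of it
c = 5.0 * a                    # perfectly dependent on a, different scale

d_same = gnpr_distance_sq(a, a, theta=0.5)
d_indep_dependence_only = gnpr_distance_sq(a, b, theta=1.0)
d_scaled_dependence_only = gnpr_distance_sq(a, c, theta=1.0)
d_scaled_distribution_only = gnpr_distance_sq(a, c, theta=0.0)
```

With $\theta = 1$ only the dependence term is active, so the rescaled copy `c` is at distance 0 from `a`; with $\theta = 0$ only the marginals matter, and the same pair is clearly separated.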
  • 14. Results: Data from Hierarchical Block Model. Adjusted Rand Index:
      Algo.   Distance        A             B             C
      HC-AL   (1 − ρ)/2       0.00 ±0.01    0.99 ±0.01    0.56 ±0.01
      HC-AL   E[(X − Y)²]     0.00 ±0.00    0.09 ±0.12    0.55 ±0.05
      HC-AL   GPR θ = 0       0.34 ±0.01    0.01 ±0.01    0.06 ±0.02
      HC-AL   GPR θ = 1       0.00 ±0.01    0.99 ±0.01    0.56 ±0.01
      HC-AL   GPR θ = .5      0.34 ±0.01    0.59 ±0.12    0.57 ±0.01
      HC-AL   GNPR θ = 0      1             0.00 ±0.00    0.17 ±0.00
      HC-AL   GNPR θ = 1      0.00 ±0.00    1             0.57 ±0.00
      HC-AL   GNPR θ = .5     0.99 ±0.01    0.25 ±0.20    0.95 ±0.08
      AP      (1 − ρ)/2       0.00 ±0.00    0.99 ±0.07    0.48 ±0.02
      AP      E[(X − Y)²]     0.14 ±0.03    0.94 ±0.02    0.59 ±0.00
      AP      GPR θ = 0       0.25 ±0.08    0.01 ±0.01    0.05 ±0.02
      AP      GPR θ = 1       0.00 ±0.01    0.99 ±0.01    0.48 ±0.02
      AP      GPR θ = .5      0.06 ±0.00    0.80 ±0.10    0.52 ±0.02
      AP      GNPR θ = 0      1             0.00 ±0.00    0.18 ±0.01
      AP      GNPR θ = 1      0.00 ±0.01    1             0.59 ±0.00
      AP      GNPR θ = .5     0.39 ±0.02    0.39 ±0.11    1
  • 15. Results: Data from CDS market (Marti, Nielsen, Very, Donnat, ICMLA 2015)
  • 16. Limits and questions. Why a convex combination? There is no a priori support from geometry. In practice: no real control on the weight of correlation and on the weight of distribution; stability methods are still prone to overfitting for selecting parameters; $\theta$ actually depends on the convergence rate of the estimators: correlation measures converge faster than distribution estimation.
  • 17. Outline: (1) Introduction; (2) Dependence and Distribution; (3) Toward an extension to the multivariate case
  • 18. Overview
  • 19. Multivariate dependence. What is the state of the art on multivariate dependence? Multivariate mutual information: In information theory there have been various attempts over the years to extend the definition of mutual information to more than two random variables. These attempts have met with a great deal of confusion and a realization that interactions among many random variables are poorly understood.
  • 20. Optimal Copula Transport for intra-dependence. $D_{\mathrm{intra}}(X_1, X_2) := \mathrm{EMD}(s_1, s_2)$, where $\mathrm{EMD}(s_1, s_2) := \min_f \sum_{1 \le i, j \le n} \|p_i - q_j\| \, f_{ij}$ subject to $f_{ij} \ge 0$ for $1 \le i, j \le n$; $\sum_{j=1}^{n} f_{ij} \le w_{p_i}$ for $1 \le i \le n$; $\sum_{i=1}^{n} f_{ij} \le w_{q_j}$ for $1 \le j \le n$; and $\sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} = 1$.
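The transport problem above is a small linear program, so it can be solved directly with a generic LP solver. The sketch below is a minimal illustration using `scipy.optimize.linprog`, not the authors' implementation. It assumes each signature is given as a list of points with weights summing to 1 and uses the Euclidean norm as ground distance; the function name `emd` is chosen here.

```python
import numpy as np
from scipy.optimize import linprog


def emd(points_p, weights_p, points_q, weights_q):
    """Earth Mover's Distance between two weighted point signatures."""
    p = np.atleast_2d(np.asarray(points_p, dtype=float))
    q = np.atleast_2d(np.asarray(points_q, dtype=float))
    w_p = np.asarray(weights_p, dtype=float)
    w_q = np.asarray(weights_q, dtype=float)
    n, m = len(w_p), len(w_q)

    # Ground distances ||p_i - q_j||, flattened so that f_ij is at i * m + j.
    cost = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2).ravel()

    # sum_j f_ij <= w_p[i] and sum_i f_ij <= w_q[j].
    row_sums = np.kron(np.eye(n), np.ones((1, m)))
    col_sums = np.kron(np.ones((1, n)), np.eye(m))
    a_ub = np.vstack([row_sums, col_sums])
    b_ub = np.concatenate([w_p, w_q])

    # Total flow equals 1.
    a_eq = np.ones((1, n * m))
    b_eq = np.array([1.0])

    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


# Shifting a two-point uniform signature by one unit costs exactly 1.
shifted = emd([[0.0], [1.0]], [0.5, 0.5], [[1.0], [2.0]], [0.5, 0.5])
```

The number of flow variables is the product of the two signature sizes, so the cost of a generic LP solve grows quickly as the signatures get finer.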
  • 21. Optimal Copula Transport for inter-dependence
  • 22. Limits and questions. The approach does not scale well with even moderate dimensionality: density estimation, computing cost. Full parametric approach? How to connect with the (copula, margins) representation? Information geometry? (Approximate) optimal transport? Kernel embedding of distributions? Contact: gautier.marti@helleborecapital.com
  • 23. Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245–248, 2009.
      Luigi Ambrosio and Nicola Gigli. A user’s guide to optimal transport. In Modelling and optimisation of flows on networks, pages 1–155. Springer, 2013.
      David Applegate, Tamraparni Dasu, Shankar Krishnan, and Simon Urbanek. Unsupervised clustering of multidimensional distributions using earth mover distance. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 636–644. ACM, 2011.
  • 24. Shai Ben-David, Ulrike Von Luxburg, and Dávid Pál. A sober look at clustering stability. In Learning theory, pages 5–19. Springer, 2006.
      Petro Borysov, Jan Hannig, and JS Marron. Asymptotics of hierarchical clustering for growing dimension. Journal of Multivariate Analysis, 124:465–479, 2014.
      Leo Breiman and Jerome H Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80(391):580–598, 1985.
      Joël Bun, Romain Allez, Jean-Philippe Bouchaud, and Marc Potters. Rotational invariant estimator for general noisy matrices. arXiv preprint arXiv:1502.06736, 2015.
  • 25. Gunnar Carlsson and Facundo Mémoli. Characterization, stability and convergence of hierarchical clustering methods. The Journal of Machine Learning Research, 11:1425–1470, 2010.
      Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, and Gustavo Batista. The UCR time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/.
      Tamraparni Dasu, Deborah F Swayne, and David Poole. Grouping multivariate time series: A case study. In Proceedings of the IEEE Workshop on Temporal Data Mining: Algorithms, Theory and Applications, in conjunction with the Conference on Data Mining, Houston, pages 25–32, 2005.
      Paul Deheuvels. La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d’indépendance. Acad. Roy. Belg. Bull. Cl. Sci.(5), 65(6):274–292, 1979.
  • 26. Paul Deheuvels. An asymptotic decomposition for multivariate distribution-free tests of independence. Journal of Multivariate Analysis, 11(1):102–113, 1981.
      T Di Matteo, T Aste, ST Hyde, and S Ramsden. Interest rates hierarchical structure. Physica A: Statistical Mechanics and its Applications, 355(1):21–33, 2005.
      T Di Matteo, Francesca Pozzi, and Tomaso Aste. The use of dynamical networks to detect the hierarchical organization of financial market sectors. The European Physical Journal B-Condensed Matter and Complex Systems, 73(1):3–11, 2010.
  • 27. Francis X Diebold and Canlin Li. Forecasting the term structure of government bond yields. Journal of Econometrics, 130(2):337–364, 2006.
      A Adam Ding and Yi Li. Copula correlation: An equitable dependence measure and extension of Pearson’s correlation. arXiv preprint arXiv:1312.7214, 2013.
      Bradley Efron. Bootstrap methods: another look at the jackknife. The Annals of Statistics, pages 1–26, 1979.
      Gal Elidan. Copulas in machine learning. In Copulae in Mathematical and Quantitative Finance, pages 39–60. Springer, 2013.
  • 28. Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyré, and Jean-François Aujol. Regularized discrete optimal transport. Springer, 2013.
      Hans Gebelein. Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 21(6):364–379, 1941.
      Cyril Goutte, Peter Toft, Egill Rostrup, Finn Å Nielsen, and Lars Kai Hansen. On clustering fMRI time series. NeuroImage, 9(3):298–310, 1999.
  • 29. Clive WJ Granger and Paul Newbold. Spurious regressions in econometrics. Journal of Econometrics, 2(2):111–120, 1974.
      Isabelle Guyon, Ulrike Von Luxburg, and Robert C Williamson. Clustering: Science or art. In NIPS 2009 Workshop on Clustering Theory, 2009.
      Jiang Hangjin and Ding Yiming. Equitability of dependence measure. stat, 1050:9, 2015.
      Keith Henderson, Brian Gallagher, and Tina Eliassi-Rad. EP-MEANS: An efficient nonparametric clustering of empirical probability distributions. 2015.
      Weiming Hu, Tieniu Tan, Liang Wang, and Steve Maybank. A survey on visual surveillance of object motion and behaviors. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 34(3):334–352, 2004.
  • 30. John C Hull. Options, futures, and other derivatives. Pearson Education, 2006.
      Anil K Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010.
      Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta. Distance measures for effective clustering of ARIMA time-series. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 273–280. IEEE, 2001.
      M Kanevski, V Timonin, A Pozdnoukhov, and M Maignan. Evolution of interest rate curve: empirical analysis of patterns using nonlinear clustering tools. In European Symposium on Time Series Prediction, 2008.
  • 31. Leonid Vitalievich Kantorovich. On the translocation of masses. In Dokl. Akad. Nauk SSSR, volume 37, pages 199–201, 1942.
      Justin B Kinney and Gurinder S Atwal. Equitability, mutual information, and the maximal information coefficient. Proceedings of the National Academy of Sciences, 111(9):3354–3359, 2014.
      Jon M. Kleinberg. An impossibility theorem for clustering. In S. Thrun and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 446–453. MIT Press, Cambridge, MA, 2002. URL http://books.nips.cc/papers/files/nips15/LT17.pdf.
  • 32. Laurent Laloux, Pierre Cizeau, Marc Potters, and Jean-Philippe Bouchaud. Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance, 3(03):391–397, 2000.
      Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and Mark Flood. Clustering techniques and their effect on portfolio formation and risk analysis. In Proceedings of the International Workshop on Data Science for Macro-Modeling, pages 1–6. ACM, 2014.
      Erel Levine and Eytan Domany. Resampling method for unsupervised estimation of cluster validity. Neural Computation, 13(11):2573–2593, 2001.
  • 33. T Warren Liao. Clustering of time series data - a survey. Pattern Recognition, 38(11):1857–1874, 2005.
      Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pages 2–11. ACM, 2003.
      Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos. Iterative incremental clustering of time series. In Advances in Database Technology-EDBT 2004, pages 106–122. Springer, 2004.
  • 34. Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, 15(2):107–144, 2007.
      David Lopez-Paz, Philipp Hennig, and Bernhard Schölkopf. The randomized dependence coefficient. arXiv preprint arXiv:1304.7717, 2013.
      Rosario N Mantegna. Hierarchical structure in financial markets. The European Physical Journal B-Condensed Matter and Complex Systems, 11(1):193–197, 1999.
      Martin Martens and Ser-Huang Poon. Returns synchronization and daily correlation dynamics between international stock markets. Journal of Banking & Finance, 25(10):1805–1827, 2001.
  • 35. Gautier Marti, Philippe Donnat, Frank Nielsen, and Philippe Very. HCMapper: An interactive visualization tool to compare partition-based flat clustering extracted from pairs of dendrograms. arXiv preprint arXiv:1507.08137, 2015a.
      Gautier Marti, Philippe Very, and Philippe Donnat. Toward a generic representation of random variables for machine learning. arXiv preprint arXiv:1506.00976, 2015b.
      Sergio Mayordomo, Juan Ignacio Peña, and Eduardo S Schwartz. Are all credit default swap databases equal? Technical report, National Bureau of Economic Research, 2010.
  • 36. Sergio Mayordomo, Juan Ignacio Peña, and Eduardo S Schwartz. Are all credit default swap databases equal? European Financial Management, 20(4):677–713, 2014.
      Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. De l’Imprimerie Royale, 1781.
      James Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32–38, 1957.
      Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo. Relation between financial market structure and the real economy: Comparison between clustering methods. Available at SSRN 2525291, 2014.
  • 37. Nicoló Musmeci, Tomaso Aste, and Tiziana Di Matteo. Relation between financial market structure and the real economy: comparison between clustering methods. 2015.
      Roger B Nelsen. An introduction to copulas, volume 139. Springer Science & Business Media, 2013.
      Dominic O’Kane. Modelling single-name and multi-name credit derivatives, volume 573. John Wiley & Sons, 2011.
      Barnabás Póczos, Zoubin Ghahramani, and Jeff Schneider. Copula-based kernel dependency measures. arXiv preprint arXiv:1206.4682, 2012.
  • 38. David N Reshef, Yakir A Reshef, Hilary K Finucane, Sharon R Grossman, Gilean McVean, Peter J Turnbaugh, Eric S Lander, Michael Mitzenmacher, and Pardis C Sabeti. Detecting novel associations in large data sets. Science, 334(6062):1518–1524, 2011.
      David N Reshef, Yakir A Reshef, Pardis C Sabeti, and Michael M Mitzenmacher. An empirical study of leading measures of dependence. arXiv preprint arXiv:1505.02214, 2015a.
      Yakir A Reshef, David N Reshef, Hilary K Finucane, Pardis C Sabeti, and Michael M Mitzenmacher. Measuring dependence powerfully and equitably. arXiv preprint arXiv:1505.02213, 2015b.
  • 39. Yakir A Reshef, David N Reshef, Pardis C Sabeti, and Michael M Mitzenmacher. Equitability, interval estimation, and statistical power. arXiv preprint arXiv:1505.02212, 2015c.
      Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 2000.
      Daniil Ryabko. Clustering processes. arXiv preprint arXiv:1004.5194, 2010.
      Ohad Shamir and Naftali Tishby. Cluster stability for finite samples. In NIPS, 2007.
      Robert H Shumway. Time-frequency clustering and discriminant analysis. Statistics & Probability Letters, 63(3):307–314, 2003.
  • 40. Noah Simon and Robert Tibshirani. Comment on "Detecting novel associations in large data sets" by Reshef et al, Science Dec 16, 2011. arXiv preprint arXiv:1401.7645, 2014.
      Ashish Singhal and Dale E Seborg. Clustering of multivariate time-series data. Journal of Chemometrics, 19:427–438, 2005.
      A Sklar. Fonctions de répartition à n dimensions et leurs marges. Université Paris 8, 1959.
      Won-Min Song, T Di Matteo, and Tomaso Aste. Hierarchical information clustering by means of topologically embedded graphs. PLoS One, 7(3):e31929, 2012.
  • 41. Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and Philip S Yu. Graphscope: parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 687–696. ACM, 2007.
      Gábor J Székely, Maria L Rizzo, Nail K Bakirov, et al. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794, 2007.
      Chayant Tantipathananandh and Tanya Y Berger-Wolf. Finding communities in dynamic social networks. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 1236–1241. IEEE, 2011.
  • 42. Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N Mantegna. Cluster analysis for portfolio optimization. Journal of Economic Dynamics and Control, 32(1):235–258, 2008.
      Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N Mantegna. A tool for filtering information in complex systems. Proceedings of the National Academy of Sciences of the United States of America, 102(30):10421–10426, 2005.
      Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna. Correlation, hierarchies, and networks in financial markets. Journal of Economic Behavior & Organization, 75(1):40–58, 2010.
      Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
  • 43. Kiyoung Yang and Cyrus Shahabi. A PCA-based similarity measure for multivariate time series. In Proceedings of the 2nd ACM international workshop on Multimedia databases, pages 65–74. ACM, 2004.
      Kiyoung Yang and Cyrus Shahabi. On the stationarity of multivariate time series for correlation-based data analysis. In Data Mining, Fifth IEEE International Conference on, pages 4–pp. IEEE, 2005.