
This talk was given at the Paris Machine Learning Meetup.


- 1. How to cluster random walks? Paris Machine Learning #5 Season 2: Time Series and FinTech. Philippe Donnat (1), Gautier Marti (1, 2), Frank Nielsen (2), Philippe Very (1). Affiliations: (1) Hellebore Capital Management; (2) Ecole Polytechnique. th January
- 2. Outline
  - 1 Introduction: Data Science for the CDS market; How to group random walks?; What is a clustering program?
  - 2 How to define a distance between two random walks?: Standard approach on time series; Comovements and distributions; GNPR: the best of both worlds
  - 3 Applications: Results on synthetic datasets; Clustering Credit Default Swaps
  - 4 Conclusion
- 3. Section 1: Introduction (Data Science for the CDS market; How to group random walks?; What is a clustering program?)
- 4. Hellebore Capital Management & Data Science. Current R&D projects in Data Science: data mining (parsing and natural language processing); inference (incomplete data sources); portfolio and risk analysis (understanding joint behaviours).
- 5. Do you see clusters? [Figure: random walks; French banks and building materials CDS over 2006-2010]
- 11. Do you see clusters? [Figure: random walks; French banks and building materials CDS over 2006-2015]
- 13. What is a clustering program? Definition: Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in different groups. Definition: We aim at finding $K$ groups by positioning $K$ group centers $\{c_1, \dots, c_K\}$ such that the data points $\{x_1, \dots, x_n\}$ minimize $\min_{c_1, \dots, c_K} \sum_{i=1}^{n} \min_{j=1}^{K} d(x_i, c_j)^2$. But what is the distance $d$ between two random walks?
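The objective on slide 13 can be evaluated directly for a given set of centers. The sketch below is only an illustration: it assumes $d$ is the Euclidean distance, and the function name `clustering_cost` and the toy points are hypothetical, not taken from the talk.

```python
import numpy as np

def clustering_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    # Pairwise squared distances, shape (n_points, n_centers).
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # Inner min over centers, outer sum over points.
    return float(sq_dists.min(axis=1).sum())

# Toy data (assumption): two tight groups around x = 0 and x = 10.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good_centers = np.array([[0.0, 0.5], [10.0, 0.5]])
bad_centers = np.array([[5.0, 0.0], [5.0, 1.0]])

cost_good = clustering_cost(points, good_centers)
cost_bad = clustering_cost(points, bad_centers)
print(cost_good, cost_bad)
```

Centers placed inside the groups give a much lower cost than centers placed between them, which is what the outer minimization over $c_1, \dots, c_K$ searches for.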
- 14. Section 2: How to define a distance between two random walks? (Standard approach on time series; Comovements and distributions; GNPR: the best of both worlds)
- 15. Naive distance between two random walks. Random walks $Y, Y'$ are turned into increments $X, X'$ by differencing: $X = (y_2 - y_1, \dots, y_T - y_{T-1})$ [Figure: covariance scatterplot]. $X, X'$ are then points in $\mathbb{R}^{T-1}$, with $\|X - X'\|^2 = \sum_{i=1}^{T-1} (X_i - X'_i)^2$. Apply normalizations, e.g. $(X - \mu)/\sigma$ or $(X - \min)/(\max - \min)$. This captures comovements rather well. Drawbacks: not robust to outliers, blind to signal shape.
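The naive approach of slide 15 can be sketched in a few lines. This is an illustrative example under stated assumptions: the helper names (`increments`, `zscore`, `minmax`, `naive_sq_distance`) and the simulated walk are hypothetical, and the single 50-unit jump is injected only to show the sensitivity to outliers mentioned on the slide.

```python
import numpy as np

def increments(walk):
    """X = (y_2 - y_1, ..., y_T - y_{T-1})."""
    return np.diff(walk)

def zscore(x):
    """(X - mu) / sigma normalization."""
    return (x - x.mean()) / x.std()

def minmax(x):
    """(X - min) / (max - min) normalization."""
    return (x - x.min()) / (x.max() - x.min())

def naive_sq_distance(x, x_prime):
    """Squared Euclidean distance between two increment vectors."""
    return float(((x - x_prime) ** 2).sum())

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=500))   # simulated random walk (assumption)
y_jump = y.copy()
y_jump[250:] += 50.0                  # same walk with one large jump

x = increments(y)
x_jump = increments(y_jump)
d = naive_sq_distance(x, x_jump)
print(len(x), d)
```

The two walks share every increment except one, yet the squared distance is dominated by that single jump (about 50 squared), which illustrates the lack of robustness to outliers.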
- 16. Our approach: split comovements and distributions?
- 17. GNPR: a suitable representation. Definition: the GNPR (Generic Non-Parametric Representation) projection is $\mathcal{T} : \mathcal{V}^N \to \mathcal{U}^N \times \mathcal{G}^N$, $X \mapsto (G_X(X), G_X)$ (1), where $G_X : x \mapsto P[X \le x]$ is the cumulative distribution function and $G_X(X) \sim \mathcal{U}[0, 1]$. Empirically, $\frac{1}{T} \mathrm{rank}(X_t) = \frac{1}{T} \sum_{k \le T} \mathbf{1}\{X_k \le X_t\} \to P[X \le X_t] = G_X(X_t)$ as $T \to \infty$. Property: $\mathcal{T}$ is a bijection. N.B. It replicates Sklar's theorem, the seminal result of copula theory.
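The empirical estimate $\frac{1}{T} \mathrm{rank}(X_t)$ from slide 17 can be computed as follows. This is a minimal sketch: the function name `empirical_cdf_ranks` is hypothetical, and the quadratic-memory comparison is chosen for readability rather than efficiency.

```python
import numpy as np

def empirical_cdf_ranks(x):
    """Return rank(x_t) / T, where rank(x_t) counts the k with x_k <= x_t."""
    x = np.asarray(x, dtype=float)
    # Entry [t, k] is True when x_k <= x_t; summing over k gives rank(x_t).
    counts = (x[None, :] <= x[:, None]).sum(axis=1)
    return counts / x.size

x = np.array([0.3, -1.2, 2.5, 0.0])
u = empirical_cdf_ranks(x)
u_exp = empirical_cdf_ranks(np.exp(x))  # strictly increasing transform of x
print(u, u_exp)
```

The ranks are unchanged by any strictly increasing transform of the data, which is why this part of the representation carries only dependence information and none about the marginal distribution.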
- 18. A distance $d_\theta$ leveraging GNPR. Definition: Let $(X, Y) \in \mathcal{V}^2$. Let $G_X, G_Y$ be vectors of marginal cdf. Let $\theta \in [0, 1]$. We define the following distance: $d_\theta^2(X, Y) = \theta\, d_1^2(G_X(X), G_Y(Y)) + (1 - \theta)\, d_0^2(G_X, G_Y)$ (2), where $d_1^2(G_X(X), G_Y(Y)) = 3\, E[|G_X(X) - G_Y(Y)|^2]$ (3) and $d_0^2(G_X, G_Y) = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dG_X}{d\lambda}} - \sqrt{\frac{dG_Y}{d\lambda}} \right)^2 d\lambda$ (4). $d_0$ is the Hellinger distance; $d_1^2 = (1 - \rho_S)/2$, with $\rho_S$ the rank (Spearman) correlation between $X$ and $Y$.
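A rough empirical version of $d_\theta$ can be sketched as below. Several choices here are assumptions and not the authors' implementation: the cdf values are estimated by ranks (no ties assumed), the densities in $d_0$ are approximated by histograms on a shared grid of 50 bins, and all function names are hypothetical.

```python
import numpy as np

def rank_cdf(x):
    """Empirical G_X(X): ranks in 1..T divided by T (assumes no ties)."""
    return (np.argsort(np.argsort(x)) + 1) / x.size

def d1_squared(x, y):
    """Dependence part: 3 * E[|G_X(X) - G_Y(Y)|^2], estimated from ranks."""
    return float(3.0 * np.mean((rank_cdf(x) - rank_cdf(y)) ** 2))

def d0_squared(x, y, bins=50):
    """Distribution part: squared Hellinger distance between histograms.

    Histogram binning is an assumption standing in for the densities.
    """
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    return float(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def d_theta(x, y, theta=0.5, bins=50):
    """Square root of the convex combination in equation (2)."""
    return float(np.sqrt(theta * d1_squared(x, y)
                         + (1.0 - theta) * d0_squared(x, y, bins)))

rng = np.random.default_rng(42)
common = rng.normal(size=2000)
x = common + 0.5 * rng.normal(size=2000)   # x and y share a common factor
y = common + 0.5 * rng.normal(size=2000)
z = rng.standard_t(df=3, size=2000)        # independent, heavier tails

spearman = np.corrcoef(rank_cdf(x), rank_cdf(y))[0, 1]
print(d1_squared(x, y), (1.0 - spearman) / 2.0)
print(d_theta(x, y), d_theta(x, z))
```

On this simulated data the rank-based $d_1^2$ agrees closely with $(1 - \rho_S)/2$, and the correlated pair ends up closer in dependence than the independent pair.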
- 19. GNPR θ = 1: increase of correlation. [Figure: distribution of correlations; density against correlation, for Pearson and Spearman correlation.] 10% more correlation, on average, using GNPR θ = 1 (rank statistics) instead of standard correlation.
- 20. GNPR θ = 0: find distribution peculiarities. Parametric modelling versus real-life CDS variations: which distribution? [Figures: Nokia 5Y CDS spread over time (Jan-2006 to Jul-2014) and log-density of its increments; Telecom Italia 5Y CDS spread over time (Jan-2006 to Mar-2014) and log-density of its increments.]
- 21. Section 3: Applications (Results on synthetic datasets; Clustering Credit Default Swaps)
- 22. Description of the testing datasets. We define some interesting test case datasets to study: distribution clustering (dataset A), dependence clustering (dataset B), a mix of both (dataset C).
- 23. Results: GNPR works! Adjusted Rand Index by representation, algorithm and dataset (A, B, C):

  | Representation | Algorithm | A | B | C |
  |---|---|---|---|---|
  | X | Ward | 0 | 0.94 | 0.42 |
  | (X − µX)/σX | Ward | 0 | 0.94 | 0.42 |
  | (X − min)/(max − min) | Ward | 0 | 0.48 | 0.45 |
  | GNPR θ = 0 | Ward | 1 | 0 | 0.47 |
  | GNPR θ = 1 | Ward | 0 | 0.91 | 0.72 |
  | GNPR θ | Ward | 1 | 0.92 | 1 |
  | X | k-means++ | 0 | 0.90 | 0.44 |
  | (X − µX)/σX | k-means++ | 0 | 0.91 | 0.45 |
  | (X − min)/(max − min) | k-means++ | 0.11 | 0.55 | 0.47 |
  | GNPR θ = 0 | k-means++ | 1 | 0 | 0.53 |
  | GNPR θ = 1 | k-means++ | 0.06 | 0.99 | 0.80 |
  | GNPR θ | k-means++ | 1 | 0.99 | 1 |
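The scores on slide 23 are Adjusted Rand Index (ARI) values, which compare a clustering to a ground-truth partition: 1 means a perfect match and values near 0 mean agreement no better than chance. The sketch below computes the standard pair-counting ARI from scratch; the function name and the toy labels are hypothetical, and it assumes at least two labelled items.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting Adjusted Rand Index between two labelings (n >= 2)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)     # chance-level agreement
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

truth = [0, 0, 1, 1]
ari_relabelled = adjusted_rand_index(truth, [1, 1, 0, 0])  # same partition
ari_partial = adjusted_rand_index(truth, [0, 0, 1, 2])     # one cluster split
print(ari_relabelled, ari_partial)
```

The index depends only on the partition, not on the label names, so swapping cluster labels still gives a score of 1.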
- 24. HCMapper: Compare Hierarchical Clustering
- 26. "Western sovereigns" cluster
- 27. Section 4: Conclusion
- 28. Conclusion: take-home message. GNPR is a way to deal separately with dependence information and distribution information, without losing any. Avenues for research: better aggregation (generalized means?); consistency proof? Any idea of interesting random walks outside finance? Check out www.datagrapple.com to follow our R&D stuff and news about the CDS market!
- 29. Internships. If interested, please contact gautier.marti@helleborecapital.com or philippe.very@helleborecapital.com for further details and application.
- 30. References I
  - [BDVL08] Shai Ben-David and Ulrike von Luxburg, Relating clustering stability to properties of cluster boundaries, COLT, vol. 2008, 2008, pp. 379-390.
  - [BDVLP06] Shai Ben-David, Ulrike von Luxburg, and Dávid Pál, A sober look at clustering stability, Learning Theory, Springer, 2006, pp. 5-19.
  - [BHEG01] Asa Ben-Hur, Andre Elisseeff, and Isabelle Guyon, A stability based method for discovering structure in clustered data, Pacific Symposium on Biocomputing, vol. 7, 2001, pp. 6-17.
  - [LRBB04] Tilman Lange, Volker Roth, Mikio L. Braun, and Joachim M. Buhmann, Stability-based validation of clustering solutions, Neural Computation 16 (2004), no. 6, 1299-1323.
- 31. How to find θ? Clustering stability using perturbations due to: bootstrap; time-sliding window; draws from an oracle. Stability pros: [BHEG01, LRBB04]. Stability cons: [BDVLP06, BDVL08]. [Figures: accuracy landscape (accuracy as a function of θ and T); accuracy and stability of clustering using GNPR (ARI against θ, with curves for accuracy, bootstrap stability, time stability and cross-datasets stability).]
- 32. Properties of $d_\theta$. Property: $\forall \theta \in [0, 1]$, $0 \le d_\theta \le 1$. Property: for $0 < \theta < 1$, $d_\theta$ is a metric. For $\theta \in \{0, 1\}$ it fails to separate distinct variables: with $U \sim \mathcal{U}[0, 1]$, $U \ne 1 - U$, yet $d_0(U, 1 - U) = 0$; and $V \ne 2V$, yet $d_1(V, 2V) = 0$.
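The two counterexamples on slide 32 can be checked numerically. The sketch below makes several assumptions that are not in the talk: cdf values are estimated by ranks, densities by a 20-bin histogram on [0, 1], and $U$ is replaced by an evenly spaced grid so that $U$ and $1 - U$ have exactly the same histogram.

```python
import numpy as np

def rank_cdf(x):
    """Empirical cdf values: ranks in 1..T divided by T (assumes no ties)."""
    return (np.argsort(np.argsort(x)) + 1) / x.size

def d1_squared(x, y):
    """3 * E[|G_X(X) - G_Y(Y)|^2], estimated from ranks."""
    return float(3.0 * np.mean((rank_cdf(x) - rank_cdf(y)) ** 2))

def d0_squared_unit(x, y, bins=20):
    """Squared Hellinger distance between histograms on [0, 1]."""
    p, _ = np.histogram(x, bins=bins, range=(0.0, 1.0))
    q, _ = np.histogram(y, bins=bins, range=(0.0, 1.0))
    p = p / p.sum()
    q = q / q.sum()
    return float(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# V versus 2V: different variables, identical ranks, so d1 vanishes.
rng = np.random.default_rng(1)
v = rng.normal(size=1000)
d1_v_2v = d1_squared(v, 2.0 * v)

# U versus 1 - U: same distribution (d0 vanishes), opposite ranks (d1 near 1).
u = (np.arange(1000) + 0.5) / 1000.0
d0_u = d0_squared_unit(u, 1.0 - u)
d1_u = d1_squared(u, 1.0 - u)
print(d1_v_2v, d0_u, d1_u)
```

Each extreme value of θ is blind to one kind of difference, which is consistent with the slide's statement that the metric property holds only for $0 < \theta < 1$.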
- 33. Description of the testing datasets. For $1 \le i \le N = pK \prod_{s=1}^{S} K_s$, we define $X_i = \sum_{s=1}^{S} \sum_{k=1}^{K_s} \beta^s_{k,i} Y^s_k + \sum_{k=1}^{K} \alpha_{k,i} Z^i_k$ (5), where a) $\alpha_{k,i} = 1$ if $i \equiv k - 1 \pmod{K}$, 0 otherwise; b) $\beta^s_k \in [0, 1]$; c) $\beta^s_{k,i} = \beta^s_k$ if $\lceil i K_s / N \rceil = k$, 0 otherwise. $(X_i)_{i=1}^{N}$ are partitioned into $Q = K \prod_{s=1}^{S} K_s$ clusters of $p$ random variables each.
- 34. OTC data flow processing
- 35. Conjecture: consistency of clustering random walks. [Figures: ARI as a function of T and N; clustering convergence to the ground-truth partition (ARI against T) for clustering distribution (θ = 0), clustering dependence (θ = 1) and clustering total information (θ = θ*).]
