- 1. HELLEBORECAPITAL. Introduction; Standard correlation coefficients; A metric space for copulas; Applications; Conclusion. A closer look at correlations. Paris Machine Learning Meetup #3, Season 4. G. Marti, S. Andler, F. Nielsen, P. Donnat. HELLEBORECAPITAL, November 9, 2016. Gautier Marti: A closer look at correlations.
- 2. Outline. 1 Introduction. 2 Standard correlation coefficients (Pearson correlation coefficient; Spearman correlation coefficient). 3 A metric space for copulas (On the importance of the normalization; Which metric? (Regularized) Optimal Transport; A customizable dependence coefficient: TFDC). 4 Applications (Explore the correlations with clustering; Query your dataset about correlations with TFDC). 5 Conclusion.
- 3. What is correlation? $\dfrac{\mathbb{E}[X_i X_j] - \mathbb{E}[X_i]\,\mathbb{E}[X_j]}{\sqrt{(\mathbb{E}[X_i^2] - \mathbb{E}[X_i]^2)(\mathbb{E}[X_j^2] - \mathbb{E}[X_j]^2)}} \in [-1, 1]$, estimated from $N$ observations by $\dfrac{\sum_{k=1}^{N}(x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{N}(x_{ik} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{N}(x_{jk} - \bar{x}_j)^2}} \in [-1, 1]$. In Python: `import numpy as np; np.corrcoef(x_i, x_j)`.
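To make the sample formula on this slide concrete, here is a minimal sketch that writes it out term by term and compares it with `np.corrcoef`. The helper name `pearson` and the synthetic data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def pearson(x_i, x_j):
    """Sample Pearson correlation, written out term by term."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    d_i = x_i - x_i.mean()  # deviations from the sample mean
    d_j = x_j - x_j.mean()
    return float((d_i * d_j).sum() / np.sqrt((d_i ** 2).sum() * (d_j ** 2).sum()))

rng = np.random.default_rng(0)  # illustrative data, not from the talk
x_i = rng.normal(size=1000)
x_j = 0.5 * x_i + rng.normal(size=1000)

print(pearson(x_i, x_j))
print(np.corrcoef(x_i, x_j)[0, 1])  # np.corrcoef returns a 2x2 matrix
```

Note that `np.corrcoef(x_i, x_j)` returns the full 2x2 correlation matrix, so the coefficient itself is the off-diagonal entry.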
- 6. Pearson correlation
- 12. Pearson correlation with outliers
- 14. Spearman correlation: Pearson on ranks
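The slide title can be read literally: replace each observation by its rank, then apply the Pearson formula. A minimal sketch, assuming no ties in the data (the helper names `ranks` and `spearman` and the example data are hypothetical):

```python
import numpy as np

def ranks(x):
    """Ranks 1..N of the values in x (assumes no ties)."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman(x_i, x_j):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x_i), ranks(x_j))[0, 1])

x = np.linspace(0.1, 3.0, 50)
y = np.exp(x)  # monotonic but non-linear in x

print(np.corrcoef(x, y)[0, 1])  # below 1: the relation is not linear
print(spearman(x, y))           # 1 up to rounding: the ranks agree exactly
```

Because only the ordering matters, any strictly increasing transformation of either variable leaves the result unchanged.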
- 20. Spearman correlation with outliers
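The slide figures are not recoverable from this transcript, but the contrast between the two "with outliers" slides can be sketched numerically. The example below corrupts a single observation and measures how far each coefficient moves. The data, the outlier value, and the helper `spearman` are assumptions made for illustration.

```python
import numpy as np

def spearman(x_i, x_j):
    """Pearson correlation of the ranks (assumes no ties)."""
    # Double argsort gives ranks 0..N-1; Pearson is unaffected by the offset.
    r_i = np.argsort(np.argsort(x_i))
    r_j = np.argsort(np.argsort(x_j))
    return float(np.corrcoef(r_i, r_j)[0, 1])

rng = np.random.default_rng(1)      # illustrative data, not from the talk
x = rng.normal(size=200)
y = x + 0.5 * rng.normal(size=200)  # clearly positively dependent

# Corrupt a single observation with an extreme value.
x_out, y_out = x.copy(), y.copy()
x_out[0], y_out[0] = 100.0, -100.0

pearson_shift = abs(np.corrcoef(x, y)[0, 1] - np.corrcoef(x_out, y_out)[0, 1])
spearman_shift = abs(spearman(x, y) - spearman(x_out, y_out))
print(pearson_shift, spearman_shift)
```

A single extreme point dominates the sums in the Pearson formula, whereas in rank space it can move at most from one end of the ordering to the other.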
- 23. From ranks to empirical copula. Sklar’s Theorem [3]: For $(X_i, X_j)$ having continuous marginal cdfs $F_{X_i}, F_{X_j}$, its joint cumulative distribution $F$ is uniquely expressed as $F(X_i, X_j) = C(F_{X_i}(X_i), F_{X_j}(X_j))$, where $C$ is known as the copula of $(X_i, X_j)$.
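One way to go from ranks to an empirical copula is to treat normalized ranks as stand-ins for $F_{X_i}(X_i)$ and $F_{X_j}(X_j)$ and to histogram them on the unit square. The sketch below does this. The bin count, the midpoint rank convention `(rank + 0.5) / n`, and the synthetic data are assumptions, not details taken from the talk.

```python
import numpy as np

def empirical_copula(x_i, x_j, n_bins=10):
    """Histogram of normalized ranks: a discretized empirical copula density."""
    n = len(x_i)
    # Normalized ranks in (0, 1) play the role of F_Xi(X_i) and F_Xj(X_j).
    u_i = (np.argsort(np.argsort(x_i)) + 0.5) / n
    u_j = (np.argsort(np.argsort(x_j)) + 0.5) / n
    hist, _, _ = np.histogram2d(u_i, u_j, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / n  # cell masses sum to 1

rng = np.random.default_rng(2)  # illustrative data, not from the talk
x_i = rng.normal(size=1000)
x_j = np.exp(x_i + rng.normal(size=1000))

c_ij = empirical_copula(x_i, x_j)
print(c_ij.sum(axis=0))  # marginal over u_i: uniform
print(c_ij.sum(axis=1))  # marginal over u_j: uniform
```

Both margins come out uniform regardless of the original marginal distributions, which is the normalization that makes copulas comparable across pairs of variables.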
- 24. Minimum, Independence, Maximum copulas. Fréchet–Hoeffding copula bounds: For any copula $C : [0, 1]^2 \to [0, 1]$ and any $(u, v) \in [0, 1]^2$ the following bounds hold: $W(u, v) \le C(u, v) \le M(u, v)$, where $W$ is the copula for counter-monotonic random variables, and $M$ is the copula for co-monotonic random variables. Figure panels: $w(u_i, u_j)$, $W(u_i, u_j)$, $\pi(u_i, u_j)$, $\Pi(u_i, u_j)$, $m(u_i, u_j)$, $M(u_i, u_j)$, each plotted over $u_i, u_j \in [0, 1]$.
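The bounds can be checked numerically for the independence copula. The closed forms used below, $W(u, v) = \max(u + v - 1, 0)$, $\Pi(u, v) = uv$ and $M(u, v) = \min(u, v)$, are the standard ones. They do not appear in the extracted slide text, so treat them as added background.

```python
import numpy as np

def W(u, v):
    """Counter-monotonic copula (lower Frechet-Hoeffding bound)."""
    return np.maximum(u + v - 1.0, 0.0)

def Pi(u, v):
    """Independence copula."""
    return u * v

def M(u, v):
    """Co-monotonic copula (upper Frechet-Hoeffding bound)."""
    return np.minimum(u, v)

# Evaluate the three copulas on a regular grid of the unit square.
u, v = np.meshgrid(np.linspace(0, 1, 101), np.linspace(0, 1, 101))
tol = 1e-12  # guards against floating-point rounding only
lower_ok = bool(np.all(W(u, v) <= Pi(u, v) + tol))
upper_ok = bool(np.all(Pi(u, v) <= M(u, v) + tol))
print(lower_ok, upper_ok)
```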
- 26. A metric space for copulas
- 28. Which metric? (Regularized) Optimal Transport. Distance is the minimum cost of transportation to transform one pile of dirt into another one, i.e. the amount of dirt moved times the distance by which it is moved. Examples: $\mathrm{EMD} = |x_1 - x_2|$; $\mathrm{EMD} = \frac{1}{6}|x_1 - x_3| + \frac{1}{6}|x_2 - x_3|$.
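The "pile of dirt" picture is easiest to verify in one dimension, where the optimal transport cost between two histograms on a shared, evenly spaced grid equals the area between their cumulative sums. The sketch below is a 1-D illustration only. The talk works with 2-D copulas and a regularized solver, and the function name `emd_1d` is hypothetical.

```python
import numpy as np

def emd_1d(p, q, grid):
    """Earth mover's distance between two histograms on a shared, evenly spaced 1-D grid."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    step = grid[1] - grid[0]
    # In 1-D, the optimal cost equals the area between the two CDFs.
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum() * step)

grid = np.arange(5.0)            # positions 0, 1, 2, 3, 4
p = np.array([1.0, 0, 0, 0, 0])  # all the dirt sits at x1 = 0
q = np.array([0, 0, 0, 1.0, 0])  # all the dirt sits at x2 = 3

print(emd_1d(p, q, grid))  # 3.0, i.e. |x1 - x2|
```

Moving a unit of mass from $x_1 = 0$ to $x_2 = 3$ costs exactly $|x_1 - x_2|$, matching the first expression on the slide.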
- 29. Which metric? (Regularized) Optimal Transport. Its geometry has good properties in general [1], and for copulas [2]. Figure panels: four untitled panels on $[0, 1]^2$, followed by "Bregman barycenter copula" and "Wasserstein barycenter copula".
- 35. The Target/Forget Dependence Coefficient (TFDC)
- 36. The Target/Forget Dependence Coefficient (TFDC). Now, we can define our bespoke dependence coefficient: build the forget-dependence copulas $\{C^F_l\}_l$; build the target-dependence copulas $\{C^T_k\}_k$; compute the empirical copula $C_{ij}$ from $x_i, x_j$; then $\mathrm{TFDC}(C_{ij}) = \dfrac{\min_l D(C^F_l, C_{ij})}{\min_l D(C^F_l, C_{ij}) + \min_k D(C_{ij}, C^T_k)}$.
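The ratio above can be sketched directly once copulas are discretized on a grid. In the code below, the distance `l1_distance` is a simple stand-in for $D$ (the talk uses a regularized optimal transport distance), and the choice of independence as the forget copula and of the co-monotonic and counter-monotonic copulas as targets is an assumption made for illustration.

```python
import numpy as np

def tfdc(c_ij, forget, target, dist):
    """TFDC(C_ij) = min_l D(C^F_l, C_ij) / (min_l D(C^F_l, C_ij) + min_k D(C_ij, C^T_k))."""
    d_forget = min(dist(c_f, c_ij) for c_f in forget)
    d_target = min(dist(c_ij, c_t) for c_t in target)
    return d_forget / (d_forget + d_target)

def l1_distance(a, b):
    """Stand-in for D; the talk uses a (regularized) optimal transport distance instead."""
    return float(np.abs(a - b).sum())

n = 10
independence = np.full((n, n), 1.0 / n**2)   # forget: uniform mass (independence)
comonotonic = np.eye(n) / n                  # target: mass on the diagonal
countermonotonic = np.fliplr(np.eye(n)) / n  # target: mass on the anti-diagonal

forget, target = [independence], [comonotonic, countermonotonic]
print(tfdc(comonotonic, forget, target, l1_distance))   # 1.0: matches a target
print(tfdc(independence, forget, target, l1_distance))  # 0.0: matches a forget copula
```

The coefficient is 0 when the empirical copula coincides with a forget copula and 1 when it coincides with a target copula. It is undefined (division by zero) if a copula is listed as both, so the two sets should be disjoint.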
- 37. TFDC Power. Figure: Power of several dependence coefficients (cor, dCor, MIC, ACE, MMD, CMMD, RDC, TFDC) as a function of the noise level in eight different scenarios. Insets show the noise-free form of each association pattern. The coefficient power was estimated via 500 simulations with sample size 500 each. Axes: noise level (0 to 100) against power (0 to 1).
- 40. Clustering of empirical copulas
- 41. Financial correlations - Stocks CAC 40. Figure: Stocks: More mass in the bottom-left corner, i.e. lower tail dependence. Stock prices tend to plummet together.
- 42. Financial correlations - Credit Default Swaps. Figure: Credit default swaps: More mass in the top-right corner, i.e. upper tail dependence. The cost of insurance against entities’ default tends to soar in stressed markets.
- 43. Financial correlations - FX rates. Figure: FX rates: Empirical copulas show that the dependence between FX rates takes various forms. For example, rates may exhibit either strong dependence or independence while being anti-correlated during extreme events.
- 44. Associations between features in UCI datasets. Dependence patterns (= clustering centroids) found between features in UCI datasets. Figure panels: Breast Cancer (wdbc), Libras Movement, Parkinsons, Gamma Telescope, each showing centroids plotted on $[0, 1]^2$.
- 46. The Art of formulating questions about correlations. Encode your dependence hypothesis as a copula, and your query as a “k-NN search”.
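A query of this kind can be sketched as a nearest-neighbour search over the empirical copulas of all pairs of variables. Everything named below is hypothetical: the function `knn_query`, the toy dictionary of copulas, and the L1 stand-in distance (the talk uses optimal transport between copulas).

```python
import numpy as np

def knn_query(hypothesis, copulas, k, dist):
    """Return the keys of the k empirical copulas closest to the hypothesis copula."""
    ranked = sorted(copulas, key=lambda name: dist(hypothesis, copulas[name]))
    return ranked[:k]

def l1_distance(a, b):
    """Stand-in distance; the talk uses optimal transport between copulas."""
    return float(np.abs(a - b).sum())

n = 10
copulas = {  # hypothetical pairs of variables with toy copulas
    ("a", "b"): np.eye(n) / n,                # co-monotonic
    ("a", "c"): np.full((n, n), 1.0 / n**2),  # independent
    ("b", "c"): np.fliplr(np.eye(n)) / n,     # counter-monotonic
}

hypothesis = np.eye(n) / n  # "which pairs move together?"
print(knn_query(hypothesis, copulas, k=2, dist=l1_distance))
```

Changing the hypothesis copula changes the question being asked of the dataset, while the search code stays the same.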
- 48. Summary. Designing data-driven tailored correlation coefficients
- 49. Take Home Message
- 50. Internships at Hellebore. If you are interested in an internship at Hellebore: in applied machine learning for Finance (NLP, Text Classification, Information Extraction), please contact stage@helleboretech.com; in ML/Finance research (copulas, Bayesian inference, clustering, time series analysis), please contact gmarti@helleborecapital.com.
- 51. References. [1] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013. [2] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat. Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series. In IEEE Statistical Signal Processing Workshop, SSP 2016, Palma de Mallorca, Spain, June 26-29, 2016, pages 1–5, 2016. [3] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Université Paris 8, 1959.