HELLEBORECAPITAL
Introduction
Standard correlation coefficients
A metric space for copulas
Applications
Conclusion
A closer look at correlations
Paris Machine Learning Meetup #3 Season 4
G. Marti, S. Andler, F. Nielsen, P. Donnat
HELLEBORECAPITAL
November 9, 2016
Gautier Marti A closer look at correlations
1 Introduction
2 Standard correlation coefficients
Pearson correlation coefficient
Spearman correlation coefficient
3 A metric space for copulas
On the importance of the normalization
Which metric? (Regularized) Optimal Transport
A customizable dependence coefficient: TFDC
4 Applications
Explore the correlations with clustering
Query your dataset about correlations with TFDC
5 Conclusion
What is correlation?
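Pearson correlation of the random variables X_i, X_j:
\[
\rho(X_i, X_j) = \frac{E[X_i X_j] - E[X_i]E[X_j]}{\sqrt{(E[X_i^2] - E[X_i]^2)(E[X_j^2] - E[X_j]^2)}} \in [-1, 1]
\]
Sample estimate from N observations:
\[
\hat{\rho}_{ij} = \frac{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{N} (x_{jk} - \bar{x}_j)^2}} \in [-1, 1]
\]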
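import numpy as np
# x_i, x_j: 1-D arrays of equal length
rho = np.corrcoef(x_i, x_j)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix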
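As a check on the sample formula, the following minimal sketch computes the coefficient term by term and compares it with `np.corrcoef`. The helper name `pearson` and the synthetic data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def pearson(x_i, x_j):
    """Sample Pearson correlation, written term by term from the sample formula."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    d_i = x_i - x_i.mean()  # deviations from the sample mean of x_i
    d_j = x_j - x_j.mean()  # deviations from the sample mean of x_j
    return np.sum(d_i * d_j) / np.sqrt(np.sum(d_i ** 2) * np.sum(d_j ** 2))

rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only
x_i = rng.normal(size=1000)
x_j = 0.5 * x_i + rng.normal(size=1000)

rho_manual = pearson(x_i, x_j)
rho_numpy = np.corrcoef(x_i, x_j)[0, 1]  # off-diagonal entry of the 2x2 matrix
```

Both values should agree up to floating-point rounding, since `np.corrcoef` implements the same normalized covariance.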
Pearson correlation
Pearson correlation with outliers
Spearman correlation: Pearson on ranks
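A minimal sketch of "Pearson on ranks", assuming data without ties: each series is replaced by its ranks, and the Pearson coefficient is computed on those ranks. The helpers `ranks` and `spearman` are hypothetical names introduced here; in practice `scipy.stats.spearmanr` also handles ties.

```python
import numpy as np

def ranks(x):
    """Ranks 1..N of the values in x (assumes no ties, for simplicity)."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman(x_i, x_j):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x_i), ranks(x_j))[0, 1]

x = np.linspace(0.1, 5.0, 50)
y = np.exp(x)  # monotonic in x, but strongly non-linear

pearson_xy = np.corrcoef(x, y)[0, 1]
spearman_xy = spearman(x, y)
```

Because `y` is an increasing function of `x`, the two rank vectors coincide and the Spearman coefficient reaches 1, whereas the Pearson coefficient stays below 1.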
Spearman correlation with outliers
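The effect of a single outlier on both coefficients can be sketched as follows. The data are a hypothetical toy example (a perfect linear relation with one corrupted observation), and the rank helper assumes no ties.

```python
import numpy as np

def ranks(x):
    """Ranks 1..N of the values in x (assumes no ties)."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

x = np.arange(20, dtype=float)
y = x.copy()
y[-1] = -1000.0  # one hypothetical corrupted observation

pearson_xy = np.corrcoef(x, y)[0, 1]                 # computed on raw values
spearman_xy = np.corrcoef(ranks(x), ranks(y))[0, 1]  # computed on ranks
```

On this toy sample the outlier flips the sign of the Pearson coefficient, while the Spearman coefficient stays clearly positive (about 0.71), because the outlier can only move by a bounded number of ranks.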
From ranks to empirical copula
Sklar's Theorem [3]
For (X_i, X_j) having continuous marginal cdfs F_{X_i}, F_{X_j}, its joint cumulative
distribution F is uniquely expressed as
\[ F(X_i, X_j) = C(F_{X_i}(X_i), F_{X_j}(X_j)), \]
where C is known as the copula of (X_i, X_j).
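One way to go from ranks to an empirical copula is sketched below: normalized ranks play the role of F_{X_i}(X_i) and F_{X_j}(X_j), and a 2-D histogram of them gives a discrete estimate of the copula density. The function name, the bin count, and the mid-rank normalization are assumptions made for illustration.

```python
import numpy as np

def empirical_copula(x_i, x_j, n_bins=10):
    """Histogram of normalized ranks: a discrete estimate of the copula density."""
    n = len(x_i)
    # 0-based ranks mapped into (0, 1); this approximates the marginal cdfs.
    u_i = (np.argsort(np.argsort(x_i)) + 0.5) / n
    u_j = (np.argsort(np.argsort(x_j)) + 0.5) / n
    hist, _, _ = np.histogram2d(u_i, u_j, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / n  # cell masses sum to 1

rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only
x_i = rng.normal(size=1000)
x_j = x_i ** 3 + rng.normal(size=1000)
c_ij = empirical_copula(x_i, x_j, n_bins=10)
```

By construction both margins of `c_ij` are uniform, so only the dependence between the two variables is left in the histogram.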
Minimum, Independence, Maximum copulas
Fréchet–Hoeffding copula bounds
For any copula C : [0, 1]^2 → [0, 1] and any (u, v) ∈ [0, 1]^2, the following
bounds hold:
\[ W(u, v) \le C(u, v) \le M(u, v), \]
where W is the copula for counter-monotonic random variables, and M
is the copula for co-monotonic random variables.
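The bounds can be checked numerically for the independence copula Π(u, v) = uv, using the standard closed forms W(u, v) = max(u + v - 1, 0) and M(u, v) = min(u, v). The grid resolution below is an arbitrary choice.

```python
import numpy as np

def W(u, v):
    """Lower Frechet-Hoeffding bound (counter-monotonic copula)."""
    return np.maximum(u + v - 1.0, 0.0)

def Pi(u, v):
    """Independence copula."""
    return u * v

def M(u, v):
    """Upper Frechet-Hoeffding bound (co-monotonic copula)."""
    return np.minimum(u, v)

grid = np.linspace(0.0, 1.0, 101)
u, v = np.meshgrid(grid, grid)

tol = 1e-12  # slack for floating-point rounding
lower_ok = bool(np.all(W(u, v) <= Pi(u, v) + tol))
upper_ok = bool(np.all(Pi(u, v) <= M(u, v) + tol))
```

The lower bound holds because uv - (u + v - 1) = (1 - u)(1 - v) ≥ 0 on the unit square.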
Figure: Panels w(u_i, u_j), W(u_i, u_j), π(u_i, u_j), Π(u_i, u_j), m(u_i, u_j), M(u_i, u_j), each plotted over (u_i, u_j) ∈ [0, 1]^2.
A metric space for copulas
Which metric? (Regularized) Optimal Transport
The earth mover's distance (EMD) is the minimum cost of transportation needed to transform one
pile of dirt into another, i.e. the amount of dirt moved times
the distance by which it is moved.
Two examples:
\[ \mathrm{EMD} = |x_1 - x_2| \]
\[ \mathrm{EMD} = \tfrac{1}{6}|x_1 - x_3| + \tfrac{1}{6}|x_2 - x_3| \]
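In one dimension the EMD equals the area between the two cumulative distribution functions, which gives a short way to reproduce both formulas. The positions and the second mass configuration below are hypothetical: they are one setup that yields the 1/6 weights, not necessarily the one drawn on the slide.

```python
import numpy as np

def emd_1d(positions, p, q):
    """EMD between histograms p and q supported on the same sorted 1-D positions.

    In 1-D, the optimal transport cost is the area between the two cdfs.
    """
    positions = np.asarray(positions, dtype=float)
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]  # gap on each interval
    return float(np.sum(cdf_gap * np.diff(positions)))

x1, x2, x3 = 0.0, 1.0, 3.0  # hypothetical positions

# All the mass at x1 versus all the mass at x2: EMD = |x1 - x2|.
emd_a = emd_1d([x1, x2], [1.0, 0.0], [0.0, 1.0])

# Masses 1/6 at x1, 1/6 at x2, 4/6 at x3 versus all the mass at x3:
# EMD = 1/6 |x1 - x3| + 1/6 |x2 - x3|.
emd_b = emd_1d([x1, x2, x3], [1 / 6, 1 / 6, 4 / 6], [0.0, 0.0, 1.0])
```

With these positions, `emd_a` is 1 and `emd_b` is 3/6 + 2/6 = 5/6.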
Its geometry has good properties in general [1], and for copulas [2].
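The regularized variant referenced in [1] can be sketched with plain Sinkhorn iterations. This is a minimal illustration, not the authors' implementation: the regularization strength, iteration count, grid, and the two histograms are all assumptions, and no log-domain stabilization is included.

```python
import numpy as np

def sinkhorn(p, q, cost, reg=0.1, n_iter=1000):
    """Entropy-regularized optimal transport between histograms p and q.

    Returns the transport plan and its transport cost sum(plan * cost).
    """
    K = np.exp(-cost / reg)      # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)        # rescale towards the column marginals q
        u = p / (K @ v)          # rescale towards the row marginals p
    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))

# Two hypothetical bump-shaped histograms on a 1-D grid.
x = np.linspace(0.0, 1.0, 20)
p = np.exp(-((x - 0.25) ** 2) / 0.01)
p /= p.sum()
q = np.exp(-((x - 0.75) ** 2) / 0.01)
q /= q.sum()
cost = np.abs(x[:, None] - x[None, :])  # ground distance between grid points

plan, reg_cost = sinkhorn(p, q, cost)
```

The row marginals of `plan` match `p` after the last update, while the column marginals match `q` only approximately after a finite number of iterations. Smaller values of `reg` bring the cost closer to the unregularized EMD but typically need more iterations and more care with numerical underflow.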
Figure: Four panels over [0, 1]^2, followed by two panels titled "Bregman barycenter copula" and "Wasserstein barycenter copula".
The Target/Forget Dependence Coefficient (TFDC)
Now, we can define our bespoke dependence coefficient:
Build the forget-dependence copulas \( \{C^F_l\}_l \)
Build the target-dependence copulas \( \{C^T_k\}_k \)
Compute the empirical copula \( C_{ij} \) from \( x_i, x_j \)
\[
\mathrm{TFDC}(C_{ij}) = \frac{\min_l D(C^F_l, C_{ij})}{\min_l D(C^F_l, C_{ij}) + \min_k D(C_{ij}, C^T_k)}
\]
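The definition above can be sketched directly in code. The function `tfdc` takes the distance D as a parameter; the Frobenius norm used in the example is only a stand-in chosen to keep the sketch short, not the optimal transport distance discussed in the talk, and the discretized copulas are hypothetical 10 x 10 histograms.

```python
import numpy as np

def tfdc(c_ij, forget_copulas, target_copulas, dist):
    """Target/Forget Dependence Coefficient of an empirical copula c_ij.

    Undefined (division by zero) if c_ij belongs to both sets.
    """
    d_forget = min(dist(c_f, c_ij) for c_f in forget_copulas)
    d_target = min(dist(c_ij, c_t) for c_t in target_copulas)
    return d_forget / (d_forget + d_target)

def frobenius(a, b):
    """Stand-in distance between two discretized copulas (assumption)."""
    return float(np.linalg.norm(a - b))

n = 10
independence = np.full((n, n), 1.0 / n ** 2)   # uniform mass
comonotonic = np.eye(n) / n                    # mass on the diagonal
countermonotonic = np.fliplr(np.eye(n)) / n    # mass on the anti-diagonal

forget = [independence]
target = [comonotonic, countermonotonic]

score_target = tfdc(comonotonic, forget, target, frobenius)
score_forget = tfdc(independence, forget, target, frobenius)
mixture = 0.5 * independence + 0.5 * comonotonic
score_mix = tfdc(mixture, forget, target, frobenius)
```

The coefficient is 1 when the empirical copula coincides with a target copula, 0 when it coincides with a forget copula, and in between otherwise (0.5 for the half-and-half mixture under this stand-in distance).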
TFDC Power
Legend: cor, dCor, MIC, ACE, MMD, CMMD, RDC, TFDC. Axes: noise level (0 to 100) on x, power (0 to 1) on y.
Figure: Power of several dependence coefficients as a function of the
noise level in eight different scenarios. Insets show the noise-free form of
each association pattern. The coefficient power was estimated via 500
simulations with sample size 500 each.
Clustering of empirical copulas
Financial correlations - Stocks CAC 40
Figure: Stocks: More mass in the bottom-left corner, i.e. lower tail
dependence. Stock prices tend to plummet together.
Financial correlations - Credit Default Swaps
Figure: Credit default swaps: More mass in the top-right corner, i.e.
upper tail dependence. The cost of insurance against entities' default tends to
soar in stressed markets.
Financial correlations - FX rates
Figure: FX rates: Empirical copulas show that the dependence between FX
rates takes various forms. For example, rates may exhibit either strong
dependence or independence while being anti-correlated during extreme
events.
Associations between features in UCI datasets
Dependence patterns (= clustering centroids) found between features in UCI datasets
Figure: Five centroid panels over [0, 1]^2 for each dataset: Breast Cancer (wdbc), Libras Movement, Parkinsons, Gamma Telescope.
The Art of formulating questions about correlations
Encode your dependence hypothesis as a copula, and your query as a
“k-NN search”.
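A query of this kind might look like the sketch below: the hypothesis is written as a discretized copula, and the variable pairs are ranked by the distance between their empirical copula and the hypothesis. The pair labels, the toy copulas, and the Frobenius stand-in distance are all assumptions for illustration; the talk uses an optimal transport distance.

```python
import numpy as np

def frobenius(a, b):
    """Stand-in distance between two discretized copulas (assumption)."""
    return float(np.linalg.norm(a - b))

def knn_query(hypothesis, copulas, dist, k=1):
    """Labels of the k pairs whose copulas are closest to the hypothesis."""
    ranked = sorted(copulas, key=lambda label: dist(hypothesis, copulas[label]))
    return ranked[:k]

n = 10
# Hypothetical empirical copulas for three pairs of variables.
copulas = {
    ("A", "B"): np.eye(n) / n,                   # move together
    ("A", "C"): np.full((n, n), 1.0 / n ** 2),   # independent
    ("B", "C"): np.fliplr(np.eye(n)) / n,        # move in opposite directions
}

# Hypothesis: "the two variables move in opposite directions".
hypothesis = np.fliplr(np.eye(n)) / n
nearest = knn_query(hypothesis, copulas, frobenius, k=2)
```

Here the pair ("B", "C") comes first because its copula matches the hypothesis exactly, followed by the independent pair ("A", "C").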
Summary
Designing data-driven tailored correlation coefficients
Take Home Message
Internships at Hellebore
If you are interested in an internship at Hellebore
in applied machine learning for finance (NLP, text
classification, information extraction), please contact:
stage@helleboretech.com
in ML/finance research (copulas, Bayesian inference,
clustering, time series analysis), please contact:
gmarti@helleborecapital.com
[1] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
[2] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat. Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series. In IEEE Statistical Signal Processing Workshop, SSP 2016, Palma de Mallorca, Spain, June 26-29, 2016, pages 1–5, 2016.
[3] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Université Paris 8, 1959.
Gautier Marti A closer look at correlations

A closer look at correlations

  • 1.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion A closer look at correlations Paris Machine Learning Meetup #3 Season 4 G. Marti, S. Andler, F. Nielsen, P. Donnat HELLEBORECAPITAL November 9, 2016 Gautier Marti A closer look at correlations
  • 2.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 3.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion What is correlation? E[Xi Xj ] − E[Xi ]E[Xj ] (E[X2 i ] − E[Xi ]2)(E[X2 j ] − E[Xj ]2) ∈ [−1, 1] N k=1(xik − xi )(xjk − xj ) N k=1(xik − xi )2 N k=1(xjk − xj )2 ∈ [−1, 1] import numpy as np np.corrcoef(x_i,x_j) Gautier Marti A closer look at correlations
  • 4.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 5.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 6.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Pearson correlation Gautier Marti A closer look at correlations
  • 7.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Pearson correlation Gautier Marti A closer look at correlations
  • 8.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Pearson correlation Gautier Marti A closer look at correlations
  • 9.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Pearson correlation Gautier Marti A closer look at correlations
  • 10.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Pearson correlation Gautier Marti A closer look at correlations
  • 11.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Pearson correlation Gautier Marti A closer look at correlations
  • 12.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Pearson correlation with outliers Gautier Marti A closer look at correlations
  • 13.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 14.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Spearman correlation: Pearson on ranks Gautier Marti A closer look at correlations
  • 15.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Spearman correlation: Pearson on ranks Gautier Marti A closer look at correlations
  • 16.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Spearman correlation: Pearson on ranks Gautier Marti A closer look at correlations
  • 17.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Spearman correlation: Pearson on ranks Gautier Marti A closer look at correlations
  • 18.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Spearman correlation: Pearson on ranks Gautier Marti A closer look at correlations
  • 19.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Spearman correlation: Pearson on ranks Gautier Marti A closer look at correlations
  • 20.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient Spearman correlation with outliers Gautier Marti A closer look at correlations
  • 21.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 22.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 23.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC From ranks to empirical copula Sklar’s Theorem [3] For (Xi , Xj ) having continuous marginal cdfs FXi , FXj , its joint cumulative distribution F is uniquely expressed as F(Xi , Xj ) = C(FXi (Xi ), FXj (Xj )), where C is known as the copula of (Xi , Xj ). Gautier Marti A closer look at correlations
  • 24.
    HELLEBORECAPITAL Introduction Standard correlation coefficients Ametric space for copulas Applications Conclusion On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC Minimum, Independence, Maximum copulas Fr´echet–Hoeffding copula bounds For any copula C : [0, 1]2 → [0, 1] and any (u, v) ∈ [0, 1]2 the following bounds hold: W(u, v) ≤ C(u, v) ≤ M(u, v), where W is the copula for counter-monotonic random variables, and M is the copula for co-monotonic random variables. 0 0.5 1 ui 0 0.5 1 uj w(ui,uj) 0.000 0.002 0.004 0.006 0.008 0.010 0.012 0.014 0.016 0.018 0.020 0 0.5 1 ui 0 0.5 1 uj W(ui,uj) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0 0.5 1 ui 0 0.5 1 uj π(ui,uj) 0.00036 0.00037 0.00038 0.00039 0.00040 0.00041 0.00042 0.00043 0.00044 0 0.5 1 ui 0 0.5 1 uj Π(ui,uj) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0 0.5 1 ui 0 0.5 1 uj m(ui,uj) 0.000 0.002 0.004 0.006 0.008 0.010 0.012 0.014 0.016 0.018 0.020 0 0.5 1 ui 0 0.5 1 uj M(ui,uj) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Gautier Marti A closer look at correlations
  • 25.
    Outline (section 3: A metric space for copulas)
  • 26.
    A metric space for copulas
  • 28.
    A metric space for copulas
  • 28.
    Which metric? (Regularized) Optimal Transport
    The distance is the minimum cost of transportation needed to transform one pile of dirt into another, i.e. the amount of dirt moved times the distance by which it is moved.
    $\mathrm{EMD} = |x_1 - x_2|$
    $\mathrm{EMD} = \frac{1}{6}|x_1 - x_3| + \frac{1}{6}|x_2 - x_3|$
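A small numerical illustration of the earth mover's distance in one dimension, where it equals the area between the two cumulative distribution functions. The helper `emd_1d` and the two toy configurations are assumptions chosen to mirror the form of the formulas above; the slide's own figures are not recoverable from the text.

```python
import numpy as np

def emd_1d(x, p, y, q):
    """EMD between two discrete 1-D distributions (points x with masses p, points y with masses q).

    Hypothetical helper: integrates |F_p - F_q| over the merged support.
    """
    x, p, y, q = (np.asarray(a, dtype=float) for a in (x, p, y, q))
    support = np.union1d(x, y)  # sorted unique locations
    cdf_p = np.array([p[x <= s].sum() for s in support])
    cdf_q = np.array([q[y <= s].sum() for s in support])
    # Each cdf gap persists until the next support point
    return float(np.sum(np.abs(cdf_p - cdf_q)[:-1] * np.diff(support)))

# One unit of dirt moved from x1 = 0 to x2 = 2: EMD = |x1 - x2|
emd_single = emd_1d([0.0], [1.0], [2.0], [1.0])

# Masses 1/6 at x1 = 0 and 1/6 at x2 = 1 are moved to x3 = 3, where 4/6 already sits:
# EMD = 1/6 |x1 - x3| + 1/6 |x2 - x3|
emd_partial = emd_1d([0.0, 1.0, 3.0], [1 / 6, 1 / 6, 4 / 6], [3.0], [1.0])
```

In the second case only the mass that is not already in place contributes to the cost, which is why the weights are 1/6 rather than the full mass.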
  • 29.
    Which metric? (Regularized) Optimal Transport
    Its geometry has good properties in general [1], and for copulas [2].
    Figure: four copula density panels over $[0, 1]^2$, followed by the panels "Bregman barycenter copula" and "Wasserstein barycenter copula".
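Entropy-regularized optimal transport in the sense of [1] can be sketched with plain Sinkhorn iterations. This is a minimal NumPy illustration under assumed settings (the function name `sinkhorn`, the regularization strength `reg`, a fixed iteration count, and strictly positive 1-D histograms); it is not the code behind the figures.

```python
import numpy as np

def sinkhorn(a, b, M, reg=0.1, n_iter=500):
    """Regularized transport cost between histograms a and b for ground cost matrix M.

    Returns the cost <P, M> of the regularized plan P, and P itself.
    """
    K = np.exp(-M / reg)        # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):     # alternate scaling to match the two marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return float(np.sum(P * M)), P

x = np.linspace(0.0, 1.0, 5)
M = np.abs(x[:, None] - x[None, :])   # ground cost |x_k - x_l|
a = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
b = a[::-1].copy()                    # mirrored histogram

cost_ab, plan_ab = sinkhorn(a, b, M)
cost_aa, _ = sinkhorn(a, a, M)
```

The regularized cost of a histogram to itself is small but not exactly zero, a known side effect of the entropic smoothing; for copulas the same iterations would apply to 2-D histograms flattened into vectors, with a ground cost between grid cells.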
  • 30.
    A metric space for copulas
  • 32.
    A metric space for copulas
  • 33.
    A metric space for copulas
  • 34.
    A metric space for copulas
  • 34.
    Outline (section 3: A metric space for copulas)
  • 35.
    The Target/Forget Dependence Coefficient (TFDC)
  • 36.
    The Target/Forget Dependence Coefficient (TFDC)
    Now, we can define our bespoke dependence coefficient:
    Build the forget-dependence copulas $\{C^F_l\}_l$
    Build the target-dependence copulas $\{C^T_k\}_k$
    Compute the empirical copula $C_{ij}$ from $x_i, x_j$
    $\mathrm{TFDC}(C_{ij}) = \frac{\min_l D(C^F_l, C_{ij})}{\min_l D(C^F_l, C_{ij}) + \min_k D(C_{ij}, C^T_k)}$
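The ratio above can be sketched in a few lines. The distance $D$ is passed in as a function; the Frobenius norm used here is only a stand-in assumption to keep the example short (the talk uses an optimal transport distance), and the discretized forget/target copulas are illustrative choices.

```python
import numpy as np

def tfdc(c_ij, forget, target, dist):
    """TFDC of copula c_ij given lists of forget/target copulas and a distance function."""
    d_forget = min(dist(c_f, c_ij) for c_f in forget)
    d_target = min(dist(c_ij, c_t) for c_t in target)
    # Assumes c_ij cannot be at distance 0 from both sets simultaneously
    return d_forget / (d_forget + d_target)

def frobenius(c1, c2):
    """Stand-in distance (assumption); replace with an optimal transport distance."""
    return float(np.linalg.norm(c1 - c2))

n = 10
independence = np.full((n, n), 1.0 / n**2)   # forget: no dependence
comonotone = np.eye(n) / n                   # target: perfect positive dependence
countermonotone = np.fliplr(np.eye(n)) / n   # target: perfect negative dependence

forget = [independence]
target = [comonotone, countermonotone]

score_dependent = tfdc(comonotone, forget, target, frobenius)
score_independent = tfdc(independence, forget, target, frobenius)
```

A copula that coincides with a target copula scores 1, and one that coincides with a forget copula scores 0; everything else falls in between according to its relative distance to the two sets.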
  • 37.
    TFDC Power
    Figure: Power of several dependence coefficients (cor, dCor, MIC, ACE, MMD, CMMD, RDC, TFDC) as a function of the noise level in eight different scenarios. Insets show the noise-free form of each association pattern. The coefficient power was estimated via 500 simulations with sample size 500 each. Axes: Noise Level (horizontal), Power (vertical).
  • 38.
    Outline (section 4: Applications)
  • 39.
    Outline (section 4: Applications)
  • 40.
    Clustering of empirical copulas
  • 41.
    Financial correlations - Stocks CAC 40
    Figure: Stocks: More mass in the bottom-left corner, i.e. lower tail dependence. Stock prices tend to plummet together.
  • 42.
    Financial correlations - Credit Default Swaps
    Figure: Credit default swaps: More mass in the top-right corner, i.e. upper tail dependence. The cost of insurance against entities' default tends to soar in stressed markets.
  • 43.
    Financial correlations - FX rates
    Figure: FX rates: Empirical copulas show that the dependence patterns between FX rates are varied. For example, rates may exhibit either strong dependence or independence while being anti-correlated during extreme events.
  • 44.
    Associations between features in UCI datasets
    Dependence patterns (= clustering centroids) found between features in UCI datasets.
    Figure: centroid copulas over $[0, 1]^2$ for the datasets Breast Cancer (wdbc), Libras Movement, Parkinsons, and Gamma Telescope.
  • 45.
    Outline (section 4: Applications)
  • 46.
    The Art of formulating questions about correlations
    Encode your dependence hypothesis as a copula, and your query as a "k-NN search".
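Such a query can be sketched as a brute-force nearest-neighbour search over all variable pairs. Everything named here is hypothetical: `empirical_copula` and `knn_pairs` are illustrative helpers, the toy dataset is synthetic, and the Euclidean distance between histograms is a stand-in assumption for the optimal transport distance discussed in the talk.

```python
import itertools
import numpy as np

def empirical_copula(x, y, n_bins=10):
    """Rank-based histogram estimate of the copula of (x, y)."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 0.5) / n
    v = (np.argsort(np.argsort(y)) + 0.5) / n
    hist, _, _ = np.histogram2d(u, v, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / n

def knn_pairs(data, target, k=1, n_bins=10):
    """Return the k column pairs whose empirical copula is closest to `target`."""
    scored = []
    for i, j in itertools.combinations(range(data.shape[1]), 2):
        c = empirical_copula(data[:, i], data[:, j], n_bins)
        scored.append((float(np.linalg.norm(c - target)), (i, j)))  # stand-in distance
    scored.sort()
    return [pair for _, pair in scored[:k]]

rng = np.random.default_rng(0)
z = rng.normal(size=500)
# Columns: z, an increasing transform of z, independent noise, a decreasing transform of z
data = np.column_stack([z, np.exp(z), rng.normal(size=500), -z])

# Hypothesis encoded as a copula: perfect positive dependence
target = np.eye(10) / 10
nearest = knn_pairs(data, target, k=1)
```

With this toy data the query returns the pair of columns 0 and 1, the only pair that is perfectly positively dependent; changing `target` to another copula changes the question being asked of the dataset.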
  • 47.
    Outline (section 5: Conclusion)
  • 48.
    Summary
    Designing data-driven tailored correlation coefficients
  • 49.
    Take Home Message
  • 50.
    Internships at Hellebore
    If you are interested in an internship at Hellebore:
    in applied machine learning for Finance (NLP, Text Classification, Information Extraction), please contact: stage@helleboretech.com
    in ML/Finance research (copulas, Bayesian inference, clustering, time series analysis), please contact: gmarti@helleborecapital.com
  • 51.
    [1] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
    [2] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat. Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series. In IEEE Statistical Signal Processing Workshop, SSP 2016, Palma de Mallorca, Spain, June 26-29, 2016, pages 1–5, 2016.
    [3] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Université Paris 8, 1959.