A closer look at correlations

You may have read many times that the job of a data scientist is to sift through huge amounts of data in search of correlations between variables of interest, and that one of the data scientist's worst enemies (besides "correlation does not imply causation") is spurious correlation. But what really is correlation? Are there several types of correlation? Some "good", some "bad"? What about their estimation? This talk is a very visual presentation of the notions of correlation and dependence. I will first illustrate how the standard linear correlation is estimated (the Pearson coefficient), then present a more robust alternative: the Spearman coefficient. Building on a geometric understanding of their nature, I will present a generalization that can help data scientists explore, interpret, and measure the dependence (not necessarily linear or comonotonic) between the variables of a given dataset. Financial time series (stocks, credit default swaps, FX rates) and features from the UCI datasets are considered as use cases.

HELLEBORECAPITAL
Introduction
Standard correlation coefficients
A metric space for copulas
Applications
Conclusion
A closer look at correlations
Paris Machine Learning Meetup #3 Season 4
G. Marti, S. Andler, F. Nielsen, P. Donnat
HELLEBORECAPITAL
November 9, 2016
Gautier Marti A closer look at correlations
1 Introduction
2 Standard correlation coefficients
Pearson correlation coefficient
Spearman correlation coefficient
3 A metric space for copulas
On the importance of the normalization
Which metric? (Regularized) Optimal Transport
A customizable dependence coefficient: TFDC
4 Applications
Explore the correlations with clustering
Query your dataset about correlations with TFDC
5 Conclusion
What is correlation?
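$$\frac{E[X_i X_j] - E[X_i]\,E[X_j]}{\sqrt{\left(E[X_i^2] - E[X_i]^2\right)\left(E[X_j^2] - E[X_j]^2\right)}} \in [-1, 1]$$

$$\frac{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{N} (x_{jk} - \bar{x}_j)^2}} \in [-1, 1]$$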
import numpy as np
np.corrcoef(x_i,x_j)
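
As a minimal sketch of how the sample formula on this slide relates to the NumPy call, the estimator can be written out term by term and compared with `np.corrcoef`. Note that `np.corrcoef(x_i, x_j)` returns a 2x2 correlation matrix, so the coefficient itself is the off-diagonal entry. The helper name `pearson` and the synthetic data are assumptions made for illustration only; they are not from the slides.

```python
import numpy as np

def pearson(x_i, x_j):
    """Sample Pearson coefficient, written directly from the sum formula."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    d_i = x_i - x_i.mean()  # deviations from the sample mean of x_i
    d_j = x_j - x_j.mean()  # deviations from the sample mean of x_j
    return np.sum(d_i * d_j) / np.sqrt(np.sum(d_i ** 2) * np.sum(d_j ** 2))

# Hypothetical synthetic data: two linearly related Gaussian variables.
rng = np.random.default_rng(0)
x_i = rng.normal(size=500)
x_j = 0.6 * x_i + 0.8 * rng.normal(size=500)

rho_manual = pearson(x_i, x_j)
rho_numpy = np.corrcoef(x_i, x_j)[0, 1]  # off-diagonal entry of the 2x2 matrix
print(rho_manual, rho_numpy)
```

The two values should agree up to floating-point error, since `np.corrcoef` normalizes the covariance by the same product of standard deviations.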
Pearson correlation
A closer look at correlations

  • 1. HELLEBORECAPITAL Introduction Standard correlation coefficients A metric space for copulas Applications Conclusion A closer look at correlations Paris Machine Learning Meetup #3 Season 4 G. Marti, S. Andler, F. Nielsen, P. Donnat HELLEBORECAPITAL November 9, 2016 Gautier Marti A closer look at correlations
  • 2. HELLEBORECAPITAL Introduction Standard correlation coefficients A metric space for copulas Applications Conclusion 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 3. HELLEBORECAPITAL Introduction Standard correlation coefficients A metric space for copulas Applications Conclusion What is correlation? E[Xi Xj ] − E[Xi ]E[Xj ] (E[X2 i ] − E[Xi ]2)(E[X2 j ] − E[Xj ]2) ∈ [−1, 1] N k=1(xik − xi )(xjk − xj ) N k=1(xik − xi )2 N k=1(xjk − xj )2 ∈ [−1, 1] import numpy as np np.corrcoef(x_i,x_j) Gautier Marti A closer look at correlations
  • 4. HELLEBORECAPITAL Introduction Standard correlation coefficients A metric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 5. HELLEBORECAPITAL Introduction Standard correlation coefficients A metric space for copulas Applications Conclusion Pearson correlation coefficient Spearman correlation coefficient 1 Introduction 2 Standard correlation coefficients Pearson correlation coefficient Spearman correlation coefficient 3 A metric space for copulas On the importance of the normalization Which metric? (Regularized) Optimal Transport A customizable dependence coefficient: TFDC 4 Applications Explore the correlations with clustering Query your dataset about correlations with TFDC 5 Conclusion Gautier Marti A closer look at correlations
  • 6. Pearson correlation
  • 7. Pearson correlation
  • 8. Pearson correlation
  • 9. Pearson correlation
  • 10. Pearson correlation
  • 11. Pearson correlation
  • 12. Pearson correlation with outliers
  • 14. Spearman correlation: Pearson on ranks
  • 15. Spearman correlation: Pearson on ranks
  • 16. Spearman correlation: Pearson on ranks
  • 17. Spearman correlation: Pearson on ranks
  • 18. Spearman correlation: Pearson on ranks
  • 19. Spearman correlation: Pearson on ranks
  • 20. Spearman correlation with outliers
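The "Pearson on ranks" idea, and its behaviour with outliers, can be sketched in a few lines. The helper names `ranks` and `spearman`, the synthetic data, and the single corrupted observation are all assumptions made for illustration; ties are not handled.

```python
import numpy as np

def ranks(x):
    """Ranks 1..N of the values in x (assumes no ties)."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman(x_i, x_j):
    """Spearman correlation computed as Pearson correlation on ranks."""
    return np.corrcoef(ranks(x_i), ranks(x_j))[0, 1]

rng = np.random.default_rng(0)  # synthetic data, for illustration only
x_i = rng.normal(size=200)
x_j = x_i + 0.1 * rng.normal(size=200)

# Corrupt a single observation with an extreme value.
x_j_out = x_j.copy()
x_j_out[0] = 1e4

pearson_clean = np.corrcoef(x_i, x_j)[0, 1]
pearson_out = np.corrcoef(x_i, x_j_out)[0, 1]
spearman_clean = spearman(x_i, x_j)
spearman_out = spearman(x_i, x_j_out)
```

A single extreme value dominates the sums in the Pearson formula, whereas in rank space it only moves one observation to the top rank, so the Spearman estimate changes little.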
  • 23. From ranks to empirical copula. Sklar's Theorem [3]: for $(X_i, X_j)$ having continuous marginal cdfs $F_{X_i}, F_{X_j}$, its joint cumulative distribution $F$ is uniquely expressed as $F(X_i, X_j) = C(F_{X_i}(X_i), F_{X_j}(X_j))$, where $C$ is known as the copula of $(X_i, X_j)$.
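One way to go from ranks to an empirical copula is to rescale the ranks to the unit interval and bin them on a grid. This is a minimal sketch under stated assumptions: the function name `empirical_copula_density`, the `(rank - 0.5) / N` convention, and the 10x10 grid are illustrative choices, not taken from the talk.

```python
import numpy as np

def empirical_copula_density(x_i, x_j, n_bins=10):
    """Discretized empirical copula: 2-D histogram of normalized ranks."""
    n = len(x_i)
    # Rank-transform each variable into (0, 1): an estimate of F_X(X).
    # argsort of argsort gives 0-based ranks (assumes no ties).
    u_i = (np.argsort(np.argsort(x_i)) + 0.5) / n
    u_j = (np.argsort(np.argsort(x_j)) + 0.5) / n
    hist, _, _ = np.histogram2d(u_i, u_j, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / n  # cells sum to 1

rng = np.random.default_rng(0)  # synthetic data, for illustration only
x_i = rng.normal(size=1000)
x_j = np.exp(x_i)  # increasing transform: same ranks as x_i

c = empirical_copula_density(x_i, x_j)
```

Because only ranks enter the computation, any increasing transformation of a margin leaves the result unchanged; here all the mass of `c` sits on the diagonal even though `x_j` is a nonlinear function of `x_i`.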
  • 24. Minimum, Independence, Maximum copulas. Fréchet–Hoeffding copula bounds: for any copula $C : [0, 1]^2 \to [0, 1]$ and any $(u, v) \in [0, 1]^2$, the following bounds hold: $W(u, v) \le C(u, v) \le M(u, v)$, where $W$ is the copula for counter-monotonic random variables, and $M$ is the copula for co-monotonic random variables. [Figure: six panels over $(u_i, u_j) \in [0, 1]^2$, titled $w(u_i, u_j)$, $W(u_i, u_j)$, $\pi(u_i, u_j)$, $\Pi(u_i, u_j)$, $m(u_i, u_j)$, $M(u_i, u_j)$.]
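The slide does not spell out the three copulas; the sketch below uses their standard closed forms, $W(u, v) = \max(u + v - 1, 0)$, $\Pi(u, v) = uv$ and $M(u, v) = \min(u, v)$, and checks the bounds numerically for the independence copula on a grid. The grid resolution is an arbitrary choice.

```python
import numpy as np

def W(u, v):
    """Lower Frechet-Hoeffding bound (counter-monotonic variables)."""
    return np.maximum(u + v - 1.0, 0.0)

def Pi(u, v):
    """Independence copula."""
    return u * v

def M(u, v):
    """Upper Frechet-Hoeffding bound (co-monotonic variables)."""
    return np.minimum(u, v)

grid = np.linspace(0.0, 1.0, 101)
u, v = np.meshgrid(grid, grid)

tol = 1e-12  # tolerance for floating-point rounding
bounds_hold = bool(
    np.all(W(u, v) <= Pi(u, v) + tol) and np.all(Pi(u, v) <= M(u, v) + tol)
)
```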
  • 26. A metric space for copulas
  • 27. A metric space for copulas
  • 28. Which metric? (Regularized) Optimal Transport. The distance is the minimum cost of transportation needed to transform one pile of dirt into another, i.e. the amount of dirt moved times the distance by which it is moved. Examples: $\mathrm{EMD} = |x_1 - x_2|$; $\mathrm{EMD} = \frac{1}{6}|x_1 - x_3| + \frac{1}{6}|x_2 - x_3|$.
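The "dirt moved times distance" definition has a simple closed form in one dimension, which makes the idea easy to test. The sketch below is only that 1-D special case, with a hypothetical helper `emd_1d`; the talk itself works with (regularized) optimal transport between two-dimensional copulas, which this does not implement.

```python
import numpy as np

def emd_1d(a, b, positions):
    """Earth mover's distance between two 1-D histograms a and b
    (same total mass) whose bins sit at sorted `positions`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Net mass that has to cross each gap between consecutive positions.
    flow = np.cumsum(a - b)[:-1]
    return float(np.sum(np.abs(flow) * np.diff(positions)))

positions = [0.0, 1.0, 3.0]  # hypothetical locations x1, x2, x3

# One unit of dirt moved from x1 to x3: cost |x1 - x3|.
d_single = emd_1d([1, 0, 0], [0, 0, 1], positions)

# Half a unit at x1 and half at x2, both moved to x3:
# cost 0.5 * |x1 - x3| + 0.5 * |x2 - x3|.
d_split = emd_1d([0.5, 0.5, 0], [0, 0, 1], positions)
```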
  • 29. Which metric? (Regularized) Optimal Transport. Its geometry has good properties in general [1], and for copulas [2]. [Figure: four untitled copula panels on $[0, 1]^2$, followed by two panels titled "Bregman barycenter copula" and "Wasserstein barycenter copula".]
  • 30. A metric space for copulas
  • 31. A metric space for copulas
  • 32. A metric space for copulas
  • 33. A metric space for copulas
  • 35. The Target/Forget Dependence Coefficient (TFDC)
  • 36. The Target/Forget Dependence Coefficient (TFDC). Now, we can define our bespoke dependence coefficient: build the forget-dependence copulas $\{C^F_l\}_l$; build the target-dependence copulas $\{C^T_k\}_k$; compute the empirical copula $C_{ij}$ from $x_i, x_j$; then $\mathrm{TFDC}(C_{ij}) = \frac{\min_l D(C^F_l, C_{ij})}{\min_l D(C^F_l, C_{ij}) + \min_k D(C_{ij}, C^T_k)}$.
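The ratio above is straightforward to compute once the copulas are discretized on a common grid. In the sketch below, the function names and the 4x4 toy copulas are hypothetical, and the default distance is a plain L1 distance used as a stand-in for $D$; the talk uses an optimal transport distance, which can be passed in through `dist`.

```python
import numpy as np

def l1_distance(p, q):
    """Stand-in for D (assumption): L1 distance between discretized copulas."""
    return float(np.abs(np.asarray(p) - np.asarray(q)).sum())

def tfdc(c_ij, forget_copulas, target_copulas, dist=l1_distance):
    """Target/Forget Dependence Coefficient of the copula c_ij."""
    d_forget = min(dist(c_f, c_ij) for c_f in forget_copulas)
    d_target = min(dist(c_ij, c_t) for c_t in target_copulas)
    # Undefined (0/0) if c_ij coincides with both a forget and a target copula.
    return d_forget / (d_forget + d_target)

n = 4  # toy grid size, for illustration only
independence = np.full((n, n), 1.0 / n**2)
comonotone = np.eye(n) / n
countermonotone = np.fliplr(np.eye(n)) / n

forget = [independence]                 # dependence to ignore
target = [comonotone, countermonotone]  # dependence to detect

score_indep = tfdc(independence, forget, target)
score_como = tfdc(comonotone, forget, target)
score_mix = tfdc(0.5 * independence + 0.5 * comonotone, forget, target)
```

With this choice of forget and target sets the coefficient is 0 at independence, 1 for a perfectly monotone relationship, and in between for mixtures.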
  • 37. TFDC Power. Figure: Power of several dependence coefficients (cor, dCor, MIC, ACE, MMD, CMMD, RDC, TFDC) as a function of the noise level in eight different scenarios. Insets show the noise-free form of each association pattern. The coefficient power was estimated via 500 simulations with sample size 500 each. [Axes: Noise Level (0 to 100) against Power (0.0 to 1.0).]
  • 40. Clustering of empirical copulas
  • 41. Financial correlations - Stocks CAC 40. Figure: Stocks: More mass in the bottom-left corner, i.e. lower tail dependence. Stock prices tend to plummet together.
  • 42. Financial correlations - Credit Default Swaps. Figure: Credit default swaps: More mass in the top-right corner, i.e. upper tail dependence. The cost of insurance against entities' default tends to soar in stressed markets.
  • 43. Financial correlations - FX rates. Figure: FX rates: Empirical copulas show that the dependence between FX rates takes various forms. For example, rates may exhibit either strong dependence or independence while being anti-correlated during extreme events.
  • 44. Associations between features in UCI datasets. Dependence patterns (= clustering centroids) found between features in UCI datasets. [Figure: centroid copulas on $[0, 1]^2$ for Breast Cancer (wdbc), Libras Movement, Parkinsons, and Gamma Telescope.]
  • 46. The Art of formulating questions about correlations. Encode your dependence hypothesis as a copula, and your query as a “k-NN search”.
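A query of this kind can be sketched as a nearest-neighbour lookup over the empirical copulas of all variable pairs. Everything named below is hypothetical: the function `knn_query`, the pair labels, and the toy 4x4 copulas. The default L1 distance is a stand-in for the optimal transport distance used in the talk.

```python
import numpy as np

def knn_query(target_copula, copulas, k=3, dist=None):
    """Return the names of the k copulas closest to the target copula.

    `copulas` maps a pair label to its discretized empirical copula.
    """
    if dist is None:
        # Stand-in distance (assumption): L1 between discretized copulas.
        dist = lambda p, q: float(np.abs(p - q).sum())
    ranked = sorted(copulas, key=lambda name: dist(copulas[name], target_copula))
    return ranked[:k]

n = 4  # toy grid size, for illustration only
copulas = {
    "pair_a": np.eye(n) / n,               # co-monotonic pair
    "pair_b": np.full((n, n), 1.0 / n**2), # independent pair
    "pair_c": np.fliplr(np.eye(n)) / n,    # counter-monotonic pair
}

# Hypothesis: "which pairs move together?" encoded as the co-monotonic copula.
hypothesis = np.eye(n) / n
nearest = knn_query(hypothesis, copulas, k=2)
```

Changing the hypothesis copula, for instance to one with mass concentrated in a single corner, turns the same lookup into a search for tail dependence.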
  • 48. Summary: Designing data-driven tailored correlation coefficients
  • 49. Take Home Message
  • 50. Internships at Hellebore. If you are interested in an internship at Hellebore: in applied machine learning for Finance (NLP, Text Classification, Information Extraction), please contact stage@helleboretech.com; in ML/Finance research (copulas, Bayesian inference, clustering, time series analysis), please contact gmarti@helleborecapital.com.
  • 51. References. [1] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013. [2] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat. Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series. In IEEE Statistical Signal Processing Workshop, SSP 2016, Palma de Mallorca, Spain, June 26-29, 2016, pages 1–5, 2016. [3] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Université Paris 8, 1959.