This document discusses clustering financial time series data using distances between dependent random variables. It notes that traditional clustering based only on correlation can lead to spurious clusters, as correlation does not fully capture dependence. The paper proposes a distance measure that combines information about both the correlation and distribution of random variables. It tests this distance measure on synthetic data from a hierarchical block model and real credit default swap market data, finding it performs better than distances based only on correlation or distribution individually. Some open questions are also discussed, such as how to select the optimal weighting of correlation vs distribution information.
Clustering Financial Time Series using their Correlations and their Distribut... - Gautier Marti
This document discusses methods for clustering random walks. It introduces the GNPR (Generic Non-Parametric Representation) method for defining a distance between two random walks that separates dependence and distribution information. The GNPR method is shown to outperform standard approaches on synthetic datasets containing different clusters based on distribution and dependence. The GNPR method is also used to cluster credit default swaps, identifying a cluster of "Western sovereigns". The document concludes that GNPR is an effective way to deal with dependence and distribution information separately without losing information.
Optimal Transport between Copulas for Clustering Time Series - Gautier Marti
Presentation slides of our ICASSP 2016 conference paper in Shanghai. They describe the motivation and design of the Target Dependence Coefficient, a coefficient which can target or forget specific dependence relationships between the variables. This coefficient can be useful for clustering financial time series. Several such use cases are described on our Tech Blog https://www.datagrapple.com/Tech/optimal-copula-transport.html
Some contributions to the clustering of financial time series - Applications ... - Gautier Marti
This document discusses contributions to clustering financial time series, specifically credit default swap data. It introduces credit default swaps and the raw data set. It then discusses challenges in clustering financial time series due to non-stationarity and noisy correlations. It presents initial work on analyzing the consistency of clustering as the sample size increases, through simulations in a simplified setting. Finally, it proposes a two-step approach to proving consistency, by first identifying geometrical configurations that lead to the true clustering structure.
Clustering CDS: algorithms, distances, stability and convergence rates - Gautier Marti
Talk given at CMStatistics 2016 (http://cmstatistics.org/CMStatistics2016/).
The standard methodology for clustering financial time series is quite brittle to outliers and heavy tails, for several reasons: Single Linkage / MST suffers from the chaining phenomenon, and the Pearson correlation coefficient is only well suited to Gaussian distributions, which financial returns (especially credit derivatives) usually are not. At Hellebore Capital Ltd, we strive to improve the methodology and to put it on firmer ground. We think that stability is a paramount property to verify, and it is closely linked to the statistical convergence rates of the methodologies (combinations of clustering algorithms and dependence estimators). This gives us a model selection criterion: the best clustering methodology is the one that reaches a given 'accuracy' with the minimum sample size.
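The sample-size criterion can be sketched on synthetic data: below, a hypothetical two-block factor model is clustered from its correlation matrix, and the recovered partition is compared to the ground truth for a short and a long sample. All choices here (block size, intra-block correlation, average linkage) are illustrative assumptions, not the methodology of the talk.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def simulate_returns(T, rho=0.8, n_per_cluster=5, seed=0):
    """Two blocks of assets; within a block, returns share a common factor."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(2):
        factor = rng.normal(size=T)
        noise = rng.normal(size=(T, n_per_cluster))
        blocks.append(np.sqrt(rho) * factor[:, None] + np.sqrt(1 - rho) * noise)
    return np.hstack(blocks)

def cluster_accuracy(T):
    """Cluster from the sample correlation matrix; score against ground truth."""
    X = simulate_returns(T)
    corr = np.corrcoef(X, rowvar=False)
    dist = np.sqrt(np.maximum(2.0 * (1.0 - corr), 0.0))  # correlation distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method='average')
    labels = fcluster(Z, t=2, criterion='maxclust')
    truth = np.array([1] * 5 + [2] * 5)
    return max(np.mean(labels == truth), np.mean(labels == 3 - truth))

acc_short, acc_long = cluster_accuracy(10), cluster_accuracy(250)
print(acc_short, acc_long)  # longer samples recover the true blocks reliably
```

Sweeping T and recording the smallest sample size at which accuracy stays high is one way to operationalize the "minimum sample size for a given accuracy" criterion.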
Optimal Transport vs. Fisher-Rao distance between Copulas - Gautier Marti
How can we compare two dependence structures (represented by copulas)? It depends on the task. For clustering variables with similar dependence, prefer Optimal Transport. For detecting change points in a dynamical dependence structure, prefer Fisher-Rao and its associated f-divergences (for example, an approach à la Frédéric Barbaresco in radar signal processing). This study illustrates these properties with bivariate Gaussian copulas.
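For the bivariate Gaussian case studied here, the optimal transport side has a closed form: the 2-Wasserstein distance between two centered Gaussians depends only on their covariance (here, correlation) matrices. The sketch below uses that formula as a proxy for the distance between two Gaussian copulas parametrized by their correlation rho; treating a Gaussian copula through its underlying correlation matrix is a simplifying assumption of this illustration.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(C1, C2):
    """2-Wasserstein distance between centered Gaussians N(0, C1) and N(0, C2):
    W2^2 = tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2})."""
    s2 = sqrtm(C2)
    cross = sqrtm(s2 @ C1 @ s2)
    return float(np.sqrt(max(np.trace(C1 + C2 - 2 * cross).real, 0.0)))

def cov(rho):
    """Correlation matrix of a bivariate Gaussian copula with parameter rho."""
    return np.array([[1.0, rho], [rho, 1.0]])

d_same = gaussian_w2(cov(0.2), cov(0.2))
d_small = gaussian_w2(cov(0.0), cov(0.5))
d_large = gaussian_w2(cov(0.0), cov(0.9))
print(d_same, d_small, d_large)  # 0 for identical copulas; grows with the gap in rho
```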
On the stability of clustering financial time series - Gautier Marti
Talk at IEEE ICMLA 2015 Miami
In this presentation, we suggest data perturbations that can help validate or reject a clustering methodology, besides yielding insights on the time series at hand. We show that Pearson correlation is not well suited for clustering these time series, since it yields unstable clusters; a more robust measure such as the Spearman correlation, based on rank statistics, is preferable.
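To illustrate the robustness argument, the sketch below compares Pearson and Spearman on a synthetic dependent pair before and after injecting a single heavy-tailed outlier; the data and the outlier are illustrative, not the CDS series from the talk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = x + 0.1 * rng.normal(size=n)          # strongly dependent pair

# Pearson measures linear association on raw values; Spearman is simply
# Pearson applied to the rank-transformed values.
pearson_clean, _ = stats.pearsonr(x, y)
spearman_clean, _ = stats.spearmanr(x, y)

# Inject a single heavy-tailed outlier: Spearman barely moves,
# while Pearson can even flip sign.
x[0], y[0] = 50.0, -50.0
pearson_out, _ = stats.pearsonr(x, y)
spearman_out, _ = stats.spearmanr(x, y)
print(pearson_clean, pearson_out)    # near 1, then badly distorted
print(spearman_clean, spearman_out)  # both stay close to 1
```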
Clustering Financial Time Series: How Long is Enough? - Gautier Marti
IJCAI-16, New York, conference presentation of paper http://www.ijcai.org/Proceedings/16/Papers/367.pdf
Researchers have used from 30 days to several years of daily returns as source data for clustering financial time series based on their correlations. This paper sets up a statistical framework to study the validity of such practices. We first show that clustering correlated random variables from their observed values is statistically consistent. Then, we also give a first empirical answer to the much debated question: How long should the time series be? If too short, the clusters found can be spurious; if too long, dynamics can be smoothed out.
This document discusses clustering random walk time series. It introduces the concept of clustering and discusses challenges in clustering time series, such as how to define a distance between dependent random variables. It proposes using the copula transform and empirical copula transform to map time series to uniform distributions before calculating distances. The hierarchical block model for nested data partitions is presented. Experimental results show the proposed distances perform well in recovering the nested partitions on both synthetic hierarchical block model data and real credit default swap time series data. Consistency of clustering algorithms is defined in relation to recovering the hierarchical block model partitions.
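The empirical copula transform mentioned above amounts to replacing each observation by its normalized rank, the empirical analogue of applying the true CDF. A minimal sketch (using ranks divided by n + 1, one common convention) is:

```python
import numpy as np

def empirical_copula_transform(x):
    """Map observations to (0, 1) via their normalized ranks,
    the empirical analogue of applying the true CDF."""
    x = np.asarray(x)
    ranks = np.argsort(np.argsort(x)) + 1   # ranks 1..n, ties broken arbitrarily
    return ranks / (len(x) + 1.0)

rng = np.random.default_rng(1)
u = empirical_copula_transform(rng.lognormal(size=1000))
# Regardless of the original (here heavy-tailed) distribution,
# the transformed sample is approximately uniform on (0, 1).
print(u.min(), u.mean(), u.max())
```

Applying this transform to each time series before computing distances is what separates the dependence information (the copula) from the marginal distribution information.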
A review of two decades of correlations, hierarchies, networks and clustering... - Gautier Marti
Opinionated review of two decades of correlations, hierarchies, networks and clustering in financial markets, presented at Ton Duc Thang University in Ho Chi Minh City, Vietnam.
You may have already read many times that the job of a Data Scientist is to skim through huge amounts of data searching for correlations between variables of interest, and that one of their worst enemies (besides "correlation does not imply causation") is spurious correlation. But what really is correlation? Are there several types of correlations, some "good", some "bad"? What about their estimation? This talk will be a very visual presentation around the notions of correlation and dependence. I will first illustrate how the standard linear correlation is estimated (the Pearson coefficient), then a more robust alternative: the Spearman coefficient. Building on a geometric understanding of their nature, I will present a generalization that can help Data Scientists explore, interpret, and measure the dependence (not necessarily linear or comonotonic) between the variables of a given dataset. Financial time series (stocks, credit default swaps, FX rates) and features from the UCI datasets are considered as use cases.
Autoregressive Convolutional Neural Networks for Asynchronous Time Series - Gautier Marti
In this talk, we present a CNN architecture for predicting autoregressive asynchronous time series. We illustrate its application to predicting traders’ quotes of credit default swaps (a proprietary dataset from Hellebore Capital) and to artificial time series. The paper is available here: http://proceedings.mlr.press/v80/binkowski18a/binkowski18a.pdf
Using Vector Clocks to Visualize Communication Flow - Martin Harrigan
The document describes a methodology for visualizing communication flow using vector clocks. It begins with an introduction discussing communication flow as a metaphor and the need to ground visualizations in the analyzed substance. It then outlines the methodology which involves (1) assigning each vertex a vector clock representing its position in a high-dimensional space, (2) updating vector clocks based on communications, (3) computing distances between vector clocks, (4) constructing a dissimilarity matrix, and (5) using multidimensional scaling to produce a 2D visualization. Experiments applying the methodology to artificial and real communication datasets are discussed. The conclusion covers future work opportunities around the distance metric, modeling additional properties, and scalability issues.
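Steps (1) to (5) of the methodology can be sketched end to end. The vector-clock update below is the standard Fidge/Mattern rule and the embedding is classical (Torgerson) MDS via an eigendecomposition; both are plausible but assumed implementations, run on a toy communication log rather than the paper's datasets.

```python
import numpy as np

n = 4                          # four communicating vertices
clocks = np.zeros((n, n))      # step (1): one vector clock per vertex

def communicate(sender, receiver):
    """Step (2): sender ticks its own component; receiver merges
    component-wise max, then ticks its own component."""
    clocks[sender, sender] += 1
    clocks[receiver] = np.maximum(clocks[receiver], clocks[sender])
    clocks[receiver, receiver] += 1

for s, r in [(0, 1), (1, 2), (2, 3), (0, 2), (3, 0)]:
    communicate(s, r)

# Steps (3)-(4): pairwise L1 distances between clocks -> dissimilarity matrix.
D = np.abs(clocks[:, None, :] - clocks[None, :, :]).sum(axis=-1)

# Step (5): classical MDS -- double-center the squared distances,
# keep the top-2 eigenpairs as 2D coordinates.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigval, eigvec = np.linalg.eigh(B)
coords = eigvec[:, -2:] * np.sqrt(np.maximum(eigval[-2:], 0.0))
print(coords.shape)  # (4, 2) embedding, ready for plotting
```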
Approximate Bayesian computation (ABC) is a computational technique for Bayesian inference when the likelihood function is intractable or impossible to compute directly. ABC approximates the likelihood by simulating data under different parameter values and comparing simulated and observed data using summary statistics. ABC produces a parameter sample without evaluating the full likelihood function, thus allowing Bayesian inference when likelihoods are unavailable or difficult to compute.
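The simulate-and-compare loop described above can be written as a minimal rejection-ABC sketch for inferring the mean of a Gaussian; the prior, summary statistic, and tolerance below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Observed" data from a Gaussian with unknown mean (true mean 3.0).
observed = rng.normal(loc=3.0, scale=1.0, size=200)
s_obs = observed.mean()                          # summary statistic

def abc_rejection(n_draws=20000, eps=0.05):
    """Rejection ABC: draw theta from the prior, simulate a dataset,
    keep theta when the simulated summary is within eps of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-10.0, 10.0)         # prior draw
        simulated = rng.normal(theta, 1.0, size=200)
        if abs(simulated.mean() - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)

posterior = abc_rejection()
print(len(posterior), posterior.mean())  # accepted draws concentrate near 3.0
```

The likelihood is never evaluated; the accepted parameter values form an approximate posterior sample, which is the point of the method.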
This document discusses computational issues that arise in Bayesian statistics. It provides examples of latent variable models like mixture models that make computation difficult due to the large number of terms that must be calculated. It also discusses time series models like the AR(p) and MA(q) models, noting that they have complex parameter spaces due to stationarity constraints. The document outlines the Metropolis-Hastings algorithm, Gibbs sampler, and other methods like Population Monte Carlo and Approximate Bayesian Computation that can help address these computational challenges.
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time... - SYRTO Project
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time Series Models. Andre Lucas. Amsterdam - June, 25 2015. European Financial Management Association 2015 Annual Meetings.
Network and risk spillovers: a multivariate GARCH perspective - SYRTO Project
M. Billio, M. Caporin, L. Frattarolo, L. Pelizzon: “Network and risk spillovers: a multivariate GARCH perspective”.
Final SYRTO Conference - Université Paris1 Panthéon-Sorbonne
February 19, 2016
Numerical smoothing and hierarchical approximations for efficient option pric... - Chiheb Ben Hammouda
1. The document presents a numerical smoothing technique to improve the efficiency of option pricing and density estimation when analytic smoothing is not possible.
2. The technique involves numerically determining discontinuities in the integrand and computing the integral only over the smooth regions. It also uses hierarchical representations and Brownian bridges to reduce the effective dimension of the problem.
3. The numerical smoothing approach outperforms Monte Carlo methods for high dimensional cases and improves the complexity of multilevel Monte Carlo from O(TOL^-2.5) to O(TOL^-2 log(TOL)^2).
The document discusses Approximate Bayesian Computation (ABC), a computational technique for Bayesian inference when the likelihood function is intractable. ABC allows sampling from the likelihood and making inferences based on simulated data without calculating the actual likelihood. The technique originated in population genetics models where likelihoods for genetic polymorphism data cannot be calculated in closed form. ABC is presented as both an inference machine with its own legitimacy compared to classical Bayesian approaches, as well as a way to address computational issues with intractable likelihoods.
Scalable inference for a full multivariate stochastic volatility - SYRTO Project
Scalable inference for a full multivariate stochastic volatility
P. Dellaportas (UCL, London), A. Plataniotis (AUEB, Athens) and M. Titsias (AUEB, Athens)
Final SYRTO Conference - Université Paris1 Panthéon-Sorbonne
February 19, 2016
A Maximum Entropy Approach to the Loss Data Aggregation Problem - Erika G. G.
This document discusses using a maximum entropy approach to model loss distributions for operational risk modeling. It begins by motivating the need to accurately model loss distributions given challenges with limited datasets, including heavy tails and dependence between risks. It then provides an overview of the loss distribution approach commonly used in operational risk modeling and its limitations. The document introduces the maximum entropy approach, which frames the problem as maximizing entropy subject to moment constraints. It discusses using the Laplace transform of loss distributions to compress information into moments and how these can be estimated from sample data or fitted distributions to serve as constraints for the maximum entropy approach.
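The "maximize entropy subject to moment constraints" step has a classical closed form: the solution is an exponential tilting of the base measure, with the multiplier chosen to match the constraint. Jaynes' dice example (a distribution on {1..6} constrained to have mean 4.5) makes this concrete; it is an illustration of the principle, not the operational-loss model of the document.

```python
import numpy as np
from scipy.optimize import brentq

# Among all distributions on {1..6} with mean 4.5, the maximum-entropy one
# has the form p_k proportional to exp(lam * k), with lam matching the moment.
k = np.arange(1, 7)

def mean_given(lam):
    """Mean of the tilted distribution p_k ∝ exp(lam * k)."""
    w = np.exp(lam * k)
    return (k * w).sum() / w.sum()

lam = brentq(lambda l: mean_given(l) - 4.5, -5.0, 5.0)  # solve the moment equation
p = np.exp(lam * k)
p /= p.sum()
print(np.round(p, 3), (k * p).sum())  # probabilities tilt toward high faces; mean 4.5
```

With several moment constraints (as when matching Laplace-transform-derived moments of a loss distribution), the same structure holds with one multiplier per constraint, solved jointly.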
11.the comparative study of finite difference method and monte carlo method f... - Alexander Decker
This document compares the finite difference method and Monte Carlo method for pricing European options. It provides an overview of these two primary numerical methods used in financial modeling. The Monte Carlo method simulates asset price paths and averages discounted payoffs to estimate option value. It is well-suited for path-dependent options but converges slower than finite difference. The finite difference method solves the Black-Scholes PDE by approximating it on a grid. Specifically, it discusses the Crank-Nicolson scheme, which is unconditionally stable and converges faster than Monte Carlo for standard options.
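The Monte Carlo estimator described above (simulate terminal prices, average discounted payoffs) is easy to sketch and to check against the Black-Scholes closed form; the parameters below are illustrative.

```python
import math
import numpy as np

def bs_call(S0, K, r, sigma, T):
    """Black-Scholes closed-form price of a European call."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

def mc_call(S0, K, r, sigma, T, n_paths=200_000, seed=0):
    """Monte Carlo: simulate terminal prices under geometric Brownian motion
    and average the discounted payoffs."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    return math.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

exact = bs_call(100, 100, 0.05, 0.2, 1.0)
estimate = mc_call(100, 100, 0.05, 0.2, 1.0)
print(exact, estimate)  # agreement within Monte Carlo error, O(1/sqrt(n_paths))
```

The slow O(1/sqrt(n)) convergence visible here is exactly why grid-based finite difference schemes such as Crank-Nicolson win for standard (non-path-dependent) options.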
CARI-2020, Application of LSTM architectures for next frame forecasting in Se... - Mokhtar SELLAMI
This document presents a study comparing Long Short-Term Memory (LSTM) architectures for next frame forecasting in satellite image time series data. Three models - ConvLSTM, Stack-LSTM and CNN-LSTM - were implemented and evaluated based on training loss, time and structural similarity between predicted and actual images. The CNN-LSTM architecture was found to provide the best performance, achieving accurate predictions while requiring less processing time than ConvLSTM for higher resolution images. Overall, the study demonstrates the suitability of deep learning models like CNN-LSTM for predictive tasks using earth observation satellite imagery time series data.
This document discusses triangular tree-width, a new measure for directed graphs that is related to efficiently computing matrix permanents. Triangular tree-width is defined based on the tree-widths of the increasing and decreasing edge subgraphs of a directed graph with respect to a particular vertex ordering. The goal is that bounding the triangular tree-width of a matrix's incidence graph would allow its permanent to be computed efficiently, whereas the standard tree-width may be unbounded. The document outlines the basics of tree-width and motivates the definition of triangular tree-width.
This document provides an introduction to advanced Markov chain Monte Carlo (MCMC) methods. It begins with a motivating example using mixture models that have latent variables, making the likelihood intractable. This introduces challenges for Bayesian computation. The document then describes the Metropolis-Hastings algorithm, which allows generating samples from a target distribution using an ergodic Markov chain, even when direct sampling is impossible. Several extensions and properties of the Metropolis-Hastings algorithm are discussed.
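A minimal random-walk Metropolis sketch makes the algorithm concrete: propose from a symmetric kernel, accept with probability min(1, target ratio). The target here is a two-component Gaussian mixture of the kind such latent-variable models produce; the target and tuning are illustrative choices.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2), accept with
    probability min(1, target(x') / target(x)); symmetric proposal, so the
    Hastings correction cancels."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x            # rejected proposals repeat the current state
    return samples

def log_target(x):
    """Unnormalized log-density of an equal mixture of N(-2, 1) and N(2, 1)."""
    return np.logaddexp(-0.5 * (x + 2) ** 2, -0.5 * (x - 2) ** 2)

samples = metropolis_hastings(log_target, 0.0, 50_000, step=2.5)
print(samples.mean(), samples.std())  # mean near 0; spread straddles both modes
```

Note that only ratios of the target are needed, so the normalizing constant (the intractable part in Bayesian computation) never has to be evaluated.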
Chapter 2. Multivariate Analysis of Stationary Time Series - Chengjun Wang
This document discusses multivariate time series analysis and vector autoregressive (VAR) models. It covers simulation and estimation of VAR models in R, as well as diagnostic testing, forecasting, impulse response analysis, and structural VAR (SVAR) models which impose restrictions to identify structural shocks. Methods for SVAR models include the A-model which restricts the A matrix and the B-model which restricts the B matrix. Impulse response functions and forecast error variance decompositions are used to analyze the dynamic effects of shocks in SVAR models.
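In the spirit of the chapter's simulate-then-estimate workflow (the chapter uses R; the sketch below uses Python with numpy instead, as an assumption of this summary), a stable VAR(1) can be simulated and its coefficient matrix recovered by equation-by-equation OLS:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a bivariate VAR(1): y_t = A y_{t-1} + e_t, with A chosen stable
# (eigenvalues inside the unit circle).
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.5, size=2)

# OLS estimation: regress y_t on y_{t-1}, one equation per variable.
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
print(np.round(A_hat, 2))  # close to the true A
```

Impulse responses then follow by iterating the estimated A_hat on a unit shock; SVAR identification adds restrictions on top of this reduced form.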
Using R for Analyzing Loans, Portfolios and Risk: From Academic Theory to Fi... - Revolution Analytics
Dr. Sanjiv Das has held positions at Citibank, as a Professor at Harvard University, and as Program Director at the FDIC’s Center for Financial Research. His research relies heavily on R for analysis and decision-making. In this webinar, Dr. Das will present a mix of his more current and topical research that uses R-based models, along with some pedagogical applications of R. He will present:
* An R-based model for optimizing loan modifications on distressed home loans, and the economics of these modifications.
* A goal-based portfolio optimization model for investors who use derivatives.
* Using network modeling tools in R to detect systemically risky financial institutions.
* Using R for web delivery of financial models and random generation of pedagogical problems.
Promising to be entertaining and enlightening, this webinar will emphasize the interplay of mathematical models, economic problems, and R.
A review of two decades of correlations, hierarchies, networks and clustering...Gautier Marti
Opinionated review of two decades of correlations, hierarchies,
networks and clustering in financial markets presented at Ton Duc Thang University in Ho Chi Minh City, Vietnam.
You may have already read many times that the job of a Data Scientist is to skim through a huge amount of data searching for correlations between some variables of interest. And also, that one of his worst enemies (besides correlation doesn't imply causation) is spurious correlation. But what really is correlation? Are there several types of correlations? Some "good", some "bad"? What about their estimation? This talk will be a very visual presentation around the notion of correlation and dependence. I will first illustrate how the standard linear correlation is estimated (Pearson coefficient), then some more robust alternative: the Spearman coefficient. Building on the geometric understanding of their nature, I will present a generalization that can help Data Scientists to explore, interpret, and measure the dependence (not necessarily linear or comonotonic) between the variables of a given dataset. Financial time series (stocks, credit default swaps, fx rates), and features from the UCI datasets are considered as use cases.
Autoregressive Convolutional Neural Networks for Asynchronous Time SeriesGautier Marti
In this talk, we present a CNN architecture for predicting autoregressive asynchronous time series. We illustrate its application on predicting traders’ quotes of credit default swaps (proprietary dataset from Hellebore Capital), and on artificial time series. The paper is available there: http://proceedings.mlr.press/v80/binkowski18a/binkowski18a.pdf
Using Vector Clocks to Visualize Communication FlowMartin Harrigan
The document describes a methodology for visualizing communication flow using vector clocks. It begins with an introduction discussing communication flow as a metaphor and the need to ground visualizations in the analyzed substance. It then outlines the methodology which involves (1) assigning each vertex a vector clock representing its position in a high-dimensional space, (2) updating vector clocks based on communications, (3) computing distances between vector clocks, (4) constructing a dissimilarity matrix, and (5) using multidimensional scaling to produce a 2D visualization. Experiments applying the methodology to artificial and real communication datasets are discussed. The conclusion covers future work opportunities around the distance metric, modeling additional properties, and scalability issues.
Approximate Bayesian computation (ABC) is a computational technique for Bayesian inference when the likelihood function is intractable or impossible to compute directly. ABC approximates the likelihood by simulating data under different parameter values and comparing simulated and observed data using summary statistics. ABC produces a parameter sample without evaluating the full likelihood function, thus allowing Bayesian inference when likelihoods are unavailable or difficult to compute.
This document discusses computational issues that arise in Bayesian statistics. It provides examples of latent variable models like mixture models that make computation difficult due to the large number of terms that must be calculated. It also discusses time series models like the AR(p) and MA(q) models, noting that they have complex parameter spaces due to stationarity constraints. The document outlines the Metropolis-Hastings algorithm, Gibbs sampler, and other methods like Population Monte Carlo and Approximate Bayesian Computation that can help address these computational challenges.
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...SYRTO Project
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time Series Models. Andre Lucas. Amsterdam - June, 25 2015. European Financial Management Association 2015 Annual Meetings.
Network and risk spillovers: a multivariate GARCH perspectiveSYRTO Project
M. Billio, M. Caporin, L. Frattarolo, L. Pelizzon: “Network and risk spillovers: a multivariate GARCH perspective”.
Final SYRTO Conference - Université Paris1 Panthéon-Sorbonne
February 19, 2016
Numerical smoothing and hierarchical approximations for efficient option pric...Chiheb Ben Hammouda
1. The document presents a numerical smoothing technique to improve the efficiency of option pricing and density estimation when analytic smoothing is not possible.
2. The technique involves numerically determining discontinuities in the integrand and computing the integral only over the smooth regions. It also uses hierarchical representations and Brownian bridges to reduce the effective dimension of the problem.
3. The numerical smoothing approach outperforms Monte Carlo methods for high dimensional cases and improves the complexity of multilevel Monte Carlo from O(TOL^-2.5) to O(TOL^-2 log(TOL)^2).
The document discusses Approximate Bayesian Computation (ABC), a computational technique for Bayesian inference when the likelihood function is intractable. ABC allows sampling from the likelihood and making inferences based on simulated data without calculating the actual likelihood. The technique originated in population genetics models where likelihoods for genetic polymorphism data cannot be calculated in closed form. ABC is presented as both an inference machine with its own legitimacy compared to classical Bayesian approaches, as well as a way to address computational issues with intractable likelihoods.
Scalable inference for a full multivariate stochastic volatilitySYRTO Project
Scalable inference for a full multivariate stochastic volatility
P. Dellaportas, A. Plataniotis and M. Titsias UCL(London), AUEB(Athens), AUEB(Athens)
Final SYRTO Conference - Université Paris1 Panthéon-Sorbonne
February 19, 2016
A Maximum Entropy Approach to the Loss Data Aggregation ProblemErika G. G.
This document discusses using a maximum entropy approach to model loss distributions for operational risk modeling. It begins by motivating the need to accurately model loss distributions given challenges with limited datasets, including heavy tails and dependence between risks. It then provides an overview of the loss distribution approach commonly used in operational risk modeling and its limitations. The document introduces the maximum entropy approach, which frames the problem as maximizing entropy subject to moment constraints. It discusses using the Laplace transform of loss distributions to compress information into moments and how these can be estimated from sample data or fitted distributions to serve as constraints for the maximum entropy approach.
11.the comparative study of finite difference method and monte carlo method f...Alexander Decker
This document compares the finite difference method and Monte Carlo method for pricing European options. It provides an overview of these two primary numerical methods used in financial modeling. The Monte Carlo method simulates asset price paths and averages discounted payoffs to estimate option value. It is well-suited for path-dependent options but converges slower than finite difference. The finite difference method solves the Black-Scholes PDE by approximating it on a grid. Specifically, it discusses the Crank-Nicolson scheme, which is unconditionally stable and converges faster than Monte Carlo for standard options.
CARI-2020, Application of LSTM architectures for next frame forecasting in Se...Mokhtar SELLAMI
This document presents a study comparing Long Short-Term Memory (LSTM) architectures for next frame forecasting in satellite image time series data. Three models - ConvLSTM, Stack-LSTM and CNN-LSTM - were implemented and evaluated based on training loss, time and structural similarity between predicted and actual images. The CNN-LSTM architecture was found to provide the best performance, achieving accurate predictions while requiring less processing time than ConvLSTM for higher resolution images. Overall, the study demonstrates the suitability of deep learning models like CNN-LSTM for predictive tasks using earth observation satellite imagery time series data.
This document discusses triangular tree-width, a new measure for directed graphs that is related to efficiently computing matrix permanents. Triangular tree-width is defined based on the tree-widths of the increasing and decreasing edge subgraphs of a directed graph with respect to a particular vertex ordering. The goal is that bounding the triangular tree-width of a matrix's incidence graph would allow its permanent to be computed efficiently, whereas the standard tree-width may be unbounded. The document outlines the basics of tree-width and motivates the definition of triangular tree-width.
This document provides an introduction to advanced Markov chain Monte Carlo (MCMC) methods. It begins with a motivating example using mixture models that have latent variables, making the likelihood intractable. This introduces challenges for Bayesian computation. The document then describes the Metropolis-Hastings algorithm, which allows generating samples from a target distribution using an ergodic Markov chain, even when direct sampling is impossible. Several extensions and properties of the Metropolis-Hastings algorithm are discussed.
Chapter 2. Multivariate Analysis of Stationary Time SeriesChengjun Wang
This document discusses multivariate time series analysis and vector autoregressive (VAR) models. It covers simulation and estimation of VAR models in R, as well as diagnostic testing, forecasting, impulse response analysis, and structural VAR (SVAR) models which impose restrictions to identify structural shocks. Methods for SVAR models include the A-model which restricts the A matrix and the B-model which restricts the B matrix. Impulse response functions and forecast error variance decompositions are used to analyze the dynamic effects of shocks in SVAR models.
Using R for Analyzing Loans, Portfolios and Risk: From Academic Theory to Fi... (Revolution Analytics)
Dr. Sanjiv Das has held positions at Citibank, as a Harvard University professor, and as Program Director at the FDIC's Center for Financial Research. His research relies heavily on R for analysis and decision-making. In this webinar, Dr. Das will present a mix of his more current and topical research that uses R-based models, along with some pedagogical applications of R. He will present:
* An R-based model for optimizing loan modifications on distressed home loans, and the economics of these modifications.
* A goal-based portfolio optimization model for investors who use derivatives.
* Using network modeling tools in R to detect systemically risky financial institutions.
* Using R for web delivery of financial models and random generation of pedagogical problems.
Promising to be entertaining and enlightening, this webinar will emphasize the interplay of mathematical models, economic problems, and R.
Cormac Ferrick Sociology 204 Final Presentation (Mac Ferrick)
The Italian Market in South Philadelphia is a bustling area with a long history as a shopping destination for immigrants. It began in the 1880s when an Italian immigrant opened a boarding house and shops to serve the local Italian community. Through the early 20th century, the Market grew and was dominated by Italian business owners and shoppers. While many Italians moved to the suburbs after World War II, the Market still attracted customers. More recently, the Mexican population has increased in the area, with many Mexican-owned businesses and residents relying on the Market as a community gathering place, as immigrants have for over a century.
Security intelligence involves analyzing all available security data sources in an organization to generate actionable information. It is essential due to increasingly sophisticated attacks, disappearing network perimeters, and security teams facing high volumes of data with limited resources. IBM's QRadar security intelligence platform provides automation, integration, and intelligence to help organizations optimize security through advanced threat detection, compliance, and eliminating data silos. It uses embedded intelligence to identify true security incidents from massive amounts of data through automated collection, analysis, and reduction. Virtual appliance models are available in different capacities to suit organizations' needs.
A sale organized by the MFR de Fyé in collaboration with the social center of Oisseau-le-Petit will take place on Friday May 29th and Saturday May 30th 2015, with items for sale upon arrival on Friday and remaining items and money from sales collected on Saturday.
This document provides details about the HR master data, personnel actions, organizational structure, payroll structure, and time management configuration for an enterprise. It describes the key HR objects like company codes, PAs, PSAs, cost centers, employee groups, and payroll areas. It also explains various personnel actions like hiring, transfer, promotion etc. and how they are configured. Finally, it discusses the time management schema, attendance, overtime calculation, and various time management related objects and operations.
The document summarizes information about PDQ, a fast casual chicken restaurant chain. It details that PDQ was awarded the number 1 fastest growing small chain in America in 2014. It provides information about site requirements including a minimum size of 0.8-1.2 acres, parking and traffic counts. Demographic data within a 5 and 10 minute drive time is given. Minimum utility requirements for sanitary, water, electrical, telecom, and gas service are outlined. Current and planned growth including 40 open locations and 15 more by the end of 2015 are mentioned.
This document summarizes a presentation about translating the OpenStack project. It discusses how someone can become an OpenStack translator by helping users in their native language. It also outlines the tools and processes used for translation, including Transifex, mailing lists, and integrating translations with the development workflow. The presentation concludes by highlighting some of the challenges of translation, such as handling punctuation, plurals, declension, and ensuring design compatibility across languages. Attendees are encouraged to get involved with OpenStack translation.
The document summarizes research on targeting the "Grey Gold" consumer segment, those who are retired or nearing retirement. It finds that while this segment is growing in size and spending power, most advertisers do not directly target it. Reasons include misconceptions about the segment and the young average age of advertising industry employees. Industries seeing growth targeting Grey Gold include technology, travel, housing, food, glasses/aids, and health. The document explores views from advertisers, agencies, industry voices, and research on how to better market to Grey Gold consumers. It proposes potential solutions for TV4, such as launching a new TV channel or changing its pricing model to charge for ads targeting this valuable segment.
This document provides information about a computational stochastic processes course, including lecture details, prerequisites, syllabus, and examples. The key points are:
- Lectures will cover Monte Carlo simulation, stochastic differential equations, Markov chain Monte Carlo methods, and inference for stochastic processes.
- Prerequisites include probability, stochastic processes, and programming.
- Assessments will include a coursework and exam. The coursework will involve computational problems in Python, Julia, R, or similar languages.
- Motivating examples discussed include using Monte Carlo methods to evaluate high-dimensional integrals and simulating Langevin dynamics in statistical physics.
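The high-dimensional-integral example above can be sketched with plain Monte Carlo; the integrand, dimension, and sample size are my own illustrative choices:

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(42)
d, n = 10, 100_000

# Plain Monte Carlo estimate of the integral of exp(-|x|^2) over the unit cube [0, 1]^d
x = rng.uniform(size=(n, d))
values = np.exp(-np.sum(x**2, axis=1))
estimate = values.mean()
std_error = values.std(ddof=1) / np.sqrt(n)

# Closed-form reference: the integrand factorises into d one-dimensional integrals
exact = (erf(1.0) * sqrt(pi) / 2.0) ** d
print(estimate, std_error, exact)
```

The standard error shrinks as n^(-1/2) regardless of the dimension d, which is exactly why Monte Carlo is the method of choice for such integrals.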
Dependent processes in Bayesian Nonparametrics (Julyan Arbel)
This document summarizes dependent processes in Bayesian nonparametrics. It motivates the need for dependent random probability measures to accommodate temporal dependence structures beyond the exchangeability assumption. It describes modeling collections of random probability measures indexed by time as either discrete-time or continuous-time processes. The diffusive Dirichlet process is introduced as a dependent Dirichlet process with Dirichlet marginal distributions at each time point and continuous sample paths. Simulation and estimation methods are discussed for this model.
Cointegration and Long-Horizon Forecasting (محمد إسماعيل)
This document summarizes research on comparing the accuracy of long-horizon forecasts from multivariate cointegrated systems versus univariate models that ignore cointegration. The main findings are:
1) When accuracy is measured using standard trace mean squared error, imposing cointegration provides no benefit over univariate models at long horizons.
2) Both multivariate and univariate long-horizon forecasts satisfy the cointegrating relationships exactly.
3) The cointegrating combinations of forecast errors from both approaches have finite variance at long horizons.
This document discusses the use of machine learning techniques in actuarial science and insurance. It begins with an overview of predictive modeling applications in insurance such as fraud detection, premium computation, and claims reserving. It then covers traditional econometric techniques like Poisson and gamma regression models and how machine learning is emerging as an alternative. The document emphasizes evaluating model goodness of fit and uncertainty, and addresses issues like price discrimination and fairness.
Bayesian inference for mixed-effects models driven by SDEs and other stochast... (Umberto Picchini)
An important, and well studied, class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent states dynamics, as well as (ii) the variability between individuals, and also (iii) account for measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed data likelihood, which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is flexible and general, and is able to deal with a large class of nonlinear SDEMEMs [1]. In a more recent work [2], we also explored ways to make inference even more scalable to an increasing number of individuals, while also dealing with state-space models driven by stochastic dynamic models other than SDEs, e.g. Markov jump processes and nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, AT McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms, CSDA, https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models, bioRxiv doi:10.1101/2021.07.01.450748.
This document discusses using extreme value theory and Bayesian analysis to reassess hurricane risk in Puerto Rico after Hurricane Maria. It analyzes rainfall data from San Juan to estimate return levels for extreme rainfall events using maximum likelihood estimation and Bayesian modeling. The Bayesian analysis results in slightly more precise predictions of extreme rainfall amounts compared to the maximum likelihood estimates. Hurricane Maria dropped over 36 inches of rain in some areas of Puerto Rico in September 2017, the highest rainfall amount ever recorded from a hurricane in Puerto Rico.
Numerical Smoothing and Hierarchical Approximations for Efficient Option Pricin... (Chiheb Ben Hammouda)
My talk at the "Stochastic Numerics and Statistical Learning: Theory and Applications" Workshop at KAUST (King Abdullah University of Science and Technology), May 23, 2022, about my recent works "Numerical Smoothing with Hierarchical Adaptive Sparse Grids and Quasi-Monte Carlo Methods for Efficient Option Pricing" and "Multilevel Monte Carlo combined with numerical smoothing for robust and efficient option pricing and density estimation".
MSL 5080, Methods of Analysis for Business Operations 1 .docx (madlynplamondon)
MSL 5080, Methods of Analysis for Business Operations 1
Course Learning Outcomes for Unit III
Upon completion of this unit, students should be able to:
2. Distinguish between the approaches to determining probability.
3. Contrast the major differences between the normal distribution and the exponential and Poisson distributions.
Reading Assignment
Chapter 2: Probability Concepts and Applications, pp. 32–48
Unit Lesson
Mathematical truths provide several useful means to estimate what will happen based on factors that are given or researched. After becoming familiar with the idea of probability, one can see how mathematics makes applications in government and business possible.
Probability Distributions
To look at probability distributions, one should first define a random variable: an unknown quantity whose value may be any real number, including decimals or fractions. Discrete random variables take a limited range of distinct values, while continuous random variables may take any of an infinite range of possible values (Render, Stair, Hanna, & Hale, 2015).
One consistent tendency is that events observed over a group of trials cluster around a middle value, which occurs most often (has the highest probability); the probabilities then taper off on one or both sides, toward values far below the middle (or zero) and far above it. This middle point is called the mean, or expected value E(X):
E(X) = ∑_{i=1}^{n} X_i P(X_i)

where X_i is the random variable value, and the summation sign ∑ with limits i = 1 to n means you are adding all n possible values (Render et al., 2015).
These probabilities can be shown as a graph. If the random variable has a discrete probability distribution (e.g., cans of paint that can be sold in a day), then the graph of events may look like this:
UNIT III STUDY GUIDE
Binomial and Normal Distributions
The bar heights show the probability P(X) along the y-axis for each discrete value of X along the x-axis, with no fractional values for discrete variables (no half-cans of paint).
The variance (σ²) is the spread of the distribution of events in a probability distribution (Render et al., 2015). The variance is interesting because a small variance indicates that the event value will most likely be near the mean most of the time, while a large variance shows that the mean is not an especially reliable guide to what the event values will be.
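A small numeric sketch of the expected-value and variance formulas above; the paint-sale values and probabilities are invented for illustration:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])                  # possible values X_i (cans of paint sold)
p = np.array([0.05, 0.15, 0.30, 0.30, 0.20])   # probabilities P(X_i), summing to 1

expected = np.sum(x * p)                       # E(X) = sum_i X_i P(X_i)
variance = np.sum((x - expected)**2 * p)       # sigma^2 = sum_i (X_i - E(X))^2 P(X_i)
print(expected, variance)                      # 2.45 and 1.2475
```

With these numbers the mean is 2.45 cans and the variance 1.2475, so the standard deviation is a little over one can: most days fall within a can or so of the mean.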
A Monte Carlo strategy for structured multiple-step-ahead time series prediction (Gianluca Bontempi)
The document proposes a Monte Carlo approach called SMC (Structured Monte Carlo) for multiple-step-ahead time series forecasting that takes into account the structural dependencies between predictions. It generates samples using a direct forecasting approach and weights them based on how well they satisfy dependencies identified by an iterated approach. Experiments on three benchmark datasets show the SMC approach achieves more accurate forecasts as measured by SMAPE than iterated, direct, or other comparison methods for most prediction horizons tested.
The document discusses distributed online convex optimization algorithms for coordinating multiple agents. It presents a coordination algorithm where each agent performs proportional-integral feedback to minimize local objectives while sharing information with neighbors over noisy communication channels. The algorithm is proven to achieve exponential convergence of second moments to the optimal solution and an ultimate bound on the error that depends on the noise level. Simulation results on a medical diagnosis example are also presented to illustrate the algorithm's behavior.
Improving on daily measures of price discovery (FGV Brazil)
We formulate a continuous-time price discovery model in which the price discovery measure varies (stochastically) at daily frequency. We estimate daily measures of price discovery using a kernel-based OLS estimator instead of running separate daily VECM regressions as standard in the literature. We show that our estimator is not only consistent, but also outperforms the standard daily VECM in finite samples. We illustrate our theoretical findings by studying the price discovery process of 10 actively traded stocks in the U.S. from 2007 to 2013.
Date: 2017-03
Authors:
Dias, Gustavo Fruet
Fernandes, Marcelo
Scherrer, Cristina Mabel
This document provides motivation for using a circular kernel density estimator for nonparametric density estimation of circular data. It describes how a simple approximation theory from linear kernel estimation can be adapted to the circular case by replacing the kernel with a sequence of periodic densities on [-π,π] that converge to a degenerate distribution at θ=0. It shows that the wrapped Cauchy density satisfies the conditions to serve as such a kernel, resulting in the circular kernel density estimator proposed in equation 1.12. This estimator is shown to converge uniformly to the true density f(θ) as the sample size increases, providing theoretical justification for its use in smooth nonparametric density estimation for circular variables.
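A sketch of the circular kernel density estimator described above, using the wrapped Cauchy density as the kernel; the concentration parameter and the von Mises sample are illustrative choices following the summary, not necessarily the paper's equation 1.12:

```python
import numpy as np

def wrapped_cauchy(theta, mu, rho):
    # Wrapped Cauchy density on the circle with mean direction mu and concentration rho
    return (1 - rho**2) / (2 * np.pi * (1 + rho**2 - 2 * rho * np.cos(theta - mu)))

def circular_kde(theta_grid, data, rho):
    # Average a wrapped Cauchy kernel centred at each observed angle
    return np.mean([wrapped_cauchy(theta_grid, t, rho) for t in data], axis=0)

rng = np.random.default_rng(0)
data = rng.vonmises(0.0, 2.0, size=500)                  # sample of angles in [-pi, pi)
grid = np.linspace(-np.pi, np.pi, 360, endpoint=False)   # periodic evaluation grid
density = circular_kde(grid, data, rho=0.7)

# Rectangle rule over the circle: the estimate should integrate to one
print(density.sum() * (2 * np.pi / 360))
```

As rho approaches 1 the wrapped Cauchy kernel concentrates at its centre, which is the degenerate-limit condition the document describes for a valid circular kernel.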
This document covers non-linear optimization applications in finance, including volatility estimation with ARCH and GARCH models, line search methods, Newton's method, the steepest descent method, the golden section search method, and the conjugate gradient method.
Multiple estimators for Monte Carlo approximations (Christian Robert)
This document discusses multiple estimators that can be used to approximate integrals using Monte Carlo simulations. It begins by introducing concepts like multiple importance sampling, Rao-Blackwellisation, and delayed acceptance that allow combining multiple estimators to improve accuracy. It then discusses approaches like mixtures as proposals, global adaptation, and nonparametric maximum likelihood estimation (NPMLE) that frame Monte Carlo estimation as a statistical estimation problem. The document notes various advantages of the statistical formulation, like the ability to directly estimate simulation error from the Fisher information. Overall, the document presents an overview of different techniques for combining Monte Carlo simulations to obtain more accurate integral approximations.
"Correlated Volatility Shocks" by Dr. Xiao Qiao, Researcher at SummerHaven In... (Quantopian)
Commonality in idiosyncratic volatility cannot be completely explained by time-varying volatility. After removing the effects of time-varying volatility, idiosyncratic volatility innovations are still positively correlated. This result suggests correlated volatility shocks contribute to the comovement in idiosyncratic volatility.
Motivated by this fact, we propose the Dynamic Factor Correlation (DFC) model, which fits the data well and captures the cross-sectional correlations in idiosyncratic volatility innovations. We decompose the common factor in idiosyncratic volatility (CIV) of Herskovic et al. (2016) into the volatility innovation factor (VIN) and time-varying volatility factor (TVV). Whereas VIN is associated with strong variation in average returns, TVV is only weakly priced in the cross section.
A strategy that takes a long position in the portfolio with the lowest VIN and TVV betas, and a short position in the portfolio with the highest VIN and TVV betas earns average returns of 8.0% per year.
My talk entitled "Numerical Smoothing and Hierarchical Approximations for Efficient Option Pricing and Density Estimation", that I gave at the "International Conference on Computational Finance (ICCF)", Wuppertal June 6-10, 2022. The talk is related to our recent works "Numerical Smoothing with Hierarchical Adaptive Sparse Grids and Quasi-Monte Carlo Methods for Efficient Option Pricing" (link: https://arxiv.org/abs/2111.01874) and "Multilevel Monte Carlo combined with numerical smoothing for robust and efficient option pricing and density estimation" (link: https://arxiv.org/abs/2003.05708). In these two works, we introduce the numerical smoothing technique that improves the regularity of observables when approximating expectations (or the related integration problems). We provide a smoothness analysis and we show how this technique leads to better performance for the different methods that we used (i) adaptive sparse grids, (ii) Quasi-Monte Carlo, and (iii) multilevel Monte Carlo. Our applications are option pricing and density estimation. Our approach is generic and can be applied to solve a broad class of problems, particularly for approximating distribution functions, financial Greeks computation, and risk estimation.
Quantitative Propagation of Chaos for SGD in Wide Neural Networks (Valentin De Bortoli)
The document discusses quantitative analysis of stochastic gradient descent (SGD) for training wide neural networks. It presents two different regimes - a deterministic regime where the limiting dynamics is described by an ordinary differential equation, and a stochastic regime where the limiting dynamics is a stochastic differential equation. Experiments on MNIST classification show that the stochastic regime with larger step sizes exhibits better regularization properties. The analysis provides insights into the behavior of neural network training as the number of neurons becomes large.
There is now a huge literature on Bayesian methods for variable selection that use spike-and-slab priors. Such methods, in particular, have been quite successful for applications in a variety of different fields. High-throughput genomics and neuroimaging are two of such examples. There, novel methodological questions are being generated, requiring the integration of different concepts, methods, tools and data types. These have in particular motivated the development of variable selection priors that go beyond the independence assumptions of a simple Bernoulli prior on the variable inclusion indicators. In this talk I will describe various prior constructions that incorporate information about structural dependencies among the variables. I will also address extensions of the models to the analysis of count data. I will motivate the development of the models using specific applications from neuroimaging and from studies that use microbiome data.
Similar to On clustering financial time series - A need for distances between dependent random variables (20)
Using Large Language Models in 10 Lines of Code (Gautier Marti)
Modern NLP models can be daunting: No more bag-of-words but complex neural network architectures, with billions of parameters. Engineers, financial analysts, entrepreneurs, and mere tinkerers, fear not! You can get started with as little as 10 lines of code.
Presentation prepared for the Abu Dhabi Machine Learning Meetup Season 3 Episode 3 hosted at ADGM in Abu Dhabi.
... two decades of correlation, hierarchies, networks and clustering in financial markets
Summary of some of my past research work at Complex Networks 2022.
The study of correlations, hierarchies, networks and communities (or clustering) has more than 20 years of history in econophysics.
However, for the practitioner, it seems that these tools are not fully ready yet:
Many questions around their proper use for trading or risk monitoring are left unanswered.
Deep Learning might help solve some hard problems such as finding more reliably communities (or clusters) and their number.
Running large simulations (based on GANs, VAEs or realistic market simulators) could also help understand when complex networks methods can give wrong insights (e.g. not enough data, or not stationary enough; too low correlations).
Conference: Complex Networks 2022 in Palermo, Sicily, Italy.
A quick demo of Top2Vec with application on 2020 10-K business descriptions (Gautier Marti)
A short presentation I did at the Hong Kong Machine Learning Meetup Season 4 Episode 4. Top2Vec is a novel method to find topics in a corpus of documents. It can automatically find a relevant number of topics in the corpus. Besides, you also get relevant word and document vectors for further processing.
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist... (Gautier Marti)
A Generative Adversarial Networks model to generate realistic correlation matrices. In these slides, we discuss a use case in quantitative finance (comparison of risk-based portfolio allocation methods), and how to improve the seminal model with information geometry (Riemannian neural networks suited for correlation matrices). There are many use cases to explore within, and outside, quantitative finance. The Riemannian geometry of correlation matrices is still under-developed.
We highlight exciting problems at the intersection of Riemannian geometry and deep learning.
How deep generative models can help quants reduce the risk of overfitting? (Gautier Marti)
How can deep generative models help quants reduce the risk of overfitting? Applications of GANs for Quants. Presentation at the "QuantUniversity Autumn School 2020".
Generating Realistic Synthetic Data in Finance (Gautier Marti)
Talk at IHS Markit Webinar (15 October 2020) on the potential applications of GANs in Finance. These models could be useful for quants and their managers to avoid over-fitting, for portfolio and risk managers for proper capital and risk allocation, for cloud computing services willing to work with banks and other organizations rich in sensitive data, for auditors and regulators to detect frauds, and for data vendors (such as IHS Markit) to bring new products to market and iterate quickly with clients.
This presentation highlights potential use cases of deep generative models, and Generative Adversarial Networks (GANs) in particular, in Finance. Essentially, these models are useful to generate realistic synthetic datasets. Quantitative Strategists, Traders, Asset and Risk Managers can find these novel techniques useful. Auditors and Regulators should also become aware of their existence as they may be source of new accounting frauds and misleading financial statements (deepfakes).
My recent attempts at using GANs for simulating realistic stocks returns (Gautier Marti)
A presentation for the Hong Kong Machine Learning meetup summarizing my hobby research over the past year. My goal is to be able to simulate realistic multivariate financial time series. If so, I will be able to compare different statistical methods for portfolio construction, studying complex networks, algorithmic trading, being able to do some reinforcement learning, etc. Still far from being achieved...
Takeaways from ICML 2019, Long Beach, California (Gautier Marti)
The document summarizes takeaways from various talks and presentations at the ICML 2019 conference. It discusses topics like safe machine learning and biases in algorithms, active learning techniques, attention mechanisms in deep learning, differential privacy in census data, time series forecasting methods, Hawkes processes, Shapley values for explainability and data valuation, topological data analysis, optimal transport, applications of machine learning in robotics, Gaussian processes, learning from noisy labels, interpretability methods in NLP, and the GluonTS library for probabilistic time series modeling.
On Clustering Financial Time Series - Beyond Correlation (Gautier Marti)
This document discusses clustering financial time series data using correlation matrices. It summarizes that analyzing 560 credit default swaps over 2500 days, the empirical correlation matrix eigenvalues closely match the theoretical Marchenko-Pastur distribution, indicating noise. Only 26 eigenvalues exceed the theoretical maximum, which may correspond to market and industry factors. Hierarchical clustering can reorder assets to reveal correlation patterns. Filtering by this reveals the underlying network structure. Beyond correlations, copulas represent the dependence structure, and a distance measure is proposed combining L1 and L0 distances of cumulative distribution functions to cluster on full distributions rather than just correlations. Stability tests show the proposed approach yields more robust clusters than standard correlation-based methods.
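The Marchenko-Pastur comparison described above can be sketched on pure-noise data; the sizes below are illustrative, not the paper's 560 CDS over 2500 days:

```python
import numpy as np

rng = np.random.default_rng(1)
n_assets, n_days = 100, 500   # illustrative sizes

# Pure-noise returns: the correlation eigenvalues should stay inside the
# Marchenko-Pastur support, so (almost) none should exceed its upper edge
returns = rng.standard_normal((n_days, n_assets))
corr = np.corrcoef(returns, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)

q = n_assets / n_days
lambda_max = (1 + np.sqrt(q)) ** 2   # upper edge of the Marchenko-Pastur support
n_signal = int(np.sum(eigvals > lambda_max))
print(n_signal)  # close to 0 for pure noise
```

On real returns, the eigenvalues above lambda_max are the candidates for market and industry factors; the bulk below the edge is treated as noise, as the document describes.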
Compositions of iron-meteorite parent bodies constrain the structure of the pr... (Sérgio Sacani)
Magmatic iron-meteorite parent bodies are the earliest planetesimals in the Solar System, and they preserve information about conditions and planet-forming processes in the solar nebula. In this study, we include comprehensive elemental compositions and fractional-crystallization modeling for iron meteorites from the cores of five differentiated asteroids from the inner Solar System. Together with previous results of metallic cores from the outer Solar System, we conclude that asteroidal cores from the outer Solar System have smaller sizes, elevated siderophile-element abundances, and simpler crystallization processes than those from the inner Solar System. These differences are related to the formation locations of the parent asteroids because the solar protoplanetary disk varied in redox conditions, elemental distributions, and dynamics at different heliocentric distances. Using highly siderophile-element data from iron meteorites, we reconstruct the distribution of calcium-aluminum-rich inclusions (CAIs) across the protoplanetary disk within the first million years of Solar-System history. CAIs, the first solids to condense in the Solar System, formed close to the Sun. They were, however, concentrated within the outer disk and depleted within the inner disk. Future models of the structure and evolution of the protoplanetary disk should account for this distribution pattern of CAIs.
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S... (Sérgio Sacani)
We report the study of a huge optical intraday flare on 2021 November 12 at 2 a.m. UT in the blazar OJ287. In the binary black hole model, it is associated with an impact of the secondary black hole on the accretion disk of the primary. Our multifrequency observing campaign was set up to search for such a signature of the impact based on a prediction made 8 yr earlier. The first I-band results of the flare have already been reported by Kishore et al. (2024). Here we combine these data with our monitoring in the R-band. There is a big change in the R–I spectral index by 1.0 ± 0.1 between the normal background and the flare, suggesting a new component of radiation. The polarization variation during the rise of the flare suggests the same. The limits on the source size place it most reasonably in the jet of the secondary BH. We then ask why we have not seen this phenomenon before. We show that OJ287 was never before observed with sufficient sensitivity on the night when the flare should have happened according to the binary model. We also study the probability that this flare is just an oversized example of intraday variability using the Krakow data set of intense monitoring between 2015 and 2023. We find that the occurrence of a flare of this size and rapidity is unlikely. In machine-readable Tables 1 and 2, we give the full orbit-linked historical light curve of OJ287 as well as the dense monitoring sample of Krakow.
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc... (PsychoTech Services)
A proprietary approach developed by bringing together the best of learning theories from psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, enabling you to learn better, faster!
Anti-Universe And Emergent Gravity and the Dark Universe (Sérgio Sacani)
Recent theoretical progress indicates that spacetime and gravity emerge together from the entanglement structure of an underlying microscopic theory. These ideas are best understood in Anti-de Sitter space, where they rely on the area law for entanglement entropy. The extension to de Sitter space requires taking into account the entropy and temperature associated with the cosmological horizon. Using insights from string theory, black hole physics and quantum information theory we argue that the positive dark energy leads to a thermal volume law contribution to the entropy that overtakes the area law precisely at the cosmological horizon. Due to the competition between area and volume law entanglement the microscopic de Sitter states do not thermalise at sub-Hubble scales: they exhibit memory effects in the form of an entropy displacement caused by matter. The emergent laws of gravity contain an additional ‘dark’ gravitational force describing the ‘elastic’ response due to the entropy displacement. We derive an estimate of the strength of this extra force in terms of the baryonic mass, Newton’s constant and the Hubble acceleration scale a0 = cH0, and provide evidence for the fact that this additional ‘dark gravity force’ explains the observed phenomena in galaxies and clusters currently attributed to dark matter.
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆Sérgio Sacani
Context. The early-type galaxy SDSS J133519.91+072807.4 (hereafter SDSS1335+0728), which had exhibited no prior optical variations during the preceding two decades, began showing significant nuclear variability in the Zwicky Transient Facility (ZTF) alert stream from December 2019 (as ZTF19acnskyy). This variability behaviour, coupled with the host-galaxy properties, suggests that SDSS1335+0728 hosts a ∼ 106M⊙ black hole (BH) that is currently in the process of ‘turning on’. Aims. We present a multi-wavelength photometric analysis and spectroscopic follow-up performed with the aim of better understanding the origin of the nuclear variations detected in SDSS1335+0728. Methods. We used archival photometry (from WISE, 2MASS, SDSS, GALEX, eROSITA) and spectroscopic data (from SDSS and LAMOST) to study the state of SDSS1335+0728 prior to December 2019, and new observations from Swift, SOAR/Goodman, VLT/X-shooter, and Keck/LRIS taken after its turn-on to characterise its current state. We analysed the variability of SDSS1335+0728 in the X-ray/UV/optical/mid-infrared range, modelled its spectral energy distribution prior to and after December 2019, and studied the evolution of its UV/optical spectra. Results. From our multi-wavelength photometric analysis, we find that: (a) since 2021, the UV flux (from Swift/UVOT observations) is four times brighter than the flux reported by GALEX in 2004; (b) since June 2022, the mid-infrared flux has risen more than two times, and the W1−W2 WISE colour has become redder; and (c) since February 2024, the source has begun showing X-ray emission. From our spectroscopic follow-up, we see that (i) the narrow emission line ratios are now consistent with a more energetic ionising continuum; (ii) broad emission lines are not detected; and (iii) the [OIII] line increased its flux ∼ 3.6 years after the first ZTF alert, which implies a relatively compact narrow-line-emitting region. Conclusions. 
We conclude that the variations observed in SDSS1335+0728 could be either explained by a ∼ 106M⊙ AGN that is just turning on or by an exotic tidal disruption event (TDE). If the former is true, SDSS1335+0728 is one of the strongest cases of an AGNobserved in the process of activating. If the latter were found to be the case, it would correspond to the longest and faintest TDE ever observed (or another class of still unknown nuclear transient). Future observations of SDSS1335+0728 are crucial to further understand its behaviour. Key words. galaxies: active– accretion, accretion discs– galaxies: individual: SDSS J133519.91+072807.4
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
On clustering financial time series - A need for distances between dependent random variables
Introduction
Dependence and Distribution
Toward an extension to the multivariate case
On clustering financial time series
A need for distances between dependent random variables
Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat
24 September 2015
Gautier Marti, Frank Nielsen On clustering financial time series
1 Introduction
2 Dependence and Distribution
3 Toward an extension to the multivariate case
Motivations: Why clustering?
Mathematical finance: use of variance-covariance matrices (e.g., Markowitz, Value-at-Risk)
Stylized fact: empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory; "Noise Dressing of Financial Correlation Matrices", Laloux et al., 1999)
Figure: Marchenko-Pastur distribution vs. eigenvalues of the empirical correlation matrix
How to filter these variance-covariance matrices?
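The stylized fact is easy to reproduce numerically; a minimal sketch (not from the slides) comparing the spectrum of a pure-noise correlation matrix with the Marchenko-Pastur upper edge:

```python
# Sketch: eigenvalues of an empirical correlation matrix of pure noise
# stay inside the Marchenko-Pastur bulk [(1-sqrt(q))^2, (1+sqrt(q))^2],
# q = N/T; on real returns a few eigenvalues escape the bulk (signal).
import numpy as np

T, N = 1000, 100                      # T observations of N "assets"
rng = np.random.default_rng(0)
X = rng.standard_normal((T, N))       # i.i.d. noise returns
C = np.corrcoef(X, rowvar=False)      # empirical N x N correlation matrix
eig = np.linalg.eigvalsh(C)

q = N / T
lam_plus = (1 + np.sqrt(q)) ** 2      # upper edge of the bulk, ~1.73 here
print(f"largest eigenvalue: {eig.max():.3f}, MP edge: {lam_plus:.3f}")
```

On real financial data the same comparison shows a few large eigenvalues well outside the bulk, which is precisely the information one wants to keep when filtering.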
Information filtering? Clustering!
Mantegna et al.'s work (1999):
Limits: focuses on ρij (Pearson correlation), which is not robust to outliers / heavy tails → can lead to spurious clusters
Modelling
Asset i's variations (or returns) are modelled by a random variable Xi
Assets' variations (or returns) are "correlated"
i.i.d. observations:
X1 : X1^1, X1^2, ..., X1^T
X2 : X2^1, X2^2, ..., X2^T
...
XN : XN^1, XN^2, ..., XN^T
Which distances d(Xi, Xj) between dependent random variables?
Pitfalls of a basic distance
Let (X, Y) be a bivariate Gaussian vector, with X ∼ N(µX, σX²), Y ∼ N(µY, σY²), and whose correlation is ρ(X, Y) ∈ [−1, 1]. Then

E[(X − Y)²] = (µX − µY)² + (σX − σY)² + 2σXσY(1 − ρ(X, Y)).

Now, consider the following values for the correlation:
ρ(X, Y) = 0, so E[(X − Y)²] = (µX − µY)² + σX² + σY². Assume µX = µY and σX = σY. For σX = σY ≫ 1, we obtain E[(X − Y)²] ≫ 1 instead of the distance 0 expected from comparing two equal Gaussians.
ρ(X, Y) = 1, so E[(X − Y)²] = (µX − µY)² + (σX − σY)².
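The identity is straightforward to check by simulation; a small illustrative sketch (not from the slides):

```python
# Monte Carlo check of
#   E[(X-Y)^2] = (muX-muY)^2 + (sigX-sigY)^2 + 2 sigX sigY (1 - rho)
import numpy as np

muX, muY, sigX, sigY, rho = 1.0, 2.0, 1.5, 0.5, 0.3
cov = [[sigX ** 2, rho * sigX * sigY],
       [rho * sigX * sigY, sigY ** 2]]
rng = np.random.default_rng(42)
X, Y = rng.multivariate_normal([muX, muY], cov, size=1_000_000).T

empirical = np.mean((X - Y) ** 2)
closed_form = (muX - muY) ** 2 + (sigX - sigY) ** 2 \
    + 2 * sigX * sigY * (1 - rho)
print(empirical, closed_form)   # agree up to Monte Carlo error

# The pitfall: for muX = muY, sigX = sigY = sig and rho = 0 (two equal but
# independent Gaussians), the formula gives 2 sig^2, not the expected 0.
```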
Figure: pitfalls of a basic distance (Marti, Nielsen, Very, Donnat, ICMLA 2015)
The Financial Engineer Bias: Correlation
correlation patterns are blatant
Mantegna et al. aim at filtering information from the
correlation matrix using clustering
O(N²) (correlation) vs. O(N) (distribution) parameters
Information Geometry and its statistical distances
original poster: http://www.sonycsl.co.jp/person/nielsen/FrankNielsen-distances-figs.pdf
Sklar’s Theorem and the Copula Transform
Theorem (Sklar's Theorem (1959))
For any random vector X = (X1, ..., XN) having continuous marginal cdfs Pi, 1 ≤ i ≤ N, its joint cumulative distribution P is uniquely expressed as
P(X1, ..., XN) = C(P1(X1), ..., PN(XN)),
where C, a multivariate distribution with uniform marginals, is known as the copula of X.
Definition (The Copula Transform)
Let X = (X1, ..., XN) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi, 1 ≤ i ≤ N. The random vector
U = (U1, ..., UN) := P(X) = (P1(X1), ..., PN(XN))
is known as the copula transform.
The Ui, 1 ≤ i ≤ N, are uniformly distributed on [0, 1] (the probability integral transform): for Pi the cdf of Xi, we have
x = Pi(Pi⁻¹(x)) = Pr(Xi ≤ Pi⁻¹(x)) = Pr(Pi(Xi) ≤ x),
thus Pi(Xi) ∼ U[0, 1].
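In practice the marginal cdfs Pi are unknown, so the copula transform is estimated by normalized ranks; a minimal sketch (the function name is ours):

```python
# Empirical copula transform: replace each observation by its normalized
# rank, a consistent estimate of P_i(X_i), hence approximately U(0, 1].
import numpy as np

def copula_transform(X):
    """X: (T, N) array, T observations of N variables -> ranks / T."""
    T = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0) + 1   # 1..T per column (no ties)
    return ranks / T

rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 3))        # heavy-tailed marginals
U = copula_transform(X)
print(U.min(), U.max())                 # 0.002 1.0 -- uniform grid on (0, 1]
```

The rank transform is invariant to any monotone rescaling of the marginals, which is exactly why it isolates the dependence information.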
Distance Design
dθ²(Xi, Xj) = θ · 3 E[|Pi(Xi) − Pj(Xj)|²] + (1 − θ) · (1/2) ∫ℝ (√(dPi/dλ) − √(dPj/dλ))² dλ
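A possible empirical estimator of dθ, under our own choices (not the authors' code): normalized ranks estimate Pi(Xi) for the dependence term, and a shared-grid histogram estimates the squared Hellinger term; `d_theta` and its parameters are illustrative:

```python
# Sketch of an empirical d_theta: dependence term from ranks, distribution
# term from a histogram estimate of the squared Hellinger distance.
import numpy as np

def d_theta(x, y, theta=0.5, bins=50):
    T = len(x)
    rx = (x.argsort().argsort() + 1) / T           # ~ P_i(X_i)
    ry = (y.argsort().argsort() + 1) / T           # ~ P_j(X_j)
    dep = 3.0 * np.mean((rx - ry) ** 2)            # in [0, 1]

    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    px, edges = np.histogram(x, bins=bins, range=(lo, hi))
    py, _ = np.histogram(y, bins=edges)
    px, py = px / T, py / T                        # bin probabilities
    dist = 0.5 * np.sum((np.sqrt(px) - np.sqrt(py)) ** 2)  # squared Hellinger

    return np.sqrt(theta * dep + (1 - theta) * dist)

rng = np.random.default_rng(1)
z = rng.standard_normal(2000)
x = z + 0.1 * rng.standard_normal(2000)    # strongly comonotonic pair
a, b = rng.standard_normal(2000), rng.uniform(size=2000)
print(d_theta(x, z, theta=1.0))            # small: nearly identical ranks
print(d_theta(a, b, theta=0.0))            # large: very different densities
```

θ = 1 recovers a pure dependence distance (insensitive to marginals), θ = 0 a pure distribution distance (insensitive to dependence).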
Results: Data from Hierarchical Block Model
Adjusted Rand Index
Algo.   Distance       A            B            C
HC-AL   (1 − ρ)/2      0.00 ±0.01   0.99 ±0.01   0.56 ±0.01
        E[(X − Y)²]    0.00 ±0.00   0.09 ±0.12   0.55 ±0.05
        GPR θ = 0      0.34 ±0.01   0.01 ±0.01   0.06 ±0.02
        GPR θ = 1      0.00 ±0.01   0.99 ±0.01   0.56 ±0.01
        GPR θ = .5     0.34 ±0.01   0.59 ±0.12   0.57 ±0.01
        GNPR θ = 0     1            0.00 ±0.00   0.17 ±0.00
        GNPR θ = 1     0.00 ±0.00   1            0.57 ±0.00
        GNPR θ = .5    0.99 ±0.01   0.25 ±0.20   0.95 ±0.08
AP      (1 − ρ)/2      0.00 ±0.00   0.99 ±0.07   0.48 ±0.02
        E[(X − Y)²]    0.14 ±0.03   0.94 ±0.02   0.59 ±0.00
        GPR θ = 0      0.25 ±0.08   0.01 ±0.01   0.05 ±0.02
        GPR θ = 1      0.00 ±0.01   0.99 ±0.01   0.48 ±0.02
        GPR θ = .5     0.06 ±0.00   0.80 ±0.10   0.52 ±0.02
        GNPR θ = 0     1            0.00 ±0.00   0.18 ±0.01
        GNPR θ = 1     0.00 ±0.01   1            0.59 ±0.00
        GNPR θ = .5    0.39 ±0.02   0.39 ±0.11   1
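The Adjusted Rand Index reported in the table measures agreement between a recovered partition and the ground truth, corrected for chance; a from-scratch sketch of the standard formula:

```python
# Adjusted Rand Index: pairwise agreement between two partitions,
# corrected for the agreement expected under random labelling.
from math import comb
from collections import Counter

def adjusted_rand_index(a, b):
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    cab = Counter(zip(a, b))                        # contingency table
    sum_ij = sum(comb(v, 2) for v in cab.values())
    sum_a = sum(comb(v, 2) for v in ca.values())
    sum_b = sum(comb(v, 2) for v in cb.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred  = [0, 0, 1, 1, 1, 1, 2, 2, 2]                 # one point mislabelled
print(adjusted_rand_index(truth, truth))            # 1.0
print(adjusted_rand_index(truth, pred))
```

An ARI of 1 means perfect recovery of the planted clusters; an ARI near 0 means no better than chance, which is how rows like GPR θ = 0 on dataset B should be read.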
Results: Data from CDS market
(Marti, Nielsen, Very, Donnat, ICMLA 2015)
Limits and questions
Why a convex combination? No a priori support from geometry.
In practice:
no real control on the weight of correlation and on the weight of distribution
stability methods for selecting parameters are still prone to overfitting
θ actually depends on the convergence rates of the estimators: correlation measures converge faster than distribution estimates
Multivariate dependence
What is the state of the art on multivariate dependence?
multivariate mutual information: In information theory there have been various attempts over the years to extend the definition of mutual information to more than two random variables. These attempts have met with a great deal of confusion and a realization that interactions among many random variables are poorly understood.
Optimal Copula Transport for intra-dependence
Dintra(X1, X2) := EMD(s1, s2),

EMD(s1, s2) := min over f of Σ_{1≤i,j≤n} ‖pi − qj‖ fij
subject to
fij ≥ 0, 1 ≤ i, j ≤ n,
Σ_{j=1}^{n} fij ≤ wpi, 1 ≤ i ≤ n,
Σ_{i=1}^{n} fij ≤ wqj, 1 ≤ j ≤ n,
Σ_{i=1}^{n} Σ_{j=1}^{n} fij = 1.
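This linear program can be handed to a generic solver; a toy sketch with scipy.optimize.linprog, where the point clouds and weights are illustrative stand-ins for the copula samples s1, s2:

```python
# The transport LP above, solved directly as a generic linear program.
import numpy as np
from scipy.optimize import linprog

def emd(p, wp, q, wq):
    """p: (n, d), q: (m, d) support points; wp, wq: weights summing to 1."""
    n, m = len(p), len(q)
    cost = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2).ravel()
    A_ub, b_ub = [], []
    for i in range(n):                             # sum_j f_ij <= wp_i
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1
        A_ub.append(row); b_ub.append(wp[i])
    for j in range(m):                             # sum_i f_ij <= wq_j
        row = np.zeros(n * m); row[j::m] = 1
        A_ub.append(row); b_ub.append(wq[j])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  A_eq=[np.ones(n * m)], b_eq=[1.0],  # total flow = 1
                  bounds=(0, None))
    return res.fun

rng = np.random.default_rng(0)
s1 = rng.uniform(size=(20, 2))
w = np.full(20, 1 / 20)
print(emd(s1, w, s1.copy(), w))        # ≈ 0: identical clouds
print(emd(s1, w, s1 + 0.5, w))         # ≈ 0.7071: pure (0.5, 0.5) shift
```

The dense LP has n·m variables, which is exactly the scaling problem raised on the next slide; dedicated solvers (network simplex, entropic regularization) are the usual remedy.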
Optimal Copula Transport for inter-dependence
Limits and questions
Does not scale well with even moderate dimensionality:
density estimation
computing cost
Full parametric approach?
How to connect with the (copula, margins) representation?
Information geometry?
(Approximate) optimal transport?
Kernel embedding of distributions?
contact: gautier.marti@helleborecapital.com
Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas
Popat.
NP-hardness of Euclidean sum-of-squares clustering.
Machine Learning, 75(2):245–248, 2009.
Luigi Ambrosio and Nicola Gigli.
A user’s guide to optimal transport.
In Modelling and optimisation of flows on networks, pages
1–155. Springer, 2013.
David Applegate, Tamraparni Dasu, Shankar Krishnan, and
Simon Urbanek.
Unsupervised clustering of multidimensional distributions using
earth mover distance.
In Proceedings of the 17th ACM SIGKDD international
conference on Knowledge discovery and data mining, pages
636–644. ACM, 2011.
Shai Ben-David, Ulrike Von Luxburg, and Dávid Pál.
A sober look at clustering stability.
In Learning theory, pages 5–19. Springer, 2006.
Petro Borysov, Jan Hannig, and JS Marron.
Asymptotics of hierarchical clustering for growing dimension.
Journal of Multivariate Analysis, 124:465–479, 2014.
Leo Breiman and Jerome H Friedman.
Estimating optimal transformations for multiple regression and
correlation.
Journal of the American statistical Association, 80(391):
580–598, 1985.
Joël Bun, Romain Allez, Jean-Philippe Bouchaud, and Marc
Potters.
Rotational invariant estimator for general noisy matrices.
arXiv preprint arXiv:1502.06736, 2015.
Gunnar Carlsson and Facundo Mémoli.
Characterization, stability and convergence of hierarchical
clustering methods.
The Journal of Machine Learning Research, 11:1425–1470,
2010.
Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum,
Anthony Bagnall, Abdullah Mueen, and Gustavo Batista.
The UCR time series classification archive, July 2015.
www.cs.ucr.edu/~eamonn/time_series_data/.
Tamraparni Dasu, Deborah F Swayne, and David Poole.
Grouping multivariate time series: A case study.
In Proceedings of the IEEE Workshop on Temporal Data
Mining: Algorithms, Theory and Applications, in conjunction
with the Conference on Data Mining, Houston, pages 25–32,
2005.
Paul Deheuvels.
La fonction de dépendance empirique et ses propriétés. Un test
non paramétrique d'indépendance.
Acad. Roy. Belg. Bull. Cl. Sci.(5), 65(6):274–292, 1979.
Paul Deheuvels.
An asymptotic decomposition for multivariate distribution-free
tests of independence.
Journal of Multivariate Analysis, 11(1):102–113, 1981.
T Di Matteo, T Aste, ST Hyde, and S Ramsden.
Interest rates hierarchical structure.
Physica A: Statistical Mechanics and its Applications, 355(1):
21–33, 2005.
T Di Matteo, Francesca Pozzi, and Tomaso Aste.
The use of dynamical networks to detect the hierarchical
organization of financial market sectors.
The European Physical Journal B-Condensed Matter and
Complex Systems, 73(1):3–11, 2010.
Francis X Diebold and Canlin Li.
Forecasting the term structure of government bond yields.
Journal of econometrics, 130(2):337–364, 2006.
A Adam Ding and Yi Li.
Copula correlation: An equitable dependence measure and
extension of Pearson's correlation.
arXiv preprint arXiv:1312.7214, 2013.
Bradley Efron.
Bootstrap methods: another look at the jackknife.
The Annals of Statistics, pages 1–26, 1979.
Gal Elidan.
Copulas in machine learning.
In Copulae in Mathematical and Quantitative Finance, pages
39–60. Springer, 2013.
Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyré,
and Jean-François Aujol.
Regularized discrete optimal transport.
Springer, 2013.
Hans Gebelein.
Das statistische problem der korrelation als variations-und
eigenwertproblem und sein zusammenhang mit der
ausgleichsrechnung.
ZAMM-Journal of Applied Mathematics and
Mechanics/Zeitschrift für Angewandte Mathematik und
Mechanik, 21(6):364–379, 1941.
Cyril Goutte, Peter Toft, Egill Rostrup, Finn Å Nielsen, and
Lars Kai Hansen.
On clustering fMRI time series.
NeuroImage, 9(3):298–310, 1999.
Clive WJ Granger and Paul Newbold.
Spurious regressions in econometrics.
Journal of econometrics, 2(2):111–120, 1974.
Isabelle Guyon, Ulrike Von Luxburg, and Robert C Williamson.
Clustering: Science or art.
In NIPS 2009 Workshop on Clustering Theory, 2009.
Jiang Hangjin and Ding Yiming.
Equitability of dependence measure.
stat, 1050:9, 2015.
Keith Henderson, Brian Gallagher, and Tina Eliassi-Rad.
EP-MEANS: An efficient nonparametric clustering of empirical
probability distributions.
2015.
Weiming Hu, Tieniu Tan, Liang Wang, and Steve Maybank.
A survey on visual surveillance of object motion and behaviors.
Systems, Man, and Cybernetics, Part C: Applications and
Reviews, IEEE Transactions on, 34(3):334–352, 2004.
John C Hull.
Options, futures, and other derivatives.
Pearson Education, 2006.
Anil K Jain.
Data clustering: 50 years beyond k-means.
Pattern recognition letters, 31(8):651–666, 2010.
Konstantinos Kalpakis, Dhiral Gada, and Vasundhara
Puttagunta.
Distance measures for effective clustering of ARIMA
time-series.
In Data Mining, 2001. ICDM 2001, Proceedings IEEE
International Conference on, pages 273–280. IEEE, 2001.
M Kanevski, V Timonin, A Pozdnoukhov, and M Maignan.
Evolution of interest rate curve: empirical analysis of patterns
using nonlinear clustering tools.
In European Symposium on Time Series Prediction, 2008.
Leonid Vitalievich Kantorovich.
On the translocation of masses.
In Dokl. Akad. Nauk SSSR, volume 37, pages 199–201, 1942.
Justin B Kinney and Gurinder S Atwal.
Equitability, mutual information, and the maximal information
coefficient.
Proceedings of the National Academy of Sciences, 111(9):
3354–3359, 2014.
Jon M. Kleinberg.
An impossibility theorem for clustering.
In S. Thrun and K. Obermayer, editors, Advances in Neural
Information Processing Systems 15, pages 446–453. MIT
Press, Cambridge, MA, 2002.
URL
http://books.nips.cc/papers/files/nips15/LT17.pdf.
Laurent Laloux, Pierre Cizeau, Marc Potters, and
Jean-Philippe Bouchaud.
Random matrix theory and financial correlations.
International Journal of Theoretical and Applied Finance, 3
(03):391–397, 2000.
Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong,
and Mark Flood.
Clustering techniques and their effect on portfolio formation
and risk analysis.
In Proceedings of the International Workshop on Data Science
for Macro-Modeling, pages 1–6. ACM, 2014.
Erel Levine and Eytan Domany.
Resampling method for unsupervised estimation of cluster
validity.
Neural computation, 13(11):2573–2593, 2001.
T Warren Liao.
Clustering of time series data—a survey.
Pattern recognition, 38(11):1857–1874, 2005.
Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu.
A symbolic representation of time series, with implications for
streaming algorithms.
In Proceedings of the 8th ACM SIGMOD workshop on
Research issues in data mining and knowledge discovery, pages
2–11. ACM, 2003.
Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios
Gunopulos.
Iterative incremental clustering of time series.
In Advances in Database Technology-EDBT 2004, pages
106–122. Springer, 2004.
Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi.
Experiencing SAX: a novel symbolic representation of time
series.
Data Mining and knowledge discovery, 15(2):107–144, 2007.
David Lopez-Paz, Philipp Hennig, and Bernhard Sch¨olkopf.
The randomized dependence coefficient.
arXiv preprint arXiv:1304.7717, 2013.
Rosario N Mantegna.
Hierarchical structure in financial markets.
The European Physical Journal B-Condensed Matter and
Complex Systems, 11(1):193–197, 1999.
Martin Martens and Ser-Huang Poon.
Returns synchronization and daily correlation dynamics
between international stock markets.
Journal of Banking & Finance, 25(10):1805–1827, 2001.
Gautier Marti, Philippe Donnat, Frank Nielsen, and Philippe
Very.
HCMapper: An interactive visualization tool to compare
partition-based flat clustering extracted from pairs of
dendrograms.
arXiv preprint arXiv:1507.08137, 2015a.
Gautier Marti, Philippe Very, and Philippe Donnat.
Toward a generic representation of random variables for
machine learning.
arXiv preprint arXiv:1506.00976, 2015b.
Sergio Mayordomo, Juan Ignacio Peña, and Eduardo S
Schwartz.
Are all credit default swap databases equal?
Technical report, National Bureau of Economic Research,
2010.
Sergio Mayordomo, Juan Ignacio Peña, and Eduardo S
Schwartz.
Are all credit default swap databases equal?
European Financial Management, 20(4):677–713, 2014.
Gaspard Monge.
Mémoire sur la théorie des déblais et des remblais.
De l’Imprimerie Royale, 1781.
James Munkres.
Algorithms for the assignment and transportation problems.
Journal of the Society for Industrial and Applied Mathematics,
5(1):32–38, 1957.
Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo.
Relation between financial market structure and the real
economy: Comparison between clustering methods.
Available at SSRN 2525291, 2014.
Nicolò Musmeci, Tomaso Aste, and Tiziana Di Matteo.
Relation between financial market structure and the real
economy: comparison between clustering methods.
2015.
Roger B Nelsen.
An introduction to copulas, volume 139.
Springer Science & Business Media, 2013.
Dominic O’Kane.
Modelling single-name and multi-name credit derivatives,
volume 573.
John Wiley & Sons, 2011.
Barnabás Póczos, Zoubin Ghahramani, and Jeff Schneider.
Copula-based kernel dependency measures.
arXiv preprint arXiv:1206.4682, 2012.
David N Reshef, Yakir A Reshef, Hilary K Finucane, Sharon R
Grossman, Gilean McVean, Peter J Turnbaugh, Eric S Lander,
Michael Mitzenmacher, and Pardis C Sabeti.
Detecting novel associations in large data sets.
Science, 334(6062):1518–1524, 2011.
David N Reshef, Yakir A Reshef, Pardis C Sabeti, and
Michael M Mitzenmacher.
An empirical study of leading measures of dependence.
arXiv preprint arXiv:1505.02214, 2015a.
Yakir A Reshef, David N Reshef, Hilary K Finucane, Pardis C
Sabeti, and Michael M Mitzenmacher.
Measuring dependence powerfully and equitably.
arXiv preprint arXiv:1505.02213, 2015b.
Yakir A Reshef, David N Reshef, Pardis C Sabeti, and
Michael M Mitzenmacher.
Equitability, interval estimation, and statistical power.
arXiv preprint arXiv:1505.02212, 2015c.
Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas.
The earth mover’s distance as a metric for image retrieval.
International journal of computer vision, 40(2):99–121, 2000.
Daniil Ryabko.
Clustering processes.
arXiv preprint arXiv:1004.5194, 2010.
Ohad Shamir and Naftali Tishby.
Cluster stability for finite samples.
In NIPS, 2007.
Robert H Shumway.
Time-frequency clustering and discriminant analysis.
Statistics & probability letters, 63(3):307–314, 2003.
Noah Simon and Robert Tibshirani.
Comment on "Detecting novel associations in large data sets"
by Reshef et al., Science, Dec 16, 2011.
arXiv preprint arXiv:1401.7645, 2014.
Ashish Singhal and Dale E Seborg.
Clustering of multivariate time-series data.
Journal of Chemometrics, 19:427–438, 2005.
A Sklar.
Fonctions de répartition à n dimensions et leurs marges.
Université Paris 8, 1959.
Won-Min Song, T Di Matteo, and Tomaso Aste.
Hierarchical information clustering by means of topologically
embedded graphs.
PLoS One, 7(3):e31929, 2012.
Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and
Philip S Yu.
Graphscope: parameter-free mining of large time-evolving
graphs.
In Proceedings of the 13th ACM SIGKDD international
conference on Knowledge discovery and data mining, pages
687–696. ACM, 2007.
G´abor J Sz´ekely, Maria L Rizzo, Nail K Bakirov, et al.
Measuring and testing dependence by correlation of distances.
The Annals of Statistics, 35(6):2769–2794, 2007.
Chayant Tantipathananandh and Tanya Y Berger-Wolf.
Finding communities in dynamic social networks.
In Data Mining (ICDM), 2011 IEEE 11th International
Conference on, pages 1236–1241. IEEE, 2011.
Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N
Mantegna.
Cluster analysis for portfolio optimization.
Journal of Economic Dynamics and Control, 32(1):235–258,
2008.
Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and
Rosario N Mantegna.
A tool for filtering information in complex systems.
Proceedings of the National Academy of Sciences of the
United States of America, 102(30):10421–10426, 2005.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna.
Correlation, hierarchies, and networks in financial markets.
Journal of Economic Behavior & Organization, 75(1):40–58,
2010.
C´edric Villani.
Optimal transport: old and new, volume 338.
Springer Science & Business Media, 2008.
Kiyoung Yang and Cyrus Shahabi.
A pca-based similarity measure for multivariate time series.
In Proceedings of the 2nd ACM international workshop on
Multimedia databases, pages 65–74. ACM, 2004.
Kiyoung Yang and Cyrus Shahabi.
On the stationarity of multivariate time series for
correlation-based data analysis.
In Data Mining, Fifth IEEE International Conference on, pages
4–pp. IEEE, 2005.