- The document discusses different geometric approaches for clustering multinomial distributions, including total variation distance, Fisher-Rao distance, Kullback-Leibler divergence, and Hilbert cross-ratio metric.
- It benchmarks k-means and k-center clustering under these four geometries on the probability simplex, finding that clustering in Hilbert geometry performs well and comes with theoretical guarantees.
- The Hilbert cross-ratio metric defines a non-Riemannian Hilbert geometry on the simplex with polytopal balls, and satisfies information monotonicity properties desirable for clustering distributions.
This document discusses different geometric structures and distances that can be used for clustering probability distributions that live on the probability simplex. It reviews four main geometries: Fisher-Rao Riemannian geometry based on the Fisher information metric, information geometry based on Kullback-Leibler divergence, total variation distance and l1-norm geometry, and Hilbert projective geometry based on the Hilbert metric. It compares how k-means clustering performs using distances derived from these different geometries on the probability simplex.
Clustering in Hilbert geometry for machine learning
1. Clustering in Hilbert geometry for machine learning
Frank Nielsen¹, Ke Sun²
¹ Sony Computer Science Laboratories Inc, Japan
² CSIRO, Australia
December 2018
@FrnkNlsn
https://arxiv.org/abs/1704.00454
2. Clustering: Cornerstone of unsupervised learning
Clustering for discovering categories in datasets
Choice of features and distance between features
3. Motivations
Which geometry and distance are appropriate for a space of probability distributions? For a space of correlation matrices?
Information geometry [5] focuses on the differential-geometric structures associated to spaces of parametric distributions...
Benchmark different geometric modelings of the space
4. The probability simplex space
Many tasks in text mining, machine learning, and computer vision deal with multinomial distributions (or mixtures thereof) living on the probability simplex.
Which distance and underlying geometry should we consider for handling the space of multinomials?
(L1): Total-variation distance and ℓ1-norm geometry
(FHR): Fisher-Rao distance and spherical Riemannian geometry
(IG): Kullback-Leibler/f-divergences and information geometry
(HG): Hilbert cross-ratio metric and Hilbert projective geometry
Benchmark these different geometries experimentally by considering k-means/k-center clusterings of multinomials
5. The space of multinomial distributions
Probability simplex (standard simplex):
∆d := {p = (λ_p^0, . . . , λ_p^d) : λ_p^i > 0, ∑_{i=0}^d λ_p^i = 1}
Multinomial distribution X = (X_0, . . . , X_d) ∼ Mult(p = (λ_p^0, . . . , λ_p^d), m) with probability mass function:
Pr(X_0 = m_0, . . . , X_d = m_d) = (m! / ∏_{i=0}^d m_i!) ∏_{i=0}^d (λ_p^i)^{m_i}, with ∑_{i=0}^d m_i = m
Special cases: binomial (d = 1), Bernoulli (d = 1, m = 1), multinoulli/categorical/discrete (m = 1 and d > 1)
(Figure: sampling a mixture ∑_{i=1}^k w_i Mult(p_i) yields categorical data; inference maps a set of normalized histograms H_1, H_2, H_3 to a point set p_1, p_2, p_3 in the probability simplex.)
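The pmf above is easiest to evaluate in log space. A minimal Python sketch (not part of the talk; the function name is mine), with a binomial sanity check:

import math
import numpy as np

def multinomial_pmf(counts, lam):
    # Pr(X_0 = m_0, ..., X_d = m_d) for X ~ Mult(lam, m = sum(counts)),
    # computed via log-gamma for numerical stability.
    m = int(sum(counts))
    log_coef = math.lgamma(m + 1) - sum(math.lgamma(int(mi) + 1) for mi in counts)
    return math.exp(log_coef + float(np.sum(counts * np.log(lam))))

# Binomial check (d = 1): Pr = C(3,2) * 0.7^2 * 0.3 = 0.441
assert abs(multinomial_pmf(np.array([2, 1]), np.array([0.7, 0.3])) - 0.441) < 1e-12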
6. Teaser: k-center clustering in the probability simplex ∆d
(Figure: k-center clustering results for k = 3 and k = 5 clusters under the four geometries.)
L1: Total-variation distance ρL1 (ℓ1-norm geometry)
FHR: Fisher-Hotelling-Rao distance ρFHR (spherical geometry)
IG: Kullback-Leibler divergence ρIG (information geometry)
HG: Hilbert metric ρHG (Hilbert projective geometry)
7. Review of four types of geometry for the probability simplex ∆d
8. Total variation distance and ℓ1-norm geometry
L1 metric distance ρL1:
ρL1(p, q) = ∑_{i=0}^d |λ_p^i − λ_q^i|
Total variation: ρTV(p, q) = (1/2) ∥λ_p − λ_q∥_1 ≤ 1
Satisfies information monotonicity under coarse-graining of the histogram bins (TV is an f-divergence for the generator f(u) = (1/2)|u − 1|):
0 ≤ ρL1(p′, q′) ≤ ρL1(p, q)
for p′ = Ap and q′ = Aq, where A is a positive d′ × d column-stochastic matrix with d′ < d
ℓ1-norm-induced distance: ρL1(p, q) = ∥λ_p − λ_q∥_1
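A minimal NumPy sketch (not from the slides) of ρL1 and ρTV, with an empirical check of the coarse-graining monotonicity; the binning matrix A below is a hypothetical random hard-binning scheme:

import numpy as np

def l1_dist(p, q):
    # rho_L1(p, q) = sum_i |lambda_p^i - lambda_q^i|
    return np.abs(p - q).sum()

def tv_dist(p, q):
    # Total variation: half the L1 distance, always <= 1
    return 0.5 * l1_dist(p, q)

rng = np.random.default_rng(0)
d1, d2 = 10, 4                                   # coarse-grain 10 bins into 4
p, q = rng.dirichlet(np.ones(d1)), rng.dirichlet(np.ones(d1))

A = np.zeros((d2, d1))                           # column-stochastic binning matrix
A[rng.integers(0, d2, size=d1), np.arange(d1)] = 1.0

assert l1_dist(A @ p, A @ q) <= l1_dist(p, q) + 1e-12   # information monotonicity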
9. The shape of ℓ1-balls in ∆d
For trinomials in ∆2 (with λ^2 = 1 − (λ^0 + λ^1)), the ρL1 distance is given by:
ρL1(p, q) = |λ_p^0 − λ_q^0| + |λ_p^1 − λ_q^1| + |λ_q^0 − λ_p^0 + λ_q^1 − λ_p^1|
The ℓ1 distance is polytopal: its unit ball is the cross-polytope Z = conv(±e_i : i ∈ [d]), also called the co-cube (orthoplex)
Shape of ℓ1-balls: Ball_L1(p, r) = (λ_p ⊕ rZ) ∩ H_{∆d}
In 2D, the regular octahedron cut by H_{∆2} yields hexagonal shapes.
10. Visualizing the 1D/2D/3D shapes of L1 balls in ∆d
Illustration courtesy of 'Geometry of Quantum States' [1]
11. Fisher-Hotelling-Rao Riemannian geometry (1930)
Fisher metric tensor [g_ij] (constant positive curvature):
g_ij(p) = δ_ij / λ_p^i + 1 / λ_p^0
Statistical invariance yields the unique Fisher metric [2]
The distance is a geodesic length (Levi-Civita connection ∇^LC with metric-compatible parallel transport)
Rao metric distance between two multinomials on the Riemannian manifold (∆d, g):
ρFHR(p, q) = 2 arccos(∑_{i=0}^d √(λ_p^i λ_q^i))
Can be embedded in the positive orthant of a Euclidean d-sphere of R^{d+1} by using the square-root representation λ → √λ = (√λ^0, . . . , √λ^d)
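A minimal sketch (not from the slides) of ρFHR via the square-root spherical embedding; it assumes the inputs are valid points of the simplex:

import numpy as np

def fhr_dist(p, q):
    # rho_FHR(p, q) = 2 arccos(sum_i sqrt(lambda_p^i lambda_q^i));
    # the clip guards against rounding pushing the dot product above 1.
    return 2.0 * np.arccos(np.clip(np.sqrt(p * q).sum(), -1.0, 1.0))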
12. Kullback-Leibler divergence and information geometry
Kullback-Leibler (KL) divergence ρIG:
ρIG(p, q) = ∑_{i=0}^d λ_p^i log(λ_p^i / λ_q^i)
Satisfies information monotonicity (f-divergence with f(u) = − log u) when coarse-graining ∆d → ∆d′ (with d′ < d): 0 ≤ I_f(p′ : q′) ≤ I_f(p : q). The Fisher-Rao distance ρFHR does not satisfy information monotonicity
Dualistic structure: dually flat manifold (∆d, g, ∇^m, ∇^e) with the exponential connection ∇^e and the mixture connection ∇^m, where ∇^LC = (∇^m + ∇^e)/2; more generally, the expected α-geometry (∆d, g, ∇^{−α}, ∇^α) for α ∈ R. Dual parallel transport that is metric-compatible.
∆d is both an exponential family and a mixture family (= a dually flat space)
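A minimal sketch (not from the slides) of ρIG; it assumes both points lie in the open simplex so that no coordinate of q vanishes:

import numpy as np

def kl_div(p, q):
    # rho_IG(p, q) = sum_i lambda_p^i log(lambda_p^i / lambda_q^i); asymmetric in p, q
    return float(np.sum(p * np.log(p / q)))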
13. Hilbert cross-ratio metric & geometry
Log cross-ratio distance (a metric satisfying the triangle inequality), where A and A′ denote the two intersection points of the line (MM′) with the boundary of the domain:
ρHG(M, M′) = log(|A′M| |AM′| / (|A′M′| |AM|)) if M ≠ M′, and 0 if M = M′
Euclidean line segments are geodesics (but not unique: Busemann's G-space)
Hilbert geometry (HG) holds for any open bounded convex domain (e.g., ∆d, or the elliptope of correlation matrices)
14. Euclidean shape of balls in Hilbert projective geometry
Hilbert balls in ∆d: hexagonal shapes in 2D [10], rhombic dodecahedra in 3D, and in dimension d polytopes [4] with d(d + 1) facets.
(Figure: Hilbert balls at distance levels 1.5, 3.0, 4.5, 6.0.)
Not a Riemannian/differential geometry, because infinitesimally small balls do not have ellipsoidal shapes (cf. the Tissot indicatrix in Riemannian geometry).
A video explaining the shapes of Hilbert balls:
https://www.youtube.com/watch?v=XE5x5rAK8Hk&t=1s
15. Geodesics are not unique in ∆d
(Figure: the region R(p, q) of points lying on some geodesic between p and q.)
R(p, q) := {r : ρHG(p, q) = ρHG(p, r) + ρHG(q, r)}
⇒ Geodesics are unique when the boundary ∂Ω is strictly smooth convex
16. Invariance of Hilbert geometries under collineations
(Figure: a collineation H maps the simplex ∆ to H(∆) and the points p, q to H(p), H(q), preserving the Hilbert distance.)
ρHG^∆(p, q) = ρHG^{H(∆)}(H(p), H(q))
The cross-ratio is invariant under perspective transformations.
Automorphism groups (= 'motions') are well-studied
17. Hilbert geometry generalizes Cayley-Klein hyperbolic geometry
Defined for a quadric domain
(curved elliptical/hyperbolic Mahalanobis distances [6])
Video: https://www.youtube.com/watch?v=YHJLq3-RL58
18. Computing Hilbert metric distance in general
Let M (at t0 ≤ 0) and M′ (at t1 ≥ 1) be the intersection points of the line (1 − t)p + tq with ∂∆d.
Expression using t0 and t1:
ρHG(p, q) = log [(1 − t0) t1] / [(−t0)(t1 − 1)] = log(1 − 1/t0) − log(1 − 1/t1).
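This construction translates directly into code; a sketch assuming interior points of the simplex (on ∆d the result agrees with the Birkhoff log-ratio formula of the next slide):

```python
import numpy as np

def hilbert_distance(p, q, eps=1e-12):
    """Hilbert simplex distance from the two intersection points M (t0 <= 0)
    and M' (t1 >= 1) of the line (1-t)p + tq with the simplex boundary.
    Assumes p, q are strictly positive and sum to one."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    diff = p - q
    mask = np.abs(diff) > eps       # coordinates that vary along the line
    if not mask.any():
        return 0.0
    t = p[mask] / diff[mask]        # coordinate i vanishes at t = p_i/(p_i - q_i)
    t0 = t[t <= 0].max()            # boundary intersection on the p side
    t1 = t[t >= 1].min()            # boundary intersection on the q side
    return float(np.log(1 - 1 / t0) - np.log(1 - 1 / t1))
```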
19. Relationship with Birkhoff projective geometry
Instead of considering the (d − 1)-dimensional simplex ∆d, consider the pointed cone R^d_{+,*}.
Birkhoff projective metric on positive histograms:
ρBG(p̃, q̃) := log max_{i,j∈[d]} (p̃_i q̃_j)/(p̃_j q̃_i)
is scale-invariant: ρBG(αp̃, βq̃) = ρBG(p̃, q̃) for any α, β > 0.
[Figure: the cone R^2_{+,*} with rays through p̃, q̃ meeting the simplex ∆1 at p, q; ρBG(p̃, q̃) agrees with ρHG(p, q).]
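A sketch of the Birkhoff projective metric, using the equivalent log-ratio form max_i log(p̃_i/q̃_i) − min_i log(p̃_i/q̃_i); the trailing check illustrates the scale invariance:

```python
import numpy as np

def birkhoff_distance(ptilde, qtilde):
    """Birkhoff projective metric on positive histograms:
    log max_{i,j} (p_i q_j)/(p_j q_i), i.e. the spread of the log-ratios.
    Assumes strictly positive entries; inputs need not be normalized."""
    r = np.log(np.asarray(ptilde, float)) - np.log(np.asarray(qtilde, float))
    return float(r.max() - r.min())

# Scale invariance: rho_BG(a*p, b*q) = rho_BG(p, q) for any a, b > 0.
p, q = np.array([1.0, 2.0, 3.0]), np.array([2.0, 1.0, 1.0])
print(birkhoff_distance(p, q), birkhoff_distance(5 * p, 0.3 * q))
```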
20. Hilbert simplex distance satisfies the information monotonicity
Birkhoff proved (1957):
ρBG(Mp̃, Mq̃) ≤ κ(M) × ρBG(p̃, q̃)
where κ(M) is the contraction ratio of the binning scheme:
κ(M) := (√a(M) − 1)/(√a(M) + 1) < 1,  a(M) := max_{i,j,k,l} (M_{i,k} M_{j,l})/(M_{j,k} M_{i,l}) < ∞.
A first example of a non-separable distance that preserves information monotonicity!
21. Shape of Hilbert balls in an open simplex
A ball in the Hilbert simplex geometry has a Euclidean polytopal shape with d(d + 1) facets [4]
In 2D, Hilbert balls are hexagons (2(2 + 1) = 6 facets) whose shape varies with the center location. In 3D, the balls are rhombic dodecahedra.
When the domain is not simplicial, Hilbert balls can have varying combinatorial complexities [10]
No metric tensor in Hilbert geometry, because infinitesimally the ball shapes are polytopes and not ellipsoids.
Hilbert geometry can be studied via Finsler geometry
22. Isometry of Hilbert simplex geometry with a normed vector space: (∆d, ρHG) ≅ (V^d, ∥·∥NH)
V^d = {v ∈ R^{d+1} : ∑_i v_i = 0} ⊂ R^{d+1}
Map p = (λ0, . . . , λd) ∈ ∆d to v(p) = (v0, . . . , vd) ∈ V^d:
v_i = (1/(d+1)) (d log λ_i − ∑_{j≠i} log λ_j) = log λ_i − (1/(d+1)) ∑_j log λ_j.
λ_i = exp(v_i) / ∑_j exp(v_j).
Norm ∥·∥NH in V^d defined by the shape of its unit ball
BV = {v ∈ V^d : |v_i − v_j| ≤ 1, ∀i ≠ j}.
Polytopal norm-induced distance:
ρV(v, v′) = ∥v − v′∥NH = inf{τ : v′ ∈ τBV ⊕ {v}},
Norm does not satisfy the parallelogram law (no inner product)
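A sketch of this isometry: the centered log-map, the polytopal norm, and a numerical check that ∥v(p) − v(q)∥NH matches the Hilbert simplex distance (computed here in its Birkhoff log-ratio form):

```python
import numpy as np

def to_Vd(p):
    """Centered log-representation mapping the open simplex into
    V^d = {v in R^{d+1} : sum_i v_i = 0}."""
    logp = np.log(np.asarray(p, float))
    return logp - logp.mean()

def norm_NH(v):
    """Polytopal norm whose unit ball is {v : |v_i - v_j| <= 1, i != j}:
    the variation norm max_i v_i - min_i v_i on V^d."""
    return float(np.max(v) - np.min(v))

# Isometry check: the normed distance equals the Hilbert simplex distance.
p, q = np.array([0.2, 0.3, 0.5]), np.array([0.1, 0.6, 0.3])
r = np.log(p) - np.log(q)
print(norm_NH(to_Vd(p) - to_Vd(q)), r.max() - r.min())  # same value
```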
28. k-means++ clustering (seeding initialization)
For an arbitrary distance D, define the k-means objective function (NP-hard to minimize):
ED(Λ, C) = (1/n) ∑_{i=1}^{n} min_{j∈{1,...,k}} D(p_i : c_j)
k-means++: Pick the first seed c1 uniformly at random, and then iteratively choose the (k − 1) remaining seeds according to the following probability distribution:
Pr(c_j = p_i) = D(p_i, {c1, . . . , c_{j−1}}) / ∑_{i′=1}^{n} D(p_{i′}, {c1, . . . , c_{j−1}})  (2 ≤ j ≤ k),
where D(p_i, {c1, . . . , c_{j−1}}) denotes the distance from p_i to its closest already-chosen seed.
A randomized seeding initialization of k-means that can be further locally optimized using Lloyd's batch iterative updates, Hartigan's single-point swap heuristic [9], etc.
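A generic seeding sketch for an arbitrary dissimilarity `dist` (for the guarantee of the next slide, take D to be a squared metric distance such as ρHG²; the function names are ours):

```python
import numpy as np

def kmeanspp_seeding(X, k, dist, rng=None):
    """Generic k-means++ seeding. X is an (n, d) array of points;
    dist(x, y) is any dissimilarity, e.g. the squared Hilbert distance."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = [rng.integers(n)]                        # first seed: uniform at random
    d = np.array([dist(x, X[idx[0]]) for x in X])  # D(p_i, {c_1})
    for _ in range(k - 1):
        idx.append(rng.choice(n, p=d / d.sum()))   # sample prop. to D(p_i, C)
        d = np.minimum(d, [dist(x, X[idx[-1]]) for x in X])
    return X[idx]
```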
29. Performance analysis of k-means++ [8]
Let κ1 and κ2 be two constants such that κ1 defines the quasi-triangular inequality property:
D(x : z) ≤ κ1 (D(x : y) + D(y : z)), ∀x, y, z ∈ ∆d,
and κ2 handles the symmetry inequality:
D(x : y) ≤ κ2 D(y : x), ∀x, y ∈ ∆d.
Theorem
The generalized k-means++ seeding guarantees with high probability a configuration C of cluster centers such that:
ED(Λ, C) ≤ 2κ1²(1 + κ2)(2 + log k) E*_D(Λ, k)
30. Performance analysis of k-means++ in a metric space
In any metric space (X, d), k-means++ with respect to the squared metric distance d² is 16(2 + log k)-competitive.
Proof: The distance is symmetric (κ2 = 1). For the quasi-triangular inequality property, square the triangle inequality:
d(p, r) ≤ d(p, q) + d(q, r),
d²(p, r) ≤ (d(p, q) + d(q, r))²,
d²(p, r) ≤ d²(p, q) + d²(q, r) + 2 d(p, q) d(q, r).
Let us apply the inequality of arithmetic and geometric means:
√(d²(p, q) d²(q, r)) ≤ (d²(p, q) + d²(q, r)) / 2.
Thus we have
d²(p, r) ≤ d²(p, q) + d²(q, r) + 2 d(p, q) d(q, r) ≤ 2(d²(p, q) + d²(q, r)).
That is, the squared metric distance satisfies the 2-approximate triangle inequality, so κ1 = 2. With κ1 = 2 and κ2 = 1, the bound 2κ1²(1 + κ2)(2 + log k) = 16(2 + log k) follows.
31. Performance analysis of k-means++
In any normed space (X, ∥·∥), k-means++ with D(x, y) = ∥x − y∥² is 16(2 + log k)-competitive.
In any inner-product space (X, ⟨·, ·⟩), k-means++ with D(x, y) = ⟨x − y, x − y⟩ is 16(2 + log k)-competitive.
Hilbert simplex geometry is isometric to a normed vector space [4] (Theorem 3.3):
(∆d, ρHG) ≃ (V^d, ∥·∥NH)
Theorem
k-means++ seeding in a Hilbert simplex geometry in fixed dimension is 16(2 + log k)-competitive.
32. Clustering in ∆2 with k-means++
[Figure: k-means++ clustering results in ∆2 with k = 3 clusters and k = 5 clusters.]
33. k-Center clustering
The cost function for a k-center clustering with centers C (|C| = k) is:
fD(Λ, C) = max_{p_i∈Λ} min_{c_j∈C} D(p_i : c_j)
For Hilbert simplex clustering, consider the equivalent normed space:
r*_HG = min_{c∈∆d} max_{i∈{1,...,n}} ρHG(p_i, c),
r*_NH = min_{v∈V^d} max_{i∈{1,...,n}} ∥v_i − v∥NH.
34. k-Center clustering: farthest first traversal heuristic
Guaranteed approximation factor of 2 in any metric space [3]
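A sketch of this farthest-first traversal for an arbitrary metric `dist`:

```python
import numpy as np

def farthest_first_traversal(X, k, dist, rng=None):
    """Gonzalez's farthest-first heuristic for k-center clustering:
    a guaranteed 2-approximation in any metric space [3]."""
    rng = np.random.default_rng(rng)
    idx = [rng.integers(X.shape[0])]               # arbitrary first center
    d = np.array([dist(x, X[idx[0]]) for x in X])
    for _ in range(k - 1):
        idx.append(int(np.argmax(d)))              # farthest point so far
        d = np.minimum(d, [dist(x, X[idx[-1]]) for x in X])
    return X[idx]
```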
35. Smallest Enclosing Ball (SEB)
Given a finite point set {p1, . . . , pn} ⊂ ∆d, the SEB in Hilbert simplex geometry is centered at
c* = arg min_{c∈∆d} max_{i∈{1,...,n}} ρHG(c, p_i),
with radius
r* = min_{c∈∆d} max_{i∈{1,...,n}} ρHG(c, p_i).
The decision problem can be solved by Linear Programming (LP), and the optimization problem as an LP-type problem [7]
These exact methods do not scale well in high dimensions
36. Fast approximations for the smallest enclosing ball
Start from an initial point, compute the farthest point from the current center, displace the center along a geodesic toward that farthest point, and repeat!
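A sketch of this iterative scheme for the Hilbert simplex SEB, with the usual 1/(t + 1) step size along the segment toward the farthest point (a Badoiu-Clarkson-style heuristic; our simplified reading, not necessarily the paper's exact schedule):

```python
import numpy as np

def hilbert_minimax_center(X, iters=300):
    """Approximate smallest enclosing Hilbert ball center of the rows of X.
    Euclidean segments are Hilbert geodesics, so each step is a convex
    combination and the center stays inside the simplex."""
    def rho(p, q):  # Hilbert simplex distance (Birkhoff log-ratio form)
        r = np.log(p) - np.log(q)
        return r.max() - r.min()
    c = X.mean(axis=0)                             # start at the centroid
    for t in range(1, iters + 1):
        far = X[np.argmax([rho(c, x) for x in X])] # current farthest point
        c = c + (far - c) / (t + 1)                # step along the segment
    return c
```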
37. Convergence rate
[Figure: two convergence plots of Hilbert distance vs. #iterations (0 to 300), for the configurations d = 9 and d = 255 with n = 10 and n = 100.]
Hilbert distance between the current minimax center and the true center (left), or their Hilbert distance divided by the Hilbert radius of the dataset (right). The plot is based on 100 random points in ∆9/∆255.
38. Examples of smallest enclosing balls in HG and HNG
[Figure: smallest enclosing balls shown in the simplex (left) and after isometry to the normed vector space (right); 3 points lie on the border of the ball.]
39. Examples of smallest enclosing balls in HG and HNG
[Figure: smallest enclosing balls shown in the simplex (left) and after isometry to the normed vector space (right); 2 points lie on the border of the ball.]
40. Visualizing SEBs in the four geometries (1)
[Figure: smallest enclosing balls of Point Cloud 1 under the four geometries: Riemannian (FHR) center, IG center, Hilbert center, and L1 center.]
41. Visualizing smallest enclosing balls (2)
[Figure: smallest enclosing balls of Point Cloud 2 under the four geometries: Riemannian (FHR) center, IG center, Hilbert center, and L1 center.]
42. Visualizing smallest enclosing balls (3)
[Figure: smallest enclosing balls of Point Cloud 3 under the four geometries: Riemannian (FHR) center, IG center, Hilbert center, and L1 center.]
43. Quantitative experiments: Protocol
Pick uniformly at random a center c = (λ^0_c, . . . , λ^d_c) ∈ ∆d.
Any random sample p = (λ0, . . . , λd) associated with c is independently generated by
λ_i = exp(log λ^i_c + σ ε_i) / ∑_{j=0}^{d} exp(log λ^j_c + σ ε_j),
where σ > 0 is the noise-level parameter, and each ε_i independently follows a standard Gaussian distribution (generator 1) or a Student's t-distribution with 5 degrees of freedom (generator 2).
Perform clustering for the configurations n ∈ {50, 100}, d ∈ {9, 255}, σ ∈ {0.5, 0.9}, ρ ∈ {ρFHR, ρIG, ρHG, ρEUC, ρL1}. The number of clusters k is set to the ground truth.
Repeat the clustering experiment on 300 different random datasets. The performance is measured by the normalized mutual information (NMI; the larger, the better).
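A sketch of this generator (parameter names are ours; the softmax is stabilized by subtracting the row maximum):

```python
import numpy as np

def sample_cluster(center, n, sigma, rng, student=False):
    """Draw n simplex points around `center` with multiplicative noise in
    the log domain: generator 1 uses standard Gaussian noise, generator 2
    uses Student's t noise with 5 degrees of freedom."""
    dim = len(center)
    eps = rng.standard_t(5, size=(n, dim)) if student \
          else rng.standard_normal((n, dim))
    logits = np.log(center) + sigma * eps
    w = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
center = rng.dirichlet(np.ones(10))   # uniform random center in the simplex
X = sample_cluster(center, n=50, sigma=0.5, rng=rng)
```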
49. Discussion
Experimentally, we observe that
HG > FHR > IG
HG performs even better for large amounts of noise
Intuitively, HG balls are more compact and better capture the clustering structure
Not surprisingly, Euclidean geometry gets the poorest score!
55. Experiments: k-means++ on correlation matrices
n = 100 3 × 3 correlation matrices with k = 3 clusters (points in the elliptope C3, clusters of almost identical size)
Each cluster is independently generated according to
P ∼ W⁻¹(I_{3×3}, ν1),  C_i ∼ W⁻¹(P, ν2),
where W⁻¹(A, ν) is the inverse Wishart distribution with scale matrix A and ν degrees of freedom, and C_i is a point in the cluster associated with P.
NMI over 500 independent runs:

ν1  ν2   ρHG          ρEUC         ρL1          ρLD
4   10   0.62 ± 0.22  0.57 ± 0.21  0.56 ± 0.22  0.58 ± 0.22
4   30   0.85 ± 0.18  0.80 ± 0.20  0.81 ± 0.19  0.82 ± 0.20
4   50   0.89 ± 0.17  0.87 ± 0.17  0.86 ± 0.18  0.88 ± 0.18
5   10   0.50 ± 0.21  0.49 ± 0.21  0.48 ± 0.20  0.47 ± 0.21
5   30   0.77 ± 0.20  0.75 ± 0.21  0.75 ± 0.21  0.75 ± 0.21
5   50   0.84 ± 0.19  0.82 ± 0.19  0.82 ± 0.20  0.84 ± 0.18
56. Experiments with Thompson metric on covariance matrices
Thompson metric: ρTG(P, Q) := max_i |log λ_i(P⁻¹Q)|.
Matrix P_i belongs to a cluster c generated as P_i = Q_c diag(L_i) Q_c⊤ + σ A_i A_i⊤, where Q_c is a random matrix with orthonormal columns, L_i follows a gamma distribution, the entries of (A_i)_{d×d} are iid. standard Gaussian, and σ is the noise-level parameter.

L_i      σ    ρIG          ρrIG         ρsIG         ρTG
Γ(2, 1)  0.1  0.81 ± 0.09  0.82 ± 0.10  0.81 ± 0.10  0.81 ± 0.09
Γ(2, 1)  0.3  0.51 ± 0.11  0.54 ± 0.11  0.53 ± 0.11  0.52 ± 0.11
Γ(5, 1)  0.1  0.93 ± 0.07  0.93 ± 0.08  0.92 ± 0.08  0.92 ± 0.08
Γ(5, 1)  0.3  0.70 ± 0.10  0.72 ± 0.12  0.71 ± 0.10  0.70 ± 0.11
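A sketch of the Thompson metric via generalized eigenvalues, using SciPy's `eigh` on the pencil (Q, P) so that the returned values are the eigenvalues of P⁻¹Q:

```python
import numpy as np
from scipy.linalg import eigh

def thompson_distance(P, Q):
    """Thompson metric on SPD matrices: max_i |log lambda_i(P^{-1} Q)|,
    computed from the generalized eigenproblem Q v = lambda P v."""
    lam = eigh(Q, P, eigvals_only=True)   # eigenvalues of P^{-1} Q, all > 0
    return float(np.max(np.abs(np.log(lam))))
```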
58. Summary and conclusion [11]
Introduced Hilbert/Birkhoff geometry for the probability simplex (= space of categorical distributions) and the elliptope (= space of correlation matrices), but also the simplotope (product of simplices), etc.
Hilbert simplex geometry is isometric to a normed vector space (holds for any Hilbert polytopal geometry)
A non-separable distance that satisfies the information monotonicity (contraction ratio of a linear operator)
Novel closed-form Hilbert/Birkhoff distance formulas for correlation matrices (metric) and covariance matrices (projective metric)
Benchmarked experimentally four types of geometry for k-means and k-center clusterings
Hilbert/Birkhoff geometry is experimentally well-suited for clustering
59. References I
[1] Ingemar Bengtsson and Karol Życzkowski. Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, 2017.
[2] L. L. Campbell. An extended Čencov characterization of the information metric. Proceedings of the American Mathematical Society, 98(1):135–141, 1986.
[3] Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985.
[4] Bas Lemmens and Roger Nussbaum. Birkhoff's version of Hilbert's metric and its applications in analysis. Handbook of Hilbert Geometry, pages 275–303, 2014.
[5] Frank Nielsen. An elementary introduction to information geometry. ArXiv e-prints, August 2018.
[6] Frank Nielsen, Boris Muzellec, and Richard Nock. Classification with mixtures of curved Mahalanobis metrics. In IEEE International Conference on Image Processing (ICIP), pages 241–245, 2016.
[7] Frank Nielsen and Richard Nock. On the smallest enclosing information disk. Information Processing Letters, 105(3):93–97, 2008.
[8] Frank Nielsen and Richard Nock. Total Jensen divergences: Definition, properties and k-means++ clustering, 2013. arXiv:1309.7109 [cs.IT].
60. References II
[9] Frank Nielsen and Richard Nock. Further heuristics for k-means: The merge-and-split heuristic and the (k, l)-means. arXiv preprint arXiv:1406.6314, 2014.
[10] Frank Nielsen and Laëtitia Shao. On balls in a polygonal Hilbert geometry. In 33rd International Symposium on Computational Geometry (SoCG 2017). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.
[11] Frank Nielsen and Ke Sun. Clustering in Hilbert simplex geometry. CoRR, abs/1704.00454, 2017.