This document discusses curved Mahalanobis distances in Cayley-Klein geometries and their application to classification. Specifically:
1. It introduces Mahalanobis distances and generalizes them to curved distances in Cayley-Klein geometries, which can model both elliptic and hyperbolic geometries.
2. It describes how to learn these curved Mahalanobis metrics using an adaptation of Large Margin Nearest Neighbors (LMNN) to the elliptic and hyperbolic cases.
3. Experimental results on several datasets show that curved Mahalanobis distances can achieve comparable or better classification accuracy than standard Mahalanobis distances.
Classification with mixtures of curved Mahalanobis metrics
1. Classification with mixtures of curved Mahalanobis metrics
— or LMNN in Cayley-Klein geometries —
arXiv:1609.07082
Frank Nielsen (1,2), Boris Muzellec (1), Richard Nock (3,4,5)
(1) Ecole Polytechnique, France (2) Sony CSL, Japan (3) Data61, Australia (4) ANU, Australia (5) The University of Sydney, Australia
26th September 2016
2. Mahalanobis distances
For Q ≻ 0, a symmetric positive definite matrix such as a covariance matrix, define the Mahalanobis distance:
D_Q(p, q) = √((p − q)^T Q (p − q))
Metric distance (law of indiscernibles / symmetry / triangle inequality)
E.g., Q = precision matrix Σ^{-1}, where Σ = covariance matrix
Generalizes the Euclidean distance when Q = I: D_I(p, q) = ‖p − q‖
Mahalanobis distance interpreted as a Euclidean distance after the Cholesky decomposition Q = L^T L and the affine transformation x ← Lx:
D_Q(p, q) = D_I(Lp, Lq) = ‖Lp − Lq‖
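As a quick illustration (ours, not from the slides), here is a minimal numpy sketch of the equivalence between the Mahalanobis distance and the Euclidean distance after the factor map x ← Lx; all variable names are ours:

```python
import numpy as np

def mahalanobis(p, q, Q):
    """D_Q(p, q) = sqrt((p - q)^T Q (p - q)) for a symmetric positive definite Q."""
    d = p - q
    return np.sqrt(d @ Q @ d)

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])              # symmetric positive definite
L = np.linalg.cholesky(Q).T             # Q = L^T L
p, q = np.array([1.0, 2.0]), np.array([0.0, -1.0])

# Mahalanobis distance = Euclidean distance after the affine map x <- L x.
assert np.isclose(mahalanobis(p, q, Q), np.linalg.norm(L @ p - L @ q))
```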
4. Cayley-Klein geometry: Projective geometry [7, 3]
RP^d: (λx, λ) ∼ (x, 1)
Homogeneous coordinates x → x̃ = (x, w = 1), and dehomogenization by "perspective division" x̃ → x/w
The cross-ratio measure is invariant under projectivities/homographies/collineations:
(p, q; P, Q) = (|p − P| |q − Q|) / (|p − Q| |q − P|)
where p, q, P, Q are collinear
[Figure: four collinear points P, p, q, Q on a line]
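A small check (ours, not from the slides) of the cross-ratio of four collinear points and of its multiplicative property along a line; the helper names are hypothetical:

```python
import numpy as np

def cross_ratio(p, q, P, Q):
    """(p, q; P, Q) = (|p - P| |q - Q|) / (|p - Q| |q - P|) for collinear points."""
    n = np.linalg.norm
    return (n(p - P) * n(q - Q)) / (n(p - Q) * n(q - P))

def point(t):
    """A point on the line t -> (t, 2t)."""
    return np.array([t, 2.0 * t])

p, q, P, Q, r = point(0.2), point(0.7), point(-1.0), point(1.5), point(0.4)

# Multiplicative property used later for the additivity of CK distances:
assert np.isclose(cross_ratio(p, q, P, Q),
                  cross_ratio(p, r, P, Q) * cross_ratio(r, q, P, Q))
```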
5. Definition of Cayley-Klein geometries
A Cayley-Klein geometry is K = (F, c_dist, c_angle):
1. A fundamental conic F
2. A constant unit c_dist ∈ ℂ for measuring distances
3. A constant unit c_angle ∈ ℂ for measuring angles
See monograph [7]
6. Distance in Cayley-Klein geometries
dist(p, q) = c_dist Log((p, q; P, Q))
where P and Q are the intersection points of the line l = (pq) (l̃ = p̃ × q̃ in 2D) with the conic.
Log is the principal complex logarithm (modulo 2πi)
[Figure: line l through p and q, intersecting the fundamental conic F at P and Q]
7. Key properties of Cayley-Klein distances
dist(p, p) = 0 (law of indiscernibles)
Signed distances: dist(p, q) = −dist(q, p)
When p, q, r are collinear: dist(p, q) = dist(p, r) + dist(r, q)
Geodesics in Cayley-Klein geometries are straight lines (possibly clipped to the conic domain)
The logarithm transfers the multiplicative property of the cross-ratio to the additive property of Cayley-Klein distances. When r is collinear with p, q, P, Q:
(p, q; P, Q) = (p, r; P, Q) · (r, q; P, Q)
8. Dual conics
In projective geometry, points and lines are dual concepts
Dual parameterizations of the fundamental conic F = (A, A^Δ)
Quadratic form Q_A(x) = x̃^T A x̃
Primal conic = set of border points: C_A = {p̃ : Q_A(p̃) = 0}
Dual conic = set of tangent hyperplanes: C*_A = {l̃ : Q_{A^Δ}(l̃) = 0}
A^Δ = A^{-1}|A| is the adjoint matrix
The adjoint can be computed even when A is not invertible (|A| = 0)
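The adjoint (adjugate) can be computed from cofactors, so it exists even when |A| = 0; a small numpy sketch (ours):

```python
import numpy as np

def adjugate(A):
    """Adjoint matrix A^Delta: transpose of the cofactor matrix, so that
    A @ adjugate(A) = |A| I; it is defined even when A is singular."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [5.0, 6.0, 0.0]])
# For invertible A this agrees with A^{-1} |A|, as on the slide.
assert np.allclose(adjugate(A), np.linalg.inv(A) * np.linalg.det(A))
```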
9. Taxonomy
Signature of a matrix = signs of the eigenvalues in its eigendecomposition

Type                   A          A^Δ        Conic
Elliptic               (+, +, +)  (+, +, +)  non-degenerate complex conic
Hyperbolic             (+, +, −)  (+, +, −)  non-degenerate real conic
Dual Euclidean         (+, +, 0)  (+, 0, 0)  two complex lines with a real intersection point
Dual pseudo-Euclidean  (+, −, 0)  (+, 0, 0)  two real lines with a double real intersection point
Euclidean              (+, 0, 0)  (+, +, 0)  two complex points with a double real line passing through them
Pseudo-Euclidean       (+, 0, 0)  (+, −, 0)  two real points with a double real line passing through them
Galilean               (+, 0, 0)  (+, 0, 0)  double real line with a real intersection point

Degenerate cases are obtained as limits of non-degenerate cases.
Measurements can be elliptic, hyperbolic, or parabolic (degenerate case).
10. Real CK distances without cross-ratio expressions
For real Cayley-Klein measures, we choose the constants (κ is the curvature):
Elliptic (κ > 0): c_dist = κ/(2i)
Hyperbolic (κ < 0): c_dist = −κ/2
Bilinear form: S_pq = p̃^T S q̃ with p̃ = (p, 1)
Get rid of the cross-ratio using:
(p, q; P, Q) = (S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))
11. Elliptic Cayley-Klein metric distance
d_E(p, q) = (κ/(2i)) Log[(S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))]
d_E(p, q) = κ arccos(S_pq / √(S_pp S_qq))
Notice that d_E(p, q) < κπ; the domain is D_S = R^d in the elliptic case.
[Figure: gnomonic projection mapping x, y in the plane to x′, y′ on the sphere; d_E(x, y) = κ · arccos(⟨x′, y′⟩)]
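A minimal sketch (ours) evaluating the elliptic Cayley-Klein distance directly from the bilinear form S; no domain checks are performed:

```python
import numpy as np

def bilinear(S, p, q):
    """S_pq = p~^T S q~ with homogeneous coordinates p~ = (p, 1)."""
    return np.append(p, 1.0) @ S @ np.append(q, 1.0)

def elliptic_ck_distance(S, p, q, kappa=1.0):
    """d_E(p, q) = kappa * arccos(S_pq / sqrt(S_pp S_qq)), for kappa > 0 and S positive definite."""
    spq, spp, sqq = bilinear(S, p, q), bilinear(S, p, p), bilinear(S, q, q)
    c = np.clip(spq / np.sqrt(spp * sqq), -1.0, 1.0)   # guard against rounding
    return kappa * np.arccos(c)

S = np.diag([1.0, 1.0, 1.0])                            # positive definite S
p, q = np.array([0.3, -0.2]), np.array([-0.1, 0.5])
print(elliptic_ck_distance(S, p, q))                    # > 0, and 0 when p = q
```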
13. Decomposition of the bilinear form [1]
Write S = [Σ a; a^T b] = S_{Σ,a,b} with Σ ≻ 0.
S_{p,q} = p̃^T S q̃ = p^T Σ q + p^T a + a^T q + b
Let µ = −Σ^{-1} a ∈ R^d (so a = −Σµ) and b = µ^T Σ µ + sign(κ)/κ², where
κ = (b − µ^T Σ µ)^{-1/2} if b > µ^T Σ µ
κ = −(µ^T Σ µ − b)^{-1/2} if b < µ^T Σ µ
Then the bilinear form writes as:
S(p, q) = S_{Σ,µ,κ}(p, q) = (p − µ)^T Σ (q − µ) + sign(κ)/κ²
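A small sketch (ours) assembling S_{Σ,µ,κ} from (Σ, µ, κ) and checking the recentred form above:

```python
import numpy as np

def make_S(Sigma, mu, kappa):
    """S = [[Sigma, a], [a^T, b]] with a = -Sigma mu and b = mu^T Sigma mu + sign(kappa)/kappa^2."""
    a = -Sigma @ mu
    b = mu @ Sigma @ mu + np.sign(kappa) / kappa**2
    d = Sigma.shape[0]
    S = np.zeros((d + 1, d + 1))
    S[:d, :d] = Sigma
    S[:d, d] = S[d, :d] = a
    S[d, d] = b
    return S

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
mu, kappa = np.array([0.5, -1.0]), 0.8
S = make_S(Sigma, mu, kappa)

p, q = np.array([1.0, 0.0]), np.array([-0.2, 0.7])
lhs = np.append(p, 1.0) @ S @ np.append(q, 1.0)
rhs = (p - mu) @ Sigma @ (q - mu) + np.sign(kappa) / kappa**2
assert np.isclose(lhs, rhs)   # S(p, q) = (p - mu)^T Sigma (q - mu) + sign(kappa)/kappa^2
```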
14. Curved Mahalanobis metric distances
We have [1]:
lim_{κ→0+} D_{Σ,µ,κ}(p, q) = lim_{κ→0−} D_{Σ,µ,κ}(p, q) = D_Σ(p, q)
Mahalanobis distance: D_Σ(p, q) = D_{Σ,0,0}(p, q)
Thus hyperbolic/elliptic Cayley-Klein distances can be interpreted as curved Mahalanobis distances, or κ-Mahalanobis distances
When S = diag(1, 1, ..., 1, −1), we recover the canonical hyperbolic distance [5] in the Cayley-Klein model:
D_h(p, q) = arccosh[(1 − ⟨p, q⟩) / √((1 − ⟨p, p⟩)(1 − ⟨q, q⟩))]
defined in the interior of the unit ball.
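A minimal sketch (ours) of the canonical hyperbolic Cayley-Klein distance on the open unit ball:

```python
import numpy as np

def hyperbolic_ck_distance(p, q):
    """D_h(p, q) = arccosh((1 - <p, q>) / sqrt((1 - <p, p>)(1 - <q, q>)));
    p and q must lie strictly inside the unit ball."""
    c = (1.0 - p @ q) / np.sqrt((1.0 - p @ p) * (1.0 - q @ q))
    return np.arccosh(max(c, 1.0))          # guard against rounding at p = q

p, q = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
print(hyperbolic_ck_distance(p, q))          # > 0
print(hyperbolic_ck_distance(p, p))          # 0.0
```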
15. Cayley-Klein bisectors are affine
Bisector Bi(p, q):
Bi(p, q) = {x ∈ D_S : dist_S(p, x) = dist_S(x, q)}
S(p, x)/√(S(p, p)) = S(q, x)/√(S(q, q))
since arccos and arccosh are monotonically increasing functions.
⟨x, √|S(p, p)| Σq − √|S(q, q)| Σp⟩ + √|S(p, p)| (a^T(q + x) + b) − √|S(q, q)| (a^T(p + x) + b) = 0
Hyperplanes (restricted to the domain)
16. Cayley-Klein Voronoi diagrams are affine
Can be computed from equivalent (clipped) power diagrams [2, 5]
https://www.youtube.com/watch?v=YHJLq3-RL58
19. Large Margin Nearest Neighbors (LMNN) [8]
Learn a Mahalanobis distance M = L^T L ⪰ 0 for a given input data-set P
Distance of each point to its target neighbors shrinks: pull(L)
S = {(x_i, x_j) : y_i = y_j and x_j ∈ N(x_i)}
Keep a distance margin of each point to its impostors: push(L)
R = {(x_i, x_j, x_l) : (x_i, x_j) ∈ S and y_i ≠ y_l}
http://www.cs.cornell.edu/~kilian/code/lmnn/lmnn.html
20. LMNN: Cost function and optimization
Objective cost function [8]: convex and piecewise linear (SDP)
pull(L) = Σ_{i, i→j} ‖L(x_i − x_j)‖²,
push(L) = Σ_{i, i→j} Σ_l (1 − y_il) [1 + ‖L(x_i − x_j)‖² − ‖L(x_i − x_l)‖²]_+,
ε(L) = (1 − µ) pull(L) + µ push(L)
i → j: x_j is a target neighbor of x_i
y_il = 1 iff x_i and x_l have the same label, y_il = 0 otherwise
µ set by cross-validation
Optimize by gradient descent: L_{t+1} = L_t − γ ∂ε(L_t)/∂L
∂ε/∂L = (1 − µ) Σ_{i, i→j} C_ij + µ Σ_{(i,j,l)∈R_t} (C_ij − C_il)
where C_ij = (x_i − x_j)(x_i − x_j)^T
Easy: no projection mechanism is needed, unlike for Mahalanobis Metric for Clustering (MMC) [9]
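A schematic single gradient step following the gradient expression above (ours, not the authors' code); target pairs and impostor triplets are assumed given, and no PSD projection or active-set bookkeeping is shown:

```python
import numpy as np

def outer_diff(xi, xj):
    """C_ij = (x_i - x_j)(x_i - x_j)^T."""
    d = xi - xj
    return np.outer(d, d)

def lmnn_gradient(M, X, target_pairs, triplets, mu):
    """(1 - mu) * sum_{i -> j} C_ij + mu * sum over triplets with an active hinge of (C_ij - C_il)."""
    def dist2(i, j):
        d = X[i] - X[j]
        return d @ M @ d
    grad = np.zeros_like(M)
    for i, j in target_pairs:                        # pull term
        grad += (1.0 - mu) * outer_diff(X[i], X[j])
    for i, j, l in triplets:                         # push term (active impostors only)
        if 1.0 + dist2(i, j) - dist2(i, l) > 0.0:
            grad += mu * (outer_diff(X[i], X[j]) - outer_diff(X[i], X[l]))
    return grad

# Toy data: two classes, two target pairs, two candidate impostor triplets.
X = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.0], [2.1, 1.9]])
target_pairs = [(0, 1), (2, 3)]
triplets = [(0, 1, 2), (2, 3, 0)]
M, gamma = np.eye(2), 0.01
M = M - gamma * lmnn_gradient(M, X, target_pairs, triplets, mu=0.5)
```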
22. Hyperbolic Cayley-Klein LMNN (new case)
To ensure that S keeps the correct signature (1, d, 0) during the LMNN gradient descent, we decompose S = L^T D L (with L ≻ 0) and perform a gradient descent on L with the following gradient:
∂d_H(x_i, x_j)/∂L = (k / √(S_ij² − S_ii S_jj)) D L [(S_ij / S_ii) C_ii + (S_ij / S_jj) C_jj − (C_ij + C_ji)]
Recall two difficulties of the hyperbolic case compared to the elliptic case:
The hyperbolic Cayley-Klein distance may be very large (unbounded vs. < κπ for the elliptic case)
The data-set should be contained inside the compact domain D_S
23. HCK-LMNN: Initialization and learning rate
Initialize L = diag(L′, 1) and D so that P ⊂ D_S, with Σ^{-1} = L′^T L′ (e.g., the precision matrix of P).
D = diag(−1, ..., −1, κ max_x ‖L′x‖²) with κ > 1.
At iteration t, it may happen that P ⊄ D_{S_t} since we do not know the optimal learning rate γ. When this happens, we reduce γ ← γ/2; otherwise we let γ ← 1.01γ.
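A schematic version of this step-size rule (ours); the domain test and the update callback are placeholders, and the sign convention (data points satisfy x̃^T S x̃ > 0, as implied by the initialization above) is an assumption:

```python
import numpy as np

def in_domain(S, X):
    """Placeholder test: every point x of X satisfies x~^T S x~ > 0 (x~ = (x, 1)),
    the sign convention implied by the initialization above, i.e. X stays in D_S."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    return bool(np.all(np.einsum('ij,jk,ik->i', Xh, S, Xh) > 0.0))

def adaptive_step(S, X, update, gamma):
    """One HCK-LMNN iteration with the slide's rule: if the update keeps X in the
    domain, accept it and grow gamma by 1%; otherwise halve gamma and retry.
    Assumes a small enough gamma always yields a feasible update."""
    S_new = update(S, gamma)
    if in_domain(S_new, X):
        return S_new, 1.01 * gamma
    while not in_domain(S_new, X):
        gamma *= 0.5
        S_new = update(S, gamma)
    return S_new, gamma
```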
25. Spectral decomposition and fast proximity queries
Avoid computing d_E or d_H for an arbitrary S
Apply a spectral decomposition (elliptic case S = L^T L, hyperbolic case S = L^T D L) and perform the corresponding change of coordinates so that we can use the canonical metric distances:
d_E(x′, y′) = arccos(⟨x′, y′⟩ / (‖x′‖ ‖y′‖)),
d_H(x′, y′) = arccosh[(1 − ⟨x′, y′⟩) / √((1 − ⟨x′, x′⟩)(1 − ⟨y′, y′⟩))]
Proximity queries: e.g., vantage point tree data-structures [10, 6] (with metric pruning).
26. Mixed curved Mahalanobis distance
d(x, y) = α d_E(x, y) + (1 − α) d_H(x, y)
1. A sum of Riemannian metric distances is a metric ("blending" positive with negative constant curvatures)
2. Mix of a bounded distance (elliptic CK) with an unbounded distance (hyperbolic CK); the hyperparameter α is tuned

Dataset   Mahalanobis  Elliptic  Hyperbolic  Mixed   α      β = (1 − α)
Wine      0.993        0.984     0.893       0.986   0.741  0.259
Sonar     0.733        0.788     0.640       0.802   0.794  0.206
Balance   0.846        0.910     0.904       0.920   0.440  0.560
Pima      0.709        0.712     0.699       0.720   0.584  0.416
Vowel     0.827        0.825     0.816       0.841   0.407  0.593

Although the mixed CK distance is a Riemannian metric distance, it is not of constant curvature.
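A tiny self-contained sketch (ours) of this convex combination, using the canonical elliptic and hyperbolic expressions from the earlier slides; the α value is the one reported for Wine in the table above:

```python
import numpy as np

def d_elliptic(x, y):
    """Canonical elliptic CK distance: arccos(<x, y> / (|x| |y|))."""
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

def d_hyperbolic(x, y):
    """Canonical hyperbolic CK distance on the open unit ball."""
    c = (1.0 - x @ y) / np.sqrt((1.0 - x @ x) * (1.0 - y @ y))
    return np.arccosh(max(c, 1.0))

def d_mixed(x, y, alpha):
    """Mixed curved Mahalanobis distance: alpha * d_E + (1 - alpha) * d_H."""
    return alpha * d_elliptic(x, y) + (1.0 - alpha) * d_hyperbolic(x, y)

x, y = np.array([0.2, 0.1]), np.array([-0.1, 0.3])
print(d_mixed(x, y, alpha=0.741))
```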
28. Contributions and perspectives
Study of Cayley-Klein elliptic/hyperbolic geometries:
Affine bisectors, Voronoi diagrams from (clipped) power diagrams, Cayley-Klein balls (Mahalanobis shapes with displaced centers), etc.
Classification with Large Margin Nearest Neighbor (LMNN) in Cayley-Klein elliptic/hyperbolic geometries (hyperbolic geometry: compact domain & unbounded distance)
Experiments on mixed Cayley-Klein distances
Ongoing work: extensions of Cayley-Klein geometries to Machine Learning
30. Yanhong Bi, Bin Fan, and Fuchao Wu.
Beyond Mahalanobis metric: Cayley-Klein metric learning.
In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
Jean-Daniel Boissonnat, Frank Nielsen, and Richard Nock.
Bregman Voronoi diagrams.
Discrete & Computational Geometry, 44(2):281–307, 2010.
C. Gunn.
Geometry, Kinematics, and Rigid Body Mechanics in Cayley-Klein Geometries.
PhD thesis, Technische Universität Berlin, 2011.
Frank Nielsen, Boris Muzellec, and Richard Nock.
Classification with mixtures of curved Mahalanobis metrics.
In IEEE International Conference on Image Processing (ICIP), pages 241–245, Sept 2016.
Frank Nielsen and Richard Nock.
Hyperbolic Voronoi diagrams made easy.
In IEEE International Conference on Computational Science and Its Applications (ICCSA), pages
74–80, 2010.
Frank Nielsen, Paolo Piro, and Michel Barlaud.
Bregman vantage point trees for efficient nearest neighbor queries.
In 2009 IEEE International Conference on Multimedia and Expo, pages 878–881. IEEE, 2009.
Jürgen Richter-Gebert.
Perspectives on Projective Geometry: A Guided Tour Through Real and Complex Geometry.
Springer Publishing Company, Incorporated, 1st edition, 2011.
Kilian Q. Weinberger, John Blitzer, and Lawrence K. Saul.
Distance metric learning for large margin nearest neighbor classification.
In NIPS. MIT Press, 2006.
31. Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, and Stuart Russell.
Distance metric learning, with application to clustering with side-information.
In Advances in neural information processing systems 15, pages 505–512. MIT Press, 2003.
Peter N. Yianilos.
Data structures and algorithms for nearest neighbor search in general metric spaces.
In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’93,
pages 311–321, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.
32. Overview
Review of Mahalanobis distances
Basics of Cayley-Klein geometry
Distance from cross-ratio measures
Distance expressions
Dual conics
Cayley-Klein distances as curved Mahalanobis distances
Computational geometry in Cayley-Klein geometries
Learning curved Mahalanobis metrics
Large Margin Nearest Neighbors (LMNN)
Elliptic Cayley-Klein LMNN
Hyperbolic Cayley-Klein LMNN
Experimental results
Nearest-neighbor classification in Cayley-Klein geometries
Mixed curved Mahalanobis distance
Contributions and perspectives
Bibliography
Supplemental information
33. Properties of the cross-ratio
(p, p; P, Q) = 1
(p, q; Q, P) = 1 / (p, q; P, Q)
(p, q; P, Q) = (p, r; P, Q) · (r, q; P, Q) when r is collinear with p, q, P, Q
[Figure: five collinear points P, p, r, q, Q on a line]
34. Measuring angles in Cayley-Klein geometries
angle(l, m) = c_angle log((l, m; L, M))
where L and M are the tangent lines to the conic A passing through the intersection point p (p = l × m in 2D) of l and m.
[Figure: lines l and m meeting at p, with tangent lines L and M to the fundamental conic F]
36. Cayley-Klein Voronoi diagrams from (clipped) power diagrams
c_i = (Σ p_i + a) / (2 √(S_{p_i p_i}))
r_i² = ‖Σ p_i + a‖² / (4 S_{p_i p_i}) + (a^T p_i + b) / √(S_{p_i p_i})
37. Cayley-Klein balls have Mahalanobis ball shapes
Elliptic Cayley-Klein ball case:
Σ′ = r̃² Σ − a′ a′^T
c′ = Σ′^{-1} (b′ a′ − r̃² a)
r′² = b′² − r̃² b + ⟨c′, c′⟩_{Σ′}
with
r̃ = √(S_{c,c}) cos(r)
a′ = Σ c + a
b′ = a^T c + b
38. Cayley-Klein balls have Mahalanobis ball shapes
Hyperbolic Cayley-Klein ball case:
Σ′ = a′ a′^T − r̃² Σ
c′ = Σ′^{-1} (r̃² a − b′ a′)
r′² = r̃² b − b′² + ⟨c′, c′⟩_{Σ′}
with
r̃ = √(S_{c,c}) cosh(r)
a′ = Σ c + a
b′ = a^T c + b
... and drawing a Mahalanobis ball amounts to drawing a Euclidean ball after the affine transformation x ← Lx.
39. Spectral decomposition and signature
Eigenvalue decomposition: S = O Λ O^T, with Λ = diag(Λ_{1,1}, ..., Λ_{d+1,d+1})
Canonical decomposition: S = O D^{1/2} diag(1, ..., 1, λ) D^{1/2} O^T, where λ ∈ {−1, 1} and O is an orthogonal matrix (O^{-1} = O^T)
The diagonal matrix D has all positive entries, with D_{i,i} = Λ_{i,i} and D_{d+1,d+1} = |Λ_{d+1,d+1}|
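A numpy sketch (ours) of this canonical decomposition, assuming at most one eigenvalue is negative:

```python
import numpy as np

def canonical_decomposition(S):
    """Return (O, D, lam) with S = O D^{1/2} diag(1, ..., 1, lam) D^{1/2} O^T,
    D positive diagonal, ordering eigenvalues so a possible negative one comes last."""
    evals, O = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]          # descending: negative eigenvalue goes last
    evals, O = evals[order], O[:, order]
    lam = 1.0 if evals[-1] > 0 else -1.0
    D = np.diag(np.abs(evals))
    return O, D, lam

S = np.diag([2.0, 1.0, -3.0])                # hyperbolic-type signature (+, +, -)
O, D, lam = canonical_decomposition(S)
J = np.eye(S.shape[0])
J[-1, -1] = lam
assert np.allclose(O @ np.sqrt(D) @ J @ np.sqrt(D) @ O.T, S)
print(lam)                                   # -1.0 -> hyperbolic-type signature
```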