SlideShare a Scribd company logo
1 of 56
Download to read offline
Computational Information Geometry
on Matrix Manifolds
Frank Nielsen
Frank.Nielsen@acm.org
www.informationgeometry.org
Sony Computer Science Laboratories, Inc.

July 2013, ICTP, Trieste, IT

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

1/56
Geometry of matrix manifolds...
◮

Euclidean geometry, Fr¨benius norm → distance:
o
M

2
F

2
mij =

=
i ,j

Mi ∗
i

2
2

=

M∗j

2
2

= tr(M ⊤ M)

j

◮

Riemannian geometry of symmetric positive definite (SPD)
matrices [9, 2]

◮

Riemannian geometry of rank-deficient positive semi-definite
(SPSD) matrices
Stiefel/Grassman manifolds [3]

◮

Quantum geometry: SPD matrices with unit trace
“One geometry cannot be more true than another;
it can only be more convenient”,
— Jules Henri Poincar´ (1902)
e

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

2/56
Forthcoming conference (GSI)
28th-30th August, Paris.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

3/56
What is Computational Information Geometry?
◮

What is Information? = Essence of data (datum=“thing”)
(make it tangible → e.g., parameters of generative models)

◮

Can we do Intrinsic computing?
(unbiased by any particular “data representation” → same
results after recoding data)

◮

Geometry −→ Science of invariance
(mother of Science, compass & ruler, Descartes
analytic=coordinate/Cartesian, imaginaries, ...).
...the open-ended poetic mathematics!

?!

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

4/56
Rationale for Computational Information Geometry
◮

Information is ...never void! → lower bounds
◮
◮
◮
◮

◮

Geometry:
◮

◮

◮

Fisher information and Cram´r-Rao lower bound (estimation)
e
Bayes error and Chernoff information (classification)
Coding and Shannon entropy (communication)
Program and Kolmogorov complexity (compression).
(Unfortunately not computable!)

Language (point, line, ball, dimension, orthogonal, projection,
geodesic, immersion, etc.)
Power of characterization (eg., intersection of two
pseudo-segments not admitting closed-form expression)

Computing: Information computing. Seeking for mathematical
convenience and mathematical tricks (RKHS in ML).
How to manipulate “space of functions” ?!?

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

5/56
Example I: Matrix manifold
Pattern = Gaussian mixture models (universal class)
Statistical (dis)similarity/distance: total Bregman divergence
(tBD, tKL).
Invariance: ..., xi ∼ N(µi , Σi ), y = A(x) = Lx + t,
yi ∼ N(Lµi + t, LΣi L⊤ ), D(X1 : X2 ) = D(Y1 : Y2 )
(L: any invertible affine transformation, t a translation)

Shape Retrieval using Hierarchical Total Bregman Soft Clustering [7],
IEEE PAMI, 2012.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

6/56
Example II: Matrix manifolds
DTI: diffusion ellipsoids, tensor interpolation.
Pattern = zero-centered “Gaussians“
Statistical (dis)similarity/distance: total Bregman divergence
(tBD, tKL).
Invariance: ..., D(A⊤ PA : A⊤ QA) = D(P : Q), A ∈ SL(d):
orthogonal matrix
(volume/orientation preserving)
total Bregman divergence (tBD).

(3D rat corpus callosum)
c

Total Bregman Divergence and its Applications to DTI Analysis [20],
IEEE TMI. 2011. Science Laboratories, Inc.
2013 Frank Nielsen, Sony Computer

7/56
Example III: Gaussian manifolds
Consider 5D Gaussian Mixture Models (GMMs) of color images
(image=RGBxy point set)

A Gaussian mixture model
wi N(µi , Σi ) is interpreted as a
weighted point set {θi = (µi , Σi )}.
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

8/56
Matrix center points & clustering
Aggregation (matrix quantization for codebooks):
Given a data-set of matrices M = {M1 , ..., Mn } ⊂ M, compute a
center matrix C .
Centering as a variational minimization problem:
wi distancep (C , Mi )

(OPT ) : Cp = arg min

C ∈M

i

Notion of centrality, robustness to outliers?
For diagonal matrices, with “Euclidean” distance, usual geometric
center points:
◮

◮
◮

median (p = 1): robust to outliers (Fermat-Weber point, no
closed form),
centroid (p = 2): breakdown point of 1 (→ tBD)),
circumcenter (lim p → ∞): minimize farthest point
(minimax [1]).

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

9/56
Diffusion Tensor Magnetic Resonance Imaging
DT-MRI: Measures anisotropic diffusion of water molecules in a
3 × 3 tensor assigned to each voxel position (1990˜).
Used to analyze in-vivo connectivity patterns of brain tissues:
gray matter, white matter (corpus callosum) and cerebrospinal
fluid (CSF)

c Image courtesy Peter J. Basser
(Magnetic resonance imaging of the brain and spine, Chapter 31)
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

10/56
Gradiometry tensor: 3 × 3 SPSD matrices
Beyond the “constant” g ≃ 9.81m/s 2 . Gravity field measuring
anisotropy.

→ Oil & gas industry.
Courtesy of BellGeo.
http://www.bellgeo.com/tech/technology_theory_of_FTG.html
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

11/56
Structure tensors in computer vision
→ Pioneered in image processing: tensor descriptor of a region at
a pixel. (Harris-Stephens [6]).
Consider a kernel, and compute the tensor descriptor
I ′2 (x)

T (p = (x, y )) = K ∗

I ′ (y )I ′ (x)

I ′ (x)I ′ (y )
I ′ (y )2

,

w (u, v )∇I (u, v )(∇I (u, v ))T

=
u,v

K : uniform, Gaussian kernel (eg., s × s window W centered at the
pixel p)
I ′ (x), I ′ (y ): gradient, derivatives of the image.
Versatile method: corner detection, optical flow estimation,
segmentation, stereo matching, etc.
→ Tensor image processing
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

12/56
Harris-Stephens structure tensor (1988)
Deformation tensor field

Harris-Stephens combined corner-edge detector:
R = det T − k(tr T )2
→ Measures of tensor anisotropy.
Structure tensor represents local orientation
(eigenvectors/eigenvalues).
Harris-Stephens’ combined corner/edge detector (note)
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

13/56
Matrix with Fr¨benius metric distance
o
Matrix space M with vectorial structure
dE (P, Q) =

P −Q

=

F

tr(P − Q)T (P − Q)

Centroid of tensors:
1
CE =
n

(1)
(2)

n

wi Ti
i =1

→ scalar average of each element of the tensor.
Tensor Field Segmentation Using Region Based Active Contour
Model [21], ECCV, 2004.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

14/56
Matrix vectorization & computational geometry
Computational geometry on w × h-dimensional matrix spaces wrt
Fr¨benius distance amounts to computational geometry on
o
Euclidean vector space for D = w × h.
→ Voronoi diagrams, smallest enclosing ball, minimum spanning
tree, etc.
For symmetric matrices, we have D = d(d+1) degrees of freedom,
2
and vectorize as follows:
d

M

F

d
2
mij

=
i =1 j=1

d−1

d

d
2
mij

2
mii + 2

=
i =1

i =1 j=i +1

= m 2
√
√
√
with m = [m11 ...mdd 2m12 2m1d ... 2md−1,d ]T = M.
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

15/56
Matrix functions

From the spectral decomposition:
M = UDU ⊤
with D = λ(M) = diag(λ1 , ..., λd ) the diagonal matrix of
eigenvalues, consider real-valued function x → f (x) to extend to
matrices as
f (M) = U diag(f (λ1 ), ..., f (λd )) U T
1

Examples: log x, exp x, |x|, x 2 , x 2 , etc.
O(d 3 ) SVD factorization complexity.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

16/56
Riemannian cone of SPD matrices
Exponential maps from tangent planes (symmetric matrices Sym)
to the manifold cone C:
expP

: TP C = Sym → C

Logarithmic maps from manifold cone C to tangent planes:
logP

: C → TP C = Sym
1

1

1

1

logP (Q) = P 2 log(P − 2 QP − 2 )P 2

Map any point Q ∈ Sym++ to unique tangent vector at P such
that γ0 = P and γ1 = Q.
Geodesic equation:
1

1

1

γt (P, Q) = P 2 P − 2 QP − 2

t

1

P2

Geodesic (metric length) distance:
1

1

dR (P, Q) = log P − 2 QP − 2
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

17/56
Riemannian Karcher centroid
d

dR (P, Q) =

log2 λi

tr log2 (P −1 Q) =
i =1

=

log P

1
−2

QP

1
−2

, where the λi ’s are the eigenvalues of P −1 Q.
1
1
(P −1 Q = Q 2 P −1 Q 2 )
Unique mean characterized by n=1 log(Ti−1 CR ) = 0
i
Closed-form solution only for n = 2:
1

1

1

1
2

1

CR (P, Q) = P 2 P − 2 QP − 2 P 2 otherwise iterative
approximation (CR = limt→∞ Ct ):
Ct+1 = Ct exp
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

1
n

n

log Ct−1 Ti

.

i =1
18/56
Riemannian minimax SPD center (circumcenter [1])
Case of p = ∞, center that minimizes the maximum distance.
GEO-ALG:
Starts with c1 ∈ P and iteratively update the current
1
circumcenter as follows: ci +1 = Geodesic(ci , fi , i +1 ),
where fi denotes the farthest point of P to ci , and
Geodesic(p, q, t) denotes the intermediate point m
on the geodesic passing through p and q such that
ρ(p, m) = t × ρ(p, q).
Geodesic:

1

1

1

γt (P, Q) = P 2 P − 2 QP − 2

t

1

P2

Find t such that d=1 log2 λt = t 2 d=1 log2 λi = r 2
i
i
i
That is t = r .
Prove core-set and guaranteed convergence.
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

d
2
i =1 log λi .

19/56
Matrices as parameters in probability distributions
Exponential families: Gaussian, Wishart, etc.:
p(x; λ) = pF (x; θ) = exp ( t(x), θ − F (θ) + k(x)) .
Example: Poisson distribution
p(x; λ) =

λx
exp(−λ),
x!

◮

the sufficient statistic t(x) = x,

◮

θ = log λ, the natural parameter,

◮

F (θ) = exp θ, the log-normalizer → CONVEX,

◮

and k(x) = − log x! the carrier measure
(with respect to the counting measure).

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

20/56
Gaussians as an exponential family
p(x; λ) = p(x; µ, Σ) =

1
(x − µ)T Σ−1 (x − µ))
√
exp −
2
2π det Σ

1
θ = (Σ−1 µ, 2 Σ−1 ) ∈ Θ = Rd × Kd×d , with Kd×d cone of
positive definite matrices,
◮ F (θ) = 1 trθ −1 θ1 θ T − 1 log det θ2 + d log π → CONVEX
1
2
4
2
2
◮ t(x) = (x, −x T x),
◮ k(x) = 0.
Inner product : composite, sum of a dot product and a matrix
trace :
T ′
T ′
θ, θ ′ = θ1 θ1 + trθ2 θ2 .
◮

The coordinate transformation τ : Λ → Θ is given for λ = (µ, Σ)
by
τ (λ) =

1
λ−1 λ1 , λ−1 ,
2
2 2

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

τ −1 (θ) =

1 −1 1 −1
θ θ1 , θ2
2 2
2
21/56
Convex duality: Legendre transformation
◮

For a strictly convex and differentiable function F : X → R:
F ∗ (y ) = sup { y , x − F (x)}
x∈X

lF (y ;x);
◮

Maximum obtained for y = ∇F (x):
∇x lF (y ; x) = y − ∇F (x) = 0 ⇒ y = ∇F (x)

◮

Maximum unique from convexity of F (∇2 F ≻ 0):
∇2 lF (y ; x) = −∇2 F (x) ≺ 0
x

◮

Convex conjugates:
(F , X ) ⇔ (F ∗ , Y),

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

Y = {∇F (x) | x ∈ X }
22/56
Legendre duality: Geometric interpretation
Consider the epigraph of F as a convex object:
◮ convex hull (V -representation), versus
◮ half-space (H-representation).

Legendre transform also called “slope” transform.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

23/56
Legendre duality & Canonical divergence
◮

◮
◮

Convex conjugates have functional inverse gradients
∇F −1 = ∇F ∗
∇F ∗ may require numerical approximation
(not always available in analytical closed-form)
Involution: (F ∗ )∗ = F with ∇F ∗ = (∇F )−1 .
Convex conjugate F ∗ expressed using (∇F )−1 :
F ∗ (y ) =
=

◮

x, y − F (x), x = ∇y F ∗ (y )

(∇F )−1 (y ), y − F ((∇F )−1 (y ))

Fenchel-Young inequality at the heart of canonical divergence:
F (x) + F ∗ (y ) ≥ x, y
AF (x : y ) = AF ∗ (y : x) = F (x) + F ∗ (y ) − x, y ≥ 0

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

24/56
Dual Bregman divergences & canonical divergence [14]
p(x)
≥0
q(x)
= BF (θQ : θP ) = BF ∗ (ηP : ηQ )

KL(P : Q) = EP log

= F (θQ ) + F ∗ (ηP ) − θQ , ηP

= AF (θQ : ηP ) = AF ∗ (ηP : θQ )

with θQ (natural parameterization) and ηP = EP [t(X )] = ∇F (θP )
(moment parameterization).
1
1
dx − p(x) log
dx
KL(P : Q) = p(x) log
q(x)
p(x)
H × (P:Q)

H(p)=H × (P:P)

Shannon cross-entropy and entropy of EF [14]:
H × (P : Q) = F (θQ ) − θQ , ∇F (θP ) − EP [k(x)]
H(P) = F (θP ) − θP , ∇F (θP ) − EP [k(x)]

H(P) = −F ∗ (ηP ) − EP [k(x)]
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

25/56
Bregman divergence: Geometric interpretation (I)
Potential function F , graph plot F : (x, F (x)).
DF (p : q) = F (p) − F (q) − p − q, ∇F (q)

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

26/56
Bregman divergence: Geometric interpretation (II)
Potential function f , graph plot F : (x, f (x)).
Bf (p||q) = f (p) − f (q) − (p − q)f ′ (q)

Bf (.||q): vertical distance between the hyperplane Hq tangent to
F at lifted point q , and the translated hyperplane at p .
ˆ
ˆ
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

27/56
Bregman divergence: Geometric interpretation (III)
Bregman divergence and path integrals
B(θ1 : θ2 ) = F (θ1 ) − F (θ2 ) − θ1 − θ2 , ∇F (θ2 ) ,

(3)

θ1

=
θ2
η2

=
η1
∗

∇F (t) − ∇F (θ2 ), dt ,

(4)

∇F ∗ (t) − ∇F ∗ (η1 ), dt ,

(5)

= B (η2 : η1 )

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

(6)

28/56
Matrix Bregman divergences [4, 16]
Choose F a real-valued functional generator and extend F to
matrices:
F (X ) = tr(Ψ(X ))
tF ,k N k

Ψ(X ) =
k≥0

(tF ,k from the Taylor expansion of real-valued F )
BF (P : Q) = F (P) − F (Q) − tr((P − Q)⊤ ∇F (Q)),
∇F (X ) =

′
tF ,k N k
k≥0

′
(tF ,k from the Taylor expansion of real-valued F ′ )

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

29/56
Matrix Bregman divergences [16]

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

30/56
Particular case: Bregman Schatten p-divergences [5, 16]

Schatten p-norm of real symmetric matrix X :
(unitarily invariant matrix norms)
X

p

= λ(X )

p

Bregman generator:

1
X 2
p
2
Used in regularized convex optimization [5], matrix data
mining [16].
F (X ) =

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

31/56
Matrix Legendre transformation

Extends classical Legendre-Fenchel transformation:
F ∗ (η) =

sup
spec(θ)⊆dom(F )

tr(θη ⊤ ) − F (θ)

DF (θP : θQ ) = DF ∗ (ηQ : ηP ) = F (θ) + F ∗ (η) − tr(θη ⊤ )
θ and η are dual matrix coordinate systems on the matrix manifold.

Non-metric differential structure with dual coordinate systems.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

32/56
Bregman matrix means
BF (X , P) = F (X ) − F (P) − tr((X − P)T ∇F (P)),
F (·): strictly convex and differentiable function on an open convex
space.
n

C = ∇F −1

i =1

wi ∇F (Ti )

quasi-arithmetic mean for ∇F .
Since BF (X , P) = BF (P, X ), define a right-sided centroid M ′ :
Find the center of mass [13] (independent of generator F )
F (X ) = tr(X T X ): the quadratic matrix entropy,
F (X ) = − log det X : the matrix Burg entropy, and
F (X ) = tr(X log X − X ): the von Neumann entropy [19, 18, 15]
(Umegaki quantum relative entropy).
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

33/56
Total Bregman divergences (tBD)
Instead of ”vertical” projection in Bregman divergence, consider
perpendicular projection.
(Analogy with least squares and total least squares regression.)

tBF (P, Q) =

BF (P, Q)
1 + ∇F (Q)

2

→ proven statistically robust.
Applications to robust DT-MRI segmentation [8].
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

34/56
Matrix Jensen/Burbea-Rao divergences [10]

Convexity gap defines a divergence
BRF (P, Q) =

◮
◮
◮
◮

F (P) + F (Q)
−F
2

P +Q
2

≥0

F (X ) = tr(X T X ): the quadratic matrix entropy,
F (X ) = − log det X : the matrix Burg entropy, and

F (X ) = tr(X log X − X ): the von Neumann entropy.
etc.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

35/56
Smooth family of convex generators [12, 17]
1-parameter family of generators:
Fα (X ) =

1
tr(αX − X α + (1 − α)I ), α = {0, 1}
α(1 − α)

Bα (P : Q) =

∇Fα (X ) =

1
tr(Q α − P α + αQ α−1 (P − Q))
α(1 − α)

1
(I − X α−1 )
α−1

1

−1
∇Fα (X ) = (I − (α − 1)X ) α−1

When α → 1, ∇Fα (X ) = ∇F1 (X ) = log X . When α → 0,
∇Fα (X ) = ∇F0 (X ) = X −1 − I .
◮

α = 2: Quadratic matrix information

◮

α → 1: von Neumann information

◮

α → 0: Burg log-det information

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

36/56
Jensen (Burbea-Rao) divergences
Based on Jensen’s inequality for a convex function F :
BRF (X , P) =

F (X ) + F (P)
−F
2

X +P
2

def

= ≥ 0.

strictly convex function F (·).
Includes the special case of Jensen-Shannon divergence:
JS(p, q) = H

p+q
2

−

H(p) + H(q)
2

F (x) = −H(x), the negative Shannon entropy H(x) = −x log x.
→ generators are convex and entropies are concave (negative
generators)

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

37/56
Visualizing Burbea-Rao divergences

include Squared Mahalanobis distance.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

38/56
Burbea-Rao from Symmetrizing Bregman divergences [13]
◮

Jeffreys-Bregman divergences.

SF (p; q) =
=
◮

BF (p, q) + BF (q, p)
2
1
p − q, ∇F (p) − ∇F (q) ,
2

Jensen-Bregman divergences (diversity index).
JF (p; q) =
=

BF (p, p+q ) + BF (q, p+q )
2
2
2
F (p) + F (q)
p+q
−F
2
2

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

= BRF (p, q)

39/56
Skew Burbea-Rao divergences
(α)

BRF
(α)

:

X × X → R+

BRF (p, q) = αF (p) + (1 − α)F (q) − F (αp + (1 − α)q)
(α)

BRF (p, q) = αF (p) + (1 − α)F (q) − F (αp + (1 − α)q)
(1−α)

= BRF

(q, p)

Skew symmetrization of Bregman divergences:
def

αBF (p, αp + (1 − α)q) + (1 − α)BF (q, αp + (1 − α)q) =
(α)

BRF (p, q)

= skew Jensen-Bregman divergences.
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

40/56
Bregman divergences = asymptotic skewed Jensen
divergences

1
(α)
BRF (p, q)
α→1 1 − α
1
(α)
BF (q, p) = lim BRF (p, q)
α→0 α
BF (p, q) =

lim

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

41/56
Burbea-Rao/Jensen centroids
(p = 1)
n
X

(α )

wi BRF i (X , Ti ) = arg min L(x)

OPT : CF = arg min

x

i =1

Wlog., equivalent to minimize
n

n

E (c) = (
i =1

wi αi )F (C ) −

i =1

wi F (αi C + (1 − αi )Ti )

Sum E = F + G of convex F + concave G function ⇒
Convex-ConCave Procedure (CCCP, NIPS*01)
Start from arbitrary c0 , and iteratively update as:
∇F (Ct+1 ) = −∇G (Ct )
⇒ guaranteed convergence to a (local) minimum.
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

42/56
ConCave Convex Procedure (CCCP)
minx E (x) = F (x) + G (x)
∇F (ct+1 ) = −∇G (ct )

Decomposition may not be unique...
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

43/56
Iterative algorithm for Burbea-Rao centroids
Apply CCCP scheme
∇F (Ct+1 ) =

Ct+1 = ∇F

−1

1

n

n
i =1 wi αi i =1

1

wi αi ∇F (αi Ct + (1 − αi )Ti )

n

n
i =1 wi αi i =1

wi αi ∇F (αi Ct + (1 − αi )Ti )

Get arbitrarily fine approximations of the (skew) Burbea-Rao
matrix centroids and barycenters.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

44/56
Special case: α-log det divergence [15, 11]
Cone of Hermitian positive definite matrices (self-adjoint matrices
¯
M H = M T = M).
F (X ) = − log detX , ∇F (X ) = ∇F −1 (X ) = −X −1
Burbea-Rao α-log det divergences:

 tr(Q −1 P − I ) − log det(Q −1 P)) α = 1


det( 1−α P+ 1+α Q)
(α)
4
2
2
α ∈ R{−1, 1}
Dld (P, Q) =
1−α
2 log
1+α
 1−α
(det P) 2 (det Q) 2

 tr(P −1 Q − I ) − log det(P −1 Q) α = −1
Start with C1 =

1
n

n
i =1 Ti ,
n

Ct+1 = n
i =1

1−α
1+α
Ti +
Ct
2
2

→ unique global mean (obtained from CCCP).

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

−1

−1

45/56
Bhattacharyya coefficients/distances
Bhattacharyya coefficient and non-metric distance:
C( p, q) =

p(x)q(x)dx, 0 < C (p, q) ≤ 1, B(p, q) = − ln C (p, q).

(coefficient is always strictly positive). Hellinger metric
H(p, q) =

1
2

( p(x) −

q(x))2 dx,

such that 0 ≤ H(p, q) ≤ 1.

H(p, q) =
=

1
2

p(x)dx +

q(x)dx − 2

p(x) q(x)dx

1 − C (p, q).

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

46/56
Chernoff coefficients/α-divergences
Skew Bhattacharrya divergences based on Chernoff α-coefficients.
Bα (p, q) = − ln
= − ln

x

p α(x)q 1−α (x)dx = − ln Cα (p, q)
q(x)

x

p(x)
q(x)

α

dx

= − ln Eq [Lα (x)]
Amari α-divergence:

1−α
1+α
 4 2 1 − p(x) 2 q(x) 2 dx , α = ±1,
 1−α

p(x)
Dα (p||q) =
α = −1,
p(x) log q(x) dx = KL(p, q),


 q(x) log q(x) dx = KL(q, p),
α = 1,
p(x)
Dα (p||q) = D−α (q||p)

Remapping α′ =

1−α
2

(α = 1 − 2α′ ) to get Chernoff α′ -divergences

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

47/56
Bhattacharyya/Chernoff of exponential families [10]

Equivalence with skew Burbea-Rao distances:
(α)

Bα (pF (x; θp ), pF (x; θq )) = BRF (θp , θq ),

(7)

= αF (θp ) + (1 − α)F (θq ) − F (αθp + (1 − α)θq )
Bhat. divergence on probability distributions amounts to compute
a Jensen divergence on its parameters

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

48/56
Closed-form Bhattacharyya distances for exp. fam.

Generic formula that instantiates in those well-known formula in
statistical pattern recognition.
Exp. fam.

F (θ) (up to a constant)

Multinomial

log(1 +

Poisson

exp θ

Gaussian

1
π
1
− 4θ + 2 log(− θ )
2
2

2
2
σ2 +σq
1 (µp −µq ) + 1 ln p
4 σ2 +σ2
2
2σp σq
p
q

Gaussian

1 trΘ−1 θθ T − 1 log det Θ
4
2

1 (µ − µ )T
p
q
8

d −1
exp θi )
i =1

θ2

Bhattacharyya/Burbea-Rao BRF (λp , λq ) = BRF (τ (λp ), τ (λq ))
√
− ln d=1 pi qi
i
1 ( √µ − √µ ) 2
p
q
2

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

Σp +Σq
2

−1

Σp +Σq

det
1
2
(µp − µq ) + 2 ln det Σ det Σ
p
q

49/56
Wrapping up

◮

Besides Euclidean, log-Euclidean and Riemannian
metric-based means, proposed
divergence-based matrix centroids,

◮

Total Bregman divergences and robustness (conformal
geometry),

◮

Riemannian minimax center,

◮

skew Burbea-Rao/Jensen divergences extending Bregman
divergences,

◮

Bhattacharrya means of densities = Burbea-Rao means on
(matrix) parameters
Which mean you do you mean or need?

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

50/56
Non-metric matrix manifolds with dually affine connections

In a nutshell:
◮

asymmetric (Bregman) non-metric divergence,

◮

Legendre transform, convex conjugates & dual divergences

◮

Dual θ− or η- or mixed coordinate systems

◮

dual closed-form affine geodesics (convenient computationally)

◮

Pythagorean theorem

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

51/56
Thank you.

www.informationgeometry.org

“One geometry cannot be more true than another;
it can only be more convenient”,
— Jules Henri Poincar´ (1902)
e

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

52/56
Bibliographic references I
Marc Arnaudon and Frank Nielsen.
On approximating the Riemannian 1-center.
Comput. Geom., 46(1):93–104, 2013.
Rajendra Bhatia.
The Riemannian mean of positive matrices.
In Frank Nielsen and Rajendra Bhatia, editors, Matrix Information Geometry, pages 35–51, 2012.
Silvere Bonnabel and Rodolphe Sepulchre.
Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank.
SIAM J. Matrix Analysis Applications, 31(3):1055–1070, 2009.
Inderjit S. Dhillon and Joel A. Tropp.
Matrix nearness problems with bregman divergences.
SIAM J. Matrix Anal. Appl., 29(4):1120–1146, November 2007.
John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Ambuj Tewari.
Composite objective mirror descent.
In Adam Tauman Kalai and Mehryar Mohri, editors, COLT, pages 14–26. Omnipress, 2010.
C. Harris and M. Stephens.
A Combined Corner and Edge Detection.
In Proceedings of The Fourth Alvey Vision Conference, pages 147–151, 1988.
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

53/56
Bibliographic references II
Meizhu Liu, Baba C. Vemuri, Shun-ichi Amari, and Frank Nielsen.
Shape retrieval using hierarchical total Bregman soft clustering.
Transactions on Pattern Analysis and Machine Intelligence, 34(12):2407–2419, 2012.
Meizhu Liu, Baba C. Vemuri, Shun ichi Amari, and Frank Nielsen.
Shape retrieval using hierarchical total bregman soft clustering.
IEEE Trans. Pattern Anal. Mach. Intell., 34(12):2407–2419, 2012.
Maher Moakher.
A differential geometric approach to the geometric mean of symmetric positive-definite matrices.
SIAM Journal on Matrix Analysis and Applications, 26(3):735–747, 2005.
Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455–5466, 2011.
Frank Nielsen, Meizhu Liu, Xiaojing Ye, and Baba C. Vemuri.
Jensen divergence based SPD matrix means and applications.
In International Conference on Pattern Recognition (ICPR), 2012.
Frank Nielsen and Richard Nock.
Quantum Voronoi diagrams and Holevo channel capacity for 1-qubit quantum states.
In IEEE International Symposium on Information Theory (ISIT), pages 96–100, 2008.
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

54/56
Bibliographic references III
Frank Nielsen and Richard Nock.
Sided and symmetrized Bregman centroids.
IEEE Trans. Inf. Theor., 55(6):2882–2904, June 2009.
Frank Nielsen and Richard Nock.
Entropies and cross-entropies of exponential families.
In International Conference on Image Processing (ICIP), pages 3621–3624, 2010.
R. Nock, B. Magdalou, E. Briys, and F. Nielsen.
On tracking portfolios with certainty equivalents on a generalization of Markowitz model: the fool, the wise
and the adaptive.
In Thorsten Joachims, editor, International Conference on Machine Learning (ICML). Omnipress, 2011.
Richard Nock, Brice Magdalou, Eric Briys, and Frank Nielsen.
Mining matrix data with Bregman matrix divergences for portfolio selection.
In Frank Nielsen and Rajendra Bhatia, editors, Matrix Information Geometry, pages 373–402, 2012.
Masanori Ohya and D´nes Petz.
e
Quantum Entropy and Its Use.
1st ed. 1993. Corr 2nd printing, 2004.
Koji Tsuda, Gunnar R¨tsch, and Manfred K. Warmuth.
a
Matrix exponentiated gradient updates for on-line learning and Bregman projection.
J. Mach. Learn. Res., 6:995–1018, December 2005.
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

55/56
Bibliographic references IV

Hisaharu Umegaki.
Conditional expectation in an operator algebra. IV. Entropy and information.
KodaiMathSemRep, 14(2):59, 1962.
Baba Vemuri, Meizhu Liu, Shun ichi Amari, and Frank Nielsen.
Total Bregman divergence and its applications to DTI analysis.
IEEE Transactions on Medical Imaging, 2011.
Zhizhou Wang and Baba C. Vemuri.
An affine invariant tensor dissimilarity measure and its applications to tensor-valued image segmentation.
In CVPR (1), pages 228–233, 2004.

c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

56/56

More Related Content

What's hot

Continuous and Discrete-Time Analysis of SGD
Continuous and Discrete-Time Analysis of SGDContinuous and Discrete-Time Analysis of SGD
Continuous and Discrete-Time Analysis of SGDValentin De Bortoli
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT  posterLitvinenko low-rank kriging +FFT  poster
Litvinenko low-rank kriging +FFT posterAlexander Litvinenko
 
On Clustering Financial Time Series - Beyond Correlation
On Clustering Financial Time Series - Beyond CorrelationOn Clustering Financial Time Series - Beyond Correlation
On Clustering Financial Time Series - Beyond CorrelationGautier Marti
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big DataChristian Robert
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methodsChristian Robert
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerChristian Robert
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distancesChristian Robert
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 

What's hot (20)

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Continuous and Discrete-Time Analysis of SGD
Continuous and Discrete-Time Analysis of SGDContinuous and Discrete-Time Analysis of SGD
Continuous and Discrete-Time Analysis of SGD
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT  posterLitvinenko low-rank kriging +FFT  poster
Litvinenko low-rank kriging +FFT poster
 
On Clustering Financial Time Series - Beyond Correlation
On Clustering Financial Time Series - Beyond CorrelationOn Clustering Financial Time Series - Beyond Correlation
On Clustering Financial Time Series - Beyond Correlation
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like sampler
 
Smart Multitask Bregman Clustering
Smart Multitask Bregman ClusteringSmart Multitask Bregman Clustering
Smart Multitask Bregman Clustering
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
QMC Opening Workshop, Support Points - a new way to compact distributions, wi...QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
 

Viewers also liked

Atlas bacteriologico.a
Atlas bacteriologico.aAtlas bacteriologico.a
Atlas bacteriologico.aCasiMedi.com
 
Flexitime y la biblioteca
Flexitime y la biblioteca Flexitime y la biblioteca
Flexitime y la biblioteca Edpe3129
 
Material para docente
Material para docenteMaterial para docente
Material para docenteAsun Vidal
 
What is Social Media & why should you join? - for business
What is Social Media & why should you join? - for businessWhat is Social Media & why should you join? - for business
What is Social Media & why should you join? - for businessLaundrylicious
 
Tomar café com.......Stª Casa da Misericórdia de Vizela dia 8-02-2012
Tomar café com.......Stª Casa da Misericórdia de Vizela dia 8-02-2012Tomar café com.......Stª Casa da Misericórdia de Vizela dia 8-02-2012
Tomar café com.......Stª Casa da Misericórdia de Vizela dia 8-02-2012Rotary Clube Vizela
 
38 nom 054 semarnat 1993 proc incompatibilidad residuos
38 nom 054 semarnat 1993 proc incompatibilidad residuos38 nom 054 semarnat 1993 proc incompatibilidad residuos
38 nom 054 semarnat 1993 proc incompatibilidad residuosDavid A. Godinez
 
Comportamiento colectivo e individual
Comportamiento colectivo e individualComportamiento colectivo e individual
Comportamiento colectivo e individualzulaimaHernandez80
 
Revista vigia 2014 san 2
Revista vigia 2014 san 2Revista vigia 2014 san 2
Revista vigia 2014 san 2Julián Peralta
 
Guia para implementar pei.
Guia para implementar pei.Guia para implementar pei.
Guia para implementar pei.Carlos Yampufé
 
Globalización y negocios internacionales
Globalización y negocios internacionalesGlobalización y negocios internacionales
Globalización y negocios internacionalesNombre Apellidos
 
Act 5 quiz 1 diseño de plantas industriales
Act 5 quiz 1 diseño de plantas industrialesAct 5 quiz 1 diseño de plantas industriales
Act 5 quiz 1 diseño de plantas industrialesYoana Taborda
 
Los cuatro temperamentos 2011 yfraga
Los cuatro temperamentos 2011 yfragaLos cuatro temperamentos 2011 yfraga
Los cuatro temperamentos 2011 yfragaYolanda Fraga
 
Grade 8: Araling Panlipunan Modyul 2: Mga Sinaunang Kabihasnan sa Asya
Grade 8: Araling Panlipunan Modyul 2: Mga Sinaunang Kabihasnan sa AsyaGrade 8: Araling Panlipunan Modyul 2: Mga Sinaunang Kabihasnan sa Asya
Grade 8: Araling Panlipunan Modyul 2: Mga Sinaunang Kabihasnan sa AsyaNiño Caindoy
 

Viewers also liked (20)

Atlas bacteriologico.a
Atlas bacteriologico.aAtlas bacteriologico.a
Atlas bacteriologico.a
 
Matriz foda
Matriz fodaMatriz foda
Matriz foda
 
Gestor e
Gestor eGestor e
Gestor e
 
La Misericordia
La Misericordia La Misericordia
La Misericordia
 
Big data aplicado el negocio CRISP-DM
Big data aplicado el negocio CRISP-DMBig data aplicado el negocio CRISP-DM
Big data aplicado el negocio CRISP-DM
 
Ideologia de genero agl
Ideologia de genero aglIdeologia de genero agl
Ideologia de genero agl
 
Flexitime y la biblioteca
Flexitime y la biblioteca Flexitime y la biblioteca
Flexitime y la biblioteca
 
Material para docente
Material para docenteMaterial para docente
Material para docente
 
What is Social Media & why should you join? - for business
What is Social Media & why should you join? - for businessWhat is Social Media & why should you join? - for business
What is Social Media & why should you join? - for business
 
Tomar café com.......Stª Casa da Misericórdia de Vizela dia 8-02-2012
Tomar café com.......Stª Casa da Misericórdia de Vizela dia 8-02-2012Tomar café com.......Stª Casa da Misericórdia de Vizela dia 8-02-2012
Tomar café com.......Stª Casa da Misericórdia de Vizela dia 8-02-2012
 
38 nom 054 semarnat 1993 proc incompatibilidad residuos
38 nom 054 semarnat 1993 proc incompatibilidad residuos38 nom 054 semarnat 1993 proc incompatibilidad residuos
38 nom 054 semarnat 1993 proc incompatibilidad residuos
 
2476212
24762122476212
2476212
 
Comportamiento colectivo e individual
Comportamiento colectivo e individualComportamiento colectivo e individual
Comportamiento colectivo e individual
 
Revista vigia 2014 san 2
Revista vigia 2014 san 2Revista vigia 2014 san 2
Revista vigia 2014 san 2
 
Administracion de un centro de computo
Administracion de un centro de computoAdministracion de un centro de computo
Administracion de un centro de computo
 
Guia para implementar pei.
Guia para implementar pei.Guia para implementar pei.
Guia para implementar pei.
 
Globalización y negocios internacionales
Globalización y negocios internacionalesGlobalización y negocios internacionales
Globalización y negocios internacionales
 
Act 5 quiz 1 diseño de plantas industriales
Act 5 quiz 1 diseño de plantas industrialesAct 5 quiz 1 diseño de plantas industriales
Act 5 quiz 1 diseño de plantas industriales
 
Los cuatro temperamentos 2011 yfraga
Los cuatro temperamentos 2011 yfragaLos cuatro temperamentos 2011 yfraga
Los cuatro temperamentos 2011 yfraga
 
Grade 8: Araling Panlipunan Modyul 2: Mga Sinaunang Kabihasnan sa Asya
Grade 8: Araling Panlipunan Modyul 2: Mga Sinaunang Kabihasnan sa AsyaGrade 8: Araling Panlipunan Modyul 2: Mga Sinaunang Kabihasnan sa Asya
Grade 8: Araling Panlipunan Modyul 2: Mga Sinaunang Kabihasnan sa Asya
 

Similar to Computational Information Geometry on Matrix Manifolds (ICTP 2013)

Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Frank Nielsen
 
Slides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processingSlides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processingFrank Nielsen
 
On approximating the Riemannian 1-center
On approximating the Riemannian 1-centerOn approximating the Riemannian 1-center
On approximating the Riemannian 1-centerFrank Nielsen
 
Integration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsIntegration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsMercier Jean-Marc
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Alexander Litvinenko
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Frank Nielsen
 
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Frank Nielsen
 
Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Alexander Litvinenko
 
Patch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective DivergencesPatch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective DivergencesFrank Nielsen
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012Zheng Mengdi
 
Clustering Random Walk Time Series
Clustering Random Walk Time SeriesClustering Random Walk Time Series
Clustering Random Walk Time SeriesGautier Marti
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Alexander Litvinenko
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodFrank Nielsen
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Alexander Litvinenko
 
Information-theoretic clustering with applications
Information-theoretic clustering  with applicationsInformation-theoretic clustering  with applications
Information-theoretic clustering with applicationsFrank Nielsen
 
The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionPedro222284
 

Similar to Computational Information Geometry on Matrix Manifolds (ICTP 2013) (20)

Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
 
MUMS Opening Workshop - Model Uncertainty in Data Fusion for Remote Sensing -...
MUMS Opening Workshop - Model Uncertainty in Data Fusion for Remote Sensing -...MUMS Opening Workshop - Model Uncertainty in Data Fusion for Remote Sensing -...
MUMS Opening Workshop - Model Uncertainty in Data Fusion for Remote Sensing -...
 
Slides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processingSlides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processing
 
On approximating the Riemannian 1-center
On approximating the Riemannian 1-centerOn approximating the Riemannian 1-center
On approximating the Riemannian 1-center
 
main
mainmain
main
 
Integration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsIntegration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methods
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
 
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
 
Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...
 
Patch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective DivergencesPatch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective Divergences
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012
 
Clustering Random Walk Time Series
Clustering Random Walk Time SeriesClustering Random Walk Time Series
Clustering Random Walk Time Series
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics
 
KAUST_talk_short.pdf
KAUST_talk_short.pdfKAUST_talk_short.pdf
KAUST_talk_short.pdf
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
 
Information-theoretic clustering with applications
Information-theoretic clustering  with applicationsInformation-theoretic clustering  with applications
Information-theoretic clustering with applications
 
The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability Distribution
 
Pres metabief2020jmm
Pres metabief2020jmmPres metabief2020jmm
Pres metabief2020jmm
 

Recently uploaded

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Recently uploaded (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Computational Information Geometry on Matrix Manifolds (ICTP 2013)

  • 1. Computational Information Geometry on Matrix Manifolds Frank Nielsen Frank.Nielsen@acm.org www.informationgeometry.org Sony Computer Science Laboratories, Inc. July 2013, ICTP, Trieste, IT c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 1/56
  • 2. Geometry of matrix manifolds... ◮ Euclidean geometry, Fr¨benius norm → distance: o M 2 F 2 mij = = i ,j Mi ∗ i 2 2 = M∗j 2 2 = tr(M ⊤ M) j ◮ Riemannian geometry of symmetric positive definite (SPD) matrices [9, 2] ◮ Riemannian geometry of rank-deficient positive semi-definite (SPSD) matrices Stiefel/Grassman manifolds [3] ◮ Quantum geometry: SPD matrices with unit trace “One geometry cannot be more true than another; it can only be more convenient”, — Jules Henri Poincar´ (1902) e c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 2/56
  • 3. Forthcoming conference (GSI) 28th-30th August, Paris. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 3/56
  • 4. What is Computational Information Geometry? ◮ What is Information? = Essence of data (datum=“thing”) (make it tangible → e.g., parameters of generative models) ◮ Can we do Intrinsic computing? (unbiased by any particular “data representation” → same results after recoding data) ◮ Geometry −→ Science of invariance (mother of Science, compass & ruler, Descartes analytic=coordinate/Cartesian, imaginaries, ...). ...the open-ended poetic mathematics! ?! c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 4/56
  • 5. Rationale for Computational Information Geometry ◮ Information is ...never void! → lower bounds ◮ ◮ ◮ ◮ ◮ Geometry: ◮ ◮ ◮ Fisher information and Cram´r-Rao lower bound (estimation) e Bayes error and Chernoff information (classification) Coding and Shannon entropy (communication) Program and Kolmogorov complexity (compression). (Unfortunately not computable!) Language (point, line, ball, dimension, orthogonal, projection, geodesic, immersion, etc.) Power of characterization (eg., intersection of two pseudo-segments not admitting closed-form expression) Computing: Information computing. Seeking for mathematical convenience and mathematical tricks (RKHS in ML). How to manipulate “space of functions” ?!? c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 5/56
  • 6. Example I: Matrix manifold Pattern = Gaussian mixture models (universal class) Statistical (dis)similarity/distance: total Bregman divergence (tBD, tKL). Invariance: ..., xi ∼ N(µi , Σi ), y = A(x) = Lx + t, yi ∼ N(Lµi + t, LΣi L⊤ ), D(X1 : X2 ) = D(Y1 : Y2 ) (L: any invertible affine transformation, t a translation) Shape Retrieval using Hierarchical Total Bregman Soft Clustering [7], IEEE PAMI, 2012. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 6/56
  • 7. Example II: Matrix manifolds DTI: diffusion ellipsoids, tensor interpolation. Pattern = zero-centered “Gaussians“ Statistical (dis)similarity/distance: total Bregman divergence (tBD, tKL). Invariance: ..., D(A⊤ PA : A⊤ QA) = D(P : Q), A ∈ SL(d): orthogonal matrix (volume/orientation preserving) total Bregman divergence (tBD). (3D rat corpus callosum) c Total Bregman Divergence and its Applications to DTI Analysis [20], IEEE TMI. 2011. Science Laboratories, Inc. 2013 Frank Nielsen, Sony Computer 7/56
  • 8. Example III: Gaussian manifolds Consider 5D Gaussian Mixture Models (GMMs) of color images (image=RGBxy point set) A Gaussian mixture model wi N(µi , Σi ) is interpreted as a weighted point set {θi = (µi , Σi )}. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 8/56
  • 9. Matrix center points & clustering Aggregation (matrix quantization for codebooks): Given a data-set of matrices M = {M1 , ..., Mn } ⊂ M, compute a center matrix C . Centering as a variational minimization problem: wi distancep (C , Mi ) (OPT ) : Cp = arg min C ∈M i Notion of centrality, robustness to outliers? For diagonal matrices, with “Euclidean” distance, usual geometric center points: ◮ ◮ ◮ median (p = 1): robust to outliers (Fermat-Weber point, no closed form), centroid (p = 2): breakdown point of 1 (→ tBD)), circumcenter (lim p → ∞): minimize farthest point (minimax [1]). c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 9/56
  • 10. Diffusion Tensor Magnetic Resonance Imaging DT-MRI: Measures anisotropic diffusion of water molecules in a 3 × 3 tensor assigned to each voxel position (1990˜). Used to analyze in-vivo connectivity patterns of brain tissues: gray matter, white matter (corpus callosum) and cerebrospinal fluid (CSF) c Image courtesy Peter J. Basser (Magnetic resonance imaging of the brain and spine, Chapter 31) c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 10/56
  • 11. Gradiometry tensor: 3 × 3 SPSD matrices Beyond the “constant” g ≃ 9.81m/s 2 . Gravity field measuring anisotropy. → Oil & gas industry. Courtesy of BellGeo. http://www.bellgeo.com/tech/technology_theory_of_FTG.html c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 11/56
  • 12. Structure tensors in computer vision → Pioneered in image processing: tensor descriptor of a region at a pixel. (Harris-Stephens [6]). Consider a kernel, and compute the tensor descriptor I ′2 (x) T (p = (x, y )) = K ∗ I ′ (y )I ′ (x) I ′ (x)I ′ (y ) I ′ (y )2 , w (u, v )∇I (u, v )(∇I (u, v ))T = u,v K : uniform, Gaussian kernel (eg., s × s window W centered at the pixel p) I ′ (x), I ′ (y ): gradient, derivatives of the image. Versatile method: corner detection, optical flow estimation, segmentation, stereo matching, etc. → Tensor image processing c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 12/56
  • 13. Harris-Stephens structure tensor (1988) Deformation tensor field Harris-Stephens combined corner-edge detector: R = det T − k(tr T )2 → Measures of tensor anisotropy. Structure tensor represents local orientation (eigenvectors/eigenvalues). Harris-Stephens’ combined corner/edge detector (note) c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 13/56
  • 14. Matrix with Fr¨benius metric distance o Matrix space M with vectorial structure dE (P, Q) = P −Q = F tr(P − Q)T (P − Q) Centroid of tensors: 1 CE = n (1) (2) n wi Ti i =1 → scalar average of each element of the tensor. Tensor Field Segmentation Using Region Based Active Contour Model [21], ECCV, 2004. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 14/56
  • 15. Matrix vectorization & computational geometry Computational geometry on w × h-dimensional matrix spaces wrt Fr¨benius distance amounts to computational geometry on o Euclidean vector space for D = w × h. → Voronoi diagrams, smallest enclosing ball, minimum spanning tree, etc. For symmetric matrices, we have D = d(d+1) degrees of freedom, 2 and vectorize as follows: d M F d 2 mij = i =1 j=1 d−1 d d 2 mij 2 mii + 2 = i =1 i =1 j=i +1 = m 2 √ √ √ with m = [m11 ...mdd 2m12 2m1d ... 2md−1,d ]T = M. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 15/56
  • 16. Matrix functions From the spectral decomposition: M = UDU ⊤ with D = λ(M) = diag(λ1 , ..., λd ) the diagonal matrix of eigenvalues, consider real-valued function x → f (x) to extend to matrices as f (M) = U diag(f (λ1 ), ..., f (λd )) U T 1 Examples: log x, exp x, |x|, x 2 , x 2 , etc. O(d 3 ) SVD factorization complexity. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 16/56
  • 17. Riemannian cone of SPD matrices Exponential maps from tangent planes (symmetric matrices Sym) to the manifold cone C: expP : TP C = Sym → C Logarithmic maps from manifold cone C to tangent planes: logP : C → TP C = Sym 1 1 1 1 logP (Q) = P 2 log(P − 2 QP − 2 )P 2 Map any point Q ∈ Sym++ to unique tangent vector at P such that γ0 = P and γ1 = Q. Geodesic equation: 1 1 1 γt (P, Q) = P 2 P − 2 QP − 2 t 1 P2 Geodesic (metric length) distance: 1 1 dR (P, Q) = log P − 2 QP − 2 c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 17/56
  • 18. Riemannian Karcher centroid d dR (P, Q) = log2 λi tr log2 (P −1 Q) = i =1 = log P 1 −2 QP 1 −2 , where the λi ’s are the eigenvalues of P −1 Q. 1 1 (P −1 Q = Q 2 P −1 Q 2 ) Unique mean characterized by n=1 log(Ti−1 CR ) = 0 i Closed-form solution only for n = 2: 1 1 1 1 2 1 CR (P, Q) = P 2 P − 2 QP − 2 P 2 otherwise iterative approximation (CR = limt→∞ Ct ): Ct+1 = Ct exp c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 1 n n log Ct−1 Ti . i =1 18/56
  • 19. Riemannian minimax SPD center (circumcenter [1]) Case of p = ∞, center that minimizes the maximum distance. GEO-ALG: Starts with c1 ∈ P and iteratively update the current 1 circumcenter as follows: ci +1 = Geodesic(ci , fi , i +1 ), where fi denotes the farthest point of P to ci , and Geodesic(p, q, t) denotes the intermediate point m on the geodesic passing through p and q such that ρ(p, m) = t × ρ(p, q). Geodesic: 1 1 1 γt (P, Q) = P 2 P − 2 QP − 2 t 1 P2 Find t such that d=1 log2 λt = t 2 d=1 log2 λi = r 2 i i i That is t = r . Prove core-set and guaranteed convergence. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. d 2 i =1 log λi . 19/56
  • 20. Matrices as parameters in probability distributions Exponential families: Gaussian, Wishart, etc.: p(x; λ) = pF (x; θ) = exp ( t(x), θ − F (θ) + k(x)) . Example: Poisson distribution p(x; λ) = λx exp(−λ), x! ◮ the sufficient statistic t(x) = x, ◮ θ = log λ, the natural parameter, ◮ F (θ) = exp θ, the log-normalizer → CONVEX, ◮ and k(x) = − log x! the carrier measure (with respect to the counting measure). c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 20/56
  • 21. Gaussians as an exponential family p(x; λ) = p(x; µ, Σ) = 1 (x − µ)T Σ−1 (x − µ)) √ exp − 2 2π det Σ 1 θ = (Σ−1 µ, 2 Σ−1 ) ∈ Θ = Rd × Kd×d , with Kd×d cone of positive definite matrices, ◮ F (θ) = 1 trθ −1 θ1 θ T − 1 log det θ2 + d log π → CONVEX 1 2 4 2 2 ◮ t(x) = (x, −x T x), ◮ k(x) = 0. Inner product : composite, sum of a dot product and a matrix trace : T ′ T ′ θ, θ ′ = θ1 θ1 + trθ2 θ2 . ◮ The coordinate transformation τ : Λ → Θ is given for λ = (µ, Σ) by τ (λ) = 1 λ−1 λ1 , λ−1 , 2 2 2 c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. τ −1 (θ) = 1 −1 1 −1 θ θ1 , θ2 2 2 2 21/56
  • 22. Convex duality: Legendre transformation ◮ For a strictly convex and differentiable function F : X → R: F ∗ (y ) = sup { y , x − F (x)} x∈X lF (y ;x); ◮ Maximum obtained for y = ∇F (x): ∇x lF (y ; x) = y − ∇F (x) = 0 ⇒ y = ∇F (x) ◮ Maximum unique from convexity of F (∇2 F ≻ 0): ∇2 lF (y ; x) = −∇2 F (x) ≺ 0 x ◮ Convex conjugates: (F , X ) ⇔ (F ∗ , Y), c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. Y = {∇F (x) | x ∈ X } 22/56
  • 23. Legendre duality: Geometric interpretation Consider the epigraph of F as a convex object: ◮ convex hull (V -representation), versus ◮ half-space (H-representation). Legendre transform also called “slope” transform. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 23/56
  • 24. Legendre duality & Canonical divergence ◮ ◮ ◮ Convex conjugates have functional inverse gradients ∇F −1 = ∇F ∗ ∇F ∗ may require numerical approximation (not always available in analytical closed-form) Involution: (F ∗ )∗ = F with ∇F ∗ = (∇F )−1 . Convex conjugate F ∗ expressed using (∇F )−1 : F ∗ (y ) = = ◮ x, y − F (x), x = ∇y F ∗ (y ) (∇F )−1 (y ), y − F ((∇F )−1 (y )) Fenchel-Young inequality at the heart of canonical divergence: F (x) + F ∗ (y ) ≥ x, y AF (x : y ) = AF ∗ (y : x) = F (x) + F ∗ (y ) − x, y ≥ 0 c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 24/56
  • 25. Dual Bregman divergences & canonical divergence [14] p(x) ≥0 q(x) = BF (θQ : θP ) = BF ∗ (ηP : ηQ ) KL(P : Q) = EP log = F (θQ ) + F ∗ (ηP ) − θQ , ηP = AF (θQ : ηP ) = AF ∗ (ηP : θQ ) with θQ (natural parameterization) and ηP = EP [t(X )] = ∇F (θP ) (moment parameterization). 1 1 dx − p(x) log dx KL(P : Q) = p(x) log q(x) p(x) H × (P:Q) H(p)=H × (P:P) Shannon cross-entropy and entropy of EF [14]: H × (P : Q) = F (θQ ) − θQ , ∇F (θP ) − EP [k(x)] H(P) = F (θP ) − θP , ∇F (θP ) − EP [k(x)] H(P) = −F ∗ (ηP ) − EP [k(x)] c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 25/56
  • 26. Bregman divergence: Geometric interpretation (I) Potential function F , graph plot F : (x, F (x)). DF (p : q) = F (p) − F (q) − p − q, ∇F (q) c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 26/56
  • 27. Bregman divergence: Geometric interpretation (II) Potential function f , graph plot F : (x, f (x)). Bf (p||q) = f (p) − f (q) − (p − q)f ′ (q) Bf (.||q): vertical distance between the hyperplane Hq tangent to F at lifted point q , and the translated hyperplane at p . ˆ ˆ c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 27/56
  • 28. Bregman divergence: Geometric interpretation (III) Bregman divergence and path integrals B(θ1 : θ2 ) = F (θ1 ) − F (θ2 ) − θ1 − θ2 , ∇F (θ2 ) , (3) θ1 = θ2 η2 = η1 ∗ ∇F (t) − ∇F (θ2 ), dt , (4) ∇F ∗ (t) − ∇F ∗ (η1 ), dt , (5) = B (η2 : η1 ) c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. (6) 28/56
  • 29. Matrix Bregman divergences [4, 16] Choose F a real-valued functional generator and extend F to matrices: F (X ) = tr(Ψ(X )) tF ,k N k Ψ(X ) = k≥0 (tF ,k from the Taylor expansion of real-valued F ) BF (P : Q) = F (P) − F (Q) − tr((P − Q)⊤ ∇F (Q)), ∇F (X ) = ′ tF ,k N k k≥0 ′ (tF ,k from the Taylor expansion of real-valued F ′ ) c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 29/56
  • 30. Matrix Bregman divergences [16] c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 30/56
  • 31. Particular case: Bregman Schatten p-divergences [5, 16] Schatten p-norm of real symmetric matrix X : (unitarily invariant matrix norms) X p = λ(X ) p Bregman generator: 1 X 2 p 2 Used in regularized convex optimization [5], matrix data mining [16]. F (X ) = c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 31/56
  • 32. Matrix Legendre transformation Extends classical Legendre-Fenchel transformation: F ∗ (η) = sup spec(θ)⊆dom(F ) tr(θη ⊤ ) − F (θ) DF (θP : θQ ) = DF ∗ (ηQ : ηP ) = F (θ) + F ∗ (η) − tr(θη ⊤ ) θ and η are dual matrix coordinate systems on the matrix manifold. Non-metric differential structure with dual coordinate systems. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 32/56
  • 33. Bregman matrix means BF (X , P) = F (X ) − F (P) − tr((X − P)T ∇F (P)), F (·): strictly convex and differentiable function on an open convex space. n C = ∇F −1 i =1 wi ∇F (Ti ) quasi-arithmetic mean for ∇F . Since BF (X , P) = BF (P, X ), define a right-sided centroid M ′ : Find the center of mass [13] (independent of generator F ) F (X ) = tr(X T X ): the quadratic matrix entropy, F (X ) = − log det X : the matrix Burg entropy, and F (X ) = tr(X log X − X ): the von Neumann entropy [19, 18, 15] (Umegaki quantum relative entropy). c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 33/56
  • 34. Total Bregman divergences (tBD) Instead of ”vertical” projection in Bregman divergence, consider perpendicular projection. (Analogy with least squares and total least squares regression.) tBF (P, Q) = BF (P, Q) 1 + ∇F (Q) 2 → proven statistically robust. Applications to robust DT-MRI segmentation [8]. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 34/56
  • 35. Matrix Jensen/Burbea-Rao divergences [10] Convexity gap defines a divergence BRF (P, Q) = ◮ ◮ ◮ ◮ F (P) + F (Q) −F 2 P +Q 2 ≥0 F (X ) = tr(X T X ): the quadratic matrix entropy, F (X ) = − log det X : the matrix Burg entropy, and F (X ) = tr(X log X − X ): the von Neumann entropy. etc. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 35/56
  • 36. Smooth family of convex generators [12, 17] 1-parameter family of generators: Fα (X ) = 1 tr(αX − X α + (1 − α)I ), α = {0, 1} α(1 − α) Bα (P : Q) = ∇Fα (X ) = 1 tr(Q α − P α + αQ α−1 (P − Q)) α(1 − α) 1 (I − X α−1 ) α−1 1 −1 ∇Fα (X ) = (I − (α − 1)X ) α−1 When α → 1, ∇Fα (X ) = ∇F1 (X ) = log X . When α → 0, ∇Fα (X ) = ∇F0 (X ) = X −1 − I . ◮ α = 2: Quadratic matrix information ◮ α → 1: von Neumann information ◮ α → 0: Burg log-det information c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 36/56
  • 37. Jensen (Burbea-Rao) divergences Based on Jensen’s inequality for a convex function F : BRF (X , P) = F (X ) + F (P) −F 2 X +P 2 def = ≥ 0. strictly convex function F (·). Includes the special case of Jensen-Shannon divergence: JS(p, q) = H p+q 2 − H(p) + H(q) 2 F (x) = −H(x), the negative Shannon entropy H(x) = −x log x. → generators are convex and entropies are concave (negative generators) c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 37/56
  • 38. Visualizing Burbea-Rao divergences include Squared Mahalanobis distance. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 38/56
  • 39. Burbea-Rao from Symmetrizing Bregman divergences [13] ◮ Jeffreys-Bregman divergences. SF (p; q) = = ◮ BF (p, q) + BF (q, p) 2 1 p − q, ∇F (p) − ∇F (q) , 2 Jensen-Bregman divergences (diversity index). JF (p; q) = = BF (p, p+q ) + BF (q, p+q ) 2 2 2 F (p) + F (q) p+q −F 2 2 c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. = BRF (p, q) 39/56
  • 40. Skew Burbea-Rao divergences (α) BRF (α) : X × X → R+ BRF (p, q) = αF (p) + (1 − α)F (q) − F (αp + (1 − α)q) (α) BRF (p, q) = αF (p) + (1 − α)F (q) − F (αp + (1 − α)q) (1−α) = BRF (q, p) Skew symmetrization of Bregman divergences: def αBF (p, αp + (1 − α)q) + (1 − α)BF (q, αp + (1 − α)q) = (α) BRF (p, q) = skew Jensen-Bregman divergences. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 40/56
  • 41. Bregman divergences = asymptotic skewed Jensen divergences 1 (α) BRF (p, q) α→1 1 − α 1 (α) BF (q, p) = lim BRF (p, q) α→0 α BF (p, q) = lim c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 41/56
  • 42. Burbea-Rao/Jensen centroids (p = 1) n X (α ) wi BRF i (X , Ti ) = arg min L(x) OPT : CF = arg min x i =1 Wlog., equivalent to minimize n n E (c) = ( i =1 wi αi )F (C ) − i =1 wi F (αi C + (1 − αi )Ti ) Sum E = F + G of convex F + concave G function ⇒ Convex-ConCave Procedure (CCCP, NIPS*01) Start from arbitrary c0 , and iteratively update as: ∇F (Ct+1 ) = −∇G (Ct ) ⇒ guaranteed convergence to a (local) minimum. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 42/56
  • 43. ConCave Convex Procedure (CCCP) minx E (x) = F (x) + G (x) ∇F (ct+1 ) = −∇G (ct ) Decomposition may not be unique... c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 43/56
  • 44. Iterative algorithm for Burbea-Rao centroids Apply CCCP scheme ∇F (Ct+1 ) = Ct+1 = ∇F −1 1 n n i =1 wi αi i =1 1 wi αi ∇F (αi Ct + (1 − αi )Ti ) n n i =1 wi αi i =1 wi αi ∇F (αi Ct + (1 − αi )Ti ) Get arbitrarily fine approximations of the (skew) Burbea-Rao matrix centroids and barycenters. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 44/56
  • 45. Special case: α-log det divergence [15, 11] Cone of Hermitian positive definite matrices (self-adjoint matrices ¯ M H = M T = M). F (X ) = − log detX , ∇F (X ) = ∇F −1 (X ) = −X −1 Burbea-Rao α-log det divergences:   tr(Q −1 P − I ) − log det(Q −1 P)) α = 1   det( 1−α P+ 1+α Q) (α) 4 2 2 α ∈ R{−1, 1} Dld (P, Q) = 1−α 2 log 1+α  1−α (det P) 2 (det Q) 2   tr(P −1 Q − I ) − log det(P −1 Q) α = −1 Start with C1 = 1 n n i =1 Ti , n Ct+1 = n i =1 1−α 1+α Ti + Ct 2 2 → unique global mean (obtained from CCCP). c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. −1 −1 45/56
  • 46. Bhattacharyya coefficients/distances Bhattacharyya coefficient and non-metric distance: C( p, q) = p(x)q(x)dx, 0 < C (p, q) ≤ 1, B(p, q) = − ln C (p, q). (coefficient is always strictly positive). Hellinger metric H(p, q) = 1 2 ( p(x) − q(x))2 dx, such that 0 ≤ H(p, q) ≤ 1. H(p, q) = = 1 2 p(x)dx + q(x)dx − 2 p(x) q(x)dx 1 − C (p, q). c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 46/56
  • 47. Chernoff coefficients/α-divergences Skew Bhattacharrya divergences based on Chernoff α-coefficients. Bα (p, q) = − ln = − ln x p α(x)q 1−α (x)dx = − ln Cα (p, q) q(x) x p(x) q(x) α dx = − ln Eq [Lα (x)] Amari α-divergence:  1−α 1+α  4 2 1 − p(x) 2 q(x) 2 dx , α = ±1,  1−α  p(x) Dα (p||q) = α = −1, p(x) log q(x) dx = KL(p, q),    q(x) log q(x) dx = KL(q, p), α = 1, p(x) Dα (p||q) = D−α (q||p) Remapping α′ = 1−α 2 (α = 1 − 2α′ ) to get Chernoff α′ -divergences c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 47/56
  • 48. Bhattacharyya/Chernoff of exponential families [10] Equivalence with skew Burbea-Rao distances: (α) Bα (pF (x; θp ), pF (x; θq )) = BRF (θp , θq ), (7) = αF (θp ) + (1 − α)F (θq ) − F (αθp + (1 − α)θq ) Bhat. divergence on probability distributions amounts to compute a Jensen divergence on its parameters c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 48/56
  • 49. Closed-form Bhattacharyya distances for exp. fam. Generic formula that instantiates in those well-known formula in statistical pattern recognition. Exp. fam. F (θ) (up to a constant) Multinomial log(1 + Poisson exp θ Gaussian 1 π 1 − 4θ + 2 log(− θ ) 2 2 2 2 σ2 +σq 1 (µp −µq ) + 1 ln p 4 σ2 +σ2 2 2σp σq p q Gaussian 1 trΘ−1 θθ T − 1 log det Θ 4 2 1 (µ − µ )T p q 8 d −1 exp θi ) i =1 θ2 Bhattacharyya/Burbea-Rao BRF (λp , λq ) = BRF (τ (λp ), τ (λq )) √ − ln d=1 pi qi i 1 ( √µ − √µ ) 2 p q 2 c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. Σp +Σq 2 −1 Σp +Σq det 1 2 (µp − µq ) + 2 ln det Σ det Σ p q 49/56
  • 50. Wrapping up ◮ Besides Euclidean, log-Euclidean and Riemannian metric-based means, proposed divergence-based matrix centroids, ◮ Total Bregman divergences and robustness (conformal geometry), ◮ Riemannian minimax center, ◮ skew Burbea-Rao/Jensen divergences extending Bregman divergences, ◮ Bhattacharrya means of densities = Burbea-Rao means on (matrix) parameters Which mean you do you mean or need? c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 50/56
  • 51. Non-metric matrix manifolds with dually affine connections In a nutshell: ◮ asymmetric (Bregman) non-metric divergence, ◮ Legendre transform, convex conjugates & dual divergences ◮ Dual θ− or η- or mixed coordinate systems ◮ dual closed-form affine geodesics (convenient computationally) ◮ Pythagorean theorem c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 51/56
  • 52. Thank you. www.informationgeometry.org “One geometry cannot be more true than another; it can only be more convenient”, — Jules Henri Poincar´ (1902) e c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 52/56
  • 53. Bibliographic references I Marc Arnaudon and Frank Nielsen. On approximating the Riemannian 1-center. Comput. Geom., 46(1):93–104, 2013. Rajendra Bhatia. The Riemannian mean of positive matrices. In Frank Nielsen and Rajendra Bhatia, editors, Matrix Information Geometry, pages 35–51, 2012. Silvere Bonnabel and Rodolphe Sepulchre. Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank. SIAM J. Matrix Analysis Applications, 31(3):1055–1070, 2009. Inderjit S. Dhillon and Joel A. Tropp. Matrix nearness problems with bregman divergences. SIAM J. Matrix Anal. Appl., 29(4):1120–1146, November 2007. John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Ambuj Tewari. Composite objective mirror descent. In Adam Tauman Kalai and Mehryar Mohri, editors, COLT, pages 14–26. Omnipress, 2010. C. Harris and M. Stephens. A Combined Corner and Edge Detection. In Proceedings of The Fourth Alvey Vision Conference, pages 147–151, 1988. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 53/56
  • 54. Bibliographic references II Meizhu Liu, Baba C. Vemuri, Shun-ichi Amari, and Frank Nielsen. Shape retrieval using hierarchical total Bregman soft clustering. Transactions on Pattern Analysis and Machine Intelligence, 34(12):2407–2419, 2012. Meizhu Liu, Baba C. Vemuri, Shun ichi Amari, and Frank Nielsen. Shape retrieval using hierarchical total bregman soft clustering. IEEE Trans. Pattern Anal. Mach. Intell., 34(12):2407–2419, 2012. Maher Moakher. A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 26(3):735–747, 2005. Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455–5466, 2011. Frank Nielsen, Meizhu Liu, Xiaojing Ye, and Baba C. Vemuri. Jensen divergence based SPD matrix means and applications. In International Conference on Pattern Recognition (ICPR), 2012. Frank Nielsen and Richard Nock. Quantum Voronoi diagrams and Holevo channel capacity for 1-qubit quantum states. In IEEE International Symposium on Information Theory (ISIT), pages 96–100, 2008. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 54/56
  • 55. Bibliographic references III Frank Nielsen and Richard Nock. Sided and symmetrized Bregman centroids. IEEE Trans. Inf. Theor., 55(6):2882–2904, June 2009. Frank Nielsen and Richard Nock. Entropies and cross-entropies of exponential families. In International Conference on Image Processing (ICIP), pages 3621–3624, 2010. R. Nock, B. Magdalou, E. Briys, and F. Nielsen. On tracking portfolios with certainty equivalents on a generalization of Markowitz model: the fool, the wise and the adaptive. In Thorsten Joachims, editor, International Conference on Machine Learning (ICML). Omnipress, 2011. Richard Nock, Brice Magdalou, Eric Briys, and Frank Nielsen. Mining matrix data with Bregman matrix divergences for portfolio selection. In Frank Nielsen and Rajendra Bhatia, editors, Matrix Information Geometry, pages 373–402, 2012. Masanori Ohya and D´nes Petz. e Quantum Entropy and Its Use. 1st ed. 1993. Corr 2nd printing, 2004. Koji Tsuda, Gunnar R¨tsch, and Manfred K. Warmuth. a Matrix exponentiated gradient updates for on-line learning and Bregman projection. J. Mach. Learn. Res., 6:995–1018, December 2005. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 55/56
  • 56. Bibliographic references IV Hisaharu Umegaki. Conditional expectation in an operator algebra. IV. Entropy and information. KodaiMathSemRep, 14(2):59, 1962. Baba Vemuri, Meizhu Liu, Shun ichi Amari, and Frank Nielsen. Total Bregman divergence and its applications to DTI analysis. IEEE Transactions on Medical Imaging, 2011. Zhizhou Wang and Baba C. Vemuri. An affine invariant tensor dissimilarity measure and its applications to tensor-valued image segmentation. In CVPR (1), pages 228–233, 2004. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 56/56