SlideShare a Scribd company logo
The Centroids of
Symmetrized Bregman Divergences
Frank Nielsen1
1 Sony

Richard Nock2

Computer Science Laboratories Inc., FRL
´
Ecole Polytechnique, LIX
Frank.Nielsen@acm.org
2 University

of Antilles-Guyanne
CEREGMIA
Richard.Nock@martinique.univ-ag.fr

December 2007

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
The centroid in Euclidean geometry
¯
Given a point set P = {p1 , ..., pn } of Ed , the centroid c :

¯
Is the center of mass: c =

1
n

n
i=1 pi ,

1
Minimizes minc∈Rd n n ||cpi || 2 :
i=1
M IN AVG squared Euclidean distance optimization,

Plays a central role in center-based clustering methods
(k-means of Lloyd’1957)
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Centroid and barycenters
Notion of centroid extends to barycenters:
¯
b(w) =

n

wi pi
i=1

→ barycentric coordinates for interpolation, with ||w|| = 1.
¯
Barycenter b(w) minimizes:
n

wi ||cpi ||2

min

c∈Rd

i=1

→ weighted M IN AVG optimization.
Intracluster min. weighted average

Frank Nielsen and Richard Nock

n
¯
wi ||b(w)pi ||2 .
i=1

The Centroids of Symmetrized Bregman Divergences
Center points in Euclidean geometry

Centroid ×: robust to outliers with simple closed-form sol.
1
¯
“Mean” radius is n n ||c pi ||2 .
i=1
Circumcenter : minimizes the radius of enclosing ball.
Combinatorially defined by at most d + 1 points.
(MiniBall Welzl’1991)
M INI M AX (non-differentiable) optimization problem:
C = min maxi ||cpi ||2
c

M IN AVG(L2 ) → Fermat-Weber point ◦
→ no closed form solution.
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Bregman divergences

Aim at generalizing Euclidean centroids to dually flat spaces.

Bregman divergences DF :
F : X → R strictly convex and differentiable function
defined over an open convex domain X :
DF (p||q) = F (p) − F (q) − p − q, ∇F (q)
not a metric (symmetry and triangle inequality may fail)
versatile family, popular in Comp. Sci.–Machine learning.

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Bregman divergence: Geometric interpretation

DF (p||q) = F (p) − F (q) − p − q, ∇F (q)
DF (p||q) ≥ 0
(with equality iff. p = q)
F : generator, contrast function or potential function (inf. geom.).
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Bregman divergence

Example 1: The squared Euclidean distance
F (x) = x 2 : strictly convex and differentiable over Rd
(Multivariate F (x) = d xi2 , obtained coordinatewise)
i=1
DF (p||q) = F (p) − F (q) − p − q, ∇F (q)
= p2 − q 2 − p − q, 2q
− p2 − q 2 − 2 p, q + 2q 2
=

p−q

Frank Nielsen and Richard Nock

2

The Centroids of Symmetrized Bregman Divergences
Example 1: The squared Euclidean distance

Java applet:

http://www.sonycsl.co.jp/person/nielsen/BregmanDivergence/

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Bregman divergence
Example 2: The relative entropy (Kullback-Leibler divergence)
F (p) = p(x) log p(x) dx
(negative Shannon entropy)
(Discrete distributions F (p) = x p(x) log p(x) dx)
DF (p||q) =

=

(p(x) log p(x) − q(x) log q(x)
− p(x) − q(x), log q(x) + 1 )) dx
p(x)
dx
(KL divergence)
p(x) log
q(x)

Kullback-Leiber divergence also known as:
relative entropy, discrimination information or I-divergence.
(Defined either on the probability simplex or extended on the full positive quadrant – unnormalized pdf.)

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Example 2: The relative entropy
(Kullback-Leibler divergence)

Java applet:

http://www.sonycsl.co.jp/person/nielsen/BregmanDivergence/

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Bregman divergences: A versatile family of measures
Bregman divergences are versatile, suited to mixed-type data.
(Build mixed-type multivariate divergences dimensionwise using elementary uni-type divergences.)

Fact (Linearity)
Bregman divergence is a linear operator:
∀F1 ∈ C ∀F2 ∈ C DF1 +λF2 (p||q) = DF1 (p||q) + λDF2 (p||q) for
any λ ≥ 0.
Fact (Equivalence classes)
Let G(x) = F (x) + a, x + b be another strictly convex and
differentiable function, with a ∈ Rd and b ∈ R. Then
DF (p||q) = DG (p||q).
(Simplify Bregman generators by removing affine terms.)

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Sided and symmetrized centroids from M IN AVG
Right-sided and left-sided centroids:
F
cR

1
= arg min
c∈X n

1
F
cL = arg min
c∈X n

n

DF (pi || c )
i=1
n

DF ( c ||pi )
i=1

Symmetrized centroid:
1
c = arg min
c∈X n

n

F

i=1

DF (pi || c ) + DF ( c ||pi )
2
SF

Symmetrized asymmetric Bregman divergence SF is not a
Bregman divergence (because of domain convexity)
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Sided right-type centroid
Theorem
F
The right-type sided Bregman centroid cR of a set P of n points
p1 , ...pn , defined as the minimizer for the average right
divergence
1
F
cR = arg minc n n DF (pi ||c) = arg minc AVGF (P||c), is
i=1
unique, independent of the selected divergence DF , and
F
¯ 1 i=1
coincides with the center of mass cR = cR = p = n n pi .

Proof:
Start with the function to minimize:
n

AVGF (P||q) =
i=1

Frank Nielsen and Richard Nock

1
DF (pi ||q)
n

The Centroids of Symmetrized Bregman Divergences
n

AVGF (P||q) =
i=1
⎛
AVGF (P, q)

=

⎝
⎛

=

⎝
⎛

=

⎝

n
i=1
n
i=1

1

1
(F (pi ) − F (q)− < pi − q, ∇F (q) >)
n
⎞

1
n

¯
¯
F (pi ) − F (p)⎠ + ⎝ F (p ) − F (q) −
⎞

1
n
n

n i=1

⎛

⎞

n

1

i=1

⎛

¯
¯
F (pi ) − F (p)⎠ + ⎝ F (p ) − F (q) −

n

n
i=1

⎞

< pi − q, ∇F (q) >⎠ ,
⎞
1
n

(pi − q), ∇F (q) ⎠ ,

¯
¯
F (pi ) − F (p)⎠ +DF (p||q).

Independent of q

¯
¯
Minimized for q = p since DF (p||q) ≥ 0 with equality iff
¯
q=p
1
¯
Information radius: JSF (P) = n n F (pi ) − F (p) ≥ 0
i=1
→ Jensen F remainder aka. Burbea-Rao divergences
← Generalize Jensen-Shannon divergences (Lin’1991)
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Dual Bregman divergence
Legendre-Fenchel (slope) transformation F → G = LF :
G(y) = sup{< y, x > −F (x)}
x∈X

G=

F∗

(with

G∗

=

F ∗∗

def

= F ), minimized for y = x =∇F (x).

F ∗ (x ) =< x, x > −F (x)
Dual Bregman divergence
DF (p||q) = F (p) + F ∗ (∇F (q))− < p, ∇F (q) >
= F (p) + F ∗ (q )− < p, q >
= DF ∗ (q ||p )

More details in “Bregman Voronoi diagrams”, arXiv:0709.2196
The Centroids of Symmetrized Bregman Divergences
(SODA’07) Frank Nielsen and Richard Nock
Left-sided Bregman centroid
Theorem
F
The left-type sided Bregman centroid cL , defined as the
minimizer for the average left divergence
F
F
cL = arg minc∈X AVGF (c||P), is the unique point cL ∈ X such
L
F
¯
that cL = (∇F )−1 (p ) = (∇F )−1 ( n ∇F (pi )), where
i=1
F ∗ (P ) is the center of mass for the gradient point set
¯
p = cR
F
PF = {pi = ∇F (pi ) | pi ∈ P}.

F
cL

= arg min AVGF ( c ||P) ⇔ arg min AVGF ∗ (PF || c’ ) =
c∈X

c ∈X

F∗
cR (PF )

The information radii of sided Bregman centroids are equal:
1
F
F
AVGF (P||cR ) = AVGF (cL ||P) = JSF (P) =
n

n

¯
F (pi )−F (p) > 0
i=1

is the F -Jensen-Shannon divergence for the uniform weight
Frank
The Centroids of Symmetrized Bregman Divergences
distribution. Nielsen and Richard Nock
Generalized means
A sequence V of n real numbers V = {v1 , ..., vn }
1
n

M(V; f ) = f −1

n

f (vi )
i=1

For example, Pythagoras’ means:
Arithmetic: f (x) = x (average)
Geometric: f (x) = log x (central tendency)
Harmonic: f (x) =

1
x

(average of rates)

min xi ≤ M(V; f ) ≤ max xi
i

i

min and max: power means (f (x) = x p ) for p → ±∞
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Bijection: Bregman divergences and means

Bijection: Bregman divergence DF ↔ ∇F -means
M(S; f ) = M(S; af + b) ∀a ∈ R+ and ∀b ∈ R
∗
Recall one property of Bregman divergences:
Fact (Equivalence classes)
Let G(x) = F (x) + a, x + b be another strictly convex and
differentiable function, with a ∈ Rd and b ∈ R. Then
DF (p||q) = DG (p||q).

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Left- and right-sided Bregman barycenters
Left- and right-sided Bregman centroids extend to barycenters.
Theorem
Bregman divergences are in bijection with generalized means.
F
The right-type barycenter bR (w) is independent of F and
computed as the weighted arithmetic mean on the point set, a
generalized mean for the identity function:
F
bR (P; w) = bR (P; w) = M(P; x; w) with
def

M(P; f ; w) =f −1 ( n wi f (vi )). The left-type Bregman
i=1
F is computed as a generalized mean on the point
barycenter bL
F
set for the gradient function: bL (P) = M(P; ∇F ; w).
The information radius of sided barycenters is:
JSF (P; w) = d wi F (pi ) − F ( d wi pi ).
i=1
i=1

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Symmetrized Bregman centroid
Symmetrized centroid defined from the minimum average
optimization problem:
1
c = arg min
c∈X n

n

F

i=1

DF (c||pi ) + DF (pi ||c)
= arg min AVG(P; c)
c∈X
2

Lemma
The symmetrized Bregman centroid c F is unique and obtained
F
F
by minimizing minq∈X DF (cR ||q) + DF (q||cL ):
F
F
c F = arg min DF (cR ||q) + DF (q||cL ).
q∈X

→ Minimization problem depends only on sided centroids.
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Proof

n

AVGF (P||q) =
i=1

1
¯
F (pi ) − F (p)
n

¯
+ DF (p||q)

AVGF (q||P) = AVGF ∗ (PF ||q )
n

=
i=1

1 ∗
¯
F (pi ) − F ∗ (p )
n

¯
+ DF ∗ (pF ||qF )

¯
But DF ∗ (pF ||qF ) = DF ∗∗ (∇F ∗ ◦ ∇F (q)||∇F ∗ ( n ∇F (pi ))) =
i=1
∗
−1
F
DF (q||cL ) since F ∗∗ = F , ∇F = ∇F and ∇F ∗ ◦ ∇F (q) = q.
1
F
F
arg min (AVGF (P||q) + AVGF (q||P)) ⇐⇒ arg min DF (cR ||q)+DF (q||cL )
c∈X 2
q∈X

(removing all terms independent of q)
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Geometric characterization

Theorem
The symmetrized Bregman centroid c F is uniquely defined as
F
F
the minimizer of DF (cR ||q) + DF (q||cL ). It is defined
F
F
F
F
geometrically as c F = ΓF (cR , cL ) ∩ MF (cR , cL ), where
F
F
F
F
ΓF (cR , cL ) = {(∇F )−1 ((1 − λ)∇F (cR ) + λ∇F (cL )) | λ ∈ [0, 1]}
F
F
F
F
is the geodesic linking cR to cL , and MF (cR , cL ) is the
mixed-type Bregman bisector:
F
F
F
F
MF (cR , cL ) = {x ∈ X | DF (cR ||x) = DF (x||cL )}.
Proof by contradiction using Bregman Pythagoras’ theorem.

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Generalized Bregman Pythagoras’ theorem

Projection pW of point p to a convex subset W ⊆ X .
DF (w||p) ≥ DF (w||pW ) + DF (pW ||p)
with equality for and only for affine sets W

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Proof by contradiction

Bregman projection: q⊥ = arg mint∈Γ(c F ,c F ) DF (t||q)
R

L

DF (p||q) ≥ D(p||PΩ (q)) + DF (PΩ (q)||q)
PΩ (q) = arg min DF (ω||q)
ω∈Ω

F
DF (cR ||q) ≥ DF (cR ||q⊥ ) + DF (q⊥ ||q)
F
F
DF (q||cL ) ≥ DF (q||q⊥ ) + DF (q⊥ ||CL )

F
F
F
F
DF (cR ||q) + DF (q||cL ) ≥ DF (cR ||q⊥ ) + DF (q⊥ ||cL ) + (DF (q⊥ ||q) + DF (q||q⊥ ))

But DF (q⊥ ||q) + DF (q||q⊥ ) > 0 yields contradiction.
Proved mixed-type bisector by moving q along the geodesic
F
F
while reducing |DF (cR ||q) − DF (q||cL )|.
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
A few examples: Kullback-Leibler & Itakura-Saito

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
A few examples: Exponential & Logistic losses

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Non-linear mixed-type bisector

MF (p, q) = {

x ∈ X | F (p) − F (q) − 2 F (x) −
< p, x > + < x, x > + < x, q > − < q, q >= 0}

→ non-closed form solution.
Corollary
The symmetrized Bregman divergence minimization problem is
both lower and upper bounded as follows:
F
F
JSF (P) ≤ AVGF (P; c F ) ≤ DF (cR ||cL )

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Geodesic-walk dichotomic approximation algorithm
ALGORITHM:
Geodesic is parameterized by λ ∈ [0, 1].
Start with λm = 0 and λM = 1.
Geodesic walk. Compute interval midpoint λh = λm +λM and
2
corresponding geodesic point
F
F
qλh = (∇F )−1 ((1 − λh )∇F (cR ) + λh ∇F (cL )),
Mixed-type bisector side. Evaluate the sign of
F
R
DF (cR ||qh ) − DF (qh ||cL ), and
Dichotomy. Branch on [λh , λM ] if the sign is negative, or on
[λm , λh ] otherwise.
Precision and number of iterations as a function of
hF = maxx∈Γ(c F ,c F ) ||HF (x)||2 . (Bregman balls, ECML’05)
R

L

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Applications

Applications of the dichotomic geodesic walk algorithm
Symmetrized non-parametric Kullback-Leibler (SKL),
Symmetrized Kullback-Leibler of multivariate normals,
(parametric distributions)
´
Symmetrized Bregman-Csiszar centroids,
→ include J-divergence and COSH distance
(Itakura-Saito symmetrized divergence).

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
The symmetrized Kullback-Leibler divergence
For two discrete probability mass functions p and q:
d

KL(p||q) =

p
i=1

(i)

p(i)
log (i)
q

For continuous distributions
p(x)
KL(p||q) = p(x) log
.
q(x)
x
Symmetric Kullback-Leibler is called J-divergence.
Observe that finite discrete distributions are parametric.
→ Degree of freedom is cardinal of sample space minus one.
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Exponential families and divergences
Exponential families in statistics have the following pdf.:
exp(< θ, t(x) > −F (θ) + C(x))
θ: natural parameter (dimension: order of the exp. fam.)
t(x): sufficient statistics (← Fisher-Neyman factorization),
F : log normalizer (cumulant) characterizes the family,
C(x): carrier measure (usually Lebesgue or counting).
Kullback-Leibler of exp. fam. is a Bregman divergence
KL(p(θp |F )||q(θq |F )) = DF (θq ||θp )

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Discrete distributions are multinomials
Multinomials extend Bernoulli distributions.
Conversion from source q to natural θ parameters:
θ

(i)

q (i)
q (i)
= log (d) = log
q
1 − d−1 q (j)
j=1

with q (d) = 1 − d−1 q (j)
j=1
... and to convert back from natural to source parameters:
q (i) =
with q (d) =

exp θ(i)
1+

d−1
j=1 (1

+ exp θ(j) )

1
1+

d−1
(1+exp θ(j) )
j=1

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Dual log normalizers
Log normalizer of multinomials is
d−1

F (θ) = log 1 +

exp θ(i)

i=1

Logistic entropy on open space Θ = Rd−1 .
Dual Legendre function F ∗ = LF is d-ary entropy:
F ∗ (η) =

d−1

η (i) log η (i)

d−1

+

i=1

Frank Nielsen and Richard Nock

1−
i=1

η (i)

d−1

log 1 −

η (i)

i=1

The Centroids of Symmetrized Bregman Divergences
Geodesic of multinomials
Both ∇F and ∇F ∗ are necessary for:
computing the left-sided centroid (∇F -means),
walking on the geodesic ((∇F )−1 = ∇F ∗ ):
F
F
qλ = (∇F )−1 (1 − λ)∇F (cR ) + λ∇F (cL )

∇F (θ) =

exp θ(i)
1+

d−1
(j)
j=1 exp θ

def

=η

i

It follows from Legendre transformation that ∇F ∗ = ∇F −1
∇−1 F (η) =

log

Frank Nielsen and Richard Nock

η (i)
1−

d−1 (j)
j=1 η

def

=θ

i

The Centroids of Symmetrized Bregman Divergences
Example: Centroids of histograms
Centroids are generalized means:
→ coordinates are always inside the extrema...

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Example: Centroids of histograms
... but not necessarily true for the corresponding histograms!

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Centroids of histograms: Java Applet
Generalize ad-hoc convex programming method of
(Veldhuis’02).

Try with your own images online using Java applet at:
http://www.sonycsl.co.jp/person/nielsen/SBDj/
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Side-by-side comparison

(Vedlhuis’02)
Frank Nielsen and Richard Nock

Geodesic dichotomic walk
The Centroids of Symmetrized Bregman Divergences
Entropic means of multivariate normal distributions
Multivariate normal distributions of Rd has following pdf.:
Pr(X = x) =
p(x; μ, Σ)
=

1
d√

(2π) 2

(x − μ)T Σ−1 (x − μ)
exp −
2
detΣ

Source parameter is a mixed-type of vector and matrix:
˜
Λ = (μ, Σ)
Order of the parametric distribution family is D =

Frank Nielsen and Richard Nock

d(d+3)
2

> d.

The Centroids of Symmetrized Bregman Divergences
Exponential family decomposition
Multivariate normal distribution belongs to the exponential
families:
exp(< θ, t(x) > −F (θ) + C(x))
˜
Sufficient statistics: x = (x, − 1 xx T )
2
˜
Natural parameters: Θ = (θ, Θ) = (Σ−1 μ, 1 Σ−1 )
2
Log normalizer
˜ = 1 Tr(Θ−1 θθT ) − 1 log detΘ + d log π
F (Θ)
4
2
2
Mixed-type separable inner product:
˜ ˜
< Θp , Θq >=< Θp , Θq > + < θp , θq >
with matrix inner product defined as the following trace:
< Θp , Θq >= Tr(Θp ΘT )
q
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Dual Legendre functions

d
1
1
−1 T
Tr(Θ θθ ) − log detΘ + log π
4
2
2
d
1
1
∗ ˜
T −1
F (H) = − log(1 + η H η) − log det(−H) − log(2πe)
2
2
2
˜
F (Θ) =

˜
˜
Parameters transformations H ↔ Θ ↔ Λ
˜
H =

η=μ
H = −(Σ + μμT )

˜
˜
H = ∇Θ F (Θ) =
˜

∇Θ F (θ)
˜
∇Θ F (Θ)
˜

∗ ˜
˜
Θ = ∇H F (H) =
˜

⇐ Λ=
⇒ ˜

=

λ=μ
Λ=Σ

θ = Σ−1 μ
Θ = 1 Σ−1
2

⇐ Θ=
⇒ ˜

1 Θ−1 θ
2
1 Θ−1 − 1 (Θ−1 θ)(Θ−1 θ)T
−2
4

∇H F ∗ (η)
˜
∇H F ∗ (H)
˜

=

Frank Nielsen and Richard Nock

−(H + ηη T )−1 η
− 1 (H + ηη T )−1
2

=

=

μ
−(Σ + μμT )

Σ−1 μ
1 Σ−1
2

The Centroids of Symmetrized Bregman Divergences
Example of entropic multivariate normal means
Simplify and extend (Myrvoll & Soong’03)

Right in red
Left in blue
Symmetrized in green
Geodesic half point λ =
in purple

1
2

Left DF centroid is a right Kullback-Leibler centroid.
→ Generalize the approach of (Davis & Dhillon, NIPS’06).
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Clustering multivariate normal means
Center-based clustering (Banerjee et al., JMLR’05),
(Teboulle’07)

˜
Primal (natural Θ)
Frank Nielsen and Richard Nock

˜
Dual (expectation H)
The Centroids of Symmetrized Bregman Divergences
COSH distance: Symmetrized Itakura-Saito

Burg entropy F (x) = −
d

IS(p||q) =
i=1

d
log x (i)
i=1

yields Itakura-Saito div.

pi
pi
+ log − 1
qi
qi

= DF (p||q).

Symmetrized Itakura-Saito divergence is called COSH distance
IS(p||q) + IS(q||p)
COSH(p; q) =
2
(Wei & Gibson’00) COSH performs “best” for sound processing.

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
´
Bregman-Csiszar centroids
´
(Csiszar Ann. Stat.’91) and (Lafferty COLT’99) used the
following continuous family of generators:
⎧
α=0
⎨ x − log x − 1
1
(−x α + αx − α + 1) α ∈ (0, 1)
Fα =
⎩ α(1−α)
x log x − x + 1
α=1
Continuum of Bregman divergences:
q
p
BC0 (p||q) = log + − 1 Itakura-Saito
p q
1
BCα (p||q) =
q α (x) − pα (x) + αq(x)α−1 (p(x) − q(x))
α(1 − α)
p
BC1 (p||q) = p log − p + q Ext. Kullback-Leibler.
q
→ Smooth spectrum of symmetric entropic centroids.
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Summary of presentation and contributions
Bregman centroids (Kullback-Leibler exp. fam. centroids)
Left-sided and right-sided Bregman barycenters are
generalized means,
Symmetrized Bregman centroid c F is exactly geometrically
characterized,
Simple dichotomic geodesic walk to approximate c F
(2-mean on sided centroids)
Generalize ad-hoc solutions with geometric interpretation:
SKL centroid for non-parameter histograms (Veldhuis’02)
SKL centroid for parametric multivariate normals
(Myrvoll & Soong’03)
Right KL for multivariate normal (Davis & Dhillon’06)

Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences
Thank you!

References:
arXiv.org – cs – cs.CG (Computational Geometry)
arXiv:0711.3242
http://www.sonycsl.co.jp/person/nielsen/BregmanCentroids/

Acknowledgments:
´
Sony CSL, Ecole Polytechnique (LIX), CEREGMIA, ANR GAIA.
Frank Nielsen and Richard Nock

The Centroids of Symmetrized Bregman Divergences

More Related Content

What's hot

QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
The Statistical and Applied Mathematical Sciences Institute
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: MixturesCVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
zukun
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptography
wtyru1989
 

What's hot (20)

On the Jensen-Shannon symmetrization of distances relying on abstract means
On the Jensen-Shannon symmetrization of distances relying on abstract meansOn the Jensen-Shannon symmetrization of distances relying on abstract means
On the Jensen-Shannon symmetrization of distances relying on abstract means
 
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like sampler
 
CLIM Fall 2017 Course: Statistics for Climate Research, Geostats for Large Da...
CLIM Fall 2017 Course: Statistics for Climate Research, Geostats for Large Da...CLIM Fall 2017 Course: Statistics for Climate Research, Geostats for Large Da...
CLIM Fall 2017 Course: Statistics for Climate Research, Geostats for Large Da...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
 
Darmon Points: an Overview
Darmon Points: an OverviewDarmon Points: an Overview
Darmon Points: an Overview
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: MixturesCVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptography
 
CLIM Fall 2017 Course: Statistics for Climate Research, Guest lecture: Data F...
CLIM Fall 2017 Course: Statistics for Climate Research, Guest lecture: Data F...CLIM Fall 2017 Course: Statistics for Climate Research, Guest lecture: Data F...
CLIM Fall 2017 Course: Statistics for Climate Research, Guest lecture: Data F...
 
CLIM Fall 2017 Course: Statistics for Climate Research, Detection & Attributi...
CLIM Fall 2017 Course: Statistics for Climate Research, Detection & Attributi...CLIM Fall 2017 Course: Statistics for Climate Research, Detection & Attributi...
CLIM Fall 2017 Course: Statistics for Climate Research, Detection & Attributi...
 
QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
QMC Opening Workshop, Support Points - a new way to compact distributions, wi...QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimation
 
Wireless Localization: Ranging (second part)
Wireless Localization: Ranging (second part)Wireless Localization: Ranging (second part)
Wireless Localization: Ranging (second part)
 
CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...
CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...
CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...
 

Similar to Slides: The Centroids of Symmetrized Bregman Divergences

Similar to Slides: The Centroids of Symmetrized Bregman Divergences (20)

Slides: The Burbea-Rao and Bhattacharyya centroids
Slides: The Burbea-Rao and Bhattacharyya centroidsSlides: The Burbea-Rao and Bhattacharyya centroids
Slides: The Burbea-Rao and Bhattacharyya centroids
 
Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...
 
Bregman Voronoi Diagrams (SODA 2007)
Bregman Voronoi Diagrams (SODA 2007)  Bregman Voronoi Diagrams (SODA 2007)
Bregman Voronoi Diagrams (SODA 2007)
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applications
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture models
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 
Bregman divergences from comparative convexity
Bregman divergences from comparative convexityBregman divergences from comparative convexity
Bregman divergences from comparative convexity
 
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
 
On the smallest enclosing information disk
 On the smallest enclosing information disk On the smallest enclosing information disk
On the smallest enclosing information disk
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
Practical computation of Hecke operators
Practical computation of Hecke operatorsPractical computation of Hecke operators
Practical computation of Hecke operators
 
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
 
On approximating the Riemannian 1-center
On approximating the Riemannian 1-centerOn approximating the Riemannian 1-center
On approximating the Riemannian 1-center
 
Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)
 
Tales on two commuting transformations or flows
Tales on two commuting transformations or flowsTales on two commuting transformations or flows
Tales on two commuting transformations or flows
 
Slides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processingSlides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processing
 
Md2521102111
Md2521102111Md2521102111
Md2521102111
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 

Slides: The Centroids of Symmetrized Bregman Divergences

  • 1. The Centroids of Symmetrized Bregman Divergences Frank Nielsen1 1 Sony Richard Nock2 Computer Science Laboratories Inc., FRL ´ Ecole Polytechnique, LIX Frank.Nielsen@acm.org 2 University of Antilles-Guyanne CEREGMIA Richard.Nock@martinique.univ-ag.fr December 2007 Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 2. The centroid in Euclidean geometry ¯ Given a point set P = {p1 , ..., pn } of Ed , the centroid c : ¯ Is the center of mass: c = 1 n n i=1 pi , 1 Minimizes minc∈Rd n n ||cpi || 2 : i=1 M IN AVG squared Euclidean distance optimization, Plays a central role in center-based clustering methods (k-means of Lloyd’1957) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 3. Centroid and barycenters Notion of centroid extends to barycenters: ¯ b(w) = n wi pi i=1 → barycentric coordinates for interpolation, with ||w|| = 1. ¯ Barycenter b(w) minimizes: n wi ||cpi ||2 min c∈Rd i=1 → weighted M IN AVG optimization. Intracluster min. weighted average Frank Nielsen and Richard Nock n ¯ wi ||b(w)pi ||2 . i=1 The Centroids of Symmetrized Bregman Divergences
  • 4. Center points in Euclidean geometry Centroid ×: robust to outliers with simple closed-form sol. 1 ¯ “Mean” radius is n n ||c pi ||2 . i=1 Circumcenter : minimizes the radius of enclosing ball. Combinatorially defined by at most d + 1 points. (MiniBall Welzl’1991) M INI M AX (non-differentiable) optimization problem: C = min maxi ||cpi ||2 c M IN AVG(L2 ) → Fermat-Weber point ◦ → no closed form solution. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 5. Bregman divergences Aim at generalizing Euclidean centroids to dually flat spaces. Bregman divergences DF : F : X → R strictly convex and differentiable function defined over an open convex domain X : DF (p||q) = F (p) − F (q) − p − q, ∇F (q) not a metric (symmetry and triangle inequality may fail) versatile family, popular in Comp. Sci.–Machine learning. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 6. Bregman divergence: Geometric interpretation DF (p||q) = F (p) − F (q) − p − q, ∇F (q) DF (p||q) ≥ 0 (with equality iff. p = q) F : generator, contrast function or potential function (inf. geom.). Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 7. Bregman divergence Example 1: The squared Euclidean distance F (x) = x 2 : strictly convex and differentiable over Rd (Multivariate F (x) = d xi2 , obtained coordinatewise) i=1 DF (p||q) = F (p) − F (q) − p − q, ∇F (q) = p2 − q 2 − p − q, 2q − p2 − q 2 − 2 p, q + 2q 2 = p−q Frank Nielsen and Richard Nock 2 The Centroids of Symmetrized Bregman Divergences
  • 8. Example 1: The squared Euclidean distance Java applet: http://www.sonycsl.co.jp/person/nielsen/BregmanDivergence/ Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 9. Bregman divergence Example 2: The relative entropy (Kullback-Leibler divergence) F (p) = p(x) log p(x) dx (negative Shannon entropy) (Discrete distributions F (p) = x p(x) log p(x) dx) DF (p||q) = = (p(x) log p(x) − q(x) log q(x) − p(x) − q(x), log q(x) + 1 )) dx p(x) dx (KL divergence) p(x) log q(x) Kullback-Leiber divergence also known as: relative entropy, discrimination information or I-divergence. (Defined either on the probability simplex or extended on the full positive quadrant – unnormalized pdf.) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 10. Example 2: The relative entropy (Kullback-Leibler divergence) Java applet: http://www.sonycsl.co.jp/person/nielsen/BregmanDivergence/ Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 11. Bregman divergences: A versatile family of measures Bregman divergences are versatile, suited to mixed-type data. (Build mixed-type multivariate divergences dimensionwise using elementary uni-type divergences.) Fact (Linearity) Bregman divergence is a linear operator: ∀F1 ∈ C ∀F2 ∈ C DF1 +λF2 (p||q) = DF1 (p||q) + λDF2 (p||q) for any λ ≥ 0. Fact (Equivalence classes) Let G(x) = F (x) + a, x + b be another strictly convex and differentiable function, with a ∈ Rd and b ∈ R. Then DF (p||q) = DG (p||q). (Simplify Bregman generators by removing affine terms.) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 12. Sided and symmetrized centroids from M IN AVG Right-sided and left-sided centroids: F cR 1 = arg min c∈X n 1 F cL = arg min c∈X n n DF (pi || c ) i=1 n DF ( c ||pi ) i=1 Symmetrized centroid: 1 c = arg min c∈X n n F i=1 DF (pi || c ) + DF ( c ||pi ) 2 SF Symmetrized asymmetric Bregman divergence SF is not a Bregman divergence (because of domain convexity) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 13. Sided right-type centroid Theorem F The right-type sided Bregman centroid cR of a set P of n points p1 , ...pn , defined as the minimizer for the average right divergence 1 F cR = arg minc n n DF (pi ||c) = arg minc AVGF (P||c), is i=1 unique, independent of the selected divergence DF , and F ¯ 1 i=1 coincides with the center of mass cR = cR = p = n n pi . Proof: Start with the function to minimize: n AVGF (P||q) = i=1 Frank Nielsen and Richard Nock 1 DF (pi ||q) n The Centroids of Symmetrized Bregman Divergences
  • 14. n AVGF (P||q) = i=1 ⎛ AVGF (P, q) = ⎝ ⎛ = ⎝ ⎛ = ⎝ n i=1 n i=1 1 1 (F (pi ) − F (q)− < pi − q, ∇F (q) >) n ⎞ 1 n ¯ ¯ F (pi ) − F (p)⎠ + ⎝ F (p ) − F (q) − ⎞ 1 n n n i=1 ⎛ ⎞ n 1 i=1 ⎛ ¯ ¯ F (pi ) − F (p)⎠ + ⎝ F (p ) − F (q) − n n i=1 ⎞ < pi − q, ∇F (q) >⎠ , ⎞ 1 n (pi − q), ∇F (q) ⎠ , ¯ ¯ F (pi ) − F (p)⎠ +DF (p||q). Independent of q ¯ ¯ Minimized for q = p since DF (p||q) ≥ 0 with equality iff ¯ q=p 1 ¯ Information radius: JSF (P) = n n F (pi ) − F (p) ≥ 0 i=1 → Jensen F remainder aka. Burbea-Rao divergences ← Generalize Jensen-Shannon divergences (Lin’1991) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 15. Dual Bregman divergence Legendre-Fenchel (slope) transformation F → G = LF : G(y) = sup{< y, x > −F (x)} x∈X G= F∗ (with G∗ = F ∗∗ def = F ), minimized for y = x =∇F (x). F ∗ (x ) =< x, x > −F (x) Dual Bregman divergence DF (p||q) = F (p) + F ∗ (∇F (q))− < p, ∇F (q) > = F (p) + F ∗ (q )− < p, q > = DF ∗ (q ||p ) More details in “Bregman Voronoi diagrams”, arXiv:0709.2196 The Centroids of Symmetrized Bregman Divergences (SODA’07) Frank Nielsen and Richard Nock
  • 16. Left-sided Bregman centroid Theorem F The left-type sided Bregman centroid cL , defined as the minimizer for the average left divergence F F cL = arg minc∈X AVGF (c||P), is the unique point cL ∈ X such L F ¯ that cL = (∇F )−1 (p ) = (∇F )−1 ( n ∇F (pi )), where i=1 F ∗ (P ) is the center of mass for the gradient point set ¯ p = cR F PF = {pi = ∇F (pi ) | pi ∈ P}. F cL = arg min AVGF ( c ||P) ⇔ arg min AVGF ∗ (PF || c’ ) = c∈X c ∈X F∗ cR (PF ) The information radii of sided Bregman centroids are equal: 1 F F AVGF (P||cR ) = AVGF (cL ||P) = JSF (P) = n n ¯ F (pi )−F (p) > 0 i=1 is the F -Jensen-Shannon divergence for the uniform weight Frank The Centroids of Symmetrized Bregman Divergences distribution. Nielsen and Richard Nock
  • 17. Generalized means A sequence V of n real numbers V = {v1 , ..., vn } 1 n M(V; f ) = f −1 n f (vi ) i=1 For example, Pythagoras’ means: Arithmetic: f (x) = x (average) Geometric: f (x) = log x (central tendency) Harmonic: f (x) = 1 x (average of rates) min xi ≤ M(V; f ) ≤ max xi i i min and max: power means (f (x) = x p ) for p → ±∞ Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 18. Bijection: Bregman divergences and means Bijection: Bregman divergence DF ↔ ∇F -means M(S; f ) = M(S; af + b) ∀a ∈ R+ and ∀b ∈ R ∗ Recall one property of Bregman divergences: Fact (Equivalence classes) Let G(x) = F (x) + a, x + b be another strictly convex and differentiable function, with a ∈ Rd and b ∈ R. Then DF (p||q) = DG (p||q). Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 19. Left- and right-sided Bregman barycenters Left- and right-sided Bregman centroids extend to barycenters. Theorem Bregman divergences are in bijection with generalized means. F The right-type barycenter bR (w) is independent of F and computed as the weighted arithmetic mean on the point set, a generalized mean for the identity function: F bR (P; w) = bR (P; w) = M(P; x; w) with def M(P; f ; w) =f −1 ( n wi f (vi )). The left-type Bregman i=1 F is computed as a generalized mean on the point barycenter bL F set for the gradient function: bL (P) = M(P; ∇F ; w). The information radius of sided barycenters is: JSF (P; w) = d wi F (pi ) − F ( d wi pi ). i=1 i=1 Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 20. Symmetrized Bregman centroid Symmetrized centroid defined from the minimum average optimization problem: 1 c = arg min c∈X n n F i=1 DF (c||pi ) + DF (pi ||c) = arg min AVG(P; c) c∈X 2 Lemma The symmetrized Bregman centroid c F is unique and obtained F F by minimizing minq∈X DF (cR ||q) + DF (q||cL ): F F c F = arg min DF (cR ||q) + DF (q||cL ). q∈X → Minimization problem depends only on sided centroids. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 21. Proof n AVGF (P||q) = i=1 1 ¯ F (pi ) − F (p) n ¯ + DF (p||q) AVGF (q||P) = AVGF ∗ (PF ||q ) n = i=1 1 ∗ ¯ F (pi ) − F ∗ (p ) n ¯ + DF ∗ (pF ||qF ) ¯ But DF ∗ (pF ||qF ) = DF ∗∗ (∇F ∗ ◦ ∇F (q)||∇F ∗ ( n ∇F (pi ))) = i=1 ∗ −1 F DF (q||cL ) since F ∗∗ = F , ∇F = ∇F and ∇F ∗ ◦ ∇F (q) = q. 1 F F arg min (AVGF (P||q) + AVGF (q||P)) ⇐⇒ arg min DF (cR ||q)+DF (q||cL ) c∈X 2 q∈X (removing all terms independent of q) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 22. Geometric characterization Theorem The symmetrized Bregman centroid c F is uniquely defined as F F the minimizer of DF (cR ||q) + DF (q||cL ). It is defined F F F F geometrically as c F = ΓF (cR , cL ) ∩ MF (cR , cL ), where F F F F ΓF (cR , cL ) = {(∇F )−1 ((1 − λ)∇F (cR ) + λ∇F (cL )) | λ ∈ [0, 1]} F F F F is the geodesic linking cR to cL , and MF (cR , cL ) is the mixed-type Bregman bisector: F F F F MF (cR , cL ) = {x ∈ X | DF (cR ||x) = DF (x||cL )}. Proof by contradiction using Bregman Pythagoras’ theorem. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 23. Generalized Bregman Pythagoras’ theorem Projection pW of point p to a convex subset W ⊆ X . DF (w||p) ≥ DF (w||pW ) + DF (pW ||p) with equality for and only for affine sets W Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 24. Proof by contradiction Bregman projection: q⊥ = arg mint∈Γ(c F ,c F ) DF (t||q) R L DF (p||q) ≥ D(p||PΩ (q)) + DF (PΩ (q)||q) PΩ (q) = arg min DF (ω||q) ω∈Ω F DF (cR ||q) ≥ DF (cR ||q⊥ ) + DF (q⊥ ||q) F F DF (q||cL ) ≥ DF (q||q⊥ ) + DF (q⊥ ||CL ) F F F F DF (cR ||q) + DF (q||cL ) ≥ DF (cR ||q⊥ ) + DF (q⊥ ||cL ) + (DF (q⊥ ||q) + DF (q||q⊥ )) But DF (q⊥ ||q) + DF (q||q⊥ ) > 0 yields contradiction. Proved mixed-type bisector by moving q along the geodesic F F while reducing |DF (cR ||q) − DF (q||cL )|. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 25. A few examples: Kullback-Leibler & Itakura-Saito Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 26. A few examples: Exponential & Logistic losses Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 27. Non-linear mixed-type bisector MF (p, q) = { x ∈ X | F (p) − F (q) − 2 F (x) − < p, x > + < x, x > + < x, q > − < q, q >= 0} → non-closed form solution. Corollary The symmetrized Bregman divergence minimization problem is both lower and upper bounded as follows: F F JSF (P) ≤ AVGF (P; c F ) ≤ DF (cR ||cL ) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 28. Geodesic-walk dichotomic approximation algorithm ALGORITHM: Geodesic is parameterized by λ ∈ [0, 1]. Start with λm = 0 and λM = 1. Geodesic walk. Compute interval midpoint λh = λm +λM and 2 corresponding geodesic point F F qλh = (∇F )−1 ((1 − λh )∇F (cR ) + λh ∇F (cL )), Mixed-type bisector side. Evaluate the sign of F R DF (cR ||qh ) − DF (qh ||cL ), and Dichotomy. Branch on [λh , λM ] if the sign is negative, or on [λm , λh ] otherwise. Precision and number of iterations as a function of hF = maxx∈Γ(c F ,c F ) ||HF (x)||2 . (Bregman balls, ECML’05) R L Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 29. Applications Applications of the dichotomic geodesic walk algorithm Symmetrized non-parametric Kullback-Leibler (SKL), Symmetrized Kullback-Leibler of multivariate normals, (parametric distributions) ´ Symmetrized Bregman-Csiszar centroids, → include J-divergence and COSH distance (Itakura-Saito symmetrized divergence). Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 30. The symmetrized Kullback-Leibler divergence For two discrete probability mass functions p and q: d KL(p||q) = p i=1 (i) p(i) log (i) q For continuous distributions p(x) KL(p||q) = p(x) log . q(x) x Symmetric Kullback-Leibler is called J-divergence. Observe that finite discrete distributions are parametric. → Degree of freedom is cardinal of sample space minus one. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 31. Exponential families and divergences Exponential families in statistics have the following pdf.: exp(< θ, t(x) > −F (θ) + C(x)) θ: natural parameter (dimension: order of the exp. fam.) t(x): sufficient statistics (← Fisher-Neyman factorization), F : log normalizer (cumulant) characterizes the family, C(x): carrier measure (usually Lebesgue or counting). Kullback-Leibler of exp. fam. is a Bregman divergence KL(p(θp |F )||q(θq |F )) = DF (θq ||θp ) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 32. Discrete distributions are multinomials Multinomials extend Bernoulli distributions. Conversion from source q to natural θ parameters: θ (i) q (i) q (i) = log (d) = log q 1 − d−1 q (j) j=1 with q (d) = 1 − d−1 q (j) j=1 ... and to convert back from natural to source parameters: q (i) = with q (d) = exp θ(i) 1+ d−1 j=1 (1 + exp θ(j) ) 1 1+ d−1 (1+exp θ(j) ) j=1 Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 33. Dual log normalizers Log normalizer of multinomials is d−1 F (θ) = log 1 + exp θ(i) i=1 Logistic entropy on open space Θ = Rd−1 . Dual Legendre function F ∗ = LF is d-ary entropy: F ∗ (η) = d−1 η (i) log η (i) d−1 + i=1 Frank Nielsen and Richard Nock 1− i=1 η (i) d−1 log 1 − η (i) i=1 The Centroids of Symmetrized Bregman Divergences
  • 34. Geodesic of multinomials Both ∇F and ∇F ∗ are necessary for: computing the left-sided centroid (∇F -means), walking on the geodesic ((∇F )−1 = ∇F ∗ ): F F qλ = (∇F )−1 (1 − λ)∇F (cR ) + λ∇F (cL ) ∇F (θ) = exp θ(i) 1+ d−1 (j) j=1 exp θ def =η i It follows from Legendre transformation that ∇F ∗ = ∇F −1 ∇−1 F (η) = log Frank Nielsen and Richard Nock η (i) 1− d−1 (j) j=1 η def =θ i The Centroids of Symmetrized Bregman Divergences
  • 35. Example: Centroids of histograms Centroids are generalized means: → coordinates are always inside the extrema... Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 36. Example: Centroids of histograms ... but not necessarily true for the corresponding histograms! Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 37. Centroids of histograms: Java Applet Generalize ad-hoc convex programming method of (Veldhuis’02). Try with your own images online using Java applet at: http://www.sonycsl.co.jp/person/nielsen/SBDj/ Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 38. Side-by-side comparison (Vedlhuis’02) Frank Nielsen and Richard Nock Geodesic dichotomic walk The Centroids of Symmetrized Bregman Divergences
  • 39. Entropic means of multivariate normal distributions Multivariate normal distributions of Rd has following pdf.: Pr(X = x) = p(x; μ, Σ) = 1 d√ (2π) 2 (x − μ)T Σ−1 (x − μ) exp − 2 detΣ Source parameter is a mixed-type of vector and matrix: ˜ Λ = (μ, Σ) Order of the parametric distribution family is D = Frank Nielsen and Richard Nock d(d+3) 2 > d. The Centroids of Symmetrized Bregman Divergences
  • 40. Exponential family decomposition Multivariate normal distribution belongs to the exponential families: exp(< θ, t(x) > −F (θ) + C(x)) ˜ Sufficient statistics: x = (x, − 1 xx T ) 2 ˜ Natural parameters: Θ = (θ, Θ) = (Σ−1 μ, 1 Σ−1 ) 2 Log normalizer ˜ = 1 Tr(Θ−1 θθT ) − 1 log detΘ + d log π F (Θ) 4 2 2 Mixed-type separable inner product: ˜ ˜ < Θp , Θq >=< Θp , Θq > + < θp , θq > with matrix inner product defined as the following trace: < Θp , Θq >= Tr(Θp ΘT ) q Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 41. Dual Legendre functions d 1 1 −1 T Tr(Θ θθ ) − log detΘ + log π 4 2 2 d 1 1 ∗ ˜ T −1 F (H) = − log(1 + η H η) − log det(−H) − log(2πe) 2 2 2 ˜ F (Θ) = ˜ ˜ Parameters transformations H ↔ Θ ↔ Λ ˜ H = η=μ H = −(Σ + μμT ) ˜ ˜ H = ∇Θ F (Θ) = ˜ ∇Θ F (θ) ˜ ∇Θ F (Θ) ˜ ∗ ˜ ˜ Θ = ∇H F (H) = ˜ ⇐ Λ= ⇒ ˜ = λ=μ Λ=Σ θ = Σ−1 μ Θ = 1 Σ−1 2 ⇐ Θ= ⇒ ˜ 1 Θ−1 θ 2 1 Θ−1 − 1 (Θ−1 θ)(Θ−1 θ)T −2 4 ∇H F ∗ (η) ˜ ∇H F ∗ (H) ˜ = Frank Nielsen and Richard Nock −(H + ηη T )−1 η − 1 (H + ηη T )−1 2 = = μ −(Σ + μμT ) Σ−1 μ 1 Σ−1 2 The Centroids of Symmetrized Bregman Divergences
  • 42. Example of entropic multivariate normal means Simplify and extend (Myrvoll & Soong’03) Right in red Left in blue Symmetrized in green Geodesic half point λ = in purple 1 2 Left DF centroid is a right Kullback-Leibler centroid. → Generalize the approach of (Davis & Dhillon, NIPS’06). Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 43. Clustering multivariate normal means Center-based clustering (Banerjee et al., JMLR’05), (Teboulle’07) ˜ Primal (natural Θ) Frank Nielsen and Richard Nock ˜ Dual (expectation H) The Centroids of Symmetrized Bregman Divergences
  • 44. COSH distance: Symmetrized Itakura-Saito Burg entropy F (x) = − d IS(p||q) = i=1 d log x (i) i=1 yields Itakura-Saito div. pi pi + log − 1 qi qi = DF (p||q). Symmetrized Itakura-Saito divergence is called COSH distance IS(p||q) + IS(q||p) COSH(p; q) = 2 (Wei & Gibson’00) COSH performs “best” for sound processing. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 45. ´ Bregman-Csiszar centroids ´ (Csiszar Ann. Stat.’91) and (Lafferty COLT’99) used the following continuous family of generators: ⎧ α=0 ⎨ x − log x − 1 1 (−x α + αx − α + 1) α ∈ (0, 1) Fα = ⎩ α(1−α) x log x − x + 1 α=1 Continuum of Bregman divergences: q p BC0 (p||q) = log + − 1 Itakura-Saito p q 1 BCα (p||q) = q α (x) − pα (x) + αq(x)α−1 (p(x) − q(x)) α(1 − α) p BC1 (p||q) = p log − p + q Ext. Kullback-Leibler. q → Smooth spectrum of symmetric entropic centroids. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 46. Summary of presentation and contributions Bregman centroids (Kullback-Leibler exp. fam. centroids) Left-sided and right-sided Bregman barycenters are generalized means, Symmetrized Bregman centroid c F is exactly geometrically characterized, Simple dichotomic geodesic walk to approximate c F (2-mean on sided centroids) Generalize ad-hoc solutions with geometric interpretation: SKL centroid for non-parameter histograms (Veldhuis’02) SKL centroid for parametric multivariate normals (Myrvoll & Soong’03) Right KL for multivariate normal (Davis & Dhillon’06) Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences
  • 47. Thank you! References: arXiv.org – cs – cs.CG (Computational Geometry) arXiv:0711.3242 http://www.sonycsl.co.jp/person/nielsen/BregmanCentroids/ Acknowledgments: ´ Sony CSL, Ecole Polytechnique (LIX), CEREGMIA, ANR GAIA. Frank Nielsen and Richard Nock The Centroids of Symmetrized Bregman Divergences