Total Jensen divergences: Definition, Properties
and k-Means++ Clustering
Frank Nielsen (1), Richard Nock (2)
www.informationgeometry.org
(1) Sony Computer Science Laboratories, Inc.
(2) UAG-CEREGMIA
September 2013

© 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.
Divergences: Distortion measures
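F is a smooth convex function, the generator.
◮ Skew Jensen divergences:
$$J'_\alpha(p : q) = \alpha F(p) + (1-\alpha) F(q) - F(\alpha p + (1-\alpha) q) = (F(p)F(q))_\alpha - F((pq)_\alpha),$$
where $(pq)_\gamma = \gamma p + (1-\gamma) q = q + \gamma (p - q)$ and
$(F(p)F(q))_\gamma = \gamma F(p) + (1-\gamma) F(q) = F(q) + \gamma (F(p) - F(q))$.
◮ Bregman divergences:
$$B(p : q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle,$$
$$\lim_{\alpha \to 0} J_\alpha(p : q) = B(p : q), \qquad \lim_{\alpha \to 1} J_\alpha(p : q) = B(q : p),$$
where $J_\alpha(p : q) = J'_\alpha(p : q) / (\alpha(1-\alpha))$ denotes the scaled skew Jensen divergence.
◮ Statistical Bhattacharyya divergence:
$$\mathrm{Bhat}(p_1 : p_2) = -\log \int p_1(x)^\alpha \, p_2(x)^{1-\alpha} \, \mathrm{d}\nu(x) = J'_\alpha(\theta_1 : \theta_2)$$
for exponential families [5].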
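The limit relations between the scaled skew Jensen divergence and the Bregman divergence can be checked numerically. The following is a minimal sketch, not code from the slides: the generator F(x) = Σ x_i log x_i − x_i, the test vectors and all function names are illustrative assumptions.

```python
import numpy as np

def F(x):
    """Illustrative generator (an assumption): F(x) = sum_i x_i log x_i - x_i."""
    return float(np.sum(x * np.log(x) - x))

def grad_F(x):
    """Gradient of the illustrative generator."""
    return np.log(x)

def skew_jensen(p, q, alpha):
    """Unscaled skew Jensen divergence J'_alpha(p : q)."""
    return alpha * F(p) + (1 - alpha) * F(q) - F(alpha * p + (1 - alpha) * q)

def scaled_skew_jensen(p, q, alpha):
    """Scaled skew Jensen divergence J_alpha = J'_alpha / (alpha (1 - alpha))."""
    return skew_jensen(p, q, alpha) / (alpha * (1 - alpha))

def bregman(p, q):
    """Bregman divergence B(p : q) = F(p) - F(q) - <p - q, grad F(q)>."""
    return F(p) - F(q) - float(np.dot(p - q, grad_F(q)))

# Arbitrary positive test vectors (assumption).
p = np.array([0.2, 0.5, 1.5])
q = np.array([0.4, 0.3, 1.0])

# As alpha shrinks, the scaled divergence should approach B(p : q).
for alpha in (1e-2, 1e-4, 1e-6):
    print(alpha, scaled_skew_jensen(p, q, alpha), bregman(p, q))
```

Running the loop with α close to 1 instead should approach B(q : p) in the same way.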

Geometrically designed divergences
Plot of the convex generator F .
[Figure: graph of the generator, F : (x, F(x)), marking the points (q, F(q)) and (p, F(p)), the abscissas q, (p+q)/2 and p, and the quantities J(p, q), B(p : q) and tB(p : q).]
Total Bregman divergences
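Conformal divergence, with conformal factor ρ:
$$D'(p : q) = \rho(p, q) \, D(p : q)$$
plays the rôle of “regularizer” [8]
Invariance by rotation of the axes of the design space:
$$tB(p : q) = \frac{B(p : q)}{\sqrt{1 + \langle \nabla F(q), \nabla F(q) \rangle}} = \rho_B(q) \, B(p : q), \qquad \rho_B(q) = \frac{1}{\sqrt{1 + \langle \nabla F(q), \nabla F(q) \rangle}}.$$
Total squared Euclidean divergence:
$$tE(p, q) = \frac{1}{2} \, \frac{\langle p - q, p - q \rangle}{\sqrt{1 + \langle q, q \rangle}}.$$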
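As a small worked check, the total Bregman divergence can be evaluated for the generator F(x) = ½⟨x, x⟩ and compared with the closed form of the total squared Euclidean divergence. This is a sketch under that assumption; the function names and test vectors are hypothetical.

```python
import numpy as np

def total_bregman(F, grad_F, p, q):
    """tB(p : q) = B(p : q) / sqrt(1 + <grad F(q), grad F(q)>)."""
    g = grad_F(q)
    b = F(p) - F(q) - float(np.dot(p - q, g))
    return b / np.sqrt(1.0 + float(np.dot(g, g)))

def half_sq_norm(x):
    """Generator of the squared Euclidean divergence: F(x) = <x, x> / 2."""
    return 0.5 * float(np.dot(x, x))

def grad_half_sq_norm(x):
    """Gradient of F(x) = <x, x> / 2."""
    return x

def total_sq_euclidean(p, q):
    """Closed form: tE(p, q) = (1/2) <p - q, p - q> / sqrt(1 + <q, q>)."""
    d = p - q
    return 0.5 * float(np.dot(d, d)) / np.sqrt(1.0 + float(np.dot(q, q)))

# Arbitrary test vectors (assumption).
p = np.array([1.0, -2.0, 0.5])
q = np.array([0.3, 0.7, -1.2])

print(total_bregman(half_sq_norm, grad_half_sq_norm, p, q))
print(total_sq_euclidean(p, q))
```

Both printed values should agree, since B(p : q) reduces to ½⟨p − q, p − q⟩ for this generator.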

Total Jensen divergences

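$$tB(p : q) = \rho_B(q) \, B(p : q), \qquad \rho_B(q) = \sqrt{\frac{1}{1 + \langle \nabla F(q), \nabla F(q) \rangle}}$$
$$tJ_\alpha(p : q) = \rho_J(p, q) \, J_\alpha(p : q), \qquad \rho_J(p, q) = \sqrt{\frac{1}{1 + \frac{(F(p) - F(q))^2}{\langle p - q, p - q \rangle}}}$$

Jensen-Shannon divergence, square root is a metric [2]:
$$JS(p, q) = \frac{1}{2} \sum_{i=1}^{d} p_i \log \frac{2 p_i}{p_i + q_i} + \frac{1}{2} \sum_{i=1}^{d} q_i \log \frac{2 q_i}{p_i + q_i}$$

Lemma
The square root of the total Jensen-Shannon divergence is not a metric.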
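The Jensen-Shannon formula and its conformal factor can be sketched numerically as follows. The sketch assumes the generator F(x) = Σ x_i log x_i (Shannon negentropy), for which JS coincides with the unscaled Jensen divergence J'_{1/2}. It also takes the total Jensen-Shannon divergence to be ρ_J × JS, which differs from the scaled tJ_{1/2} only by a constant factor. Function names and test histograms are hypothetical.

```python
import numpy as np

def neg_entropy(x):
    """Assumed generator: F(x) = sum_i x_i log x_i."""
    return float(np.sum(x * np.log(x)))

def jensen_shannon(p, q):
    """JS(p, q) written as in the slide's formula."""
    m = p + q
    return float(0.5 * np.sum(p * np.log(2 * p / m))
                 + 0.5 * np.sum(q * np.log(2 * q / m)))

def rho_J(F, p, q):
    """Conformal factor 1 / sqrt(1 + (F(p) - F(q))^2 / <p - q, p - q>)."""
    d = p - q
    return 1.0 / np.sqrt(1.0 + (F(p) - F(q)) ** 2 / float(np.dot(d, d)))

def total_jensen_shannon(p, q):
    """rho_J(p, q) * JS(p, q), with the unscaled JS (an assumption)."""
    return rho_J(neg_entropy, p, q) * jensen_shannon(p, q)

# Two arbitrary positive histograms (assumption).
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

# JS as a Jensen divergence: (F(p) + F(q)) / 2 - F((p + q) / 2).
jensen_form = 0.5 * neg_entropy(p) + 0.5 * neg_entropy(q) - neg_entropy(0.5 * (p + q))
print(jensen_shannon(p, q), jensen_form, total_jensen_shannon(p, q))
```

Because 0 < ρ_J ≤ 1, the total version never exceeds the plain Jensen-Shannon value.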
Total Jensen divergence: Illustration

[Figure: illustration of the total Jensen divergence for a pair (p, q) and a second pair (p′, q′). The plot marks F(p), F(q), F((pq)_α), the interpolated values (F(p)F(q))_α and (F(p)F(q))_β, and the divergences J′_α(p : q) and tJ′_α(p : q), together with the same quantities for (p′, q′).]
Total Jensen divergence: Illustration
α on graph plot, β on interpolated segment
Two kinds of total Jensen divergences (but one always yields
closed-form)
[Figure: two sets of plots for the cases β ∈ [0, 1], β > 1 and β < 0, marking p, q, F((pq)_α) and (F(p)F(q))_β.]
Total Jensen divergences/Total Bregman divergences
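Total Jensen is not a generalization of total Bregman.
In the limit cases α ∈ {0, 1}, we have:
$$\lim_{\alpha \to 0} tJ_\alpha(p : q) = \rho_J(p, q) \, B(p : q) \neq \rho_B(q) \, B(p : q),$$
$$\lim_{\alpha \to 1} tJ_\alpha(p : q) = \rho_J(p, q) \, B(q : p) \neq \rho_B(p) \, B(q : p),$$
since $\rho_J(p, q) \neq \rho_B(q)$ in general.

Squared chord slope index in $\rho_J$, with $\Delta = p - q$ and $\Delta_F = F(p) - F(q)$:
$$s^2 = \frac{\Delta_F^2}{\|\Delta\|^2} = \frac{\Delta^\top \nabla F(\epsilon) \, \Delta^\top \nabla F(\epsilon)}{\Delta^\top \Delta} = \langle \nabla F(\epsilon), \nabla F(\epsilon) \rangle = \|\nabla F(\epsilon)\|^2.$$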

Conformal factor from mean value theorem
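When p ≃ q, ρ_J(p, q) ≃ ρ_B(q), and the total Jensen divergence tends to the total Bregman divergence for any value of α.
$$\rho_J(p, q) = \sqrt{\frac{1}{1 + \langle \nabla F(\epsilon), \nabla F(\epsilon) \rangle}} = \rho_B(\epsilon), \qquad \text{for } \epsilon \in [p, q].$$

For univariate generators, the value of ε is explicit:
$$\epsilon = (\nabla F)^{-1}\left(\frac{\Delta_F}{\Delta}\right) = \nabla F^*\left(\frac{\Delta_F}{\Delta}\right),$$
where F* is the Legendre convex conjugate [5].
Stolarsky mean [7]:
$$tJ_\alpha(p : q) = \rho_B(\epsilon) \, J_\alpha(p : q)$$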
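For a univariate generator, the mean value point ε can be computed in closed form. The sketch below assumes the generator f(x) = x log x − x, whose derivative log x is inverted by exp; the function names and the test values p = 1, q = 3 are illustrative.

```python
import math

def f(x):
    """Assumed univariate generator f(x) = x log x - x."""
    return x * math.log(x) - x

def f_prime(x):
    """Derivative f'(x) = log x."""
    return math.log(x)

def f_prime_inverse(y):
    """Inverse of f', here exp."""
    return math.exp(y)

def chord_slope(p, q):
    """Slope Delta_F / Delta of the chord joining (p, f(p)) and (q, f(q))."""
    return (f(p) - f(q)) / (p - q)

def mean_value_point(p, q):
    """epsilon = (f')^{-1}(Delta_F / Delta)."""
    return f_prime_inverse(chord_slope(p, q))

def rho_J(p, q):
    """Conformal factor of the total Jensen divergence (univariate case)."""
    return 1.0 / math.sqrt(1.0 + chord_slope(p, q) ** 2)

def rho_B(x):
    """Conformal factor of the total Bregman divergence at x."""
    return 1.0 / math.sqrt(1.0 + f_prime(x) ** 2)

eps = mean_value_point(1.0, 3.0)
print(eps, rho_J(1.0, 3.0), rho_B(eps))
```

The printed ε should lie between p and q, and the two conformal factors should coincide.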
Centroids and statistical robustness
Centroids (barycenters) are minimizers of average (weighted) divergences:
$$L(x; w) = \sum_{i=1}^{n} w_i \times tJ_\alpha(p_i : x), \qquad c_\alpha = \arg\min_{x \in X} L(x; w),$$
◮ Is it unique?
◮ Is it robust to outliers [3]?

Iterative convex-concave procedure (CCCP) [5]

Robustness of Jensen centroids (univariate generator)
Theorem
The Jensen centroid is robust for a strictly convex and smooth generator f if $|f'(\frac{p+y}{2})|$ is bounded on the domain X for any prescribed p.
◮ Jensen-Shannon: $X = \mathbb{R}_+$, $f(x) = x \log x - x$, $f'(x) = \log x$, $f''(x) = 1/x$.
$|f'(\frac{p+y}{2})| = |\log \frac{p+y}{2}|$ is unbounded when y → +∞.
JS centroid is not robust.
◮ Jensen-Burg: $X = \mathbb{R}_+$, $f(x) = -\log x$, $f'(x) = -1/x$, $f''(x) = 1/x^2$.
$|f'(\frac{p+y}{2})| = |\frac{2}{p+y}|$ is always bounded for y ∈ (0, +∞).
$$z(y) = 2p^2 \left( \frac{1}{p} - \frac{2}{p+y} \right)$$
When y → ∞, we have |z(y)| → 2p < ∞.
JB centroid is robust.
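The two boundedness claims can be illustrated with a short numeric sketch. It only evaluates the expressions stated in the theorem and its two examples; the function names and the choice p = 2 are assumptions made for illustration.

```python
import math

def js_midpoint_derivative(p, y):
    """|f'((p + y) / 2)| for the Jensen-Shannon generator, f'(x) = log x."""
    return abs(math.log((p + y) / 2.0))

def jb_midpoint_derivative(p, y):
    """|f'((p + y) / 2)| for the Jensen-Burg generator, f'(x) = -1/x."""
    return abs(-2.0 / (p + y))

def z(p, y):
    """z(y) = 2 p^2 (1/p - 2/(p + y)), as given for the Jensen-Burg case."""
    return 2.0 * p ** 2 * (1.0 / p - 2.0 / (p + y))

p = 2.0  # arbitrary prescribed point (assumption)
for y in (1e1, 1e3, 1e6, 1e12):
    # The first column keeps growing with y, the second stays below 2/p,
    # and the third approaches 2p.
    print(y, js_midpoint_derivative(p, y), jb_midpoint_derivative(p, y), z(p, y))
```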
Clustering: No closed-form centroid, no cry!
k-means++ [1] picks the seeds at random, with no centroid calculation.
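A divergence-based seeding in the style of k-means++ can be sketched as follows. This is a minimal illustration, not the authors' implementation: the generator F(x) = Σ x_i log x_i − x_i, the argument order tJ_α(x : seed), the function names and the random data are all assumptions.

```python
import numpy as np

def F(x):
    """Assumed generator: F(x) = sum_i x_i log x_i - x_i (positive data)."""
    return float(np.sum(x * np.log(x) - x))

def total_jensen(p, q, alpha=0.5):
    """tJ_alpha(p : q) = rho_J(p, q) * J_alpha(p : q), with the scaled J_alpha."""
    d = p - q
    sq = float(np.dot(d, d))
    if sq == 0.0:
        return 0.0
    jensen = (alpha * F(p) + (1 - alpha) * F(q)
              - F(alpha * p + (1 - alpha) * q)) / (alpha * (1 - alpha))
    rho = 1.0 / np.sqrt(1.0 + (F(p) - F(q)) ** 2 / sq)
    return rho * jensen

def kmeanspp_seeding(points, k, divergence, rng):
    """Pick k seeds among the points; no centroid is ever computed."""
    n = len(points)
    chosen = [int(rng.integers(n))]  # first seed: uniform at random
    closest = np.array([divergence(x, points[chosen[0]]) for x in points])
    while len(chosen) < k:
        weights = np.maximum(closest, 0.0)  # guard against rounding below zero
        total = weights.sum()
        probs = weights / total if total > 0 else np.full(n, 1.0 / n)
        idx = int(rng.choice(n, p=probs))  # sample proportionally to divergence
        chosen.append(idx)
        closest = np.minimum(closest, [divergence(x, points[idx]) for x in points])
    return points[chosen]

rng = np.random.default_rng(0)
data = rng.uniform(0.1, 2.0, size=(50, 3))  # synthetic positive data
seeds = kmeanspp_seeding(data, 4, total_jensen, rng)
print(seeds)
```

Each new seed is drawn with probability proportional to its divergence to the closest seed already chosen, so only pairwise divergence evaluations are needed.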

Divergence-based k-means++

Theorem
Suppose there exist some U and V such that, ∀x, y, z:
$$tJ_\alpha(x : z) \le U \left( tJ_\alpha(x : y) + tJ_\alpha(y : z) \right) \quad \text{(triangular inequality)}$$
$$tJ_\alpha(x : z) \le V \, tJ_\alpha(z : x) \quad \text{(symmetric inequality)}$$
Then the average potential of total Jensen seeding with k clusters satisfies
$$E[tJ_\alpha] \le 2U^2 (1 + V)(2 + \log k) \, tJ_{\mathrm{opt},\alpha},$$
where $tJ_{\mathrm{opt},\alpha}$ is the minimal total Jensen potential achieved by a clustering in k clusters.
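The multiplicative factor in this bound is simple to evaluate for given constants. The helper below is a sketch with a hypothetical name, and it assumes that log denotes the natural logarithm.

```python
import math

def seeding_bound_factor(U, V, k):
    """Factor 2 U^2 (1 + V) (2 + log k) multiplying the optimal potential."""
    return 2.0 * U ** 2 * (1.0 + V) * (2.0 + math.log(k))

# The factor grows only logarithmically with the number of clusters k.
for k in (1, 10, 100, 1000):
    print(k, seeding_bound_factor(U=1.0, V=1.0, k=k))
```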

Divergence-based k-means++: Two assumptions H
H:
◮ First, the maximal condition number of the Hessian of F, that is, the ratio between the maximal and minimal eigenvalue (> 0) of the Hessian of F, is upper bounded by $K_1$.
◮ Second, we assume the Lipschitz condition on F that $\Delta_F^2 / \langle \Delta, \Delta \rangle \le K_2$, for some $K_2 > 0$.

Lemma
Assume 0 < α < 1. Then, under assumption H, for any p, q, r ∈ S, there exists ε > 0 such that:
$$tJ_\alpha(p : r) \le \frac{2(1 + K_2) K_1^2}{\epsilon} \left( \frac{1}{1-\alpha} \, tJ_\alpha(p : q) + \frac{1}{\alpha} \, tJ_\alpha(q : r) \right).$$
Divergence-based k-means++
Corollary
The total skew Jensen divergence satisfies the following triangular inequality:
$$tJ_\alpha(p : r) \le \frac{2(1 + K_2) K_1^2}{\epsilon \, \alpha (1-\alpha)} \left( tJ_\alpha(p : q) + tJ_\alpha(q : r) \right).$$
$$U = \frac{2(1 + K_2) K_1^2}{\epsilon}$$

Lemma
Symmetric inequality condition holds for $V = K_1^2 (1 + K_2) / \epsilon$, for some 0 < ε < 1.

Total Jensen divergences: Recap
Total Jensen divergence = conformal divergence with non-separable double-sided conformal factor.
◮ Invariant to axis rotation of the “design space”
◮ Equivalent to total Bregman divergences [8, 4] only when p ≃ q
◮ Square root of total Jensen-Shannon divergence is not a metric (square root of JS is a metric).
◮ Jensen centroids are not always robust (e.g., Jensen-Shannon centroid)
◮ Total Jensen k-means++ does not require centroid computations and has a guaranteed approximation

Interest of conformal divergences in SVM [9] (double-sided separable), in information geometry [6] (flattening).
Thank you.

@article{totalJensen-arXiv1309.7109 ,
author="Frank Nielsen and Richard Nock",
title="Total {J}ensen divergences: {D}efinition, Properties and $k$-Means++ Clustering",
year="2013",
eprint="arXiv/1309.7109"
}

www.informationgeometry.org

Bibliographic references I
[1] David Arthur and Sergei Vassilvitskii.
k-means++: the advantages of careful seeding.
In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
[2] Bent Fuglede and Flemming Topsøe.
Jensen-Shannon divergence and Hilbert space embedding.
In IEEE International Symposium on Information Theory, page 31, 2004.
[3] F. R. Hampel, P. J. Rousseeuw, E. Ronchetti, and W. A. Stahel.
Robust Statistics: The Approach Based on Influence Functions.
Wiley Series in Probability and Mathematical Statistics, 1986.
[4] Meizhu Liu, Baba C. Vemuri, Shun-ichi Amari, and Frank Nielsen.
Shape retrieval using hierarchical total Bregman soft clustering.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12):2407–2419, 2012.
[5] Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455–5466, August 2011.
[6] Atsumi Ohara, Hiroshi Matsuzoe, and Shun-ichi Amari.
A dually flat structure on the space of escort distributions.
Journal of Physics: Conference Series, 201(1):012012, 2010.
Bibliographic references II

[7] Kenneth B. Stolarsky.
Generalizations of the logarithmic mean.
Mathematics Magazine, 48(2):87–92, 1975.
[8] Baba Vemuri, Meizhu Liu, Shun-ichi Amari, and Frank Nielsen.
Total Bregman divergence and its applications to DTI analysis.
IEEE Transactions on Medical Imaging, pages 475–483, 2011.
[9] Si Wu and Shun-ichi Amari.
Conformal transformation of kernel functions: a data-dependent way to improve support vector machine classifiers.
Neural Processing Letters, 15(1):59–67, 2002.

Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Slides: Total Jensen divergences: Definition, Properties and k-Means++ Clustering

  • 1. Total Jensen divergences: Definition, Properties and k-Means++ Clustering
      Frank Nielsen (1), Richard Nock (2)
      www.informationgeometry.org
      (1) Sony Computer Science Laboratories, Inc.; (2) UAG-CEREGMIA
      September 2013
      © 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.
  • 2. Divergences: Distortion measures
      $F$: a smooth convex function, the generator.
      ◮ Skew Jensen divergences:
        $J'_\alpha(p : q) = \alpha F(p) + (1-\alpha) F(q) - F(\alpha p + (1-\alpha) q) = (F(p)F(q))_\alpha - F((pq)_\alpha)$,
        where $(pq)_\gamma = \gamma p + (1-\gamma) q = q + \gamma (p - q)$ and $(F(p)F(q))_\gamma = \gamma F(p) + (1-\gamma) F(q) = F(q) + \gamma (F(p) - F(q))$.
      ◮ Bregman divergences:
        $B(p : q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle$,
        $\lim_{\alpha \to 0} J_\alpha(p : q) = B(p : q)$, $\lim_{\alpha \to 1} J_\alpha(p : q) = B(q : p)$,
        where $J_\alpha = J'_\alpha / (\alpha (1-\alpha))$ denotes the scaled skew Jensen divergence.
      ◮ Statistical Bhattacharyya divergence:
        $\mathrm{Bhat}(p_1 : p_2) = -\log \int p_1(x)^{\alpha} p_2(x)^{1-\alpha} \, d\nu(x) = J'_\alpha(\theta_1 : \theta_2)$ for exponential families [5].
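The two limits on this slide can be checked numerically. The sketch below is illustrative only: it assumes a scalar generator, takes the Burg entropy $F(x) = -\log x$ as an example, and assumes the limits refer to the skew Jensen divergence scaled by $1/(\alpha(1-\alpha))$. The function names are hypothetical.

```python
import math

def skew_jensen(F, p, q, alpha):
    """Unscaled skew Jensen divergence J'_alpha(p:q) for a scalar generator F."""
    return alpha * F(p) + (1 - alpha) * F(q) - F(alpha * p + (1 - alpha) * q)

def scaled_skew_jensen(F, p, q, alpha):
    """Assumed scaling J_alpha = J'_alpha / (alpha (1 - alpha)), for 0 < alpha < 1."""
    return skew_jensen(F, p, q, alpha) / (alpha * (1 - alpha))

def bregman(F, dF, p, q):
    """Scalar Bregman divergence B(p:q) = F(p) - F(q) - (p - q) F'(q)."""
    return F(p) - F(q) - (p - q) * dF(q)

# Example generator (an assumption for illustration): Burg entropy on x > 0.
F = lambda x: -math.log(x)
dF = lambda x: -1.0 / x

p, q = 2.0, 0.5
near_zero = scaled_skew_jensen(F, p, q, 1e-6)      # alpha close to 0
near_one = scaled_skew_jensen(F, p, q, 1 - 1e-6)   # alpha close to 1
print(near_zero, bregman(F, dF, p, q))  # the two values should nearly agree
print(near_one, bregman(F, dF, q, p))   # the two values should nearly agree
```

Without the scaling factor, the unscaled $J'_\alpha$ vanishes at both ends of the interval, which is why the scaled form is used in this check.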
  • 3. Geometrically designed divergences
      [Figure: plot of the convex generator $F : (x, F(x))$, with the points $(q, F(q))$ and $(p, F(p))$, the abscissae $q$, $\frac{p+q}{2}$ and $p$, and the quantities $J(p, q)$, $B(p : q)$ and $\mathrm{tB}(p : q)$ marked.]
  • 4. Total Bregman divergences
      Conformal divergence, with conformal factor $\rho$: $D'(p : q) = \rho(p, q) D(p : q)$. The factor plays the role of a "regularizer" [8].
      Invariance by rotation of the axes of the design space:
        $\mathrm{tB}(p : q) = \frac{B(p : q)}{\sqrt{1 + \langle \nabla F(q), \nabla F(q) \rangle}} = \rho_B(q) B(p : q)$, with $\rho_B(q) = \frac{1}{\sqrt{1 + \langle \nabla F(q), \nabla F(q) \rangle}}$.
      Total squared Euclidean divergence:
        $\mathrm{tE}(p, q) = \frac{1}{2} \frac{\langle p - q, p - q \rangle}{\sqrt{1 + \langle q, q \rangle}}$.
  • 5. Total Jensen divergences
      $\mathrm{tB}(p : q) = \rho_B(q) B(p : q)$, with $\rho_B(q) = \frac{1}{\sqrt{1 + \langle \nabla F(q), \nabla F(q) \rangle}}$.
      $\mathrm{tJ}_\alpha(p : q) = \rho_J(p, q) J_\alpha(p : q)$, with $\rho_J(p, q) = \frac{1}{\sqrt{1 + \frac{(F(p) - F(q))^2}{\langle p - q, p - q \rangle}}}$.
      Jensen-Shannon divergence, whose square root is a metric [2]:
        $\mathrm{JS}(p, q) = \frac{1}{2} \sum_{i=1}^{d} p_i \log \frac{2 p_i}{p_i + q_i} + \frac{1}{2} \sum_{i=1}^{d} q_i \log \frac{2 q_i}{p_i + q_i}$.
      Lemma: The square root of the total Jensen-Shannon divergence is not a metric.
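The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, the generator is assumed to be the Shannon negative entropy $F(x) = \sum_i x_i \log x_i$, and $J_\alpha$ is assumed to be the skew Jensen divergence scaled by $1/(\alpha(1-\alpha))$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unscaled_skew_jensen(F, p, q, alpha):
    """J'_alpha(p:q) = alpha F(p) + (1 - alpha) F(q) - F(alpha p + (1 - alpha) q)."""
    mix = [alpha * a + (1 - alpha) * b for a, b in zip(p, q)]
    return alpha * F(p) + (1 - alpha) * F(q) - F(mix)

def rho_J(F, p, q):
    """Conformal factor 1 / sqrt(1 + (F(p) - F(q))^2 / <p - q, p - q>), for p != q."""
    delta = [a - b for a, b in zip(p, q)]
    delta_F = F(p) - F(q)
    return 1.0 / math.sqrt(1.0 + delta_F ** 2 / dot(delta, delta))

def total_jensen(F, p, q, alpha=0.5):
    """tJ_alpha(p:q) = rho_J(p, q) * J_alpha(p:q), with the assumed scaling."""
    if list(p) == list(q):
        return 0.0  # avoid 0/0 in the conformal factor
    scaled = unscaled_skew_jensen(F, p, q, alpha) / (alpha * (1 - alpha))
    return rho_J(F, p, q) * scaled

def neg_entropy(x):
    """Shannon negative entropy, used here as the generator (an assumption)."""
    return sum(xi * math.log(xi) for xi in x)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence written as on the slide."""
    left = sum(a * math.log(2 * a / (a + b)) for a, b in zip(p, q))
    right = sum(b * math.log(2 * b / (a + b)) for a, b in zip(p, q))
    return 0.5 * left + 0.5 * right

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
js = jensen_shannon(p, q)
j_half = unscaled_skew_jensen(neg_entropy, p, q, 0.5)  # coincides with JS
tj = total_jensen(neg_entropy, p, q, 0.5)
print(js, j_half, tj)
```

With this generator, the unscaled divergence at $\alpha = 1/2$ coincides with the Jensen-Shannon divergence. Since $\rho_J \le 1$, the total divergence never exceeds the ordinary scaled one.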
  • 6. Total Jensen divergence: Illustration
      [Figure: two point pairs $(p, q)$ and $(p', q')$ on the graph of $F$, showing the interpolated values $(F(p)F(q))_\alpha$ and $(F(p)F(q))_\beta$, the graph values $F((pq)_\alpha)$ and $F((p'q')_\alpha)$, the Jensen divergences $J'_\alpha(p : q)$ and $J'_\alpha(p' : q')$, and the total Jensen divergences $\mathrm{tJ}'_\alpha(p : q)$ and $\mathrm{tJ}'_\alpha(p' : q')$, with origin $O$.]
  • 7. Total Jensen divergence: Illustration
      $\alpha$ on the graph plot, $\beta$ on the interpolated segment.
      Two kinds of total Jensen divergences (but one always yields a closed form).
      [Figure: two panels over the abscissae $p$ and $q$, each marking $(F(p)F(q))_\beta$ and $F((pq)_\alpha)$ for the cases $\beta \in [0, 1]$, $\beta > 1$ and $\beta < 0$.]
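  • 8. Total Jensen divergences/Total Bregman divergences
      Total Jensen is not a generalization of total Bregman. In the limit cases $\alpha \in \{0, 1\}$, we have:
        $\lim_{\alpha \to 0} \mathrm{tJ}_\alpha(p : q) = \rho_J(p, q) B(p : q) \neq \rho_B(q) B(p : q)$,
        $\lim_{\alpha \to 1} \mathrm{tJ}_\alpha(p : q) = \rho_J(p, q) B(q : p) \neq \rho_B(p) B(q : p)$,
      since $\rho_J(p, q) \neq \rho_B(q)$.
      Squared chord slope index in $\rho_J$, with $\Delta = p - q$ and $\Delta_F = F(p) - F(q)$:
        $s^2 = \frac{\Delta_F^2}{\|\Delta\|^2} = \frac{\Delta^\top \nabla F(\epsilon) \, \Delta^\top \nabla F(\epsilon)}{\Delta^\top \Delta} = \langle \nabla F(\epsilon), \nabla F(\epsilon) \rangle = \|\nabla F(\epsilon)\|^2$.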
  • 9. Conformal factor from mean value theorem
      When $p \simeq q$, $\rho_J(p, q) \simeq \rho_B(q)$, and the total Jensen divergence tends to the total Bregman divergence for any value of $\alpha$.
        $\rho_J(p, q) = \frac{1}{\sqrt{1 + \langle \nabla F(\epsilon), \nabla F(\epsilon) \rangle}} = \rho_B(\epsilon)$, for $\epsilon \in [p, q]$.
      For univariate generators, the value of $\epsilon$ is explicit:
        $\epsilon = \nabla F^{-1}\left(\frac{\Delta_F}{\Delta}\right) = \nabla F^{*}\left(\frac{\Delta_F}{\Delta}\right)$,
      where $F^{*}$ is the Legendre convex conjugate [5].
      Stolarsky mean [7]: $\mathrm{tJ}_\alpha(p : q) = \rho_B(\epsilon) J_\alpha(p : q)$.
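For a univariate generator, the mean-value point $\epsilon$ can be computed directly. The sketch below assumes the Burg generator $f(x) = -\log x$ as an example, for which $(f')^{-1}(s) = -1/s$; the variable names are illustrative. In this particular case $\epsilon$ works out to the logarithmic mean $(p - q)/(\log p - \log q)$.

```python
import math

def rho_B(df, x):
    """Total Bregman conformal factor 1 / sqrt(1 + f'(x)^2), scalar case."""
    return 1.0 / math.sqrt(1.0 + df(x) ** 2)

def rho_J(f, p, q):
    """Total Jensen conformal factor from the chord slope (f(p) - f(q)) / (p - q)."""
    slope = (f(p) - f(q)) / (p - q)
    return 1.0 / math.sqrt(1.0 + slope ** 2)

# Example generator (an assumption): Burg entropy f(x) = -log x on x > 0.
f = lambda x: -math.log(x)
df = lambda x: -1.0 / x
df_inv = lambda s: -1.0 / s   # inverse of f'

p, q = 3.0, 1.0
slope = (f(p) - f(q)) / (p - q)   # Delta_F / Delta
eps = df_inv(slope)               # mean-value point, lies between q and p
log_mean = (p - q) / (math.log(p) - math.log(q))

print(eps, log_mean)
print(rho_J(f, p, q), rho_B(df, eps))  # the two factors coincide
```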
  • 10. Centroids and statistical robustness
      Centroids (barycenters) are minimizers of average (weighted) divergences:
        $L(x; w) = \sum_{i=1}^{n} w_i \times \mathrm{tJ}_\alpha(p_i : x)$,
        $c_\alpha = \arg\min_{x \in \mathcal{X}} L(x; w)$.
      ◮ Is it unique?
      ◮ Is it robust to outliers [3]?
      Iterative convex-concave procedure (CCCP) [5].
  • 11. Robustness of Jensen centroids (univariate generator)
      Theorem: The Jensen centroid is robust for a strictly convex and smooth generator $f$ if $|f'(\frac{p+y}{2})|$ is bounded on the domain $\mathcal{X}$ for any prescribed $p$.
      ◮ Jensen-Shannon: $\mathcal{X} = \mathbb{R}^+$, $f(x) = x \log x - x$, $f'(x) = \log x$, $f''(x) = 1/x$.
        $|f'(\frac{p+y}{2})| = |\log \frac{p+y}{2}|$ is unbounded when $y \to +\infty$. The JS centroid is not robust.
      ◮ Jensen-Burg: $\mathcal{X} = \mathbb{R}^+$, $f(x) = -\log x$, $f'(x) = -1/x$, $f''(x) = \frac{1}{x^2}$.
        $|f'(\frac{p+y}{2})| = |\frac{2}{p+y}|$ is always bounded for $y \in (0, +\infty)$.
        $z(y) = 2p^2 \left( \frac{1}{p} - \frac{2}{p+y} \right)$. When $y \to \infty$, we have $|z(y)| \to 2p < \infty$. The JB centroid is robust.
  • 12. Clustering: No closed-form centroid, no cry!
      k-means++ [1] picks its seeds at random from the data, so no centroid calculation is needed.
  • 13. Divergence-based k-means++
      Theorem: Suppose there exist some $U$ and $V$ such that, $\forall x, y, z$:
        $\mathrm{tJ}_\alpha(x : z) \le U (\mathrm{tJ}_\alpha(x : y) + \mathrm{tJ}_\alpha(y : z))$ (triangular inequality),
        $\mathrm{tJ}_\alpha(x : z) \le V \, \mathrm{tJ}_\alpha(z : x)$ (symmetric inequality).
      Then the average potential of total Jensen seeding with $k$ clusters satisfies
        $E[\mathrm{tJ}_\alpha] \le 2 U^2 (1 + V)(2 + \log k) \, \mathrm{tJ}_{\mathrm{opt},\alpha}$,
      where $\mathrm{tJ}_{\mathrm{opt},\alpha}$ is the minimal total Jensen potential achieved by a clustering in $k$ clusters.
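The seeding step that this theorem analyses can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes that each new seed is drawn with probability proportional to its total Jensen divergence to the nearest seed already chosen, that the divergence is oriented as $\mathrm{tJ}_\alpha(x : \text{seed})$, that the generator is the Shannon negative entropy, and that $J_\alpha$ carries the $1/(\alpha(1-\alpha))$ scaling. All function names are hypothetical.

```python
import math
import random

def neg_entropy(x):
    """Example generator (an assumption): F(x) = sum_i x_i log x_i on positive vectors."""
    return sum(xi * math.log(xi) for xi in x)

def total_jensen(F, p, q, alpha=0.5):
    """tJ_alpha(p:q) = rho_J(p, q) * J_alpha(p:q), with J_alpha scaled by 1/(alpha(1-alpha))."""
    if p == q:
        return 0.0
    mix = tuple(alpha * a + (1 - alpha) * b for a, b in zip(p, q))
    jensen = (alpha * F(p) + (1 - alpha) * F(q) - F(mix)) / (alpha * (1 - alpha))
    delta_sq = sum((a - b) ** 2 for a, b in zip(p, q))
    rho = 1.0 / math.sqrt(1.0 + (F(p) - F(q)) ** 2 / delta_sq)
    return max(0.0, rho * jensen)  # clamp tiny negative rounding errors

def seed(points, k, divergence, rng):
    """Pick k seeds among the points; returns (seeds, potential)."""
    seeds = [rng.choice(points)]                       # first seed: uniform
    closest = [divergence(x, seeds[0]) for x in points]
    while len(seeds) < k:
        if sum(closest) > 0.0:
            # Draw proportionally to the divergence to the nearest seed.
            new_seed = rng.choices(points, weights=closest, k=1)[0]
        else:
            new_seed = rng.choice(points)              # all points already covered
        seeds.append(new_seed)
        closest = [min(c, divergence(x, new_seed)) for c, x in zip(closest, points)]
    return seeds, sum(closest)

points = [(0.1, 0.9), (0.15, 0.85), (0.2, 0.8), (0.5, 0.5),
          (0.55, 0.45), (0.9, 0.1), (0.85, 0.15), (0.8, 0.2)]
div = lambda x, c: total_jensen(neg_entropy, x, c, alpha=0.5)
seeds, potential = seed(points, 3, div, random.Random(0))
print(seeds, potential)
```

No centroid is ever computed: the seeds are data points, and the returned potential is the sum of each point's divergence to its nearest seed.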
  • 14. Divergence-based k-means++: Two assumptions H
      H:
      ◮ First, the maximal condition number of the Hessian of $F$, that is, the ratio between the maximal and minimal eigenvalue ($> 0$) of the Hessian of $F$, is upper bounded by $K_1$.
      ◮ Second, we assume the Lipschitz condition on $F$ that $\Delta_F^2 / \langle \Delta, \Delta \rangle \le K_2$, for some $K_2 > 0$.
      Lemma: Assume $0 < \alpha < 1$. Then, under assumption H, for any $p, q, r \in S$, there exists $\epsilon > 0$ such that:
        $\mathrm{tJ}_\alpha(p : r) \le \frac{2 (1 + K_2) K_1^2}{\epsilon} \left( \frac{1}{1-\alpha} \mathrm{tJ}_\alpha(p : q) + \frac{1}{\alpha} \mathrm{tJ}_\alpha(q : r) \right)$.
  • 15. Divergence-based k-means++
      Corollary: The total skew Jensen divergence satisfies the following triangular inequality:
        $\mathrm{tJ}_\alpha(p : r) \le \frac{2 (1 + K_2) K_1^2}{\epsilon \alpha (1-\alpha)} (\mathrm{tJ}_\alpha(p : q) + \mathrm{tJ}_\alpha(q : r))$.
        $U = \frac{2 (1 + K_2) K_1^2}{\epsilon}$
      Lemma: The symmetric inequality condition holds for $V = K_1^2 (1 + K_2) / \epsilon$, for some $0 < \epsilon < 1$.
  • 16. Total Jensen divergences: Recap
      Total Jensen divergence = conformal divergence with a non-separable double-sided conformal factor.
      ◮ Invariant to axis rotation of the "design space"
      ◮ Equivalent to total Bregman divergences [8, 4] only when $p \simeq q$
      ◮ The square root of the total Jensen-Shannon divergence is not a metric (whereas the square root of the Jensen-Shannon divergence is a metric)
      ◮ Jensen centroids are not always robust (e.g., the Jensen-Shannon centroid)
      ◮ Total Jensen k-means++ does not require centroid computations and comes with a guaranteed approximation
      Interest of conformal divergences in SVM [9] (double-sided separable) and in information geometry [6] (flattening).
  • 17. Thank you.
      @article{totalJensen-arXiv1309.7109,
        author = "Frank Nielsen and Richard Nock",
        title = "Total {J}ensen divergences: {D}efinition, Properties and $k$-Means++ Clustering",
        year = "2013",
        eprint = "arXiv/1309.7109"
      }
      www.informationgeometry.org
  • 18. Bibliographic references I
      [1] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
      [2] Bent Fuglede and Flemming Topsoe. Jensen-Shannon divergence and Hilbert space embedding. In IEEE International Symposium on Information Theory, pages 31–31, 2004.
      [3] F. R. Hampel, P. J. Rousseeuw, E. Ronchetti, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Mathematical Statistics, 1986.
      [4] Meizhu Liu, Baba C. Vemuri, Shun-ichi Amari, and Frank Nielsen. Shape retrieval using hierarchical total Bregman soft clustering. Transactions on Pattern Analysis and Machine Intelligence, 34(12):2407–2419, 2012.
      [5] Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455–5466, August 2011.
      [6] Atsumi Ohara, Hiroshi Matsuzoe, and Shun-ichi Amari. A dually flat structure on the space of escort distributions. Journal of Physics: Conference Series, 201(1):012012, 2010.
  • 19. Bibliographic references II
      [7] Kenneth B. Stolarsky. Generalizations of the logarithmic mean. Mathematics Magazine, 48(2):87–92, 1975.
      [8] Baba Vemuri, Meizhu Liu, Shun-ichi Amari, and Frank Nielsen. Total Bregman divergence and its applications to DTI analysis. IEEE Transactions on Medical Imaging, pages 475–483, 2011.
      [9] Si Wu and Shun-ichi Amari. Conformal transformation of kernel functions: a data-dependent way to improve support vector machine classifiers. Neural Processing Letters, 15(1):59–67, 2002.