Bregman divergences from comparative convexity
Frank Nielsen, Richard Nock
2017
Convexity, and Jensen/Bregman divergences
Midpoint convexity, continuity and convexity
In 1906, Jensen [9] introduced the midpoint convexity property of a function F:
(F(p) + F(q))/2 ≥ F((p + q)/2), ∀p, q ∈ X.
A continuous function F satisfying the midpoint convexity inequality also satisfies the (full) convexity property [14]:
∀p, q, ∀λ ∈ [0, 1], F(λp + (1 − λ)q) ≤ λF(p) + (1 − λ)F(q).
When the inequality is strict for distinct points p ≠ q and λ ∈ (0, 1), it defines the strict convexity of F.
A function satisfying only the midpoint convexity inequality
may not be continuous [11] (and hence not convex).
Jensen divergences and skew Jensen divergences
A divergence D(p, q) is proper iff D(p, q) ≥ 0 with equality iff
p = q.
The Jensen divergence (Burbea-Rao divergence [6]) is defined for a strictly convex function F ∈ F^< as:
J_F(p, q) := (F(p) + F(q))/2 − F((p + q)/2).
Skew Jensen divergences [19, 15] for α ∈ (0, 1):
J_{F,α}(p : q) := (1 − α)F(p) + αF(q) − F((1 − α)p + αq),
with J_{F,α}(q : p) = J_{F,1−α}(p : q).
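A minimal numerical sketch (not part of the original slides) of these definitions, using the strictly convex generator F(x) = x log x on x > 0 as an assumed example:

```python
# Minimal sketch: Jensen and skew Jensen divergences for a 1D strictly
# convex generator, here F(x) = x*log(x) on x > 0 (chosen for illustration).
import math

def F(x):
    return x * math.log(x)

def jensen(F, p, q):
    # J_F(p, q) = (F(p) + F(q))/2 - F((p + q)/2)
    return (F(p) + F(q)) / 2 - F((p + q) / 2)

def skew_jensen(F, p, q, alpha):
    # J_{F,alpha}(p : q) = (1 - alpha) F(p) + alpha F(q) - F((1 - alpha) p + alpha q)
    return (1 - alpha) * F(p) + alpha * F(q) - F((1 - alpha) * p + alpha * q)

print(jensen(F, 1.0, 4.0))                # > 0 since F is strictly convex
print(skew_jensen(F, 1.0, 4.0, 0.5))      # equals the (unskewed) Jensen divergence
print(skew_jensen(F, 4.0, 1.0, 0.3),
      skew_jensen(F, 1.0, 4.0, 0.7))      # J_{F,alpha}(q : p) = J_{F,1-alpha}(p : q)
```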
Bregman divergences
Bregman divergences [4] (1967) for a strictly convex and
differentiable generator F:
B_F(p : q) := F(p) − F(q) − ⟨p − q, ∇F(q)⟩.
Scaled skew Jensen divergences [19, 15]:
J'_{F,α}(p : q) := (1/(α(1 − α))) J_{F,α}(p : q),
with limit cases recovering the Bregman divergences:
lim_{α→1^-} J'_{F,α}(p : q) = B_F(p : q),
lim_{α→0^+} J'_{F,α}(p : q) = B_F(q : p).
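A minimal sketch (illustration only, same assumed generator F(x) = x log x) showing numerically that the scaled skew Jensen divergence tends to the Bregman divergence as α → 1⁻:

```python
# Minimal sketch: the scaled skew Jensen divergence J_{F,alpha}/(alpha(1-alpha))
# tends to the Bregman divergence B_F(p : q) as alpha -> 1^-; the generator
# F(x) = x*log(x) on x > 0 is chosen for illustration.
import math

def F(x):
    return x * math.log(x)

def F_prime(x):
    return math.log(x) + 1.0

def bregman(p, q):
    # 1D Bregman divergence: B_F(p : q) = F(p) - F(q) - (p - q) F'(q)
    return F(p) - F(q) - (p - q) * F_prime(q)

def scaled_skew_jensen(p, q, alpha):
    J = (1 - alpha) * F(p) + alpha * F(q) - F((1 - alpha) * p + alpha * q)
    return J / (alpha * (1 - alpha))

p, q = 2.0, 5.0
for alpha in (0.9, 0.99, 0.999):
    print(alpha, scaled_skew_jensen(p, q, alpha))   # converges to B_F(p : q)
print("B_F(p : q) =", bregman(p, q))
```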
Comparative convexity and generalized divergences
Jensen midpoint convexity wrt comparative convexity [14]
Let M(·, ·) and N(·, ·) be two abstract mean functions [5] satisfying
min{p, q} ≤ M(p, q) ≤ max{p, q}.
A function F ∈ C^≤_{M,N} is said to be midpoint (M, N)-convex iff
∀p, q ∈ X, F(M(p, q)) ≤ N(F(p), F(q)).
When M(p, q) = N(p, q) = A(p, q) = (p + q)/2 is the arithmetic mean, we recover the ordinary Jensen midpoint convexity:
∀p, q ∈ X, F((p + q)/2) ≤ (F(p) + F(q))/2.
Note that there exist discontinuous functions that satisfy the
Jensen midpoint convexity.
Means, regular means, and weighted means
Regular means
A mean is said to be regular if it is:
homogeneous: M(λp, λq) = λM(p, q) for any λ > 0,
symmetric: M(p, q) = M(q, p),
continuous,
increasing in each variable.
Examples of regular means: the power means or Hölder means [8],
P_δ(x, y) = ((x^δ + y^δ)/2)^{1/δ}, with P_0(x, y) = G(x, y) = √(xy),
belong to the broader family of quasi-arithmetic means. For a continuous and strictly increasing function f : I ⊂ ℝ → J ⊂ ℝ:
M_f(p, q) := f^{-1}((f(p) + f(q))/2),
also called Kolmogorov-Nagumo-de Finetti means [10, 13, 7].
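A minimal sketch (illustration only) of the quasi-arithmetic mean M_f for a few generators f; the inverse f⁻¹ is assumed to be given in closed form:

```python
# Minimal sketch: quasi-arithmetic mean M_f(p, q) = f^{-1}((f(p) + f(q)) / 2)
# for a few generators f whose inverses are known in closed form.
import math

def quasi_arithmetic_mean(f, f_inv, p, q):
    return f_inv((f(p) + f(q)) / 2)

p, q = 2.0, 8.0
A = quasi_arithmetic_mean(lambda x: x, lambda y: y, p, q)            # arithmetic
G = quasi_arithmetic_mean(math.log, math.exp, p, q)                  # geometric
H = quasi_arithmetic_mean(lambda x: 1 / x, lambda y: 1 / y, p, q)    # harmonic
print(A, G, H)   # 5.0, 4.0, 3.2
```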
Examples of quasi-arithmetic means: the Pythagorean means.
f(x) = x gives the arithmetic mean (A): A(p, q) = (p + q)/2.
f(x) = log x gives the geometric mean (G): G(p, q) = √(pq).
f(x) = 1/x gives the harmonic mean (H): H(p, q) = 2/(1/p + 1/q) = 2pq/(p + q).
Lagrange means [3]
Lagrange means [3] (Lagrangean means) are derived from the mean value theorem.
Assume wlog that p < q, so that the mean m ∈ [p, q].
From the mean value theorem, we have, for a differentiable function f:
∃λ ∈ [p, q] : f'(λ) = (f(q) − f(p))/(q − p).
Thus when f' is a monotonic function, its inverse function (f')^{-1} is well-defined, and the unique mean-value mean λ ∈ [p, q] can be defined as:
L_f(p, q) = λ = (f')^{-1}((f(q) − f(p))/(q − p)).
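A minimal sketch (illustration only) computing a Lagrange mean numerically by inverting f' with bisection on [p, q]; f' is assumed continuous and strictly monotone there:

```python
# Minimal sketch: Lagrange mean L_f(p, q) = (f')^{-1}((f(q) - f(p))/(q - p)),
# computed by bisection on [p, q]; f' is assumed continuous and strictly
# monotone on [p, q] so that the inverse is well defined there.
import math

def lagrange_mean(f, f_prime, p, q, tol=1e-12):
    if p == q:
        return p
    lo, hi = min(p, q), max(p, q)
    target = (f(q) - f(p)) / (q - p)
    increasing = f_prime(hi) >= f_prime(lo)
    for _ in range(200):                      # bisection: solve f'(m) = target
        mid = (lo + hi) / 2
        if (f_prime(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

# f(x) = log x yields the logarithmic mean (q - p)/(log q - log p).
p, q = 2.0, 8.0
print(lagrange_mean(math.log, lambda x: 1.0 / x, p, q))
print((q - p) / (math.log(q) - math.log(p)))   # closed form, for comparison
```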
An example of a Lagrange mean [3]
For example, for f(x) = log(x) we have f'(x) = (f')^{-1}(x) = 1/x, and we get the logarithmic mean (L), which is not a quasi-arithmetic mean:
L(p, q) = 0 if p = 0 or q = 0, L(p, q) = p if p = q, and L(p, q) = (q − p)/(log q − log p) otherwise.
Cauchy means [12]
Let f and g be two positive, differentiable, and strictly monotonic functions such that f'/g' has an inverse function.
The Cauchy mean-value mean is:
C_{f,g}(p, q) = (f'/g')^{-1}((f(q) − f(p))/(g(q) − g(p))), for q ≠ p,
with C_{f,g}(p, p) = p.
Cauchy means can be rewritten as Lagrange means [12] by the following identity:
C_{f,g}(p, q) = L_{f∘g^{-1}}(g(p), g(q)),
since ((f ∘ g^{-1})(x))' = f'(g^{-1}(x))/g'(g^{-1}(x)).
Proof:
L_{f∘g^{-1}}(g(p), g(q)) = ((f ∘ g^{-1})'(g(x)))^{-1}( ((f ∘ g^{-1})(g(q)) − (f ∘ g^{-1})(g(p)))/(g(q) − g(p)) )
= ( f'(g^{-1}(g(x)))/g'(g^{-1}(g(x))) )^{-1}( (f(q) − f(p))/(g(q) − g(p)) )
= C_{f,g}(p, q),
where the inversion is taken with respect to x.
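A minimal sketch (illustration only) of Cauchy mean-value means for two pairs (f, g) whose ratio f'/g' has a simple closed-form inverse, chosen here for convenience:

```python
# Minimal sketch: Cauchy mean-value means
# C_{f,g}(p, q) = (f'/g')^{-1}((f(q) - f(p)) / (g(q) - g(p)))
# for two pairs (f, g) with a closed-form inverse of f'/g'.
import math

def cauchy_mean(f, g, ratio_inv, p, q):
    if p == q:
        return p
    return ratio_inv((f(q) - f(p)) / (g(q) - g(p)))

p, q = 2.0, 8.0
# f(x) = x^2, g(x) = x: f'/g' = 2x, so C_{f,g} is the arithmetic mean.
print(cauchy_mean(lambda x: x * x, lambda x: x, lambda z: z / 2, p, q))   # 5.0
# f(x) = x, g(x) = log x: f'/g' = x, so C_{f,g} is the logarithmic mean.
print(cauchy_mean(lambda x: x, math.log, lambda z: z, p, q))              # ~4.328
```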
Stolarsky regular means
The Stolarsky regular means are neither quasi-arithmetic means nor mean-value means [5], and are defined by:
S_p(x, y) = ((x^p − y^p)/(p(x − y)))^{1/(p−1)}, p ∉ {0, 1}.
In the limit cases, the Stolarsky family of means yields the logarithmic mean (L) when p → 0:
L(x, y) = (y − x)/(log y − log x),
and the identric mean (I) when p → 1:
I(x, y) = (1/e) (y^y/x^x)^{1/(y−x)}.
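A minimal sketch (illustration only) of the Stolarsky means and their limit cases; the p → 0 and p → 1 cases are hard-coded to the logarithmic and identric means:

```python
# Minimal sketch: Stolarsky means S_p(x, y) and their limit cases, the
# logarithmic mean (p -> 0) and the identric mean (p -> 1).
import math

def stolarsky(x, y, p):
    if x == y:
        return x
    if p == 0:                                   # logarithmic mean L
        return (y - x) / (math.log(y) - math.log(x))
    if p == 1:                                   # identric mean I
        return (1 / math.e) * (y ** y / x ** x) ** (1 / (y - x))
    return ((x ** p - y ** p) / (p * (x - y))) ** (1 / (p - 1))

x, y = 2.0, 8.0
print(stolarsky(x, y, 1e-6), stolarsky(x, y, 0))       # both ~ logarithmic mean
print(stolarsky(x, y, 1 + 1e-6), stolarsky(x, y, 1))   # both ~ identric mean
```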
Lehmer and Gini (non-regular) means
The weighted Lehmer mean [2] of order δ ∈ ℝ is defined as:
L_δ(x_1, ..., x_n; w_1, ..., w_n) = (Σ_{i=1}^n w_i x_i^{δ+1}) / (Σ_{i=1}^n w_i x_i^δ).
The Lehmer means intersect with the Hölder means only for the arithmetic (A), geometric (G) and harmonic (H) means.
The Lehmer barycentric means belong to the family of Gini means:
G_{δ1,δ2}(x_1, ..., x_n; w_1, ..., w_n) = ((Σ_{i=1}^n w_i x_i^{δ1}) / (Σ_{i=1}^n w_i x_i^{δ2}))^{1/(δ1−δ2)},
when δ1 ≠ δ2, and
G_{δ1,δ2}(x_1, ..., x_n; w_1, ..., w_n) = (Π_{i=1}^n x_i^{w_i x_i^δ})^{1/(Σ_{i=1}^n w_i x_i^δ)},
when δ1 = δ2 = δ.
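A minimal sketch (illustration only) of the weighted Lehmer and Gini means on a toy input, checking a few of the special cases mentioned above:

```python
# Minimal sketch: weighted Lehmer and Gini means on a toy input.
def lehmer(xs, ws, delta):
    num = sum(w * x ** (delta + 1) for x, w in zip(xs, ws))
    den = sum(w * x ** delta for x, w in zip(xs, ws))
    return num / den

def gini(xs, ws, d1, d2):
    if d1 != d2:
        num = sum(w * x ** d1 for x, w in zip(xs, ws))
        den = sum(w * x ** d2 for x, w in zip(xs, ws))
        return (num / den) ** (1.0 / (d1 - d2))
    s = sum(w * x ** d1 for x, w in zip(xs, ws))        # d1 == d2 case
    prod = 1.0
    for x, w in zip(xs, ws):
        prod *= x ** (w * x ** d1)
    return prod ** (1.0 / s)

xs, ws = [2.0, 8.0], [0.5, 0.5]
print(lehmer(xs, ws, 0))        # L_0    = arithmetic mean: 5.0
print(lehmer(xs, ws, -0.5))     # L_-1/2 = geometric mean: 4.0
print(lehmer(xs, ws, -1))       # L_-1   = harmonic mean: 3.2
print(gini(xs, ws, 1, 0))       # G_{1,0} = power mean P_1: 5.0
print(gini(xs, ws, 0, 0))       # G_{0,0} = weighted geometric mean: 4.0
```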
Lehmer and Gini (non-regular) means
These families of Gini and Lehmer means are (positively) homogeneous means:
G_{δ1,δ2}(λx_1, ..., λx_n; w_1, ..., w_n) = λ G_{δ1,δ2}(x_1, ..., x_n; w_1, ..., w_n), for any λ > 0.
The family of Gini means includes the power means: G_{0,δ} = P_δ for δ ≤ 0 and G_{δ,0} = P_δ for δ ≥ 0.
The Lehmer and Gini means are not always regular: for example, the Lehmer mean L_2 is not regular.
Weighted abstract means and (M, N)-convexity
Consider weighted means M and N, with
M_α(p, q) := M(p, q; 1 − α, α).
A function F is (M, N)-convex if and only if:
F(M(p, q; 1 − α, α)) ≤ N(F(p), F(q); 1 − α, α).
For regular means and a continuous function F, define the skew (M, N)-Jensen divergence:
J^{M,N}_{F,α}(p : q) = N_α(F(p), F(q)) − F(M_α(p, q)) ≥ 0.
We have J^{M,N}_{F,α}(q : p) = J^{M,N}_{F,1−α}(p : q).
Continuity plus midpoint (M, N)-convexity yield the full (M, N)-convexity property (Theorem A of [14]):
N_α(F(p), F(q)) ≥ F(M_α(p, q)), ∀p, q ∈ X, ∀α ∈ [0, 1].
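A minimal sketch (illustration only) of the skew (M, N)-Jensen divergence with weighted quasi-arithmetic means; the choice M = G (ρ = log), N = A (τ = identity) and F(x) = x, which is (G, A)-convex by the AM-GM inequality, is an assumption made for this example:

```python
# Minimal sketch: skew (M, N)-Jensen divergence built from weighted
# quasi-arithmetic means; here M = geometric mean (rho = log), N = arithmetic
# mean (tau = identity), and F(x) = x, which is (G, A)-convex by AM-GM.
import math

def wqam(f, f_inv, p, q, alpha):
    # weighted quasi-arithmetic mean M_{f,alpha}(p, q)
    return f_inv((1 - alpha) * f(p) + alpha * f(q))

def skew_mn_jensen(F, rho, rho_inv, tau, tau_inv, p, q, alpha):
    # J^{M,N}_{F,alpha}(p : q) = N_alpha(F(p), F(q)) - F(M_alpha(p, q))
    return wqam(tau, tau_inv, F(p), F(q), alpha) - F(wqam(rho, rho_inv, p, q, alpha))

identity = lambda x: x
print(skew_mn_jensen(identity, math.log, math.exp, identity, identity,
                     2.0, 8.0, 0.5))   # A(2, 8) - G(2, 8) = 5 - 4 = 1 >= 0
```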
Generalized Bregman divergences
Generalized Bregman divergences wrt comparative convexity
Define the (M, N)-Bregman divergence for regular means as the limit of scaled skew (M, N)-Jensen divergences:
B^{M,N}_F(p : q) = lim_{α→1^-} (1/(α(1 − α))) J^{M,N}_{F,α}(p : q)
= lim_{α→1^-} (1/(α(1 − α))) (N_α(F(p), F(q)) − F(M_α(p, q))),
B^{M,N}_F(q : p) = lim_{α→0^+} (1/(α(1 − α))) J^{M,N}_{F,α}(p : q).
We need to prove that the limit exists and that the generalized Bregman divergences are proper: B^{M,N}_F(q : p) ≥ 0 with equality iff p = q.
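A minimal sketch (illustration only) approaching this limit numerically, with the assumed choice M = N = weighted geometric mean and F(x) = exp(x), which is (G, G)-convex on x > 0:

```python
# Minimal sketch: the scaled skew (M, N)-Jensen divergence approaches the
# generalized Bregman divergence as alpha -> 1^-; here M = N = weighted
# geometric mean and F(x) = exp(x), which is (G, G)-convex on x > 0.
import math

def wgeo(p, q, alpha):
    # weighted geometric mean: p^{1 - alpha} * q^{alpha}
    return p ** (1 - alpha) * q ** alpha

def F(x):
    return math.exp(x)

def scaled_skew_jensen_gg(p, q, alpha):
    J = wgeo(F(p), F(q), alpha) - F(wgeo(p, q, alpha))
    return J / (alpha * (1 - alpha))

p, q = 1.0, 3.0
for alpha in (0.9, 0.99, 0.999, 0.9999):
    print(alpha, scaled_skew_jensen_gg(p, q, alpha))   # approaches B^{G,G}_F(p : q), ~26.0
```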
Quasi-arithmetic Bregman divergences
Theorem (Quasi-arithmetic Bregman divergences)
Let F : I ⊂ ℝ → ℝ be a real-valued strictly (ρ, τ)-convex function defined on an interval I, for two strictly monotone and differentiable functions ρ and τ (with ρ' and τ' their respective derivatives). The Quasi-Arithmetic Bregman Divergence (QABD) induced by the comparative convexity is:
B^{ρ,τ}_F(p : q) = (τ(F(p)) − τ(F(q)))/τ'(F(q)) − ((ρ(p) − ρ(q))/ρ'(q)) F'(q)
= κ_τ(F(q) : F(p)) − κ_ρ(q : p) F'(q),
where primes denote derivatives and
κ_γ(x : y) = (γ(y) − γ(x))/γ'(x).
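A minimal sketch (illustration only) of the closed-form QABD of the theorem, instantiated with the assumed choice ρ = τ = log and F(x) = exp(x); the value agrees with the numerical limit of the scaled skew divergence above:

```python
# Minimal sketch: closed-form quasi-arithmetic Bregman divergence (QABD)
# B^{rho,tau}_F(p : q) = (tau(F(p)) - tau(F(q)))/tau'(F(q))
#                        - ((rho(p) - rho(q))/rho'(q)) * F'(q),
# instantiated (as an example) with rho = tau = log and F(x) = exp(x).
import math

def qabd(F, F_prime, rho, rho_prime, tau, tau_prime, p, q):
    kappa_tau = (tau(F(p)) - tau(F(q))) / tau_prime(F(q))
    kappa_rho = (rho(p) - rho(q)) / rho_prime(q)
    return kappa_tau - kappa_rho * F_prime(q)

log, dlog = math.log, lambda x: 1.0 / x
print(qabd(math.exp, math.exp, log, dlog, log, dlog, 1.0, 3.0))
# ~26.0, nonnegative since F(x) = exp(x) is (log, log)-convex
```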
Quasi-arithmetic Bregman divergences (proof 1/2)
A first-order Taylor expansion of τ^{-1}(x) at x_0 gives:
τ^{-1}(x) ≃ τ^{-1}(x_0) + (x − x_0)(τ^{-1})'(x_0).
Using the derivative of an inverse function, (τ^{-1})'(x) = 1/τ'(τ^{-1}(x)), it follows that:
τ^{-1}(x) ≃ τ^{-1}(x_0) + (x − x_0)/τ'(τ^{-1}(x_0)).
Plugging in x_0 = τ(p) and x = τ(p) + α(τ(q) − τ(p)), we get a first-order approximation of the weighted quasi-arithmetic mean M_{τ,α} when α → 0:
M_{τ,α}(p, q) ≃ p + α(τ(q) − τ(p))/τ'(p).
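A minimal sketch (illustration only) checking this first-order approximation for τ = log (weighted geometric mean) and shrinking α:

```python
# Minimal sketch: checking M_{tau,alpha}(p, q) ~ p + alpha (tau(q) - tau(p))/tau'(p)
# for small alpha, with tau = log (weighted geometric mean) as the example.
import math

def wqam(tau, tau_inv, p, q, alpha):
    # weighted quasi-arithmetic mean M_{tau,alpha}(p, q)
    return tau_inv((1 - alpha) * tau(p) + alpha * tau(q))

p, q = 2.0, 8.0
tau, tau_inv, tau_prime = math.log, math.exp, lambda x: 1.0 / x
for alpha in (0.1, 0.01, 0.001):
    exact = wqam(tau, tau_inv, p, q, alpha)
    approx = p + alpha * (tau(q) - tau(p)) / tau_prime(p)
    print(alpha, exact, approx)   # the two values agree up to O(alpha^2)
```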
Quasi-arithmetic Bregman divergences (proof 2/2)
The comparative convexity skew Jensen divergence is defined by:
J^{τ,ρ}_{F,α}(p : q) = M_{τ,α}(F(p), F(q)) − F(M_{ρ,α}(p, q)).
Apply a first-order Taylor expansion to get:
F(M_{ρ,α}(p, q)) ≃ F(p + α(ρ(q) − ρ(p))/ρ'(p)) ≃ F(p) + (α(ρ(q) − ρ(p))/ρ'(p)) F'(p),
and, from the previous approximation, M_{τ,α}(F(p), F(q)) ≃ F(p) + α(τ(F(q)) − τ(F(p)))/τ'(F(p)).
It follows that the quasi-arithmetic Bregman divergence is:
B^{ρ,τ}_F(q : p) = lim_{α→0^+} (1/(α(1 − α))) J^{τ,ρ}_{F,α}(p : q) = (τ(F(q)) − τ(F(p)))/τ'(F(p)) − ((ρ(q) − ρ(p))/ρ'(p)) F'(p).
The reverse quasi-arithmetic Bregman divergence is obtained similarly.
Quasi-arithmetic Bregman divergences are proper
Consider the ordinary Bregman divergence on the convex generator G(x) = τ(F(ρ^{-1}(x))) for a (ρ, τ)-convex function F:
G'(x) = (τ(F(ρ^{-1}(x))))' = (1/ρ'(ρ^{-1}(x))) F'(ρ^{-1}(x)) τ'(F(ρ^{-1}(x))).
We get an ordinary Bregman divergence that is, in general, different from the generalized quasi-arithmetic Bregman divergence (B_G(p : q) ≠ B^{ρ,τ}_F(p : q)):
B_G(p : q) = τ(F(ρ^{-1}(p))) − τ(F(ρ^{-1}(q))) − (p − q) F'(ρ^{-1}(q)) τ'(F(ρ^{-1}(q)))/ρ'(ρ^{-1}(q)).
Sanity check: B_G(p : q) = B^{ρ,τ}_F(p : q) when ρ(x) = τ(x) = x.
Remarkable identity:
B^{ρ,τ}_F(p : q) = (1/τ'(F(q))) B_G(ρ(p) : ρ(q)).
Since the ordinary Bregman divergence B_G is proper and τ'(F(q)) > 0 for an increasing τ, the QABD B^{ρ,τ}_F(p : q) ≥ 0 with equality iff p = q.
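A minimal sketch (illustration only) checking the remarkable identity numerically, with the assumed choice ρ = τ = log and F(x) = exp(x), for which G(x) = τ(F(ρ⁻¹(x))) = exp(x):

```python
# Minimal sketch: numerical check of the identity
# B^{rho,tau}_F(p : q) = (1/tau'(F(q))) * B_G(rho(p) : rho(q)), G = tau o F o rho^{-1},
# instantiated with rho = tau = log and F(x) = exp(x), for which G(x) = exp(x).
import math

F = F_prime = math.exp
rho = tau = math.log
rho_prime = tau_prime = lambda x: 1.0 / x
rho_inv = math.exp

def G(x):                    # convex generator G = tau o F o rho^{-1} = exp
    return tau(F(rho_inv(x)))

def G_prime(x):
    return math.exp(x)

def ordinary_bregman(gen, gen_prime, a, b):
    return gen(a) - gen(b) - (a - b) * gen_prime(b)

def qabd(p, q):
    kappa_tau = (tau(F(p)) - tau(F(q))) / tau_prime(F(q))
    kappa_rho = (rho(p) - rho(q)) / rho_prime(q)
    return kappa_tau - kappa_rho * F_prime(q)

p, q = 1.0, 3.0
print(qabd(p, q))
print(ordinary_bregman(G, G_prime, rho(p), rho(q)) / tau_prime(F(q)))   # same value
```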
Power mean Bregman divergences
A subfamily of quasi-arithmetic Bregman divergences is obtained for the generators ρ(x) = x^{δ1} and τ(x) = x^{δ2}:
Corollary (Power Mean Bregman Divergences)
For δ1, δ2 ∈ ℝ ∖ {0} with F ∈ C_{P_{δ1},P_{δ2}}, we get the family of Power Mean Bregman Divergences (PMBDs):
B^{δ1,δ2}_F(p : q) = (F^{δ2}(p) − F^{δ2}(q))/(δ2 F^{δ2−1}(q)) − ((p^{δ1} − q^{δ1})/(δ1 q^{δ1−1})) F'(q).
Sanity check for δ1 = δ2 = 1: we recover the ordinary Bregman divergence.
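A minimal sketch (illustration only) of the PMBD formula; with δ1 = δ2 = 1 and the assumed generator F(x) = x², it reproduces the ordinary Bregman divergence (p − q)²:

```python
# Minimal sketch: Power Mean Bregman Divergence with rho(x) = x^{d1},
# tau(x) = x^{d2}; d1 = d2 = 1 recovers the ordinary Bregman divergence.
def pmbd(F, F_prime, d1, d2, p, q):
    term_tau = (F(p) ** d2 - F(q) ** d2) / (d2 * F(q) ** (d2 - 1))
    term_rho = (p ** d1 - q ** d1) / (d1 * q ** (d1 - 1))
    return term_tau - term_rho * F_prime(q)

F = lambda x: x * x            # example generator F(x) = x^2 on x > 0
F_prime = lambda x: 2.0 * x

p, q = 2.0, 5.0
print(pmbd(F, F_prime, 1, 1, p, q))           # 9.0 = (p - q)^2
print(F(p) - F(q) - (p - q) * F_prime(q))     # ordinary B_F(p : q) = 9.0
```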
Quasi-arithmetic to ordinary convexity criterion
To check whether a function F is (M, N)-convex or not for quasi-arithmetic means M_ρ and M_τ, we use a test equivalent to ordinary convexity:
Lemma ((ρ, τ)-convexity ↔ ordinary convexity [1])
Let ρ : I → ℝ and τ : J → ℝ be two continuous and strictly monotone real-valued functions with τ increasing. Then a function F : I → J is (ρ, τ)-convex iff the function G = F_{ρ,τ} = τ ∘ F ∘ ρ^{-1} is (ordinary) convex on ρ(I):
F ∈ C^≤_{ρ,τ} ⇔ G = τ ∘ F ∘ ρ^{-1} ∈ C^≤.
Quasi-arithmetic to ordinary convexity criterion (proof)
Let us rewrite the (ρ, τ)-convexity midpoint inequality as follows:
F(M_ρ(x, y)) ≤ M_τ(F(x), F(y)),
F(ρ^{-1}((ρ(x) + ρ(y))/2)) ≤ τ^{-1}((τ(F(x)) + τ(F(y)))/2).
Since τ is strictly increasing, we have:
(τ ∘ F ∘ ρ^{-1})((ρ(x) + ρ(y))/2) ≤ ((τ ∘ F)(x) + (τ ∘ F)(y))/2.
Let u = ρ(x) and v = ρ(y), so that x = ρ^{-1}(u) and y = ρ^{-1}(v) (with u, v ∈ ρ(I)). It then follows that:
(τ ∘ F ∘ ρ^{-1})((u + v)/2) ≤ ((τ ∘ F ∘ ρ^{-1})(u) + (τ ∘ F ∘ ρ^{-1})(v))/2.
This last inequality is precisely the ordinary midpoint convexity inequality for the function G = F_{ρ,τ} = τ ∘ F ∘ ρ^{-1}. Thus a function F is (ρ, τ)-convex iff G = τ ∘ F ∘ ρ^{-1} is ordinary convex, and vice versa.
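A minimal sketch (illustration only) using the lemma as a numerical spot check: (ρ, τ)-convexity of F is tested through the ordinary midpoint convexity of G = τ ∘ F ∘ ρ⁻¹ on a sampled grid; the generators and test functions below are assumptions for the example:

```python
# Minimal sketch: test (rho, tau)-convexity of F by checking ordinary midpoint
# convexity of G = tau o F o rho^{-1} on a sampled grid (a spot check, not a proof).
import math

def is_midpoint_convex(G, lo, hi, n=50, eps=1e-12):
    us = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return all(G((u + v) / 2) <= (G(u) + G(v)) / 2 + eps
               for u in us for v in us)

rho, rho_inv, tau = math.log, math.exp, math.log

# F1(x) = exp(x):  G1(u) = log(exp(exp(u))) = exp(u), convex  -> (log, log)-convex
# F2(x) = sqrt(x): G2(u) = log(sqrt(exp(u))) = u / 2, affine   -> (log, log)-convex
for F in (math.exp, math.sqrt):
    G = lambda u, F=F: tau(F(rho_inv(u)))
    print(is_midpoint_convex(G, math.log(0.5), math.log(5.0)))   # True, True
```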
Summary of contributions [17, 16]
We defined generalized Jensen divergences and generalized Bregman divergences using comparative convexity.
We reported an explicit formula for the quasi-arithmetic Bregman divergences (QABDs).
A QABD can be interpreted as a conformal Bregman divergence [18] on the ρ-representation.
This work emphasizes that the theory of means [5] is at the very heart of distances.
See the arXiv report [16] for further results, including a generalization of the Bhattacharyya statistical distance using comparable means.
References I
[1] John Aczél.
A generalization of the notion of convex functions.
Det Kongelige Norske Videnskabers Selskabs Forhandlinger, Trondheim, 19(24):87–90, 1947.
[2] Gleb Beliakov, Humberto Bustince Sola, and Tomasa Calvo Sánchez.
A practical guide to averaging functions, volume 329.
Springer, 2015.
[3] Lucio R Berrone and Julio Moro.
Lagrangian means.
Aequationes Mathematicae, 55(3):217–226, 1998.
[4] Lev M Bregman.
The relaxation method of finding the common point of convex sets and its application to the
solution of problems in convex programming.
USSR computational mathematics and mathematical physics, 7(3):200–217, 1967.
[5] Peter S Bullen, Dragoslav S Mitrinovic, and M Vasic.
Means and their Inequalities, volume 31.
Springer Science & Business Media, 2013.
[6] Jacob Burbea and C Rao.
On the convexity of some divergence measures based on entropy functions.
IEEE Transactions on Information Theory, 28(3):489–495, 1982.
[7] Bruno De Finetti.
Sul concetto di media.
Istituto italiano degli attuari, 3:369–396, 1931.
[8] Otto Ludwig Hölder.
Über einen Mittelwertssatz.
Nachr. Akad. Wiss. Göttingen Math.-Phys. Kl., pages 38–47, 1889.
References II
[9] Johan Ludwig William Valdemar Jensen.
Sur les fonctions convexes et les inégalités entre les valeurs moyennes.
Acta mathematica, 30(1):175–193, 1906.
[10] Andrey Nikolaevich Kolmogorov.
Sur la notion de moyenne.
Acad. Naz. Lincei Mem. Cl. Sci. Fis. Mat. Natur. Sez., 12:388–391, 1930.
[11] Gyula Maksa and Zsolt Páles.
Convexity with respect to families of means.
Aequationes mathematicae, 89(1):161–167, 2015.
[12] Janusz Matkowski.
On weighted extensions of Cauchy’s means.
Journal of mathematical analysis and applications, 319(1):215–227, 2006.
[13] Mitio Nagumo.
Über eine Klasse der Mittelwerte.
In Japanese journal of mathematics: transactions and abstracts, volume 7, pages 71–79. The
Mathematical Society of Japan, 1930.
[14] Constantin P. Niculescu and Lars-Erik Persson.
Convex functions and their applications: A contemporary approach.
Springer Science & Business Media, 2006.
[15] Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455–5466, 2011.
References III
[16] Frank Nielsen and Richard Nock.
Generalizing Jensen and Bregman divergences with comparative convexity and the statistical
Bhattacharyya distances with comparable means.
CoRR, abs/1702.04877, 2017.
[17] Frank Nielsen and Richard Nock.
Generalizing skew Jensen divergences and Bregman divergences with comparative convexity.
IEEE Signal Processing Letters, 24(8):1123–1127, Aug 2017.
[18] Richard Nock, Frank Nielsen, and Shun-ichi Amari.
On conformal divergences and their population minimizers.
IEEE Trans. Information Theory, 62(1):527–538, 2016.
[19] Jun Zhang.
Divergence function, duality, and convex analysis.
Neural Computation, 16(1):159–195, 2004.