Bregman divergences from comparative convexity
Frank Nielsen, Richard Nock
2017
Convexity, and Jensen/Bregman divergences
Midpoint convexity, continuity and convexity
In 1906, Jensen [9] introduced the midpoint convexity property of a function F:
(F(p) + F(q))/2 ≥ F((p + q)/2), ∀p, q ∈ X.
A continuous function F satisfying the midpoint convexity inequality also satisfies the (full) convexity property [14]:
∀p, q, ∀λ ∈ [0, 1], F(λp + (1 − λ)q) ≤ λF(p) + (1 − λ)F(q).
When the inequality is strict for distinct points p ≠ q and λ ∈ (0, 1), it defines the strict convexity of F.
A function satisfying only the midpoint convexity inequality
may not be continuous [11] (and hence not convex).
Jensen divergences and skew Jensen divergences
A divergence D(p, q) is proper iff D(p, q) ≥ 0 with equality iff
p = q.
The Jensen divergence (Burbea-Rao divergence [6]) is defined for a strictly convex function F ∈ F^< as:
J_F(p, q) := (F(p) + F(q))/2 − F((p + q)/2).
Skew Jensen divergences [19, 15] for α ∈ (0, 1):
J_{F,α}(p : q) := (1 − α)F(p) + αF(q) − F((1 − α)p + αq),
with J_{F,α}(q : p) = J_{F,1−α}(p : q).
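A minimal numerical sketch (not part of the original slides) of these definitions, using the strictly convex generator F(x) = x log x on x > 0 as an assumed example:

```python
# Minimal sketch: Jensen and skew Jensen divergences for a 1D strictly
# convex generator, here F(x) = x*log(x) on x > 0 (chosen for illustration).
import math

def F(x):
    return x * math.log(x)

def jensen(F, p, q):
    # J_F(p, q) = (F(p) + F(q))/2 - F((p + q)/2)
    return (F(p) + F(q)) / 2 - F((p + q) / 2)

def skew_jensen(F, p, q, alpha):
    # J_{F,alpha}(p : q) = (1 - alpha) F(p) + alpha F(q) - F((1 - alpha) p + alpha q)
    return (1 - alpha) * F(p) + alpha * F(q) - F((1 - alpha) * p + alpha * q)

print(jensen(F, 1.0, 4.0))                # > 0 since F is strictly convex
print(skew_jensen(F, 1.0, 4.0, 0.5))      # equals the (unskewed) Jensen divergence
print(skew_jensen(F, 4.0, 1.0, 0.3),
      skew_jensen(F, 1.0, 4.0, 0.7))      # J_{F,alpha}(q : p) = J_{F,1-alpha}(p : q)
```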
Bregman divergences
Bregman divergences [4] (1967) for a strictly convex and
differentiable generator F:
B_F(p : q) := F(p) − F(q) − ⟨p − q, ∇F(q)⟩.
Scaled skew Jensen divergences [19, 15]:
J'_{F,α}(p : q) := (1/(α(1 − α))) J_{F,α}(p : q),
with limit cases recovering the Bregman divergences:
lim_{α→1^-} J'_{F,α}(p : q) = B_F(p : q),
lim_{α→0^+} J'_{F,α}(p : q) = B_F(q : p).
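A minimal sketch (illustration only, same assumed generator F(x) = x log x) showing numerically that the scaled skew Jensen divergence tends to the Bregman divergence as α → 1⁻:

```python
# Minimal sketch: the scaled skew Jensen divergence J_{F,alpha}/(alpha(1-alpha))
# tends to the Bregman divergence B_F(p : q) as alpha -> 1^-; the generator
# F(x) = x*log(x) on x > 0 is chosen for illustration.
import math

def F(x):
    return x * math.log(x)

def F_prime(x):
    return math.log(x) + 1.0

def bregman(p, q):
    # 1D Bregman divergence: B_F(p : q) = F(p) - F(q) - (p - q) F'(q)
    return F(p) - F(q) - (p - q) * F_prime(q)

def scaled_skew_jensen(p, q, alpha):
    J = (1 - alpha) * F(p) + alpha * F(q) - F((1 - alpha) * p + alpha * q)
    return J / (alpha * (1 - alpha))

p, q = 2.0, 5.0
for alpha in (0.9, 0.99, 0.999):
    print(alpha, scaled_skew_jensen(p, q, alpha))   # converges to B_F(p : q)
print("B_F(p : q) =", bregman(p, q))
```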
Comparative convexity and generalized divergences
Jensen midpoint convexity wrt comparative convexity [14]
Let M(·, ·) and N(·, ·) be two abstract mean functions [5] satisfying
min{p, q} ≤ M(p, q) ≤ max{p, q}.
A function F ∈ C^≤_{M,N} is said to be midpoint (M, N)-convex iff
∀p, q ∈ X, F(M(p, q)) ≤ N(F(p), F(q)).
When M(p, q) = N(p, q) = A(p, q) = (p + q)/2 is the arithmetic mean, we recover the ordinary Jensen midpoint convexity:
∀p, q ∈ X, F((p + q)/2) ≤ (F(p) + F(q))/2.
Note that there exist discontinuous functions that satisfy the
Jensen midpoint convexity.
Means, regular means, and weighted means
Regular means
A mean is said to be regular if it is:
homogeneous: M(λp, λq) = λM(p, q) for any λ > 0,
symmetric: M(p, q) = M(q, p),
continuous,
increasing in each variable.
Examples of regular means: the power means or Hölder means [8],
P_δ(x, y) = ((x^δ + y^δ)/2)^{1/δ}, with P_0(x, y) = G(x, y) = √(xy),
belong to the broader family of quasi-arithmetic means. For a continuous and strictly increasing function f : I ⊂ ℝ → J ⊂ ℝ:
M_f(p, q) := f^{-1}((f(p) + f(q))/2),
also called Kolmogorov-Nagumo-de Finetti means [10, 13, 7].
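A minimal sketch (illustration only) of the quasi-arithmetic mean M_f for a few generators f; the inverse f⁻¹ is assumed to be given in closed form:

```python
# Minimal sketch: quasi-arithmetic mean M_f(p, q) = f^{-1}((f(p) + f(q)) / 2)
# for a few generators f whose inverses are known in closed form.
import math

def quasi_arithmetic_mean(f, f_inv, p, q):
    return f_inv((f(p) + f(q)) / 2)

p, q = 2.0, 8.0
A = quasi_arithmetic_mean(lambda x: x, lambda y: y, p, q)            # arithmetic
G = quasi_arithmetic_mean(math.log, math.exp, p, q)                  # geometric
H = quasi_arithmetic_mean(lambda x: 1 / x, lambda y: 1 / y, p, q)    # harmonic
print(A, G, H)   # 5.0, 4.0, 3.2
```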
Examples of quasi-arithmetic means: the Pythagorean means.
f(x) = x gives the arithmetic mean (A): A(p, q) = (p + q)/2.
f(x) = log x gives the geometric mean (G): G(p, q) = √(pq).
f(x) = 1/x gives the harmonic mean (H): H(p, q) = 2/(1/p + 1/q) = 2pq/(p + q).
Lagrange means [3]
Lagrange means [3] (Lagrangean means) are derived from the mean value theorem.
Assume wlog that p < q, so that the mean m ∈ [p, q].
From the mean value theorem, we have, for a differentiable function f:
∃λ ∈ [p, q] : f'(λ) = (f(q) − f(p))/(q − p).
Thus when f' is a monotonic function, its inverse function (f')^{-1} is well-defined, and the unique mean-value mean λ ∈ [p, q] can be defined as:
L_f(p, q) = λ = (f')^{-1}((f(q) − f(p))/(q − p)).
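A minimal sketch (illustration only) computing a Lagrange mean numerically by inverting f' with bisection on [p, q]; f' is assumed continuous and strictly monotone there:

```python
# Minimal sketch: Lagrange mean L_f(p, q) = (f')^{-1}((f(q) - f(p))/(q - p)),
# computed by bisection on [p, q]; f' is assumed continuous and strictly
# monotone on [p, q] so that the inverse is well defined there.
import math

def lagrange_mean(f, f_prime, p, q, tol=1e-12):
    if p == q:
        return p
    lo, hi = min(p, q), max(p, q)
    target = (f(q) - f(p)) / (q - p)
    increasing = f_prime(hi) >= f_prime(lo)
    for _ in range(200):                      # bisection: solve f'(m) = target
        mid = (lo + hi) / 2
        if (f_prime(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

# f(x) = log x yields the logarithmic mean (q - p)/(log q - log p).
p, q = 2.0, 8.0
print(lagrange_mean(math.log, lambda x: 1.0 / x, p, q))
print((q - p) / (math.log(q) - math.log(p)))   # closed form, for comparison
```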
An example of a Lagrange mean [3]
For example, for f(x) = log(x) we have f'(x) = (f')^{-1}(x) = 1/x, and we get the logarithmic mean (L), which is not a quasi-arithmetic mean:
L(p, q) = 0 if p = 0 or q = 0, L(p, q) = p if p = q, and L(p, q) = (q − p)/(log q − log p) otherwise.
Cauchy means [12]
Let f and g be two positive, differentiable, and strictly monotonic functions such that f'/g' has an inverse function.
The Cauchy mean-value mean is:
C_{f,g}(p, q) = (f'/g')^{-1}((f(q) − f(p))/(g(q) − g(p))), for q ≠ p,
with C_{f,g}(p, p) = p.
Cauchy means can be rewritten as Lagrange means [12] by the following identity:
C_{f,g}(p, q) = L_{f∘g^{-1}}(g(p), g(q)),
since ((f ∘ g^{-1})(x))' = f'(g^{-1}(x))/g'(g^{-1}(x)).
Proof:
L_{f∘g^{-1}}(g(p), g(q)) = ((f ∘ g^{-1})'(g(x)))^{-1}( ((f ∘ g^{-1})(g(q)) − (f ∘ g^{-1})(g(p)))/(g(q) − g(p)) )
= ( f'(g^{-1}(g(x)))/g'(g^{-1}(g(x))) )^{-1}( (f(q) − f(p))/(g(q) − g(p)) )
= C_{f,g}(p, q),
where the inversion is taken with respect to x.
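A minimal sketch (illustration only) of Cauchy mean-value means for two pairs (f, g) whose ratio f'/g' has a simple closed-form inverse, chosen here for convenience:

```python
# Minimal sketch: Cauchy mean-value means
# C_{f,g}(p, q) = (f'/g')^{-1}((f(q) - f(p)) / (g(q) - g(p)))
# for two pairs (f, g) with a closed-form inverse of f'/g'.
import math

def cauchy_mean(f, g, ratio_inv, p, q):
    if p == q:
        return p
    return ratio_inv((f(q) - f(p)) / (g(q) - g(p)))

p, q = 2.0, 8.0
# f(x) = x^2, g(x) = x: f'/g' = 2x, so C_{f,g} is the arithmetic mean.
print(cauchy_mean(lambda x: x * x, lambda x: x, lambda z: z / 2, p, q))   # 5.0
# f(x) = x, g(x) = log x: f'/g' = x, so C_{f,g} is the logarithmic mean.
print(cauchy_mean(lambda x: x, math.log, lambda z: z, p, q))              # ~4.328
```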
Stolarsky regular means
The Stolarsky regular means are neither quasi-arithmetic means nor mean-value means [5], and are defined by:
S_p(x, y) = ((x^p − y^p)/(p(x − y)))^{1/(p−1)}, p ∉ {0, 1}.
In the limit cases, the Stolarsky family of means yields the logarithmic mean (L) when p → 0:
L(x, y) = (y − x)/(log y − log x),
and the identric mean (I) when p → 1:
I(x, y) = (1/e) (y^y/x^x)^{1/(y−x)}.
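A minimal sketch (illustration only) of the Stolarsky means and their limit cases; the p → 0 and p → 1 cases are hard-coded to the logarithmic and identric means:

```python
# Minimal sketch: Stolarsky means S_p(x, y) and their limit cases, the
# logarithmic mean (p -> 0) and the identric mean (p -> 1).
import math

def stolarsky(x, y, p):
    if x == y:
        return x
    if p == 0:                                   # logarithmic mean L
        return (y - x) / (math.log(y) - math.log(x))
    if p == 1:                                   # identric mean I
        return (1 / math.e) * (y ** y / x ** x) ** (1 / (y - x))
    return ((x ** p - y ** p) / (p * (x - y))) ** (1 / (p - 1))

x, y = 2.0, 8.0
print(stolarsky(x, y, 1e-6), stolarsky(x, y, 0))       # both ~ logarithmic mean
print(stolarsky(x, y, 1 + 1e-6), stolarsky(x, y, 1))   # both ~ identric mean
```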
Lehmer and Gini (non-regular) means
The weighted Lehmer mean [2] of order δ ∈ ℝ is defined as:
L_δ(x_1, ..., x_n; w_1, ..., w_n) = (Σ_{i=1}^n w_i x_i^{δ+1}) / (Σ_{i=1}^n w_i x_i^δ).
The Lehmer means intersect with the Hölder means only for the arithmetic (A), geometric (G) and harmonic (H) means.
The Lehmer barycentric means belong to the family of Gini means:
G_{δ1,δ2}(x_1, ..., x_n; w_1, ..., w_n) = ((Σ_{i=1}^n w_i x_i^{δ1}) / (Σ_{i=1}^n w_i x_i^{δ2}))^{1/(δ1−δ2)},
when δ1 ≠ δ2, and
G_{δ1,δ2}(x_1, ..., x_n; w_1, ..., w_n) = (Π_{i=1}^n x_i^{w_i x_i^δ})^{1/(Σ_{i=1}^n w_i x_i^δ)},
when δ1 = δ2 = δ.
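A minimal sketch (illustration only) of the weighted Lehmer and Gini means on a toy input, checking a few of the special cases mentioned above:

```python
# Minimal sketch: weighted Lehmer and Gini means on a toy input.
def lehmer(xs, ws, delta):
    num = sum(w * x ** (delta + 1) for x, w in zip(xs, ws))
    den = sum(w * x ** delta for x, w in zip(xs, ws))
    return num / den

def gini(xs, ws, d1, d2):
    if d1 != d2:
        num = sum(w * x ** d1 for x, w in zip(xs, ws))
        den = sum(w * x ** d2 for x, w in zip(xs, ws))
        return (num / den) ** (1.0 / (d1 - d2))
    s = sum(w * x ** d1 for x, w in zip(xs, ws))        # d1 == d2 case
    prod = 1.0
    for x, w in zip(xs, ws):
        prod *= x ** (w * x ** d1)
    return prod ** (1.0 / s)

xs, ws = [2.0, 8.0], [0.5, 0.5]
print(lehmer(xs, ws, 0))        # L_0    = arithmetic mean: 5.0
print(lehmer(xs, ws, -0.5))     # L_-1/2 = geometric mean: 4.0
print(lehmer(xs, ws, -1))       # L_-1   = harmonic mean: 3.2
print(gini(xs, ws, 1, 0))       # G_{1,0} = power mean P_1: 5.0
print(gini(xs, ws, 0, 0))       # G_{0,0} = weighted geometric mean: 4.0
```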
Lehmer and Gini (non-regular) means
These families of Gini and Lehmer means are (positively) homogeneous means:
G_{δ1,δ2}(λx_1, ..., λx_n; w_1, ..., w_n) = λ G_{δ1,δ2}(x_1, ..., x_n; w_1, ..., w_n), for any λ > 0.
The family of Gini means includes the power means: G_{0,δ} = P_δ for δ ≤ 0 and G_{δ,0} = P_δ for δ ≥ 0.
The Lehmer and Gini means are not always regular: for example, the Lehmer mean L_2 is not regular.
Weighted abstract means and (M, N)-convexity
Consider weighted means M and N, with
M_α(p, q) := M(p, q; 1 − α, α).
A function F is (M, N)-convex if and only if:
F(M(p, q; 1 − α, α)) ≤ N(F(p), F(q); 1 − α, α).
For regular means and a continuous function F, define the skew (M, N)-Jensen divergence:
J^{M,N}_{F,α}(p : q) = N_α(F(p), F(q)) − F(M_α(p, q)) ≥ 0.
We have J^{M,N}_{F,α}(q : p) = J^{M,N}_{F,1−α}(p : q).
Continuity plus midpoint (M, N)-convexity yield the full (M, N)-convexity property (Theorem A of [14]):
N_α(F(p), F(q)) ≥ F(M_α(p, q)), ∀p, q ∈ X, ∀α ∈ [0, 1].
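A minimal sketch (illustration only) of the skew (M, N)-Jensen divergence with weighted quasi-arithmetic means; the choice M = G (ρ = log), N = A (τ = identity) and F(x) = x, which is (G, A)-convex by the AM-GM inequality, is an assumption made for this example:

```python
# Minimal sketch: skew (M, N)-Jensen divergence built from weighted
# quasi-arithmetic means; here M = geometric mean (rho = log), N = arithmetic
# mean (tau = identity), and F(x) = x, which is (G, A)-convex by AM-GM.
import math

def wqam(f, f_inv, p, q, alpha):
    # weighted quasi-arithmetic mean M_{f,alpha}(p, q)
    return f_inv((1 - alpha) * f(p) + alpha * f(q))

def skew_mn_jensen(F, rho, rho_inv, tau, tau_inv, p, q, alpha):
    # J^{M,N}_{F,alpha}(p : q) = N_alpha(F(p), F(q)) - F(M_alpha(p, q))
    return wqam(tau, tau_inv, F(p), F(q), alpha) - F(wqam(rho, rho_inv, p, q, alpha))

identity = lambda x: x
print(skew_mn_jensen(identity, math.log, math.exp, identity, identity,
                     2.0, 8.0, 0.5))   # A(2, 8) - G(2, 8) = 5 - 4 = 1 >= 0
```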
Generalized Bregman divergences
Generalized Bregman divergences wrt comparative convexity
Define the (M, N)-Bregman divergence for regular means as the limit of scaled skew (M, N)-Jensen divergences:
B^{M,N}_F(p : q) = lim_{α→1^-} (1/(α(1 − α))) J^{M,N}_{F,α}(p : q)
= lim_{α→1^-} (1/(α(1 − α))) (N_α(F(p), F(q)) − F(M_α(p, q))),
B^{M,N}_F(q : p) = lim_{α→0^+} (1/(α(1 − α))) J^{M,N}_{F,α}(p : q).
We need to prove that the limit exists and that the generalized Bregman divergences are proper: B^{M,N}_F(q : p) ≥ 0 with equality iff p = q.
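A minimal sketch (illustration only) approaching this limit numerically, with the assumed choice M = N = weighted geometric mean and F(x) = exp(x), which is (G, G)-convex on x > 0:

```python
# Minimal sketch: the scaled skew (M, N)-Jensen divergence approaches the
# generalized Bregman divergence as alpha -> 1^-; here M = N = weighted
# geometric mean and F(x) = exp(x), which is (G, G)-convex on x > 0.
import math

def wgeo(p, q, alpha):
    # weighted geometric mean: p^{1 - alpha} * q^{alpha}
    return p ** (1 - alpha) * q ** alpha

def F(x):
    return math.exp(x)

def scaled_skew_jensen_gg(p, q, alpha):
    J = wgeo(F(p), F(q), alpha) - F(wgeo(p, q, alpha))
    return J / (alpha * (1 - alpha))

p, q = 1.0, 3.0
for alpha in (0.9, 0.99, 0.999, 0.9999):
    print(alpha, scaled_skew_jensen_gg(p, q, alpha))   # approaches B^{G,G}_F(p : q), ~26.0
```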
Quasi-arithmetic Bregman divergences
Theorem (Quasi-arithmetic Bregman divergences)
Let F : I ⊂ ℝ → ℝ be a real-valued strictly (ρ, τ)-convex function defined on an interval I, for two strictly monotone and differentiable functions ρ and τ (with ρ' and τ' their respective derivatives). The Quasi-Arithmetic Bregman Divergence (QABD) induced by the comparative convexity is:
B^{ρ,τ}_F(p : q) = (τ(F(p)) − τ(F(q)))/τ'(F(q)) − ((ρ(p) − ρ(q))/ρ'(q)) F'(q)
= κ_τ(F(q) : F(p)) − κ_ρ(q : p) F'(q),
where primes denote derivatives and
κ_γ(x : y) = (γ(y) − γ(x))/γ'(x).
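A minimal sketch (illustration only) of the closed-form QABD of the theorem, instantiated with the assumed choice ρ = τ = log and F(x) = exp(x); the value agrees with the numerical limit of the scaled skew divergence above:

```python
# Minimal sketch: closed-form quasi-arithmetic Bregman divergence (QABD)
# B^{rho,tau}_F(p : q) = (tau(F(p)) - tau(F(q)))/tau'(F(q))
#                        - ((rho(p) - rho(q))/rho'(q)) * F'(q),
# instantiated (as an example) with rho = tau = log and F(x) = exp(x).
import math

def qabd(F, F_prime, rho, rho_prime, tau, tau_prime, p, q):
    kappa_tau = (tau(F(p)) - tau(F(q))) / tau_prime(F(q))
    kappa_rho = (rho(p) - rho(q)) / rho_prime(q)
    return kappa_tau - kappa_rho * F_prime(q)

log, dlog = math.log, lambda x: 1.0 / x
print(qabd(math.exp, math.exp, log, dlog, log, dlog, 1.0, 3.0))
# ~26.0, nonnegative since F(x) = exp(x) is (log, log)-convex
```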
Quasi-arithmetic Bregman divergences (proof 1/2)
A first-order Taylor expansion of τ^{-1}(x) at x_0 gives:
τ^{-1}(x) ≃ τ^{-1}(x_0) + (x − x_0)(τ^{-1})'(x_0).
Using the derivative of an inverse function, (τ^{-1})'(x) = 1/τ'(τ^{-1}(x)), it follows that:
τ^{-1}(x) ≃ τ^{-1}(x_0) + (x − x_0)/τ'(τ^{-1}(x_0)).
Plugging in x_0 = τ(p) and x = τ(p) + α(τ(q) − τ(p)), we get a first-order approximation of the weighted quasi-arithmetic mean M_{τ,α} when α → 0:
M_{τ,α}(p, q) ≃ p + α(τ(q) − τ(p))/τ'(p).
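A minimal sketch (illustration only) checking this first-order approximation for τ = log (weighted geometric mean) and shrinking α:

```python
# Minimal sketch: checking M_{tau,alpha}(p, q) ~ p + alpha (tau(q) - tau(p))/tau'(p)
# for small alpha, with tau = log (weighted geometric mean) as the example.
import math

def wqam(tau, tau_inv, p, q, alpha):
    # weighted quasi-arithmetic mean M_{tau,alpha}(p, q)
    return tau_inv((1 - alpha) * tau(p) + alpha * tau(q))

p, q = 2.0, 8.0
tau, tau_inv, tau_prime = math.log, math.exp, lambda x: 1.0 / x
for alpha in (0.1, 0.01, 0.001):
    exact = wqam(tau, tau_inv, p, q, alpha)
    approx = p + alpha * (tau(q) - tau(p)) / tau_prime(p)
    print(alpha, exact, approx)   # the two values agree up to O(alpha^2)
```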
Quasi-arithmetic Bregman divergences (proof 2/2)
The comparative convexity skew Jensen divergence is defined by:
J^{τ,ρ}_{F,α}(p : q) = M_{τ,α}(F(p), F(q)) − F(M_{ρ,α}(p, q)).
Apply a first-order Taylor expansion to get:
F(M_{ρ,α}(p, q)) ≃ F(p + α(ρ(q) − ρ(p))/ρ'(p)) ≃ F(p) + (α(ρ(q) − ρ(p))/ρ'(p)) F'(p),
and, from the previous approximation, M_{τ,α}(F(p), F(q)) ≃ F(p) + α(τ(F(q)) − τ(F(p)))/τ'(F(p)).
It follows that the quasi-arithmetic Bregman divergence is:
B^{ρ,τ}_F(q : p) = lim_{α→0^+} (1/(α(1 − α))) J^{τ,ρ}_{F,α}(p : q) = (τ(F(q)) − τ(F(p)))/τ'(F(p)) − ((ρ(q) − ρ(p))/ρ'(p)) F'(p).
The reverse quasi-arithmetic Bregman divergence is obtained similarly.
Quasi-arithmetic Bregman divergences are proper
Consider the ordinary Bregman divergence on the convex generator G(x) = τ(F(ρ^{-1}(x))) for a (ρ, τ)-convex function F:
G'(x) = (τ(F(ρ^{-1}(x))))' = (1/ρ'(ρ^{-1}(x))) F'(ρ^{-1}(x)) τ'(F(ρ^{-1}(x))).
We get an ordinary Bregman divergence that is, in general, different from the generalized quasi-arithmetic Bregman divergence (B_G(p : q) ≠ B^{ρ,τ}_F(p : q)):
B_G(p : q) = τ(F(ρ^{-1}(p))) − τ(F(ρ^{-1}(q))) − (p − q) F'(ρ^{-1}(q)) τ'(F(ρ^{-1}(q)))/ρ'(ρ^{-1}(q)).
Sanity check: B_G(p : q) = B^{ρ,τ}_F(p : q) when ρ(x) = τ(x) = x.
Remarkable identity:
B^{ρ,τ}_F(p : q) = (1/τ'(F(q))) B_G(ρ(p) : ρ(q)).
Since the ordinary Bregman divergence B_G is proper and τ'(F(q)) > 0 for an increasing τ, the QABD B^{ρ,τ}_F(p : q) ≥ 0 with equality iff p = q.
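A minimal sketch (illustration only) checking the remarkable identity numerically, with the assumed choice ρ = τ = log and F(x) = exp(x), for which G(x) = τ(F(ρ⁻¹(x))) = exp(x):

```python
# Minimal sketch: numerical check of the identity
# B^{rho,tau}_F(p : q) = (1/tau'(F(q))) * B_G(rho(p) : rho(q)), G = tau o F o rho^{-1},
# instantiated with rho = tau = log and F(x) = exp(x), for which G(x) = exp(x).
import math

F = F_prime = math.exp
rho = tau = math.log
rho_prime = tau_prime = lambda x: 1.0 / x
rho_inv = math.exp

def G(x):                    # convex generator G = tau o F o rho^{-1} = exp
    return tau(F(rho_inv(x)))

def G_prime(x):
    return math.exp(x)

def ordinary_bregman(gen, gen_prime, a, b):
    return gen(a) - gen(b) - (a - b) * gen_prime(b)

def qabd(p, q):
    kappa_tau = (tau(F(p)) - tau(F(q))) / tau_prime(F(q))
    kappa_rho = (rho(p) - rho(q)) / rho_prime(q)
    return kappa_tau - kappa_rho * F_prime(q)

p, q = 1.0, 3.0
print(qabd(p, q))
print(ordinary_bregman(G, G_prime, rho(p), rho(q)) / tau_prime(F(q)))   # same value
```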
Power mean Bregman divergences
A subfamily of quasi-arithmetic Bregman divergences is obtained for the generators ρ(x) = x^{δ1} and τ(x) = x^{δ2}:
Corollary (Power Mean Bregman Divergences)
For δ1, δ2 ∈ ℝ ∖ {0} with F ∈ C_{P_{δ1},P_{δ2}}, we get the family of Power Mean Bregman Divergences (PMBDs):
B^{δ1,δ2}_F(p : q) = (F^{δ2}(p) − F^{δ2}(q))/(δ2 F^{δ2−1}(q)) − ((p^{δ1} − q^{δ1})/(δ1 q^{δ1−1})) F'(q).
Sanity check for δ1 = δ2 = 1: we recover the ordinary Bregman divergence.
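A minimal sketch (illustration only) of the PMBD formula; with δ1 = δ2 = 1 and the assumed generator F(x) = x², it reproduces the ordinary Bregman divergence (p − q)²:

```python
# Minimal sketch: Power Mean Bregman Divergence with rho(x) = x^{d1},
# tau(x) = x^{d2}; d1 = d2 = 1 recovers the ordinary Bregman divergence.
def pmbd(F, F_prime, d1, d2, p, q):
    term_tau = (F(p) ** d2 - F(q) ** d2) / (d2 * F(q) ** (d2 - 1))
    term_rho = (p ** d1 - q ** d1) / (d1 * q ** (d1 - 1))
    return term_tau - term_rho * F_prime(q)

F = lambda x: x * x            # example generator F(x) = x^2 on x > 0
F_prime = lambda x: 2.0 * x

p, q = 2.0, 5.0
print(pmbd(F, F_prime, 1, 1, p, q))           # 9.0 = (p - q)^2
print(F(p) - F(q) - (p - q) * F_prime(q))     # ordinary B_F(p : q) = 9.0
```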
Quasi-arithmetic to ordinary convexity criterion
To check whether a function F is (M, N)-convex or not for quasi-arithmetic means M_ρ and M_τ, we use a test equivalent to ordinary convexity:
Lemma ((ρ, τ)-convexity ↔ ordinary convexity [1])
Let ρ : I → ℝ and τ : J → ℝ be two continuous and strictly monotone real-valued functions with τ increasing. Then a function F : I → J is (ρ, τ)-convex iff the function G = F_{ρ,τ} = τ ∘ F ∘ ρ^{-1} is (ordinary) convex on ρ(I):
F ∈ C^≤_{ρ,τ} ⇔ G = τ ∘ F ∘ ρ^{-1} ∈ C^≤.
Quasi-arithmetic to ordinary convexity criterion (proof)
Let us rewrite the (ρ, τ)-convexity midpoint inequality as follows:
F(M_ρ(x, y)) ≤ M_τ(F(x), F(y)),
F(ρ^{-1}((ρ(x) + ρ(y))/2)) ≤ τ^{-1}((τ(F(x)) + τ(F(y)))/2).
Since τ is strictly increasing, we have:
(τ ∘ F ∘ ρ^{-1})((ρ(x) + ρ(y))/2) ≤ ((τ ∘ F)(x) + (τ ∘ F)(y))/2.
Let u = ρ(x) and v = ρ(y), so that x = ρ^{-1}(u) and y = ρ^{-1}(v) (with u, v ∈ ρ(I)). It then follows that:
(τ ∘ F ∘ ρ^{-1})((u + v)/2) ≤ ((τ ∘ F ∘ ρ^{-1})(u) + (τ ∘ F ∘ ρ^{-1})(v))/2.
This last inequality is precisely the ordinary midpoint convexity inequality for the function G = F_{ρ,τ} = τ ∘ F ∘ ρ^{-1}. Thus a function F is (ρ, τ)-convex iff G = τ ∘ F ∘ ρ^{-1} is ordinary convex, and vice versa.
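A minimal sketch (illustration only) using the lemma as a numerical spot check: (ρ, τ)-convexity of F is tested through the ordinary midpoint convexity of G = τ ∘ F ∘ ρ⁻¹ on a sampled grid; the generators and test functions below are assumptions for the example:

```python
# Minimal sketch: test (rho, tau)-convexity of F by checking ordinary midpoint
# convexity of G = tau o F o rho^{-1} on a sampled grid (a spot check, not a proof).
import math

def is_midpoint_convex(G, lo, hi, n=50, eps=1e-12):
    us = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return all(G((u + v) / 2) <= (G(u) + G(v)) / 2 + eps
               for u in us for v in us)

rho, rho_inv, tau = math.log, math.exp, math.log

# F1(x) = exp(x):  G1(u) = log(exp(exp(u))) = exp(u), convex  -> (log, log)-convex
# F2(x) = sqrt(x): G2(u) = log(sqrt(exp(u))) = u / 2, affine   -> (log, log)-convex
for F in (math.exp, math.sqrt):
    G = lambda u, F=F: tau(F(rho_inv(u)))
    print(is_midpoint_convex(G, math.log(0.5), math.log(5.0)))   # True, True
```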
Summary of contributions [17, 16]
We defined generalized Jensen divergences and generalized Bregman divergences using comparative convexity.
We reported an explicit formula for the quasi-arithmetic Bregman divergences (QABDs).
A QABD can be interpreted as a conformal Bregman divergence [18] on the ρ-representation.
This work emphasizes that the theory of means [5] is at the very heart of distances.
See the arXiv report [16] for further results, including a generalization of the Bhattacharyya statistical distance using comparable means.
References I
[1] John Aczél.
A generalization of the notion of convex functions.
Det Kongelige Norske Videnskabers Selskabs Forhandlinger, Trondheim, 19(24):87–90, 1947.
[2] Gleb Beliakov, Humberto Bustince Sola, and Tomasa Calvo Sánchez.
A practical guide to averaging functions, volume 329.
Springer, 2015.
[3] Lucio R Berrone and Julio Moro.
Lagrangian means.
Aequationes Mathematicae, 55(3):217–226, 1998.
[4] Lev M Bregman.
The relaxation method of finding the common point of convex sets and its application to the
solution of problems in convex programming.
USSR computational mathematics and mathematical physics, 7(3):200–217, 1967.
[5] Peter S Bullen, Dragoslav S Mitrinovic, and M Vasic.
Means and their Inequalities, volume 31.
Springer Science & Business Media, 2013.
[6] Jacob Burbea and C Rao.
On the convexity of some divergence measures based on entropy functions.
IEEE Transactions on Information Theory, 28(3):489–495, 1982.
[7] Bruno De Finetti.
Sul concetto di media.
Istituto italiano degli attuari, 3:369–396, 1931.
[8] Otto Ludwig Hölder.
Über einen Mittelwertssatz.
Nachr. Akad. Wiss. Göttingen Math.-Phys. Kl., pages 38–47, 1889.
References II
[9] Johan Ludwig William Valdemar Jensen.
Sur les fonctions convexes et les inégalités entre les valeurs moyennes.
Acta mathematica, 30(1):175–193, 1906.
[10] Andrey Nikolaevich Kolmogorov.
Sur la notion de moyenne.
Acad. Naz. Lincei Mem. Cl. Sci. Fis. Mat. Natur. Sez., 12:388–391, 1930.
[11] Gyula Maksa and Zsolt Páles.
Convexity with respect to families of means.
Aequationes mathematicae, 89(1):161–167, 2015.
[12] Janusz Matkowski.
On weighted extensions of Cauchy’s means.
Journal of mathematical analysis and applications, 319(1):215–227, 2006.
[13] Mitio Nagumo.
Über eine Klasse der Mittelwerte.
In Japanese journal of mathematics: transactions and abstracts, volume 7, pages 71–79. The
Mathematical Society of Japan, 1930.
[14] Constantin P. Niculescu and Lars-Erik Persson.
Convex functions and their applications: A contemporary approach.
Springer Science & Business Media, 2006.
[15] Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455–5466, 2011.
References III
[16] Frank Nielsen and Richard Nock.
Generalizing Jensen and Bregman divergences with comparative convexity and the statistical
Bhattacharyya distances with comparable means.
CoRR, abs/1702.04877, 2017.
[17] Frank Nielsen and Richard Nock.
Generalizing skew Jensen divergences and Bregman divergences with comparative convexity.
IEEE Signal Processing Letters, 24(8):1123–1127, Aug 2017.
[18] Richard Nock, Frank Nielsen, and Shun-ichi Amari.
On conformal divergences and their population minimizers.
IEEE Trans. Information Theory, 62(1):527–538, 2016.
[19] Jun Zhang.
Divergence function, duality, and convex analysis.
Neural Computation, 16(1):159–195, 2004.