A series of maximum entropy upper bounds of the differential entropy
https://arxiv.org/abs/1612.02954
https://www.lix.polytechnique.fr/~nielsen/MEUB/

Frank Nielsen (1,2)   Richard Nock (3,4,5)
Frank.Nielsen@acm.org
1: École Polytechnique, France   2: Sony Computer Science Laboratories, Japan
3: Data61, Australia   4: ANU, Australia   5: The University of Sydney, Australia

January 2017
Shannon’s differential entropy

X ~ p(x): continuous random variable with support X = {x ∈ R : p(x) > 0}.

Shannon’s entropy quantifies the amount of uncertainty [2]:

H(X) = \int_X p(x) \log \frac{1}{p(x)} \, dx = -\int_X p(x) \log p(x) \, dx   (1)

Logarithm: base 2 (unit: bits) or base e (unit: nats).

Differential entropy is strictly concave and:
- may be negative: X ~ N(µ, σ) has H(X) = \frac{1}{2} \log(2\pi e \sigma^2) < 0 when \sigma < \frac{1}{\sqrt{2\pi e}};
- may be infinite (unbounded): X ~ p(x) with p(x) = \frac{\log 2}{x \log^2 x} for x > 2 (support X = (2, ∞)).

Closed forms are known for many distribution families [7, 9], but the differential entropy of mixtures usually does not admit a closed-form expression [10, 6].
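The sign claim above is easy to verify numerically; a minimal Python sketch (standard library only, function name illustrative):

```python
import math

def gaussian_entropy(sigma):
    # H(N(mu, sigma^2)) = (1/2) log(2*pi*e*sigma^2): independent of the mean mu
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# The entropy changes sign exactly at sigma = 1/sqrt(2*pi*e) ~ 0.2420
threshold = 1.0 / math.sqrt(2.0 * math.pi * math.e)
```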
Maximum Entropy Principle (MaxEnt)

Jaynes’ MaxEnt distribution principle (1957) [4, 5]: infer a distribution given several moment constraints.

Constrained optimization problem:

\max_p H(p) : E[t_i(X)] = \eta_i, \quad i \in [D] = \{1, \ldots, D\}.   (2)

When an iid sample set \{x_1, \ldots, x_s\} is given, we may choose, for example, the raw geometric sample moments \eta_i = \frac{1}{s} \sum_{j=1}^{s} x_j^i for setting up the constraints E[X^i] = \eta_i (i.e., taking t_i(X) = X^i in Eq. 2).

The distribution p(x) maximizing the entropy under those moment constraints is unique and is termed the MaxEnt distribution. The constrained optimization of Eq. 2 is solved by means of Lagrange multipliers [8, 2].
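The raw geometric sample moments used as constraints take only a few lines to compute; a minimal sketch (the function name `sample_moments` is illustrative, not from the paper):

```python
def sample_moments(xs, D):
    # eta_i = (1/s) * sum_{j=1}^{s} x_j^i for i = 1..D (raw geometric sample moments)
    s = len(xs)
    return [sum(x ** i for x in xs) / s for i in range(1, D + 1)]
```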
MaxEnt and exponential families

The MaxEnt distribution p(x) belongs to a parametric family of distributions called an exponential family [1, 8, 3].

Canonical probability density function of an exponential family (EF):

p(x; \theta) = \exp(\langle \theta, t(x) \rangle - F(\theta))   (3)

- \langle a, b \rangle = a^\top b: scalar product
- \theta \in \Theta: natural parameter
- \Theta: natural parameter space
- t(x): sufficient statistics
- F(\theta) = \log \int \exp(\langle \theta, t(x) \rangle) \, dx: log-normalizer [1]
Dual parameterizations of exponential families

A distribution p(x; \theta) of an exponential family can be parameterized equivalently using either
- the natural coordinate system \theta, or
- the expectation coordinate system \eta = E_{p(x;\theta)}[t(x)] (also called the moment coordinate system).

The two coordinate systems are linked by the Legendre transformation:

F^*(\eta) = \sup_\theta \{ \langle \eta, \theta \rangle - F(\theta) \},
\eta = \nabla F(\theta), \quad \theta = \nabla F^*(\eta).

In practice, when F(\theta) is not available in closed form, the conversion \theta \leftrightarrow \eta is approximated numerically [8].
Differential entropy of exponential families

Closed form when the dual Legendre convex conjugate F^* is available in closed form:

H(p(x; \theta)) = -F^*(\eta(\theta))

A more general formula holds when allowing an auxiliary carrier measure term [9].
Strategy to get MaxEnt Upper Bounds (MEUBs)

Rationale: any other distribution with density p'(x) different from the MaxEnt distribution p(x) = p(x; \theta) and satisfying all the D moment constraints E[t_i(X)] = \eta_i has smaller entropy: H(p'(x)) \le H(p(x)).

Recipe for building a MaxEnt upper bound on the entropy of an arbitrary density q(x):
- Choose the sufficient statistics t_i(x) so that the differential entropy H(p(x; \eta)) of the induced MaxEnt distribution p(x; \theta) is in closed form (or can be upper-bounded easily).
- Compute the moment coordinates \eta_i = E_q[t_i(x)], and deduce that H(q(x)) \le H(p(x; \eta)).
Absolute Monomial Exponential Family

p_l(x; \theta) = \exp(\theta |x|^l - F_l(\theta)), \quad x \in \mathbb{R}   (4)

for \theta < 0. This is an exponential family (t(x) = |x|^l) with log-normalizer:

F_l(\theta) = \log 2 + \log \Gamma\left(\frac{1}{l}\right) - \log l - \frac{1}{l} \log(-\theta),   (5)

where \Gamma(u) = \int_0^\infty x^{u-1} \exp(-x) \, dx generalizes the factorial: \Gamma(n) = (n-1)! for n \in \mathbb{N}.
Differential entropy of AMEFs

The entropy expressed using the \theta-parameter is:

H_l(\theta) = \log 2 + \log \Gamma\left(\frac{1}{l}\right) - \log l + \frac{1}{l}(1 - \log(-\theta)) = a_l - \frac{1}{l} \log(-\theta),   (6)

where a_l = \log 2 + \log \Gamma\left(\frac{1}{l}\right) - \log l + \frac{1}{l}.

The entropy expressed using the \eta-parameter is:

H_l(\eta) = \log 2 + \log \Gamma\left(\frac{1}{l}\right) - \log l + \frac{1}{l}(1 + \log l + \log \eta) = b_l + \frac{1}{l} \log \eta,   (7)

with b_l = \log \frac{2 \Gamma(\frac{1}{l}) (el)^{\frac{1}{l}}}{l}.
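Eq. 7 can be sanity-checked against known special cases: for l = 2 the AMEF is the centered Gaussian (entropy \frac{1}{2}\log(2\pi e \eta) with \eta = E[X^2]), and for l = 1 the Laplacian (entropy \log(2e\eta) with \eta = E|X|). A minimal sketch (function name illustrative):

```python
import math

def amef_entropy(l, eta):
    # H_l(eta) = b_l + (1/l) log(eta), with b_l = log(2 * Gamma(1/l) * (e*l)^(1/l) / l)
    b_l = math.log(2 * math.gamma(1 / l) * (math.e * l) ** (1 / l) / l)
    return b_l + math.log(eta) / l
```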
A series of MaxEnt Upper Bounds (MEUBs)

For any continuous random variable X, the MaxEnt entropy upper bound (MEUB) U_l is:

H(X) \le U_l = H_l^\eta(E_X[|X|^l]).

Are all these upper bounds useful? That is, can we build a random variable X so that U_{l+1} < U_l? (The answer is yes!)
AMEF MEUBs for Gaussian Mixture Models

Density of a mixture model with k components:

m(x) = \sum_{c=1}^{k} w_c p_c(x)

Gaussian distribution:

p_i(x) = p(x; \mu_i, \sigma_i) = \frac{1}{\sqrt{2\pi} \sigma_i} \exp\left(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right),

- \mu_i = E[X_i] \in \mathbb{R}: mean parameter
- \sigma_i > 0: standard deviation (\sigma_i^2 = E[(X_i - \mu_i)^2])

To upper bound H(X) \le H_l^\eta(E_X[|X|^l]), we need to compute the raw absolute geometric moments E_X[|X|^l] of a GMM.
Raw absolute geometric moments of a GMM

Technical part (integration by parts and solving a recurrence):

For even l:

A_l(X) = \sum_{c=1}^{k} w_c \sum_{i=0}^{l/2} \binom{l}{2i} \mu_c^{l-2i} \sigma_c^{2i} \frac{2^i \Gamma(\frac{1+2i}{2})}{\sqrt{\pi}} = \sum_{c=1}^{k} w_c \sum_{i=0}^{l/2} \binom{l}{2i} \mu_c^{l-2i} \sigma_c^{2i} (2i-1)!!

For odd l:

A_l(X) = \sum_{c=1}^{k} w_c \sum_{i=0}^{l} \binom{l}{i} \mu_c^{l-i} \sigma_c^{i} \left( I_i\left(-\frac{\mu_c}{\sigma_c}\right) - (-1)^i I_i\left(\frac{\mu_c}{\sigma_c}\right) \right)

Here n!! denotes the double factorial n!! = \prod_{k=0}^{\lceil n/2 \rceil - 1} (n - 2k), with n!! = \frac{2^{(n+1)/2}}{\sqrt{\pi}} \Gamma\left(\frac{n}{2} + 1\right) for odd n, and:

I_i(a) = \frac{1}{\sqrt{2\pi}} \int_a^{+\infty} x^i \exp\left(-\frac{1}{2} x^2\right) dx = \frac{1}{\sqrt{2\pi}} a^{i-1} \exp\left(-\frac{1}{2} a^2\right) + (i-1) I_{i-2}(a),

with the terminal cases of the recursion:

I_0(a) = 1 - \Phi(a) = \frac{1}{2} \left(1 - \mathrm{erf}\left(\frac{a}{\sqrt{2}}\right)\right) = \frac{1}{2} \mathrm{erfc}\left(\frac{a}{\sqrt{2}}\right),
I_1(a) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} a^2\right).
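The I_i recursion is straightforward to implement; below is a sketch for a single Gaussian component (helper names illustrative), cross-checked against the known mean of the folded normal, \mu(1 - 2\Phi(-\mu/\sigma)) + \sigma\sqrt{2/\pi} e^{-\mu^2/(2\sigma^2)}:

```python
import math

def I(i, a):
    # I_i(a) = (1/sqrt(2*pi)) * int_a^inf x^i exp(-x^2/2) dx, via the recursion above
    if i == 0:
        return 0.5 * math.erfc(a / math.sqrt(2))   # = 1 - Phi(a)
    if i == 1:
        return math.exp(-a * a / 2) / math.sqrt(2 * math.pi)
    return a ** (i - 1) * math.exp(-a * a / 2) / math.sqrt(2 * math.pi) + (i - 1) * I(i - 2, a)

def abs_moment_odd(l, mu, sigma):
    # E|X|^l for X ~ N(mu, sigma^2) and odd l, per the binomial / I_i expansion above
    a = mu / sigma
    return sum(math.comb(l, i) * mu ** (l - i) * sigma ** i *
               (I(i, -a) - (-1) ** i * I(i, a))
               for i in range(l + 1))
```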
Laplacian MEUB for a GMM (l = 1)

The AMEF for l = 1 is the Laplacian distribution.

The differential entropy of a Gaussian mixture model X \sim \sum_{c=1}^{k} w_c p(x; \mu_c, \sigma_c) is upper bounded by H(X) \le U_1(X), with:

U_1(X) = \log\left( 2e \sum_{c=1}^{k} w_c \left( \mu_c \left(1 - 2\Phi\left(-\frac{\mu_c}{\sigma_c}\right)\right) + \sigma_c \sqrt{\frac{2}{\pi}} \exp\left(-\frac{1}{2} \left(\frac{\mu_c}{\sigma_c}\right)^2\right) \right) \right).
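The bound is a direct transcription of the formula, using \Phi(z) = \frac{1}{2}\mathrm{erfc}(-z/\sqrt{2}); a minimal sketch (function name illustrative):

```python
import math

def U1(weights, mus, sigmas):
    # Laplacian MEUB: U1 = log(2e * E|X|), with E|X| the GMM's first absolute moment
    Phi = lambda z: 0.5 * math.erfc(-z / math.sqrt(2))
    a1 = sum(w * (mu * (1 - 2 * Phi(-mu / s))
                  + s * math.sqrt(2 / math.pi) * math.exp(-mu ** 2 / (2 * s ** 2)))
             for w, mu, s in zip(weights, mus, sigmas))
    return math.log(2 * math.e * a1)
```

For a single standard Gaussian, U1 = log(2e√(2/π)) ≈ 1.467, slightly above the exact entropy ½ log(2πe) ≈ 1.419, as a MaxEnt bound must be.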
Gaussian MEUB for a GMM (l = 2)

The AMEF for l = 2 is the Gaussian distribution.

The differential entropy of a GMM X \sim \sum_{c=1}^{k} w_c p(x; \mu_c, \sigma_c) is upper bounded by:

H(X) \le U_2(X) = \frac{1}{2} \log\left( 2\pi e \sum_{c=1}^{k} w_c \left( (\mu_c - \bar{\mu})^2 + \sigma_c^2 \right) \right),

with \bar{\mu} = \sum_{c=1}^{k} w_c \mu_c.
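U_2 is the classical "Gaussian with matched variance" bound; a minimal sketch (function name illustrative), which for a single Gaussian component reduces to the exact entropy:

```python
import math

def U2(weights, mus, sigmas):
    # Gaussian MEUB: U2 = (1/2) log(2*pi*e*Var(X)), with Var(X) the GMM variance
    mbar = sum(w * mu for w, mu in zip(weights, mus))
    var = sum(w * ((mu - mbar) ** 2 + s ** 2) for w, mu, s in zip(weights, mus, sigmas))
    return 0.5 * math.log(2 * math.pi * math.e * var)
```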
Vanilla approximation method: Monte Carlo

Estimate H(X) using Monte Carlo (MC) stochastic integration:

\hat{H}_s(X) = -\frac{1}{s} \sum_{i=1}^{s} \log p(x_i),   (8)

where \{x_1, \ldots, x_s\} is an iid set of variates sampled from X \sim p(x).

The MC estimator \hat{H}_s(X) is consistent:

\lim_{s \to \infty} \hat{H}_s(X) = H(X)   (convergence in probability).

However, there is no deterministic bound: the estimate can fall above or below the true value.
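A minimal sketch of the MC estimator for a GMM (sampler and function names are illustrative; the ancestral sampler picks a component, then draws from it):

```python
import math, random

def gmm_pdf(x, weights, mus, sigmas):
    # Mixture density m(x) = sum_c w_c * N(x; mu_c, sigma_c^2)
    return sum(w * math.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
               for w, mu, s in zip(weights, mus, sigmas))

def mc_entropy(weights, mus, sigmas, n=50000, seed=0):
    # H_hat = -(1/n) * sum_i log m(x_i), with x_i sampled from the mixture
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        c = rng.choices(range(len(weights)), weights=weights)[0]
        x = rng.gauss(mus[c], sigmas[c])
        total += math.log(gmm_pdf(x, weights, mus, sigmas))
    return -total / n
```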
Experiments: Laplacian vs Gaussian MEUBs

k = 2 to 10 components with \mu_i, \sigma_i \sim_{iid} U(0, 1), averaged over 1000 trials.

k | Average error | Percentage of trials with U_1(X) < U_2(X)
2 | 0.5401015778688498 | 32.7
3 | 2.7397146972652484 | 39.2
4 | 3.4333962273074774 | 47.9
5 | 0.9310683623797987 | 49.9
6 | 0.5902956910979954 | 52.1
7 | 0.7688142345093779 | 53.2
8 | 0.2982994538560814 | 53.8
9 | 0.1955843679792208 | 56.8
10 | 0.1797637053023196 | 59.9

It is important to recenter the GMMs so that they have zero expectation (as AMEFs are centered): this does not change the entropy. Without recentering, the 30%+ rates fall significantly, to less than 10%.
Are all AMEF MEUBs useful for GMMs?

- For zero-centered GMMs, only the Laplacian or the Gaussian MEUB is useful.
- For arbitrary GMMs, each bound can be the tightest one (example: k = 2, a GMM with zero mean and two symmetric components with small standard deviations).
Zero-centered GMMs

U_1(X) < U_2(X) iff

\log\left( 2e \sqrt{\frac{2}{\pi}} \, \bar{\sigma}_1 \right) \le \log\left( \sqrt{2\pi e} \, \bar{\sigma}_2 \right), \quad \text{i.e.,} \quad \frac{\bar{\sigma}_1}{\bar{\sigma}_2} \le \frac{\pi}{2\sqrt{e}} \approx 0.9527,

where \bar{\sigma}_1 = \sum_{i=1}^{k} w_i \sigma_i is the weighted arithmetic mean and \bar{\sigma}_2 = \sqrt{\sum_{i=1}^{k} w_i \sigma_i^2} is the weighted quadratic mean.

The weighted quadratic mean dominates the weighted arithmetic mean: \frac{\bar{\sigma}_1}{\bar{\sigma}_2} \le 1.

k = 1: \sigma > \frac{2\sqrt{e}}{\pi} (i.e., \sigma > 1.0496).
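The selection rule can be packaged as a small helper (names illustrative); for a zero-centered GMM, U_1 = \log(2e\sqrt{2/\pi}\,\bar{\sigma}_1) and U_2 = \log(\sqrt{2\pi e}\,\bar{\sigma}_2):

```python
import math

def tighter_bound_zero_centered(weights, sigmas):
    # Returns which of U1/U2 is tighter for a zero-centered GMM
    s1 = sum(w * s for w, s in zip(weights, sigmas))                  # weighted arithmetic mean
    s2 = math.sqrt(sum(w * s * s for w, s in zip(weights, sigmas)))  # weighted quadratic mean
    # U1 < U2 exactly when s1/s2 < pi/(2*sqrt(e)) ~ 0.9527
    return "U1" if s1 / s2 < math.pi / (2 * math.sqrt(math.e)) else "U2"
```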
Zero-centered GMMs: U_{l+2} < U_l?

The raw (even) geometric moments coincide with the central (even) geometric moments:

H(X) \le H_l^\eta(A_l(X)) = b_l + \frac{1}{l} \log z_l + \log \bar{\sigma}_l,

E_X[X^l] = \underbrace{\frac{2^{l/2} \Gamma(\frac{1+l}{2})}{\sqrt{\pi}}}_{z_l} \sum_{i=1}^{k} w_i \sigma_i^l = A_l(X),

where \bar{\sigma}_l = \left( \sum_{i=1}^{k} w_i \sigma_i^l \right)^{\frac{1}{l}} is the l-th power mean.

\frac{\bar{\sigma}_{l+2}}{\bar{\sigma}_l} \ge 1 \Rightarrow \log \frac{\bar{\sigma}_{l+2}}{\bar{\sigma}_l} \ge 0

→ U_{l+2} < U_l is not possible (see the arXiv paper for details).
Arbitrary GMMs: consider the 2-component GMM

m(x) = \frac{1}{2} p\left(x; -\frac{1}{2}, 10^{-5}\right) + \frac{1}{2} p\left(x; \frac{1}{2}, 10^{-5}\right)

H (MC estimate): -9.400517405407735

l | U_l (MEUB)
1 | 0.999999999958284
2 | 0.7257913528258293
3 | 0.5863457882025702
4 | 0.4983017544470345
5 | 0.43651349327316713
6 | 0.390267211711506
7 | 0.35410343073850886
8 | 0.32490700997403515
9 | 0.3007543998901125
10 | 0.2803860698295638
11 | 0.2629389102447494
12 | 0.24779955106708096
13 | 0.23451890956649502
14 | 0.22275989562550735
15 | 0.21226407836562905
16 | 0.2028296978359989
17 | 0.19429672922288133
18 | 0.18653647716356042
19 | 0.17944416377804479
20 | 0.1729335449648154
21 | 0.16693293142890442
22 | 0.16138220185001972
23 | 0.1562305292037145
24 | 0.15143462788690765
25 | 0.14695738668300817
26 | 0.14276679134420478
27 | 0.13883506718452443
28 | 0.13513799065560295
29 | 0.13165433203558718
30 | 0.12836540080724268
31 | 0.12525467216646413
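These values can be approximately reproduced by estimating E|X|^l by Monte Carlo on this near-degenerate mixture and plugging the estimate into U_l = b_l + \frac{1}{l} \log E|X|^l (a sketch; the sampling estimate is only this accurate because the components are so concentrated):

```python
import math, random

def meub(l, samples):
    # U_l = b_l + (1/l) log E|X|^l, with E|X|^l estimated from samples of X
    b_l = math.log(2 * math.gamma(1 / l) * (math.e * l) ** (1 / l) / l)
    a_l = sum(abs(x) ** l for x in samples) / len(samples)
    return b_l + math.log(a_l) / l

rng = random.Random(1)
# m(x) = (1/2) N(-1/2, 1e-5) + (1/2) N(1/2, 1e-5)
samples = [rng.gauss(0.5 if rng.random() < 0.5 else -0.5, 1e-5) for _ in range(20000)]
```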
Contributions and conclusion

- Introduced the class of Absolute Monomial Exponential Families (AMEFs) with closed-form log-normalizers,
- Reported closed-form formulæ for the differential entropy of AMEFs,
- Calculated the exact non-centered absolute geometric moments of Gaussian Mixture Models (GMMs),
- Applied the MaxEnt Upper Bounds induced by AMEFs to GMMs:
  - all upper bounds are potentially useful for non-centered GMMs
  - (but for zero-centered GMMs, the first two bounds are enough).

We recommend min(U_1, U_2) in applications (not only U_2)!

Reproducible research with code:
https://www.lix.polytechnique.fr/~nielsen/MEUB/
References

[1] L. D. Brown. Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. IMS Lecture Notes. Institute of Mathematical Statistics, 1986.
[2] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.
[3] Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley, New York, Chichester, Weinheim, 2001.
[4] Edwin T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620, 1957.
[5] Edwin T. Jaynes. Information theory and statistical mechanics II. Physical Review, 108(2):171, 1957.
[6] Joseph V. Michalowicz, Jonathan M. Nichols, and Frank Bucholtz. Calculation of differential entropy for a mixed Gaussian distribution. Entropy, 10(3):200–206, 2008.
[7] Joseph V. Michalowicz, Jonathan M. Nichols, and Frank Bucholtz. Handbook of Differential Entropy. Chapman & Hall/CRC, 2013.
[8] Ali Mohammad-Djafari. A Matlab program to calculate the maximum entropy distributions. In Maximum Entropy and Bayesian Methods, pages 221–233. Springer, 1992.
[9] Frank Nielsen and Richard Nock. Entropies and cross-entropies of exponential families. In 2010 IEEE International Conference on Image Processing, pages 3621–3624. IEEE, 2010.
[10] Sumio Watanabe, Keisuke Yamazaki, and Miki Aoyagi. Kullback information of normal mixture is not an analytic function. Technical report of IEICE (in Japanese), pages 41–46, 2004.
Differential entropy of a location-scale family

Density of a location-scale distribution: p(x; \mu, \sigma) = \frac{1}{\sigma} p_0\left(\frac{x - \mu}{\sigma}\right), with \mu \in \mathbb{R} the location parameter and \sigma > 0 the dispersion parameter.

Apply the change of variable y = \frac{x - \mu}{\sigma} (with dy = \frac{dx}{\sigma}) in the integral to get:

H(X) = \int_{-\infty}^{+\infty} -\frac{1}{\sigma} p_0\left(\frac{x - \mu}{\sigma}\right) \log\left(\frac{1}{\sigma} p_0\left(\frac{x - \mu}{\sigma}\right)\right) dx
     = \int_{-\infty}^{+\infty} -p_0(y) (\log p_0(y) - \log \sigma) \, dy
     = H(X_0) + \log \sigma.

→ The differential entropy is always independent of the location parameter \mu.
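The rule H(X) = H(X_0) + \log \sigma can be checked on the normal family, where both sides are in closed form; a minimal sketch (function name illustrative):

```python
import math

def normal_entropy(mu, sigma):
    # H(N(mu, sigma^2)) = (1/2) log(2*pi*e*sigma^2); mu does not appear
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

H0 = normal_entropy(0.0, 1.0)  # entropy of the standard density p0 = N(0, 1)
```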
Non-central even geometric moments of a normal distribution

For even l: A_l = E[|X|^l] = E[X^l] = \sum_{i=0}^{l/2} \binom{l}{2i} (2i-1)!! \, \mu^{l-2i} \sigma^{2i}.

l | A_l
2 | \mu^2 + \sigma^2
4 | \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4
6 | \mu^6 + 15\mu^4\sigma^2 + 45\mu^2\sigma^4 + 15\sigma^6
8 | \mu^8 + 28\mu^6\sigma^2 + 210\mu^4\sigma^4 + 420\mu^2\sigma^6 + 105\sigma^8
10 | \mu^{10} + 45\mu^8\sigma^2 + 630\mu^6\sigma^4 + 3150\mu^4\sigma^6 + 4725\mu^2\sigma^8 + 945\sigma^{10}
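The even-l rows can be regenerated from the general formula; a minimal sketch (helper names illustrative):

```python
import math

def abs_moment_even(l, mu, sigma):
    # E[X^l] = sum_i C(l, 2i) * (2i-1)!! * mu^(l-2i) * sigma^(2i), for even l
    def dfact(n):  # double factorial, with (-1)!! = 1
        return 1 if n <= 0 else n * dfact(n - 2)
    return sum(math.comb(l, 2 * i) * dfact(2 * i - 1) * mu ** (l - 2 * i) * sigma ** (2 * i)
               for i in range(l // 2 + 1))
```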
Non-central odd geometric moments of a normal distribution

For odd l: A_l = E[|X|^l] = C_l(\mu, \sigma) \sqrt{\frac{2}{\pi}} \exp\left(-\frac{\mu^2}{2\sigma^2}\right) + D_l(\mu, \sigma) \, \mathrm{erf}\left(\frac{\mu}{\sqrt{2}\sigma}\right).

l | A_l
1 | \sigma \sqrt{\frac{2}{\pi}} \exp(-\frac{\mu^2}{2\sigma^2}) + \mu \, \mathrm{erf}(\frac{\mu}{\sqrt{2}\sigma})
3 | (2\sigma^3 + \mu^2\sigma) \sqrt{\frac{2}{\pi}} \exp(-\frac{\mu^2}{2\sigma^2}) + (\mu^3 + 3\mu\sigma^2) \, \mathrm{erf}(\frac{\mu}{\sqrt{2}\sigma})
5 | (8\sigma^5 + 9\mu^2\sigma^3 + \mu^4\sigma) \sqrt{\frac{2}{\pi}} \exp(-\frac{\mu^2}{2\sigma^2}) + (\mu^5 + 10\mu^3\sigma^2 + 15\mu\sigma^4) \, \mathrm{erf}(\frac{\mu}{\sqrt{2}\sigma})
7 | (48\sigma^7 + 87\mu^2\sigma^5 + 20\mu^4\sigma^3 + \mu^6\sigma) \sqrt{\frac{2}{\pi}} \exp(-\frac{\mu^2}{2\sigma^2}) + (\mu^7 + 21\mu^5\sigma^2 + 105\mu^3\sigma^4 + 105\mu\sigma^6) \, \mathrm{erf}(\frac{\mu}{\sqrt{2}\sigma})
9 | (384\sigma^9 + 975\mu^2\sigma^7 + 345\mu^4\sigma^5 + 35\mu^6\sigma^3 + \mu^8\sigma) \sqrt{\frac{2}{\pi}} \exp(-\frac{\mu^2}{2\sigma^2}) + (\mu^9 + 36\mu^7\sigma^2 + 378\mu^5\sigma^4 + 1260\mu^3\sigma^6 + 945\mu\sigma^8) \, \mathrm{erf}(\frac{\mu}{\sqrt{2}\sigma})
Maxima program

A sanity check that p_l(x; \theta) = \exp(\theta |x|^l - F_l(\theta)) normalizes to 1 (here l = 5):

assume(theta < 0);
F(theta) := log(integrate(exp(theta*abs(x)^5), x, -inf, inf));
/* Should evaluate to 1: the density integrates to unity */
integrate(exp(theta*abs(x)^5 - F(theta)), x, -inf, inf);
Maxima program

/* Binomial expansion of (p+q)^i */
binomialExpansion(i,p,q) := if i = 1 then p+q
  else expand((p+q)*binomialExpansion(i-1,p,q));

/* The standard distribution (here, normal) */
p0(y) := exp(-y^2/2)/sqrt(2*pi);

/* Even absolute moment: E|X|^l = E[X^l] */
absEvenMoment(mu,sigma,l) :=
  ratexpand(ratsimp(integrate(factor(expand(binomialExpansion(l,mu,y*sigma)))*p0(y), y, -inf, inf)));

/* Odd absolute moment: split the integral at y = -mu/sigma */
absOddMoment(mu,sigma,l) :=
  ratexpand(ratsimp(integrate(factor(expand(binomialExpansion(l,mu,y*sigma)))*p0(y), y, -mu/sigma, inf)
    - integrate(factor(expand(binomialExpansion(l,mu,y*sigma)))*p0(y), y, -inf, -mu/sigma)));

/* General case: Maxima does not give a closed-form formula
   because of the absolute value */
absMoment(mu,sigma,l) :=
  ratexpand(ratsimp(integrate(abs(factor(expand(binomialExpansion(l,mu,y*sigma))))*p0(y), y, -inf, inf)));

assume(sigma > 0);
assume(mu > 0); /* Maxima needs this to branch on the sign condition */
absEvenMoment(mu,sigma,8);
absOddMoment(mu,sigma,7);
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 

Recently uploaded (20)

9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 

θ ∈ Θ: natural parameter
Θ: natural parameter space
t(x): sufficient statistic
F(θ) = log ∫ exp(⟨θ, t(x)⟩)dx: log-normalizer [1]
4
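As a quick sanity check (my addition, not part of the deck), the zero-mean Gaussian can be written in this canonical form with t(x) = x², θ = −1/(2σ²) and F(θ) = ½ log(π/(−θ)); a minimal Python sketch:

```python
import math

# Zero-mean Gaussian N(0, sigma^2) as a canonical exponential family:
# t(x) = x^2, theta = -1/(2 sigma^2) < 0, F(theta) = 0.5*log(pi / -theta)
def ef_density(x, theta):
    F = 0.5 * math.log(math.pi / -theta)     # log-normalizer
    return math.exp(theta * x * x - F)       # exp(<theta, t(x)> - F(theta))

def gaussian_density(x, sigma):
    return math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

sigma = 1.7
theta = -1.0 / (2 * sigma ** 2)
for x in (-2.0, 0.0, 0.5, 3.0):
    assert abs(ef_density(x, theta) - gaussian_density(x, sigma)) < 1e-12
```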
Dual parameterizations of exponential families
A distribution p(x; θ) of an exponential family can be equivalently parameterized using either
the natural coordinate system θ, or
the expectation coordinate system η = E_{p(x;θ)}[t(x)] (also called the moment coordinate system).
The two coordinate systems are linked by the Legendre transformation:
F*(η) = sup_θ { ⟨η, θ⟩ − F(θ) }
η = ∇F(θ), θ = ∇F*(η)
In practice, when F(θ) is not available in closed form, the conversion θ ↔ η is approximated numerically [8].
5
Differential entropy of exponential families
Closed form when the dual Legendre convex conjugate F* is available in closed form:
H(p(x; θ)) = −F*(η(θ))
A more general formula holds when allowing an auxiliary carrier measure term [9].
6
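A small numerical illustration (my addition, using the zero-mean Gaussian family with t(x) = x² and F(θ) = ½ log(π/(−θ))): the expectation parameter is obtained as a finite-difference gradient of F, and −F*(η) recovers the familiar Gaussian entropy ½ log(2πeσ²):

```python
import math

# Illustrative family: t(x) = x^2, F(theta) = 0.5*log(pi / -theta), theta < 0
F = lambda th: 0.5 * math.log(math.pi / -th)

theta, h = -0.2, 1e-6
eta = (F(theta + h) - F(theta - h)) / (2 * h)   # eta = grad F(theta), here -1/(2 theta)
assert abs(eta - (-1.0 / (2 * theta))) < 1e-6

Fstar = eta * theta - F(theta)                  # Legendre conjugate, sup attained at theta
sigma2 = eta                                    # eta = E[X^2] = sigma^2 for this family
H_gauss = 0.5 * math.log(2 * math.pi * math.e * sigma2)
assert abs(-Fstar - H_gauss) < 1e-9             # H(p(x; theta)) = -F*(eta(theta))
```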
Strategy to get MaxEnt Upper Bounds (MEUBs)
Rationale: any distribution with density p′(x) different from the MaxEnt distribution p(x) = p(x; θ) but satisfying all D moment constraints E[t_i(X)] = η_i has smaller entropy:
H(p′(x)) ≤ H(p(x)).
Recipe for building a MaxEnt upper bound on the entropy of an arbitrary density q(x):
Choose the sufficient statistics t_i(x) so that the differential entropy H(p(x; η)) of the induced MaxEnt distribution p(x; θ) is in closed form (or can be upper bounded easily).
Compute the moment coordinates η_i = E_q[t_i(x)], and deduce that H(q(x)) ≤ H(p(x; η)).
7
Absolute Monomial Exponential Family (AMEF)
p_l(x; θ) = exp(θ|x|^l − F_l(θ)), x ∈ ℝ (4)
for θ < 0. Exponential family (t(x) = |x|^l) with log-normalizer:
F_l(θ) = log 2 + log Γ(1/l) − log l − (1/l) log(−θ), (5)
Γ(u) = ∫_0^∞ x^{u−1} exp(−x)dx generalizes the factorial: Γ(n) = (n − 1)! for n ∈ ℕ
8
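Eq. 5 can be checked numerically; the sketch below (illustrative parameters, my addition) compares the closed-form F_l(θ) against a crude trapezoidal estimate of log ∫ exp(θ|x|^l) dx:

```python
import math

def F_closed(l, theta):
    # Closed-form log-normalizer, Eq. 5
    return math.log(2) + math.lgamma(1 / l) - math.log(l) - (1 / l) * math.log(-theta)

def F_numeric(l, theta, xmax=50.0, n=300000):
    # Crude trapezoidal estimate of log(integral of exp(theta*|x|^l) over R)
    dx = xmax / n
    half = 0.5 * (1.0 + math.exp(theta * xmax ** l))   # boundary terms at x = 0 and x = xmax
    half += sum(math.exp(theta * (i * dx) ** l) for i in range(1, n))
    return math.log(2 * half * dx)   # the integrand is even: double the half-line integral

for l, theta in [(1, -1.0), (2, -0.5), (4, -2.0)]:
    assert abs(F_closed(l, theta) - F_numeric(l, theta)) < 1e-4
```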
Differential entropy of AMEFs
The entropy expressed using the θ-parameter:
H_l(θ) = log 2 + log Γ(1/l) − log l + (1/l)(1 − log(−θ)) = a_l − (1/l) log(−θ), (6)
where a_l = log 2 + log Γ(1/l) − log l + 1/l.
The entropy expressed using the η-parameter:
H_l(η) = log 2 + log Γ(1/l) − log l + (1/l)(1 + log l + log η) = b_l + (1/l) log η, (7)
with b_l = log( 2Γ(1/l)(el)^{1/l} / l ).
9
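Eqs. 6 and 7 must agree when η is the expectation parameter matched to θ; differentiating Eq. 5 gives η = F_l′(θ) = −1/(lθ) (my derivation, not stated on the slide). A quick consistency check:

```python
import math

def H_theta(l, theta):
    # Eq. 6: entropy from the natural parameter
    a = math.log(2) + math.lgamma(1 / l) - math.log(l) + 1 / l
    return a - (1 / l) * math.log(-theta)

def H_eta(l, eta):
    # Eq. 7: entropy from the expectation parameter
    b = math.log(2 * math.gamma(1 / l) * (math.e * l) ** (1 / l) / l)
    return b + (1 / l) * math.log(eta)

# eta = F_l'(theta) = -1/(l*theta), so both entropy expressions must coincide:
for l in (1, 2, 3, 5):
    for theta in (-0.1, -1.0, -4.0):
        assert abs(H_theta(l, theta) - H_eta(l, -1.0 / (l * theta))) < 1e-12
```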
A series of MaxEnt Upper Bounds (MEUBs)
For any continuous random variable X, we get the MaxEnt entropy upper bound U_l:
H(X) ≤ U_l(X) = H_l^η(E_X[|X|^l])
Are all these upper bounds useful? That is, can we build a random variable X so that U_{l+1} < U_l? (The answer is yes!)
10
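A hedged illustration (my addition) for X ∼ N(0, 1), whose absolute moments E|X|^l = 2^{l/2} Γ((l+1)/2)/√π are classical: every U_l upper bounds H(X) = ½ log(2πe), and U_2 is tight since the Gaussian is the MaxEnt distribution for l = 2:

```python
import math

def U(l, abs_moment):
    # MEUB U_l = H_l^eta(E|X|^l), via Eq. 7
    b = math.log(2 * math.gamma(1 / l) * (math.e * l) ** (1 / l) / l)
    return b + (1 / l) * math.log(abs_moment)

def std_normal_abs_moment(l):
    # Classical absolute moments of N(0,1): E|X|^l = 2^(l/2) Gamma((l+1)/2) / sqrt(pi)
    return 2 ** (l / 2) * math.gamma((l + 1) / 2) / math.sqrt(math.pi)

H = 0.5 * math.log(2 * math.pi * math.e)   # exact differential entropy of N(0,1)
bounds = [U(l, std_normal_abs_moment(l)) for l in range(1, 7)]
assert all(u >= H - 1e-12 for u in bounds)   # every U_l upper bounds H(X)
assert abs(bounds[1] - H) < 1e-12            # U_2 is tight: N(0,1) is MaxEnt for l = 2
```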
AMEF MEUBs for Gaussian Mixture Models
Density of a mixture model with k components: m(x) = Σ_{c=1}^k w_c p_c(x)
Gaussian component density:
p_c(x) = p(x; µ_c, σ_c) = (1/(√(2π)σ_c)) exp(−(x − µ_c)²/(2σ_c²))
µ_c ∈ ℝ: mean parameter; σ_c = √(E[(X_c − µ_c)²]) > 0: standard deviation
To upper bound H(X) ≤ H_l^η(E_X[|X|^l]), we need to compute the raw absolute geometric moments E_X[|X|^l] of a GMM.
11
Raw absolute geometric moments of a GMM
Technical part (integration by parts and solving a recurrence). With C(n, j) denoting the binomial coefficient:
For even l:
A_l(X) = Σ_{c=1}^k w_c Σ_{i=0}^{l/2} C(l, 2i) µ_c^{l−2i} σ_c^{2i} 2^i Γ((1+2i)/2)/√π
       = Σ_{c=1}^k w_c Σ_{i=0}^{l/2} C(l, 2i) µ_c^{l−2i} σ_c^{2i} (2i − 1)!!
For odd l:
A_l(X) = Σ_{c=1}^k w_c Σ_{i=0}^{l} C(l, i) µ_c^{l−i} σ_c^i [ I_i(−µ_c/σ_c) − (−1)^i I_i(µ_c/σ_c) ]
Here n!! denotes the double factorial n!! = n(n − 2)(n − 4)···, with n!! = √(2^{n+1}/π) Γ(n/2 + 1) for odd n, and:
I_i(a) = (1/√(2π)) ∫_a^{+∞} x^i exp(−x²/2) dx
       = (1/√(2π)) a^{i−1} exp(−a²/2) + (i − 1) I_{i−2}(a),
with the terminal cases of the recursion:
I_0(a) = 1 − Φ(a) = ½(1 − erf(a/√2)) = ½ erfc(a/√2),
I_1(a) = (1/√(2π)) exp(−a²/2).
12
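The recursion above can be implemented directly; the sketch below (my addition, single-component case k = 1 with illustrative parameters) cross-checks it against two classical closed forms, E[X²] = µ² + σ² and the folded-normal mean E|X|:

```python
import math

SQRT2PI = math.sqrt(2 * math.pi)

def I(i, a):
    # I_i(a) = (1/sqrt(2 pi)) * integral_a^inf x^i exp(-x^2/2) dx, by the recursion above
    if i == 0:
        return 0.5 * math.erfc(a / math.sqrt(2))
    if i == 1:
        return math.exp(-a * a / 2) / SQRT2PI
    return a ** (i - 1) * math.exp(-a * a / 2) / SQRT2PI + (i - 1) * I(i - 2, a)

def abs_moment_gaussian(l, mu, sigma):
    # E|X|^l for a single Gaussian (the k = 1 case of the GMM formula)
    if l % 2 == 0:
        return sum(math.comb(l, 2 * i) * mu ** (l - 2 * i) * sigma ** (2 * i)
                   * 2 ** i * math.gamma((1 + 2 * i) / 2) / math.sqrt(math.pi)
                   for i in range(l // 2 + 1))
    return sum(math.comb(l, i) * mu ** (l - i) * sigma ** i
               * (I(i, -mu / sigma) - (-1) ** i * I(i, mu / sigma))
               for i in range(l + 1))

# Check against E[X^2] = mu^2 + sigma^2:
assert abs(abs_moment_gaussian(2, 1.5, 0.7) - (1.5 ** 2 + 0.7 ** 2)) < 1e-12
# Check against the folded-normal mean E|X|:
mu, sigma = 0.5, 1.0
expected = sigma * math.sqrt(2 / math.pi) * math.exp(-mu ** 2 / (2 * sigma ** 2)) \
           + mu * math.erf(mu / (math.sqrt(2) * sigma))
assert abs(abs_moment_gaussian(1, mu, sigma) - expected) < 1e-12
```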
Laplacian MEUB for a GMM (l = 1)
The AMEF for l = 1 is the Laplacian distribution.
The differential entropy of a Gaussian mixture model X ∼ Σ_{c=1}^k w_c p(x; µ_c, σ_c) is upper bounded by H(X) ≤ U_1(X), with:
U_1(X) = log( 2e Σ_{c=1}^k w_c [ µ_c (1 − 2Φ(−µ_c/σ_c)) + σ_c √(2/π) exp(−½(µ_c/σ_c)²) ] )
13
Gaussian MEUB for a GMM (l = 2)
The AMEF for l = 2 is the Gaussian distribution.
The differential entropy of a GMM X ∼ Σ_{c=1}^k w_c p(x; µ_c, σ_c) is upper bounded by:
H(X) ≤ U_2(X) = ½ log( 2πe Σ_{c=1}^k w_c ((µ_c − µ̄)² + σ_c²) ),
with µ̄ = Σ_{c=1}^k w_c µ_c.
14
Vanilla approximation method: Monte Carlo
Estimate H(X) using Monte Carlo (MC) stochastic integration:
Ĥ_s(X) = −(1/s) Σ_{i=1}^s log p(x_i), (8)
where {x_1, . . . , x_s} is an iid sample drawn from X ∼ p(x).
The MC estimator Ĥ_s(X) is consistent: lim_{s→∞} Ĥ_s(X) = H(X) (convergence in probability).
However, it comes with no deterministic bound: it can be above or below the true value.
15
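Putting Eq. 8 together with the two MEUBs above, a small simulation sketch (my addition; the GMM parameters are illustrative, not the paper's experiment):

```python
import math, random

# Illustrative 2-component GMM with well-separated components
w, mu, sigma = [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5]

def gmm_pdf(x):
    return sum(wc * math.exp(-(x - m) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)
               for wc, m, s in zip(w, mu, sigma))

random.seed(1)
s_count = 20000
samples = [random.gauss(mu[0], sigma[0]) if random.random() < w[0]
           else random.gauss(mu[1], sigma[1]) for _ in range(s_count)]
h_mc = -sum(math.log(gmm_pdf(x)) for x in samples) / s_count   # MC estimate, Eq. 8

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
# Laplacian MEUB U_1 (l = 1) and Gaussian MEUB U_2 (l = 2)
E_abs = sum(wc * (m * (1 - 2 * Phi(-m / s))
                  + s * math.sqrt(2 / math.pi) * math.exp(-0.5 * (m / s) ** 2))
            for wc, m, s in zip(w, mu, sigma))
U1 = math.log(2 * math.e * E_abs)
mean = sum(wc * m for wc, m in zip(w, mu))
U2 = 0.5 * math.log(2 * math.pi * math.e
                    * sum(wc * ((m - mean) ** 2 + s * s) for wc, m, s in zip(w, mu, sigma)))

assert h_mc < U1 and h_mc < U2   # both MEUBs upper bound the estimated entropy
```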
Experiments: Laplacian vs Gaussian MEUBs
k = 2 to 10 components with µ_i, σ_i ∼iid U(0, 1), averaged over 1000 trials.

k  | Average error       | % of trials with U_1(X) < U_2(X)
2  | 0.5401015778688498  | 32.7
3  | 2.7397146972652484  | 39.2
4  | 3.4333962273074774  | 47.9
5  | 0.9310683623797987  | 49.9
6  | 0.5902956910979954  | 52.1
7  | 0.7688142345093779  | 53.2
8  | 0.2982994538560814  | 53.8
9  | 0.1955843679792208  | 56.8
10 | 0.1797637053023196  | 59.9

It is important to recenter the GMMs so that they have zero expectation (AMEFs are centered at 0); this does not change the entropy. Without recentering, the 30%+ rates fall significantly, to less than 10%.
16
Are all AMEF MEUBs useful for GMMs?
For zero-centered GMMs, only the Laplacian or Gaussian MEUB is useful.
For arbitrary GMMs, each bound can be the tightest one (e.g., k = 2 with GMM mean 0 and two symmetric components with small standard deviation).
17
Zero-centered GMMs
U_1(X) < U_2(X) iff log(2e √(2/π) σ̄_1) ≤ log(√(2πe) σ̄_2), i.e., iff
σ̄_1/σ̄_2 ≤ π/(2√e) ≈ 0.9527
σ̄_1 = Σ_{i=1}^k w_i σ_i: weighted arithmetic mean; σ̄_2 = √(Σ_{i=1}^k w_i σ_i²): weighted quadratic mean.
The weighted quadratic mean dominates the weighted arithmetic mean: σ̄_1/σ̄_2 ≤ 1.
k = 1: σ > 2√e/π (i.e., σ > 1.0496)
18
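The criterion can be exercised directly; below, a small check (my addition, illustrative weights and scales) that the ratio σ̄_1/σ̄_2 against the threshold π/(2√e) decides which of U_1, U_2 is tighter for zero-mean GMMs:

```python
import math

def U1_zero(ws, sigmas):
    # Laplacian MEUB for a zero-mean GMM: E|X| = sqrt(2/pi) * sum_i w_i sigma_i
    return math.log(2 * math.e * math.sqrt(2 / math.pi)
                    * sum(w * s for w, s in zip(ws, sigmas)))

def U2_zero(ws, sigmas):
    # Gaussian MEUB for a zero-mean GMM: variance = sum_i w_i sigma_i^2
    return 0.5 * math.log(2 * math.pi * math.e
                          * sum(w * s * s for w, s in zip(ws, sigmas)))

def ratio(ws, sigmas):
    # sigma_bar_1 / sigma_bar_2: weighted arithmetic over weighted quadratic mean
    s1 = sum(w * s for w, s in zip(ws, sigmas))
    s2 = math.sqrt(sum(w * s * s for w, s in zip(ws, sigmas)))
    return s1 / s2

threshold = math.pi / (2 * math.sqrt(math.e))   # ~0.9527

# Nearly equal scales: ratio above the threshold, so the Gaussian bound U_2 is tighter.
assert ratio([0.5, 0.5], [1.0, 1.1]) > threshold
assert U2_zero([0.5, 0.5], [1.0, 1.1]) < U1_zero([0.5, 0.5], [1.0, 1.1])
# Very spread-out scales: ratio below the threshold, so the Laplacian bound U_1 is tighter.
assert ratio([0.5, 0.5], [0.1, 2.0]) < threshold
assert U1_zero([0.5, 0.5], [0.1, 2.0]) < U2_zero([0.5, 0.5], [0.1, 2.0])
```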
Zero-centered GMMs: U_{l+2} < U_l?
For zero-centered GMMs, the raw (even) geometric moments coincide with the central (even) geometric moments:
A_l(X) = E_X[X^l] = z_l Σ_{i=1}^k w_i σ_i^l, with z_l = 2^{l/2} Γ((1+l)/2)/√π,
so that H(X) ≤ H_l^η(A_l(X)) = b_l + (1/l) log z_l + log σ̄_l,
where σ̄_l is the l-th power mean: σ̄_l = (Σ_{i=1}^k w_i σ_i^l)^{1/l}.
Since σ̄_{l+2}/σ̄_l ≥ 1, we have log(σ̄_{l+2}/σ̄_l) ≥ 0 → U_{l+2} < U_l is not possible (see the arXiv report).
19
Arbitrary GMMs
Consider the 2-component GMM m(x) = ½ p(x; -1/2, 10^-5) + ½ p(x; 1/2, 10^-5).
H (MC estimate): -9.400517405407735
MEUBs U_l for l = 1, . . . , 31:
l = 1:  0.999999999958284
l = 2:  0.7257913528258293
l = 3:  0.5863457882025702
l = 4:  0.4983017544470345
l = 5:  0.43651349327316713
l = 6:  0.390267211711506
l = 7:  0.35410343073850886
l = 8:  0.32490700997403515
l = 9:  0.3007543998901125
l = 10: 0.2803860698295638
l = 11: 0.2629389102447494
l = 12: 0.24779955106708096
l = 13: 0.23451890956649502
l = 14: 0.22275989562550735
l = 15: 0.21226407836562905
l = 16: 0.2028296978359989
l = 17: 0.19429672922288133
l = 18: 0.18653647716356042
l = 19: 0.17944416377804479
l = 20: 0.1729335449648154
l = 21: 0.16693293142890442
l = 22: 0.16138220185001972
l = 23: 0.1562305292037145
l = 24: 0.15143462788690765
l = 25: 0.14695738668300817
l = 26: 0.14276679134420478
l = 27: 0.13883506718452443
l = 28: 0.13513799065560295
l = 29: 0.13165433203558718
l = 30: 0.12836540080724268
l = 31: 0.12525467216646413
Here every higher-order bound is tighter: U_{l+1} < U_l for all l shown.
20
Contributions and conclusion
Introduced the class of Absolute Monomial Exponential Families (AMEFs) with closed-form log-normalizers.
Reported closed-form formulæ for the differential entropy of AMEFs.
Calculated the exact non-central absolute geometric moments of a Gaussian Mixture Model (GMM).
Applied the MaxEnt upper bounds induced by AMEFs to GMMs:
All upper bounds are potentially useful for non-centered GMMs (but for zero-centered GMMs, the first two bounds suffice).
Recommendation: use min(U_1, U_2) in applications, not only U_2!
Reproducible research with code: https://www.lix.polytechnique.fr/~nielsen/MEUB/
21
References
[1] L. D. Brown. Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. IMS Lecture Notes. Institute of Mathematical Statistics, 1986.
[2] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.
[3] Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley, New York, 2001.
[4] Edwin T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620, 1957.
[5] Edwin T. Jaynes. Information theory and statistical mechanics II. Physical Review, 108(2):171, 1957.
[6] Joseph V. Michalowicz, Jonathan M. Nichols, and Frank Bucholtz. Calculation of differential entropy for a mixed Gaussian distribution. Entropy, 10(3):200–206, 2008.
[7] Joseph V. Michalowicz, Jonathan M. Nichols, and Frank Bucholtz. Handbook of Differential Entropy. Chapman & Hall/CRC, 2013.
[8] Ali Mohammad-Djafari. A Matlab program to calculate the maximum entropy distributions. In Maximum Entropy and Bayesian Methods, pages 221–233. Springer, 1992.
[9] Frank Nielsen and Richard Nock. Entropies and cross-entropies of exponential families. In 2010 IEEE International Conference on Image Processing, pages 3621–3624. IEEE, 2010.
[10] Sumio Watanabe, Keisuke Yamazaki, and Miki Aoyagi. Kullback information of normal mixture is not an analytic function. Technical report of IEICE (in Japanese), pages 41–46, 2004.
Differential entropy of a location-scale family
Density of a location-scale distribution: p(x; µ, σ) = (1/σ) p_0((x − µ)/σ),
µ ∈ ℝ: location parameter; σ > 0: dispersion parameter.
The change of variable y = (x − µ)/σ (with dy = dx/σ) in the integral yields:
H(X) = −∫_{−∞}^{+∞} (1/σ) p_0((x − µ)/σ) log[ (1/σ) p_0((x − µ)/σ) ] dx
     = −∫_{−∞}^{+∞} p_0(y) (log p_0(y) − log σ) dy
     = H(X_0) + log σ.
→ The entropy is always independent of the location parameter µ.
24
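The identity H(X) = H(X_0) + log σ is easy to confirm numerically; a sketch with the normal family (my addition, illustrative µ and σ):

```python
import math

def entropy_numeric(p, lo, hi, n=200000):
    # Trapezoidal estimate of -integral p(x) log p(x) dx over [lo, hi]
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        px = p(lo + i * dx)
        if px > 0.0:
            total += -px * math.log(px) * (0.5 if i in (0, n) else 1.0)
    return total * dx

p0 = lambda y: math.exp(-y * y / 2) / math.sqrt(2 * math.pi)   # standard density p_0
mu, sigma = 5.0, 3.0
p = lambda x: p0((x - mu) / sigma) / sigma                     # location-scale density

H0 = entropy_numeric(p0, -12.0, 12.0)
H = entropy_numeric(p, mu - 12.0 * sigma, mu + 12.0 * sigma)
assert abs(H - (H0 + math.log(sigma))) < 1e-6   # H(X) = H(X0) + log(sigma)
```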
Non-central even geometric moments of a normal distribution
For even l: A_l = E[|X|^l] = E[X^l] = Σ_{i=0}^{l/2} C(l, 2i) (2i − 1)!! µ^{l−2i} σ^{2i}

l  | A_l
2  | µ² + σ²
4  | µ⁴ + 6µ²σ² + 3σ⁴
6  | µ⁶ + 15µ⁴σ² + 45µ²σ⁴ + 15σ⁶
8  | µ⁸ + 28µ⁶σ² + 210µ⁴σ⁴ + 420µ²σ⁶ + 105σ⁸
10 | µ¹⁰ + 45µ⁸σ² + 630µ⁶σ⁴ + 3150µ⁴σ⁶ + 4725µ²σ⁸ + 945σ¹⁰
25
Non-central odd geometric moments of a normal distribution
For odd l: A_l = E[|X|^l] = C_l(µ, σ) √(2/π) exp(−µ²/(2σ²)) + D_l(µ, σ) erf(µ/(√2 σ))

l | A_l
1 | σ √(2/π) exp(−µ²/(2σ²)) + µ erf(µ/(√2σ))
3 | (2σ³ + µ²σ) √(2/π) exp(−µ²/(2σ²)) + (µ³ + 3µσ²) erf(µ/(√2σ))
5 | (8σ⁵ + 9µ²σ³ + µ⁴σ) √(2/π) exp(−µ²/(2σ²)) + (µ⁵ + 10µ³σ² + 15µσ⁴) erf(µ/(√2σ))
7 | (48σ⁷ + 87µ²σ⁵ + 20µ⁴σ³ + µ⁶σ) √(2/π) exp(−µ²/(2σ²)) + (µ⁷ + 21µ⁵σ² + 105µ³σ⁴ + 105µσ⁶) erf(µ/(√2σ))
9 | (384σ⁹ + 975µ²σ⁷ + 345µ⁴σ⁵ + 35µ⁶σ³ + µ⁸σ) √(2/π) exp(−µ²/(2σ²)) + (µ⁹ + 36µ⁷σ² + 378µ⁵σ⁴ + 1260µ³σ⁶ + 945µσ⁸) erf(µ/(√2σ))
26
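Both moment tables can be validated against direct numerical integration; a sketch (my addition) checking the l = 4 (even) and l = 3 (odd) entries for illustrative parameters:

```python
import math

def abs_moment_numeric(l, mu, sigma, n=200000, width=12.0):
    # Trapezoidal estimate of E|X|^l for X ~ N(mu, sigma^2)
    lo, hi = mu - width * sigma, mu + width * sigma
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * dx
        f = abs(x) ** l * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
        total += f * (0.5 if i in (0, n) else 1.0)
    return total * dx / (math.sqrt(2 * math.pi) * sigma)

mu, sigma = 1.2, 0.8
# Even entry l = 4 of the previous table:
even4 = mu ** 4 + 6 * mu ** 2 * sigma ** 2 + 3 * sigma ** 4
assert abs(abs_moment_numeric(4, mu, sigma) - even4) < 1e-6
# Odd entry l = 3 of the table above:
odd3 = (2 * sigma ** 3 + mu ** 2 * sigma) * math.sqrt(2 / math.pi) \
       * math.exp(-mu ** 2 / (2 * sigma ** 2)) \
       + (mu ** 3 + 3 * mu * sigma ** 2) * math.erf(mu / (math.sqrt(2) * sigma))
assert abs(abs_moment_numeric(3, mu, sigma) - odd3) < 1e-6
```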
Maxima program
assume(theta < 0);
F(theta) := log(integrate(exp(theta*abs(x)^5), x, -inf, inf));
integrate(exp(theta*abs(x)^5 - F(theta)), x, -inf, inf);
27
Maxima program
/* Binomial expansion */
binomialExpansion(i,p,q) := if i = 1 then p+q else expand((p+q)*binomialExpansion(i-1,p,q));
/* The standard distribution (here, normal) */
p0(y) := exp(-y^2/2)/sqrt(2*pi);
/* Even moment */
absEvenMoment(mu,sigma,l) := ratexpand(ratsimp(integrate(factor(expand(binomialExpansion(l,mu,y*sigma)))*p0(y), y, -inf, inf)));
/* Odd moment */
absOddMoment(mu,sigma,l) := ratexpand(ratsimp(integrate(factor(expand(binomialExpansion(l,mu,y*sigma)))*p0(y), y, -mu/sigma, inf) - integrate(factor(expand(binomialExpansion(l,mu,y*sigma)))*p0(y), y, -inf, -mu/sigma)));
/* General case: Maxima does not give a closed-form formula because of the absolute value */
absMoment(mu,sigma,l) := ratexpand(ratsimp(integrate(abs(factor(expand(binomialExpansion(l,mu,y*sigma))))*p0(y), y, -inf, inf)));
assume(sigma > 0);
assume(mu > 0); /* Maxima needs to branch on the sign */
absEvenMoment(mu,sigma,8);
absOddMoment(mu,sigma,7);
28