Prior selection for mixture estimation
Christian P. Robert, U Paris-Dauphine & U Warwick
joint works with C. Grazian, K. Kamary, & J.E. Lee
JSM 2018, Vancouver
contents
introduction
weakly informative priors
less informative prior
contents
Grazian, C. & Robert, C.P. (2018) Jeffreys priors for mixture estimation: Properties and alternatives, Computational Statistics & Data Analysis, 121, 149-163
Kamary, K., Lee, J.E. & Robert, C.P. (2018) Weakly informative reparameterizations for location-scale mixtures, Journal of Computational & Graphical Statistics
coming soon
Handbook of Mixture Analysis, edited by Sylvia Frühwirth-Schnatter, Gilles Celeux & Christian P. Robert, Chapman & Hall/CRC Handbooks of Modern Statistical Methods
introduction
Structure of mixtures of distributions:
x ∼ fj with probability pj,
for j = 1, 2, . . . , k, with overall density
p1 f1(x) + · · · + pk fk(x).
Usual case: parameterised components
∑_{i=1}^k pi f(x|θi),  ∑_{i=1}^k pi = 1,
where the weights pi are distinguished from the other parameters
[Titterington et al., 1984; Frühwirth-Schnatter, 2006]
License
Dataset derived from [my] license plate image
Grey levels concentrated on 256 values [later jittered]
[Marin & X, 2007]
Bayesian modelling
For any prior π(θ, p), posterior distribution of (θ, p) available up
to a multiplicative constant
π(θ, p|x) ∝ [ ∏_{i=1}^n ∑_{j=1}^k pj f(xi|θj) ] π(θ, p)
at a [low] cost of order O(kn)
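The O(kn) cost is just the double loop over observations and components; below, a minimal sketch of the posterior kernel evaluation (not from the slides; `log_prior` is a hypothetical placeholder for whatever prior is chosen):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def log_posterior_kernel(x, p, mu, sigma, log_prior):
    """Unnormalised log posterior of a k-component Gaussian mixture:
    sum_i log(sum_j p_j N(x_i | mu_j, sigma_j)) + log pi(theta, p).
    The (n, k) broadcast is the O(kn) computation."""
    # (n, k) array of log[p_j N(x_i | mu_j, sigma_j)]
    log_terms = np.log(p)[None, :] + norm.logpdf(x[:, None], mu[None, :], sigma[None, :])
    return logsumexp(log_terms, axis=1).sum() + log_prior(p, mu, sigma)
```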
almost conjugate priors
For mixtures of exponential families, conjugate priors for completed
likelihood
π(θ, z|x) ∝ ∏_{i=1}^n f(xi|θ_{zi}) p_{zi} × ∏_{j=1}^k π(θj|α) × π(p|β)
          ∝ ∏_{j=1}^k [∏_{i; zi=j} f(xi|θj)] π(θj|α) × ∏_{j=1}^k pj^{nj} × π(p|β)
[Diebolt & X, 1990]
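This conjugacy is what drives data-augmentation Gibbs sampling; here is a hedged sketch in that spirit for the simplest case, a Gaussian mixture with known common variance sigma2 and assumed hyperparameters tau2 and alpha (not the slides' own algorithm):

```python
import numpy as np

def gibbs_mixture(x, k, n_iter=1000, alpha=1.0, tau2=10.0, sigma2=1.0, rng=None):
    """Data-augmentation Gibbs for a k-component Gaussian mixture with known
    variance sigma2, priors mu_j ~ N(0, tau2) and p ~ D(alpha, ..., alpha)."""
    rng = np.random.default_rng(rng)
    mu, p = rng.normal(0, np.sqrt(tau2), k), np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # 1. allocations z_i | ... proportional to p_j N(x_i | mu_j, sigma2)
        logw = np.log(p) - 0.5 * (x[:, None] - mu[None, :])**2 / sigma2
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=wi) for wi in w])
        # 2. conditionally conjugate updates given the allocations
        nj = np.bincount(z, minlength=k)
        sj = np.bincount(z, weights=x, minlength=k)
        var = 1.0 / (nj / sigma2 + 1.0 / tau2)
        mu = rng.normal(var * sj / sigma2, np.sqrt(var))
        p = rng.dirichlet(alpha + nj)
    return mu, p, z
```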
Outline
introduction
weakly informative priors
less informative prior
weakly informative priors
possible symmetric empirical Bayes priors
p ∼ D(γ, . . . , γ), θi ∼ N(µ̂, ω̂σi²), σi⁻² ∼ Ga(ν, ν̂)
which can be replaced with hierarchical priors
[Diebolt & X, 1990; Richardson & Green, 1997]
independent improper priors on θj ’s prohibited, thus standard
“flat” and Jeffreys priors impossible to use (except with the
exclude-empty-component trick)
[Diebolt & X, 1990; Wasserman, 1999]
weakly informative priors
reparameterization to compact set for use of uniform priors
µi → e^{µi}/(1 + e^{µi}),  σi → σi/(1 + σi)
[Chopin, 2000]
dependent weakly informative priors (see the sampling sketch below)
p ∼ D(k, . . . , 1), θi ∼ N(θ_{i−1}, ζσ²_{i−1}), σi ∼ U([0, σ_{i−1}])
[Mengersen & X, 1996; X & Titterington, 1998]
reference priors
p ∼ D(1, . . . , 1), θi ∼ N(µ0, (σi² + τ0²)/2), σi² ∼ C⁺(0, τ0²)
[Moreno & Liseo, 1999]
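For illustration, a sketch of simulating from the dependent weakly informative prior above; the anchoring values θ0 = 0, σ0 = 1 are assumptions of this sketch, not part of the slides:

```python
import numpy as np

def sample_dependent_prior(k, zeta=1.0, rng=None):
    """Sketch of the Mengersen & Robert (1996)-style dependent prior:
    theta_i ~ N(theta_{i-1}, zeta * sigma_{i-1}^2), sigma_i ~ U(0, sigma_{i-1}),
    anchored at assumed values theta_0 = 0, sigma_0 = 1."""
    rng = np.random.default_rng(rng)
    theta, sigma = np.empty(k), np.empty(k)
    theta_prev, sigma_prev = 0.0, 1.0   # assumed anchoring values
    for i in range(k):
        theta[i] = rng.normal(theta_prev, np.sqrt(zeta) * sigma_prev)
        sigma[i] = rng.uniform(0, sigma_prev)
        theta_prev, sigma_prev = theta[i], sigma[i]
    p = rng.dirichlet(np.arange(k, 0, -1.0))  # p ~ D(k, k-1, ..., 1)
    return p, theta, sigma
```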
re-ban on improper priors
Delicate use of improper priors in the setting of mixtures because
independent improper priors,
π(θ) = ∏_{i=1}^k πi(θi), with ∫ πi(θi) dθi = ∞,
end up, for any n, with the [bad] property
∫ π(θ, p|x) dθ dp = ∞
Reason
There are (k − 1)^n terms among the k^n terms in the expansion
that allocate no observation at all to the i-th component
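The step left implicit on the slide is the expansion of the likelihood over the k^n possible allocations z:

```latex
\prod_{i=1}^{n}\sum_{j=1}^{k} p_j\, f(x_i\mid\theta_j)
  \;=\; \sum_{z\in\{1,\dots,k\}^n}\ \prod_{i=1}^{n} p_{z_i}\, f(x_i\mid\theta_{z_i})
```

Any allocation z assigning no observation to component i leaves the factor πi(θi) untouched by the likelihood, so that term alone integrates to ∫ πi(θi) dθi = ∞.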
saturated priors
When the number of components k is not determined a priori, it is inferred
by a hierarchical prior with a distribution on k, or by saturating the
model with an upper bound k∗ and an informative prior on
(p1, . . . , pk∗, θ1, . . . , θk∗) in
f(y|η, ψ) = ∑_{g=1}^{k∗} ηg f(y|θg, ψ)
with k∗ − k empty components, i.e.
pg = p∗g, θg = θ∗g, g ≤ k, p_{k+1} = . . . = p_{k∗} = 0,
or by merging the extra components into existing ones, for instance:
pg = p∗g, θg = θ∗g, g ≤ k, θ_{k+1} = . . . = θ_{k∗} = θ∗k
[X, 1994; Frühwirth-Schnatter, 2006; Rousseau & Mengersen, 2011]
saturated priors
Validation of prior that favours one of two configurations: either
emptying or merging the extra components
Dirichlet prior
(p1, . . . , pk∗) ∼ D(e1, . . . , ek∗)
with max_g eg < dim(θg)/2
Alternative: repulsive prior on (θ1, . . . , θG )
[Petralia et al., 2016]
Outline
introduction
weakly informative priors
less informative prior
Jeffreys priors for mixtures
True Jeffreys prior for mixtures of distributions defined as
π(θ) ∝ |E_θ[∇ log f(X|θ) {∇ log f(X|θ)}^T]|^{1/2},
involving an O(k)-dimensional information matrix
unavailable in closed form except in special cases
[Rubio & Steel, 2014]
unidimensional integrals need to be approximated by Monte Carlo or numerical tools
[Grazian & X, 2018]
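As an illustration of the Monte Carlo route, a sketch for the simplest case where the weight p is the sole unknown in p N(µ1, 1) + (1 − p) N(µ2, 1); the component values are arbitrary placeholders, not from the slides:

```python
import numpy as np
from scipy.stats import norm

def jeffreys_prior_weight(p, mu1=0.0, mu2=2.0, n_mc=100_000, rng=None):
    """Monte Carlo approximation of the (unnormalised) Jeffreys prior for the
    weight p of the mixture p N(mu1,1) + (1-p) N(mu2,1), with Fisher
    information I(p) = E[ ((f1(X) - f2(X)) / f(X))^2 ]."""
    rng = np.random.default_rng(rng)
    # sample X from the mixture at the current parameter value
    z = rng.random(n_mc) < p
    x = np.where(z, rng.normal(mu1, 1, n_mc), rng.normal(mu2, 1, n_mc))
    f1, f2 = norm.pdf(x, mu1, 1), norm.pdf(x, mu2, 1)
    score = (f1 - f2) / (p * f1 + (1 - p) * f2)  # d/dp log f(x|p)
    return np.sqrt(np.mean(score**2))            # |I(p)|^{1/2}
```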
Difficulties
complexity grows in O(k²)
significant computing requirement (reduced by delayed acceptance)
[Banterle et al., 2014]
differs from component-wise (completed-likelihood) Jeffreys
[Diebolt & X, 1990; Stoneking, 2014]
when is the posterior proper?
how to check propriety via MCMC outputs?
[Grazian & X, 2018]
proper exceptions
Jeffreys priors most often lead to improper posteriors for
location-scale mixture models
Exceptions
weights sole unknowns (Bernardo & Girón, 1988)
location parameters sole unknowns for two mixture components (Grazian & X, 2018)
two-piece location-scale models under some conditions (Rubio & Steel, 2014)
improprieties
When the parameters of a location-scale mixture are all unknown, the
Jeffreys prior is improper, constant in µ and behaving like τ^{−d/2} in the
scale τ, where d is the total number of unknown parameters of the
components (i.e. excluding the weights)
When k > 2, the posterior distribution derived from the Jeffreys prior is
improper even when only the location parameters are unknown.
[Grazian & X, 2018]
hierarchical diffusion of propriety
Gaussian mixture model
g(x|θ) = ∑_{ℓ=1}^k pℓ N(x|µℓ, σℓ)
second level of the hierarchical model
µℓ ∼ iid N(µ0, ζ0)
σℓ ∼ iid (1/2) U(0, ζ0) + (1/2) 1/U(0, ζ0)
p|µ, σ ∼ π^J(p|µ, σ)
third level of the hierarchical model
π(µ0, ζ0) ∝ 1/ζ0
posterior remains proper
[Grazian & X, 2018]
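A sketch of simulating the second level of this hierarchy; ζ0 is treated as a variance in the Gaussian draw, and (µ0, ζ0) are fixed to arbitrary values since the third level is improper — both are assumptions of this sketch:

```python
import numpy as np

def sample_second_level(k, mu0=0.0, zeta0=1.0, rng=None):
    """Draw (mu, sigma) from the second level of the hierarchy:
    mu_l ~ N(mu0, zeta0) and sigma_l ~ (1/2) U(0, zeta0) + (1/2) 1/U(0, zeta0)."""
    rng = np.random.default_rng(rng)
    mu = rng.normal(mu0, np.sqrt(zeta0), k)   # assuming zeta0 is a variance
    u = rng.uniform(0, zeta0, k)
    flip = rng.random(k) < 0.5
    sigma = np.where(flip, u, 1.0 / u)        # half uniform, half inverse-uniform
    return mu, sigma
```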
reparametrization
Express each component as local perturbation of previous one
[Mengersen and X, 1996]
Example: mixture of two Normals
p N(µ1, τ1²) + (1 − p) N(µ1 + τ1µ2, τ1²τ2²)
Weakly informative but proper prior for this reparametrization
[X & Titterington, 1998]
reparametrization via moments
Instead of location-scale parameterisation, use of moments like
µ = E[X], σ² = Var[X] for Gaussian mixtures
µ = E[X] for Poisson mixtures
reparametrization of Gaussian mixtures
Gaussian mixture reparametrized via global mean µ and global
standard deviation σ
fk = ∑_{i=1}^k pi N(µi, σi) = ∑_{i=1}^k pi N(µ + σγi/√pi, σηi/√pi)
component-wise location and scale parameters
(η1, . . . , ηk, γ1, . . . , γk) constrained by
∑_{i=1}^k √pi γi = 0 and ∑_{i=1}^k (ηi² + γi²) = 1
Compact domain on the η's and γ's:
0 ≤ ηi ≤ 1, 0 ≤ γi² ≤ 1, ∀i
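As a sanity check on this parameterisation, a sketch mapping the global parameters back to component means and standard deviations (assuming N(·,·) parameterised by mean and standard deviation, and the two constraints already satisfied):

```python
import numpy as np

def to_components(mu, sigma, p, gamma, eta):
    """Recover component parameters from the global parameterisation:
    mu_i = mu + sigma * gamma_i / sqrt(p_i), sigma_i = sigma * eta_i / sqrt(p_i)."""
    p, gamma, eta = map(np.asarray, (p, gamma, eta))
    assert np.isclose(np.sum(np.sqrt(p) * gamma), 0.0)   # sum sqrt(p_i) gamma_i = 0
    assert np.isclose(np.sum(eta**2 + gamma**2), 1.0)    # sum (eta_i^2 + gamma_i^2) = 1
    mus = mu + sigma * gamma / np.sqrt(p)
    sigmas = sigma * eta / np.sqrt(p)
    return mus, sigmas
```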
valid reparametrization
Theorem
The posterior distribution associated with the Jeffreys prior π(µ, σ) = 1/σ
is proper when the components are Gaussian densities, provided (a) proper
distributions are used on the other parameters and (b) the sample contains
at least two observations
Jeffreys-like prior for the global parameters
Uniform or non-informative proper priors on the γ's and η's.
[Kamary, Lee, & X, 2018]
spherical reparameterisation
Introducing ∑_{i=1}^k γi² = ϕ², the constraints rewrite as
∑_{i=1}^k √pi γi = 0 and ∑_{i=1}^k (ηi² + γi²) = ∑_{i=1}^k ηi² + ϕ² = 1
(η1, . . . , ηk): points on the hypersphere of radius √(1 − ϕ²), with
ηi = √(1 − ϕ²) cos(ξ1) for i = 1,
ηi = √(1 − ϕ²) [∏_{j=1}^{i−1} sin(ξj)] cos(ξi) for 1 < i < k,
ηk = √(1 − ϕ²) ∏_{j=1}^{k−1} sin(ξj),
where (ξ1, · · · , ξk−1) ∈ [0, π/2]^{k−1}
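These spherical coordinates are straightforward to implement; a sketch returning (η1, . . . , ηk) from the angles ξ:

```python
import numpy as np

def eta_from_angles(xi, phi):
    """Spherical coordinates for (eta_1, ..., eta_k) on the hypersphere of
    radius sqrt(1 - phi^2), with angles xi in [0, pi/2]^(k-1)."""
    xi = np.asarray(xi)
    k = xi.size + 1
    r = np.sqrt(1.0 - phi**2)
    eta = np.empty(k)
    sin_prod = 1.0
    for i in range(k - 1):
        eta[i] = r * sin_prod * np.cos(xi[i])  # r * prod_{j<i} sin(xi_j) * cos(xi_i)
        sin_prod *= np.sin(xi[i])
    eta[k - 1] = r * sin_prod                  # last coordinate: r * prod sin(xi_j)
    return eta                                  # satisfies sum(eta**2) == 1 - phi**2
```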
(γ1, . . . , γk): points on the intersection of the hypersphere of radius ϕ
with the hyperplane orthogonal to (√p1, . . . , √pk):
γ = ϕ cos(ϖ1) e1 + ϕ sin(ϖ1) cos(ϖ2) e2 + . . . + ϕ sin(ϖ1) · · · sin(ϖk−2) ek−1
where (ϖ1, . . . , ϖk−3) ∈ [0, π]^{k−3}, ϖk−2 ∈ [0, 2π], and
e1, . . . , ek−1 an orthonormal basis of the hyperplane
CT scan image
sheep: 34% fat, 59% muscle, 7% bone
f̂ = 0.16 N(68.7, 17.4²) + 0.17 N(89.9, 9.4²) + 0.25 N(121.1, 13.7²) +
0.34 N(134.6, 4.6²) + 0.03 N(196.3, 21.89²) + 0.04 N(243.9, 2²)
Reparametrization of Poisson mixture
With the global parameter λ = E[X],
f(x|λ1, . . . , λk) = (1/x!) ∑_{i=1}^k pi λi^x e^{−λi}
                   = (1/x!) ∑_{i=1}^k pi (λγi/pi)^x e^{−λγi/pi}
Component-wise parameters (the γi's and pi's) are constrained by
∑_{i=1}^k γi = 1 and ∑_{i=1}^k pi = 1.
Theorem
The posterior distribution associated with the Jeffreys prior π(λ) = 1/λ is
proper when the components are Poisson, provided (a) proper distributions
are used on the other parameters and (b) the sample contains at least one
strictly positive observation.
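A sketch of the reparameterised Poisson mixture pmf; by construction E[X] = ∑ pi (λγi/pi) = λ whenever ∑γi = ∑pi = 1:

```python
import numpy as np
from scipy.special import gammaln

def poisson_mixture_logpmf(x, lam, p, gamma):
    """log pmf of the Poisson mixture in the global-mean parameterisation:
    component rates lambda_i = lam * gamma_i / p_i, so E[X] = lam when
    sum(gamma) = sum(p) = 1."""
    p, gamma = np.asarray(p), np.asarray(gamma)
    rates = lam * gamma / p
    # log of sum_i p_i * rates_i^x * exp(-rates_i) / x!
    log_terms = np.log(p) + x * np.log(rates) - rates - gammaln(x + 1)
    return np.logaddexp.reduce(log_terms)
```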
extensions
mixtures of compound Gaussian distributions, i.e., scale mixtures
X = µ + σξZ, Z ∼ N(0, 1), ξ ∼ h(ξ),
which include the t, double exponential, logistic, and α-stable
distributions (see the sampling sketch below)
mixtures of compound Poisson distributions, when the Poisson parameter is
random with mean λ:
P(X = x|λ) = (1/x!) ∫ (λξ)^x exp{−λξ} dν(ξ),
covers all infinitely exchangeable distributions
mixtures of compound exponential distributions, when the parameter is
random with mean λ:
f(x|λ) = ∫ (1/λξ) exp{−x/λξ} dν(ξ), x > 0,
covers all Gamma distributions with shape less than one and Pareto
distributions
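To make the compound Gaussian family concrete, a sketch sampling one instance, the Student t as a scale mixture of Gaussians; the degrees-of-freedom value is an arbitrary choice for the example:

```python
import numpy as np

def sample_compound_gaussian(n, mu, sigma, df=3.0, rng=None):
    """Sample X = mu + sigma * xi * Z with Z ~ N(0,1) and
    xi = 1/sqrt(W), W ~ Gamma(df/2, rate=df/2), which yields a
    location-scale Student t_df -- one member of the family above."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(n)
    xi = 1.0 / np.sqrt(rng.gamma(df / 2.0, 2.0 / df, n))  # scale = 1/rate
    return mu + sigma * xi * z
```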