Prior selection for mixture estimation
Christian P. Robert, U Paris-Dauphine & U Warwick
joint works with C. Grazian, K. Kamary, & J.E. Lee
JSM 2018, Vancouver
contents
introduction
weakly informative priors
less informative prior
contents
Grazian, C. & Robert, C.P. (2018) Jeffreys priors for mixture estimation: Properties and alternatives, Computational Statistics & Data Analysis, 121, 149-163
Kamary, K., Lee, J.E. & Robert, C.P. (2018) Weakly informative reparameterizations for location-scale mixtures, Journal of Computational & Graphical Statistics
coming soon
Handbook of Mixture Analysis, edited by Sylvia Frühwirth-Schnatter, Gilles Celeux & Christian P. Robert, Chapman & Hall/CRC Handbooks of Modern Statistical Methods
introduction
Structure of mixtures of distributions:
x ∼ fj with probability pj,
for j = 1, 2, . . . , k, with overall density
p1 f1(x) + · · · + pk fk(x).
Usual case: parameterised components
∑_{i=1}^k pi f(x|θi),  ∑_{i=1}^k pi = 1,
where the weights pi are distinguished from the other parameters
[Titterington et al., 1984; Frühwirth-Schnatter, 2006]
License
Dataset derived from [my] license plate image
Grey levels concentrated on 256 values [later jittered]
[Marin & X, 2007]
Bayesian modelling
For any prior π(θ, p), posterior distribution of (θ, p) available up
to a multiplicative constant
π(θ, p|x) ∝ [ ∏_{i=1}^n ∑_{j=1}^k pj f(xi|θj) ] π(θ, p)
at a [low] cost of order O(kn)
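The O(kn) cost is just the double loop over observations and components; below, a minimal sketch of the posterior kernel evaluation (not from the slides; `log_prior` is a hypothetical placeholder for whatever prior is chosen):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def log_posterior_kernel(x, p, mu, sigma, log_prior):
    """Unnormalised log posterior of a k-component Gaussian mixture:
    sum_i log(sum_j p_j N(x_i | mu_j, sigma_j)) + log pi(theta, p).
    The (n, k) broadcast is the O(kn) computation."""
    # (n, k) array of log[p_j N(x_i | mu_j, sigma_j)]
    log_terms = np.log(p)[None, :] + norm.logpdf(x[:, None], mu[None, :], sigma[None, :])
    return logsumexp(log_terms, axis=1).sum() + log_prior(p, mu, sigma)
```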
almost conjugate priors
For mixtures of exponential families, conjugate priors for completed
likelihood
π(θ, z|x) ∝ ∏_{i=1}^n f(xi|θ_{zi}) p_{zi} × ∏_{j=1}^k π(θj|α) × π(p|β)
          ∝ ∏_{j=1}^k [∏_{i; zi=j} f(xi|θj)] π(θj|α) × ∏_{j=1}^k pj^{nj} × π(p|β)
[Diebolt & X, 1990]
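This conjugacy is what drives data-augmentation Gibbs sampling; here is a hedged sketch in that spirit for the simplest case, a Gaussian mixture with known common variance sigma2 and assumed hyperparameters tau2 and alpha (not the slides' own algorithm):

```python
import numpy as np

def gibbs_mixture(x, k, n_iter=1000, alpha=1.0, tau2=10.0, sigma2=1.0, rng=None):
    """Data-augmentation Gibbs for a k-component Gaussian mixture with known
    variance sigma2, priors mu_j ~ N(0, tau2) and p ~ D(alpha, ..., alpha)."""
    rng = np.random.default_rng(rng)
    mu, p = rng.normal(0, np.sqrt(tau2), k), np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # 1. allocations z_i | ... proportional to p_j N(x_i | mu_j, sigma2)
        logw = np.log(p) - 0.5 * (x[:, None] - mu[None, :])**2 / sigma2
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=wi) for wi in w])
        # 2. conditionally conjugate updates given the allocations
        nj = np.bincount(z, minlength=k)
        sj = np.bincount(z, weights=x, minlength=k)
        var = 1.0 / (nj / sigma2 + 1.0 / tau2)
        mu = rng.normal(var * sj / sigma2, np.sqrt(var))
        p = rng.dirichlet(alpha + nj)
    return mu, p, z
```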
Outline
introduction
weakly informative priors
less informative prior
weakly informative priors
possible symmetric empirical Bayes priors
p ∼ D(γ, . . . , γ), θi ∼ N(µ̂, ω̂σi²), σi⁻² ∼ Ga(ν, ν̂)
which can be replaced with hierarchical priors
[Diebolt & X, 1990; Richardson & Green, 1997]
independent improper priors on θj ’s prohibited, thus standard
“flat” and Jeffreys priors impossible to use (except with the
exclude-empty-component trick)
[Diebolt & X, 1990; Wasserman, 1999]
weakly informative priors
reparameterization to compact set for use of uniform priors
µi → e^{µi}/(1 + e^{µi}),  σi → σi/(1 + σi)
[Chopin, 2000]
dependent weakly informative priors (see the sampling sketch below)
p ∼ D(k, . . . , 1), θi ∼ N(θ_{i−1}, ζσ²_{i−1}), σi ∼ U([0, σ_{i−1}])
[Mengersen & X, 1996; X & Titterington, 1998]
reference priors
p ∼ D(1, . . . , 1), θi ∼ N(µ0, (σi² + τ0²)/2), σi² ∼ C⁺(0, τ0²)
[Moreno & Liseo, 1999]
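For illustration, a sketch of simulating from the dependent weakly informative prior above; the anchoring values θ0 = 0, σ0 = 1 are assumptions of this sketch, not part of the slides:

```python
import numpy as np

def sample_dependent_prior(k, zeta=1.0, rng=None):
    """Sketch of the Mengersen & Robert (1996)-style dependent prior:
    theta_i ~ N(theta_{i-1}, zeta * sigma_{i-1}^2), sigma_i ~ U(0, sigma_{i-1}),
    anchored at assumed values theta_0 = 0, sigma_0 = 1."""
    rng = np.random.default_rng(rng)
    theta, sigma = np.empty(k), np.empty(k)
    theta_prev, sigma_prev = 0.0, 1.0   # assumed anchoring values
    for i in range(k):
        theta[i] = rng.normal(theta_prev, np.sqrt(zeta) * sigma_prev)
        sigma[i] = rng.uniform(0, sigma_prev)
        theta_prev, sigma_prev = theta[i], sigma[i]
    p = rng.dirichlet(np.arange(k, 0, -1.0))  # p ~ D(k, k-1, ..., 1)
    return p, theta, sigma
```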
re-ban on improper priors
Delicate use of improper priors in the setting of mixtures because
independent improper priors,
π(θ) = ∏_{i=1}^k πi(θi), with ∫ πi(θi) dθi = ∞,
end up, for any n, with the [bad] property
∫ π(θ, p|x) dθ dp = ∞
Reason
There are (k − 1)^n terms among the k^n terms in the expansion
that allocate no observation at all to the i-th component
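The step left implicit on the slide is the expansion of the likelihood over the k^n possible allocations z:

```latex
\prod_{i=1}^{n}\sum_{j=1}^{k} p_j\, f(x_i\mid\theta_j)
  \;=\; \sum_{z\in\{1,\dots,k\}^n}\ \prod_{i=1}^{n} p_{z_i}\, f(x_i\mid\theta_{z_i})
```

Any allocation z assigning no observation to component i leaves the factor πi(θi) untouched by the likelihood, so that term alone integrates to ∫ πi(θi) dθi = ∞.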
saturated priors
When the number of components k is not determined a priori, it is inferred
by a hierarchical prior with a distribution on k, or by saturating the
model with an upper bound k∗ and an informative prior on
(p1, . . . , pk∗, θ1, . . . , θk∗) in
f(y|η, ψ) = ∑_{g=1}^{k∗} ηg f(y|θg, ψ)
with k∗ − k empty components, i.e.
pg = p∗g, θg = θ∗g, g ≤ k, p_{k+1} = . . . = p_{k∗} = 0,
or by merging the extra components into existing ones, for instance:
pg = p∗g, θg = θ∗g, g ≤ k, θ_{k+1} = . . . = θ_{k∗} = θ∗k
[X, 1994; Frühwirth-Schnatter, 2006; Rousseau & Mengersen, 2011]
saturated priors
Validation of prior that favours one of two configurations: either
emptying or merging the extra components
Dirichlet prior
(p1, . . . , pk∗) ∼ D(e1, . . . , ek∗)
with max_g eg < dim(θg)/2
Alternative: repulsive prior on (θ1, . . . , θG )
[Petralia et al., 2016]
Outline
introduction
weakly informative priors
less informative prior
Jeffreys priors for mixtures
True Jeffreys prior for mixtures of distributions defined as
π(θ) ∝ |E_θ[∇ log f(X|θ) {∇ log f(X|θ)}^T]|^{1/2},
involving an O(k)-dimensional information matrix
unavailable in closed form except in special cases
[Rubio & Steel, 2014]
unidimensional integrals need to be approximated by Monte Carlo or numerical tools
[Grazian & X, 2018]
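As an illustration of the Monte Carlo route, a sketch for the simplest case where the weight p is the sole unknown in p N(µ1, 1) + (1 − p) N(µ2, 1); the component values are arbitrary placeholders, not from the slides:

```python
import numpy as np
from scipy.stats import norm

def jeffreys_prior_weight(p, mu1=0.0, mu2=2.0, n_mc=100_000, rng=None):
    """Monte Carlo approximation of the (unnormalised) Jeffreys prior for the
    weight p of the mixture p N(mu1,1) + (1-p) N(mu2,1), with Fisher
    information I(p) = E[ ((f1(X) - f2(X)) / f(X))^2 ]."""
    rng = np.random.default_rng(rng)
    # sample X from the mixture at the current parameter value
    z = rng.random(n_mc) < p
    x = np.where(z, rng.normal(mu1, 1, n_mc), rng.normal(mu2, 1, n_mc))
    f1, f2 = norm.pdf(x, mu1, 1), norm.pdf(x, mu2, 1)
    score = (f1 - f2) / (p * f1 + (1 - p) * f2)  # d/dp log f(x|p)
    return np.sqrt(np.mean(score**2))            # |I(p)|^{1/2}
```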
Difficulties
complexity grows in O(k²)
significant computing requirement (reduced by delayed acceptance)
[Banterle et al., 2014]
differs from component-wise (completed-likelihood) Jeffreys
[Diebolt & X, 1990; Stoneking, 2014]
when is the posterior proper?
how to check propriety via MCMC outputs?
[Grazian & X, 2018]
proper exceptions
Jeffreys priors most often lead to improper posteriors for
location-scale mixture models
Exceptions
weights sole unknowns (Bernardo & Girón, 1988)
location parameters sole unknowns for two mixture components (Grazian & X, 2018)
two-piece location-scale models under some conditions (Rubio & Steel, 2014)
improprieties
When the parameters of a location-scale mixture are all unknown, the
Jeffreys prior is improper, constant in µ and behaving like τ^{−d/2} in the
scale τ, where d is the total number of unknown parameters of the
components (i.e. excluding the weights)
When k > 2, the posterior distribution derived from the Jeffreys prior is
improper even when only the location parameters are unknown.
[Grazian & X, 2018]
hierarchical diffusion of propriety
Gaussian mixture model
g(x|θ) = ∑_{ℓ=1}^k pℓ N(x|µℓ, σℓ)
second level of the hierarchical model
µℓ ∼ iid N(µ0, ζ0)
σℓ ∼ iid (1/2) U(0, ζ0) + (1/2) 1/U(0, ζ0)
p|µ, σ ∼ π^J(p|µ, σ)
third level of the hierarchical model
π(µ0, ζ0) ∝ 1/ζ0
posterior remains proper
[Grazian & X, 2018]
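A sketch of simulating the second level of this hierarchy; ζ0 is treated as a variance in the Gaussian draw, and (µ0, ζ0) are fixed to arbitrary values since the third level is improper — both are assumptions of this sketch:

```python
import numpy as np

def sample_second_level(k, mu0=0.0, zeta0=1.0, rng=None):
    """Draw (mu, sigma) from the second level of the hierarchy:
    mu_l ~ N(mu0, zeta0) and sigma_l ~ (1/2) U(0, zeta0) + (1/2) 1/U(0, zeta0)."""
    rng = np.random.default_rng(rng)
    mu = rng.normal(mu0, np.sqrt(zeta0), k)   # assuming zeta0 is a variance
    u = rng.uniform(0, zeta0, k)
    flip = rng.random(k) < 0.5
    sigma = np.where(flip, u, 1.0 / u)        # half uniform, half inverse-uniform
    return mu, sigma
```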
reparametrization
Express each component as local perturbation of previous one
[Mengersen and X, 1996]
Example: mixture of two Normals
p N(µ1, τ1²) + (1 − p) N(µ1 + τ1µ2, τ1²τ2²)
Weakly informative but proper prior for this reparametrization
[X & Titterington, 1998]
reparametrization via moments
Instead of location-scale parameterisation, use of moments like
µ = E[X], σ² = Var[X] for Gaussian mixtures
µ = E[X] for Poisson mixtures
reparametrization of Gaussian mixtures
Gaussian mixture reparametrized via global mean µ and global
standard deviation σ
fk = ∑_{i=1}^k pi N(µi, σi) = ∑_{i=1}^k pi N(µ + σγi/√pi, σηi/√pi)
component-wise location and scale parameters
(η1, . . . , ηk, γ1, . . . , γk) constrained by
∑_{i=1}^k √pi γi = 0 and ∑_{i=1}^k (ηi² + γi²) = 1
Compact domain on the η's and γ's:
0 ≤ ηi ≤ 1, 0 ≤ γi² ≤ 1, ∀i
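As a sanity check on this parameterisation, a sketch mapping the global parameters back to component means and standard deviations (assuming N(·,·) parameterised by mean and standard deviation, and the two constraints already satisfied):

```python
import numpy as np

def to_components(mu, sigma, p, gamma, eta):
    """Recover component parameters from the global parameterisation:
    mu_i = mu + sigma * gamma_i / sqrt(p_i), sigma_i = sigma * eta_i / sqrt(p_i)."""
    p, gamma, eta = map(np.asarray, (p, gamma, eta))
    assert np.isclose(np.sum(np.sqrt(p) * gamma), 0.0)   # sum sqrt(p_i) gamma_i = 0
    assert np.isclose(np.sum(eta**2 + gamma**2), 1.0)    # sum (eta_i^2 + gamma_i^2) = 1
    mus = mu + sigma * gamma / np.sqrt(p)
    sigmas = sigma * eta / np.sqrt(p)
    return mus, sigmas
```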
valid reparametrization
Theorem
The posterior distribution associated with the Jeffreys prior π(µ, σ) = 1/σ
is proper when the components are Gaussian densities, provided (a) proper
distributions are used on the other parameters and (b) the sample contains
at least two observations
Jeffreys-like prior for the global parameters
Uniform or non-informative proper priors on the γ's and η's.
[Kamary, Lee, & X, 2018]
spherical reparameterisation
Introducing ∑_{i=1}^k γi² = ϕ², the constraints rewrite as
∑_{i=1}^k √pi γi = 0 and ∑_{i=1}^k (ηi² + γi²) = ∑_{i=1}^k ηi² + ϕ² = 1
(η1, . . . , ηk): points on the hypersphere of radius √(1 − ϕ²), with
ηi = √(1 − ϕ²) cos(ξ1) for i = 1,
ηi = √(1 − ϕ²) [∏_{j=1}^{i−1} sin(ξj)] cos(ξi) for 1 < i < k,
ηk = √(1 − ϕ²) ∏_{j=1}^{k−1} sin(ξj),
where (ξ1, · · · , ξk−1) ∈ [0, π/2]^{k−1}
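These spherical coordinates are straightforward to implement; a sketch returning (η1, . . . , ηk) from the angles ξ:

```python
import numpy as np

def eta_from_angles(xi, phi):
    """Spherical coordinates for (eta_1, ..., eta_k) on the hypersphere of
    radius sqrt(1 - phi^2), with angles xi in [0, pi/2]^(k-1)."""
    xi = np.asarray(xi)
    k = xi.size + 1
    r = np.sqrt(1.0 - phi**2)
    eta = np.empty(k)
    sin_prod = 1.0
    for i in range(k - 1):
        eta[i] = r * sin_prod * np.cos(xi[i])  # r * prod_{j<i} sin(xi_j) * cos(xi_i)
        sin_prod *= np.sin(xi[i])
    eta[k - 1] = r * sin_prod                  # last coordinate: r * prod sin(xi_j)
    return eta                                  # satisfies sum(eta**2) == 1 - phi**2
```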
(γ1, . . . , γk): points on the intersection of the hypersphere of radius ϕ
with the hyperplane orthogonal to (√p1, . . . , √pk):
γ = ϕ cos(ϖ1) e1 + ϕ sin(ϖ1) cos(ϖ2) e2 + . . . + ϕ sin(ϖ1) · · · sin(ϖk−2) ek−1
where (ϖ1, . . . , ϖk−3) ∈ [0, π]^{k−3}, ϖk−2 ∈ [0, 2π], and
e1, . . . , ek−1 an orthonormal basis of the hyperplane
CT scan image
sheep: 34% fat, 59% muscle, 7% bone
f̂ = 0.16 N(68.7, 17.4²) + 0.17 N(89.9, 9.4²) + 0.25 N(121.1, 13.7²) +
0.34 N(134.6, 4.6²) + 0.03 N(196.3, 21.89²) + 0.04 N(243.9, 2²)
Reparametrization of Poisson mixture
With the global parameter λ = E[X],
f(x|λ1, . . . , λk) = (1/x!) ∑_{i=1}^k pi λi^x e^{−λi}
                   = (1/x!) ∑_{i=1}^k pi (λγi/pi)^x e^{−λγi/pi}
Component-wise parameters (the γi's and pi's) are constrained by
∑_{i=1}^k γi = 1 and ∑_{i=1}^k pi = 1.
Theorem
The posterior distribution associated with the Jeffreys prior π(λ) = 1/λ is
proper when the components are Poisson, provided (a) proper distributions
are used on the other parameters and (b) the sample contains at least one
strictly positive observation.
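A sketch of the reparameterised Poisson mixture pmf; by construction E[X] = ∑ pi (λγi/pi) = λ whenever ∑γi = ∑pi = 1:

```python
import numpy as np
from scipy.special import gammaln

def poisson_mixture_logpmf(x, lam, p, gamma):
    """log pmf of the Poisson mixture in the global-mean parameterisation:
    component rates lambda_i = lam * gamma_i / p_i, so E[X] = lam when
    sum(gamma) = sum(p) = 1."""
    p, gamma = np.asarray(p), np.asarray(gamma)
    rates = lam * gamma / p
    # log of sum_i p_i * rates_i^x * exp(-rates_i) / x!
    log_terms = np.log(p) + x * np.log(rates) - rates - gammaln(x + 1)
    return np.logaddexp.reduce(log_terms)
```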
extensions
mixtures of compound Gaussian distributions, i.e., scale mixtures
X = µ + σξZ, Z ∼ N(0, 1), ξ ∼ h(ξ),
which include the t, double exponential, logistic, and α-stable
distributions (see the sampling sketch below)
mixtures of compound Poisson distributions, when the Poisson parameter is
random with mean λ:
P(X = x|λ) = (1/x!) ∫ (λξ)^x exp{−λξ} dν(ξ),
covers all infinitely exchangeable distributions
mixtures of compound exponential distributions, when the parameter is
random with mean λ:
f(x|λ) = ∫ (1/λξ) exp{−x/λξ} dν(ξ), x > 0,
covers all Gamma distributions with shape less than one and Pareto
distributions
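To make the compound Gaussian family concrete, a sketch sampling one instance, the Student t as a scale mixture of Gaussians; the degrees-of-freedom value is an arbitrary choice for the example:

```python
import numpy as np

def sample_compound_gaussian(n, mu, sigma, df=3.0, rng=None):
    """Sample X = mu + sigma * xi * Z with Z ~ N(0,1) and
    xi = 1/sqrt(W), W ~ Gamma(df/2, rate=df/2), which yields a
    location-scale Student t_df -- one member of the family above."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(n)
    xi = 1.0 / np.sqrt(rng.gamma(df / 2.0, 2.0 / df, n))  # scale = 1/rate
    return mu + sigma * xi * z
```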