How many mixture components?
Christian P. Robert
U. Paris Dauphine & Warwick U.
Joint work with A. Hairault and J. Rousseau
Festschrift for Sylvia, Cambridge, May 12, 2023
An early meet[ing]
ESF HSSS, CIRM, 1995
Early meet[ing]s
Outline
1 Mixtures of distributions
2 Approximations to evidence
3 Distributed evidence evaluation
4 Dirichlet process mixtures
The reference
Mixtures of distributions
Convex combination of densities
x ∼ fj with probability pj,
for j = 1, 2, . . . , k, with overall density
p1 f1(x) + · · · + pk fk(x).
Usual case: parameterised components
∑_{i=1}^k pi f(x|ϑi)   with   ∑_{i=1}^k pi = 1
where the weights pi are distinguished from the other parameters
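As a point of reference, a minimal Python sketch of this two-step representation (draw a component label, then draw from that component) for an assumed three-component Gaussian mixture; the weights and component parameters below are illustrative only, not taken from the slides.

```python
import numpy as np
from scipy.stats import norm

p = np.array([0.3, 0.5, 0.2])            # mixture weights, summing to one (illustrative)
mu = np.array([-2.0, 0.0, 3.0])          # component means (illustrative)
sd = np.array([1.0, 0.5, 1.5])           # component standard deviations (illustrative)

rng = np.random.default_rng(1)
z = rng.choice(len(p), size=500, p=p)    # x ~ f_j with probability p_j
x = rng.normal(mu[z], sd[z])             # draw from the selected component

def mixture_density(t):
    """Overall density p_1 f(t|ϑ_1) + ... + p_k f(t|ϑ_k)."""
    t = np.atleast_1d(t)
    return np.sum(p * norm.pdf(t[:, None], mu, sd), axis=1)
```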
Jeffreys priors for mixtures
True Jeffreys prior for mixtures of distributions defined as
Eϑ[∇ᵀ∇ log f(X|ϑ)]^{1/2}
▶ O(k) matrix
▶ unavailable in closed form except in special cases
▶ unidimensional integrals approximated by Monte Carlo tools
[Grazian & X, 2015]
Difficulties
▶ complexity grows in O(k²)
▶ significant computing requirement (reduced by delayed acceptance)
[Banterle et al., 2014]
▶ differs from component-wise Jeffreys
[Diebolt & X, 1990; Stoneking, 2014]
▶ when is the posterior proper?
▶ how to check properness via MCMC outputs?
Further reference priors
Reparameterisation of a location-scale mixture in terms of its global mean µ and global variance σ² as
µi = µ + σαi and σi = στi,  1 ≤ i ≤ k,
where τi > 0 and αi ∈ R
Induces a compact space on the other parameters:
∑_{i=1}^k pi αi = 0   and   ∑_{i=1}^k pi τi² + ∑_{i=1}^k pi αi² = 1
© Posterior associated with the prior π(µ, σ) = 1/σ is proper with Gaussian components if there are at least two observations in the sample
[Kamary, Lee & X, 2018]
Label switching paradox
▶ Under exchangeability, one should observe exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler
▶ If observed, how should we estimate parameters?
▶ If unobserved, uncertainty about convergence remains
[Celeux, Hurn & X, 2000; Frühwirth-Schnatter, 2001, 2004; Jasra et al., 2005]
[Unless adopting a point process perspective]
[Green, 2019]
The [s]Witcher
Loss functions for mixture estimation
Global loss function that considers distance between predictives
L(ξ, ξ̂) = ∫_X fξ(x) log{fξ(x)/fξ̂(x)} dx
eliminates the labelling effect
Similar solution for estimating clusters through allocation variables
L(z, ẑ) = ∑_{i<j} [ I[zi=zj](1 − I[ẑi=ẑj]) + I[ẑi=ẑj](1 − I[zi=zj]) ]
[Celeux, Hurn & X, 2000]
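A direct O(n²) sketch of this allocation loss: it counts the pairs of observations clustered together under one allocation but not under the other, and is therefore invariant to relabelling of either z or ẑ.

```python
import numpy as np

def allocation_loss(z, z_hat):
    z, z_hat = np.asarray(z), np.asarray(z_hat)
    same = z[:, None] == z[None, :]               # I[z_i = z_j]
    same_hat = z_hat[:, None] == z_hat[None, :]   # I[ẑ_i = ẑ_j]
    disagree = same ^ same_hat                    # pairs on which the two allocations differ
    return int(np.triu(disagree, k=1).sum())      # sum over pairs i < j

print(allocation_loss([0, 0, 1, 1], [0, 1, 1, 0]))   # 4 discordant pairs
```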
Bayesian model comparison
Bayes factor consistent for selecting the number of components
[Ishwaran et al., 2001; Casella & Moreno, 2009; Chib & Kuffner, 2016]
Bayes factor consistent for testing parametric versus nonparametric alternatives
[Verdinelli & Wasserman, 1997; Dass & Lee, 2004; McVinish et al., 2009]
Consistent evidence for location DPM
Consistency of the Bayes factor comparing finite mixtures against a (location) Dirichlet process mixture
H0 : f0 ∈ MK vs. H1 : f0 ∉ MK
[Figure: log(BF) against sample size n; (left) data from a finite mixture; (right) data not from a finite mixture]
Consistent evidence for location DPM
Under generic assumptions, when x1, . . . , xn iid ∼ fP0 with
P0 = ∑_{j=1}^{k0} p_j^0 δ_{ϑ_j^0}
and Dirichlet process DP(M, G0) prior on P, there exists t > 0 such that for all ε > 0
Pf0( mDP(x) > n^{−(k0−1+dk0+t)/2} ) = o(1)
Moreover there exists q ≥ 0 such that
ΠDP( ‖f0 − fP‖1 ≤ (log n)^q/√n | x ) = 1 + oPf0(1)
[Hairault, X & Rousseau, 2022]
Consistent evidence for location DPM
Assumption A1 [Regularity]
Assumption A2 [Strong identifiability]
Assumption A3 [Compactness]
Assumption A4 [Existence of DP random mean]
Assumption A5 [Truncated support of M, e.g. truncated Gamma]
If fP0 ∈ Mk0 satisfies Assumptions A1–A3, then
mk0(y)/mDP(y) → ∞ under fP0
Moreover, for all k ≥ k0, if the Dirichlet parameter is α = η/k with η < kd/2, then
mk(y)/mDP(y) → ∞ under fP0
Outline
1 Mixtures of distributions
2 Approximations to evidence
3 Distributed evidence evaluation
4 Dirichlet process mixtures
Chib’s or candidate’s representation
Direct application of Bayes' theorem: given x ∼ fk(x|ϑk) and ϑk ∼ πk(ϑk),
Zk = mk(x) = fk(x|ϑk) πk(ϑk) / πk(ϑk|x)
Replace the posterior with an approximation:
Ẑk = m̂k(x) = fk(x|ϑ*k) πk(ϑ*k) / π̂k(ϑ*k|x).
[Besag, 1989; Chib, 1995]
Natural Rao-Blackwellisation
For missing variables z as in mixture models, natural Rao-Blackwell (unbiased) estimate
π̂k(ϑ*k|x) = (1/T) ∑_{t=1}^T πk(ϑ*k|x, z_k^{(t)}),
where the z_k^{(t)}'s are Gibbs-sampled latent variables
[Diebolt & X, 1990; Chib, 1995]
Compensation for label switching
For mixture models, z_k^{(t)} usually fails to visit all configurations, despite the symmetry predicted by theory
Consequences for the numerical approximation, which is biased by a factor of order k!
Compensation for label switching
For mixture models, z_k^{(t)} usually fails to visit all configurations, despite the symmetry predicted by theory
Force the predicted theoretical symmetry by using
π̃k(ϑ*k|x) = (1/(T k!)) ∑_{σ∈Sk} ∑_{t=1}^T πk(σ(ϑ*k)|x, z_k^{(t)})
for all σ's in Sk, the set of all permutations of {1, . . . , k}
[Neal, 1999; Berkhof, van Mechelen & Gelman, 2003; Lee & X, 2018]
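A sketch of this symmetrised Rao-Blackwell estimate on the log scale; `full_conditional_logpdf` and `relabel` are hypothetical helpers (the former returning log πk(ϑ*|x, z), the latter applying a permutation σ to the component blocks of ϑ*), so this is only a template, not the original implementation.

```python
import itertools
import numpy as np

def symmetrised_chib_denominator(theta_star, x, z_samples, k,
                                 full_conditional_logpdf, relabel):
    logs = []
    for sigma in itertools.permutations(range(k)):
        for z in z_samples:                       # Gibbs-sampled allocations z^(t)
            logs.append(full_conditional_logpdf(relabel(theta_star, sigma), x, z))
    logs = np.array(logs)                         # T * k! log-density evaluations
    m = logs.max()
    return m + np.log(np.mean(np.exp(logs - m)))  # log of the permutation-averaged density
```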
Astronomical illustration
Benchmark dataset of radial velocities of 82 galaxies
[Postman et al., 1986; Roeder, 1992; Raftery, 1996]
Conjugate priors
σ²_k ∼ Γ⁻¹(a0, b0),   µk|σ²_k ∼ N(µ0, σ²_k/λ0)
Galaxy dataset (k)
Using Chib's estimate, with ϑ*k the MAP estimator,
log(Ẑk(x)) = −105.1396
for k = 3, while introducing permutations leads to
log(Ẑk(x)) = −103.3479
Note that
−105.1396 + log(3!) = −103.3479

k       2        3        4        5        6        7        8
Zk(x)  -115.68  -103.35  -102.66  -101.93  -102.88  -105.48  -108.44

Estimates of the marginal likelihoods by the symmetrised Chib's approximation (based on 10⁵ Gibbs iterations and, for k > 5, 100 permutations selected at random in Sk).
[Lee et al., 2008]
RJMCMC outcome
Rethinking Chib’s solution
Alternate Rao–Blackwellisation by marginalising into partitions
Apply the candidate's/Chib's formula to a chosen partition C⁰:
mk(x) = fk(x|C⁰) πk(C⁰) / πk(C⁰|x)
with
πk(C(z)) = k!/(k − k+)! × Γ(∑_{j=1}^k αj)/Γ(∑_{j=1}^k αj + n) × ∏_{j=1}^k Γ(nj + αj)/Γ(αj)
C(z): partition of {1, . . . , n} induced by the cluster membership z
nj = ∑_{i=1}^n I{zi=j}: number of observations assigned to cluster j
k+ = ∑_{j=1}^k I{nj>0}: number of non-empty clusters
Rethinking Chib’s solution
Under a conjugate prior G0 on ϑ,
fk(x|C(z)) = ∏_{j=1}^k ∫_Θ ∏_{i:zi=j} f(xi|ϑ) G0(dϑ)   [each factor being m(Cj(z))]
and
π̂k(C⁰|x) = (1/T) ∑_{t=1}^T I{C⁰ ≡ C(z^{(t)})}
▶ considerably lower computational demand
▶ no label switching issue
▶ further Rao-Blackwellisation?
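A sketch of the corresponding denominator estimate: π̂k(C⁰|x) is the empirical frequency of the target partition among the Gibbs-sampled allocations, with partitions compared as sets of blocks so that labels play no role; the reference partition is represented here by an allocation vector z0 (e.g. a MAP allocation), an assumed convention.

```python
import numpy as np

def as_partition(z):
    """Partition C(z) of {0, ..., n-1} induced by the allocation vector z."""
    blocks = {}
    for i, zi in enumerate(z):
        blocks.setdefault(zi, []).append(i)
    return frozenset(frozenset(b) for b in blocks.values())

def chib_partition_denominator(z0, z_samples):
    """Empirical frequency of C(z0) among the Gibbs allocations z^(1), ..., z^(T)."""
    target = as_partition(z0)
    return np.mean([as_partition(z) == target for z in z_samples])
```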
More efficient sampling
Iterative bridge sampling:
ê^{(t)}(k) = ê^{(t−1)}(k) × { M1⁻¹ ∑_{l=1}^{M1} π̂(ϑ̃l|x) / [M1 q(ϑ̃l) + M2 π̂(ϑ̃l|x)] } / { M2⁻¹ ∑_{m=1}^{M2} q(ϑ̂m) / [M1 q(ϑ̂m) + M2 π̂(ϑ̂m|x)] }
[Frühwirth-Schnatter, 2004]
where ϑ̃l ∼ q(ϑ) and ϑ̂m ∼ π(ϑ|x)
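A minimal sketch of this fixed-point recursion, written in the standard Meng & Wong form in which the previous estimate normalises π̂(·|x) inside the bridge; `logpost_unnorm` and `logq` are hypothetical log-density evaluators, and `theta_q`, `theta_p` are draws from q(ϑ) and from the posterior. In practice a common constant should be subtracted from the log densities before exponentiation to avoid overflow.

```python
import numpy as np

def bridge_evidence(logpost_unnorm, logq, theta_q, theta_p, iters=100):
    M1, M2 = len(theta_q), len(theta_p)
    lpq, lqq = logpost_unnorm(theta_q), logq(theta_q)   # evaluations at the q-draws
    lpp, lqp = logpost_unnorm(theta_p), logq(theta_p)   # evaluations at the posterior draws
    logZ = 0.0                                          # current log ê(k)
    for _ in range(iters):
        num = np.mean(np.exp(lpq - logZ) / (M1 * np.exp(lqq) + M2 * np.exp(lpq - logZ)))
        den = np.mean(np.exp(lqp) / (M1 * np.exp(lqp) + M2 * np.exp(lpp - logZ)))
        logZ += np.log(num) - np.log(den)
    return logZ                                         # log marginal likelihood estimate
```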
More efficient sampling
Iterative bridge sampling (same recursion as above) with the Gibbs-based proposal
q(ϑ) = (1/J1) ∑_{j=1}^{J1} p(λ|z^{(j)}) ∏_{i=1}^k p(ξi|ξ^{(j)}_{<i}, ξ^{(j−1)}_{>i}, z^{(j)}, x)
[Frühwirth-Schnatter, 2004]
More efficient sampling
Iterative bridge sampling (same recursion as above) with the symmetrised proposal
q(ϑ) = (1/k!) ∑_{σ∈S(k)} p(λ|σ(z°)) ∏_{i=1}^k p(ξi|σ(ξ°_{<i}), σ(ξ°_{>i}), σ(z°), x)
[Frühwirth-Schnatter, 2004]
Sparsity for permutations
Contribution of each term relative to q(ϑ)
ησi(ϑ) = hσi(ϑ)/{k! q(ϑ)} = hσi(ϑ) / ∑_{σ∈Sk} hσ(ϑ)
and importance of permutation σi evaluated by
Ê_{hσc}[ησi(ϑ)] = (1/M) ∑_{l=1}^M ησi(ϑ^{(l)}),   ϑ^{(l)} ∼ hσc(ϑ)
Approximate set A(k) ⊆ S(k) consists of [σ1, · · · , σn] for the smallest n that satisfies the condition
φ̂n = (1/M) ∑_{l=1}^M {q̃n(ϑ^{(l)}) − q(ϑ^{(l)})} < τ
Dual importance sampling with approximation (DIS2A)
1. Randomly select {z^{(j)}, ϑ^{(j)}}_{j=1}^J from the Gibbs sample and un-switch; construct q(ϑ)
2. Choose hσc(ϑ) and generate particles {ϑ^{(t)}}_{t=1}^T ∼ hσc(ϑ)
3. Construct the approximation q̃(ϑ) using the first M-sample
   3.1 Compute Ê_{hσc}[ησ1(ϑ)], · · · , Ê_{hσc}[ησk!(ϑ)]
   3.2 Reorder the σ's such that Ê_{hσc}[ησ1(ϑ)] ≥ · · · ≥ Ê_{hσc}[ησk!(ϑ)]
   3.3 Initially set n = 1 and compute the q̃n(ϑ^{(t)})'s and φ̂n. If φ̂n < τ, go to Step 4; otherwise increase n to n + 1
4. Replace q(ϑ^{(1)}), . . . , q(ϑ^{(T)}) with q̃(ϑ^{(1)}), . . . , q̃(ϑ^{(T)}) to estimate Ê
[Lee & X, 2014]
Illustrations

Fishery data
k   k!   |A(k)|   ∆(A)
3   6    1.0000   0.1675
4   24   2.7333   0.1148

Galaxy data
k   k!   |A(k)|    ∆(A)
3   6    1.000     0.1675
4   24   15.7000   0.6545
6   720  298.1200  0.4146

Table: Mean estimates of the approximate set sizes |A(k)| and the reduction rate ∆(A) in the number of evaluated h-terms, for (a) the fishery and (b) the galaxy datasets
Sequential Monte Carlo
Tempered sequence of targets (t = 1, . . . , T)
πkt(ϑk) ∝ pkt(ϑk) = πk(ϑk) fk(x|ϑk)^{λt},   λ1 = 0 < · · · < λT = 1
particles (simulations) (i = 1, . . . , Nt)
ϑ_t^i i.i.d. ∼ πkt(ϑk)
usually obtained by an MCMC step
ϑ_t^i ∼ Kt(ϑ_{t−1}^i, ϑ)
with importance weights (i = 1, . . . , Nt)
ω_t^i = fk(x|ϑ_{t−1}^i)^{λt−λt−1}
[Del Moral et al., 2006; Buchholz et al., 2021]
Sequential Monte Carlo
Tempered sequence of targets (t = 1, . . . , T)
πkt(ϑk) ∝ pkt(ϑk) = πk(ϑk) fk(x|ϑk)^{λt},   λ1 = 0 < · · · < λT = 1
Produces an approximation of the evidence
Ẑk = ∏_t (1/Nt) ∑_{i=1}^{Nt} ω_t^i
[Del Moral et al., 2006; Buchholz et al., 2021]
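A toy sketch of this tempered SMC evidence estimator with multinomial resampling and a single random-walk Metropolis move per temperature; `loglik`, `logprior` and `rprior` are hypothetical model-specific callables for a one-dimensional parameter, and the move scale is arbitrary.

```python
import numpy as np

def smc_evidence(loglik, logprior, rprior, lambdas, N=1000, step=0.5,
                 rng=np.random.default_rng(0)):
    theta = rprior(N, rng)                                   # particles from the prior (λ1 = 0)
    logZ = 0.0
    for lam_prev, lam in zip(lambdas[:-1], lambdas[1:]):
        logw = (lam - lam_prev) * loglik(theta)              # incremental weights ω_t^i
        m = logw.max()
        logZ += m + np.log(np.mean(np.exp(logw - m)))        # log of (1/N) Σ_i ω_t^i
        w = np.exp(logw - m); w /= w.sum()
        theta = theta[rng.choice(N, size=N, p=w)]            # multinomial resampling
        prop = theta + step * rng.standard_normal(N)         # one RWM move targeting π_t
        logacc = (logprior(prop) + lam * loglik(prop)
                  - logprior(theta) - lam * loglik(theta))
        accept = np.log(rng.uniform(size=N)) < logacc
        theta = np.where(accept, prop, theta)
    return logZ                                              # log Ẑ_k
```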
Sequential² imputation
For conjugate priors, (marginal) particle filter representation of a proposal:
π*(z|x) = π(z1|x1) ∏_{i=2}^n π(zi|x1:i, z1:i−1)
with importance weight
π(z|x)/π*(z|x) = π(x, z)/m(x) × m(x1)/π(z1, x1) × m(z1, x1, x2)/π(z1, x1, z2, x2) × · · · × π(z1:n−1, x)/π(z, x) = w(z, x)/m(x)
leading to an unbiased estimator of the evidence
Ẑk(x) = (1/T) ∑_{t=1}^T w(z^{(t)}, x)
[Kong, Liu & Wong, 1994; Carvalho et al., 2010]
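A skeleton of the corresponding sequential imputation estimator; `pred(x, z, i)` is a hypothetical helper returning the vector of joint predictive terms p(x_i, z_i = j | x_{1:i−1}, z_{1:i−1}) for j = 1, . . . , k, which is available in closed form under conjugate priors.

```python
import numpy as np

def sis_evidence(x, k, pred, T=1000, rng=np.random.default_rng(0)):
    logw = np.empty(T)
    for t in range(T):
        z, lw = [], 0.0
        for i in range(len(x)):
            joint = pred(x, z, i)                    # p(x_i, z_i = j | past), j = 1, ..., k
            lw += np.log(joint.sum())                # running contribution to w(z, x)
            z.append(rng.choice(k, p=joint / joint.sum()))   # z_i ~ π(z_i | x_{1:i}, z_{1:i-1})
        logw[t] = lw
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))     # log Ẑ_k(x) = log (1/T) Σ_t w(z^(t), x)
```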
Galactic illustration
[Figure: boxplots of marginal likelihood estimates m(y) for the galaxy data, K = 3, 5, 6 and 8 components, comparing Chib, Chib with (random) permutations, bridge sampling, adaptive SMC, SIS, Chib with partitions, and the arithmetic mean]
Common illustration
[Figure: log(MSE) against log(time) for ChibPartitions, bridge sampling, ChibPerm, SIS and SMC]
Empirical conclusions
▶ Bridge sampling, the arithmetic mean and the original Chib's method fail to scale with the sample size n
▶ Partition Chib's is increasingly variable with the number of components k
▶ Adaptive SMC ultimately fails
▶ SIS remains the most reliable method
1 Mixtures of distributions
2 Approximations to evidence
3 Distributed evidence evaluation
4 Dirichlet process mixtures
Distributed computation
[Buchholz et al., 2022]
Divide & Conquer
1. data y divided into S batches y1, . . . , yS with
π(ϑ|y) ∝ p(y|ϑ)π(ϑ) = ∏_{s=1}^S p(ys|ϑ) π(ϑ)^{1/S} = ∏_{s=1}^S p(ys|ϑ) π̃(ϑ) ∝ ∏_{s=1}^S π̃(ϑ|ys)
2. infer with the sub-posterior distributions π̃(ϑ|ys), in parallel, by MCMC
3. recombine all sub-posterior samples
[Buchholz et al., 2022]
Connecting bits
While
m(y) = ∫ ∏_{s=1}^S p(ys|ϑ) π̃(ϑ) dϑ ≠ ∏_{s=1}^S ∫ p(ys|ϑ) π̃(ϑ) dϑ = ∏_{s=1}^S m̃(ys)
they can be connected as
m(y) = Z^S × ∏_{s=1}^S m̃(ys) × ∫ ∏_{s=1}^S π̃(ϑ|ys) dϑ
[Buchholz et al., 2022]
Connecting bits
m(y) = Z^S × ∏_{s=1}^S m̃(ys) × ∫ ∏_{s=1}^S π̃(ϑ|ys) dϑ
where
π̃(ϑ|ys) ∝ p(ys|ϑ) π̃(ϑ),   m̃(ys) = ∫ p(ys|ϑ) π̃(ϑ) dϑ,   Z = ∫ π(ϑ)^{1/S} dϑ
[Buchholz et al., 2022]
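A small numerical check of this recombination identity for a toy model not taken from the slides: xi ∼ N(θ, 1) with prior θ ∼ N(0, 1), S = 4 batches, and all one-dimensional integrals evaluated by quadrature on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)
S, n = 4, 80
y = rng.normal(1.0, 1.0, size=n)
shards = np.array_split(y, S)

theta = np.linspace(-10.0, 10.0, 20001)
quad = lambda f: np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(theta))   # trapezoid rule

log_prior = -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)              # π(θ) = N(0, 1)
prior_pow = np.exp(log_prior / S)                                  # π(θ)^{1/S}, unnormalised
Z = quad(prior_pow)                                                # Z = ∫ π(θ)^{1/S} dθ
prior_tilde = prior_pow / Z                                        # π~(θ)

def lik(ys):                                                       # p(ys | θ) on the grid
    return np.exp(sum(-0.5 * (yi - theta)**2 - 0.5 * np.log(2 * np.pi) for yi in ys))

m_y = quad(lik(y) * np.exp(log_prior))                             # full evidence m(y)

m_tilde, sub_post = [], []
for ys in shards:
    integrand = lik(ys) * prior_tilde
    ms = quad(integrand)                                           # m~(ys)
    m_tilde.append(ms)
    sub_post.append(integrand / ms)                                # π~(θ | ys)

I = quad(np.prod(sub_post, axis=0))                                # ∫ ∏_s π~(θ|ys) dθ
m_recombined = Z**S * np.prod(m_tilde) * I

print(np.log(m_y), np.log(m_recombined))    # the two values agree up to quadrature error
```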
Label unswitching worries
While Z is usually available in closed form,
I = ∫ ∏_{s=1}^S π̃(ϑ|ys) dϑ
is not and needs to be evaluated as
Î = (1/T) ∑_{t=1}^T ∫ ∏_{s=1}^S π̃(ϑ|z_s^{(t)}, ys) dϑ
when
π̃(ϑ|ys) = ∫ π̃(ϑ|zs, ys) π̃(zs|ys) dzs
Label unswitching worries
π̃(ϑ|ys) = ∫ π̃(ϑ|zs, ys) π̃(zs|ys) dzs
Issue: with distributed computing, the shards' allocations zs are unrelated and the corresponding clusters are disconnected.
Label unswitching worries
π̃(ϑ|ys) = ∫ π̃(ϑ|zs, ys) π̃(zs|ys) dzs
Issue: with distributed computing, the shards' allocations zs are unrelated and the corresponding clusters are disconnected.
[Figure: boxplots of the resulting log m(y) estimates for K = 1, . . . , 5]
Label switching imposition
Returning to averaging across permutations, use the identity
Îperm = (1/(T K!^{S−1})) ∑_{t=1}^T ∑_{σ2,...,σS∈SK} ∫ π̃(ϑ|z_1^{(t)}, y1) ∏_{s=2}^S π̃(ϑ|σs(z_s^{(t)}), ys) dϑ
Label switching imposition
Returning to averaging across permutations, use the identity
Îperm = (1/(T K!^{S−1})) ∑_{t=1}^T ∑_{σ2,...,σS∈SK} ∫ π̃(ϑ|z_1^{(t)}, y1) ∏_{s=2}^S π̃(ϑ|σs(z_s^{(t)}), ys) dϑ
[Figure: boxplots of log m(y) for Î and Îperm, K = 1, 2, 3]
Label switching imposition
Returning to averaging across permutations, use the identity
Îperm = (1/(T K!^{S−1})) ∑_{t=1}^T ∑_{σ2,...,σS∈SK} ∫ π̃(ϑ|z_1^{(t)}, y1) ∏_{s=2}^S π̃(ϑ|σs(z_s^{(t)}), ys) dϑ
Obtained at a heavy computational cost: O(T) for Î versus O(T K!^{S−1}) for Îperm
Importance sampling version
Obtained at a heavy computational cost: O(T) for Î versus O(T K!^{S−1}) for Îperm
Avoid the enumeration of permutations by using simulated parameter values from the reference sub-posterior as anchors towards a coherent labelling of the clusters
[Celeux, 1998; Stephens, 2000]
Importance sampling version
For each batch s = 2, . . . , S, define the K × K matching matrix
Ps = [ ps11 · · · ps1K ; . . . ; psK1 · · · psKK ]
where
pslk = ∏_{i:zsi=l} p(ysi|ϑk)
used in creating the proposals
qs(σ) ∝ ∏_{k=1}^K p_{skσ(k)}
that reflect the probability that each cluster k of batch s is well-matched with cluster σ(k) of batch 1
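A sketch of these proposals over relabellings; `logP` is the K × K matrix with entries log pslk = Σ_{i: zsi = l} log p(ysi|ϑk), assembled elsewhere, and the full enumeration of the K! permutations is only sensible for small K.

```python
import itertools
import numpy as np

def permutation_proposal(logP, rng=np.random.default_rng(0)):
    """All permutations σ of {0, ..., K-1} and the weights q_s(σ) ∝ ∏_k p_{s,k,σ(k)}."""
    K = logP.shape[0]
    perms = list(itertools.permutations(range(K)))
    logw = np.array([sum(logP[k, sigma[k]] for k in range(K)) for sigma in perms])
    w = np.exp(logw - logw.max())                 # stabilised unnormalised weights
    q = w / w.sum()
    draw = perms[rng.choice(len(perms), p=q)]     # one relabelling σ_s drawn from q_s
    return perms, q, draw
```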
Importance sampling version
Considerably reduced computational cost compared to m̂_{Îperm}(y):
▶ at each iteration t, total cost of O(Kn/S) for evaluating Ps
▶ computing the K! weights of the discrete importance distribution qs requires K! operations
▶ sampling from the global discrete importance distribution requires M(t) basic operations
Global cost of O(T(Kn/S + K! + M̄)) for M̄ the maximum number of importance simulations
Importance sampling version
Resulting estimator
ÎIS = (1/(T K!^{S−1})) ∑_{t=1}^T (1/M(t)) ∑_{m=1}^{M(t)} χ(z^{(t)}; σ_2^{(t,m)}, . . . , σ_S^{(t,m)}) / πσ(σ_2^{(t,m)}, . . . , σ_S^{(t,m)})
where
χ(z^{(t)}; σ2, . . . , σS) := ∫ π̃(ϑ|z_1^{(t)}, y1) ∏_{s=2}^S π̃(ϑ|σs(z_s^{(t)}), ys) dϑ
Importance sampling version
Resulting estimator
ÎIS = (1/(T K!^{S−1})) ∑_{t=1}^T (1/M(t)) ∑_{m=1}^{M(t)} χ(z^{(t)}; σ_2^{(t,m)}, . . . , σ_S^{(t,m)}) / πσ(σ_2^{(t,m)}, . . . , σ_S^{(t,m)})
[Figure: boxplots of log m(y) for Î, Îperm and ÎIS, K = 1, . . . , 4]
Sequential importance sampling
Define
π̃s(ϑ) = ∏_{l=1}^s π̃(ϑ|yl) / Zs
where
Zs = ∫ ∏_{l=1}^s π̃(ϑ|yl) dϑ
then
m(y) = Z^S × m̃(y1) × ∏_{s=2}^S ∫ π̃_{s−1}(ϑ) p(ys|ϑ) π̃(ϑ) dϑ
Sequential importance sampling
This calls for a standard sequential importance sampling strategy, making use of the successive distributions π̃s(ϑ) as importance distributions
Sequential importance sampling
This calls for a standard sequential importance sampling strategy, making use of the successive distributions π̃s(ϑ) as importance distributions
[Figure: boxplots of log m(y) for Î, Îperm, ÎIS and SMC, K = 1, . . . , 5]
1 Mixtures of distributions
2 Approximations to evidence
3 Distributed evidence evaluation
4 Dirichlet process mixtures
Dirichlet process mixture (DPM)
Extension to the k = ∞ (non-parametric) case
xi|zi, ϑ i.i.d. ∼ f(xi|ϑzi), i = 1, . . . , n   (1)
P(Zi = k) = πk, k = 1, 2, . . .
π1, π2, . . . ∼ GEM(M),   M ∼ π(M)
ϑ1, ϑ2, . . . i.i.d. ∼ G0
with GEM (Griffiths-Engen-McCloskey) defined by the stick-breaking representation
πk = vk ∏_{i=1}^{k−1} (1 − vi),   vi ∼ Beta(1, M)
[Sethuraman, 1994]
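A minimal sketch of the stick-breaking construction, truncated at a finite number of sticks for simulation purposes (the truncation level is an assumption of the sketch, not part of the model).

```python
import numpy as np

def stick_breaking(M, K_max=100, rng=np.random.default_rng(0)):
    v = rng.beta(1.0, M, size=K_max)                              # v_k ~ Beta(1, M)
    pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))    # π_k = v_k ∏_{i<k} (1 - v_i)
    return pi                                                     # truncated GEM(M) weights

weights = stick_breaking(M=1.0)    # sums to just under one; the remainder is the truncated mass
```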
Dirichlet process mixture (DPM)
Resulting in an infinite mixture
x ∼ ∏_{i=1}^n ∑_{k=1}^∞ πk f(xi|ϑk)
with (prior) cluster allocation
π(z|M) = Γ(M)/Γ(M + n) × M^{K+} ∏_{j=1}^{K+} Γ(nj)
and conditional likelihood
p(x|z, M) = ∏_{j=1}^{K+} ∫ ∏_{i:zi=j} f(xi|ϑj) dG0(ϑj)
available in closed form when G0 is conjugate
Approximating the evidence
Extension of Chib's formula by marginalising over z and ϑ
mDP(x) = p(x|M*, G0) π(M*) / π(M*|x)
and using the estimate
π̂(M*|x) = (1/T) ∑_{t=1}^T π(M*|x, η^{(t)}, K_+^{(t)})
provided the prior on M is a Γ(a, b) distribution, since
M|x, η, K+ ∼ ω Γ(a + K+, b − log(η)) + (1 − ω) Γ(a + K+ − 1, b − log(η))
with ω = (a + K+ − 1)/{n(b − log(η)) + a + K+ − 1} and η|x, M ∼ Beta(M + 1, n)
[Basu & Chib, 2003]
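A sketch of the corresponding Gibbs update of M (Escobar-West style auxiliary variable η), following the mixture-of-Gammas full conditional displayed above and assuming the Γ(a, b) prior is parameterised by shape a and rate b.

```python
import numpy as np

def update_M(M, K_plus, n, a, b, rng=np.random.default_rng(0)):
    eta = rng.beta(M + 1.0, n)                           # η | x, M ~ Beta(M + 1, n)
    rate = b - np.log(eta)
    w = (a + K_plus - 1.0) / (n * rate + a + K_plus - 1.0)
    shape = a + K_plus if rng.uniform() < w else a + K_plus - 1.0
    return rng.gamma(shape, 1.0 / rate), eta             # M | x, η, K+ (numpy uses scale = 1/rate)
```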
Approximating the likelihood
Intractable likelihood p(x|M*, G0) approximated by sequential imputation importance sampling
Generating z from the proposal
π*(z|x, M) = ∏_{i=1}^n π(zi|x1:i, z1:i−1, M)
and using the approximation
L̂(x|M*, G0) = (1/T) ∑_{t=1}^T p(x1|z_1^{(t)}, G0) ∏_{i=2}^n p(xi|x1:i−1, z_{1:i−1}^{(t)}, G0)
[Kong, Liu & Wong, 1994; Basu & Chib, 2003]
Approximating the evidence (bis)
Reverse logistic regression applies to DPM:
Importance functions
π1(z, M) := π*(z|x, M) π(M)   and   π2(z, M) := π̃2(z, M)/m(y)
with π̃2(z, M) the computable unnormalised posterior of (z, M) given x
{z^{(1,j)}, M^{(1,j)}}_{j=1}^T and {z^{(2,j)}, M^{(2,j)}}_{j=1}^T samples from π1 and π2
Marginal likelihood m(y) estimated as the intercept of a logistic regression with covariate
log{π1(z, M)/π̃2(z, M)}
on the merged sample
[Geyer, 1994; Chen & Shao, 1997]
Galactic illustration
[Figure: boxplots of m(x) estimates by the arithmetic mean, harmonic mean, Chib, RLR-SIS and RLR-Prior estimators, for subsamples of sizes n = 6, 36 and 82]
Galactic illustration
[Figure: log(MSE) against log(time) for Chib, RLR-SIS and RLR-Prior]
Towards new adventures
Exumas, Bahamas, 10 August 2011