Testing as a mixture estimation problem
Christian P. Robert
Université Paris-Dauphine and University of Warwick
Joint work with K. Kamary, K. Mengersen, and J. Rousseau
Outline
Introduction
Testing problems as estimating mixture
models
Computational mixture estimation
Illustrations
Asymptotic consistency
Conclusion
Introduction
Hypothesis testing
central problem of statistical inference
dramatically differentiating feature between classical and
Bayesian paradigms
wide open to controversy and divergent opinions, even within
the Bayesian community
non-informative Bayesian testing case mostly unresolved,
witness the Jeffreys–Lindley paradox
[Berger (2003), Mayo & Cox (2006), Gelman (2008)]
Testing hypotheses
Bayesian model selection as comparison of k potential
statistical models towards the selection of model that fits the
data “best”
mostly accepted perspective: it does not primarily seek to
identify which model is “true”, but rather to indicate which
fits the data better
tools like Bayes factor naturally include a penalisation
addressing model complexity, mimicked by Bayes Information
(BIC) and Deviance Information (DIC) criteria
posterior predictive tools successfully advocated in Gelman et
al. (2013) even though they imply multiple uses of data
Bayesian modelling
Standard Bayesian approach to testing: consider two families of
models, one for each of the hypotheses under comparison,
M1 : x ∼ f1(x|θ1) , θ1 ∈ Θ1 and M2 : x ∼ f2(x|θ2) , θ2 ∈ Θ2 ,
associate with each model a prior distribution,
θ1 ∼ π1(θ1) and θ2 ∼ π2(θ2) ,
and compare the marginal likelihoods
m1(x) = ∫_{Θ1} f1(x|θ1) π1(θ1) dθ1 and m2(x) = ∫_{Θ2} f2(x|θ2) π2(θ2) dθ2 ,
either through the Bayes factor or the posterior probability, respectively:
B12 = m1(x) / m2(x) ,   P(M1|x) = ω1 m1(x) / {ω1 m1(x) + ω2 m2(x)} ;
the latter depends on the prior weights ωi
Bayesian decision
Bayesian decision step
comparing Bayes factor B12 with threshold value of one or
comparing posterior probability P(M1|x) with bound
When comparing more than two models, model with highest
posterior probability is the one selected, but highly dependent on
the prior modelling.
Some difficulties
tension between using posterior probabilities justified by a
binary loss function but depending on unnatural prior weights,
and using Bayes factors that eliminate this dependence but
escape direct connection with posterior distribution, unless
prior weights are integrated within the loss
subsequent and delicate interpretation (or calibration) of the
strength of the Bayes factor towards supporting a given
hypothesis or model, because it is not a Bayesian decision rule
similar difficulty with posterior probabilities, with tendency to
interpret them as p-values (rather than the opposite) when
they only report through a marginal likelihood ratio the
respective strengths of fitting the data to both models
Some more difficulties
long-lasting impact of the prior modeling, meaning the choice
of the prior distributions on the parameter spaces of both
models under comparison, despite overall consistency proof for
Bayes factor
discontinuity in use of improper priors since they are not
justified in most testing situations, leading to many alternative
if ad hoc solutions, where data is either used twice or split in
artificial ways
binary (accept vs. reject) outcome more suited for immediate
decision (if any) than for model evaluation, in connection with
rudimentary loss function
Some additional difficulties
related impossibility to ascertain simultaneous misfit or to
detect presence of outliers;
lack of assessment of uncertainty associated with decision
itself
difficult computation of marginal likelihoods in most settings
with further controversies about which computational solution
to adopt
strong dependence of posterior probabilities on conditioning
statistics, which in turn undermines their validity for model
assessment, as exhibited in ABC
temptation to create pseudo-frequentist equivalents such as
q-values with even less Bayesian justifications
Paradigm shift
New proposal of a paradigm shift in Bayesian processing of
hypothesis testing and of model selection
convergent and naturally interpretable solution
more extended use of improper priors
Simple representation of the testing problem as a two-component
mixture estimation problem where the weights are formally equal
to 0 or 1
Paradigm shift
Simple representation of the testing problem as a two-component
mixture estimation problem where the weights are formally equal
to 0 or 1
Approach inspired from consistency result of Rousseau and
Mengersen (2011) on estimated overfitting mixtures
Mixture representation not directly equivalent to the use of a
posterior probability
Potential of a better approach to testing, while not expanding
number of parameters
Calibration of the posterior distribution of the weight of a
model, while moving from the artificial notion of the posterior
probability of a model
Encompassing mixture model
Idea: Given two statistical models,
M1 : x ∼ f1(x|θ1) , θ1 ∈ Θ1 and M2 : x ∼ f2(x|θ2) , θ2 ∈ Θ2 ,
embed both within an encompassing mixture
Mα : x ∼ αf1(x|θ1) + (1 − α)f2(x|θ2) , 0 ≤ α ≤ 1 (1)
Note: Both models correspond to special cases of (1), one for
α = 1 and one for α = 0
Draw inference on mixture representation (1), as if each
observation was individually and independently produced by the
mixture model
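A minimal numerical sketch (not from the original slides) of the encompassing mixture (1), taking f1 = N(θ1, 1) and f2 = N(θ2, 2) as in the normal example treated later and treating the second argument as a variance (an assumption); it simply checks that α = 1 and α = 0 recover the two component log-likelihoods.

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(x, alpha, theta1, theta2):
    """Log-likelihood of the encompassing mixture (1),
    alpha * f1(x|theta1) + (1 - alpha) * f2(x|theta2),
    here with f1 = N(theta1, 1) and f2 = N(theta2, variance 2)."""
    f1 = norm.pdf(x, loc=theta1, scale=1.0)
    f2 = norm.pdf(x, loc=theta2, scale=np.sqrt(2.0))
    return np.sum(np.log(alpha * f1 + (1.0 - alpha) * f2))

x = np.random.default_rng(0).normal(size=50)
# alpha = 1 and alpha = 0 recover the log-likelihoods of M1 and M2
print(mixture_loglik(x, 1.0, x.mean(), x.mean()))
print(mixture_loglik(x, 0.0, x.mean(), x.mean()))
print(mixture_loglik(x, 0.5, x.mean(), x.mean()))
```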
Inferential motivations
Sounds like an approximation to the real model, but several
definitive advantages to this paradigm shift:
Bayes estimate of the weight α replaces posterior probability
of model M1, equally convergent indicator of which model is
“true”, while avoiding artificial prior probabilities on model
indices, ω1 and ω2
interpretation of estimator of α at least as natural as handling
the posterior probability, while avoiding zero-one loss setting
α and its posterior distribution provide measure of proximity
to the models, while being interpretable as data propensity to
stand within one model
further allows for alternative perspectives on testing and
model choice, like predictive tools, cross-validation, and
information indices like WAIC
Computational motivations
avoids highly problematic computations of the marginal
likelihoods, since standard algorithms are available for
Bayesian mixture estimation
straightforward extension to a finite collection of models, with
a larger number of components, which considers all models at
once and eliminates least likely models by simulation
eliminates difficulty of “label switching” that plagues both
Bayesian estimation and Bayesian computation, since
components are no longer exchangeable
posterior distribution of α evaluates more thoroughly strength
of support for a given model than the single figure outcome of
a posterior probability
variability of posterior distribution on α allows for a more
thorough assessment of the strength of this support
Noninformative motivations
additional feature missing from traditional Bayesian answers:
a mixture model acknowledges possibility that, for a finite
dataset, both models or none could be acceptable
standard (proper and informative) prior modeling can be
reproduced in this setting, but non-informative (improper)
priors are also manageable therein, provided both models are
first reparameterised towards shared parameters, e.g. location
and scale parameters
in special case when all parameters are common
Mα : x ∼ αf1(x|θ) + (1 − α)f2(x|θ) , 0 ≤ α ≤ 1
if θ is a location parameter, a flat prior π(θ) ∝ 1 is available
Weakly informative motivations
using the same parameters or some identical parameters on
both components highlights that opposition between the two
components is not an issue of enjoying different parameters
those common parameters are nuisance parameters, to be
integrated out
prior model weights ωi rarely discussed in classical Bayesian
approach, even though they have a linear impact on posterior probabilities.
Here, prior modeling only involves selecting a prior on α, e.g.,
α ∼ B(a0, a0)
while a0 impacts posterior on α, it always leads to mass
accumulation near 1 or 0, i.e. favours most likely model over
the other
sensitivity analysis straightforward to carry
approach easily calibrated by parametric bootstrap providing
reference posterior of α under each model (see the sketch after this list)
natural Metropolis–Hastings alternative
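As referenced above, a hedged sketch of the parametric-bootstrap calibration: simulate datasets under each model, re-estimate α on each, and use the two resulting reference distributions to calibrate the observed posterior of α. The N(0, 1) / N(0, 2) generating models and the `estimate_alpha` argument are illustrative placeholders, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(5)

def bootstrap_reference(estimate_alpha, n, B=100):
    """Parametric-bootstrap calibration sketch: reference distributions of an
    estimate of alpha when the data truly come from M1 (here N(0,1)) or from
    M2 (here N(0,2), i.e. variance 2 -- an illustrative choice)."""
    ref1 = np.array([estimate_alpha(rng.normal(0.0, 1.0, n)) for _ in range(B)])
    ref2 = np.array([estimate_alpha(rng.normal(0.0, np.sqrt(2.0), n)) for _ in range(B)])
    return ref1, ref2

# usage sketch: plug in a posterior summary of alpha from any sampler, e.g.
# ref1, ref2 = bootstrap_reference(lambda x: np.median(sample_alpha(x)), n=100)
# where sample_alpha is a hypothetical routine returning posterior draws of alpha
```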
Computational issues
Specificities of mixture estimation in such settings.
While the likelihood is a regular mixture likelihood, the weight α ends
up close to one of the boundaries
Gibbs completion bound to be inefficient when sample size
grows, more than usual!
Gibbs sampler
Consider sample x = (x1, x2, . . . , xn) from (1).
Completion by latent component indicators ζi leads to the completed
likelihood
ℓ(θ, α | x, ζ) = ∏_{i=1}^n α_{ζi} f(xi | θ_{ζi}) = α^{n1} (1 − α)^{n2} ∏_{i=1}^n f(xi | θ_{ζi}) ,
where α1 = α, α2 = 1 − α and
(n1, n2) = ( ∑_{i=1}^n I_{ζi=1} , ∑_{i=1}^n I_{ζi=2} )
under the constraint
n = ∑_{j=1}^2 ∑_{i=1}^n I_{ζi=j} .
[Diebolt & Robert, 1990]
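A minimal sketch of one run of this completion Gibbs sampler for the normal location mixture αN(µ, 1) + (1 − α)N(0, 1) used in the next figure, assuming a flat prior on µ and a Be(a0, a0) prior on α (an illustration, not the authors' code).

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs(x, a0=0.5, n_iter=1000):
    """Completion Gibbs sampler sketch for alpha*N(mu,1) + (1-alpha)*N(0,1),
    with a flat prior on mu and a Be(a0, a0) prior on alpha."""
    n = len(x)
    mu, alpha = x.mean(), 0.5
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        # 1. allocations zeta_i given (alpha, mu)
        p1 = alpha * np.exp(-0.5 * (x - mu) ** 2)
        p2 = (1.0 - alpha) * np.exp(-0.5 * x ** 2)
        z = rng.random(n) < p1 / (p1 + p2)            # True -> component N(mu,1)
        n1 = z.sum()
        # 2. alpha | zeta ~ Be(n1 + a0, n2 + a0)
        alpha = rng.beta(n1 + a0, n - n1 + a0)
        # 3. mu | zeta, x ~ N(mean of allocated observations, 1/n1) under the flat prior
        if n1 > 0:
            mu = rng.normal(x[z].mean(), 1.0 / np.sqrt(n1))
        out[t] = alpha, mu
    return out

chain = gibbs(rng.normal(0.0, 1.0, 100))
```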
Local concentration
Gibbs sampling scheme valid from a theoretical point of view but
faces convergence difficulties, due to prior concentration on
boundaries of (0, 1) for the mixture weight α
As the sample size n grows, the sample of α’s shows fewer and fewer
switches between the vicinity of zero and the vicinity of one
Local concentration
Gibbs sequences (αt) on the first component weight for the mixture model
αN(µ, 1) + (1 − α)N(0, 1), for a N(0, 1) sample of size N = 5, 10, 50, 100, 500,
10³ (from top to bottom), based on 10⁵ simulations. The y-range for all
series is (0, 1).
MCMC alternative
Simple Metropolis-Hastings algorithm where
the model parameters θi are generated from the respective full
posteriors of both models (i.e., based on the entire sample) and
the mixture weight α is generated from a random walk proposal on
(0, 1)
Rare occurrence for mixtures to allow for independent proposals
MCMC alternative
Metropolis–Hastings sequences (αt) on the first component weight for the
mixture model αN(µ, 1) + (1 − α)N(0, 1), for a N(0, 1) sample of size N = 5,
10, 50, 100, 500, 10³ (from top to bottom), based on 10⁵ simulations. The
y-range for all series is (0, 1).
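A hedged sketch of that Metropolis–Hastings alternative for the same αN(µ, 1) + (1 − α)N(0, 1) mixture: µ is proposed independently from its M1 full posterior N(x̄, 1/n) (flat prior assumed) and α from a Gaussian random walk on (0, 1), with a joint accept/reject step; this is an interpretation of the slide, not the exact implementation.

```python
import numpy as np
from scipy.stats import norm, beta

rng = np.random.default_rng(2)

def metropolis(x, a0=0.5, step=0.1, n_iter=5000):
    """Metropolis-Hastings sketch for alpha*N(mu,1) + (1-alpha)*N(0,1):
    mu proposed independently from its M1 full posterior N(xbar, 1/n),
    alpha from a Gaussian random walk on (0,1), joint accept/reject."""
    n, xbar = len(x), x.mean()

    def log_post(alpha, mu):
        if not 0.0 < alpha < 1.0:
            return -np.inf
        lik = np.log(alpha * norm.pdf(x, mu, 1.0)
                     + (1.0 - alpha) * norm.pdf(x, 0.0, 1.0)).sum()
        return lik + beta.logpdf(alpha, a0, a0)        # flat prior on mu

    alpha, mu = 0.5, xbar
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        mu_prop = rng.normal(xbar, 1.0 / np.sqrt(n))   # independent proposal for mu
        alpha_prop = alpha + step * rng.normal()       # random walk proposal for alpha
        log_ratio = (log_post(alpha_prop, mu_prop) - log_post(alpha, mu)
                     + norm.logpdf(mu, xbar, 1.0 / np.sqrt(n))
                     - norm.logpdf(mu_prop, xbar, 1.0 / np.sqrt(n)))
        if np.log(rng.random()) < log_ratio:
            alpha, mu = alpha_prop, mu_prop
        chain[t] = alpha, mu
    return chain

chain = metropolis(rng.normal(0.0, 1.0, 100))
```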
Illustrations
Introduction
Testing problems as estimating mixture
models
Computational mixture estimation
Illustrations
Poisson/Geometric
Normal-normal comparison
Normal-normal comparison #2
Logistic vs. Probit
Asymptotic consistency
Conclusion
Poisson/Geometric
choice between Poisson P(λ) and Geometric Geo(p)
distributions
mixture with common parameter λ
Mα : αP(λ) + (1 − α)Geo(1/(1+λ))
Allows for Jeffreys prior since resulting posterior is proper
independent Metropolis–within–Gibbs with proposal
distribution on λ equal to Poisson posterior (with acceptance
rate larger than 75%)
Beta prior
Under a Be(a0, a0) prior on α, the full conditional posterior is
α | ζ ∼ Be(n1(ζ) + a0, n2(ζ) + a0)
Exact Bayes factor opposing Poisson and Geometric
B12 = n^{n x̄n} ∏_{i=1}^n xi! / { Γ(n + 2 + ∑_{i=1}^n xi) / Γ(n + 2) }
although undefined from a purely mathematical viewpoint
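A sketch of the Metropolis-within-Gibbs sampler described above for αP(λ) + (1 − α)Geo(1/(1+λ)); it assumes π(λ) ∝ λ^(−1/2) (the Jeffreys prior for the Poisson mean) and uses the Poisson-only posterior Ga(∑xi + 1/2, n) as the independent proposal on λ, which is an interpretation of the slides rather than their exact implementation.

```python
import numpy as np
from scipy.stats import poisson, gamma

rng = np.random.default_rng(3)

def log_geo(x, lam):
    # Geometric on {0,1,2,...} with success probability p = 1/(1+lam)
    return -np.log1p(lam) + x * (np.log(lam) - np.log1p(lam))

def mwg(x, a0=0.5, n_iter=2000):
    """Metropolis-within-Gibbs sketch for alpha*P(lam) + (1-alpha)*Geo(1/(1+lam)),
    assuming pi(lam) ~ lam^(-1/2); the independent proposal for lam is the
    Poisson-only posterior Ga(sum(x) + 1/2, n)."""
    n, s = len(x), x.sum()
    lam, alpha = x.mean() + 0.5, 0.5
    out = np.empty((n_iter, 2))

    def log_target(l, z):
        return poisson.logpmf(x[z], l).sum() + log_geo(x[~z], l).sum() - 0.5 * np.log(l)

    for t in range(n_iter):
        # allocations zeta_i
        lp1 = np.log(alpha) + poisson.logpmf(x, lam)
        lp2 = np.log(1.0 - alpha) + log_geo(x, lam)
        z = rng.random(n) < 1.0 / (1.0 + np.exp(lp2 - lp1))
        n1 = z.sum()
        # weight: alpha | zeta ~ Be(n1 + a0, n2 + a0)
        alpha = rng.beta(n1 + a0, n - n1 + a0)
        # independent Metropolis step on lam
        lam_prop = rng.gamma(s + 0.5, 1.0 / n)
        log_ratio = (log_target(lam_prop, z) - log_target(lam, z)
                     + gamma.logpdf(lam, s + 0.5, scale=1.0 / n)
                     - gamma.logpdf(lam_prop, s + 0.5, scale=1.0 / n))
        if np.log(rng.random()) < log_ratio:
            lam = lam_prop
        out[t] = alpha, lam
    return out

out = mwg(rng.poisson(4.0, 200))
```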
Parameter estimation
Posterior means of λ and medians of α for 100 Poisson P(4) datasets of size
n = 1000, for a0 = .0001, .001, .01, .1, .2, .3, .4, .5. Each posterior approximation
is based on 10⁴ Metropolis–Hastings iterations.
Parameter estimation
Posterior means of λ and medians of α for 100 Poisson P(4) datasets of size
n = 1000, for a0 = .0001, .001, .01, .1, .2, .3, .4, .5. Each posterior approximation
is based on 10⁴ Metropolis–Hastings iterations.
MCMC convergence
Dataset from a Poisson distribution P(4): estimates of (top) λ and (bottom)
α via MH for 5 samples of size n = 5, 50, 100, 500, 10,000.
Consistency
Posterior means (sky-blue) and medians (grey-dotted) of α, over 100 Poisson
P(4) datasets for sample sizes from 1 to 1000, for a0 = .1, .3, .5 (one panel
per a0; x-axis: log(sample size)).
Behaviour of Bayes factor
Comparison between P(M1|x) (red dotted area) and posterior medians of α
(grey zone) for 100 Poisson P(4) datasets with sample sizes n between 1 and
1000, for a0 = .001, .1, .5 (one panel per a0; x-axis: log(sample size)).
Normal-normal comparison
comparison of a normal N(θ1, 1) with a normal N(θ2, 2)
distribution
mixture with identical location parameter θ
αN(θ, 1) + (1 − α)N(θ, 2)
Jeffreys prior π(θ) = 1 can be used, since posterior is proper
Reference (improper) Bayes factor
B12 = 2^{(n−1)/2} exp{ −(1/4) ∑_{i=1}^n (xi − x̄)² } ,
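A small check of the reconstructed improper-prior Bayes factor above, B12 = 2^{(n−1)/2} exp{−¼ ∑(xi − x̄)²}, computed on the log scale and assuming the second model has variance 2.

```python
import numpy as np

def log_B12(x):
    """Log Bayes factor of N(theta,1) vs N(theta,2) under the flat prior on theta,
    using the reconstructed closed form B12 = 2^((n-1)/2) exp(-sum((x-xbar)^2)/4)."""
    n = len(x)
    s = np.sum((x - x.mean()) ** 2)
    return 0.5 * (n - 1) * np.log(2.0) - 0.25 * s

x = np.random.default_rng(4).normal(0.0, 1.0, 100)
print(log_B12(x))  # positive values support the variance-1 model M1
```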
Consistency
Posterior means (wheat) and medians (dark wheat) of α, compared with
posterior probabilities of M0 (gray), for a N(0, 1) sample, derived from 100
datasets for sample sizes equal to 15, 50, 100, 500 (one panel per sample
size). Each posterior approximation is based on 10⁴ MCMC iterations.
Comparison with posterior probability
Ranges of log(n) log(1 − E[α|x]) (gray) and log(1 − p(M1|x)) (red dotted)
over 100 N(0, 1) samples as the sample size n grows from 1 to 500, where α is
the weight of N(0, 1) in the mixture model. The shaded areas indicate the
range of the estimates; each panel corresponds to a Beta prior with
a0 = .1, .2, .3, .4, .5, 1 and each posterior approximation is based on 10⁴
iterations.
Comments
convergence to one boundary value as sample size n grows
impact of hyperparameter a0 slowly vanishes as n increases, but is
present for moderate sample sizes
when simulated sample is neither from N(θ1, 1) nor from
N(θ2, 2), behaviour of posterior varies, depending on which
distribution is closest
More consistency
(left) Posterior distributions of the mixture weight α and (right) of their
logarithmic transform log(α), under a Beta B(a0, a0) prior when a0 = .1, .2, .3,
.4, .5, 1 and for a normal N(0, 2) sample of 10³ observations. The MCMC
outcome is based on 10⁴ iterations.
Another normal-normal comparison
compare N(0, 1) against N(µ, 1) model, hence testing
whether or not µ = 0
embedded case: improper priors on µ cannot be used, so a
µ ∼ N(0, 1) prior is substituted
inference on α contrasted between data generated from N(0, 1)
and data generated outside the null distribution
Two dynamics
Posterior distributions of α under a Beta B(a0, a0) prior (left) for
a0 = 1/n, .1, .2, .3, .4, .5, 1 with 10² N(1, 1) observations and (right) for a0 = .1
with n = 5, 10, 50, 100, 500, 10³ N(1, 1) observations
Logistic or Probit?
binary R dataset about diabetes in 200 Pima Indian women,
with body mass index as explanatory variable
comparison of logit and probit fits is natural here; we thus
compare both fits via our method
M1 : yi | xi, θ1 ∼ B(1, pi) where pi = exp(xiᵀθ1) / {1 + exp(xiᵀθ1)}
M2 : yi | xi, θ2 ∼ B(1, qi) where qi = Φ(xiᵀθ2)
Common parameterisation
Local reparameterisation strategy that rescales parameters of the
probit model M2 so that the MLE’s of both models coincide.
[Choudhuty et al., 2007]
Φ(xiᵀθ2) ≈ exp(k xiᵀθ2) / {1 + exp(k xiᵀθ2)}
and use best estimate of k to bring both parameters into coherency
(k0, k1) = (θ01/θ02, θ11/θ12) ,
reparameterise M1 and M2 as
M1 : yi | xi, θ ∼ B(1, pi) where pi = exp(xiᵀθ) / {1 + exp(xiᵀθ)}
M2 : yi | xi, θ ∼ B(1, qi) where qi = Φ(xiᵀ(κ⁻¹θ)) ,
with κ⁻¹θ = (θ0/k0, θ1/k1).
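An illustrative sketch of this rescaling, with made-up MLE values standing in for the logit and probit fits of the Pima data (the numbers and the BMI value are hypothetical, not the paper's estimates).

```python
import numpy as np
from scipy.stats import norm

# hypothetical MLEs for illustration only (intercept, BMI slope)
theta1_mle = np.array([-4.1, 0.10])    # logit fit
theta2_mle = np.array([-2.5, 0.065])   # probit fit
k = theta1_mle / theta2_mle            # (k0, k1) = (theta01/theta02, theta11/theta12)

def logit_prob(x_row, theta):
    return 1.0 / (1.0 + np.exp(-np.dot(x_row, theta)))

def probit_prob_rescaled(x_row, theta):
    # q_i = Phi(x_i (kappa^{-1} theta)) with kappa^{-1} theta = theta / k
    return norm.cdf(np.dot(x_row, theta / k))

x_row = np.array([1.0, 30.0])          # intercept and a BMI value of 30
print(logit_prob(x_row, theta1_mle), probit_prob_rescaled(x_row, theta1_mle))
```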
Prior modelling
Under default g-prior
θ ∼ N2(0, n(XᵀX)⁻¹)
full conditional posterior distributions given allocations
π(θ | y, X, ζ) ∝ [ exp{∑i I_{ζi=1} yi xiᵀθ} / ∏_{i;ζi=1}{1 + exp(xiᵀθ)} ] exp{−θᵀ(XᵀX)θ / 2n}
× ∏_{i;ζi=2} Φ(xiᵀ(κ⁻¹θ))^{yi} (1 − Φ(xiᵀ(κ⁻¹θ)))^{1−yi}
hence posterior distribution clearly defined
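A sketch of the corresponding log full conditional of θ given the allocations ζ, combining the logit likelihood on the observations allocated to M1, the rescaled-probit likelihood on those allocated to M2, and the g-prior term; the function and argument names are mine, not the paper's.

```python
import numpy as np
from scipy.stats import norm

def log_full_conditional(theta, y, X, zeta, k):
    """Sketch of log pi(theta | y, X, zeta) from the formula above: logit likelihood
    on observations allocated to M1, rescaled-probit likelihood on those allocated
    to M2, plus the g-prior term -theta'(X'X)theta/(2n)."""
    n = len(y)
    eta = X @ theta
    in1, in2 = (zeta == 1), (zeta == 2)
    # logistic component (zeta_i = 1): sum y_i x_i'theta - log(1 + exp(x_i'theta))
    ll1 = np.sum(y[in1] * eta[in1] - np.logaddexp(0.0, eta[in1]))
    # rescaled probit component (zeta_i = 2)
    q = np.clip(norm.cdf(X[in2] @ (theta / k)), 1e-12, 1 - 1e-12)
    ll2 = np.sum(y[in2] * np.log(q) + (1 - y[in2]) * np.log(1 - q))
    # g-prior N2(0, n (X'X)^{-1})
    lprior = -theta @ (X.T @ X) @ theta / (2.0 * n)
    return ll1 + ll2 + lprior
```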
Results
            Logistic            Probit
a0     α      θ0       θ1       θ0/k0     θ1/k1
.1   .352   -4.06    .103      -2.51      .064
.2   .427   -4.03    .103      -2.49      .064
.3   .440   -4.02    .102      -2.49      .063
.4   .456   -4.01    .102      -2.48      .063
.5   .449   -4.05    .103      -2.51      .064
Histograms of posteriors of α in favour of the logistic model where
a0 = .1, .2, .3, .4, .5 for (a) Pima dataset, (b) data from a logistic
model, (c) data from a probit model
Asymptotic consistency
Posterior consistency for mixture testing procedure under minor
conditions
Two different cases
the two models, M1 and M2, are well separated
model M1 is a submodel of M2.
Posterior concentration rate
Let π be the prior and xn = (x1, · · · , xn) a sample with true
density f∗
proposition
Assume that, for all c > 0, there exist Θn ⊂ Θ1 × Θ2 and B > 0 such that
π[Θnᶜ] ≲ n⁻ᶜ ,  Θn ⊂ { ‖θ1‖ + ‖θ2‖ ≤ n^B }
and that there exist H ≥ 0 and L, δ > 0 such that, for j = 1, 2,
sup_{θ,θ′∈Θn} ‖fj,θj − fj,θ′j‖₁ ≤ L n^H ‖θj − θ′j‖ ,  θ = (θ1, θ2), θ′ = (θ′1, θ′2) ,
∀ ‖θj − θ∗j‖ ≤ δ :  KL(fj,θj, fj,θ∗j) ≲ ‖θj − θ∗j‖ .
Then, when f∗ = fθ∗,α∗ , with α∗ ∈ [0, 1], there exists M > 0 such that
π[ (α, θ); ‖fθ,α − f∗‖₁ > M √(log n/n) | xn ] = op(1) .
Separated models
Assumption: Models are separated, i.e. there is identifiability:
∀ α, α′ ∈ [0, 1], ∀ θj, θ′j, j = 1, 2 :  Pθ,α = Pθ′,α′ ⇒ α = α′ , θ = θ′
Further
inf_{θ1∈Θ1} inf_{θ2∈Θ2} ‖f1,θ1 − f2,θ2‖₁ > 0
and, for θ∗j ∈ Θj, if Pθj weakly converges to Pθ∗j, then θj converges
in the Euclidean topology to θ∗j
Separated models
Assumption: Models are separated, i.e. there is identifiability:
∀ α, α′ ∈ [0, 1], ∀ θj, θ′j, j = 1, 2 :  Pθ,α = Pθ′,α′ ⇒ α = α′ , θ = θ′
theorem
Under the above assumptions, for all ε > 0,
π[ |α − α∗| > ε | xn ] = op(1)
If θj → fj,θj is C² in a neighbourhood of θ∗j, j = 1, 2, if
f1,θ∗1 − f2,θ∗2, ∇f1,θ∗1, ∇f2,θ∗2 are linearly independent in y, and if there
exists δ > 0 such that
∇f1,θ∗1, ∇f2,θ∗2, sup_{|θ1−θ∗1|<δ} |D²f1,θ1|, sup_{|θ2−θ∗2|<δ} |D²f2,θ2| ∈ L¹
then
π[ |α − α∗| > M √(log n/n) | xn ] = op(1) .
Separated models
Assumption: Models are separated, i.e. there is identifiability:
∀ α, α′ ∈ [0, 1], ∀ θj, θ′j, j = 1, 2 :  Pθ,α = Pθ′,α′ ⇒ α = α′ , θ = θ′
The theorem allows for an interpretation of α under the posterior: if the data
xn are generated from model M1, then the posterior on α concentrates
around α = 1
Embedded case
Here M1 is a submodel of M2, i.e.
θ2 = (θ1, ψ) and θ2 = (θ1, ψ0 = 0) corresponds to f2,θ2 ∈ M1
Same posterior concentration rate
√(log n/n)
for estimating α when α∗ ∈ (0, 1) and ψ∗ ≠ 0.
Null case
Case where ψ∗ = 0, i.e., f ∗ is in model M1
Two possible paths to approximate f ∗: either α goes to 1 (path 1)
or ψ goes to 0 (path 2)
New identifiability condition: Pθ,α = P∗ only if
α = 1, θ1 = θ∗1, θ2 = (θ∗1, ψ)  or  α ≤ 1, θ1 = θ∗1, θ2 = (θ∗1, 0)
Prior
π(α, θ) = πα(α)π1(θ1)πψ(ψ), θ2 = (θ1, ψ)
with common (prior on) θ1
Assumptions
[B1] Regularity: Assume that θ1 → f1,θ1 and θ2 → f2,θ2 are 3
times continuously differentiable and that, writing
f̄1,θ∗1 = sup_{|θ1−θ∗1|<δ} f1,θ1 and f̲1,θ∗1 = inf_{|θ1−θ∗1|<δ} f1,θ1,
F∗[ (f̄1,θ∗1)³ / (f̲1,θ∗1)³ ] < +∞ ,
F∗[ sup_{|θ1−θ∗1|<δ} |∇f1,θ∗1|³ / (f̲1,θ∗1)³ ] < +∞ ,  F∗[ |∇f1,θ∗1|⁴ / (f̲1,θ∗1)⁴ ] < +∞ ,
F∗[ sup_{|θ1−θ∗1|<δ} |D²f1,θ∗1|² / (f̲1,θ∗1)² ] < +∞ ,  F∗[ sup_{|θ1−θ∗1|<δ} |D³f1,θ∗1| / f̲1,θ∗1 ] < +∞
Assumptions
[B2] Integrability: There exists S0 ⊂ S ∩ {|ψ| > δ0}, for some
positive δ0 and satisfying Leb(S0) > 0, such that for all ψ ∈ S0,
F∗[ sup_{|θ1−θ∗1|<δ} f2,θ1,ψ / (f̲1,θ∗1)⁴ ] < +∞ ,  F∗[ sup_{|θ1−θ∗1|<δ} (f2,θ1,ψ)³ / (f̲1,θ∗1)³ ] < +∞
Assumptions
[B3] Stronger identifiability: Set
∇f2,θ∗1,ψ∗(x) = ( ∇θ1 f2,θ∗1,ψ∗(x)ᵀ , ∇ψ f2,θ∗1,ψ∗(x)ᵀ )ᵀ .
Then for all ψ ∈ S with ψ ≠ 0, if η0 ∈ R, η1 ∈ R^d1,
η0 (f1,θ∗1 − f2,θ∗1,ψ) + η1ᵀ ∇θ1 f1,θ∗1 = 0  ⇔  η0 = 0, η1 = 0
Consistency
theorem
Given the mixture fθ1,ψ,α = αf1,θ1 + (1 − α)f2,θ1,ψ and a sample
xn = (x1, · · · , xn) issued from f1,θ∗1, if assumptions B1–B3 are
satisfied, then there exists M > 0 such that
π[ (α, θ); ‖fθ,α − f∗‖₁ > M √(log n/n) | xn ] = op(1) .
If α ∼ B(a1, a2), with a2 < d2, and if the prior πθ1,ψ is absolutely
continuous with positive and continuous density at (θ∗1, 0), then for
all Mn going to ∞
π[ |α − α∗| > Mn (log n)^γ / √n | xn ] = op(1) ,  γ = max{(d1 + a2)/(d2 − a2), 1}/2
Conclusion
many applications of the Bayesian paradigm concentrate on
the comparison of scientific theories and on testing of null
hypotheses
natural tendency to default to Bayes factors
poorly understood sensitivity to prior modeling and posterior
calibration
Time is ripe for a paradigm shift
Conclusion
Time is ripe for a paradigm shift
original testing problem replaced with a better controlled
estimation target
allow for posterior variability over the component frequency as
opposed to deterministic Bayes factors
range of acceptance, rejection and indecision conclusions
easily calibrated by simulation
posterior medians quickly settling near the boundary values of
0 and 1
potential derivation of a Bayesian b-value by looking at the
posterior area under the tail of the distribution of the weight
Prior modelling
Partly common parameterisation always feasible and hence
allows for reference priors
removal of the absolute prohibition of improper priors in
hypothesis testing
prior on the weight α shows sensitivity that naturally vanishes
as the sample size increases
default value of a0 = 0.5 in the Beta prior
Computing aspects
proposal that does not induce additional computational strain
when algorithmic solutions exist for both models, they can be
recycled towards estimating the encompassing mixture
easier than in standard mixture problems due to common
parameters that allow for original MCMC samplers to be
turned into proposals
Gibbs sampling completions useful for assessing potential
outliers but not essential to achieve a conclusion about the
overall problem