
poster given at MCMski V, Lenzerheide, Switzerland, Jan. 05, 2016

Non-informative reparametrisations for location-scale mixtures

Kaniav Kamary (1), Kate Lee (2), Christian P. Robert (1,3)
(1) CEREMADE, Université Paris–Dauphine, Paris; (2) Auckland University of Technology, New Zealand; (3) Dept. of Statistics, University of Warwick, and CREST, Paris

Introduction

The traditional definition of a mixture density,
\[ f(x \mid \theta, p) = \sum_{i=1}^{k} p_i f(x \mid \theta_i), \qquad \sum_{i=1}^{k} p_i = 1, \tag{1} \]
gives a separate meaning to each component. For the location-scale Gaussian mixture,
\[ f(x \mid \theta, p) = \sum_{i=1}^{k} p_i \, \mathcal{N}(x \mid \mu_i, \sigma_i), \]
Mengersen and Robert (1996) [2] established that an improper prior on $(\mu_1, \sigma_1)$ leads to a proper posterior when $\mu_i = \mu_{i-1} + \sigma_{i-1}\delta_i$ and $\sigma_i = \tau_i \sigma_{i-1}$, $\tau_i < 1$. Diebolt and Robert (1994) [3] discussed the alternative approach of imposing proper posteriors on improper priors by banning almost empty components from the likelihood function.

Setting the global mean and variance, $E_{\theta,p}(X) = \mu$ and $\mathrm{var}_{\theta,p}(X) = \sigma^2$, imposes natural constraints on the component parameters:
\[ \mu = \sum_{i=1}^{k} p_i \mu_i, \qquad \sigma^2 = \sum_{i=1}^{k} p_i \mu_i^2 + \sum_{i=1}^{k} p_i \sigma_i^2 - \mu^2, \qquad E_{\theta,p}(X^2) = \sum_{i=1}^{k} p_i \mu_i^2 + \sum_{i=1}^{k} p_i \sigma_i^2, \]
which implies that $(\mu_1, \ldots, \mu_k, \sigma_1, \ldots, \sigma_k)$ belongs to a specific ellipse.

New reparametrisation: The location-scale mixture is reparameterised in terms of the global mean and variance of the mixture distribution. Writing
\[ f(x \mid \theta, p) = \sum_{i=1}^{k} p_i f\!\left(x \mid \mu + \sigma\gamma_i/\sqrt{p_i},\; \sigma\eta_i/\sqrt{p_i}\right) \tag{2} \]
leads to a parameter space in which $(p_1, \ldots, p_k, \gamma_1, \ldots, \gamma_k, \eta_1, \ldots, \eta_k)$ is constrained by
\[ p_i, \eta_i \ge 0 \ (1 \le i \le k), \qquad \sum_{i=1}^{k} p_i = 1, \qquad \sum_{i=1}^{k} \sqrt{p_i}\,\gamma_i = 0, \qquad \sum_{i=1}^{k} \{\eta_i^2 + \gamma_i^2\} = 1, \]
which implies, for all $i$, $0 \le p_i \le 1$, $-1 \le \gamma_i \le 1$ and $0 \le \eta_i \le 1$ (the $\gamma_i$ cannot all be positive, since their $\sqrt{p_i}$-weighted sum is zero). Under these constraints, $(\gamma_1, \ldots, \gamma_k, \eta_1, \ldots, \eta_k)$ belongs to the hypersphere of $\mathbb{R}^{2k}$ centered at the origin with radius $r = 1$, intersected with a hyperplane of this space passing through the origin, which results in a circle centered at the origin with radius 1.

Spherical coordinate representation of the $\gamma$'s: Suppose that $\sum_{i=1}^{k} \gamma_i^2 = \phi^2$. The vector $\gamma$ belongs both to the hypersphere of radius $\phi$ and to the hyperplane orthogonal to $(\sqrt{p_1}, \ldots, \sqrt{p_k})$.
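
The constraints attached to (2) can be checked numerically. The following is a minimal sketch, not the authors' code: it builds one admissible pair (γ, η) by projecting a random vector onto the hyperplane orthogonal to (√p1, …, √pk) and rescaling it to radius ϕ, then confirms that the resulting mixture has global mean µ and variance σ². The helper names (`sample_constrained`, `component_parameters`, `mixture_moments`) and the numerical values are illustrative assumptions.

```python
import numpy as np

def sample_constrained(p, phi2, rng):
    """Hypothetical helper: draw one (gamma, eta) satisfying the constraints."""
    root_p = np.sqrt(p)                      # unit vector, since sum(p) = 1
    g = rng.normal(size=len(p))
    g -= (g @ root_p) * root_p               # project onto hyperplane orthogonal to sqrt(p)
    gamma = np.sqrt(phi2) * g / np.linalg.norm(g)   # sum(gamma^2) = phi^2
    e = np.abs(rng.normal(size=len(p)))      # nonnegative direction for eta
    eta = np.sqrt(1.0 - phi2) * e / np.linalg.norm(e)  # sum(eta^2) = 1 - phi^2
    return gamma, eta

def component_parameters(mu, sigma, p, gamma, eta):
    """Map the reparameterised values to component means and scales, as in (2)."""
    return mu + sigma * gamma / np.sqrt(p), sigma * eta / np.sqrt(p)

def mixture_moments(p, means, sds):
    """Global mean and variance of a Gaussian mixture (sds = standard deviations)."""
    mean = np.sum(p * means)
    var = np.sum(p * (means**2 + sds**2)) - mean**2
    return mean, var

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])                # illustrative weights
gamma, eta = sample_constrained(p, phi2=0.4, rng=rng)
means, sds = component_parameters(2.0, 3.0, p, gamma, eta)
mean, var = mixture_moments(p, means, sds)
print(mean, var)                             # should match mu = 2 and sigma^2 = 9
```

Whatever admissible (γ, η) is drawn, the global moments stay fixed at (µ, σ²); only the way the variance is split between and within components changes.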
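$s$-th orthogonal base $\tilde\Lambda_s$: the first vector is
\[ \tilde\Lambda_{1,j} = \begin{cases} -\sqrt{p_2}, & j = 1 \\ \sqrt{p_1}, & j = 2 \\ 0, & j > 2 \end{cases} \]
and, for $s > 1$, the $s$-th vector is given by
\[ \tilde\Lambda_{s,j} = \begin{cases} -(p_j p_{s+1})^{1/2} \big/ \left(\sum_{l=1}^{s} p_l\right)^{1/2}, & j \le s \\ \left(\sum_{l=1}^{s} p_l\right)^{1/2}, & j = s + 1 \\ 0, & j > s + 1 \end{cases} \]
The $s$-th orthonormal base is $F_s = \tilde\Lambda_s / \|\tilde\Lambda_s\|$.

Figure: Image from Robert Osserman.

$(\gamma_1, \ldots, \gamma_k)$ can be written as
\[ (\gamma_1, \ldots, \gamma_k) = \phi\cos(\varpi_1) F_1 + \phi\sin(\varpi_1)\cos(\varpi_2) F_2 + \cdots + \phi\sin(\varpi_1)\cdots\sin(\varpi_{k-2}) F_{k-1} \]
with the angles $\varpi_1, \ldots, \varpi_{k-3}$ in $[0, \pi]$ and $\varpi_{k-2}$ in $[0, 2\pi]$.

Foundational consequences: The restricted parameter space is compact, which helps in selecting improper and non-informative priors over mixtures.

Prior modeling:
▸ Global mean and variance: The posterior distribution associated with the prior $\pi(\mu, \sigma) = 1/\sigma$ is proper when (a) proper distributions are used on the other parameters and (b) there are at least two observations in the sample.
▸ Component weights: $(p_1, \ldots, p_k) \sim \mathrm{Dir}(\alpha_0, \ldots, \alpha_0)$.
▸ Angles $\varpi$: $\varpi_1, \ldots, \varpi_{k-3} \sim U[0, \pi]$ and $\varpi_{k-2} \sim U[0, 2\pi]$.
▸ Radius $\phi$ and $\eta_1, \ldots, \eta_k$: If $k$ is small, $(\phi^2, \eta_1^2, \ldots, \eta_k^2) \sim \mathrm{Dir}(\alpha, \ldots, \alpha)$, while for $k$ larger than 3, $(\eta_1, \ldots, \eta_k)$ is written through spherical coordinates:
\[ \eta_i = \begin{cases} \sqrt{1 - \phi^2}\,\cos(\xi_i), & i = 1 \\ \sqrt{1 - \phi^2}\,\prod_{j=1}^{i-1} \sin(\xi_j)\cos(\xi_i), & 1 < i < k \\ \sqrt{1 - \phi^2}\,\prod_{j=1}^{k-1} \sin(\xi_j), & i = k \end{cases} \]
Unlike the $\varpi$'s, the support of all angles $\xi_1, \ldots, \xi_{k-1}$ is limited to $[0, \pi/2]$, due to the positivity requirement on the $\eta_i$'s: $(\xi_1, \ldots, \xi_{k-1}) \sim U([0, \pi/2]^{k-1})$.

MCMC algorithm

Metropolis-within-Gibbs algorithm for the reparameterised mixture model:
1 Generate initial values $(\mu^{(0)}, \sigma^{(0)}, p^{(0)}, \phi^{(0)}, \xi_1^{(0)}, \ldots, \xi_{k-1}^{(0)}, \varpi_1^{(0)}, \ldots, \varpi_{k-2}^{(0)})$.
2 For $t = 1, \ldots, T$, the update of $(\mu^{(t)}, \sigma^{(t)}, p^{(t)}, \phi^{(t)}, \xi_1^{(t)}, \ldots, \xi_{k-1}^{(t)}, \varpi_1^{(t)}, \ldots, \varpi_{k-2}^{(t)})$ proceeds as follows:
2.1 Generate a proposal $\mu' \sim \mathcal{N}(\mu^{(t-1)}, \epsilon_\mu)$ and update $\mu^{(t)}$ against $\pi(\cdot \mid x, \sigma^{(t-1)}, p^{(t-1)}, \phi^{(t-1)}, \xi^{(t-1)}, \varpi^{(t-1)})$.
2.2 Generate a proposal $\log(\sigma)' \sim \mathcal{N}(\log(\sigma^{(t-1)}), \epsilon_\sigma)$ and update $\sigma^{(t)}$ against $\pi(\cdot \mid x, \mu^{(t)}, p^{(t-1)}, \phi^{(t-1)}, \xi^{(t-1)}, \varpi^{(t-1)})$.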
2.3 Generate a proposal $(\phi^2)' \sim \mathrm{Beta}\big((\phi^2)^{(t-1)}\epsilon_\phi + 1,\ (1 - (\phi^2)^{(t-1)})\epsilon_\phi + 1\big)$ and update $\phi^{(t)}$ against $\pi(\cdot \mid x, \mu^{(t)}, \sigma^{(t)}, p^{(t-1)}, \xi^{(t-1)}, \varpi^{(t-1)})$.
2.4 Generate a proposal $p' \sim \mathrm{Dir}(p_1^{(t-1)}\epsilon_p + 1, \ldots, p_k^{(t-1)}\epsilon_p + 1)$ and update $p^{(t)}$ against $\pi(\cdot \mid x, \mu^{(t)}, \sigma^{(t)}, \phi^{(t)}, \xi^{(t-1)}, \varpi^{(t-1)})$.
2.5 Generate proposals $\xi_i' \sim U[\xi_i^{(t-1)} - \epsilon_\xi,\ \xi_i^{(t-1)} + \epsilon_\xi]$, $i = 1, \ldots, k - 1$, and update $(\xi_1^{(t)}, \ldots, \xi_{k-1}^{(t)})$ against $\pi(\cdot \mid x, \mu^{(t)}, \sigma^{(t)}, p^{(t)}, \phi^{(t)}, \varpi^{(t-1)})$.
2.6 Generate proposals $\varpi_i' \sim U[\varpi_i^{(t-1)} - \epsilon_\varpi,\ \varpi_i^{(t-1)} + \epsilon_\varpi]$, $i = 1, \ldots, k - 2$, and update $(\varpi_1^{(t)}, \ldots, \varpi_{k-2}^{(t)})$ against $\pi(\cdot \mid x, \mu^{(t)}, \sigma^{(t)}, p^{(t)}, \phi^{(t)}, \xi^{(t)})$.
where $p^{(t)} = (p_1^{(t)}, \ldots, p_k^{(t)})$, $x = (x_1, \ldots, x_n)$, $\xi^{(t)} = (\xi_1^{(t)}, \ldots, \xi_{k-1}^{(t)})$ and $\varpi^{(t)} = (\varpi_1^{(t)}, \ldots, \varpi_{k-2}^{(t)})$.

Ultimixt package
▸ Implements the Metropolis-within-Gibbs algorithm for the reparameterised mixture distribution;
▸ Calibrates the scales of the various proposals by aiming at an average acceptance rate of either 0.44 or 0.234, depending on the dimension of the simulated parameter;
▸ Accurately estimates the component parameters.

Point estimators of the component parameters in the case of label switching:
▸ K-means clustering algorithm;
▸ Reordering labels so as to produce the shortest distance between the current posterior sample and the (or a) maximum a posteriori (MAP) estimate [1].

Mixture of two normal distributions

A sample of size 50 simulated from $0.65\,\mathcal{N}(-8, 2) + 0.35\,\mathcal{N}(-0.5, 1)$.

Figure: Empirical densities of 10 sequences obtained by running the Metropolis-within-Gibbs algorithm in parallel, with $2 \times 10^5$ iterations.
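
A sample like the one described above could be simulated as in the following sketch. It assumes that $\mathcal{N}(m, s)$ denotes mean $m$ and standard deviation $s$ (the poster does not say whether the second argument is a standard deviation or a variance); the function name `simulate_mixture` and the seed are illustrative choices, not taken from the poster or from Ultimixt. The global mean and variance that the reparametrisation uses are computed alongside.

```python
import numpy as np

# Assumption: N(m, s) is read as mean m and standard deviation s.
weights = np.array([0.65, 0.35])
means = np.array([-8.0, -0.5])
sds = np.array([2.0, 1.0])

def simulate_mixture(n, weights, means, sds, rng):
    """Draw n points: pick a component label, then sample from that normal."""
    labels = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[labels], sds[labels])

rng = np.random.default_rng(2016)            # arbitrary seed for reproducibility
x = simulate_mixture(50, weights, means, sds, rng)

# Global mean and variance of the mixture under the stated assumption
global_mean = np.sum(weights * means)
global_var = np.sum(weights * (means**2 + sds**2)) - global_mean**2
print(global_mean, global_var, x.mean(), x.var(ddof=1))
```

With only 50 observations, the empirical mean and variance will typically differ noticeably from the global values.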
▸ Outcomes of 10 parallel chains, started randomly from different starting values, are indistinguishable;
▸ Chains are well-mixed;
▸ Sampler output covers the entire sample space;
▸ Estimated densities converge to a neighborhood of the true values;
▸ Estimated mixture density is remarkably smooth.

Mixture of three normal distributions

A sample of size 50 is simulated from the model $0.27\,\mathcal{N}(-4.5, 1) + 0.4\,\mathcal{N}(10, 1) + 0.33\,\mathcal{N}(3, 1)$.

Figure: Sequences of $\mu_i$, $\sigma_i$ and $p_i$ and estimated mixture density; mixture density estimate based on $10^4$ MCMC iterations.

Overfitting case

Extreme-valued posterior samples for an overfitted model.

Galaxy dataset: Point estimator of the parameters of a mixture of (Left) 6 components; (Right) 4 components.

References
[1] S. Frühwirth-Schnatter (2001). Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. American Statist. Assoc., 96, 194–209.
[2] K. Mengersen and C. Robert (1996). Testing for mixtures: A Bayesian entropic approach (with discussion). In Bayesian Statistics 5 (J. Berger, J. Bernardo, A. Dawid, D. Lindley and A. Smith, eds). Oxford University Press, Oxford, 255–276.
[3] J. Diebolt and C. Robert (1994). Estimation of finite mixture distributions by Bayesian sampling. J. Royal Statist. Society Series B, 56, 363–375.

kamary@ceremade.dauphine.fr
