Non-informative reparametrisation for location-scale mixtures

Poster given at MCMski V, Lenzerheide, Switzerland, Jan. 05, 2016.

Non-informative reparametrisations for location-scale mixtures

Kaniav Kamary (1), Kate Lee (2), Christian P. Robert (1,3)
(1) CEREMADE, Université Paris–Dauphine, Paris. (2) Auckland University of Technology, New Zealand. (3) Dept. of Statistics, University of Warwick, and CREST, Paris.

Introduction

The traditional definition of a mixture density,
$$f(x\mid\theta,p) = \sum_{i=1}^{k} p_i\, f(x\mid\theta_i), \qquad \sum_{i=1}^{k} p_i = 1, \qquad (1)$$
gives a separate meaning to each component. For the location-scale Gaussian mixture,
$$f(x\mid\theta,p) = \sum_{i=1}^{k} p_i\, \mathcal{N}(x\mid\mu_i,\sigma_i),$$
Mengersen and Robert (1996) [2] established that an improper prior on $(\mu_1,\sigma_1)$ leads to a proper posterior when $\mu_i = \mu_{i-1} + \sigma_{i-1}\delta_i$ and $\sigma_i = \tau_i\sigma_{i-1}$, $\tau_i < 1$. Diebolt and Robert (1994) [3] discussed the alternative approach of obtaining proper posteriors under improper priors by banning almost empty components from the likelihood function.

Setting the global mean and variance, $\mathbb{E}_{\theta,p}(X) = \mu$ and $\mathrm{var}_{\theta,p}(X) = \sigma^2$, imposes natural constraints on the component parameters:
$$\mu = \sum_{i=1}^{k} p_i\mu_i, \qquad \mathbb{E}_{\theta,p}(X^2) = \sum_{i=1}^{k} p_i\mu_i^2 + \sum_{i=1}^{k} p_i\sigma_i^2, \qquad \sigma^2 = \sum_{i=1}^{k} p_i\mu_i^2 + \sum_{i=1}^{k} p_i\sigma_i^2 - \mu^2,$$
which implies that $(\mu_1,\ldots,\mu_k,\sigma_1,\ldots,\sigma_k)$ belongs to a specific ellipsoid.

New reparametrisation

The location-scale mixture is reparametrised in terms of the global mean and variance of the mixture distribution. Writing
$$f(x\mid\theta,p) = \sum_{i=1}^{k} p_i\, f\!\left(x \mid \mu + \sigma\gamma_i/\sqrt{p_i},\; \sigma\eta_i/\sqrt{p_i}\right), \qquad (2)$$
leads to a parameter space in which $(p_1,\ldots,p_k,\gamma_1,\ldots,\gamma_k,\eta_1,\ldots,\eta_k)$ is constrained by
$$p_i,\eta_i \ge 0 \;(1\le i\le k), \qquad \sum_{i=1}^{k} p_i = 1, \qquad \sum_{i=1}^{k} \sqrt{p_i}\,\gamma_i = 0, \qquad \sum_{i=1}^{k} \{\eta_i^2 + \gamma_i^2\} = 1,$$
which implies $0 \le p_i \le 1$, $-1 \le \gamma_i \le 1$ and $0 \le \eta_i \le 1$ for all $i$. These constraints mean that $(\gamma_1,\ldots,\gamma_k,\eta_1,\ldots,\eta_k)$ belongs to the hypersphere of $\mathbb{R}^{2k}$ centered at the origin with radius $r = 1$, intersected with a hyperplane of this space passing through the origin, which again yields a sphere of radius 1 centered at the origin.

Spherical coordinate representation of the $\gamma_i$'s

Suppose that $\sum_{i=1}^{k}\gamma_i^2 = \varphi^2$. The vector $\gamma$ then belongs both to the hypersphere of radius $\varphi$ and to the hyperplane orthogonal to $(\sqrt{p_1},\ldots,\sqrt{p_k})$. An orthogonal basis $(\tilde\Lambda_1,\ldots,\tilde\Lambda_{k-1})$ of that hyperplane is given by
$$\tilde\Lambda_{1,j} = \begin{cases} -\sqrt{p_2}, & j = 1,\\ \sqrt{p_1}, & j = 2,\\ 0, & j > 2, \end{cases} \qquad \tilde\Lambda_{s,j} = \begin{cases} -(p_j p_{s+1})^{1/2}\big/\big(\sum_{l=1}^{s} p_l\big)^{1/2}, & s > 1,\; j \le s,\\ \big(\sum_{l=1}^{s} p_l\big)^{1/2}, & s > 1,\; j = s+1,\\ 0, & s > 1,\; j > s+1, \end{cases}$$
and the corresponding orthonormal basis is $F_s = \tilde\Lambda_s/\|\tilde\Lambda_s\|$.

Figure: spherical coordinates illustration (image from Robert Osserman).

In this basis, $(\gamma_1,\ldots,\gamma_k)$ can be written as
$$(\gamma_1,\ldots,\gamma_k) = \varphi\cos(\varpi_1)F_1 + \varphi\sin(\varpi_1)\cos(\varpi_2)F_2 + \cdots + \varphi\sin(\varpi_1)\cdots\sin(\varpi_{k-2})F_{k-1}$$
with angles $\varpi_1,\ldots,\varpi_{k-3}$ in $[0,\pi]$ and $\varpi_{k-2}$ in $[0,2\pi]$.

Foundational consequences: the resulting parameter space is compact, which helps in selecting improper and non-informative priors over mixtures.

Prior modeling

Global mean and variance: the posterior distribution associated with the prior $\pi(\mu,\sigma) = 1/\sigma$ is proper when (a) proper distributions are used on the other parameters and (b) there are at least two observations in the sample.

Component weights: $(p_1,\ldots,p_k) \sim \mathrm{Dir}(\alpha_0,\ldots,\alpha_0)$.

Angles $\varpi$: $\varpi_1,\ldots,\varpi_{k-3} \sim \mathcal{U}[0,\pi]$ and $\varpi_{k-2} \sim \mathcal{U}[0,2\pi]$.

Radius $\varphi$ and $\eta_1,\ldots,\eta_k$: if $k$ is small, $(\varphi^2,\eta_1^2,\ldots,\eta_k^2) \sim \mathrm{Dir}(\alpha,\ldots,\alpha)$, while for $k$ larger than 3, $(\eta_1,\ldots,\eta_k)$ is written through spherical coordinates,
$$\eta_i = \begin{cases} \sqrt{1-\varphi^2}\,\cos(\xi_1), & i = 1,\\ \sqrt{1-\varphi^2}\,\prod_{j=1}^{i-1}\sin(\xi_j)\,\cos(\xi_i), & 1 < i < k,\\ \sqrt{1-\varphi^2}\,\prod_{j=1}^{k-1}\sin(\xi_j), & i = k. \end{cases}$$
Unlike the $\varpi$'s, the support of all angles $\xi_1,\ldots,\xi_{k-1}$ is limited to $[0,\pi/2]$, due to the positivity requirement on the $\eta_i$'s, and $(\xi_1,\ldots,\xi_{k-1}) \sim \mathcal{U}([0,\pi/2]^{k-1})$.
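To make the mapping concrete, the following sketch (an illustration only, not the authors' code; the names `gamma_basis` and `natural_parameters` are invented here) builds the orthonormal basis $F_s$, converts $(\mu,\sigma,p,\varphi,\varpi,\xi)$ into the component means $\mu+\sigma\gamma_i/\sqrt{p_i}$ and standard deviations $\sigma\eta_i/\sqrt{p_i}$ of equation (2), and numerically checks the constraints $\sum_i\sqrt{p_i}\gamma_i = 0$ and $\sum_i(\gamma_i^2+\eta_i^2) = 1$.

```python
# Minimal sketch, assuming the spherical-coordinate construction above.
import numpy as np

def gamma_basis(p):
    """Orthonormal basis F_1,...,F_{k-1} of the hyperplane orthogonal to sqrt(p)."""
    k = len(p)
    F = np.zeros((k - 1, k))
    F[0, 0], F[0, 1] = -np.sqrt(p[1]), np.sqrt(p[0])
    for s in range(1, k - 1):                         # s = 2,...,k-1 in the poster's notation
        head = np.sqrt(p[: s + 1] * p[s + 1]) / np.sqrt(p[: s + 1].sum())
        F[s, : s + 1] = -head
        F[s, s + 1] = np.sqrt(p[: s + 1].sum())
    return F / np.linalg.norm(F, axis=1, keepdims=True)

def natural_parameters(mu, sigma, p, phi, varpi, xi):
    """Component means and standard deviations of the k-component mixture (2)."""
    k = len(p)
    # gamma: sphere of radius phi inside the hyperplane orthogonal to sqrt(p)
    coords = np.ones(k - 1)
    coords[: k - 2] *= np.cos(varpi)
    coords[1:] *= np.cumprod(np.sin(varpi))
    gamma = phi * coords @ gamma_basis(p)
    # eta >= 0: sphere of radius sqrt(1 - phi^2), angles xi restricted to [0, pi/2]
    coords = np.ones(k)
    coords[: k - 1] *= np.cos(xi)
    coords[1:] *= np.cumprod(np.sin(xi))
    eta = np.sqrt(1.0 - phi**2) * coords
    return mu + sigma * gamma / np.sqrt(p), sigma * eta / np.sqrt(p)

# quick check of the constraints for k = 3
p = np.array([0.5, 0.3, 0.2])
means, sds = natural_parameters(0.0, 1.0, p, phi=0.6,
                                varpi=np.array([1.2]), xi=np.array([0.4, 1.1]))
gamma = (means - 0.0) * np.sqrt(p) / 1.0      # invert mu_i = mu + sigma*gamma_i/sqrt(p_i)
eta = sds * np.sqrt(p) / 1.0                  # invert sigma_i = sigma*eta_i/sqrt(p_i)
print(np.round(np.sum(np.sqrt(p) * gamma), 6))    # constraint: ~0
print(np.round(np.sum(gamma**2 + eta**2), 6))     # constraint: ~1
```

Any state proposed on the constrained parameters by the sampler described next can be pushed through such a map before evaluating the mixture likelihood.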
MCMC algorithm

Metropolis-within-Gibbs algorithm for the reparametrised mixture model:

1. Generate initial values $(\mu^{(0)},\sigma^{(0)},p^{(0)},\varphi^{(0)},\xi_1^{(0)},\ldots,\xi_{k-1}^{(0)},\varpi_1^{(0)},\ldots,\varpi_{k-2}^{(0)})$.
2. For $t = 1,\ldots,T$, update $(\mu^{(t)},\sigma^{(t)},p^{(t)},\varphi^{(t)},\xi_1^{(t)},\ldots,\xi_{k-1}^{(t)},\varpi_1^{(t)},\ldots,\varpi_{k-2}^{(t)})$ as follows:
   2.1 Generate a proposal $\mu' \sim \mathcal{N}(\mu^{(t-1)},\varepsilon_\mu)$ and update $\mu^{(t)}$ against $\pi(\cdot\mid x,\sigma^{(t-1)},p^{(t-1)},\varphi^{(t-1)},\xi^{(t-1)},\varpi^{(t-1)})$.
   2.2 Generate a proposal $\log(\sigma)' \sim \mathcal{N}(\log(\sigma^{(t-1)}),\varepsilon_\sigma)$ and update $\sigma^{(t)}$ against $\pi(\cdot\mid x,\mu^{(t)},p^{(t-1)},\varphi^{(t-1)},\xi^{(t-1)},\varpi^{(t-1)})$.
   2.3 Generate a proposal $(\varphi^2)' \sim \mathrm{Beta}\big((\varphi^2)^{(t-1)}\varepsilon_\varphi + 1,\,(1-(\varphi^2)^{(t-1)})\varepsilon_\varphi + 1\big)$ and update $\varphi^{(t)}$ against $\pi(\cdot\mid x,\mu^{(t)},\sigma^{(t)},p^{(t-1)},\xi^{(t-1)},\varpi^{(t-1)})$.
   2.4 Generate a proposal $p' \sim \mathrm{Dir}(p_1^{(t-1)}\varepsilon_p + 1,\ldots,p_k^{(t-1)}\varepsilon_p + 1)$ and update $p^{(t)}$ against $\pi(\cdot\mid x,\mu^{(t)},\sigma^{(t)},\varphi^{(t)},\xi^{(t-1)},\varpi^{(t-1)})$.
   2.5 Generate proposals $\xi_i' \sim \mathcal{U}[\xi_i^{(t-1)}-\varepsilon_\xi,\,\xi_i^{(t-1)}+\varepsilon_\xi]$, $i = 1,\ldots,k-1$, and update $(\xi_1^{(t)},\ldots,\xi_{k-1}^{(t)})$ against $\pi(\cdot\mid x,\mu^{(t)},\sigma^{(t)},p^{(t)},\varphi^{(t)},\varpi^{(t-1)})$.
   2.6 Generate proposals $\varpi_i' \sim \mathcal{U}[\varpi_i^{(t-1)}-\varepsilon_\varpi,\,\varpi_i^{(t-1)}+\varepsilon_\varpi]$, $i = 1,\ldots,k-2$, and update $(\varpi_1^{(t)},\ldots,\varpi_{k-2}^{(t)})$ against $\pi(\cdot\mid x,\mu^{(t)},\sigma^{(t)},p^{(t)},\varphi^{(t)},\xi^{(t)})$.
Here $p^{(t)} = (p_1^{(t)},\ldots,p_k^{(t)})$, $x = (x_1,\ldots,x_n)$, $\xi^{(t)} = (\xi_1^{(t)},\ldots,\xi_{k-1}^{(t)})$ and $\varpi^{(t)} = (\varpi_1^{(t)},\ldots,\varpi_{k-2}^{(t)})$.

Ultimixt package

▸ Implements the Metropolis-within-Gibbs algorithm for the reparametrised mixture distribution;
▸ Calibrates the scales of the various proposals by targeting an average acceptance rate of either 0.44 or 0.234, depending on the dimension of the simulated parameter;
▸ Accurately estimates the component parameters.

Point estimation of the component parameters in the presence of label switching:
▸ K-means clustering algorithm;
▸ Reordering of the labels towards the shortest distance between the current posterior sample and the (or a) maximum posterior probability (MAP) estimate [1].

Mixture of two normal distributions

A sample of size 50 is simulated from $0.65\,\mathcal{N}(-8,2) + 0.35\,\mathcal{N}(-0.5,1)$.

Figure: empirical densities of 10 Metropolis-within-Gibbs sequences run in parallel for $2\times10^5$ iterations.

▸ The outcomes of 10 parallel chains, started from different random starting values, are indistinguishable;
▸ The chains are well mixed;
▸ The sampler output covers the entire sample space;
▸ The estimated densities converge to a neighborhood of the true values;
▸ The estimated mixture density is remarkably smooth.

Mixture of three normal distributions

A sample of size 50 is simulated from $0.27\,\mathcal{N}(-4.5,1) + 0.4\,\mathcal{N}(10,1) + 0.33\,\mathcal{N}(3,1)$.

Figure: sequences of $\mu_i$, $\sigma_i$ and $p_i$, and the mixture density estimate based on $10^4$ MCMC iterations.

Overfitting case

Figure: extreme-valued posterior samples for an overfitted model. Galaxy dataset: point estimators of the parameters of a mixture with (left) 6 components and (right) 4 components.
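As a closing illustration, here is a minimal sketch of step 2.1 together with a crude version of the proposal-scale calibration mentioned in the Ultimixt bullets (targeting an acceptance rate of 0.44 for this one-dimensional update). It reuses the hypothetical `natural_parameters` helper from the earlier sketch; the names `log_posterior` and `update_mu` and the exponential adaptation rule are assumptions of this sketch, not the Ultimixt implementation. The improper prior $\pi(\mu,\sigma) = 1/\sigma$ enters the log-posterior as $-\log\sigma$.

```python
import numpy as np

def log_posterior(x, mu, sigma, p, phi, varpi, xi):
    """Log-posterior up to a constant: Gaussian mixture likelihood in the
    reparametrised coordinates plus the improper prior pi(mu, sigma) = 1/sigma."""
    means, sds = natural_parameters(mu, sigma, p, phi, varpi, xi)  # helper from the earlier sketch
    comp = (-0.5 * ((x[:, None] - means) / sds) ** 2
            - np.log(sds) - 0.5 * np.log(2 * np.pi) + np.log(p))
    return np.sum(np.logaddexp.reduce(comp, axis=1)) - np.log(sigma)

def update_mu(x, state, eps_mu, rng):
    """One Metropolis step for mu (step 2.1); returns the new mu and the accept flag."""
    prop = dict(state, mu=rng.normal(state["mu"], eps_mu))
    log_ratio = log_posterior(x, **prop) - log_posterior(x, **state)
    accept = np.log(rng.uniform()) < log_ratio
    return (prop["mu"] if accept else state["mu"]), accept

# Toy run on data mimicking the two-component example above, with a crude
# burn-in adaptation of eps_mu towards a 0.44 average acceptance rate.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-8, 2, 33), rng.normal(-0.5, 1, 17)])
state = dict(mu=x.mean(), sigma=x.std(), p=np.array([0.65, 0.35]),
             phi=0.5, varpi=np.array([]), xi=np.array([0.7]))
eps_mu, accepted = 1.0, 0
for t in range(1, 2001):
    state["mu"], acc = update_mu(x, state, eps_mu, rng)
    accepted += acc
    if t % 100 == 0:
        eps_mu *= np.exp(accepted / 100 - 0.44)   # grow/shrink the proposal scale
        accepted = 0
print(round(eps_mu, 3), round(state["mu"], 3))
```

The same accept/reject pattern applies to the remaining steps 2.2 to 2.6, with the multivariate blocks calibrated towards 0.234 instead.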

References

[1] S. Frühwirth-Schnatter (2001). Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. American Statist. Assoc., 96, 194–209.
[2] K. Mengersen and C. Robert (1996). Testing for mixtures: A Bayesian entropic approach (with discussion). In Bayesian Statistics 5 (J. Berger, J. Bernardo, A. Dawid, D. Lindley and A. Smith, eds.). Oxford University Press, Oxford, 255–276.
[3] J. Diebolt and C. Robert (1994). Estimation of finite mixture distributions by Bayesian sampling. J. Royal Statist. Society Series B, 56, 363–375.

Contact: kamary@ceremade.dauphine.fr