BAYSM'14, Wien, Austria


Talk at BAYSM'14 on mixtures, with some results and considerations on the topic, hopefully not too soporific for the neXt generation!

  1. My life as a mixture. Christian P. Robert, Université Paris-Dauphine, Paris, and University of Warwick, Coventry. September 17, 2014. bayesianstatistics@gmail.com
  2. Your next Valencia meeting: the major meeting of the Objective Bayes section of ISBA, O-Bayes 2015 in Valencia, Spain, June 1-4(+1), 2015, in memory of our friend Susie Bayarri; objective Bayes, limited information, partly defined and approximate models, etc.; all flavours of Bayesian analysis welcomed! Spain in June, what else...?!
  3. Outline: Gibbs sampling; weakly informative priors; imperfect sampling; SAME algorithm; perfect sampling; Bayes factor; less informative prior; no Bayes factor
  4. birthdate: May 1989, Ottawa Civic Hospital. Repartition of grey levels in an unprocessed chest radiograph [X, 1994]
  5. Mixture models. Structure of mixtures of distributions: x ~ f_j with probability p_j, for j = 1, 2, . . . , k, with overall density p_1 f_1(x) + ... + p_k f_k(x). Usual case: parameterised components ∑_{i=1}^k p_i f(x|θ_i), with ∑_{i=1}^k p_i = 1, where the weights p_i are distinguished from the other parameters
  6. Motivations: dataset made of several latent/missing/unobserved groups/strata/subpopulations, the mixture structure being due to the missing origin/allocation of each observation to a specific subpopulation/stratum, with inference on either the allocations (clustering), or on the parameters (θ_i, p_i), or on the number of groups; alternatively, a semiparametric perspective where mixtures are functional basis approximations of unknown distributions
  7. License: dataset derived from [my] license plate image, grey levels concentrated on 256 values [later jittered] [Marin & X, 2007]
  8. Likelihood. For a sample of independent random variables (x_1, . . . , x_n), the likelihood is ∏_{i=1}^n {p_1 f_1(x_i) + ... + p_k f_k(x_i)}. Expanding this product involves k^n elementary terms: prohibitive to compute in large samples. But the likelihood is still computable [pointwise] in O(kn) time [see the Python sketch after this slide list].
  9. Normal mean benchmark: normal mixture p N(μ_1, 1) + (1 - p) N(μ_2, 1) with only the means unknown (2-D representation possible). Identifiability: the parameters μ_1 and μ_2 are identifiable: μ_1 cannot be confused with μ_2 when p is different from 0.5. Presence of a spurious mode, understood by letting p go to 0.5
  10. Bayesian inference on mixtures. For any prior π(θ, p), the posterior distribution of (θ, p) is available up to a multiplicative constant, π(θ, p|x) ∝ [∏_{i=1}^n ∑_{j=1}^k p_j f(x_i|θ_j)] π(θ, p), at a cost of order O(kn). Difficulty: despite this, the derivation of posterior characteristics such as posterior expectations is only possible in an exponential time of order O(k^n)!
  11. Missing variable representation. Associate with each x_i a missing/latent variable z_i that indicates its component: z_i|p ~ M_k(p_1, . . . , p_k) and x_i|z_i, θ ~ f(·|θ_{z_i}). Completed likelihood ℓ(θ, p|x, z) = ∏_{i=1}^n p_{z_i} f(x_i|θ_{z_i}), and π(θ, p|x, z) ∝ [∏_{i=1}^n p_{z_i} f(x_i|θ_{z_i})] π(θ, p), where z = (z_1, . . . , z_n)
  12. Gibbs sampling for mixture models takes advantage of the missing data structure. Algorithm: Initialization: choose p^(0) and θ^(0) arbitrarily. Step t (t = 1, . . .): 1. Generate z_i^(t) (i = 1, . . . , n) from P(z_i^(t) = j | p_j^(t-1), θ_j^(t-1), x_i) ∝ p_j^(t-1) f(x_i|θ_j^(t-1)) (j = 1, . . . , k); 2. Generate p^(t) from π(p|z^(t)); 3. Generate θ^(t) from π(θ|z^(t), x). [Brooks & Gelman, 1990; Diebolt & X, 1990, 1994; Escobar & West, 1991]
  13. Normal mean example (cont'd). Algorithm: Initialization: choose μ_1^(0) and μ_2^(0). Step t (t = 1, . . .): 1. Generate z_i^(t) (i = 1, . . . , n) from P(z_i^(t) = 1) = 1 - P(z_i^(t) = 2) ∝ p exp(-(x_i - μ_1^(t-1))²/2); 2. Compute n_j^(t) = ∑_{i=1}^n I{z_i^(t) = j} and (s_x^j)^(t) = ∑_{i=1}^n I{z_i^(t) = j} x_i; 3. Generate μ_j^(t) (j = 1, 2) from N((λδ + (s_x^j)^(t))/(λ + n_j^(t)), 1/(λ + n_j^(t))) [a Python sketch of this sampler follows the slide list].
  14. Normal mean example (cont'd): [figure over (μ_1, μ_2): (a) initialised at random] [X & Casella, 2009]
  15. Normal mean example (cont'd): [figures over (μ_1, μ_2): (a) initialised at random, (b) initialised close to the lower mode] [X & Casella, 2009]
  16. License: consider k = 3 components, a D_3(1/2, 1/2, 1/2) prior for the weights, a N(x̄, σ̂²/3) prior on the means μ_i and a Ga(10, σ̂²) prior on the precisions σ_i^{-2}, where x̄ and σ̂² are the empirical mean and variance of License [Empirical Bayes] [Marin & X, 2007]
  17. Outline: Gibbs sampling; weakly informative priors; imperfect sampling; SAME algorithm; perfect sampling; Bayes factor; less informative prior; no Bayes factor
  18. weakly informative priors: possible symmetric empirical Bayes priors, namely a symmetric Dirichlet prior on the weights p, normal priors on the θ_i centred at empirical estimates, and Gamma priors on the precisions σ_i^{-2}, which can be replaced with hierarchical priors [Diebolt & X, 1990; Richardson & Green, 1997]; independent improper priors on the θ_j are prohibited, so standard flat and Jeffreys priors are impossible to use (except with the exclude-empty-component trick) [Diebolt & X, 1990; Wasserman, 1999]
  19. weakly informative priors: reparameterization to a compact set for use of uniform priors
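
The O(kn) pointwise evaluation mentioned on the likelihood slide can be made concrete with a short sketch. This is not code from the talk: it is a minimal Python/NumPy illustration assuming Gaussian components, and the helper name mixture_loglik is mine. The point is simply that each observation needs one sum over the k components, so the full log-likelihood costs O(kn) rather than the k^n terms of the expanded product.

```python
# Minimal sketch (not from the talk): pointwise evaluation of a
# k-component Gaussian-mixture log-likelihood in O(kn).
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mixture_loglik(x, p, mu, sigma):
    """log prod_i { sum_j p_j N(x_i | mu_j, sigma_j^2) }, one logsumexp per x_i."""
    x = np.asarray(x, dtype=float)
    # (n, k) array of log p_j + log f_j(x_i)
    log_terms = np.log(p) + norm.logpdf(x[:, None], loc=mu, scale=sigma)
    # sum over components inside the log, then over observations outside it
    return logsumexp(log_terms, axis=1).sum()

# toy data from 0.3 N(0, 1) + 0.7 N(2.5, 1)
rng = np.random.default_rng(0)
z = rng.random(500) < 0.3
x = np.where(z, rng.normal(0.0, 1.0, 500), rng.normal(2.5, 1.0, 500))
print(mixture_loglik(x, p=[0.3, 0.7], mu=[0.0, 2.5], sigma=[1.0, 1.0]))
```

Working on the log scale with logsumexp avoids underflow when a component gives an observation negligible density; the cost stays linear in both k and n.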
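
The two-step Gibbs sampler of the normal mean example (allocations z, then means μ_1, μ_2) can also be sketched. Again this is an illustration rather than the talk's code: the weight p is held fixed at a known value, each mean gets a conjugate N(δ, 1/λ) prior whose hyperparameter names are an assumption here (the slide's symbols are garbled in this transcript), and gibbs_two_means is a hypothetical function name.

```python
# Sketch (assumptions flagged above) of the Gibbs sampler for the
# benchmark p N(mu1, 1) + (1 - p) N(mu2, 1) with known weight p,
# unknown means, and illustrative conjugate priors mu_j ~ N(delta, 1/lam).
import numpy as np

def gibbs_two_means(x, p=0.3, delta=0.0, lam=0.1, n_iter=2000, seed=1):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])           # arbitrary initialisation
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # 1. allocations: component 0 (the slide's component 1) has weight p,
        #    P(z_i = 0) proportional to p exp(-(x_i - mu_0)^2 / 2), and similarly for 1
        w0 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
        w1 = (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
        z = (rng.random(x.size) * (w0 + w1) > w0).astype(int)
        # 2.-3. sufficient statistics and conjugate updates of the means:
        #    mu_j | z, x ~ N((lam*delta + s_j) / (lam + n_j), 1 / (lam + n_j))
        for j in (0, 1):
            alloc = (z == j)
            n_j, s_j = alloc.sum(), x[alloc].sum()
            mu[j] = rng.normal((lam * delta + s_j) / (lam + n_j),
                               1.0 / np.sqrt(lam + n_j))
        chain[t] = mu
    return chain

# usage on simulated two-component data, discarding the first half as burn-in
rng = np.random.default_rng(0)
z = rng.random(500) < 0.3
x = np.where(z, rng.normal(0.0, 1.0, 500), rng.normal(2.5, 1.0, 500))
print(gibbs_two_means(x)[1000:].mean(axis=0))
```

Starting chains from different points, as on the figure slides, shows whether the sampler visits the secondary (spurious) mode that the benchmark slide mentions.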
