Computation of the marginal likelihood

First talk at the BigMC seminar on 06/01/2011 (Institut Henri Poincaré, Paris), by Jean-Louis Foulley, INRA, on "Computation of the marginal likelihood".


  1. Computation of the marginal likelihood: brief summary and the method of power posteriors. Jean-Louis Foulley (jean-louis.foulley@jouy.inra.fr), BigMC seminar, 06/01/2011.
  2. Outline: Objectives; brief summary of current methods (direct Monte Carlo, harmonic mean, generalized harmonic mean, Chib, bridge sampling, nested sampling); power posteriors (relationship with the fractional BF, algorithm, examples); conclusion.
  3. Objectives. The marginal likelihood ("prior predictive", "evidence") is $m(y) = \int_\Theta f(y|\theta)\,\pi(\theta)\,d\theta$. It is the normalization constant of the posterior, $\pi(\theta|y) = \pi^*(\theta|y)/m(y)$ with $\pi^*(\theta|y) = f(y|\theta)\,\pi(\theta)$, and a component of the Bayes factor $BF_{12} = \dfrac{\pi(M_1|y)/\pi(M_2|y)}{\pi(M_1)/\pi(M_2)} = \dfrac{m_1(y)}{m_2(y)}$, with $\Delta D_{m,12} = -2\ln BF_{12} = D_{m,1} - D_{m,2}$, where $D_{m,j} = -2\ln m_j(y)$ is the marginal deviance. Calibration: Jeffreys & Turing (deciban: $10\log_{10} BF$).
  4. Methods/Monte Carlo, harmonic mean. 1) Direct Monte Carlo: $\hat m_{MC}(y) = \frac{1}{G}\sum_{g=1}^{G} f(y|\theta^{(g)})$, with $\theta^{(1)},\dots,\theta^{(G)}$ draws from the prior $\pi(\theta)$. Converges (a.s.) to $m(y)$ but is very inefficient: many samples fall outside the regions of high likelihood. 2) Harmonic mean (Newton & Raftery, 1994): $\hat m_{NR}(y) = \left[\frac{1}{G}\sum_{g=1}^{G} 1/f(y|\theta^{(g)})\right]^{-1}$, with $\theta^{(1)},\dots,\theta^{(G)}$ draws from the posterior $\pi(\theta|y)$; a special case of weighted importance sampling, $\sum_{j=1}^{J} f(y|\theta^{(j)})\,w(\theta^{(j)}) \big/ \sum_{j=1}^{J} w(\theta^{(j)})$ with $w(\theta^{(j)}) \propto \pi(\theta^{(j)})/g(\theta^{(j)})$ for $g(\theta) \propto f(y|\theta)\,\pi(\theta)$. Converges (a.s.) but is very unstable (infinite variance): to be absolutely avoided, the "Worst Monte Carlo Method Ever" (Radford Neal, 2010). The harmonic mean is hardly affected by a change of prior, whereas the true marginal is highly sensitive to it.
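To make the contrast concrete, here is a minimal sketch (not from the talk) comparing the two estimators on a small conjugate normal toy model, $y_i \sim N(\theta,1)$ with $\theta \sim N(\mu,\tau^2)$, for which the exact $\log m(y)$ is available in closed form; the data, sample sizes and prior settings are illustrative assumptions, and the same toy model is reused in later sketches.

```python
# Sketch (assumed example): direct Monte Carlo vs harmonic mean estimators of log m(y)
# for y_i ~ N(theta, 1), theta ~ N(mu, tau^2), where log m(y) is known exactly.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, mu, tau = 20, 0.0, 3.0
y = rng.normal(1.0, 1.0, size=N)
ybar, s2 = y.mean(), y.var()

def loglik(theta):
    # log f(y | theta) for a vector of draws theta
    return norm.logpdf(y[:, None], theta, 1.0).sum(axis=0)

# Exact log marginal: y ~ N(mu*1, I + tau^2*J), |I + tau^2*J| = 1 + N*tau^2
exact = (-0.5 * N * np.log(2 * np.pi) - 0.5 * np.log(1 + N * tau**2)
         - 0.5 * (N * s2 + N * (ybar - mu)**2 / (1 + N * tau**2)))

G = 100_000
# 1) Direct Monte Carlo: average the likelihood over prior draws
theta_prior = rng.normal(mu, tau, size=G)
log_mc = np.logaddexp.reduce(loglik(theta_prior)) - np.log(G)

# 2) Harmonic mean: inverse of the average inverse likelihood over posterior draws
post_var = 1.0 / (N + tau**-2)
post_mean = (N * ybar + mu / tau**2) * post_var
theta_post = rng.normal(post_mean, np.sqrt(post_var), size=G)
log_hm = np.log(G) - np.logaddexp.reduce(-loglik(theta_post))

print(f"exact {exact:.3f}   direct MC {log_mc:.3f}   harmonic mean {log_hm:.3f}")
```

Across repeated runs the direct MC estimate is noisy but aims at the truth, while the harmonic mean tends to overestimate $\log m(y)$ and barely moves when $\tau$ is changed, which is exactly the behaviour criticized above.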
  5. Methods/Gelfand & Dey and Chib. 3) Generalized harmonic mean (Gelfand & Dey, 1994; Chen & Shao, 1997): $\hat m_{GD}(y) = \left[\frac{1}{G}\sum_{g=1}^{G} \dfrac{g(\theta^{(g)})}{f(y|\theta^{(g)})\,\pi(\theta^{(g)})}\right]^{-1}$, with $\theta^{(1)},\dots,\theta^{(G)}$ draws from $\pi(\theta|y)$ and $g(\cdot)$ an approximation of the posterior; problems in large dimension. 4) Chib's method (1995): from the identity $\ln m(y) = \ln f(y|\theta) + \ln\pi(\theta) - \ln\pi(\theta|y)$, valid for any $\theta$, take $\ln \hat m_{SC}(y) = \ln f(y|\theta^*) + \ln\pi(\theta^*) - \ln\hat\pi(\theta^*|y)$, where $\hat\pi(\theta^*|y)$ has to be estimated and $\theta^*$ is a selected high-posterior-density point (ML, MAP or $E(\theta|y)$). Simple and often effective.
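A minimal sketch of the Gelfand-Dey estimator on the same toy model (again an illustrative assumption, not the talk's example), with $g$ taken as a normal density fitted to the posterior draws:

```python
# Sketch (assumed example): Gelfand-Dey estimator with g = normal fitted to posterior
# draws, for y_i ~ N(theta, 1), theta ~ N(mu, tau^2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
N, mu, tau = 20, 0.0, 3.0
y = rng.normal(1.0, 1.0, size=N)

post_var = 1.0 / (N + tau**-2)
post_mean = (N * y.mean() + mu / tau**2) * post_var
theta = rng.normal(post_mean, np.sqrt(post_var), size=50_000)  # posterior draws

log_f = norm.logpdf(y[:, None], theta, 1.0).sum(axis=0)        # log f(y|theta_g)
log_prior = norm.logpdf(theta, mu, tau)                        # log pi(theta_g)
log_g = norm.logpdf(theta, theta.mean(), theta.std())          # g fitted to the draws

# m_GD^{-1} = (1/G) * sum_g g(theta_g) / [f(y|theta_g) * pi(theta_g)]
log_m_gd = np.log(len(theta)) - np.logaddexp.reduce(log_g - log_f - log_prior)
print(f"Gelfand-Dey log m(y) ~ {log_m_gd:.3f}")
```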
  6. Chib (cont.). 4) Chib (1995): $\ln \hat m_{SC}(y) = \ln f(y|\theta^*) + \ln\pi(\theta^*) - \ln\hat\pi(\theta^*|y)$. The posterior ordinate $\hat\pi(\theta^*|y)$ can be estimated by: a) Gibbs sampling and Rao-Blackwellization (Chib, 1995); b) Metropolis-Hastings output (Chib & Jeliazkov, 2001); c) a kernel estimator (Chen, 1994).
  7. Chib via Gibbs. If $\theta = (\theta_1, \theta_2)$, then $\pi(\theta_1,\theta_2|y) = \pi(\theta_1|y,\theta_2)\,\pi(\theta_2|y)$: the first factor is known (full conditional), the second has to be estimated. Since $\pi(\theta_2|y) = \int \pi(\theta_2|y,\theta_1)\,\pi(\theta_1|y)\,d\theta_1$, the known conditional can be averaged over MCMC draws ("estimation by Rao-Blackwellization"): $\hat\pi(\theta_2^*|y) = \frac{1}{G}\sum_{g=1}^{G} \pi(\theta_2^*|y,\theta_1^{(g)})$, where the $\theta_1^{(g)}$ are draws from $\pi(\theta_1|y)$.
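A hedged sketch of this two-block case on a semi-conjugate normal model with unknown mean and variance (the model, priors and data below are illustrative assumptions, not the talk's example):

```python
# Sketch (assumed example): Chib (1995) from Gibbs output with Rao-Blackwellization for
# y_i ~ N(mu, sig2), mu ~ N(m0, v0), sig2 ~ InvGamma(a0, b0).
import numpy as np
from scipy.stats import norm, invgamma

rng = np.random.default_rng(3)
m0, v0, a0, b0 = 0.0, 10.0, 2.0, 2.0
y = rng.normal(1.0, 1.5, size=30)
N, ybar = len(y), y.mean()

def mu_cond(sig2):                      # mu | y, sig2 is N(m, v), known in closed form
    v = 1.0 / (N / sig2 + 1.0 / v0)
    return (N * ybar / sig2 + m0 / v0) * v, v

def sig2_cond(mu):                      # sig2 | y, mu is InvGamma(a, b)
    return a0 + N / 2.0, b0 + 0.5 * np.sum((y - mu) ** 2)

# Gibbs sampler (burn-in omitted for brevity)
G = 5000
mu_draws, sig2_draws = np.empty(G), np.empty(G)
mu, sig2 = ybar, y.var()
for g in range(G):
    m, v = mu_cond(sig2)
    mu = rng.normal(m, np.sqrt(v))
    a, b = sig2_cond(mu)
    sig2 = invgamma.rvs(a, scale=b, random_state=rng)
    mu_draws[g], sig2_draws[g] = mu, sig2

# Chib: ln m(y) = ln f(y|mu*,sig2*) + ln pi(mu*,sig2*) - ln pi(mu*,sig2*|y),
# with pi(mu*,sig2*|y) = pi(mu*|y) * pi(sig2*|y,mu*)
mu_star, sig2_star = np.median(mu_draws), np.median(sig2_draws)   # high-density point
loglik = norm.logpdf(y, mu_star, np.sqrt(sig2_star)).sum()
logprior = (norm.logpdf(mu_star, m0, np.sqrt(v0))
            + invgamma.logpdf(sig2_star, a0, scale=b0))

# pi(mu*|y): Rao-Blackwellized average of the known conditional over the sig2 draws
cond = np.empty(G)
for g in range(G):
    m, v = mu_cond(sig2_draws[g])
    cond[g] = norm.logpdf(mu_star, m, np.sqrt(v))
log_mu_post = np.logaddexp.reduce(cond) - np.log(G)

a_s, b_s = sig2_cond(mu_star)
log_sig2_post = invgamma.logpdf(sig2_star, a_s, scale=b_s)   # exact full conditional

log_m_chib = loglik + logprior - (log_mu_post + log_sig2_post)
print(f"Chib log m(y) ~ {log_m_chib:.3f}")
```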
  8. Bridge sampling. 5) Bridge sampling (Meng & Wong, 1996) rests on the identity $\dfrac{\int \alpha(\theta)\,g(\theta)\,f(y|\theta)\,\pi(\theta)/m(y)\,d\theta}{\int \alpha(\theta)\,g(\theta)\,\pi(\theta|y)\,d\theta} = 1$, which holds for any bridge function $\alpha(\theta)$ and auxiliary density $g(\theta)$ since $\pi(\theta|y) = f(y|\theta)\,\pi(\theta)/m(y)$; rearranging gives the estimating form on the next slide.
  9. Bridge sampling (cont.). 5) Bridge sampling (Meng & Wong, 1996): $m(y) = \dfrac{\int \alpha(\theta)\,f(y|\theta)\,\pi(\theta)\,g(\theta)\,d\theta}{\int \alpha(\theta)\,g(\theta)\,\pi(\theta|y)\,d\theta} = \dfrac{E_{g(\theta)}\big[\alpha(\theta)\,f(y|\theta)\,\pi(\theta)\big]}{E_{\pi(\theta|y)}\big[\alpha(\theta)\,g(\theta)\big]}$, where $\alpha(\theta)$ is the "bridge function" and $g(\theta)$ a density to be calibrated. For $\alpha(\theta) = 1/g(\theta)$: $\hat m_{BS1}(y) = L^{-1}\sum_{l=1}^{L} f(y|\theta^{(l)})\,\pi(\theta^{(l)})/g(\theta^{(l)})$ (importance sampling). For $\alpha(\theta) = 1/[f(y|\theta)\,\pi(\theta)]$: $\hat m_{BS2}(y)$ = Gelfand-Dey (1994). For $\alpha(\theta) = 1/[f(y|\theta)\,\pi(\theta)\,g(\theta)]^{1/2}$: $\hat m_{BS3}(y)$ = Lopes-West (2004), $\hat m_{BS3}(y) = \dfrac{L^{-1}\sum_{l=1}^{L}\big[f(y|\theta^{(l)})\,\pi(\theta^{(l)})/g(\theta^{(l)})\big]^{1/2}}{M^{-1}\sum_{m=1}^{M}\big[g(\theta^{(m)})/f(y|\theta^{(m)})\,\pi(\theta^{(m)})\big]^{1/2}}$, with $\theta^{(l)}$ draws from $g(\theta)$ and $\theta^{(m)}$ draws from $\pi(\theta|y)$.
  10. Bridge sampling (cont.). 5) Bridge sampling (Meng & Wong, 1996), same identity as above. For $\alpha(\theta) = 1/[f(y|\theta)\,\pi(\theta)\,g(\theta)]$: $\hat m_{BS4}(y) = \dfrac{L^{-1}\sum_{l=1}^{L} 1/g(\theta^{(l)})}{M^{-1}\sum_{m=1}^{M} 1/[f(y|\theta^{(m)})\,\pi(\theta^{(m)})]}$ (Lopes & West, 2004; Ando, 2010), with $\theta^{(l)}$ draws from $g(\theta)$ and $\theta^{(m)}$ draws from $\pi(\theta|y)$ (note the odd pairing: the numerator uses the $g$ draws). For $\alpha(\theta) \propto \big[s_M\,\pi(\theta|y) + s_L\,g(\theta)\big]^{-1}$, the optimal estimator with respect to expected RMSE (Meng & Wong, 1996; Lopes & West, 2004; Frühwirth-Schnatter, 2004) is obtained iteratively: $\hat m_{BS5}^{(t+1)}(y) = \hat m_{BS5}^{(t)}(y)\; \dfrac{L^{-1}\sum_{l=1}^{L} \dfrac{\hat\pi_t(\theta^{(l)}|y)}{s_M\,\hat\pi_t(\theta^{(l)}|y) + s_L\,g(\theta^{(l)})}}{M^{-1}\sum_{m=1}^{M} \dfrac{g(\theta^{(m)})}{s_M\,\hat\pi_t(\theta^{(m)}|y) + s_L\,g(\theta^{(m)})}}$, where $\hat\pi_t(\theta|y) = f(y|\theta)\,\pi(\theta)/\hat m_{BS5}^{(t)}(y)$, $\hat m_{BS5}^{(0)} = \hat m_{BS1}$ or $\hat m_{BS2}$, and $s_M = 1 - s_L = M/(M+L)$.
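A hedged sketch of the iterative optimal-bridge estimator on the conjugate normal toy model, with $g$ a (deliberately crude) normal approximation of the posterior; all settings are illustrative assumptions:

```python
# Sketch (assumed example): iterative optimal bridge sampling estimate of log m(y)
# for y_i ~ N(theta, 1), theta ~ N(mu, tau^2), with g = normal approx of the posterior.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(4)
Nobs, mu, tau = 20, 0.0, 3.0
y = rng.normal(1.0, 1.0, size=Nobs)

def log_q(theta):                       # unnormalized posterior f(y|theta) * pi(theta)
    return (norm.logpdf(y[:, None], theta, 1.0).sum(axis=0)
            + norm.logpdf(theta, mu, tau))

post_var = 1.0 / (Nobs + tau**-2)
post_mean = (Nobs * y.mean() + mu / tau**2) * post_var
g_mean, g_sd = post_mean, 1.5 * np.sqrt(post_var)        # deliberately crude g

L = M = 20_000
th_g = rng.normal(g_mean, g_sd, size=L)                  # draws from g
th_p = rng.normal(post_mean, np.sqrt(post_var), size=M)  # draws from the posterior

lq_g, lg_g = log_q(th_g), norm.logpdf(th_g, g_mean, g_sd)
lq_p, lg_p = log_q(th_p), norm.logpdf(th_p, g_mean, g_sd)
sM, sL = M / (M + L), L / (M + L)

log_m = logsumexp(lq_g - lg_g) - np.log(L)               # start from the IS estimate
for _ in range(50):                                      # iterate to the fixed point
    num = logsumexp(lq_g - log_m - np.logaddexp(np.log(sM) + lq_g - log_m,
                                                np.log(sL) + lg_g)) - np.log(L)
    den = logsumexp(lg_p - np.logaddexp(np.log(sM) + lq_p - log_m,
                                        np.log(sL) + lg_p)) - np.log(M)
    log_m = log_m + num - den
print(f"bridge sampling log m(y) ~ {log_m:.3f}")
```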
  11. Nested sampling. 6) Nested sampling (Skilling, 2006; Murray et al., 2006; Chopin & Robert, 2010): $m(y) = \int f(y|\theta)\,\pi(\theta)\,d\theta = E_\pi[L(\theta)] \equiv Z$, with $L(\theta) = f(y|\theta)$. Let $x = \varphi^{-1}(l) = \Pr[L(\theta) > l]$ be the survival function of the random variable $L(\theta)$, with $l = \varphi(x)$ its (upper-tail) quantile function, so that $x \sim U(0,1)$. Then $Z = \int_0^1 \varphi(x)\,dx$ (the area under the curve $l = \varphi(x)$) and $\hat Z = \sum_{i=1}^{m} \Delta x_i\, l_i$, with $\Delta x_i = x_{i-1} - x_i$, or $\Delta x_i = \tfrac{1}{2}(x_{i-1} - x_{i+1})$ for trapezoidal integration.
  12. Nested sampling (cont.). 1) Draw $N$ points $\theta_{1,i}$ from the prior, set $\theta_1 = \mathrm{Argmin}_{i=1,\dots,N}\, L(\theta_{1,i})$ and $l_1 = L(\theta_1)$. 2) Obtain $N$ points $\theta_{2,i}$ by keeping the $\theta_{1,i}$ except $\theta_1$, which is replaced by a draw from the prior constrained by $L(\theta) > l_1$; record $\theta_2 = \mathrm{Argmin}_{i}\, L(\theta_{2,i})$ and set $l_2 = L(\theta_2)$. 3) Repeat steps 1 & 2 until a stopping rule is met (change in the maximum of $L$ below $\varepsilon$). Since $x_i = \varphi^{-1}(l_i)$ is unknown, set it a) deterministically, $x_i = \exp(-i/N)$, so that $\ln x_i = E\big[\ln \varphi^{-1}(l_i)\big]$, or b) randomly, $x_{i+1} = t_i x_i$ with $x_0 = 1$ and $t_i \sim Be(N,1)$. The main difficulty is sampling $\theta$ from the prior constrained by $L(\theta) > l$; see Chopin & Robert (2010) for an extended importance sampling scheme, $\hat Z = \sum_{i=1}^{m} \Delta x_i\,\varphi_i\, w_i$, with weights $w(\theta)$ defined so that $\pi(\theta)\,L(\theta) = \tilde\pi(\theta)\,L(\theta)\,w(\theta)$ for an instrumental density $\tilde\pi$.
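A hedged sketch of this loop on a one-dimensional toy problem (uniform prior, normal likelihood, both illustrative assumptions). The constrained-prior step is handled with a crude proposal over the current live-point range, which is only an approximation; exact constrained sampling is precisely the hard part mentioned above.

```python
# Sketch (assumed example): basic nested sampling on a 1D problem,
# prior theta ~ U(-5, 5), likelihood y_i ~ N(theta, 1), deterministic x_i = exp(-i/N).
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(5)
y = rng.normal(1.0, 1.0, size=20)
def logL(th):
    return norm.logpdf(y[:, None], th, 1.0).sum(axis=0)

Npts = 200
theta = rng.uniform(-5, 5, size=Npts)            # N live points drawn from the prior
ll = logL(theta)
logZ, log_x_prev = -np.inf, 0.0                  # running evidence, log x_0 = 0

for i in range(1, 5000):
    worst = np.argmin(ll)                        # l_i = lowest live likelihood
    log_x = -i / Npts                            # deterministic shrinkage x_i = e^{-i/N}
    log_dx = np.log(np.exp(log_x_prev) - np.exp(log_x))
    logZ = np.logaddexp(logZ, ll[worst] + log_dx)
    log_x_prev = log_x
    # Replace the worst point by a draw from the prior restricted to L(theta) > l_i.
    # Shortcut: propose uniformly over the (slightly expanded) live-point range; this
    # only approximates the constrained prior, and exact constrained sampling is the
    # hard part in general (cf. Chopin & Robert, 2010).
    lo, hi = theta.min(), theta.max()
    lo, hi = lo - 0.1 * (hi - lo), hi + 0.1 * (hi - lo)
    while True:
        cand = rng.uniform(max(lo, -5), min(hi, 5))
        cand_ll = logL(np.array([cand]))[0]
        if cand_ll > ll[worst]:
            theta[worst], ll[worst] = cand, cand_ll
            break
    if ll.max() + log_x < logZ - 8:              # remaining prior mass is negligible
        break

logZ = np.logaddexp(logZ, log_x + logsumexp(ll) - np.log(Npts))  # live-point remainder
print(f"nested sampling log m(y) ~ {logZ:.3f}")
```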
  13. Power posteriors/basic principle. Method due to Friel & Pettitt (2008); see also Lartillot & Philippe (2006), "annealing-melting". The power posterior is defined as $\pi(\theta|y,t) = \dfrac{f(y|\theta)^t\,\pi(\theta)}{z_t(y)}$, where $z_t(y) = \int f(y|\theta)^t\,\pi(\theta)\,d\theta$ and $t \in [0,1]$, with $t^{-1}$ the equivalent of a "physical temperature"; $t$ going from 0 to 1 is cooling down ("annealing"), from 1 to 0 "melting". Notice the path sampling scheme (Gelman & Meng, 1998): $\pi(\theta|y,0) = \pi(\theta)$ with $z_0(y) = 1$, and $\pi(\theta|y,1) = \pi(\theta|y)$ with $z_1(y) = m(y)$.
  14. PP/key result. $\log m(y) = \int_0^1 E_{\theta|y,t}\big[\log f(y|\theta)\big]\,dt$, where $\theta|y,t$ has density $\pi(\theta|y,t) = f(y|\theta)^t\,\pi(\theta)/z_t(y)$. This is thermodynamic integration (end of the 70's): Ripley (1988), Ogata (1989), Neal (1993); "path sampling" (Gelman & Meng, 1998).
  15. PP formula/proof as a special case of path sampling. If $p(\theta|t) = q(\theta|t)/z(t)$ where $z(t) = \int q(\theta|t)\,d\theta$, label $U(\theta,t) = \frac{d}{dt}\ln q(\theta|t)$ the potential. One has $\ln\dfrac{z(1)}{z(0)} = \int_0^1 E_{\theta|t}\big[U(\theta,t)\big]\,dt$. Here $p(\theta|t) = \pi(\theta|y,t)$ and $q(\theta|t) = [f(y|\theta)]^t\,\pi(\theta)$, hence $U(\theta,t) = \ln f(y|\theta)$.
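For completeness, the one-line path-sampling argument behind this identity (differentiation under the integral sign is assumed to be valid):

$$\frac{d}{dt}\ln z(t) = \frac{1}{z(t)}\int \frac{\partial q(\theta|t)}{\partial t}\,d\theta = \int \frac{\partial \ln q(\theta|t)}{\partial t}\,\frac{q(\theta|t)}{z(t)}\,d\theta = E_{\theta|t}\big[U(\theta,t)\big], \qquad\text{so}\qquad \ln\frac{z(1)}{z(0)} = \int_0^1 E_{\theta|t}\big[U(\theta,t)\big]\,dt .$$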
  16. PP/Example. $y_i|\theta \sim$ iid $N(\theta,1)$, $i=1,\dots,N$, and $\theta \sim N(\mu,\tau^2)$. Then $\theta|y,t \sim N(\mu_t, \tau_t^2)$ with $\mu_t = \dfrac{Nt\,\bar y + \mu\tau^{-2}}{Nt + \tau^{-2}}$ and $\tau_t^2 = \dfrac{1}{Nt + \tau^{-2}}$, so that $E_{\theta|y,t}\big[\log f(y|\theta)\big] = -\dfrac{D_t}{2}$ with $D_t = N\left[\log 2\pi + s^2 + \dfrac{(\mu - \bar y)^2}{(N\tau^2 t + 1)^2} + \dfrac{1}{Nt + \tau^{-2}}\right]$, where $\bar y = N^{-1}\sum_{i=1}^{N} y_i$ and $s^2 = N^{-1}\sum_{i=1}^{N} (y_i - \bar y)^2$. At $t=0$, $D_0 = N\big[\text{Cte} + (\mu - \bar y)^2\big] + N\tau^2$: high sensitivity to $\tau^2$ ($\tau^2 \to \infty \Rightarrow D_0 \to \infty$).
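Since everything is in closed form here, the key result can be checked numerically; a minimal sketch (data and prior settings are illustrative assumptions):

```python
# Sketch (assumed check): integrate the closed-form E_{theta|y,t}[log f(y|theta)]
# of the example above over t and compare with the exact log m(y).
import numpy as np

rng = np.random.default_rng(6)
N, mu, tau = 20, 0.0, 2.0
y = rng.normal(1.0, 1.0, size=N)
ybar, s2 = y.mean(), y.var()

def E_t(t):
    # E_{theta|y,t}[log f(y|theta)] = -D_t/2, with D_t as on the slide
    return -0.5 * N * (np.log(2 * np.pi) + s2
                       + (mu - ybar) ** 2 / (N * tau**2 * t + 1) ** 2
                       + 1.0 / (N * t + tau**-2))

t = np.linspace(0.0, 1.0, 2001)
E = E_t(t)
log_m_pp = np.sum(0.5 * (E[1:] + E[:-1]) * np.diff(t))   # trapezoidal rule over t

exact = (-0.5 * N * np.log(2 * np.pi) - 0.5 * np.log(1 + N * tau**2)
         - 0.5 * (N * s2 + N * (ybar - mu) ** 2 / (1 + N * tau**2)))
print(f"PP integral {log_m_pp:.4f}   exact log m(y) {exact:.4f}")
```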
  17. PP/Example (cont.): figure slide (graphics not captured in the transcript).
  18. KL distance prior-posterior. $KL\big(\pi(\theta|y), \pi(\theta)\big) = \int \ln\dfrac{\pi(\theta|y)}{\pi(\theta)}\,\pi(\theta|y)\,d\theta = \int \ln\dfrac{f(y|\theta)\,\pi(\theta)}{m(y)\,\pi(\theta)}\,\pi(\theta|y)\,d\theta = E_{\theta|y}\big[\ln f(y|\theta)\big] - \ln m(y)$. Hence $-2\,KL = \bar D - D_m$ (a by-product of PP), i.e. $D_m = \bar D + 2\,KL$; compare $DIC = \bar D + p_D$, where $p_D = \bar D - D(\bar\theta)$ measures model complexity.
  19. PP/partial BF. 1) If $\pi(\theta)$ is improper, the marginal $m(y)$ is also improper, resulting in problems for defining the BF. 2) The BF is highly sensitive to the priors (this does not vanish with increasing sample size). Idea behind the partial BF (Lempers, 1971): split $y = (y_P, y_T)$ into a learning (or pilot) sample $y_P$ used to tune the prior and a testing sample $y_T$ used for the data analysis. Intrinsic BF (Berger & Pericchi, 1996); fractional BF (O'Hagan, 1995).
  20. Fractional BF. A fraction $b$ of the likelihood is used to tune the prior: $f(y_P|\theta) \approx f(y|\theta)^b$, with $b = m/N < 1$ (O'Hagan, 1995), resulting in the fractional prior $\pi(\theta, b) \propto f(y|\theta)^b\,\pi(\theta)$.
  21. PP & fractional BF. With $\pi(\theta, b) \propto f(y|\theta)^b\,\pi(\theta)$, the fractional marginal is $m_F(y,b) = \int f(y|\theta)^{1-b}\,\pi(\theta,b)\,d\theta = \dfrac{\int f(y|\theta)\,\pi(\theta)\,d\theta}{\int f(y|\theta)^b\,\pi(\theta)\,d\theta} = \dfrac{m(y,1)}{m(y,b)}$. PP directly provides $\pi(\theta,b)$ via $\pi(\theta|y,t=b)$, and $\log m_F(y,b) = \int_b^1 E_{\theta|y,t}\big[\log f(y|\theta)\big]\,dt$.
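As a small illustration (function and array names are assumptions), the fractional marginal can be read off the same grid of estimates used for $\log m(y)$, produced for instance by the algorithm sketched after the next slide, simply by restricting the trapezoidal sum to $[b, 1]$:

```python
# Sketch: given a grid t[0..n] on [0,1] and estimates E[i] of E_{theta|y,t_i}[log f(y|theta)],
# the log fractional marginal log m_F(y,b) is the same trapezoidal sum restricted to [b, 1].
import numpy as np

def log_fractional_marginal(t, E, b):
    keep = t >= b
    tt = np.concatenate(([b], t[keep]))                  # make b the left endpoint
    EE = np.concatenate(([np.interp(b, t, E)], E[keep]))
    return np.sum(0.5 * (EE[1:] + EE[:-1]) * np.diff(tt))
```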
  22. PP/algorithm. MCMC with a discretization of $t$ on $[0,1]$: $t_0 = 0 < t_1 < \dots < t_i < \dots < t_{n-1} < t_n = 1$, with $t_i = (i/n)^c$, $i = 1,\dots,n$; typically $n = 20$-$100$ and $c = 2$-$5$. 1) Make MCMC draws $\theta^{(g,i)}$ from $\pi(\theta|y,t_i)$. 2) Compute $\hat E_{\theta|y,t=t_i}\big[\log p(y|\theta)\big] = \frac{1}{G}\sum_{g=1}^{G} \log p\big(y|\theta^{(g,i)}\big)$; often there is conditional independence, $\log p(y|\theta) = \sum_{i=1}^{N} \log p(y_i|\theta)$, e.g. if $\theta$ is the closest stochastic parent of $y = (y_i)$ (as for DIC). 3) Approximate the integral, e.g. by the trapezoidal rule: $\log \hat m(y) = \tfrac{1}{2}\sum_{i=0}^{n-1} (t_{i+1} - t_i)\big(\hat E_{i+1} + \hat E_i\big)$. There is an error due to this numerical approximation (Calderhead & Girolami, 2009); for the MC sampling error, see Friel & Pettitt. A sketch of these three steps on the normal toy example is given below.
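A minimal sketch of the three steps on the normal toy example used earlier (exact draws from the conjugate power posterior stand in for the MCMC step; grid sizes and seeds are illustrative assumptions):

```python
# Sketch (assumed example): power-posterior algorithm for y_i ~ N(theta,1), theta ~ N(mu,tau^2).
# Step 1 uses exact draws from pi(theta | y, t_i) (conjugate) in place of MCMC.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
N, mu, tau = 20, 0.0, 2.0
y = rng.normal(1.0, 1.0, size=N)
ybar, s2 = y.mean(), y.var()

n, c, G = 50, 4, 2000
t = (np.arange(n + 1) / n) ** c                       # t_i = (i/n)^c, t_0 = 0, ..., t_n = 1

E_hat = np.empty(n + 1)
for i, ti in enumerate(t):
    prec = N * ti + tau**-2                           # power posterior is N(mu_t, tau_t^2)
    mu_t, var_t = (N * ti * ybar + mu / tau**2) / prec, 1.0 / prec
    th = rng.normal(mu_t, np.sqrt(var_t), size=G)     # step 1: draws from pi(theta|y,t_i)
    E_hat[i] = norm.logpdf(y[:, None], th, 1.0).sum(axis=0).mean()   # step 2

log_m = np.sum(0.5 * (E_hat[1:] + E_hat[:-1]) * np.diff(t))          # step 3: trapezoid

exact = (-0.5 * N * np.log(2 * np.pi) - 0.5 * np.log(1 + N * tau**2)
         - 0.5 * (N * s2 + N * (ybar - mu) ** 2 / (1 + N * tau**2)))
print(f"power posterior {log_m:.3f}   exact {exact:.3f}")
```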
  23. PP/Little toy example. 0) $y_i|\lambda_i \sim$ id $P(\lambda_i x_i)$, i.e. $f(y_i|\lambda_i) = \dfrac{(\lambda_i x_i)^{y_i}\exp(-\lambda_i x_i)}{y_i!}$; 1) $\lambda_i \sim$ id $G(\alpha,\beta)$, i.e. $\pi(\lambda_i) = \dfrac{\beta^\alpha \lambda_i^{\alpha-1}\exp(-\beta\lambda_i)}{\Gamma(\alpha)}$; 0+1) $y_i \sim$ id $NB(\alpha, p_i)$ with $p_i = \beta/(\beta + x_i)$. Direct approach: $f(y_i) = \dfrac{\Gamma(y_i+\alpha)}{\Gamma(\alpha)\,y_i!}\,p_i^{\alpha}\,(1-p_i)^{y_i}$, so $\ln f(y) = -n\ln\Gamma(\alpha) + \sum_{i=1}^{n}\ln\Gamma(y_i+\alpha) - \sum_{i=1}^{n}\ln(y_i!) + \alpha\sum_{i=1}^{n}\ln p_i + \sum_{i=1}^{n} y_i\ln(1-p_i)$. Indirect approach: $f(y) = \prod_{i=1}^{n}\int f(y_i|\lambda_i)\,\pi(\lambda_i)\,d\lambda_i$, evaluated here via the power posterior.
  24. PP/Little toy example (cont.). Pump data: Example #2 in WinBUGS, Carlin & Louis (p. 126); $y$ = number of failures of pumps over exposures $x$ (in $10^3$ hrs): $y = (5, 1, 5, 14, 3, 19, 1, 1, 4, 22)$, $x = (94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5)$, $n = 10$, $\alpha = \beta = 1$. Direct computation: $D = -2\ln f(y) = 66.03$; power-posterior (Friel-Pettitt) estimate with 20 points: $\hat D_{FP} = 66.28 \pm 0.03$.
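The direct value quoted above can be checked in a few lines from the closed-form marginal on the previous slide (only the code itself is an addition; data and settings are the slide's):

```python
# Sketch: direct computation of -2 ln f(y) for the pump data with alpha = beta = 1,
# using the closed-form negative binomial marginal of the previous slide.
import numpy as np
from scipy.special import gammaln

y = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])
x = np.array([94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5])
alpha = beta = 1.0

p = beta / (beta + x)
log_f = np.sum(gammaln(y + alpha) - gammaln(alpha) - gammaln(y + 1)
               + alpha * np.log(p) + y * np.log(1 - p))
print(f"D = -2 ln f(y) = {-2 * log_f:.2f}")          # ~ 66.03, as quoted on the slide
```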
  25. PP/Toy example in OpenBUGS (slides 25-28): screenshot slides showing the OpenBUGS implementation and output for the pump example; graphics not captured in the transcript.
  29. Sampling both θ & t. $\log m(y) = \int_0^1\!\!\int \log f(y|\theta)\,\pi(\theta|y,t)\,d\theta\,dt = \int_0^1\!\!\int \dfrac{\log f(y|\theta)}{p(t)}\,\pi(\theta|y,t)\,p(t)\,d\theta\,dt$, so that $\log m(y) = E_{\theta,t|y}\!\left[\dfrac{\log f(y|\theta)}{p(t)}\right]$ under the joint density $\pi(\theta,t|y) = \pi(\theta|y,t)\,p(t)$. Since $\pi(\theta|y,t) \propto f(y|\theta)^t\,\pi(\theta)$, assuming $p(t) \propto z_t(y)$ gives $\pi(t|\theta,y) \propto f(y|\theta)^t$. Sampling $(\theta,t)$ jointly under these conditions gives poor estimation (too few draws of $t$ close to 0).
  30. Example 1/Pothoff & Roy's data. Growth measurements in 11 girls and 16 boys (Pothoff and Roy, 1964; Little and Rubin, 1987): distance from the centre of the pituitary to the pterygomaxillary fissure (unit 10^-4 m), recorded at ages 8, 10, 12 and 14 years. Girls (ages 8/10/12/14): 1: 210 200 215 230; 2: 210 215 240 255; 3: 205 245 260; 4: 235 245 250 265; 5: 215 230 225 235; 6: 200 210 225; 7: 215 225 230 250; 8: 230 230 235 240; 9: 200 220 215; 10: 165 190 195; 11: 245 250 280 280. Boys (ages 8/10/12/14): 1: 260 250 290 310; 2: 215 230 265; 3: 230 225 240 275; 4: 255 275 265 270; 5: 200 225 260; 6: 245 255 270 285; 7: 220 220 245 265; 8: 240 215 245 255; 9: 230 205 310 260; 10: 275 280 310 315; 11: 230 230 235 250; 12: 215 240 280; 13: 170 260 295; 14: 225 255 255 260; 15: 230 245 260; 16: 220 235 250. (Rows listing only three values have one measurement missing.)
  31. Model comparison on Pothoff's data. $i$: subscript for individual, $i = 1,\dots,I = 25$ (11 girls + 16 boys); $j$: subscript for the measurement at age $t_j$ (8, 10, 12, 14 yrs); $x_i$ indicates sex. 1) Purely fixed model: $y_{ij} = (\alpha_0 + \alpha x_i) + (\beta_0 + \beta x_i)(t_j - 8) + e_{ij}$ (intercept and slope). 2) Random intercept model: $y_{ij} = (\alpha_0 + \alpha x_i + a_i) + (\beta_0 + \beta x_i)(t_j - 8) + e_{ij}$. 3) Random intercept & slope model with independent effects: $y_{ij} = (\alpha_0 + \alpha x_i + a_i) + (\beta_0 + \beta x_i + b_i)(t_j - 8) + e_{ij}$, or $y_{ij} = \phi_{i1} + \phi_{i2}(t_j - 8) + e_{ij}$ with $y_{ij} \sim$ id $N(\eta_{ij}, \sigma_e^2)$ and $\phi_i = \begin{pmatrix}\phi_{i1}\\ \phi_{i2}\end{pmatrix} \sim N\!\left(\begin{pmatrix}\alpha_0+\alpha x_i\\ \beta_0+\beta x_i\end{pmatrix}, \begin{pmatrix}\sigma_a^2 & 0\\ 0 & \sigma_b^2\end{pmatrix}\right)$. 4) Random intercept & slope model with correlated effects: $\phi_i \sim N\!\left(\begin{pmatrix}\alpha_0+\alpha x_i\\ \beta_0+\beta x_i\end{pmatrix}, \begin{pmatrix}\sigma_a^2 & \sigma_{ab}\\ \sigma_{ab} & \sigma_b^2\end{pmatrix}\right)$.
  32. Model presentation: hierarchical Bayes. 1st level: $y_{ij} \sim$ id $N(\eta_{ij}, \sigma_e^2)$ with $\eta_{ij} = \phi_{i1} + \phi_{i2}(t_j - 8)$. 2nd level: 2a) $\phi_i = \begin{pmatrix}\phi_{i1}\\ \phi_{i2}\end{pmatrix} \sim N\!\left(\begin{pmatrix}\alpha_0+\alpha x_i\\ \beta_0+\beta x_i\end{pmatrix}, \Sigma\right)$ with $\Sigma = \begin{pmatrix}\sigma_a^2 & \sigma_{ab}\\ \sigma_{ab} & \sigma_b^2\end{pmatrix}$; 2b) $\sigma_e \sim U(0, \Delta_e)$ or $\sigma_e^2 \sim \text{InvG}(1, \sigma_e^2)$. 3rd level: fixed effects $\alpha_0, \alpha, \beta_0, \beta \sim U(\text{inf}, \text{sup})$; variance (covariance) components: if $\sigma_{ab} = 0$, then i) $\sigma_a \sim U(0, \Delta_a)$ and similarly $\sigma_b \sim U(0, \Delta_b)$, or ii) $\sigma_a^2 \sim \text{InvG}(1, \sigma_a^2)$ and similarly $\sigma_b^2 \sim \text{InvG}(1, \sigma_b^2)$; if $\sigma_{ab} \neq 0$, then i) $\sigma_a \sim U(0, \Delta_a)$, $\sigma_b \sim U(0, \Delta_b)$, $\rho \sim U(-1, 1)$, or ii) $\Omega \sim W\big((\nu\Sigma)^{-1}, \nu\big)$ for $\Omega = \Sigma^{-1}$, with $\nu = \dim(\Omega) + 1$ and $\Sigma$ set to a known location value. (Take care: WinBUGS uses the other notation, i.e. $W(\nu\Sigma, \nu)$.)
  33. Results: table/figure slide (content not captured in the transcript).
  34. Results/fractional priors (b = 0 vs 0.125): figure slide (content not captured in the transcript).
  35. Example 2: models of genetic differentiation. Two-level hierarchical model; $i$ = locus, $j$ = (sub)population; $a_{ij}$ = number of genes carrying a given allele at locus $i$ in population $j$; $p_{ij}$ = frequency of that allele at locus $i$ in population $j$. 0) $y_{ij}|\alpha_{ij} \sim$ id $B(n_{ij}, \alpha_{ij})$; 1) $\alpha_{ij}|\pi_i, c_j \sim$ id $\text{Beta}\big(\tau_j\pi_i,\ \tau_j(1-\pi_i)\big)$ where $\tau_j = \dfrac{1-c_j}{c_j}$, $c_j$ being a differentiation index and $\pi_i$ the frequency of that allele at locus $i$ in the gene pool; 2) $\pi_i \sim$ id $\text{Beta}(a_\pi, b_\pi)$, $c_j \sim$ id $\text{Beta}(a_c, b_c)$. Migration-drift at equilibrium (Balding).
  36. Ex2: Nicholson's model. Nicholson et al. (2002), same as previously except: 1) $\alpha_{ij}|\pi_i, c_j \sim$ id $N\big(\pi_i,\ c_j\pi_i(1-\pi_i)\big)$, a truncated normal with point masses at 0 and 1, so that $y_{ij}|\alpha^*_{ij} \sim$ id $B(n_{ij}, \alpha^*_{ij})$ with $\alpha^*_{ij} = \max\big(0, \min(1, \alpha_{ij})\big)$; 2) $\pi_i \sim$ id $\text{Beta}(a_\pi, b_\pi)$, $c_j \sim$ id $\text{Beta}(a_c, b_c)$. Pure drift model.
  37. Results: table/figure slide (content not captured in the transcript).
  38. Conclusion. The power-posterior method is derived from thermodynamic integration, with a direct link to "path sampling". It is easy to understand and quite general, and well suited to complex hierarchical models: the "thetas" can be defined as the closest stochastic parents of the data, making the latter conditionally independent. It requires draws only from posterior distributions, gives the fractional BF as a by-product, and is easy to implement (including in OpenBUGS) but time consuming. Caution is needed in the discretization of t (close to 0).
  39. Some references.
      Chen M, Shao Q, Ibrahim J (2000) Monte Carlo methods in Bayesian computation. Springer.
      Chib S (1995) Marginal likelihood from the Gibbs output. JASA, 90, 1313-1321.
      Chopin N, Robert CP (2010) Properties of nested sampling. Biometrika, 97, 741-755.
      Friel N, Pettitt AN (2008) Marginal likelihood estimation via power posteriors. JRSS B, 70, 589-607.
      Frühwirth-Schnatter S (2004) Estimating marginal likelihoods from mixtures & Markov switching models using bridge sampling techniques. Econometrics Journal, 7, 143-167.
      Gelman A, Meng X-L (1998) Simulating normalizing constants: from importance sampling to bridge sampling and path sampling. Statistical Science, 13, 163-185.
      Lartillot N, Philippe H (2006) Computing Bayes factors using thermodynamic integration. Systematic Biology, 55, 195-207.
      Marin JM, Robert CP (2009) Importance sampling methods for Bayesian discrimination between embedded models. arXiv:0910.2325v1.
      Meng X-L, Wong WH (1996) Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica, 6, 831-860.
      O'Hagan A (1995) Fractional Bayes factors for model comparison. JRSS B, 57, 99-138.
  40. Acknowledgements. Nial Friel (University College Dublin) for his interest in these applications and his invaluable explanations and suggestions; Tony O'Hagan for further insight into the FBF; Gilles Celeux and Mathieu Gautier as co-advisors of the Master dissertation of Yoan Soussan (Paris VI); Christian Robert for his blog and his relevant comments, standpoints and bibliographical references; the Applibugs & Babayes groups for stimulating discussions on DIC, BF, CPO and other information criteria (AIC, BIC).
