Course on Bayesian computational methods

Course given during the conference held in San Antonio, Texas, March 17-29 in honour of Jim Berger.

  1. Computational Bayesian Statistics
     Christian P. Robert, Université Paris Dauphine
     Frontiers of Statistical Decision Making and Bayesian Analysis,
     in honour of Jim Berger, 17th of March, 2010
  2. 2. Computational Bayesian Statistics Outline 1 Computational basics 2 Regression and variable selection 3 Generalized linear models 2 / 143
  3. 3. Computational Bayesian Statistics Computational basics Computational basics 1 Computational basics The Bayesian toolbox Improper prior distribution Testing Monte Carlo integration Bayes factor approximations in perspective 3 / 143
  4. The Bayesian toolbox
     Bayes theorem = inversion of probabilities
     If A and E are events such that P(E) ≠ 0, P(A|E) and P(E|A) are related by
        P(A|E) = P(E|A)P(A) / [P(E|A)P(A) + P(E|A^c)P(A^c)]
               = P(E|A)P(A) / P(E)
  5. New perspective
     Uncertainty on the parameters θ of a model modelled through a probability
     distribution π on Θ, called prior distribution
     Inference based on the distribution of θ conditional on x, π(θ|x), called
     posterior distribution
        π(θ|x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ) dθ .
  6. 6. Computational Bayesian Statistics Computational basics The Bayesian toolbox Bayesian model A Bayesian statistical model is made of 1 a likelihood f (x|θ), and of 2 a prior distribution on the parameters, π(θ) . 6 / 143
  7. Posterior distribution, center of Bayesian inference
        π(θ|x) ∝ f(x|θ) π(θ)
     Operates conditional upon the observations
     Integrates simultaneously prior information/knowledge and the information
     brought by x
     Avoids averaging over the unobserved values of x
     Coherent updating of the information available on θ, independent of the
     order in which i.i.d. observations are collected
     Provides a complete inferential scope and a unique motor of inference
  8. Improper prior distribution
     Extension from a prior distribution to a prior σ-finite measure π such that
        ∫_Θ π(θ) dθ = +∞
     Formal extension: π cannot be interpreted as a probability any longer
  9. Validation
     Extension of the posterior distribution π(θ|x) associated with an improper
     prior π given by Bayes's formula
        π(θ|x) = f(x|θ)π(θ) / ∫_Θ f(x|θ)π(θ) dθ ,
     when
        ∫_Θ f(x|θ)π(θ) dθ < ∞
  10. Testing hypotheses
      Deciding about the validity of assumptions or restrictions on the parameter
      θ from the data, represented as
         H0 : θ ∈ Θ0   versus   H1 : θ ∉ Θ0
      Binary outcome of the decision process: accept [coded by 1] or reject
      [coded by 0]
         D = {0, 1}
      Bayesian solution formally very close to a likelihood ratio test statistic,
      but numerical values often strongly differ from classical solutions
  11. Bayes factor
      Bayesian testing procedure depends on P^π(θ ∈ Θ0 | x) or alternatively on
      the Bayes factor
         B10^π = {P^π(θ ∈ Θ1 | x) / P^π(θ ∈ Θ0 | x)} / {P^π(θ ∈ Θ1) / P^π(θ ∈ Θ0)}
               = ∫ f(x|θ1)π1(θ1) dθ1 / ∫ f(x|θ0)π0(θ0) dθ0
               = m1(x) / m0(x)
      [Akin to a likelihood ratio]
  12. 12. Computational Bayesian Statistics Computational basics Testing Banning improper priors Impossibility of using improper priors for testing! Reason: When using the representation π(θ) = P π (θ ∈ Θ1 ) × π1 (θ) + P π (θ ∈ Θ0 ) × π0 (θ) π1 and π0 must be normalised 12 / 143
  13. Fundamental integration issue
      Generic problem of evaluating an integral
         I = E_f[h(X)] = ∫_X h(x) f(x) dx
      where X is uni- or multidimensional, f is a closed form, partly closed
      form, or implicit density, and h is a function
  14. Monte Carlo Principle
      Use a sample (x1, ..., xm) from the density f to approximate the integral I
      by the empirical average
         h̄m = (1/m) Σ_{j=1}^m h(xj)
      Convergence of the average
         h̄m −→ E_f[h(X)]
      by the Strong Law of Large Numbers
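To make the principle concrete, here is a minimal R sketch (an illustration, not part of the original slides), where the target E_f[h(X)] is arbitrarily chosen as E[exp(−X²)] for X ∼ N(0, 1):

      # Monte Carlo approximation of E_f[h(X)] for f = N(0,1), h(x) = exp(-x^2)
      m    <- 1e4
      x    <- rnorm(m)                 # sample x_1, ..., x_m from f
      h    <- exp(-x^2)
      hbar <- cumsum(h) / (1:m)        # running empirical averages
      hbar[m]                          # Monte Carlo estimate of E_f[h(X)]
      # exact value for comparison: 1/sqrt(3), approx. 0.577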
  15. e.g., Bayes factor approximation
      For the normal case
         x1, ..., xn ∼ N(µ + ξ, σ²),   y1, ..., yn ∼ N(µ − ξ, σ²)
      and H0 : ξ = 0, under the prior π(µ, σ²) = 1/σ² and ξ ∼ N(0, 1),
         B01^π = [(x̄ − ȳ)² + S²]^{−n+1/2}
                 / ∫ [(x̄ − ȳ − 2ξ)² + S²]^{−n+1/2} e^{−ξ²/2} dξ/√(2π)
  16. Example: CMBdata
      Simulate ξ1, ..., ξ1000 ∼ N(0, 1) and approximate B01^π with
         B̂01^π = [(x̄ − ȳ)² + S²]^{−n+1/2}
                  / { (1/1000) Σ_{i=1}^{1000} [(x̄ − ȳ − 2ξi)² + S²]^{−n+1/2} }
                = 89.9
      when x̄ = 0.0888, ȳ = 0.1078, S² = 0.00875
  17. Precision evaluation
      Estimate the variance with
         vm = (1/m) · 1/(m−1) Σ_{j=1}^m [h(xj) − h̄m]² ,
      and, for m large,
         (h̄m − E_f[h(X)]) / √vm ≈ N(0, 1).
      Note: construction of a convergence test and of confidence bounds on the
      approximation of E_f[h(X)]
      [Figure: histogram of the 1/B10 approximations]
  18. Example (Cauchy-normal)
      For estimating a normal mean, a robust prior is a Cauchy prior
         x ∼ N(θ, 1),   θ ∼ C(0, 1).
      Under squared error loss, the posterior mean is
         δ^π(x) = ∫_{−∞}^{∞} θ/(1 + θ²) e^{−(x−θ)²/2} dθ
                  / ∫_{−∞}^{∞} 1/(1 + θ²) e^{−(x−θ)²/2} dθ
  19. Example (Cauchy-normal (2))
      Form of δ^π suggests simulating iid variables θ1, ..., θm ∼ N(x, 1) and
      calculating
         δ̂m^π(x) = Σ_{i=1}^m θi/(1 + θi²) / Σ_{i=1}^m 1/(1 + θi²) .
      The LLN implies
         δ̂m^π(x) −→ δ^π(x)   as m −→ ∞.
      [Figure: convergence of the estimate over 1000 iterations]
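A matching R sketch (an added illustration; the observed value x = 10 is an arbitrary choice):

      # Monte Carlo evaluation of the posterior mean under a Cauchy prior
      x     <- 10                       # hypothetical observation
      m     <- 1000
      theta <- rnorm(m, mean = x)       # theta_1, ..., theta_m ~ N(x, 1)
      delta <- cumsum(theta / (1 + theta^2)) / cumsum(1 / (1 + theta^2))
      delta[m]                          # estimate of the posterior mean delta^pi(x)
      plot(delta, type = "l", xlab = "iterations")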
  20. Importance sampling
      Simulation from f (the true density) is not necessarily optimal
      Alternative to direct sampling from f is importance sampling, based on the
      alternative representation
         E_f[h(x)] = ∫_X h(x) [f(x)/g(x)] g(x) dx = E_g[ h(x) f(x)/g(x) ]
      which allows us to use other distributions than f
  21. Importance sampling (cont'd)
      Importance sampling algorithm
      Evaluation of
         E_f[h(x)] = ∫_X h(x) f(x) dx
      by
      1 Generate a sample x1, ..., xm from a distribution g
      2 Use the approximation
           (1/m) Σ_{j=1}^m [f(xj)/g(xj)] h(xj)
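A short R sketch of these two steps (again an added illustration, not from the slides), with target f = N(0, 1), instrumental density g = C(0, 1) and h(x) = x², so that the true value E_f[h(X)] = 1:

      # Importance sampling sketch: target f = N(0,1), instrumental g = Cauchy(0,1)
      # (g has heavier tails than f, so the weights f/g remain bounded)
      m <- 1e4
      x <- rcauchy(m)                        # step 1: sample from g
      w <- dnorm(x) / dcauchy(x)             # importance weights f(x_j)/g(x_j)
      h <- x^2                               # h(x) = x^2, so E_f[h(X)] = 1
      mean(w * h)                            # step 2: importance sampling estimate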
  22. Justification
      Convergence of the estimator
         (1/m) Σ_{j=1}^m [f(xj)/g(xj)] h(xj) −→ E_f[h(x)]
      1 converges for any choice of the distribution g as long as supp(g) ⊃ supp(f)
      2 Instrumental distribution g chosen from distributions easy to simulate
      3 Same sample (generated from g) can be used repeatedly, not only for
        different functions h, but also for different densities f
  23. Choice of importance function
      g can be any density but some choices are better than others
      1 Finite variance only when
           E_f[ h²(x) f(x)/g(x) ] = ∫_X h²(x) f²(x)/g(x) dx < ∞ .
      2 Instrumental distributions with tails lighter than those of f (that is,
        with sup f/g = ∞) not appropriate, because the weights f(xj)/g(xj) vary
        widely, giving too much importance to a few values xj.
      3 If sup f/g = M < ∞, the accept-reject algorithm can be used as well to
        simulate from f directly.
      4 IS suffers from the curse of dimensionality
  24. Example (Cauchy target)
      Case of the Cauchy distribution C(0, 1) when the importance function is
      Gaussian N(0, 1). Density ratio
         ̺(x) = p⋆(x)/p0(x) = √(2π) exp(x²/2) / [π (1 + x²)]
      very badly behaved: e.g.,
         ∫_{−∞}^{∞} ̺(x)² p0(x) dx = ∞
      Poor performance of the associated importance sampling estimator
      [Figure: running importance sampling estimate over 10,000 iterations]
  25. Practical alternative
         Σ_{j=1}^m h(xj) f(xj)/g(xj) / Σ_{j=1}^m f(xj)/g(xj)
      where f and g are known up to constants.
      1 Also converges to I by the Strong Law of Large Numbers.
      2 Biased, but the bias is quite small: may beat the unbiased estimator in
        squared error loss.
  26. Computational alternatives for evidence
      Bayesian model choice and hypothesis testing relies on a similar quantity,
      the evidence
         Zk = ∫_{Θk} πk(θk) Lk(θk) dθk ,   k = 1, 2
      aka the marginal likelihood.
      [Jeffreys, 1939]
  27. Importance sampling
      When approximating the Bayes factor
         B01 = ∫_{Θ0} f0(x|θ0)π0(θ0) dθ0 / ∫_{Θ1} f1(x|θ1)π1(θ1) dθ1
      simultaneous use of importance functions ̟0 and ̟1 and
         B̂01 = [ n0^{−1} Σ_{i=1}^{n0} f0(x|θ0^i)π0(θ0^i)/̟0(θ0^i) ]
                / [ n1^{−1} Σ_{i=1}^{n1} f1(x|θ1^i)π1(θ1^i)/̟1(θ1^i) ]
  28. 28. Computational Bayesian Statistics Computational basics Bayes factor approximations in perspective Diabetes in Pima Indian women Example (R benchmark) “A population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix (AZ), was tested for diabetes according to WHO criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases.” 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes 28 / 143
  29. Probit modelling on Pima Indian women
      Probability of diabetes as a function of the above variables
         P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3) ,
      Test of H0 : β3 = 0 for the 200 observations of Pima.tr, based on a g-prior
      modelling:
         β ∼ N3( 0, n (X^T X)^{−1} )
  30. Importance sampling for the Pima Indian dataset
      Use of the importance function inspired from the MLE estimate distribution
         β ∼ N(β̂, Σ̂)
      R importance sampling code
      # probitlpost() and dmvlnorm() are course functions (probit log-posterior and
      # log multivariate normal density); rmvnorm() is from the mvtnorm package;
      # model2 is the probit fit under the alternative model X2, defined as model1 is for X1
      model1=summary(glm(y~-1+X1,family=binomial(link="probit")))
      is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
      is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
      bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],
            sigma=2*model1$cov.unscaled))) /
           mean(exp(probitlpost(is2,y,X2)-
            dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))
  31. Diabetes in Pima Indian women
      Comparison of the variation of the Bayes factor approximations based on
      100 replicas of 20,000 simulations from the prior and from the above MLE
      importance sampler
      [Figure: boxplots of the basic Monte Carlo and importance sampling
      approximations]
  32. Bridge sampling
      General identity:
         B12 = ∫ π̃2(θ|x) α(θ) π1(θ|x) dθ / ∫ π̃1(θ|x) α(θ) π2(θ|x) dθ    ∀ α(·)
             ≈ [ (1/n1) Σ_{i=1}^{n1} π̃2(θ1i|x) α(θ1i) ]
               / [ (1/n2) Σ_{i=1}^{n2} π̃1(θ2i|x) α(θ2i) ]    with θji ∼ πj(θ|x)
  33. Optimal bridge sampling
      The optimal choice of auxiliary function α is
         α⋆ = (n1 + n2) / [ n1 π1(θ|x) + n2 π2(θ|x) ]
      leading to
         B̂12 ≈ [ (1/n1) Σ_{i=1}^{n1} π̃2(θ1i|x) / (n1 π1(θ1i|x) + n2 π2(θ1i|x)) ]
               / [ (1/n2) Σ_{i=1}^{n2} π̃1(θ2i|x) / (n1 π1(θ2i|x) + n2 π2(θ2i|x)) ]
      Back later!
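In practice the optimal bridge estimator is usually computed iteratively, since α⋆ involves the normalised posteriors; the sketch below (an added illustration with placeholder inputs, not from the slides) follows the standard fixed-point scheme, taking as arguments the unnormalised posterior densities evaluated at samples drawn from each posterior:

      # Iterative bridge sampling estimate of B12 = Z1/Z2
      # qk.s1, qk.s2: unnormalised density of model k at the samples from posterior 1 and 2
      bridge <- function(q1.s1, q2.s1, q1.s2, q2.s2, niter = 100) {
        n1 <- length(q1.s1); n2 <- length(q1.s2)
        r  <- 1                                       # arbitrary starting value for B12
        for (t in 1:niter) {
          num <- mean(q2.s1 / (n1 * q1.s1 + n2 * r * q2.s1))
          den <- mean(q1.s2 / (n1 * q1.s2 + n2 * r * q2.s2))
          r   <- num / den                            # fixed point is the bridge estimate
        }
        r
      }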
  34. Extension to varying dimensions
      When dim(Θ1) ≠ dim(Θ2), e.g. θ2 = (θ1, ψ), introduction of a pseudo-posterior
      density ω(ψ|θ1, x), augmenting π1(θ1|x) into the joint distribution
         π1(θ1|x) × ω(ψ|θ1, x)
      on Θ2 so that
         B12 = ∫ π̃1(θ1|x) ω(ψ|θ1, x) α(θ1, ψ) π2(θ1, ψ|x) dθ1 dψ
               / ∫ π̃2(θ1, ψ|x) α(θ1, ψ) π1(θ1|x) ω(ψ|θ1, x) dθ1 dψ
             = E_{π2}[ π̃1(θ1) ω(ψ|θ1) / π̃2(θ1, ψ) ]
             = E_ϕ[ π̃1(θ1) ω(ψ|θ1) / ϕ(θ1, ψ) ] / E_ϕ[ π̃2(θ1, ψ) / ϕ(θ1, ψ) ]
      for any conditional density ω(ψ|θ1) and any joint density ϕ.
  35. Illustration for the Pima Indian dataset
      Use of the MLE-induced conditional of β3 given (β1, β2) as a pseudo-posterior
      and mixture of both MLE approximations on β3 in the bridge sampling estimate
      R bridge sampling code
      # hmprobit(), meanw() and probitlpost() are course functions; ginv() is from
      # MASS and dmvnorm() from mvtnorm
      cova=model2$cov.unscaled
      expecta=model2$coeff[,1]
      covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]
      probit1=hmprobit(Niter,y,X1)
      probit2=hmprobit(Niter,y,X2)
      pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))
      probit1p=cbind(probit1,pseudo)
      bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),
             sqrt(covw),log=T))/
            (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],cova[3,3])))/
           mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+
             dnorm(pseudo,expecta[3],cova[3,3])))
  36. 36. Computational Bayesian Statistics Computational basics Bayes factor approximations in perspective Diabetes in Pima Indian women (cont’d) Comparison of the variation of the Bayes factor approximations based on 100 × 20, 000 simulations from the prior (MC), the above bridge sampler and the above importance sampler 36 / 143
  37. The original harmonic mean estimator
      When θk^{(t)} ∼ πk(θ|x),
         (1/T) Σ_{t=1}^T 1 / L(θk^{(t)}|x)
      is an unbiased estimator of 1/mk(x)
      [Newton & Raftery, 1994]
      Highly dangerous: most often leads to an infinite variance!!!
  38. Approximating Zk from a posterior sample
      Use of the [harmonic mean] identity
         E^{πk}[ ϕ(θk) / {πk(θk) Lk(θk)} | x ]
           = ∫ [ ϕ(θk) / {πk(θk) Lk(θk)} ] [ πk(θk) Lk(θk) / Zk ] dθk = 1/Zk
      which holds no matter what the proposal ϕ(·) is.
      [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the Monte Carlo output
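A schematic use of this identity in R (an added illustration, one-dimensional case for simplicity): given posterior draws and a vectorised function returning the log of the unnormalised posterior π(θ)L(θ), the evidence is the reciprocal of the average ratio, with ϕ taken here as a normal density fitted to the sample:

      # Gelfand-Dey style estimate of the evidence Z from a posterior sample
      # theta:  vector of posterior draws
      # lupost: function returning log{ pi(theta) L(theta) }, vectorised (assumed given)
      evidence.gd <- function(theta, lupost) {
        lphi <- dnorm(theta, mean(theta), sd(theta), log = TRUE)   # proposal phi
        1 / mean(exp(lphi - lupost(theta)))                        # 1 / E[ phi / (pi L) ]
      }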
  39. Comparison with regular importance sampling
      Harmonic mean: constraint opposed to usual importance sampling constraints:
      ϕ(θ) must have lighter (rather than fatter) tails than πk(θk)Lk(θk) for the
      approximation
         1/Zk ≈ (1/T) Σ_{t=1}^T ϕ(θk^{(t)}) / { πk(θk^{(t)}) Lk(θk^{(t)}) }
      to have a finite variance.
      E.g., use finite support kernels (like Epanechnikov's kernel) for ϕ
  40. HPD indicator as ϕ
      Use the convex hull of the Monte Carlo simulations corresponding to the
      10% HPD region (easily derived!) and ϕ as indicator:
         ϕ(θ) = (10/T) Σ_{t∈HPD} I_{d(θ, θ^{(t)}) ≤ ε}
  41. Diabetes in Pima Indian women (cont'd)
      Comparison of the variation of the Bayes factor approximations based on
      100 replicas of 20,000 simulations from the above harmonic mean sampler
      and importance samplers
      [Figure: boxplots of the harmonic mean and importance sampling
      approximations, with values around 3.10-3.12]
  42. Chib's representation
      Direct application of Bayes' theorem: given x ∼ fk(x|θk) and θk ∼ πk(θk),
         Zk = mk(x) = fk(x|θk) πk(θk) / πk(θk|x)
      Use of an approximation to the posterior
         Ẑk = m̂k(x) = fk(x|θk*) πk(θk*) / π̂k(θk*|x) .
  43. Case of latent variables
      For a missing variable z as in mixture models, natural Rao-Blackwell estimate
         π̂k(θk*|x) = (1/T) Σ_{t=1}^T πk(θk*|x, zk^{(t)}) ,
      where the zk^{(t)}'s are latent variables sampled from π(z|x) (often by Gibbs)
  44. Case of the probit model
      For the completion by z, hidden normal variable,
         π̂(θ*|x) = (1/T) Σ_t π(θ*|x, z^{(t)})
      is a simple average of normal densities
      R code (Chib's approximation)
      # gibbsprobit() (Gibbs sampler for the probit model), probitlpost() and
      # dmvlnorm() are course functions
      gibbs1=gibbsprobit(Niter,y,X1)
      gibbs2=gibbsprobit(Niter,y,X2)
      bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),
              sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
            mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),
              sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))
  45. 45. Computational Bayesian Statistics Computational basics Bayes factor approximations in perspective Diabetes in Pima Indian women (cont’d) Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20, 000 simulations for a simulation from the above Chib’s and importance samplers 45 / 143
  46. 46. Computational Bayesian Statistics Regression and variable selection Regression and variable selection 2 Regression and variable selection Regression Zellner’s informative G-prior Zellner’s noninformative G-prior Markov Chain Monte Carlo Methods Variable selection 46 / 143
  47. 47. Computational Bayesian Statistics Regression and variable selection Regression Regressors and response Variable of primary interest, y, called the response or the outcome variable [assumed here to be continuous] E.g., number of Pine processionary caterpillar colonies Covariates x = (x1 , . . . , xk ) called explanatory variables [may be discrete, continuous or both] Distribution of y given x typically studied in the context of a set of units or experimental subjects, i = 1, . . . , n, for instance patients in an hospital ward, on which both yi and xi1 , . . . , xik are measured. 47 / 143
  48. Regressors and response (cont'd)
      Dataset made of the conjunction of the vector of outcomes
         y = (y1, ..., yn)
      and of the n × (k+1) matrix of explanatory variables
         X = [ 1  x11  x12  ...  x1k
               1  x21  x22  ...  x2k
               1  x31  x32  ...  x3k
               ...
               1  xn1  xn2  ...  xnk ]
  49. Linear models
      Ordinary normal linear regression model such that
         y | β, σ², X ∼ Nn( Xβ, σ² In )
      and thus
         E[yi | β, X] = β0 + β1 xi1 + ... + βk xik ,   V(yi | σ², X) = σ²
  50. 50. Computational Bayesian Statistics Regression and variable selection Regression Pine processionary caterpillars Pine processionary caterpillar colony size influenced by x1 altitude x2 slope (in degrees) x3 number of pines in the area x4 height of the central tree x5 diameter of the central tree x6 index of the settlement density x7 orientation of the area (from 1 [southbound] to 2) x8 height of the dominant tree x9 number of vegetation strata x10 mix settlement index (from 1 if not mixed to 2 if mixed) x1 x2 x3 50 / 143
  51. Ibex horn growth
      Growth of the Ibex horn size S against age A
         log(S) = α + β log( A/(1 + A) ) + ε
      [Figure: log(Size) against log(A/(1 + A))]
  52. Likelihood function & estimator
      The likelihood of the ordinary normal linear model is
         ℓ(β, σ² | y, X) = (2πσ²)^{−n/2} exp{ −(y − Xβ)^T (y − Xβ) / 2σ² }
      The MLE of β is the solution of the least squares minimisation problem
         min_β (y − Xβ)^T (y − Xβ) = min_β Σ_{i=1}^n (yi − β0 − β1 xi1 − ... − βk xik)² ,
      namely
         β̂ = (X^T X)^{−1} X^T y
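The closed form can be checked directly in R (an added illustration; y and X stand for the response vector and the design matrix, with X assumed to already contain the column of ones):

      # Least squares estimate computed from the closed form
      betahat <- solve(t(X) %*% X, t(X) %*% y)
      # should match coef(lm(y ~ -1 + X)) up to numerical error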
  53. lm() for pine processionary caterpillars
      Residuals:
          Min      1Q  Median      3Q     Max
      -1.6989 -0.2731 -0.0003  0.3246  1.7305
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      intercept  10.998412   3.060272   3.594  0.00161 **
      XV1        -0.004431   0.001557  -2.846  0.00939 **
      XV2        -0.053830   0.021900  -2.458  0.02232 *
      XV3         0.067939   0.099472   0.683  0.50174
      XV4        -1.293636   0.563811  -2.294  0.03168 *
      XV5         0.231637   0.104378   2.219  0.03709 *
      XV6        -0.356800   1.566464  -0.228  0.82193
      XV7        -0.237469   1.006006  -0.236  0.81558
      XV8         0.181060   0.236724   0.765  0.45248
      XV9        -1.285316   0.864847  -1.486  0.15142
      XV10       -0.433106   0.734869  -0.589  0.56162
      ---
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  54. Zellner's informative G-prior
      Constraint: allow the experimenter to introduce information about the
      location parameter of the regression while bypassing the most difficult
      aspects of the prior specification, namely the derivation of the prior
      correlation structure.
      Zellner's prior corresponds to
         β | σ², X ∼ Nk+1( β̃, c σ² (X^T X)^{−1} )
         σ² ∼ π(σ²|X) ∝ σ^{−2} .
      [Special conjugate]
  55. 55. Computational Bayesian Statistics Regression and variable selection Zellner’s informative G-prior Prior selection ˜ Experimental prior determination restricted to the choices of β and of the constant c. Note c can be interpreted as a measure of the amount of information available in the prior relative to the sample. For instance, setting 1/c = 0.5 gives the prior the same weight as 50% of the sample. There still is a lasting influence of the factor c 55 / 143
  56. Posterior structure
      With this prior model, the posterior simplifies into
         π(β, σ² | y, X) ∝ f(y | β, σ², X) π(β, σ² | X)
           ∝ (σ²)^{−(n/2+1)} exp{ −[ (y − Xβ̂)^T (y − Xβ̂)
                                     + (β − β̂)^T X^T X (β − β̂) ] / 2σ² }
             × (σ²)^{−k/2} exp{ −(β − β̃)^T X^T X (β − β̃) / 2cσ² } ,
      because X^T X is used in both prior and likelihood
      [G-prior trick]
  57. Posterior structure (cont'd)
      Therefore,
         β | σ², y, X ∼ Nk+1( c/(c+1) (β̃/c + β̂), σ² c/(c+1) (X^T X)^{−1} )
         σ² | y, X ∼ IG( n/2, s²/2 + (β̃ − β̂)^T X^T X (β̃ − β̂) / 2(c+1) )
      and
         β | y, X ∼ Tk+1( n, c/(c+1) (β̃/c + β̂),
                          c(s² + (β̃ − β̂)^T X^T X (β̃ − β̂)/(c+1)) / (n(c+1)) · (X^T X)^{−1} ).
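These closed-form posterior quantities translate into a few lines of R (an added illustration; betatilde and c stand for the prior guess β̃ and the prior scale, y and X for the data):

      # Posterior mean and T-scale of beta under Zellner's informative G-prior
      g.posterior <- function(y, X, betatilde, c) {
        n       <- length(y); XtX <- t(X) %*% X
        betahat <- solve(XtX, t(X) %*% y)
        s2      <- sum((y - X %*% betahat)^2)
        Ebeta   <- c/(c + 1) * (betatilde/c + betahat)          # posterior mean of beta
        shift   <- drop(t(betatilde - betahat) %*% XtX %*% (betatilde - betahat))
        scale   <- c * (s2 + shift/(c + 1)) / (n * (c + 1)) * solve(XtX)
        list(Ebeta = Ebeta, scale = scale)                      # scale of the T_{k+1}(n, ...) marginal
      }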
  58. Pine processionary caterpillars
      βi     E^π(βi|y,X)   V^π(βi|y,X)
      β0       10.8895       6.4094
      β1       -0.0044       2e-06
      β2       -0.0533       0.0003
      β3        0.0673       0.0068
      β4       -1.2808       0.2175
      β5        0.2293       0.0075
      β6       -0.3532       1.6793
      β7       -0.2351       0.6926
      β8        0.1793       0.0383
      β9       -1.2726       0.5119
      β10      -0.4288       0.3696
      (c = 100)
  59. Pine processionary caterpillars (2)
      βi     E^π(βi|y,X)   V^π(βi|y,X)
      β0       10.9874       6.2604
      β1       -0.0044       2e-06
      β2       -0.0538       0.0003
      β3        0.0679       0.0066
      β4       -1.2923       0.2125
      β5        0.2314       0.0073
      β6       -0.3564       1.6403
      β7       -0.2372       0.6765
      β8        0.1809       0.0375
      β9       -1.2840       0.5100
      β10      -0.4327       0.3670
      (c = 1,000)
  60. Credible regions
      Highest posterior density (HPD) regions on subvectors of the parameter β
      derived from the marginal posterior distribution of β.
      For a single parameter,
         βi | y, X ∼ T1( n, c/(c+1) (β̃i/c + β̂i),
                         c(s² + (β̃ − β̂)^T X^T X (β̃ − β̂)/(c+1)) / (n(c+1)) · ω(i,i) ) ,
      where ω(i,i) is the (i, i)-th element of the matrix (X^T X)^{−1}.
  61. Pine processionary caterpillars
      βi      HPD interval
      β0      [5.7435, 16.2533]
      β1      [-0.0071, -0.0018]
      β2      [-0.0914, -0.0162]
      β3      [-0.1029, 0.2387]
      β4      [-2.2618, -0.3255]
      β5      [0.0524, 0.4109]
      β6      [-3.0466, 2.3330]
      β7      [-1.9649, 1.4900]
      β8      [-0.2254, 0.5875]
      β9      [-2.7704, 0.1997]
      β10     [-1.6950, 0.8288]
      (c = 100)
  62. T marginal
      Marginal distribution of y is a multivariate t distribution
      Proof. Since β | σ², X ∼ Nk+1(β̃, cσ²(X^T X)^{−1}),
         Xβ | σ², X ∼ N( Xβ̃, cσ² X(X^T X)^{−1} X^T ) ,
      which implies that
         y | σ², X ∼ Nn( Xβ̃, σ²(In + cX(X^T X)^{−1} X^T) ).
      Integrating in σ² yields
         f(y|X) = (c+1)^{−(k+1)/2} π^{−n/2} Γ(n/2)
                  × [ y^T y − c/(c+1) y^T X(X^T X)^{−1} X^T y − 1/(c+1) β̃^T X^T X β̃ ]^{−n/2} .
  63. Point null hypothesis
      If a null hypothesis is H0 : Rβ = r, the model under H0 can be rewritten as
         H0 : y | β0, σ², X0 ∼ Nn( X0 β0, σ² In )
      where β0 is (k + 1 − q)-dimensional.
  64. Point null marginal
      Under the prior
         β0 | X0, σ² ∼ Nk+1−q( β̃0, c0 σ² (X0^T X0)^{−1} ) ,
      the marginal distribution of y under H0 is
         f(y | X0, H0) = (c0+1)^{−(k+1−q)/2} π^{−n/2} Γ(n/2)
                         × [ y^T y − c0/(c0+1) y^T X0(X0^T X0)^{−1} X0^T y
                             − 1/(c0+1) β̃0^T X0^T X0 β̃0 ]^{−n/2} .
  65. Bayes factor
      Therefore the Bayes factor is in closed form:
         B10^π = f(y|X, H1) / f(y|X0, H0)
               = (c0+1)^{(k+1−q)/2} / (c+1)^{(k+1)/2}
                 × [ ( y^T y − c0/(c0+1) y^T X0(X0^T X0)^{−1} X0^T y − 1/(c0+1) β̃0^T X0^T X0 β̃0 )
                     / ( y^T y − c/(c+1) y^T X(X^T X)^{−1} X^T y − 1/(c+1) β̃^T X^T X β̃ ) ]^{n/2}
      Means using the same σ² on both models
      Problem: still depends on the choice of (c0, c)
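The closed form can be coded directly (an added illustration; X0 denotes the design matrix under H0, bt and bt0 the prior guesses β̃ and β̃0):

      # Closed-form Bayes factor B10 under Zellner's informative G-prior
      bf.gprior <- function(y, X, X0, bt, bt0, c, c0) {
        n <- length(y); k1 <- ncol(X); k0 <- ncol(X0)      # k1 = k+1, k0 = k+1-q
        quad <- function(X, b, c) {
          P <- X %*% solve(t(X) %*% X, t(X))               # hat matrix
          drop(t(y) %*% y - c/(c+1) * t(y) %*% P %*% y -
               1/(c+1) * t(b) %*% t(X) %*% X %*% b)
        }
        (c0+1)^(k0/2) / (c+1)^(k1/2) * (quad(X0, bt0, c0) / quad(X, bt, c))^(n/2)
      }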
  66. 66. Computational Bayesian Statistics Regression and variable selection Zellner’s noninformative G-prior Zellner’s noninformative G-prior Difference with informative G-prior setup is that we now consider c as unknown (relief!) Solution ˜ Use the same G-prior distribution with β = 0k+1 , conditional on c, and introduce a diffuse prior on c, π(c) = c−1 IN∗ (c) . 66 / 143
  67. Posterior distribution
      Corresponding marginal posterior on the parameters of interest
         π(β, σ² | y, X) = ∫ π(β, σ² | y, X, c) π(c | y, X) dc
                         ∝ Σ_{c=1}^∞ π(β, σ² | y, X, c) f(y | X, c) π(c)
                         ∝ Σ_{c=1}^∞ π(β, σ² | y, X, c) f(y | X, c) c^{−1}
      and
         f(y | X, c) ∝ (c+1)^{−(k+1)/2} [ y^T y − c/(c+1) y^T X(X^T X)^{−1} X^T y ]^{−n/2} .
  68. Posterior means
      The Bayes estimates of β and σ² are given by
         E^π[β | y, X] = E^π[ E^π(β | y, X, c) | y, X ] = E^π[ c/(c+1) β̂ | y, X ]
                       = [ Σ_{c=1}^∞ c/(c+1) f(y|X,c) c^{−1}
                           / Σ_{c=1}^∞ f(y|X,c) c^{−1} ] β̂
      and
         E^π[σ² | y, X] = Σ_{c=1}^∞ [ s² + β̂^T X^T X β̂/(c+1) ] / (n−2) · f(y|X,c) c^{−1}
                          / Σ_{c=1}^∞ f(y|X,c) c^{−1} .
  69. Marginal distribution
      Important point: the marginal distribution of the dataset is available in
      closed form
         f(y|X) ∝ Σ_{c=1}^∞ c^{−1} (c+1)^{−(k+1)/2}
                  [ y^T y − c/(c+1) y^T X(X^T X)^{−1} X^T y ]^{−n/2}
      T-shape means the normalising constant can be computed too.
  70. 70. Computational Bayesian Statistics Regression and variable selection Zellner’s noninformative G-prior Point null hypothesis For null hypothesis H0 : Rβ = r, the model under H0 can be rewritten as H0 y|β 0 , σ 2 , X0 ∼ Nn X0 β 0 , σ 2 In where β 0 is (k + 1 − q) dimensional. 70 / 143
  71. Point null marginal
      Under the prior
         β0 | X0, σ², c ∼ Nk+1−q( 0_{k+1−q}, c σ² (X0^T X0)^{−1} )
      and π(c) = 1/c, the marginal distribution of y under H0 is
         f(y | X0, H0) ∝ Σ_{c=1}^∞ c^{−1} (c+1)^{−(k+1−q)/2}
                         [ y^T y − c/(c+1) y^T X0(X0^T X0)^{−1} X0^T y ]^{−n/2} .
      Bayes factor B10^π = f(y|X) / f(y|X0, H0) can be computed
  72. Processionary pine caterpillars
      For H0 : β8 = β9 = 0, log10(B10^π) = −0.7884
                    Estimate   Post. Var.   log10(BF)
      (Intercept)     9.2714      9.1164      1.4205 (***)
      X1             -0.0037      2e-06       0.8502 (**)
      X2             -0.0454      0.0004      0.5664 (**)
      X3              0.0573      0.0086     -0.3609
      X4             -1.0905      0.2901      0.4520 (*)
      X5              0.1953      0.0099      0.4007 (*)
      X6             -0.3008      2.1372     -0.4412
      X7             -0.2002      0.8815     -0.4404
      X8              0.1526      0.0490     -0.3383
      X9             -1.0835      0.6643     -0.0424
      X10            -0.3651      0.4716     -0.3838
      evidence against H0: (****) decisive, (***) strong, (**) substantial, (*) poor
  73. 73. Computational Bayesian Statistics Regression and variable selection Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods Complexity of most models encountered in Bayesian modelling Standard simulation methods not good enough a solution New technique at the core of Bayesian computing, based on Markov chains 73 / 143
  74. 74. Computational Bayesian Statistics Regression and variable selection Markov Chain Monte Carlo Methods Markov chains Markov chain A process (θ(t) )t∈N is an homogeneous Markov chain if the distribution of θ(t) given the past (θ(0) , . . . , θ(t−1) ) 1 only depends on θ(t−1) 2 is the same for all t ∈ N∗ . 74 / 143
  75. 75. Computational Bayesian Statistics Regression and variable selection Markov Chain Monte Carlo Methods Algorithms based on Markov chains Idea: simulate from a posterior density π(·|x) [or any density] by producing a Markov chain (θ(t) )t∈N whose stationary distribution is π(·|x) Translation For t large enough, θ(t) is approximately distributed from π(θ|x), no matter what the starting value θ(0) is [Ergodicity]. 75 / 143
  76. Convergence
      If an algorithm that generates such a chain can be constructed, the ergodic
      theorem guarantees that, in almost all settings, the average
         (1/T) Σ_{t=1}^T g(θ^{(t)})
      converges to E^π[g(θ)|x], for (almost) any starting value
  77. 77. Computational Bayesian Statistics Regression and variable selection Markov Chain Monte Carlo Methods More convergence If the produced Markov chains are irreducible [can reach any region in a finite number of steps], then they are both positive recurrent with stationary distribution π(·|x) and ergodic [asymptotically independent from the starting value θ(0) ] While, for t large enough, θ(t) is approximately distributed from π(θ|x) and can thus be used like the output from a more standard simulation algorithm, one must take care of the correlations between the θ(t) ’s 77 / 143
  78. Demarginalising
      Takes advantage of hierarchical structures: if
         π(θ|x) = ∫ π1(θ|x, λ) π2(λ|x) dλ ,
      simulating from π(θ|x) comes from simulating from the joint distribution
         π1(θ|x, λ) π2(λ|x)
  79. 79. Computational Bayesian Statistics Regression and variable selection Markov Chain Monte Carlo Methods Two-stage Gibbs sampler Usually π2 (λ|x) not available/simulable More often, both conditional posterior distributions, π1 (θ|x, λ) and π2 (λ|x, θ) can be simulated. Idea: Create a Markov chain based on those conditionals Example of latent variables! 79 / 143
  80. 80. Computational Bayesian Statistics Regression and variable selection Markov Chain Monte Carlo Methods Two-stage Gibbs sampler (cont’d) Initialization: Start with an arbitrary value λ(0) Iteration t: Given λ(t−1) , generate 1 θ(t) according to π1 (θ|x, λ(t−1) ) 2 λ(t) according to π2 (λ|x, θ(t) ) J.W. Gibbs (1839-1903) π(θ, λ|x) is a stationary distribution for this transition 80 / 143
  81. 81. Computational Bayesian Statistics Regression and variable selection Markov Chain Monte Carlo Methods Implementation 1 Derive efficient decomposition of the joint distribution into simulable conditionals (mixing behavior, acf(), blocking, &tc.) 2 Find when to stop the algorithm (mode chasing, missing mass, shortcuts, &tc.) 81 / 143
  82. Simple Example: iid N(µ, σ²) observations
      When y1, ..., yn ~iid N(µ, σ²) with both µ and σ unknown, the posterior in
      (µ, σ²) is conjugate outside a standard family
      But...
         µ | y, σ² ∼ N( (1/n) Σ_{i=1}^n yi , σ²/n )
         σ² | y, µ ∼ IG( n/2 − 1 , (1/2) Σ_{i=1}^n (yi − µ)² )
      assuming constant (improper) priors on both µ and σ²
      Hence we may use the Gibbs sampler for simulating from the posterior of (µ, σ²)
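A compact R version of this two-stage Gibbs sampler (an added illustration, with data simulated for concreteness):

      # Two-stage Gibbs sampler for (mu, sigma2) under flat priors
      y     <- rnorm(50, mean = 2, sd = 3)        # hypothetical data
      n     <- length(y); niter <- 1e4
      mu    <- sig2 <- numeric(niter)
      mu[1] <- mean(y); sig2[1] <- var(y)
      for (t in 2:niter) {
        mu[t]   <- rnorm(1, mean(y), sqrt(sig2[t-1]/n))                       # mu | y, sigma2
        sig2[t] <- 1 / rgamma(1, shape = n/2 - 1, rate = sum((y - mu[t])^2)/2) # sigma2 | y, mu
      }
      c(mean(mu), mean(sig2))                     # posterior mean approximations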
  83. Gibbs output analysis
      Example (Cauchy posterior)
         π(µ|D) ∝ e^{−µ²/20} / { (1 + (x1 − µ)²)(1 + (x2 − µ)²) }
      is the marginal of
         π(µ, ω|D) ∝ e^{−µ²/20} × Π_{i=1}^2 e^{−ωi[1+(xi−µ)²]} .
      Corresponding conditionals
         (ω1, ω2) | µ ∼ Exp(1 + (x1 − µ)²) ⊗ Exp(1 + (x2 − µ)²)
         µ | ω ∼ N( Σi ωi xi / (Σi ωi + 1/20), 1/(2 Σi ωi + 1/10) )
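A matching R sketch of this Gibbs sampler (an added illustration; the observations x1 = 3 and x2 = −3 are arbitrary choices):

      # Gibbs sampler for pi(mu | D), D = (x1, x2), via the latent (omega1, omega2)
      x <- c(3, -3)                              # hypothetical observations
      niter <- 1e4; mu <- numeric(niter)
      for (t in 2:niter) {
        omega <- rexp(2, rate = 1 + (x - mu[t-1])^2)          # (omega1, omega2) | mu
        mu[t] <- rnorm(1, sum(omega * x) / (sum(omega) + 1/20),
                       sqrt(1 / (2 * sum(omega) + 1/10)))     # mu | omega
      }
      hist(mu[-(1:100)], breaks = 50, freq = FALSE, xlab = "mu")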
  84. Gibbs output analysis (cont'd)
      [Figure: trace of µ over the last iterations and histogram of the simulated
      µ values]
  85. 85. Computational Bayesian Statistics Regression and variable selection Markov Chain Monte Carlo Methods Generalisation Consider several groups of parameters, θ, λ1 , . . . , λp , such that π(θ|x) = ... π(θ, λ1 , . . . , λp |x) dλ1 · · · dλp or simply divide θ in (θ1 , . . . , θp ) 85 / 143
  86. The general Gibbs sampler
      For a joint distribution π(θ) with full conditionals π1, ..., πp,
      Given (θ1^(t), ..., θp^(t)), simulate
      1. θ1^(t+1) ∼ π1(θ1 | θ2^(t), ..., θp^(t)),
      2. θ2^(t+1) ∼ π2(θ2 | θ1^(t+1), θ3^(t), ..., θp^(t)),
      ...
      p. θp^(t+1) ∼ πp(θp | θ1^(t+1), ..., θ_{p−1}^(t+1)).
      Then θ^(t) → θ ∼ π
  87. 87. Computational Bayesian Statistics Regression and variable selection Variable selection Variable selection Back to regression: one dependent random variable y and a set {x1 , . . . , xk } of k explanatory variables. Question: Are all xi ’s involved in the regression? Assumption: every subset {i1 , . . . , iq } of q (0 ≤ q ≤ k) explanatory variables, {1n , xi1 , . . . , xiq }, is a proper set of explanatory variables for the regression of y [intercept included in every corresponding model] Computational issue 2k models in competition... 87 / 143
  88. Model notations
      1 X = [1n  x1  ...  xk] is the matrix containing 1n and all the k potential
        predictor variables
      2 Each model Mγ is associated with a binary indicator vector γ ∈ Γ = {0,1}^k
        where γi = 1 means that the variable xi is included in the model Mγ
      3 qγ = 1^T γ, the number of variables included in the model Mγ
      4 t1(γ) and t0(γ): indices of variables included in the model and indices of
        variables not included in the model
  89. 89. Computational Bayesian Statistics Regression and variable selection Variable selection Model indicators For β ∈ Rk+1 and X, we define βγ as the subvector βγ = β0 , (βi )i∈t1 (γ) and Xγ as the submatrix of X where only the column 1n and the columns in t1 (γ) have been left. 89 / 143
  90. 90. Computational Bayesian Statistics Regression and variable selection Variable selection Models in competition The model Mγ is thus defined as y|γ, βγ , σ 2 , X ∼ Nn Xγ βγ , σ 2 In where βγ ∈ Rqγ +1 and σ 2 ∈ R∗ are the unknown parameters. + Warning σ 2 is common to all models and thus uses the same prior for all models 90 / 143
  91. 91. Computational Bayesian Statistics Regression and variable selection Variable selection Informative G-prior Many (2k ) models in competition: we cannot expect a practitioner to specify a prior on every Mγ in a completely subjective and autonomous manner. Shortcut: We derive all priors from a single global prior associated with the so-called full model that corresponds to γ = (1, . . . , 1). 91 / 143
  92. Prior definitions
      (i) For the full model, Zellner's G-prior:
             β | σ², X ∼ Nk+1( β̃, cσ²(X^T X)^{−1} )   and   σ² ∼ π(σ²|X) = σ^{−2}
      (ii) For each model Mγ, the prior distribution of βγ conditional on σ² is
           fixed as
             βγ | γ, σ² ∼ Nqγ+1( β̃γ, cσ²(Xγ^T Xγ)^{−1} ) ,
           where β̃γ = (Xγ^T Xγ)^{−1} Xγ^T X β̃, and the same prior on σ².
  93. Prior completion
      The joint prior for model Mγ is the improper prior
         π(βγ, σ²|γ) ∝ (σ²)^{−(qγ+1)/2−1}
                       exp{ −(βγ − β̃γ)^T (Xγ^T Xγ) (βγ − β̃γ) / 2cσ² } .
  94. Prior competition (2)
      Infinitely many ways of defining a prior on the model index γ: choice of the
      uniform prior π(γ|X) = 2^{−k}.
      Posterior distribution of γ central to variable selection since proportional
      to the marginal density of y on Mγ (or evidence of Mγ)
         π(γ|y, X) ∝ f(y|γ, X) π(γ|X)
                   ∝ f(y|γ, X)
                   = ∫ [ ∫ f(y|γ, β, σ², X) π(β|γ, σ², X) dβ ] π(σ²|X) dσ² .
  95.
         f(y|γ, σ², X) = ∫ f(y|γ, β, σ²) π(β|γ, σ²) dβ
                       = (c+1)^{−(qγ+1)/2} (2π)^{−n/2} (σ²)^{−n/2}
                         exp{ −y^T y / 2σ²
                              + [ c y^T Xγ(Xγ^T Xγ)^{−1} Xγ^T y − β̃γ^T Xγ^T Xγ β̃γ ] / 2σ²(c+1) } ,
      and this posterior density satisfies
         π(γ|y, X) ∝ (c+1)^{−(qγ+1)/2} [ y^T y − c/(c+1) y^T Xγ(Xγ^T Xγ)^{−1} Xγ^T y
                                          − 1/(c+1) β̃γ^T Xγ^T Xγ β̃γ ]^{−n/2} .
  96. Pine processionary caterpillars
      t1(γ)            π(γ|y, X)
      0,1,2,4,5          0.2316
      0,1,2,4,5,9        0.0374
      0,1,9              0.0344
      0,1,2,4,5,10       0.0328
      0,1,4,5            0.0306
      0,1,2,9            0.0250
      0,1,2,4,5,7        0.0241
      0,1,2,4,5,8        0.0238
      0,1,2,4,5,6        0.0237
      0,1,2,3,4,5        0.0232
      0,1,6,9            0.0146
      0,1,2,3,9          0.0145
      0,9                0.0143
      0,1,2,6,9          0.0135
      0,1,4,5,9          0.0128
      0,1,3,9            0.0117
      0,1,2,8            0.0115
  97. 97. Computational Bayesian Statistics Regression and variable selection Variable selection Pine processionary caterpillars (cont’d) Interpretation Model Mγ with the highest posterior probability is t1 (γ) = (1, 2, 4, 5), which corresponds to the variables - altitude, - slope, - height of the tree sampled in the center of the area, and - diameter of the tree sampled in the center of the area. Corresponds to the five variables identified in the R regression output 97 / 143
  98. Noninformative extension
      For Zellner's noninformative prior with π(c) = 1/c, we have
         π(γ|y, X) ∝ Σ_{c=1}^∞ c^{−1} (c+1)^{−(qγ+1)/2}
                     [ y^T y − c/(c+1) y^T Xγ(Xγ^T Xγ)^{−1} Xγ^T y ]^{−n/2} .
  99. Pine processionary caterpillars
      t1(γ)            π(γ|y, X)
      0,1,2,4,5          0.0929
      0,1,2,4,5,9        0.0325
      0,1,2,4,5,10       0.0295
      0,1,2,4,5,7        0.0231
      0,1,2,4,5,8        0.0228
      0,1,2,4,5,6        0.0228
      0,1,2,3,4,5        0.0224
      0,1,2,3,4,5,9      0.0167
      0,1,2,4,5,6,9      0.0167
      0,1,2,4,5,8,9      0.0137
      0,1,4,5            0.0110
      0,1,2,4,5,9,10     0.0100
      0,1,2,3,9          0.0097
      0,1,2,9            0.0093
      0,1,2,4,5,7,9      0.0092
      0,1,2,6,9          0.0092
  100. 100. Computational Bayesian Statistics Regression and variable selection Variable selection Stochastic search for the most likely model When k gets large, impossible to compute the posterior probabilities of the 2k models. Need of a tailored algorithm that samples from π(γ|y, X) and selects the most likely models. Can be done by Gibbs sampling , given the availability of the full conditional posterior probabilities of the γi ’s. If γ−i = (γ1 , . . . , γi−1 , γi+1 , . . . , γk ) (1 ≤ i ≤ k) π(γi |y, γ−i , X) ∝ π(γ|y, X) (to be evaluated in both γi = 0 and γi = 1) 100 / 143
  101. Gibbs sampling for variable selection
       Initialization: Draw γ^0 from the uniform distribution on Γ
       Iteration t: Given (γ1^(t−1), ..., γk^(t−1)), generate
       1. γ1^(t) according to π(γ1 | y, γ2^(t−1), ..., γk^(t−1), X)
       2. γ2^(t) according to π(γ2 | y, γ1^(t), γ3^(t−1), ..., γk^(t−1), X)
       ...
       k. γk^(t) according to π(γk | y, γ1^(t), ..., γ_{k−1}^(t), X)
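A schematic R version of this scan (an added illustration): it only assumes a placeholder function lpostgamma(gamma, y, X) returning log π(γ|y, X) up to a constant, as given on the earlier slides:

      # Gibbs sampler over the inclusion indicators gamma[1], ..., gamma[k]
      gibbs.varselect <- function(y, X, lpostgamma, niter = 1e4) {
        k     <- ncol(X) - 1                     # X holds the intercept column plus k predictors
        gamma <- matrix(0, niter, k)
        gamma[1, ] <- rbinom(k, 1, 0.5)          # gamma^0 drawn uniformly on {0,1}^k
        for (t in 2:niter) {
          g <- gamma[t-1, ]
          for (i in 1:k) {
            g0 <- g; g0[i] <- 0
            g1 <- g; g1[i] <- 1
            p1 <- 1 / (1 + exp(lpostgamma(g0, y, X) - lpostgamma(g1, y, X)))
            g[i] <- rbinom(1, 1, p1)             # gamma_i | y, gamma_{-i}, X
          }
          gamma[t, ] <- g
        }
        gamma
      }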
  102. MCMC interpretation
       After T ≫ 1 MCMC iterations, output used to approximate the posterior
       probabilities π(γ|y, X) by empirical averages
          π̂(γ|y, X) = 1/(T − T0 + 1) Σ_{t=T0}^T I_{γ^(t) = γ} ,
       where the T0 first values are eliminated as burn-in.
       And approximation of the probability to include the i-th variable,
          P̂^π(γi = 1|y, X) = 1/(T − T0 + 1) Σ_{t=T0}^T I_{γi^(t) = 1} .
  103. Pine processionary caterpillars
       γi     P^π(γi = 1|y, X)   P^π(γi = 1|y, X)
       γ1          0.8624             0.8844
       γ2          0.7060             0.7716
       γ3          0.1482             0.2978
       γ4          0.6671             0.7261
       γ5          0.6515             0.7006
       γ6          0.1678             0.3115
       γ7          0.1371             0.2880
       γ8          0.1555             0.2876
       γ9          0.4039             0.5168
       γ10         0.1151             0.2609
       Probabilities of inclusion with both informative (β̃ = 0_{11}, c = 100) and
       noninformative Zellner's priors
  104. 104. Computational Bayesian Statistics Generalized linear models Generalized linear models 3 Generalized linear models Generalisation of linear models Metropolis–Hastings algorithms The Probit Model The logit model 104 / 143
  105. 105. Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of the linear dependence Broader class of models to cover various dependence structures. Class of generalised linear models (GLM) where y|x, β ∼ f (y|xT β) . i.e., dependence of y on x partly linear 105 / 143
  106. 106. Computational Bayesian Statistics Generalized linear models Generalisation of linear models Specifications of GLM’s Definition (GLM) A GLM is a conditional model specified by two functions: 1 the density f of y given x parameterised by its expectation parameter µ = µ(x) [and possibly its dispersion parameter ϕ = ϕ(x)] 2 the link g between the mean µ and the explanatory variables, written customarily as g(µ) = xT β or, equivalently, E[y|x, β] = g −1 (xT β). For identifiability reasons, g needs to be bijective. 106 / 143
  107. Likelihood
       Obvious representation of the likelihood
          ℓ(β, ϕ|y, X) = Π_{i=1}^n f( yi | xi^T β, ϕ )
       with parameters β ∈ R^k and ϕ > 0.
  108. Examples
       Case of binary and binomial data, when
          yi | xi ∼ B(ni, p(xi))
       with known ni
       Logit [or logistic regression] model: link is the logit transform on the
       probability of success,
          g(pi) = log( pi/(1 − pi) ) ,
       with likelihood
          Π_{i=1}^n C(ni, yi) [ exp(xi^T β) / (1 + exp(xi^T β)) ]^{yi}
                              [ 1 / (1 + exp(xi^T β)) ]^{ni − yi}
          ∝ exp{ Σ_{i=1}^n yi xi^T β } / Π_{i=1}^n (1 + exp(xi^T β))^{ni}
  109. Canonical link
       Special link function g that appears in the natural exponential family
       representation of the density
          g⋆(µ) = θ   if   f(y|µ) ∝ exp{ T(y)·θ − Ψ(θ) }
       Example: the logit link is canonical for the binomial model, since
          f(yi|pi) = C(ni, yi) exp{ yi log( pi/(1 − pi) ) + ni log(1 − pi) } ,
       and thus θi = log pi/(1 − pi)
  110. Examples (2)
       Customary to use the canonical link, but only customary ...
       Probit model: probit link function given by
          g(µi) = Φ^{−1}(µi)
       where Φ is the standard normal cdf
       Likelihood
          ℓ(β|y, X) ∝ Π_{i=1}^n Φ(xi^T β)^{yi} (1 − Φ(xi^T β))^{ni − yi} .
       Full processing
  111. Log-linear models
       Standard approach to describe associations between several categorical
       variables, i.e., variables with finite support
       Sufficient statistic: contingency table, made of the cross-classified
       counts for the different categorical variables.
       Example (Titanic survivors)
                               Child             Adult
       Survivor   Class     Male  Female     Male  Female
       No         1st         0      0        118      4
                  2nd         0      0        154     13
                  3rd        35     17        387     89
                  Crew        0      0        670      3
       Yes        1st         5      1         57    140
                  2nd        11     13         14     80
                  3rd        13     14         75     76
                  Crew        0      0        192     20
  112. Poisson regression model
       1 Each count yi is Poisson with mean µi = µ(xi)
       2 Link function connecting R+ with R, e.g. the logarithm, g(µi) = log(µi)
       Corresponding likelihood
          ℓ(β|y, X) = Π_{i=1}^n (1/yi!) exp{ yi xi^T β − exp(xi^T β) } .
  113. Metropolis–Hastings algorithms
       Convergence assessment
       Posterior inference in GLMs is harder than for linear models
       Working with a GLM requires specific numerical or simulation tools
       [e.g., GLIM in classical analyses]
       Opportunity to introduce a universal MCMC method: the Metropolis–Hastings
       algorithm
  114. 114. Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Generic MCMC sampler Metropolis–Hastings algorithms are generic/down-the-shelf MCMC algorithms Only require likelihood up to a constant [difference with Gibbs sampler] can be tuned with a wide range of possibilities [difference with Gibbs sampler & blocking] natural extensions of standard simulation algorithms: based on the choice of a proposal distribution [difference in Markov proposal q(x, y) and acceptance] 114 / 143
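Although the Metropolis–Hastings algorithm itself is detailed after this excerpt, a minimal random-walk version in R may help fix ideas (an added illustration; ltarget() is a placeholder for any log-target known up to a constant, such as a GLM log-posterior):

      # Random-walk Metropolis-Hastings sampler for a generic log-target
      rwmh <- function(ltarget, theta0, scale = 1, niter = 1e4) {
        d     <- length(theta0)
        theta <- matrix(theta0, niter, d, byrow = TRUE)
        lcur  <- ltarget(theta0)
        for (t in 2:niter) {
          prop  <- theta[t-1, ] + scale * rnorm(d)        # symmetric Gaussian proposal
          lprop <- ltarget(prop)
          if (log(runif(1)) < lprop - lcur) {             # accept with prob min(1, ratio)
            theta[t, ] <- prop; lcur <- lprop
          } else theta[t, ] <- theta[t-1, ]
        }
        theta
      }
      # e.g. rwmh(function(th) dnorm(th, log = TRUE), theta0 = 0, scale = 2)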