0
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models




Generalized linea...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Generalisation o...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Metropolis–Hasti...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model
...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model
...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model
...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model
...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model
...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model
...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
          Loglinear m...
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
          Loglinear m...
Bayesian Core: Chapter 4
Bayesian Core: Chapter 4
Bayesian Core: Chapter 4
Bayesian Core: Chapter 4
Upcoming SlideShare
Loading in...5
×

Bayesian Core: Chapter 4

1,637

Published on

This is the third part (Chapter 4) of our course associated with Bayesian Core.

Published in: Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,637
On Slideshare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
Downloads
86
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "Bayesian Core: Chapter 4"

  1. 1. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalized linear models Generalized linear models 3 Generalisation of linear models Metropolis–Hastings algorithms The Probit Model The logit model Loglinear models 313 / 785
  2. 2. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of Linear Models Linear models model connection between a response variable y and a set x of explanatory variables by a linear dependence relation with [approximately] normal perturbations. 314 / 785
  3. 3. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of Linear Models Linear models model connection between a response variable y and a set x of explanatory variables by a linear dependence relation with [approximately] normal perturbations. Many instances where either of these assumptions not appropriate, e.g. when the support of y restricted to R+ or to N. 315 / 785
  4. 4. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models bank Four measurements on 100 genuine Swiss banknotes and 100 counterfeit ones: x1 length of the bill (in mm), x2 width of the left edge (in mm), x3 width of the right edge (in mm), x4 bottom margin width (in mm). Response variable y: status of the banknote [0 for genuine and 1 for counterfeit] 316 / 785
  5. 5. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models bank Four measurements on 100 genuine Swiss banknotes and 100 counterfeit ones: x1 length of the bill (in mm), x2 width of the left edge (in mm), x3 width of the right edge (in mm), x4 bottom margin width (in mm). Response variable y: status of the banknote [0 for genuine and 1 for counterfeit] Probabilistic model that predicts counterfeiting based on the four measurements 317 / 785
  6. 6. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models The impossible linear model Example of the influence of x4 on y Since y is binary, y|x4 ∼ B(p(x4 )) , c Normal model is impossible 318 / 785
  7. 7. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models The impossible linear model Example of the influence of x4 on y Since y is binary, y|x4 ∼ B(p(x4 )) , c Normal model is impossible Linear dependence in p(x) = E[y|x]’s p(x4i ) = β0 + β1 x4i , 319 / 785
  8. 8. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models The impossible linear model Example of the influence of x4 on y Since y is binary, y|x4 ∼ B(p(x4 )) , c Normal model is impossible Linear dependence in p(x) = E[y|x]’s p(x4i ) = β0 + β1 x4i , estimated [by MLE] as pi = −2.02 + 0.268 xi4 ˆ which gives pi = .12 for xi4 = 8 and ... ˆ 320 / 785
  9. 9. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models The impossible linear model Example of the influence of x4 on y Since y is binary, y|x4 ∼ B(p(x4 )) , c Normal model is impossible Linear dependence in p(x) = E[y|x]’s p(x4i ) = β0 + β1 x4i , estimated [by MLE] as pi = −2.02 + 0.268 xi4 ˆ which gives pi = .12 for xi4 = 8 and ... pi = 1.19 for xi4 = 12!!! ˆ ˆ c Linear dependence is impossible 321 / 785
  10. 10. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of the linear dependence Broader class of models to cover various dependence structures. 322 / 785
  11. 11. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of the linear dependence Broader class of models to cover various dependence structures. Class of generalised linear models (GLM) where y|x, β ∼ f (y|xT β) . i.e., dependence of y on x partly linear 323 / 785
  12. 12. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Notations Same as in linear regression chapter, with n–sample y = (y1 , . . . , yn ) and corresponding explanatory variables/covariates   x11 x12 . . . x1k  x21 x22 . . . x2k     x32 . . . x3k  X =  x31  . . . . . . . . . . . . xn1 xn2 . . . xnk 324 / 785
  13. 13. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Specifications of GLM’s Definition (GLM) A GLM is a conditional model specified by two functions: the density f of y given x parameterised by its expectation 1 parameter µ = µ(x) [and possibly its dispersion parameter ϕ = ϕ(x)] 325 / 785
  14. 14. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Specifications of GLM’s Definition (GLM) A GLM is a conditional model specified by two functions: the density f of y given x parameterised by its expectation 1 parameter µ = µ(x) [and possibly its dispersion parameter ϕ = ϕ(x)] the link g between the mean µ and the explanatory variables, 2 written customarily as g(µ) = xT β or, equivalently, E[y|x, β] = g −1 (xT β). 326 / 785
  15. 15. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Specifications of GLM’s Definition (GLM) A GLM is a conditional model specified by two functions: the density f of y given x parameterised by its expectation 1 parameter µ = µ(x) [and possibly its dispersion parameter ϕ = ϕ(x)] the link g between the mean µ and the explanatory variables, 2 written customarily as g(µ) = xT β or, equivalently, E[y|x, β] = g −1 (xT β). For identifiability reasons, g needs to be bijective. 327 / 785
  16. 16. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Likelihood Obvious representation of the likelihood n f yi |xiT β, ϕ ℓ(β, ϕ|y, X) = i=1 with parameters β ∈ Rk and ϕ > 0. 328 / 785
  17. 17. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Examples Ordinary linear regression Case of GLM where g(x) = x, ϕ = σ 2 , and y|X, β, σ 2 ∼ Nn (Xβ, σ 2 ). 329 / 785
  18. 18. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Examples (2) Case of binary and binomial data, when yi |xi ∼ B(ni , p(xi )) with known ni Logit [or logistic regression] model Link is logit transform on probability of success g(pi ) = log(pi /(1 − pi )) , with likelihood Y „ ni « „ exp(xiT β) «yi „ «ni −yi n 1 yi 1 + exp(xiT β) 1 + exp(xiT β) i=1 (n )ffi n Y“ ”ni −yi X yi xiT β 1 + exp(xiT β) ∝ exp i=1 i=1 330 / 785
  19. 19. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Canonical link Special link function g that appears in the natural exponential family representation of the density g ⋆ (µ) = θ if f (y|µ) ∝ exp{T (y) · θ − Ψ(θ)} 331 / 785
  20. 20. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Canonical link Special link function g that appears in the natural exponential family representation of the density g ⋆ (µ) = θ if f (y|µ) ∝ exp{T (y) · θ − Ψ(θ)} Example Logit link is canonical for the binomial model, since ni pi f (yi |pi ) = exp yi log + ni log(1 − pi ) , yi 1 − pi and thus θi = log pi /(1 − pi ) 332 / 785
  21. 21. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Examples (3) Customary to use the canonical link, but only customary ... Probit model Probit link function given by g(µi ) = Φ−1 (µi ) where Φ standard normal cdf Likelihood n Φ(xiT β)yi (1 − Φ(xiT β))ni −yi . ℓ(β|y, X) ∝ i=1 Full processing 333 / 785
  22. 22. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Log-linear models Standard approach to describe associations between several categorical variables, i.e, variables with finite support Sufficient statistic: contingency table, made of the cross-classified counts for the different categorical variables. Full entry to loglinear models 334 / 785
  23. 23. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Log-linear models Standard approach to describe associations between several categorical variables, i.e, variables with finite support Sufficient statistic: contingency table, made of the cross-classified counts for the different categorical variables. Full entry to loglinear models Example (Titanic survivors) Child Adult Survivor Class Male Female Male Female 1st 0 0 118 4 2nd 0 0 154 13 No 3rd 35 17 387 89 Crew 0 0 670 3 1st 5 1 57 140 2nd 11 13 14 80 Yes 3rd 13 14 75 76 Crew 0 0 192 20 335 / 785
  24. 24. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Poisson regression model Each count yi is Poisson with mean µi = µ(xi ) 1 Link function connecting R+ with R, e.g. logarithm 2 g(µi ) = log(µi ). 336 / 785
  25. 25. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Poisson regression model Each count yi is Poisson with mean µi = µ(xi ) 1 Link function connecting R+ with R, e.g. logarithm 2 g(µi ) = log(µi ). Corresponding likelihood n 1 exp yi xiT β − exp(xiT β) . ℓ(β|y, X) = yi ! i=1 337 / 785
  26. 26. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Metropolis–Hastings algorithms Convergence assessment Posterior inference in GLMs harder than for linear models 338 / 785
  27. 27. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Metropolis–Hastings algorithms Convergence assessment Posterior inference in GLMs harder than for linear models c Working with a GLM requires specific numerical or simulation tools [E.g., GLIM in classical analyses] 339 / 785
  28. 28. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Metropolis–Hastings algorithms Convergence assessment Posterior inference in GLMs harder than for linear models c Working with a GLM requires specific numerical or simulation tools [E.g., GLIM in classical analyses] Opportunity to introduce universal MCMC method: Metropolis–Hastings algorithm 340 / 785
  29. 29. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Generic MCMC sampler Metropolis–Hastings algorithms are generic/down-the-shelf MCMC algorithms Only require likelihood up to a constant [difference with Gibbs sampler] can be tuned with a wide range of possibilities [difference with Gibbs sampler & blocking] natural extensions of standard simulation algorithms: based on the choice of a proposal distribution [difference in Markov proposal q(x, y) and acceptance] 341 / 785
  30. 30. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Why Metropolis? Originally introduced by Metropolis, Rosenbluth, Rosenbluth, Teller and Teller in a setup of optimization on a discrete state-space. All authors involved in Los Alamos during and after WWII: 342 / 785
  31. 31. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Why Metropolis? Originally introduced by Metropolis, Rosenbluth, Rosenbluth, Teller and Teller in a setup of optimization on a discrete state-space. All authors involved in Los Alamos during and after WWII: Physicist and mathematician, Nicholas Metropolis is considered (with Stanislaw Ulam) to be the father of Monte Carlo methods. Also a physicist, Marshall Rosenbluth worked on the development of the hydrogen (H) bomb Edward Teller was one of the first scientists to work on the Manhattan Project that led to the production of the A bomb. Also managed to design with Ulam the H bomb. 343 / 785
  32. 32. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Generic Metropolis–Hastings sampler For target π and proposal kernel q(x, y) Initialization: Choose an arbitrary x(0) 344 / 785
  33. 33. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Generic Metropolis–Hastings sampler For target π and proposal kernel q(x, y) Initialization: Choose an arbitrary x(0) Iteration t: 1 Given x(t−1) , generate x ∼ q(x(t−1) , x) ˜ 2 Calculate π(˜)/q(x(t−1) , x) x ˜ ρ(x(t−1) , x) = min ˜ ,1 (t−1) )/q(˜, x(t−1) ) π(x x With probability ρ(x(t−1) , x) accept x and set x(t) = x; ˜ ˜ ˜ 3 otherwise reject x and set x(t) = x(t−1) . ˜ 345 / 785
  34. 34. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Universality Algorithm only needs to simulate from q which can be chosen [almost!] arbitrarily, i.e. unrelated with π [q also called instrumental distribution] Note: π and q known up to proportionality terms ok since proportionality constants cancel in ρ. 346 / 785
  35. 35. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Validation Markov chain theory Target π is stationary distribution of Markov chain (x(t) )t because probability ρ(x, y) satisfies detailed balance equation π(x)q(x, y)ρ(x, y) = π(y)q(y, x)ρ(y, x) [Integrate out x to see that π is stationary] 347 / 785
  36. 36. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Validation Markov chain theory Target π is stationary distribution of Markov chain (x(t) )t because probability ρ(x, y) satisfies detailed balance equation π(x)q(x, y)ρ(x, y) = π(y)q(y, x)ρ(y, x) [Integrate out x to see that π is stationary] For convergence/ergodicity, Markov chain must be irreducible: q has positive probability of reaching all areas with positive π probability in a finite number of steps. 348 / 785
  37. 37. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of proposal Theoretical guarantees of convergence very high, but choice of q is crucial in practice. 349 / 785
  38. 38. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of proposal Theoretical guarantees of convergence very high, but choice of q is crucial in practice. Poor choice of q may result in very high rejection rates, with very few moves of the Markov chain (x(t) )t hardly moves, or in 350 / 785
  39. 39. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of proposal Theoretical guarantees of convergence very high, but choice of q is crucial in practice. Poor choice of q may result in very high rejection rates, with very few moves of the Markov chain (x(t) )t hardly moves, or in a myopic exploration of the support of π, that is, in a dependence on the starting value x(0) , with the chain stuck in a neighbourhood mode to x(0) . 351 / 785
  40. 40. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of proposal Theoretical guarantees of convergence very high, but choice of q is crucial in practice. Poor choice of q may result in very high rejection rates, with very few moves of the Markov chain (x(t) )t hardly moves, or in a myopic exploration of the support of π, that is, in a dependence on the starting value x(0) , with the chain stuck in a neighbourhood mode to x(0) . Note: hybrid MCMC Simultaneous use of different kernels valid and recommended 352 / 785
  41. 41. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The independence sampler Pick proposal q that is independent of its first argument, q(x, y) = q(y) ρ simplifies into π(y)/q(y) ρ(x, y) = min 1, . π(x)/q(x) Special case: q ∝ π Reduces to ρ(x, y) = 1 and iid sampling 353 / 785
  42. 42. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The independence sampler Pick proposal q that is independent of its first argument, q(x, y) = q(y) ρ simplifies into π(y)/q(y) ρ(x, y) = min 1, . π(x)/q(x) Special case: q ∝ π Reduces to ρ(x, y) = 1 and iid sampling Analogy with Accept-Reject algorithm where max π/q replaced with the current value π(x(t−1) )/q(x(t−1) ) but sequence of accepted x(t) ’s not i.i.d. 354 / 785
  43. 43. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of q Convergence properties highly dependent on q. q needs to be positive everywhere on the support of π for a good exploration of this support, π/q needs to be bounded. 355 / 785
  44. 44. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of q Convergence properties highly dependent on q. q needs to be positive everywhere on the support of π for a good exploration of this support, π/q needs to be bounded. Otherwise, the chain takes too long to reach regions with low q/π values. 356 / 785
  45. 45. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The random walk sampler Independence sampler requires too much global information about π: opt for a local gathering of information 357 / 785
  46. 46. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The random walk sampler Independence sampler requires too much global information about π: opt for a local gathering of information Means exploration of the neighbourhood of the current value x(t) in search of other points of interest. 358 / 785
  47. 47. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The random walk sampler Independence sampler requires too much global information about π: opt for a local gathering of information Means exploration of the neighbourhood of the current value x(t) in search of other points of interest. Simplest exploration device is based on random walk dynamics. 359 / 785
  48. 48. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Random walks Proposal is a symmetric transition density q(x, y) = qRW (y − x) = qRW (x − y) 360 / 785
  49. 49. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Random walks Proposal is a symmetric transition density q(x, y) = qRW (y − x) = qRW (x − y) Acceptance probability ρ(x, y) reduces to the simpler form π(y) ρ(x, y) = min 1, . π(x) Only depends on the target π [accepts all proposed values that increase π] 361 / 785
  50. 50. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of qRW Considerable flexibility in the choice of qRW , tails: Normal versus Student’s t scale: size of the neighbourhood 362 / 785
  51. 51. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of qRW Considerable flexibility in the choice of qRW , tails: Normal versus Student’s t scale: size of the neighbourhood Can also be used for restricted support targets [with a waste of simulations near the boundary] 363 / 785
  52. 52. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of qRW Considerable flexibility in the choice of qRW , tails: Normal versus Student’s t scale: size of the neighbourhood Can also be used for restricted support targets [with a waste of simulations near the boundary] Can be tuned towards an acceptance probability of 0.234 at the burnin stage [Magic number!] 364 / 785
  53. 53. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Convergence assessment Capital question: How many iterations do we need to run??? MCMC debuts 365 / 785
  54. 54. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Convergence assessment Capital question: How many iterations do we need to run??? MCMC debuts Rule # 1 There is no absolute number of simulations, i.e. 1, 000 is neither large, nor small. Rule # 2 It takes [much] longer to check for convergence than for the chain itself to converge. Rule # 3 MCMC is a “what-you-get-is-what-you-see” algorithm: it fails to tell about unexplored parts of the space. Rule # 4 When in doubt, run MCMC chains in parallel and check for consistency. 366 / 785
  55. 55. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Convergence assessment Capital question: How many iterations do we need to run??? MCMC debuts Rule # 1 There is no absolute number of simulations, i.e. 1, 000 is neither large, nor small. Rule # 2 It takes [much] longer to check for convergence than for the chain itself to converge. Rule # 3 MCMC is a “what-you-get-is-what-you-see” algorithm: it fails to tell about unexplored parts of the space. Rule # 4 When in doubt, run MCMC chains in parallel and check for consistency. Many “quick-&-dirty” solutions in the literature, but not necessarily trustworthy. 367 / 785
  56. 56. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Prohibited dynamic updating Tuning the proposal in terms of its past performances can only be implemented at burnin, because otherwise this cancels Markovian convergence properties. 368 / 785
  57. 57. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Prohibited dynamic updating Tuning the proposal in terms of its past performances can only be implemented at burnin, because otherwise this cancels Markovian convergence properties. Use of several MCMC proposals together within a single algorithm using circular or random design is ok. It almost always brings an improvement compared with its individual components (at the cost of increased simulation time) 369 / 785
  58. 58. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Effective sample size How many iid simulations from π are equivalent to N simulations from the MCMC algorithm? 370 / 785
  59. 59. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Effective sample size How many iid simulations from π are equivalent to N simulations from the MCMC algorithm? Based on estimated k-th order auto-correlation, ρk = corr x(t) , x(t+k) , effective sample size −1/2 T0 ess ρ2 N =n 1+2 ˆk , k=1 Only partial indicator that fails to signal chains stuck in one mode of the target 371 / 785
  60. 60. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model The Probit Model Likelihood Recall Probit n Φ(xiT β)yi (1 − Φ(xiT β))ni −yi . ℓ(β|y, X) ∝ i=1 372 / 785
  61. 61. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model The Probit Model Likelihood Recall Probit n Φ(xiT β)yi (1 − Φ(xiT β))ni −yi . ℓ(β|y, X) ∝ i=1 If no prior information available, resort to the flat prior π(β) ∝ 1 and then obtain the posterior distribution n yi ni −yi Φ xiT β 1 − Φ(xiT β) π(β|y, X) ∝ , i=1 nonstandard and simulated using MCMC techniques. 373 / 785
  62. 62. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model MCMC resolution Metropolis–Hastings random walk sampler works well for binary regression problems with small number of predictors ˆ Uses the maximum likelihood estimate β as starting value and ˆ asymptotic (Fisher) covariance matrix of the MLE, Σ, as scale 374 / 785
  63. 63. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model MLE proposal R function glm very useful to get the maximum likelihood estimate ˆ of β and its asymptotic covariance matrix Σ. Terminology used in R program mod=summary(glm(y~X-1,family=binomial(link=quot;probitquot;))) ˆ ˆ with mod$coeff[,1] denoting β and mod$cov.unscaled Σ. 375 / 785
  64. 64. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model MCMC algorithm Probit random-walk Metropolis-Hastings ˆ ˆ Initialization: Set β (0) = β and compute Σ Iteration t: ˜ ˆ Generate β ∼ Nk+1 (β (t−1) , τ Σ) 1 Compute 2 ˜ π(β|y) ˜ ρ(β (t−1) , β) = min 1, π(β (t−1) |y) ˜ ˜ With probability ρ(β (t−1) , β) set β (t) = β; 3 (t) (t−1) otherwise set β = β . 376 / 785
  65. 65. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model bank Probit modelling with 0.8 −1.0 1.0 0.4 no intercept over the −2.0 0.0 0.0 0 4000 8000 −2.0 −1.5 −1.0 −0.5 0 200 600 1000 four measurements. Three different scales 3 0.0 0.4 0.8 2 0.4 1 τ = 1, 0.1, 10: best −1 0.0 mixing behavior is 0 4000 8000 −1 0 1 2 3 0 200 600 1000 associated with τ = 1. 2.5 0.8 0.0 0.4 0.8 −0.5 1.0 0.4 Average of the 0.0 parameters over 9, 000 0 4000 8000 −0.5 0.5 1.5 2.5 0 200 600 1000 iterations gives plug-in 1.8 0.0 0.4 0.8 2.0 estimate 1.2 1.0 0.0 0.6 0 4000 8000 0.6 1.0 1.4 1.8 0 200 600 1000 pi = Φ (−1.2193xi1 + 0.9540xi2 + 0.9795xi3 + 1.1481xi4 ) . ˆ 377 / 785
  66. 66. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for probit models Flat prior on β inappropriate for comparison purposes and Bayes factors. 378 / 785
  67. 67. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for probit models Flat prior on β inappropriate for comparison purposes and Bayes factors. Replace the flat prior with a hierarchical prior, β|σ 2 , X ∼ Nk 0k , σ 2 (X T X)−1 and π(σ 2 |X) ∝ σ −3/2 , (almost) as in normal linear regression 379 / 785
  68. 68. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for probit models Flat prior on β inappropriate for comparison purposes and Bayes factors. Replace the flat prior with a hierarchical prior, β|σ 2 , X ∼ Nk 0k , σ 2 (X T X)−1 and π(σ 2 |X) ∝ σ −3/2 , (almost) as in normal linear regression Warning The matrix X T X is not the Fisher information matrix 380 / 785
  69. 69. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for testing Same argument as before: while π is improper, use of the same variance factor σ 2 in both models means the normalising constant cancels in the Bayes factor. 381 / 785
  70. 70. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for testing Same argument as before: while π is improper, use of the same variance factor σ 2 in both models means the normalising constant cancels in the Bayes factor. Posterior distribution of β “ ”−(2k−1)/4 |X T X|1/2 Γ((2k − 1)/4) β T (X T X)β π −k/2 π(β|y, X) ∝ h i1−yi Y n Φ(xiT β)yi 1 − Φ(xiT β) × i=1 [where k matters!] 382 / 785
  71. 71. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Marginal approximation Marginal Z“ ”−(2k−1)/4 |X T X|1/2 π −k/2 Γ{(2k − 1)/4} β T (X T X)β f (y|X) ∝ h i1−yi Y n Φ(xiT β)yi 1 − (Φ(xiT β) × dβ, i=1 approximated by M ˛˛ ˛˛−(2k−1)/2 Y h i1−yi |X T X|1/2 X ˛˛ n (m) ˛˛ Φ(xiT β (m) )yi 1 − Φ(xiT β (m) ) ˛˛Xβ ˛˛ π k/2 M m=1 i=1 bb b (m) −β)T V −1 (β (m) −β)/4 b × Γ{(2k − 1)/4} |V |1/2 (4π)k/2 e+(β , where β (m) ∼ Nk (β, 2 V ) with β MCMC approximation of Eπ [β|y, X] and V MCMC approximation of V(β|y, X). 383 / 785
  72. 72. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Linear hypothesis Linear restriction on β H0 : Rβ = r (r ∈ Rq , R q × k matrix) where β 0 is (k − q) dimensional and X0 and x0 are linear transforms of X and of x of dimensions (n, k − q) and (k − q). Likelihood n 1−yi 0 Φ(xiT β 0 )yi 1 − Φ(xiT β 0 ) ℓ(β |y, X0 ) ∝ , 0 0 i=1 384 / 785
  73. 73. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Linear test Associated [projected] G-prior β 0 |σ 2 , X0 ∼ Nk−q 0k−q , σ 2 (X0 X0 )−1 T and π(σ 2 |X0 ) ∝ σ −3/2 , 385 / 785
  74. 74. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Linear test Associated [projected] G-prior β 0 |σ 2 , X0 ∼ Nk−q 0k−q , σ 2 (X0 X0 )−1 T and π(σ 2 |X0 ) ∝ σ −3/2 , Marginal distribution of y of the same type  ffZ ˛˛ ˛˛ (2(k − q) − 1) ˛˛Xβ 0 ˛˛−(2(k−q)−1)/2 |X0 X0 |1/2 π −(k−q)/2 Γ T f (y|X0 ) ∝ 4 h i1−yi Y n Φ(xiT β 0 )yi 1 − (Φ(xiT β 0 ) dβ 0 . 0 0 i=1 386 / 785
  75. 75. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model banknote π For H0 : β1 = β2 = 0, B10 = 157.73 [against H0 ] Generic regression-like output: Estimate Post. var. log10(BF) X1 -1.1552 0.0631 4.5844 (****) X2 0.9200 0.3299 -0.2875 X3 0.9121 0.2595 -0.0972 X4 1.0820 0.0287 15.6765 (****) evidence against H0: (****) decisive, (***) strong, (**) subtantial, (*) poor 387 / 785
  76. 76. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative settings If prior information available on p(x), transform into prior distribution on β by technique of imaginary observations: 388 / 785
  77. 77. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative settings If prior information available on p(x), transform into prior distribution on β by technique of imaginary observations: Start with k different values of the covariate vector, x1 , . . . , xk ˜ ˜ For each of these values, the practitioner specifies (i) a prior guess gi at the probability pi associated with xi ; (ii) an assessment of (un)certainty about that guess given by a number Ki of equivalent “prior observations”. 389 / 785
  78. 78. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative settings If prior information available on p(x), transform into prior distribution on β by technique of imaginary observations: Start with k different values of the covariate vector, x1 , . . . , xk ˜ ˜ For each of these values, the practitioner specifies (i) a prior guess gi at the probability pi associated with xi ; (ii) an assessment of (un)certainty about that guess given by a number Ki of equivalent “prior observations”. On how many imaginary observations did you build this guess? 390 / 785
  79. 79. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative prior k pKi gi −1 (1 − pi )Ki (1−gi )−1 π(p1 , . . . , pk ) ∝ i i=1 translates into [Jacobian rule] k Ki (1−gi )−1 Φ(˜ iT β)Ki gi −1 1 − Φ(˜ iT β) φ(˜ iT β) π(β) ∝ x x x i=1 391 / 785
  80. 80. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative prior k pKi gi −1 (1 − pi )Ki (1−gi )−1 π(p1 , . . . , pk ) ∝ i i=1 translates into [Jacobian rule] k Ki (1−gi )−1 Φ(˜ iT β)Ki gi −1 1 − Φ(˜ iT β) φ(˜ iT β) π(β) ∝ x x x i=1 [Almost] equivalent to using the G-prior 0 #−1 1 quot; X k β ∼ Nk @0k , A xj xjT ˜˜ j=1 392 / 785
  81. 81. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model The logit model Recall that [for ni = 1] exp(µi ) yi |µi ∼ B(1, µi ), ϕ = 1 and g(µi ) = . 1 + exp(µi ) Thus exp(xiT β) P(yi = 1|β) = 1 + exp(xiT β) with likelihood yi 1−yi n exp(xiT β) exp(xiT β) ℓ(β|y, X) = 1− 1 + exp(xiT β) 1 + exp(xiT β) i=1 393 / 785
  82. 82. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model Links with probit usual vague prior for β, π(β) ∝ 1 394 / 785
  83. 83. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model Links with probit usual vague prior for β, π(β) ∝ 1 Posterior given by n yi 1−yi exp(xiT β) exp(xiT β) π(β|y, X) ∝ 1− 1 + exp(xiT β) 1 + exp(xiT β) i=1 [intractable] 395 / 785
  84. 84. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model Links with probit usual vague prior for β, π(β) ∝ 1 Posterior given by n yi 1−yi exp(xiT β) exp(xiT β) π(β|y, X) ∝ 1− 1 + exp(xiT β) 1 + exp(xiT β) i=1 [intractable] Same Metropolis–Hastings sampler 396 / 785
  85. 85. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model bank −1 0.6 0.0 0.4 0.8 −3 Same scale factor equal to 0.3 −5 0.0 τ = 1: slight increase in 0 4000 8000 −5 −4 −3 −2 −1 0 200 600 1000 the skewness of the 246 0.0 0.4 0.8 0.2 histograms of the βi ’s. 0.0 −2 0 4000 8000 −2 0 2 4 6 0 200 600 1000 0.4 246 0.0 0.4 0.8 Plug-in estimate of 0.2 0.0 −2 predictive probability of a 0 4000 8000 −2 0 2 4 6 0 200 600 1000 counterfeit 3.5 0.0 0.4 0.8 2.5 0.6 1.5 0.0 0 4000 8000 1.5 2.5 3.5 0 200 600 1000 exp (−2.5888xi1 + 1.9967xi2 + 2.1260xi3 + 2.1879xi4 ) pi = ˆ . 1 + exp (−2.5888xi1 + 1.9967xi2 + 2.1260xi3 + 2.1879xi4 ) 397 / 785
  86. 86. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model G-priors for logit models Same story: Flat prior on β inappropriate for Bayes factors, to be replaced with hierarchical prior, β|σ 2 , X ∼ Nk 0k , σ 2 (X T X)−1 and π(σ 2 |X) ∝ σ −3/2 Example (bank) Estimate Post. var. log10(BF) X1 -2.3970 0.3286 4.8084 (****) X2 1.6978 1.2220 -0.2453 X3 2.1197 1.0094 -0.1529 X4 2.0230 0.1132 15.9530 (****) evidence against H0: (****) decisive, (***) strong, (**) subtantial, (*) poor 398 / 785
  87. 87. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Loglinear models Introduction to loglinear models Example (airquality) Benchmark in R > air=data(airquality) Repeated measurements over 111 consecutive days of ozone u (in parts per billion) and maximum daily temperature v discretized into dichotomous variables month 5 6 7 8 9 ozone temp [1,31] [57,79] 17 4 2 5 18 (79,97] 0 2332 (31,168] [57,79] 6 1031 (79,97] 1 2 21 12 8 Contingency table with 5 × 2 × 2 = 20 entries 399 / 785
  88. 88. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Poisson regression Observations/counts y = (y1 , . . . , yn ) are integers, so we can choose yi ∼ P(µi ) Saturated likelihood n 1 yi ℓ(µ|y) = µ exp(−µi ) µi ! i i=1 400 / 785
  89. 89. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Poisson regression Observations/counts y = (y1 , . . . , yn ) are integers, so we can choose yi ∼ P(µi ) Saturated likelihood n 1 yi ℓ(µ|y) = µ exp(−µi ) µi ! i i=1 GLM constraint via log-linear link iT β log(µi ) = xiT β , yi |xi ∼ P ex 401 / 785
  90. 90. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Categorical variables Special feature Incidence matrix X = (xi ) such that its elements are all zeros or ones, i.e. covariates are all indicators/dummy variables! 402 / 785
  91. 91. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Categorical variables Special feature Incidence matrix X = (xi ) such that its elements are all zeros or ones, i.e. covariates are all indicators/dummy variables! Several types of (sub)models are possible depending on relations between categorical variables. 403 / 785
  92. 92. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Categorical variables Special feature Incidence matrix X = (xi ) such that its elements are all zeros or ones, i.e. covariates are all indicators/dummy variables! Several types of (sub)models are possible depending on relations between categorical variables. Re-special feature Variable selection problem of a specific kind, in the sense that all indicators related with the same association must either remain or vanish at once. Thus much fewer submodels than in a regular variable selection problem. 404 / 785
  93. 93. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Parameterisations Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ j and 1 ≤ w ≤ K. 405 / 785
  94. 94. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Parameterisations Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ j and 1 ≤ w ≤ K. Simplest non-constant model is I J K u v w log(µτ ) = βb Ib (uτ ) + βb Ib (vτ ) + βb Ib (wτ ) , b=1 b=1 b=1 that is, u v w log(µl(i,j,k) ) = βi + βj + βk , where index l(i, j, k) corresponds to u = i, v = j and w = k. 406 / 785
  95. 95. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Parameterisations Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ j and 1 ≤ w ≤ K. Simplest non-constant model is I J K u v w log(µτ ) = βb Ib (uτ ) + βb Ib (vτ ) + βb Ib (wτ ) , b=1 b=1 b=1 that is, u v w log(µl(i,j,k) ) = βi + βj + βk , where index l(i, j, k) corresponds to u = i, v = j and w = k. Saturated model is uvw log(µl(i,j,k) ) = βijk 407 / 785
  96. 96. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Log-linear model (over-)parameterisation Representation log(µl(i,j,k) ) = λ + λu + λv + λw + λuv + λuw + λvw + λuvw , i j ij k ik jk ijk as in Anova models. 408 / 785
  97. 97. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Log-linear model (over-)parameterisation Representation log(µl(i,j,k) ) = λ + λu + λv + λw + λuv + λuw + λvw + λuvw , i j ij k ik jk ijk as in Anova models. λ appears as the overall or reference average effect λu appears as the marginal discrepancy (against the reference i effect λ) when u = i, λuv as the interaction discrepancy (against the added effects ij λ + λu + λv ) when (u, v) = (i, j) i j and so on... 409 / 785
  98. 98. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Example of submodels if both v and w are irrelevant, then 1 u log(µl(i,j,k) ) = λ + λi , if all three categorical variables are mutually independent, then 2 log(µl(i,j,k) ) = λ + λi + λv + λw , u j k if u and v are associated but are both independent of w, then 3 log(µl(i,j,k) ) = λ + λi + λv + λw + λuv , u j k ij if u and v are conditionally independent given w, then 4 log(µl(i,j,k) ) = λ + λi + λv + λw + λuw + λvw , u j k ik jk if there is no three-factor interaction, then 5 log(µl(i,j,k) ) = λ + λi + λv + λw + λuv + λuw + λvw u j k ij ik jk [the most complete submodel] 410 / 785
  99. 99. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Identifiability Representation log(µl(i,j,k) ) = λ + λu + λv + λw + λuv + λuw + λvw + λuvw , i j ij k ik jk ijk not identifiable but Bayesian approach handles non-identifiable settings and still estimate properly identifiable quantities. 411 / 785
  100. 100. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Identifiability Representation log(µl(i,j,k) ) = λ + λu + λv + λw + λuv + λuw + λvw + λuvw , i j ij k ik jk ijk not identifiable but Bayesian approach handles non-identifiable settings and still estimate properly identifiable quantities. Customary to impose identifiability constraints on the parameters: set to 0 parameters corresponding to the first category of each variable, i.e. remove the indicator of the first category. E.g., if u ∈ {1, 2} and v ∈ {1, 2}, constraint could be λu = λv = λuv = λuv = λuv = 0 . 1 1 11 12 21 412 / 785
  101. 101. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Inference under a flat prior Noninformative prior π(β) ∝ 1 gives posterior distribution n yi exp(xiT β) exp{− exp(xiT β)} π(β|y, X) ∝ i=1 n n yi xiT β − exp(xiT β) = exp i=1 i=1 413 / 785
  102. 102. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Inference under a flat prior Noninformative prior π(β) ∝ 1 gives posterior distribution n yi exp(xiT β) exp{− exp(xiT β)} π(β|y, X) ∝ i=1 n n yi xiT β − exp(xiT β) = exp i=1 i=1 Use of same random walk M-H algorithm as in probit and logit cases, starting with MLE evaluation > mod=summary(glm(y~-1+X,family=poisson())) 414 / 785
  103. 103. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models airquality Effect Post. mean Post. var. λ 2.8041 0.0612 λu -1.0684 0.2176 2 λv -5.8652 1.7141 2 λw -1.4401 0.2735 2 λw -2.7178 0.7915 3 Identifiable non-saturated model λw -1.1031 0.2295 4 involves 16 parameters λw -0.0036 0.1127 5 λuv Obtained with 10, 000 MCMC 3.3559 0.4490 22 λuw -1.6242 1.2869 iterations with scale factor 22 λuw - 0.3456 0.8432 τ 2 = 0.5 23 λuw -0.2473 0.6658 24 λuw -1.3335 0.7115 25 λvw 4.5493 2.1997 22 λvw 6.8479 2.5881 23 λvw 4.6557 1.7201 24 λvw 3.9558 1.7128 25 415 / 785
  104. 104. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models airquality: MCMC output 0.0 −2 0.5 3.0 −1.5 −1.0 −6 −3.0 −10 −2.5 2.0 0 4000 10000 0 4000 10000 0 4000 10000 0 4000 10000 6 1.0 0.0 −2 5 4 −1.5 0.0 −4 3 −6 −1.0 2 −3.0 0 4000 10000 0 4000 10000 0 4000 10000 0 4000 10000 0 0 1 1 −2 −2 −1 −1 −4 −5 −3 −3 0 4000 10000 0 4000 10000 0 4000 10000 0 4000 10000 12 8 8 8 6 6 8 4 4 4 2 4 2 0 0 4000 10000 0 4000 10000 0 4000 10000 0 4000 10000 416 / 785
  105. 105. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models airquality: MCMC output 1400 800 0.0 −2 0.5 800 3.0 −1.5 400 −1.0 −6 400 400 600 −3.0 −10 −2.5 2.0 0 0 0 0 0 4000 10000 0 4000 10000 0 4000 10000 0 4000 10000 2.0 3.0 −2.5 −1.0 0.5 −10 −6 −2 −3.0 −1.5 0.0 1000 1000 6 1.0 0.0 800 −2 5 400 4 −1.5 0.0 400 −4 400 400 3 −6 −1.0 2 −3.0 0 0 0 0 0 4000 10000 0 4000 10000 0 4000 10000 0 4000 10000 −7 −5 −3 −1 −3.0 −1.0 0.5 −1.0 0.0 1.0 2 4 6 800 800 800 0 0 1 1 400 400 −2 400 400 −2 −1 −1 −4 −5 −3 −3 0 0 0 0 0 4000 10000 0 4000 10000 0 4000 10000 0 4000 10000 −5 −2 0 −3 −1 1 3 −3 −1 1 3 −4 −2 0 1200 1000 12 8 500 500 8 8 6 6 600 8 400 4 200 200 4 4 2 4 2 0 0 0 0 0 0 4000 10000 0 4000 10000 0 4000 10000 0 4000 10000 0 4 8 4 8 12 2468 2468 417 / 785
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×