DIC, a few years later
Christian P. Robert
Université Paris-Dauphine, University of Warwick, & CREST, Paris
RSS Annual Conference, Newcastle upon Tyne, Sept. 03, 2013

My re-discussion of the re-Read Paper by Spiegelhalter, Best, Carlin & van der Linde, "Bayesian measures of model complexity and fit"

Outline

DIC as in Dayesian
mixed-up definitions
Addendum

Bayesian model comparison

use posterior probabilities/Bayes factors
    B12 = ∫_{Θ1} f1(y|θ1) dπ1(θ1) / ∫_{Θ2} f2(y|θ2) dπ2(θ2)
  [Jeffreys, 1939]
posterior predictive checks
    Pr(mi(y_rep) ≤ mi(y) | y)
  [Gelman et al., 2013]
comparisons of models based on prediction error and other loss-based measures
DIC? BIC? integrated likelihood?

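As an aside not on the original slides: each marginal likelihood in B12 can be approximated by averaging the likelihood over prior draws. The sketch below is a minimal illustration under a made-up setup (normal versus Cauchy location models with a shared N(0, 1) prior on the location); all data, models, and names are assumptions of the example, not part of the talk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=20)  # toy observed data

def log_marginal(loglik, prior_draws):
    """Monte Carlo estimate of log m(y) = log ∫ f(y|θ) dπ(θ)."""
    ll = np.array([loglik(th) for th in prior_draws])
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))  # log-mean-exp for stability

theta = rng.normal(0.0, 1.0, size=50_000)  # draws from the shared prior π
log_m1 = log_marginal(lambda t: stats.norm.logpdf(y, loc=t).sum(), theta)
log_m2 = log_marginal(lambda t: stats.cauchy.logpdf(y, loc=t).sum(), theta)
print("log B12 ≈", log_m1 - log_m2)
```
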
thou shalt not use the data twice

The data is used twice in the DIC method:
1. y is used once to produce the posterior π(θ|y) and the associated estimate θ̃(y)
2. y is used a second time to compute the posterior expectation of the observed likelihood p(y|θ),
     ∫ log p(y|θ) π(dθ|y) ∝ ∫ log p(y|θ) p(y|θ) π(dθ)

Aitkin's expected likelihood

Strong similarity with Aitkin's (1991) posterior Bayes factor
    ∫ f(y|θ) dπ(θ|y) ,
and Aitkin's (2010) integrated likelihood, in that the deviance
    D(θ) = −2 log f(x|θ)
is a renaming of the likelihood and is considered "a posteriori" both in
    D̄ = E[D(θ)|x]   and in   pD = D̄ − D(θ̂)
[Gelman, X & Rousseau, 2013]

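For concreteness (not on the slides): D̄, pD, and DIC reduce to a few lines once posterior draws are available. A minimal sketch for a N(θ, 1) model under a flat prior, where the posterior is available in closed form; all names and numbers are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=30)  # toy data from a N(θ, 1) model
# flat prior on θ => posterior is N(ȳ, 1/n)
theta_post = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=10_000)

def deviance(theta):
    """D(θ) = -2 log f(y|θ), evaluated for a vector of draws."""
    return -2 * stats.norm.logpdf(y[:, None], loc=theta).sum(axis=0)

D_bar = deviance(theta_post).mean()                 # D̄ = E[D(θ)|y]
D_hat = deviance(np.array([theta_post.mean()]))[0]  # D(θ̂) at the posterior mean
p_D = D_bar - D_hat                                 # effective number of parameters
print(f"DIC = {D_bar + p_D:.1f}, pD = {p_D:.2f}")
```
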
Aitkin's arguments

Central tool for Aitkin's (2010) model fit is the "posterior cdf" of the likelihood,
    F(z) = Pr^π(L(θ, y) > z | y) .
The approach is...
  ...general, and resolves difficulties with the Bayesian processing of point null hypotheses;
  ...open to the use of generic noninformative and improper priors, being related to a single model;
  ...more natural in handling the "vexed question of model fit"

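Continuing the toy sketch above (an illustration, not Aitkin's code): the "posterior cdf" is just the posterior proportion of draws whose likelihood exceeds the cutoff.

```python
def F(z, theta_post, y):
    """Empirical version of F(z) = Pr^π(L(θ, y) > z | y) from posterior draws."""
    loglik = stats.norm.logpdf(y[:, None], loc=theta_post).sum(axis=0)
    return np.mean(loglik > np.log(z))
```
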
counter-arguments by Gelman et al., 2013

using priors and posteriors is no guarantee that inference is Bayesian
improper priors give meaningless (or at least not universally accepted) values for marginal likelihoods
the likelihood function does not exist a priori
model comparison requires a joint distribution to be properly defined
[X & Marin, 2008]

DIC for missing data models

Framework of missing data models
    f(y|θ) = ∫ f(y, z|θ) dz ,
with observed data y = (y1, . . . , yn) and corresponding missing data z = (z1, . . . , zn).
How do we define DIC in such settings?

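To make the completion concrete for the mixture case treated next (a hypothetical two-component normal mixture with θ = (p, µ1, µ2) and component labels z), the complete and observed likelihoods relate by summing the labels out. A sketch, with all names assumptions of the example:

```python
import numpy as np
from scipy import stats

def complete_loglik(y, z, theta):
    """log f(y, z|θ) for a two-component normal mixture, θ = (p, μ1, μ2)."""
    p, mu1, mu2 = theta
    mu = np.where(z == 0, mu1, mu2)
    logw = np.where(z == 0, np.log(p), np.log(1 - p))
    return np.sum(logw + stats.norm.logpdf(y, loc=mu))

def observed_loglik(y, theta):
    """log f(y|θ) = log Σ_z f(y, z|θ): labels summed out exactly."""
    p, mu1, mu2 = theta
    return np.sum(np.log(p * stats.norm.pdf(y, loc=mu1)
                         + (1 - p) * stats.norm.pdf(y, loc=mu2)))
```
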
how many DICs can you fit in a mixture?

1. observed DICs
     DIC1 = −4 Eθ[log f(y|θ) | y] + 2 log f(y | Eθ[θ|y]) ,
   often a poor choice in case of unidentifiability;
     DIC2 = −4 Eθ[log f(y|θ) | y] + 2 log f(y | θ̂(y)) ,
   which uses the posterior mode instead;
     DIC3 = −4 Eθ[log f(y|θ) | y] + 2 log f̂(y) ,
   which instead relies on the MCMC density estimate f̂(y)

2. complete DICs based on f(y, z|θ)
     DIC4 = EZ[DIC(y, Z) | y]
          = −4 Eθ,Z[log f(y, Z|θ) | y] + 2 EZ[log f(y, Z | Eθ[θ|y, Z]) | y] ;
     DIC5 = −4 Eθ,Z[log f(y, Z|θ) | y] + 2 log f(y, ẑ(y) | θ̂(y)) ,
   using Z as an additional parameter;
     DIC6 = −4 Eθ,Z[log f(y, Z|θ) | y] + 2 EZ[log f(y, Z | θ̂(y)) | y, θ̂(y)] ,
   in analogy with EM, θ̂ being an EM fixed point

3. conditional DICs based on f(y|z, θ)
     DIC7 = −4 Eθ,Z[log f(y|Z, θ) | y] + 2 log f(y | ẑ(y), θ̂(y)) ,
   using MAP estimates;
     DIC8 = −4 Eθ,Z[log f(y|Z, θ) | y] + 2 EZ[log f(y | Z, θ̂(y, Z)) | y] ,
   conditioning first on Z and then integrating over Z conditional on y

A computational sketch for two of these variants follows the list.
[Celeux et al., BA, 2006]

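A hedged illustration of DIC1 and DIC3 for the two-component mixture sketched earlier: it assumes the data y and an array theta_draws of posterior draws (p, µ1, µ2) coming from a data-augmentation Gibbs sampler (sampler not shown), and reuses observed_loglik from above.

```python
# assumes y and theta_draws = [(p, μ1, μ2), ...] from a Gibbs sampler (not shown)
obs_ll = np.array([observed_loglik(y, th) for th in theta_draws])
D_bar = -2 * obs_ll.mean()  # E[D(θ)|y]

# DIC1: plug-in at the posterior mean E[θ|y]; often poor under label
# switching, since the mean may fall between the posterior modes
theta_bar = np.mean(theta_draws, axis=0)
DIC1 = 2 * D_bar + 2 * observed_loglik(y, theta_bar)

# DIC3: plug-in replaced by the MCMC predictive density estimate
#   f̂(y_i) = (1/M) Σ_m f(y_i|θ^(m)),  f̂(y) = Π_i f̂(y_i)
def density(y, theta):
    p, mu1, mu2 = theta
    return p * stats.norm.pdf(y, loc=mu1) + (1 - p) * stats.norm.pdf(y, loc=mu2)

f_hat = np.mean([density(y, th) for th in theta_draws], axis=0)
DIC3 = 2 * D_bar + 2 * np.log(f_hat).sum()
```
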
Galactic DICs

Example of the galaxy mixture dataset (pD in parentheses):

K   DIC2         DIC3        DIC4        DIC5          DIC6         DIC7         DIC8
2   453 (5.56)   451 (3.66)  502 (5.50)  705 (207.88)  501 (4.48)   417 (11.07)  410 (4.09)
3   440 (9.23)   436 (4.94)  461 (6.40)  622 (167.28)  471 (15.80)  378 (13.59)  372 (7.43)
4   446 (11.58)  439 (5.41)  473 (7.52)  649 (183.48)  482 (16.51)  388 (17.47)  382 (11.37)
5   447 (10.80)  442 (5.48)  485 (7.58)  658 (180.73)  511 (33.29)  395 (20.00)  390 (15.15)
6   449 (11.26)  444 (5.49)  494 (8.49)  676 (191.10)  532 (46.83)  407 (28.23)  398 (19.34)
7   460 (19.26)  446 (5.83)  508 (8.93)  700 (200.35)  571 (71.26)  425 (40.51)  409 (24.57)

Further questions

what is the behaviour of DIC under model misspecification?
is there an absolute scale to DIC values, i.e. when is a difference in DICs significant?
how can DIC handle small n versus large p?
in an era of complex models, is DIC applicable?

Likelihood-free settings

Cases when the likelihood function f(y|θ) is unavailable (in both analytic and numerical senses) and when the completion step
    f(y|θ) = ∫_Z f(y, z|θ) dz
is impossible or too costly because of the dimension of z:
MCMC cannot be implemented!

The ABC method

Bayesian setting: target is π(θ)f(x|θ).
When the likelihood f(x|θ) is not in closed form, a likelihood-free rejection technique:

ABC algorithm
For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating
    θ′ ∼ π(θ) ,  z ∼ f(z|θ′) ,
until the auxiliary variable z is equal to the observed value, z = y.
[Tavaré et al., 1997]

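A minimal, hypothetical rendering of the sampler above in code. Exact matching z = y is relaxed to a tolerance ε on a summary statistic (the sample mean), as in practice; model, prior, and tolerance are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(1.0, 1.0, size=50)  # toy observed data

def abc_rejection(y, n_keep=200, eps=0.1):
    """Keep prior draws θ' whose simulated z has η(z) within eps of η(y)."""
    eta_y = y.mean()                           # summary statistic η
    accepted = []
    while len(accepted) < n_keep:
        theta = rng.normal(0, 3)               # θ' ~ π(θ)
        z = rng.normal(theta, 1, size=len(y))  # z ~ f(z|θ')
        if abs(z.mean() - eta_y) < eps:        # ρ{η(z), η(y)} < ε
            accepted.append(theta)
    return np.array(accepted)

post = abc_rejection(y)
print("ABC posterior mean ≈", post.mean())
```
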
Generic ABC for model choice

Algorithm 1: Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
  repeat
    Generate m from the prior π(M = m)
    Generate θm from the prior πm(θm)
    Generate z from the model fm(z|θm)
  until ρ{η(z), η(y)} < ε
  Set m(t) = m and θ(t) = θm
end for
[Grelaud et al., 2009]

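A hypothetical implementation of Algorithm 1, choosing between normal and Laplace location models via mean/variance summaries; the posterior probability of each model is then estimated by its frequency among accepted draws. Every model and constant here is an assumption of the sketch, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

def eta(x):
    """Summary statistics η: sample mean and variance."""
    return np.array([x.mean(), x.var()])

simulate = [lambda th, n: rng.normal(th, 1, n),   # m = 0: N(θ, 1)
            lambda th, n: rng.laplace(th, 1, n)]  # m = 1: Laplace(θ, 1)

def abc_mc(y, T=500, eps=0.5):
    eta_y, n = eta(y), len(y)
    kept_m = []
    for _ in range(T):
        while True:
            m = int(rng.integers(2))          # m ~ π(M = m), uniform
            theta = rng.normal(0, 3)          # θ_m ~ π_m(θ_m)
            z = simulate[m](theta, n)         # z ~ f_m(z|θ_m)
            if np.linalg.norm(eta(z) - eta_y) < eps:  # ρ{η(z), η(y)} < ε
                kept_m.append(m)
                break
    return np.mean(kept_m)                    # ≈ Pr(M = Laplace | η(y))

y = rng.normal(0.5, 1.0, size=100)
print("estimated Pr(M = Laplace | y):", abc_mc(y))
```
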
Consistency

Model selection feasible with ABC:
  choice of summary statistics is paramount
  at best, ABC output → π(· | η(y)), which concentrates around µ0
For estimation: {θ; µ(θ) = µ0} = θ0
For testing: {µ1(θ1); θ1 ∈ Θ1} ∩ {µ2(θ2); θ2 ∈ Θ2} = ∅
[Marin et al., 2013]