Discussion of Fearnhead and Prangle, RSS, Dec. 14, 2011

Discussion seminar, CREST, Nov. 03, 2011

Transcript

  • 1. Semi-automatic ABC: A Discussion. Christian P. Robert, Université Paris-Dauphine, IUF, & CREST. http://xianblog.wordpress.com. November 2, 2011. LaTeX code borrowed from arXiv:1004.1112v2
  • 2. Outline: Approximate Bayesian computation (recap); Summary statistic selection
  • 3. Regular Bayesian computation issues: When faced with a non-standard posterior distribution π(θ|y) ∝ π(θ) L(θ|y), the standard solution is to use simulation (Monte Carlo) to produce a sample θ_1, …, θ_T from π(θ|y) (or approximately, by Markov chain Monte Carlo methods) [Robert & Casella, 2004]
  • 4. Intractable likelihoods: Cases when the likelihood function f(y|θ) is unavailable and when the completion step f(y|θ) = ∫_Z f(y, z|θ) dz is impossible or too costly because of the dimension of z. ⇒ MCMC cannot be implemented!
  • 5. Intractable likelihoods: ⇒ MCMC cannot be implemented!
  • 6. The ABC method: Bayesian setting: target is π(θ) f(x|θ)
  • 7. The ABC method: Bayesian setting: target is π(θ) f(x|θ). When the likelihood f(x|θ) is not in closed form, use a likelihood-free rejection technique:
  • 8. The ABC method: Bayesian setting: target is π(θ) f(x|θ). When the likelihood f(x|θ) is not in closed form, use a likelihood-free rejection technique. ABC algorithm: For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating θ′ ∼ π(θ), z ∼ f(z|θ′), until the auxiliary variable z is equal to the observed value, z = y. [Rubin, 1984; Tavaré et al., 1997]
  • 9. A as approximative: When y is a continuous random variable, the equality z = y is replaced with a tolerance condition, ρ(y, z) ≤ ε, where ρ is a distance
  • 10. A as approximative: When y is a continuous random variable, the equality z = y is replaced with a tolerance condition, ρ(y, z) ≤ ε, where ρ is a distance. Output distributed from π(θ) P_θ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)
  • 11. ABC algorithm. Algorithm 1: Likelihood-free rejection sampler:
        for i = 1 to N do
          repeat
            generate θ′ from the prior distribution π(·)
            generate z from the likelihood f(·|θ′)
          until ρ{η(z), η(y)} ≤ ε
          set θ_i = θ′
        end for
    where η(y) defines a (generally insufficient) statistic
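    To make Algorithm 1 concrete, here is a minimal runnable Python sketch for a toy model (y_1, …, y_n i.i.d. N(θ, 1), prior θ ∼ N(0, 10), summary η = sample mean, distance ρ = absolute difference); the model, prior scale, and tolerance are illustrative choices, not taken from the slides.

        import numpy as np

        rng = np.random.default_rng(0)
        n, eps, N = 50, 0.05, 1000
        y_obs = rng.normal(1.0, 1.0, n)             # observed data (true theta = 1)
        eta_obs = y_obs.mean()                      # summary statistic eta(y)

        samples = []
        while len(samples) < N:
            theta = rng.normal(0.0, np.sqrt(10.0))  # generate theta' from the prior pi(.)
            z = rng.normal(theta, 1.0, n)           # generate z from the likelihood f(.|theta')
            if abs(z.mean() - eta_obs) <= eps:      # until rho{eta(z), eta(y)} <= eps
                samples.append(theta)               # set theta_i = theta'

        print(np.mean(samples), np.std(samples))    # approximate posterior mean and sd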
  • 12. Output: The likelihood-free algorithm samples from the marginal in z of: π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ, where A_{ε,y} = {z ∈ D : ρ(η(z), η(y)) < ε}.
  • 13. Output: The likelihood-free algorithm samples from the marginal in z of: π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ, where A_{ε,y} = {z ∈ D : ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|η(y)). [Not guaranteed!]
  • 14. Summary statistic selection. Outline: Approximate Bayesian computation (recap); Summary statistic selection: F&P's setting; Noisy ABC; Optimal summary statistic
  • 15. F&P's ABC: Use of a summary statistic S(·), an importance proposal g(·), a kernel K(·) ≤ 1 and a bandwidth h > 0 such that (θ, y_sim) ∼ g(θ) f(y_sim|θ) is accepted with probability (hence the bound) K[{S(y_sim) − s_obs}/h] [or is it K[{S(y_sim) − s_obs}]/h, cf. (2)? typo], with the corresponding importance weight defined by π(θ)/g(θ)
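    A sketch of this accept/weight step on the same toy model, assuming a Gaussian kernel rescaled so that K(·) ≤ 1 (K(u) = exp(−u²/2)) and taking g = π for simplicity, so the importance weights π(θ)/g(θ) are all 1; these are my illustrative choices, not the paper's.

        import numpy as np

        rng = np.random.default_rng(1)
        n, h = 50, 0.1
        y_obs = rng.normal(1.0, 1.0, n)
        s_obs = y_obs.mean()                          # S(y_obs)

        thetas, weights = [], []
        for _ in range(5000):
            theta = rng.normal(0.0, np.sqrt(10.0))    # theta ~ g(.) = pi(.) here
            y_sim = rng.normal(theta, 1.0, n)         # y_sim ~ f(.|theta)
            u = (y_sim.mean() - s_obs) / h
            if rng.uniform() < np.exp(-0.5 * u**2):   # accept w.p. K[{S(y_sim) - s_obs}/h]
                thetas.append(theta)
                weights.append(1.0)                   # pi(theta)/g(theta) = 1 since g = pi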
  • 16. Errors, errors, and errors: Three levels of approximation:
    (a) π(θ|y_obs) by π(θ|s_obs): loss of information;
    (b) π(θ|s_obs) by π_ABC(θ|s_obs) = ∫ π(s) K[{s − s_obs}/h] π(θ|s) ds / ∫ π(s) K[{s − s_obs}/h] ds: noisy observations;
    (c) π_ABC(θ|s_obs) by importance Monte Carlo based on N simulations, represented by var(a(θ)|s_obs)/N_acc [expected number of acceptances]
  • 17. Average acceptance asymptotics: For the average acceptance probability/approximate likelihood p(θ|s_obs) = ∫ f(y_sim|θ) K[{S(y_sim) − s_obs}/h] dy_sim, the overall acceptance probability is p(s_obs) = ∫ p(θ|s_obs) π(θ) dθ = π(s_obs) h^d + o(h^d) [Lemma 1]
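    Lemma 1's rate can be eyeballed by Monte Carlo: with a uniform kernel on [−h, h] in dimension d = 1, the overall acceptance probability is ≈ 2h π(s_obs), so the ratio acceptance/(2h) should stabilise as h shrinks. The toy normal model (S(y_sim) | θ ∼ N(θ, 1/n)) is again an illustrative assumption.

        import numpy as np

        rng = np.random.default_rng(2)
        n = 50
        s_obs = 1.0                                         # observed summary (illustrative)
        for h in (0.4, 0.2, 0.1, 0.05):
            theta = rng.normal(0.0, np.sqrt(10.0), 200_000) # theta ~ pi(.)
            s_sim = rng.normal(theta, 1.0 / np.sqrt(n))     # S(y_sim) | theta ~ N(theta, 1/n)
            acc = np.mean(np.abs(s_sim - s_obs) <= h)       # uniform-kernel acceptance rate
            print(f"h = {h:.2f}: acceptance/(2h) = {acc / (2 * h):.4f}")  # ~ pi(s_obs)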
  • 18. Optimal importance proposal: Best choice of importance proposal in terms of effective sample size: g*(θ|s_obs) ∝ π(θ) p(θ|s_obs)^{1/2} [Not particularly useful in practice]
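    On a grid, g* can be tabulated directly whenever p(θ|s_obs) is computable; in the toy normal model with a Gaussian kernel the convolution is analytic (S|θ ∼ N(θ, 1/n), so p(θ|s_obs) ∝ N(s_obs; θ, 1/n + h²)). This is an illustrative shortcut, not a general recipe.

        import numpy as np
        from scipy.stats import norm

        h, n, s_obs = 0.1, 50, 1.0
        grid = np.linspace(-5.0, 5.0, 2001)
        prior = norm.pdf(grid, 0.0, np.sqrt(10.0))          # pi(theta)
        p = norm.pdf(s_obs, grid, np.sqrt(1.0 / n + h**2))  # p(theta|s_obs), analytic here
        g_star = prior * np.sqrt(p)                         # g* ∝ pi(theta) p(theta|s_obs)^(1/2)
        g_star /= g_star.sum() * (grid[1] - grid[0])        # normalise on the grid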
  • 19. Calibration of h: "This result gives insight into how S(·) and h affect the Monte Carlo error. To minimize Monte Carlo error, we need h^d to be not too small. Thus ideally we want S(·) to be a low dimensional summary of the data that is sufficiently informative about θ that π(θ|s_obs) is close, in some sense, to π(θ|y_obs)" (p.5). But the constraint on h only addresses one term in the approximation error and acceptance probability: a large h prevents π(θ|s_obs) from being close to π_ABC(θ|s_obs); a small d prevents π(θ|s_obs) from being close to π(θ|y_obs)
  • 20. Calibrated ABC. Definition: For 0 < q < 1 and a subset A, fix [one specific?/all?] event E_q(A) with Pr_ABC(θ ∈ E_q(A) | s_obs) = q. Then ABC is calibrated if Pr(θ ∈ A | E_q(A)) = q. Why calibrated and not exact?
  • 21. Calibrated ABC. Theorem: Noisy ABC, where s_obs = S(y_obs) + hε, ε ∼ K(·), is calibrated [Wilkinson, 2008]; no condition on h
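    The noisy-ABC twist is a one-line change: jitter the observed summary once with kernel noise, then run the standard sampler against the jittered value. A minimal sketch, assuming a Gaussian kernel:

        import numpy as np

        rng = np.random.default_rng(3)
        h = 0.1
        s_obs = 1.0                    # S(y_obs) computed from the data
        eps = rng.normal()             # eps ~ K(.) (Gaussian kernel, illustrative)
        s_noisy = s_obs + h * eps      # noisy ABC targets pi(theta | s_noisy)
        # ...then run the rejection/importance sampler exactly as before,
        # comparing S(y_sim) to s_noisy instead of s_obs.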
  • 22. Calibrated ABC. Theorem: For noisy ABC, the expected noisy-ABC log-likelihood, E{log[p(θ|s_obs)]} = ∫∫ log[p(θ|S(y_obs) + hε)] π(y_obs|θ_0) K(ε) dy_obs dε, has its maximum at θ = θ_0. [Last line of proof contains a typo] True for any choice of summary statistic? [Imposes at least identifiability...] Relevant in asymptotia, not for the data at hand
  • 23. Calibrated ABC. Corollary: For noisy ABC, the ABC posterior converges onto a point mass at the true parameter value as m → ∞. For standard ABC, this is not always the case (unless h goes to zero). Strength of the regularity conditions (c1) and (c2) in Bernardo & Smith, 1994? [constraints on the posterior] Some condition on the summary statistic?
  • 24. Loss motivated statistic: Under a quadratic loss function, Theorem: (i) the minimal posterior error E[L(θ, θ̂)|y_obs] occurs when θ̂ = E(θ|y_obs) (!); (ii) when h → 0, E_ABC(θ|s_obs) converges to E(θ|y_obs); (iii) if S(y_obs) = E[θ|y_obs], then for θ̂ = E_ABC[θ|s_obs], E[L(θ, θ̂)|y_obs] = trace(AΣ) + h² ∫ xᵀ A x K(x) dx + o(h²). Measure-theoretic difficulties? The dependence of s_obs on h makes me uncomfortable. Relevant for the choice of K?
  • 25. Optimal summary statistic: "We take a different approach, and weaken the requirement for πABC to be a good approximation to π(θ|y_obs). We argue for πABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters." (p.5) From this result, F&P derive their choice of summary statistic, S(y) = E(θ|y) [almost sufficient], and suggest h = O(N^{−1/(2+d)}) and h = O(N^{−1/(4+d)}) as optimal bandwidths for noisy and standard ABC.
  • 26. Optimal summary statistic: "We take a different approach, and weaken the requirement for πABC to be a good approximation to π(θ|y_obs). We argue for πABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters." (p.5) From this result, F&P derive their choice of summary statistic, S(y) = E(θ|y) [so that E_ABC[θ|S(y_obs)] = E[θ|y_obs]], and suggest h = O(N^{−1/(2+d)}) and h = O(N^{−1/(4+d)}) as optimal bandwidths for noisy and standard ABC.
  • 27. Caveat: Since E(θ|y_obs) is most usually unavailable, F&P suggest:
    (i) use a pilot run of ABC to determine a region of non-negligible posterior mass;
    (ii) simulate sets of parameter values and data;
    (iii) use the simulated sets of parameter values and data to estimate the summary statistic; and
    (iv) run ABC with this choice of summary statistic.
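    A compact sketch of the four steps on the toy normal model; the pilot tolerance, the feature map f(y) = (1, mean, median), and the sample sizes are all illustrative assumptions, not the paper's settings.

        import numpy as np

        rng = np.random.default_rng(4)
        n = 50
        y_obs = rng.normal(1.0, 1.0, n)

        # (i) pilot ABC run: crude rejection to locate non-negligible posterior mass
        theta0 = rng.normal(0.0, np.sqrt(10.0), 20_000)
        s0 = rng.normal(theta0, 1.0 / np.sqrt(n))        # S(y_sim) | theta ~ N(theta, 1/n)
        lo, hi = np.quantile(theta0[np.abs(s0 - y_obs.mean()) < 0.5], [0.01, 0.99])

        # (ii) simulate parameter values and data over that region
        M = 10_000
        theta = rng.uniform(lo, hi, M)
        y = rng.normal(theta[:, None], 1.0, (M, n))

        # (iii) estimate the summary statistic S(y) ~ E(theta|y) by linear regression
        F = np.column_stack([np.ones(M), y.mean(axis=1), np.median(y, axis=1)])
        beta, *_ = np.linalg.lstsq(F, theta, rcond=None)

        # (iv) run ABC with S(y) = f(y) @ beta as the summary statistic
        s_obs = np.array([1.0, y_obs.mean(), np.median(y_obs)]) @ beta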
  • 28. Approximating the summary statistic: As in Beaumont et al. (2002) and Blum and François (2010), F&P use a linear regression to approximate E(θ|y_obs): θ_i = β_0^(i) + β^(i) f(y_obs) + ε_i
  • 29. Applications: The paper's second half covers:
    - g-and-k distribution
    - stochastic kinetic biochemical networks
    - Lotka-Volterra model
    - Ricker map ecological model
    - M/G/1 queue
    - tuberculosis bacteria genotype data
  • 30. Questions:
    - dependence on h and S(·) in the early stage;
    - reduction of Bayesian inference to point estimation;
    - the approximation error in step (iii) is not accounted for;
    - not parameterisation invariant;
    - practice shows that a proper approximation to genuine posterior distributions stems from using a (much) larger number of summary statistics than the dimension of the parameter;
    - the validity of the approximation to the optimal summary statistic depends on the quality of the pilot run;
    - important inferential issues like model choice are not covered by this approach.