Discussion seminar, CREST, Nov. 03, 2011

1. Semi-automatic ABC: A Discussion
   Christian P. Robert
   Université Paris-Dauphine, IUF, & CREST
   http://xianblog.wordpress.com
   November 2, 2011
   (LaTeX code borrowed from arXiv:1004.1112v2)
2. Outline
   Approximate Bayesian computation (recap)
   Summary statistic selection
3. Regular Bayesian computation issues
   When faced with a non-standard posterior distribution
      π(θ|y) ∝ π(θ) L(θ|y)
   the standard solution is to use simulation (Monte Carlo) to produce a sample
      θ1, ..., θT
   from π(θ|y) (or approximately, by Markov chain Monte Carlo methods).
   [Robert & Casella, 2004]
4. Intractable likelihoods
   Cases when the likelihood function f(y|θ) is unavailable, and when the completion step
      f(y|θ) = ∫_Z f(y, z|θ) dz
   is impossible or too costly because of the dimension of z:
   MCMC cannot be implemented!
5. Intractable likelihoods
   MCMC cannot be implemented!
6. The ABC method
   Bayesian setting: target is π(θ) f(x|θ)
7. The ABC method
   Bayesian setting: target is π(θ) f(x|θ)
   When the likelihood f(x|θ) is not in closed form: likelihood-free rejection technique.
8. The ABC method
   Bayesian setting: target is π(θ) f(x|θ)
   When the likelihood f(x|θ) is not in closed form: likelihood-free rejection technique.
   ABC algorithm
   For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating
      θ′ ∼ π(θ), z ∼ f(z|θ′),
   until the auxiliary variable z is equal to the observed value, z = y.
   [Rubin, 1984; Tavaré et al., 1997]
9. A as in approximative
   When y is a continuous random variable, the equality z = y is replaced with a tolerance condition
      ρ(y, z) ≤ ε
   where ρ is a distance.
10. A as in approximative
    When y is a continuous random variable, the equality z = y is replaced with a tolerance condition
       ρ(y, z) ≤ ε
    where ρ is a distance.
    Output distributed from
       π(θ) Pθ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)
11. ABC algorithm
    Algorithm 1: Likelihood-free rejection sampler
       for i = 1 to N do
          repeat
             generate θ′ from the prior distribution π(·)
             generate z from the likelihood f(·|θ′)
          until ρ{η(z), η(y)} ≤ ε
          set θi = θ′
       end for
    where η(y) defines a (generally insufficient) statistic.
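As a concrete illustration, Algorithm 1 can be sketched in a few lines of Python. Everything model-specific below is an assumption made for the example, not part of the slides: normal data and prior, the sample mean as summary η, and the absolute difference as distance ρ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative assumption): y_i ~ N(theta, 1), prior theta ~ N(0, 10),
# summary eta(y) = sample mean, distance rho = absolute difference.
y_obs = rng.normal(1.5, 1.0, size=50)
eta_obs = y_obs.mean()

def abc_rejection(n_accept, eps):
    """Likelihood-free rejection sampler (Algorithm 1)."""
    samples = []
    while len(samples) < n_accept:
        theta = rng.normal(0.0, np.sqrt(10.0))       # generate theta' from pi(.)
        z = rng.normal(theta, 1.0, size=y_obs.size)  # generate z from f(.|theta')
        if abs(z.mean() - eta_obs) <= eps:           # until rho{eta(z), eta(y)} <= eps
            samples.append(theta)                    # set theta_i = theta'
    return np.array(samples)

post = abc_rejection(n_accept=200, eps=0.1)
```

Because the sample mean is sufficient in this toy model, the accepted draws approximate π(θ|y); with an insufficient η they would only target π(θ|η(y)), which is the point of the next slides.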
12. Output
    The likelihood-free algorithm samples from the marginal in z of
       π_ε(θ, z|y) = π(θ) f(z|θ) I_{A(ε,y)}(z) / ∫_{A(ε,y)×Θ} π(θ) f(z|θ) dz dθ,
    where A(ε,y) = {z ∈ D : ρ(η(z), η(y)) < ε}.
13. Output
    The likelihood-free algorithm samples from the marginal in z of
       π_ε(θ, z|y) = π(θ) f(z|θ) I_{A(ε,y)}(z) / ∫_{A(ε,y)×Θ} π(θ) f(z|θ) dz dθ,
    where A(ε,y) = {z ∈ D : ρ(η(z), η(y)) < ε}.
    The idea behind ABC is that the summary statistics, coupled with a small tolerance, should provide a good approximation of the posterior distribution:
       π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|η(y)).
    [Not guaranteed!]
14. Summary statistic selection
    Approximate Bayesian computation (recap)
    Summary statistic selection
       F&P's setting
       Noisy ABC
       Optimal summary statistic
15. F&P's ABC
    Use of a summary statistic S(·), an importance proposal g(·), a kernel K(·) ≤ 1, and a bandwidth h > 0 such that
       (θ, y_sim) ∼ g(θ) f(y_sim|θ)
    is accepted with probability (hence the bound)
       K[{S(y_sim) − s_obs}/h]
    [or is it K[{S(y_sim) − s_obs}]/h, cf. (2)? typo]
    and the corresponding importance weight defined by
       π(θ)/g(θ)
16. Errors, errors, and errors
    Three levels of approximation:
    • π(θ|y_obs) by π(θ|s_obs): loss of information
    • π(θ|s_obs) by
         π_ABC(θ|s_obs) = ∫ π(s) K[{s − s_obs}/h] π(θ|s) ds / ∫ π(s) K[{s − s_obs}/h] ds:
      noisy observations
    • π_ABC(θ|s_obs) by importance Monte Carlo based on N simulations, with error represented by var(a(θ)|s_obs)/N_acc
      [N_acc: expected number of acceptances]
17. Average acceptance asymptotics
    For the average acceptance probability / approximate likelihood
       p(θ|s_obs) = ∫ f(y_sim|θ) K[{S(y_sim) − s_obs}/h] dy_sim,
    the overall acceptance probability satisfies
       p(s_obs) = ∫ p(θ|s_obs) π(θ) dθ = π(s_obs) h^d + o(h^d)
    [Lemma 1]
18. Optimal importance proposal
    Best choice of importance proposal in terms of effective sample size:
       g*(θ|s_obs) ∝ π(θ) p(θ|s_obs)^(1/2)
    [Not particularly useful in practice]
19. Calibration of h
    "This result gives insight into how S(·) and h affect the Monte Carlo error. To minimize Monte Carlo error, we need h^d to be not too small. Thus ideally we want S(·) to be a low dimensional summary of the data that is sufficiently informative about θ that π(θ|s_obs) is close, in some sense, to π(θ|y_obs)." (p.5)
    The constraint on h only addresses one term in the approximation error and the acceptance probability:
    • h large prevents π(θ|s_obs) from being close to π_ABC(θ|s_obs)
    • d small prevents π(θ|s_obs) from being close to π(θ|y_obs)
20. Calibrated ABC
    Definition
    For 0 < q < 1 and a subset A, fix [one specific?/all?] event E_q(A) with
       Pr_ABC(θ ∈ E_q(A)|s_obs) = q.
    Then ABC is calibrated if
       Pr(θ ∈ A|E_q(A)) = q.
    Why calibrated and not exact?
21. Calibrated ABC
    Theorem
    Noisy ABC, where
       s_obs = S(y_obs) + h ε,   ε ∼ K(·),
    is calibrated.
    [Wilkinson, 2008]
    No condition on h.
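The noisy-ABC construction above can be sketched under the same toy normal model as before (all names and model choices are illustrative assumptions, not F&P's code). The only change from standard ABC is that the observed summary is jittered once with noise drawn from the acceptance kernel K:

```python
import numpy as np

rng = np.random.default_rng(1)

h = 0.05
y_obs = rng.normal(1.5, 1.0, size=50)

# Noisy ABC: s_obs = S(y_obs) + h*eps with eps ~ K(.), here a uniform
# kernel on [-1, 1], matching the kernel used in the accept step below.
eps_noise = rng.uniform(-1.0, 1.0)
s_obs = y_obs.mean() + h * eps_noise

def noisy_abc(n_accept):
    """Rejection ABC run against the perturbed summary s_obs."""
    samples = []
    while len(samples) < n_accept:
        theta = rng.normal(0.0, np.sqrt(10.0))       # theta' ~ pi(.)
        z = rng.normal(theta, 1.0, size=y_obs.size)  # z ~ f(.|theta')
        if abs(z.mean() - s_obs) <= h:               # uniform-kernel acceptance
            samples.append(theta)
    return np.array(samples)

post = noisy_abc(100)
```

The calibration statement holds for any h precisely because the noise added to s_obs and the acceptance kernel are matched.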
22. Calibrated ABC
    Theorem
    For noisy ABC, the expected noisy-ABC log-likelihood,
       E{log[p(θ|s_obs)]} = ∫∫ log[p(θ|S(y_obs) + h ε)] π(y_obs|θ0) K(ε) dy_obs dε,
    has its maximum at θ = θ0.
    [Last line of proof contains a typo]
    True for any choice of summary statistic? [Imposes at least identifiability...]
    Relevant in asymptotia and not for the data.
23. Calibrated ABC
    Corollary
    For noisy ABC, the ABC posterior converges onto a point mass on the true parameter value as m → ∞.
    For standard ABC, this is not always the case (unless h goes to zero).
    Strength of regularity conditions (c1) and (c2) in Bernardo & Smith (1994)? [constraints on posterior]
    Some condition on the summary statistic?
24. Loss motivated statistic
    Under a quadratic loss function,
    Theorem
    (i) The minimal posterior error E[L(θ, θ̂)|y_obs] occurs when θ̂ = E(θ|y_obs) (!)
    (ii) When h → 0, E_ABC(θ|s_obs) converges to E(θ|y_obs)
    (iii) If S(y_obs) = E[θ|y_obs], then for θ̂ = E_ABC[θ|s_obs],
            E[L(θ, θ̂)|y_obs] = trace(AΣ) + h² ∫ xᵀ A x K(x) dx + o(h²).
    Measure-theoretic difficulties?
    The dependence of s_obs on h makes me uncomfortable.
    Relevant for the choice of K?
25. Optimal summary statistic
    "We take a different approach, and weaken the requirement for π_ABC to be a good approximation to π(θ|y_obs). We argue for π_ABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters." (p.5)
    From this result, F&P derive their choice of summary statistic,
       S(y) = E(θ|y)
    [almost sufficient]
    and suggest h = O(N^(−1/(2+d))) and h = O(N^(−1/(4+d))) as optimal bandwidths for noisy and standard ABC, respectively.
26. Optimal summary statistic
    "We take a different approach, and weaken the requirement for π_ABC to be a good approximation to π(θ|y_obs). We argue for π_ABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters." (p.5)
    From this result, F&P derive their choice of summary statistic,
       S(y) = E(θ|y)
    [E_ABC[θ|S(y_obs)] = E[θ|y_obs]]
    and suggest h = O(N^(−1/(2+d))) and h = O(N^(−1/(4+d))) as optimal bandwidths for noisy and standard ABC, respectively.
27. Caveat
    Since E(θ|y_obs) is most usually unavailable, F&P suggest
    (i) use a pilot run of ABC to determine a region of non-negligible posterior mass;
    (ii) simulate sets of parameter values and data;
    (iii) use the simulated sets of parameter values and data to estimate the summary statistic; and
    (iv) run ABC with this choice of summary statistic.
28. Approximating the summary statistic
    As in Beaumont et al. (2002) and Blum and François (2010), F&P use a linear regression to approximate E(θ|y_obs):
       θ_i = β0^(i) + β^(i) f(y_obs) + ε_i
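This regression step can be sketched as follows (toy normal model once more; the feature choice f(y) = (mean, variance) and all other specifics are assumptions for illustration). Simulated pairs (θ, y) are used to fit θ ≈ β0 + β·f(y) by least squares, and the fitted value then serves as the summary statistic S(y) ≈ E(θ|y):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate (theta, y) pairs, as in steps (ii)-(iii) of the pilot procedure.
n_sim, n_data = 5000, 20
theta_sim = rng.normal(0.0, 2.0, size=n_sim)                    # theta ~ prior
y_sim = rng.normal(theta_sim[:, None], 1.0, size=(n_sim, n_data))

# Regression features f(y): intercept, sample mean, sample variance.
F = np.column_stack([np.ones(n_sim), y_sim.mean(axis=1), y_sim.var(axis=1)])

# Least-squares fit of theta on f(y): theta ~= beta0 + beta . f(y)
beta, *_ = np.linalg.lstsq(F, theta_sim, rcond=None)

def summary(y):
    """Estimated summary statistic S(y) ~= E(theta|y)."""
    return np.array([1.0, y.mean(), y.var()]) @ beta

y_obs = rng.normal(1.0, 1.0, size=n_data)
s_obs = summary(y_obs)   # scalar summary: dimension matches dim(theta)
```

In this conjugate toy case E(θ|y) = n ȳ/(n + 1/4) ≈ 0.99 ȳ, so the fitted coefficient on the sample mean lands close to 1 while the variance feature is nearly irrelevant; the approximation error of this regression step is exactly the concern raised on the Questions slide.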
29. Applications
    The paper's second half covers:
    • g-and-k distribution
    • stochastic kinetic biochemical networks
    • Lotka-Volterra model
    • Ricker map ecological model
    • M/G/1 queue
    • tuberculosis bacteria genotype data
30. Questions
    • dependence on h and S(·) in the early stage
    • reduction of Bayesian inference to point estimation
    • approximation error in step (iii) not accounted for
    • not parameterisation invariant
    • practice shows that a proper approximation to genuine posterior distributions stems from using a (much) larger number of summary statistics than the dimension of the parameter
    • the validity of the approximation to the optimal summary statistic depends on the quality of the pilot run
    • important inferential issues like model choice are not covered by this approach