Presented at BayesComp 2023 in Levi (Finland), based on Picchini and Tamborrino (2022). Guided sequential ABC schemes for intractable Bayesian models, arXiv:2206.12235.
Guided sequential ABC schemes for simulation-based inference

Umberto Picchini
Dept. Mathematical Sciences, Chalmers and Gothenburg University
@uPicchini

BayesComp, Levi (Finland), 12-17 March 2023
Joint work with Massimiliano Tamborrino (University of Warwick, UK)

U. Picchini, M. Tamborrino. Guided sequential ABC schemes for intractable Bayesian models. arXiv:2206.12235, 2022.
The work is about approximate Bayesian computation (ABC), a simulation-based inference methodology.

We want to construct ways to improve sequential ABC schemes.

Sequential ABC schemes are the state of the art; however, they can be “slow” in the sense that the cloud of parameter “particles” can be very diffuse, especially in the initial iterations.

This is typical because the initial proposal sampler is the (possibly vague) prior → very different from the posterior → low acceptance rate.

We explore how to make parameter proposals informed by the observed data, and hence guided by data.

This reduces the rejection rate and the computational effort, and we show that inference remains accurate.
As long as we have a realistic model (implemented as a computer program) from which we can simulate artificial datasets given model parameters θ, we can produce “some” statistical inference.

This is useful when the likelihood function of θ is unavailable.

Even for models that are very simple to write down, exact inference may be impossible due to high-dimensional integration.
A paradigm shift is the concept of a generative model, or simulator.

Say we write computer code for the model M(θ), the simulator, as an idealized representation of the phenomenon under study:

    θ∗ → M(θ∗) → y∗

As long as we are able to run an instance of the model, we simulate/generate artificial data y∗, with y∗ ∼ p(y∗|θ = θ∗).

So we have obtained a sample y∗ from the (unknown) likelihood using the simulator M(θ).

Therefore the simulator M(θ) defines the probabilistic model p(y|θ) implicitly!
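To make this concrete, here is a minimal Python sketch (not from the paper; the Gaussian toy model and all function names are illustrative assumptions). Each call to the simulator returns one draw from the implicit likelihood p(y|θ):

import numpy as np

def simulator(theta, n=100, rng=None):
    """Toy simulator M(theta): returns one artificial dataset y* given theta.

    Here the 'model' is just i.i.d. Gaussian data with mean mu and standard
    deviation exp(log_sigma); any black-box code mapping parameters to data
    would play the same role.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu, log_sigma = theta
    return rng.normal(mu, np.exp(log_sigma), size=n)

# one run of M(theta*) yields one sample y* from the implicit likelihood p(y|theta*)
y_star = simulator(theta=(1.0, 0.0))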
ABC-rejection with summary statistics

1. simulate from the prior θ∗ ∼ π(θ)
2. simulate M(θ∗) → y∗, compute S(y∗)
3. if ∥S(y∗) − S(y^o)∥ < ϵ store θ∗. Go to step 1 and repeat.

Stored θ’s are from πϵ(θ|S(y^o)), with

    πϵ(θ|S(y^o)) ∝ π(θ) ∫_Y I_{Aϵ,y^o}(y∗) p(y∗|θ) dy∗,
    Aϵ,y^o(y∗) = {y∗ ∈ Y : ∥S(y∗) − S(y^o)∥ < ϵ}.
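A minimal Python sketch of the three steps above (illustrative only; prior_sample, simulator and summary are placeholder callables supplied by the user):

import numpy as np

def abc_rejection(prior_sample, simulator, summary, s_obs, eps, n_accept):
    """ABC-rejection: keep theta* whenever ||S(y*) - S(y^o)|| < eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()                    # 1. theta* ~ pi(theta)
        s_sim = summary(simulator(theta))         # 2. y* ~ M(theta*), s* = S(y*)
        if np.linalg.norm(s_sim - s_obs) < eps:   # 3. accept if summaries are close
            accepted.append(theta)
    return np.array(accepted)  # draws from pi_eps(theta | S(y^o))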
How to simulate the proposal θ∗?

Several possibilities exist to simulate the proposal θ∗. The most common are:

• (inefficient but parallelisable) acceptance-rejection ABC, which proposes from θ∗ ∼ π(θ);
• MCMC-ABC (serial, difficult to tune ϵ, high autocorrelations), which proposes from a kernel θ∗ ∼ q(θ∗|θ);
• SIS-ABC and SMC-ABC (parallelisable, more plug-and-play than MCMC-ABC).

Today we exclusively discuss sequential ABC (SMC-ABC and SIS-ABC).
With sequential schemes, we traverse T generations of populations of parameters.

We first sample many “particles” from the prior → accept some using a large threshold ϵ_1 → perturb them → accept some using a smaller ϵ_2 < ϵ_1 → perturb ...

We stop when it takes an unbearable time to accept further particles (e.g. stop when the acceptance rate falls below 1.5%).
SMC-ABC moves N particles through T ABC posteriors as follows:

• (t = 1) run acceptance-rejection ABC, obtaining particles θ_1^(1), ..., θ_1^(N). Assign each particle the weight w_1^(i) = 1/N.
• for t = 2:T
    for i = 1:N
      repeat until acceptance:
        sample a θ∗ from the set (θ_{t−1}^(1), ..., θ_{t−1}^(N)) with probabilities (w_{t−1}^(1), ..., w_{t−1}^(N))
        (perturb) θ∗∗ ∼ q_t(·|θ∗)
        simulate y∗ ∼ p(y|θ∗∗), get summaries s∗ = S(y∗)
        accept and store θ_t^(i) := θ∗∗ if ∥s∗ − s_y∥ < ϵ_t
      if accepted, set w_t^(i) = π(θ_t^(i)) / Σ_{j=1}^N w_{t−1}^(j) q_t(θ_t^(i)|θ_{t−1}^(j))
    end
    normalise w_t^(i) := w_t^(i) / Σ_i w_t^(i)
    reduce ϵ for the next iteration
  end
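A sketch of one generation (t ≥ 2) in Python, for concreteness. This is illustrative, not the paper's implementation: it uses the standard Gaussian kernel N(θ∗, 2Σ_{t−1}) discussed on the next slide, a weighted covariance estimate, and placeholder callables simulator, summary and prior_pdf:

import numpy as np
from scipy.stats import multivariate_normal

def smc_abc_generation(particles, weights, eps_t, s_obs,
                       simulator, summary, prior_pdf, rng):
    """One SMC-ABC generation with the Gaussian kernel q_t(.|theta*) = N(theta*, 2*Sigma_{t-1})."""
    N, _ = particles.shape
    Sigma = 2.0 * np.cov(particles, rowvar=False, aweights=weights)  # 2 x weighted covariance
    new_particles = np.empty_like(particles)
    new_weights = np.empty(N)
    for i in range(N):
        while True:
            j = rng.choice(N, p=weights)                           # resample from previous population
            theta = rng.multivariate_normal(particles[j], Sigma)   # perturb
            if np.linalg.norm(summary(simulator(theta)) - s_obs) < eps_t:
                break                                              # accept
        new_particles[i] = theta
        # q_t(theta | theta_j) for all j at once (the Gaussian kernel is symmetric in its arguments)
        kern = multivariate_normal.pdf(particles, mean=theta, cov=Sigma)
        new_weights[i] = prior_pdf(theta) / np.sum(weights * kern)
    return new_particles, new_weights / new_weights.sum()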
The “perturbing proposal” sampler is an arbitrary q_t(·|θ).

In practice, the most popular proposal sampler is a multivariate Gaussian centred at a randomly picked particle θ∗ chosen from the previous iteration:

    θ∗∗ ∼ q_t(·|θ∗) = N(θ∗, 2Σ_{t−1}),   with Σ_{t−1} = cov(θ_{t−1}^(1), ..., θ_{t−1}^(N)).

• Beaumont, Cornuet, Marin, Robert. Biometrika, 96(4):983-990, 2009.
• Toni, Welch, Strelkowa, Ipsen, Stumpf. Journal of the Royal Society Interface, 6(31):187-202, 2009.
• Filippi, Barnes, Cornebise, Stumpf. Statistical Applications in Genetics and Molecular Biology, 12(1):87-107, 2013.

This proposal is implemented in off-the-shelf software such as pyABC and ABCpy.
What we propose here is to construct ways to incorporate the observed data summaries s_y into the proposal sampler, so as to propose from some q_t(·|s_y, ...).

Therefore we construct guided (by data) sequential proposal samplers.
Guided Sequential Importance Sampling ABC (SIS-ABC)

for i = 1, ..., N do
  repeat
    θ∗ ∼ q_t(µ_{t−1}, Σ_{t−1}) = N(µ_{t−1}, 2Σ_{t−1})
    ...etc
  until ∥s∗ − s_y∥ < ϵ
end for

Goal: move from θ∗ ∼ q_t(µ_{t−1}, Σ_{t−1}) to q_t(· · · |s_y).

We build on Picchini, Simola and Corander¹:

• denote by (θ^(i), s^(i)) a (parameter, summary) pair accepted at the previous iteration;
• assume that (θ^(i), s^(i)) ∼ N(m, S), with

    m ≡ (m_θ, m_s),   S ≡ [ S_θ   S_θs
                            S_sθ  S_s ].

¹ Picchini, Simola, Corander. Sequentially guided MCMC proposals for synthetic likelihoods and correlated synthetic likelihoods. Bayesian Analysis, 2022.
Then it is well known that conditionals of a multivariate Gaussian are Gaussian:

    θ^(i) | s^(i) ∼ N(m_{θ^(i)|s^(i)}, S_{θ^(i)|s^(i)}).

Let’s use this fact.
For x_t^(i) := (θ_t^(i), s_t^(i)) ∼ N(m_t, S_t), we can estimate m_t and S_t via the (weighted) accepted particles as

    m̂_t = Σ_{i=1}^N w_t^(i) x_t^(i),   Ŝ_t = Σ_{i=1}^N w_t^(i) (x_t^(i) − m̂_t)(x_t^(i) − m̂_t)′ / (1 − Σ_{i=1}^N (w_t^(i))²).   (1)

• We have m̂_t = (m̂_θ, m̂_s)
• and

    Ŝ ≡ [ Ŝ_θ   Ŝ_θs
          Ŝ_sθ  Ŝ_s ].

Using the fact that conditionals are Gaussian, we can write a guided proposal for the next iteration t + 1:

    q_{t+1}(θ|s_y) ≡ N(m̂_{θ|s_y,t}, Ŝ_{θ|s_y,t})

with

    m̂_{θ|s_y,t} = m̂_θ + Ŝ_θs (Ŝ_s)⁻¹ (s_y − m̂_s)   (2)
    Ŝ_{θ|s_y,t} = Ŝ_θ − Ŝ_θs (Ŝ_s)⁻¹ Ŝ_sθ.   (3)
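An illustrative Python sketch of eqs. (1)-(3) (not the paper's code; the array layout, with θ-columns first and summary-columns last, is an assumption):

import numpy as np

def guided_sis_proposal(x, w, s_obs, d_theta):
    """Gaussian guided proposal of eqs. (1)-(3).

    x: (N, d_theta + d_s) array stacking the accepted (theta, s) pairs,
    w: normalised weights, s_obs: observed summaries s_y.
    Returns the mean and covariance of q_{t+1}(theta | s_y).
    """
    m_hat = w @ x                                      # weighted mean, eq. (1)
    xc = x - m_hat
    S_hat = (xc.T * w) @ xc / (1.0 - np.sum(w**2))     # weighted covariance, eq. (1)
    S_tt = S_hat[:d_theta, :d_theta]                   # S_theta
    S_ts = S_hat[:d_theta, d_theta:]                   # S_theta,s
    S_ss = S_hat[d_theta:, d_theta:]                   # S_s
    A = S_ts @ np.linalg.inv(S_ss)
    m_cond = m_hat[:d_theta] + A @ (s_obs - m_hat[d_theta:])  # eq. (2)
    S_cond = S_tt - A @ S_ts.T                                # eq. (3)
    return m_cond, S_cond

# proposals for iteration t+1 are then drawn as theta* ~ N(m_cond, S_cond)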
The guided sampler we have defined is useful for sequential importance sampling ABC (SIS-ABC).

We use it for SIS-ABC and not for SMC-ABC since our proposal has global features: neither the mean nor the covariance is particle-specific.

    q_t(θ|s_y) ≡ N(m̂_{θ|s_y,t−1}, Ŝ_{θ|s_y,t−1})

    m̂_{θ|s_y,t−1} = m̂_θ + Ŝ_θs (Ŝ_s)⁻¹ (s_y − m̂_s)   (4)
    Ŝ_{θ|s_y,t−1} = Ŝ_θ − Ŝ_θs (Ŝ_s)⁻¹ Ŝ_sθ.   (5)

We call this sampler blocked because it proposes all coordinates of θ in one block.
The previous sampler needs some adjustment (details in the paper!), as otherwise it is too “mode-seeking”, i.e. it neglects the posterior tails to focus on the region around the mode.

No time here, but the paper has details on how to fix this.
Guided sequential Monte Carlo ABC (SMC-ABC)

We have also built other proposals. What we do here is specific to SMC-ABC, since this time we condition on a sampled particle AND s_y.

Recall that in SMC-ABC we have a resampling step (snippet below):

for i = 1, ..., N do
  repeat
    pick (with replacement) θ∗ from the weighted set {θ_{t−1}^(i), w_{t−1}^(i)}_{i=1}^N
    θ∗∗ ∼ N(θ∗, 2Σ_{t−1})
    ...etc
  until ∥s∗ − s_y∥ < ϵ
end for

We are going to sample θ∗∗ conditionally on components of θ∗ and s_y.
Guided SMC-ABC

As usual, we iteratively resample a θ∗ from the previous iteration. How can we condition on both s_y and θ∗?

We cannot condition simultaneously on all coordinates of θ∗: imagine instead decomposing θ∗ = (θ∗_k, θ∗_{−k}) and now consider the stacked vector

    (θ∗_k, θ∗_{−k}, s_y).   (6)

We now place a multivariate Gaussian assumption on (6), so we can produce a “perturbation kernel”

    q_t(θ∗_k | θ∗_{−k}, s_y).

Example: θ∗ = (θ∗_1, θ∗_2, θ∗_3); then we propose θ∗∗ as

    θ∗∗_1 ∼ q(θ_1|θ∗_{2,3}, s_y),   θ∗∗_2 ∼ q(θ_2|θ∗_{1,3}, s_y),   θ∗∗_3 ∼ q(θ_3|θ∗_{1:2}, s_y),

and finally compose θ∗∗ = (θ∗∗_1, θ∗∗_2, θ∗∗_3).
Guided SMC-ABC

It is easy to show that we can write the following guided sampler (this is for the case where k is a single index, but it can be generalized):

    θ∗∗_k ∼ N(m̂∗_{k|s_y,t−1}, σ̂²_{k|s_y,t−1})

    m̂∗_{k|s_y,t−1} = m̂_k + Ŝ_{k,−k} (Ŝ_{−k,−k})⁻¹ ( [θ∗_{−k}; s_y] − [m̂_{−k}; m̂_s] ),
    σ̂²_{k|s_y,t−1} = σ̂²_k − Ŝ_{k,−k} (Ŝ_{−k,−k})⁻¹ Ŝ_{−k,k},

where all quantities are computed using accepted particles from the previous iteration.

We call this the fullcond guided SMC-ABC sampler (“fully conditional”).
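A sketch of this fully conditional update for a single coordinate k, in illustrative Python (not the paper's code; m_hat, S_hat and the index layout are assumptions consistent with eq. (1)):

import numpy as np

def fullcond_perturb(theta_star, k, m_hat, S_hat, s_obs, rng):
    """Sample theta**_k conditionally on theta*_{-k} and s_y (single index k).

    m_hat and S_hat are the weighted mean and covariance of the stacked
    vector (theta, s) from the previous generation, as in eq. (1); entries
    0..d_theta-1 refer to theta, the remaining ones to the summaries.
    """
    idx = [j for j in range(len(m_hat)) if j != k]        # "-k" block: other thetas, then summaries
    cond_val = np.concatenate([np.delete(theta_star, k), s_obs])
    S_k_mk = S_hat[k, idx]                                # S_{k,-k}
    S_mk_mk = S_hat[np.ix_(idx, idx)]                     # S_{-k,-k}
    coef = np.linalg.solve(S_mk_mk, S_k_mk)               # (S_{-k,-k})^{-1} S_{-k,k}
    m_cond = m_hat[k] + coef @ (cond_val - m_hat[idx])
    var_cond = S_hat[k, k] - coef @ S_k_mk
    return rng.normal(m_cond, np.sqrt(var_cond))

# the full proposal is composed coordinate-wise, e.g.
# theta_new = np.array([fullcond_perturb(theta_star, k, m_hat, S_hat, s_obs, rng)
#                       for k in range(len(theta_star))])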
A few important remarks

1. Besides Gaussian proposals, we also construct copula proposals (Gaussian copulas and t-copulas), with many possible choices of marginal distributions. This is discussed in the paper.
2. Experiments suggest that the Gaussianity assumption on the joint (θ, s_y) works very well even with highly non-Gaussian posterior targets.
   To allow for more flexibility, we considered (Gaussian and t) copulas with different marginals.

Note that guided Gaussian copulas also appear in Y. Chen, M. Gutmann. Adaptive Gaussian Copula ABC. AISTATS, 2019.

We go through a couple of examples now.
We compare against the following SMC-ABC samplers:

• standard: Gaussian sampler N(θ∗, 2·Σ_{t−1}) (as in Filippi et al. 2013², generalizing Beaumont et al. 2009);
• olcm: Gaussian sampler with the optimal local covariance matrix of Filippi et al. 2013.

² Filippi, Barnes, Cornebise, Stumpf. Statistical Applications in Genetics and Molecular Biology, 12(1):87-107, 2013.
Example: Twisted prior with highly correlated posterior

• Observed data y = (y_1, ..., y_{d_θ}) ∼ N(θ, Ψ), with θ = (θ_1, ..., θ_{d_θ}), Ψ = diag(σ_0, ..., σ_0).
• The prior is the twisted-normal prior, with density proportional to

    π(θ) ∝ exp( −θ_1²/200 − (θ_2 − b θ_1² + 100b)²/2 − Σ_{j=3}^{d_θ} θ_j² ).

• Here: σ_0 = 1, b = 0.1, d_θ = 5, y = (10, 0, 0, 0, 0). As before, S(y) = y.

How do our approaches cope with highly correlated parameters in the posterior?
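For concreteness, the unnormalised log prior density as a short Python function (a direct transcription of the formula above; vectorisation details are our own):

import numpy as np

def log_twisted_prior(theta, b=0.1):
    """Unnormalised log-density of the twisted-normal prior above."""
    theta = np.asarray(theta, dtype=float)
    return (-theta[0]**2 / 200.0
            - (theta[1] - b * theta[0]**2 + 100.0 * b)**2 / 2.0
            - np.sum(theta[2:]**2))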
Illustration from Nott et al. (2018), High-dimensional ABC.

In this case study we know a priori that (θ_1, θ_2) are highly correlated. We make use of this information in our fully conditional SMC-ABC.

So we sample a θ∗ = (θ∗_1, ..., θ∗_5) from the previous population and perturb it as:

• (θ∗∗_1, θ∗∗_2) ∼ q(θ_1, θ_2|s_y, θ∗_{3:5}) (in one block, since they are correlated);
• θ∗∗_3 ∼ q(θ_3|s_y, θ∗_{1,2,4,5});
• θ∗∗_4 ∼ q(θ_4|s_y, θ∗_{1,2,3,5});
• θ∗∗_5 ∼ q(θ_5|s_y, θ∗_{1:4}).
Example 2: Hierarchical g-and-k model

Setup from Clarté et al. (2021), “Componentwise approximate Bayesian computation via Gibbs-like steps”.

• x_{i1}, ..., x_{iJ} iid from the g-and-k model with parameters (A_i, B, g, k, c), where B, g, k, c are known constants.
• A_i ∼ N(α, 1), θ = (α, A_1, ..., A_n).

[Graphical model: α at the top; group parameters A_1, A_2, ..., A_n below; each A_i generating data x_{i1}, ..., x_{iJ}.]

• 21-dim. parameter θ = (α, A_1, ..., A_20).
• s(x_i) = (quant_l(x_i))_{l=0,...,8}, the nine octiles of each group ⇒ 180 summaries in total.
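The g-and-k distribution has no closed-form density, but sampling is easy via its quantile function applied to standard normal draws. A hedged sketch: the quantile formula below is the standard one from the g-and-k literature with the customary c = 0.8, and the constants B, g, k and sizes n, J are placeholders, not the values of Clarté et al.:

import numpy as np

def gk_sample(A, B, g, k, c=0.8, size=1, rng=None):
    """Draw from the g-and-k distribution by transforming standard normals z:
    Q(z) = A + B * (1 + c * tanh(g*z/2)) * (1 + z^2)^k * z."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(size)
    return A + B * (1.0 + c * np.tanh(g * z / 2.0)) * (1.0 + z**2) ** k * z

def simulate_hierarchical(alpha, n=20, J=1000, B=1.0, g=2.0, k=0.5, rng=None):
    """Hierarchical data: A_i ~ N(alpha, 1), then J g-and-k draws per group i."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.normal(alpha, 1.0, size=n)
    return np.stack([gk_sample(A[i], B, g, k, size=J, rng=rng) for i in range(n)])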
We run guided and non-guided methods with N = 10⁴ particles and compare them with ABC-Gibbs (Clarté et al., 2021).

How do our approaches cope with high-dimensional parameter and summary spaces?

A few numbers for one run:

• (non-guided) standard SMC-ABC: 14.5M simulations to reach ϵ = 2.14, in more than 42 hours.
• (non-guided) olcm SMC-ABC: 16M simulations to reach ϵ = 0.70, in 55 hours.
• guided SIS-ABC: 1M simulations to go below ϵ = 0.60, in 2.7 hours!
Take-home message:

• We propose guided proposal samplers that make use of data information.
• These are easy to construct and incorporate into existing packages (any help for ABCpy or pyABC?).
• There is no substantial computational overhead from computing our guided proposals compared to not using them.
• We propose different Gaussian proposals, but also Gaussian copulas and t-copulas.
• For copula-based samplers, the mean/covariance/marginals are not learned via deep learning, unlike in other works.
• We challenged our methods with: highly correlated posteriors; multimodal posteriors; models with up to 20 parameters and up to 400 summary statistics.
NORDSTAT 2023

See you in Gothenburg, 19-22 June, for 3.5 days of mathstats!
Abstract submission deadline is March 30!
https://nordstat2023.org/ @nordstat2023

THANK YOU
@uPicchini