The document is a slide transcript of a talk given by Simon Cotter on accelerating Bayesian inverse problems. The first part discusses how computationally expensive the forward model can be for problems such as heat transfer, so that a standard MCMC run would take years on a single CPU. A surrogate forward model is constructed offline using a stochastic Galerkin (polynomial chaos) expansion with Legendre polynomials, which can then be rapidly evaluated online in MCMC to sample the posterior. This reduces computational cost by several orders of magnitude compared to running the PDE solver at each MCMC step, and results are presented for an industrial copper sample problem. The second part addresses poorly mixing chains: posteriors concentrated near lower-dimensional manifolds are tackled with ensemble transport adaptive importance sampling (ETAIS) and transport maps.
MUMS: Transition & SPUQ Workshop - Complexities in Bayesian Inverse Problems: Models and Distributions - Simon Cotter, May 17, 2019
1. Complexities in Bayesian Inverse Problems: Models and Distributions
Simon Cotter
University of Manchester
17th May 2019
2. The Graveyard Shift
Last day
After lunch
My promises to you:
2 talks in 1 to keep you on your toes
(or if you go to sleep in the 1st you have a second chance!)
Almost no details!
As many pictures of gorillas as is scientifically justifiable
9. Collaborators
Left: Catherine Powell (University of Manchester, UK); Center: James Rynn (University of Manchester, UK); Right: Louise Wright (National Physical Laboratory, UK).
10. Collaborators
Left: Colin Cotter (Imperial College, UK); Center: Yannis Kevrekidis (Johns Hopkins, US); Right: Paul Russell (formerly University of Manchester, UK).
SLC is grateful to EPSRC for First Grant award EP/L023393/1
12. Bayesian Inverse Problems
Find the unknown $\theta$ given $n_z$ observations $z$, satisfying
$$z = \mathcal{G}(\theta) + \eta, \qquad \eta \sim N(0, \Sigma),$$
where
$z \in \mathbb{R}^{n_z}$ is a given vector of observations,
$\mathcal{G} : \Theta \to \mathbb{R}^{n_z}$ is the observation operator,
$\theta \in \Theta$ is the unknown parameter,
$\eta \in \mathbb{R}^{n_z}$ is a vector of observational noise.
Goal: Efficiently estimate the posterior density $\pi(\theta \mid z)$ for the unknowns $\theta$ given the data $z$.
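To make the noise model concrete, here is a minimal sketch (not code from the talk) of the resulting unnormalised log-likelihood and log-posterior under the Gaussian noise assumption; the callables `G` and `log_prior` are placeholders for the forward operator and the prior density.

```python
import numpy as np

def log_likelihood(theta, z, G, Sigma):
    """Gaussian log-likelihood log L(z | theta), up to an additive constant.

    theta : parameter vector; z : observations (n_z,);
    G : callable observation operator; Sigma : (n_z, n_z) noise covariance.
    """
    r = z - G(theta)                          # residual z - G(theta)
    # ||r||_Sigma^2 = r^T Sigma^{-1} r, via a solve rather than an explicit inverse
    return -0.5 * r @ np.linalg.solve(Sigma, r)

def log_posterior(theta, z, G, Sigma, log_prior):
    """Unnormalised log-posterior: log L(z | theta) + log pi_0(theta)."""
    return log_likelihood(theta, z, G, Sigma) + log_prior(theta)
```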
18. Markov Chain Monte Carlo (MCMC) Methods
$$\pi(\theta \mid z) \propto L(z \mid \theta)\,\pi_0(\theta) \propto \exp\left(-\tfrac{1}{2}\,\|z - \mathcal{G}(\theta)\|_\Sigma^2\right)\pi_0(\theta).$$
Markov chain Monte Carlo estimates
$$\mathbb{E}_\pi[\varphi] = \int_\Theta \varphi(\theta)\,\pi(\theta \mid z)\,d\theta \approx \frac{1}{M}\sum_{i=1}^M \varphi(\theta_i).$$
Extremely costly for some problem classes:
Expensive likelihood evaluations (talk #1)
Poor mixing requiring high values of M (talk #2)
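A minimal random-walk Metropolis sampler targeting such a posterior might look as follows. This is an illustrative sketch, not the talk's implementation; `log_post` stands for any unnormalised log-posterior such as the one above.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, M, step=0.1, rng=None):
    """Random-walk Metropolis sampler targeting exp(log_post).

    Each iteration costs one log-posterior evaluation; when that
    evaluation hides a PDE solve, the chain length M dominates the cost.
    """
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = np.empty((M, theta.size))
    for m in range(M):
        prop = theta + step * rng.standard_normal(theta.size)  # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:               # accept w.p. alpha
            theta, lp = prop, lp_prop
        chain[m] = theta
    return chain

# The slide's estimator E_pi[phi] is then the chain average,
# e.g. np.mean(phi(chain), axis=0).
```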
26. Industrial Example
Possible unknowns:
λ — thermal conductivity,
I — laser intensity,
κ — heat transfer coefficient (affects boundary conditions),
σ — standard deviation of measurement noise.
. . .
27. Forward Model: Heat Equation
Model the physical relationship between $\lambda$, $I$ and the temperature $T$ of the material using the heat equation:
$$c_p \frac{\partial T}{\partial t} - \nabla \cdot (\lambda \nabla T) = Q(I).$$
Given a sample of $\lambda$ and $I$, we can use numerical methods (FEMs) to approximate $T$ at the measurement times.
One evaluation of the FEM code takes ≈ 30 seconds, for a "reasonable level of accuracy".
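For illustration only, here is a drastically simplified 1D stand-in for this forward map (backward-Euler finite differences rather than the talk's FEM solve of the real geometry; the point-source treatment of the laser and all default parameters are assumptions):

```python
import numpy as np

def forward_model(lam, I, nx=50, nt=200, L=1.0, t_end=1.0, cp=1.0):
    """Toy 1D stand-in for the forward map (lam, I) -> temperature field.

    Backward-Euler finite differences for c_p dT/dt = lam d2T/dx2 + Q(I)
    on (0, L) with T = 0 at both ends; the laser is modelled as a point
    source of strength I. The talk's real model is a FEM solve in 3D.
    """
    dx, dt = L / (nx + 1), t_end / nt
    # interior Laplacian as a tridiagonal matrix
    A = (np.diag(-2.0 * np.ones(nx)) + np.diag(np.ones(nx - 1), 1)
         + np.diag(np.ones(nx - 1), -1)) / dx**2
    B = np.eye(nx) - (dt * lam / cp) * A      # backward-Euler system matrix
    Q = np.zeros(nx)
    Q[nx // 2] = I                            # localised heat source
    T = np.zeros(nx)
    for _ in range(nt):                       # one linear solve per time step
        T = np.linalg.solve(B, T + dt * Q / cp)
    return T                                  # temperature at the final time
```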
29. Metropolis Hastings Algorithm (FEM)
Algorithm 1: Metropolis-Hastings Algorithm
set initial state X(0) = θ0
for m = 1, 2, . . . , M do
draw proposal
evaluate likelihood by FEM approximation of G (expensive!)
compute acceptance probability α
accept proposal with probability α
end for
output chain X = (θ0, θ1, . . . , θM)
30 seconds per (time-dependent) PDE solve =⇒ $10^7$ samples take $3 \times 10^8$ seconds ≈ 9.5 years (single CPU)
30. Surrogate Model
The temperature $T$ is a function of $\theta = (\lambda, I)$:
$$c_p \frac{\partial T(\theta)}{\partial t} - \nabla \cdot (\lambda \nabla T(\theta)) = Q(I).$$
Instead of solving for individual samples of $\lambda$ and $I$:
Offline: build an approximation of the form
$$T_{\mathrm{approx}}(\theta) = \sum_{i=1}^{n_k} T_i \Psi_i(\theta), \qquad \text{(Surrogate)}$$
where the $\Psi_i$ are orthogonal (Legendre) polynomials in $\theta$.
This can be cheaply evaluated in MCMC routines (Online).
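The talk constructs the coefficients $T_i$ by a stochastic Galerkin FEM solve; purely as an illustration of the offline/online split, the sketch below instead fits a tensor Legendre expansion by least squares to a modest number of forward solves. All function names and parameters here are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre
from itertools import product

def build_surrogate(forward, bounds, deg=4, n_train=200, rng=0):
    """Fit T_approx(theta) = sum_i T_i Psi_i(theta) with tensor Legendre Psi_i.

    Offline: n_train expensive forward solves plus a least-squares fit
    (the talk uses SGFEM instead). Online: cheap polynomial evaluation.
    `forward` maps theta = (lam, I) to an output vector.
    """
    rng = np.random.default_rng(rng)
    lo, hi = np.asarray(bounds, dtype=float).T
    thetas = rng.uniform(lo, hi, size=(n_train, 2))
    X = 2 * (thetas - lo) / (hi - lo) - 1            # rescale to [-1, 1]^2
    # tensor-product Legendre design matrix, one column per multi-index (i, j)
    cols = [legendre.legval(X[:, 0], np.eye(deg + 1)[i])
            * legendre.legval(X[:, 1], np.eye(deg + 1)[j])
            for i, j in product(range(deg + 1), repeat=2)]
    V = np.column_stack(cols)
    Y = np.array([forward(t) for t in thetas])       # offline: expensive solves
    C, *_ = np.linalg.lstsq(V, Y, rcond=None)        # coefficients T_i

    def surrogate(theta):                            # online: cheap evaluation
        x = 2 * (np.asarray(theta, dtype=float) - lo) / (hi - lo) - 1
        psi = np.array([legendre.legval(x[0], np.eye(deg + 1)[i])
                        * legendre.legval(x[1], np.eye(deg + 1)[j])
                        for i, j in product(range(deg + 1), repeat=2)])
        return psi @ C
    return surrogate
```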
32. Metropolis Hastings Algorithm (SGFEM)
Algorithm 2: Metropolis-Hastings Algorithm with SGFEM Surrogate
compute SGFEM solution (cost dependent on FEM discretisation parameters + maximum polynomial degree k)
set initial state X(0) = θ0
for m = 1, 2, . . . , M do
draw proposal
evaluate likelihood by evaluating SGFEM approximation of G (cheap!)
compute acceptance probability α
accept proposal with probability α
end for
output chain X = (θ0, θ1, . . . , θM)
33. Results: Copper Sample
Offline: Compute surrogate solution: ≈ 16 mins (solved 600K equations × $n_t = 800$ time steps)
Online: Generate $M = 10^7$ samples of $\theta$ from the approximate posterior $\pi_{\mathrm{approx}}(\theta \mid d)$ using a standard MCMC method: ≈ 26 mins
Computational Costs:
              Offline                              Online
Standard      —                                    $M \times (n_t \times C_{PDE})$
Surrogate     $n_t \times O(n_k \times C_{PDE})$   $M \times C_{eval}$
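A back-of-envelope check of the speed-up implied by the numbers on this slide (single CPU):

```python
# Standard MCMC: one 30 s PDE solve per sample; surrogate: 16 min offline + 26 min online.
M, C_pde = 10**7, 30.0             # samples; seconds per FEM solve
standard = M * C_pde               # ~3e8 s, i.e. ~9.5 years
surrogate = 16 * 60 + 26 * 60      # offline + online, in seconds
print(f"standard: {standard / 3.156e7:.1f} years, "
      f"surrogate: {surrogate / 60:.0f} min, "
      f"speed-up: {standard / surrogate:.0f}x")    # roughly 10^5
```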
37. References
V. Hoang, C. Schwab, and A. Stuart, "Complexity Analysis of Accelerated MCMC Methods for Bayesian Inversion", Inverse Problems, 29 (2013), p. 085010.
Y. Marzouk, H. Najm, and L. Rahn, "Stochastic Spectral Methods for Efficient Bayesian Solution of Inverse Problems", Journal of Computational Physics, 224 (2007), pp. 560–586.
F. Nobile and R. Tempone, "Analysis and Implementation Issues for the Numerical Approximation of Parabolic Equations with Random Coefficients", International Journal for Numerical Methods in Engineering, 80 (2009), pp. 979–1006.
J. A. Rynn, S. L. Cotter, C. E. Powell, and L. Wright, "Surrogate Accelerated Bayesian Inversion for the Determination of the Thermal Diffusivity of a Material", Metrologia, 56 (2019), p. 015018.
43. Constrained approximation: Simple Example
[Figure: corner plot of posterior marginals over the rate constants k1, k2, k3, k4.]
Figure: CMA approximation of the posterior arising from observations of the slow variable S = X1 + X2, concentrated around a manifold $\frac{k_1(k_2+k_3+k_4)}{k_2 k_4} = C$, i.e. more challenging than this plot suggests. (Any visualisation suggestions?)
44. Importance Sampling
Sample $x_i \sim \nu$
Compute weights $w_i = \pi(x_i)/\nu(x_i)$
Monte Carlo estimates:
$$\mathbb{E}_\pi(f) \approx \frac{1}{\sum_j w_j} \sum_{i=1}^N f(x_i)\,w_i$$
Variance of the weights is an indicator of efficiency; it is small when $\pi \approx \nu$.
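A minimal self-normalised importance sampler implementing the estimator above, with the weight-variance diagnostic expressed as an effective sample size (a standard reformulation); the callables are placeholders:

```python
import numpy as np

def importance_estimate(f, log_pi, log_nu, sample_nu, N, rng=0):
    """Self-normalised importance sampling estimate of E_pi[f].

    Draws x_i ~ nu, forms w_i = pi(x_i)/nu(x_i) in log space, and
    reports ESS = (sum w)^2 / sum w^2: small ESS <=> high weight variance.
    """
    rng = np.random.default_rng(rng)
    x = sample_nu(rng, N)                         # x_i ~ nu
    logw = log_pi(x) - log_nu(x)                  # log w_i, unnormalised
    w = np.exp(logw - logw.max())                 # numerically stabilised weights
    ess = w.sum() ** 2 / (w ** 2).sum()           # effective sample size
    est = np.sum(w * f(x)) / w.sum()              # (sum_i w_i f(x_i)) / sum_j w_j
    return est, ess
```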
61. Ensemble Transport Adaptive Importance Sampling
Proposal distribution in the kth iteration informed by M ensemble members with states $\theta_i$:
$$\chi^{(k)} = \frac{1}{M}\sum_{i=1}^M q\big(\cdot\,; \theta_i^{(k)}, \beta\big)$$
$q(\cdot\,;\cdot,\beta)$ a transition kernel, e.g. Gaussian, MALA proposal, etc., with scaling parameter $\beta$
Resampling step; ensemble transform method
For large M, a greedy approximation is used ("multinomial approximation")
C. Cotter, SLC, P. Russell, "Ensemble transport adaptive importance sampling", SIAM JUQ 2019.
S. Reich, "A non-parametric ensemble transform method for Bayesian inference", SISC 2013.
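A simplified sketch of one such iteration, with plain multinomial resampling standing in for the paper's ensemble transform step; the Gaussian kernel choice and all names below are assumptions:

```python
import numpy as np

def etais_step(theta, log_pi, beta, rng):
    """One simplified ETAIS-style iteration for an ensemble theta of shape (M, d).

    Proposes from the equal-weight mixture (1/M) sum_i N(theta_i, beta^2 I),
    importance-weights against the target pi, and resamples. The paper uses
    an ensemble transform resampling step; multinomial resampling is used
    here only to keep the sketch short.
    """
    M, d = theta.shape
    prop = theta + beta * rng.standard_normal((M, d))   # one draw per mixture component
    # mixture density at each proposal: chi(prop_i) = (1/M) sum_j N(prop_i; theta_j, beta^2 I)
    d2 = ((prop[:, None, :] - theta[None, :, :]) ** 2).sum(-1)
    log_kern = -0.5 * d2 / beta**2 - d * np.log(beta)   # Gaussian kernel, up to constants
    log_chi = np.logaddexp.reduce(log_kern, axis=1) - np.log(M)
    logw = np.array([log_pi(p) for p in prop]) - log_chi
    w = np.exp(logw - logw.max())                       # stabilised importance weights
    idx = rng.choice(M, size=M, p=w / w.sum())          # resample (multinomial)
    return prop[idx], w
```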
65. Ensemble Transport Adaptive Importance Sampling: Prior and Posterior
[Figure: one-dimensional prior and posterior densities.]
66. Ensemble Transport Adaptive Importance Sampling: Current State Xi
[Figure: density of the current ensemble states Xi.]
67. Ensemble Transport Adaptive Importance Sampling: MALA Proposals
[Figure: mixture of MALA proposal densities built from the ensemble.]
74. ETAIS - pros and cons
PROS:
Possible big speed-ups with parallelisation
Well-informed proposals
Reduces variance of importance weights
Adaptive to global differences in scales of parameters
CONS:
Posterior concentrated on lower dimensional manifold:
Stability issues
Slow convergence
Requires large ensemble size (expensive)
Particle transition kernel q needs to “know” about the manifold
84. Transport maps
Find a homeomorphism $T : \mathbb{R}^d \to \mathbb{R}^d$ which maps the target measure $\pi$ to an easily explored reference measure $\pi_r$:
$$\pi(T^{-1}(A)) = \pi_r(A)$$
Simple proposal densities on $\pi_r$ map to complex, informed densities on $\pi$ via $T^{-1}$:
$$v \sim T^{-1}(q(\cdot, u; \beta))$$
A low-dimensional approximation can be computed from a posterior sample.
M. Parno, Y. Marzouk, "Transport Map Accelerated Markov Chain Monte Carlo", SIAM/ASA Journal on Uncertainty Quantification, 2018.
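As a sketch of the idea on the Rosenbrock target of the next slide: the banana can be straightened exactly by a hand-coded triangular map, so random-walk proposals made in the reference space become well-adapted moves on the target. In the paper the map $\tilde{T}$ is built from posterior samples; the exact map, the density parameters and all names below are assumptions for illustration.

```python
import numpy as np

def log_rosenbrock(t, a=1.0, b=5.0):
    """Rosenbrock-type banana density, log pi up to a constant (parameters assumed)."""
    return -(t[0] - a) ** 2 - b * (t[1] - t[0] ** 2) ** 2

# Hand-coded transport map for this target: T straightens the banana.
T = lambda t: np.array([t[0], t[1] - t[0] ** 2])        # target -> reference
Tinv = lambda r: np.array([r[0], r[1] + r[0] ** 2])     # reference -> target

def tmap_rwmh(M, beta=0.8, rng=0):
    """Random-walk Metropolis with proposals made in the reference space.

    This T is triangular with unit Jacobian determinant, so the Jacobian
    factors cancel and the usual ratio pi(prop)/pi(theta) applies.
    """
    rng = np.random.default_rng(rng)
    theta = np.zeros(2)
    chain = np.empty((M, 2))
    for m in range(M):
        r_prop = T(theta) + beta * rng.standard_normal(2)  # simple move on pi_r
        prop = Tinv(r_prop)                                # pull back via T^{-1}
        if np.log(rng.uniform()) < log_rosenbrock(prop) - log_rosenbrock(theta):
            theta = prop
        chain[m] = theta
    return chain
```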
87. Transport map simplification of Rosenbrock
[Figure, two panels: (a) original sample θ from the MH-RW algorithm in the target parameter space; (b) push-forward of θ onto the reference space.]
Figure: The effect of the approximate transport map $\tilde{T}$ on a sample from the Rosenbrock target density.
88. Rosenbrock density
[Figure: relative $L^2$ error against number of samples ($10^2$–$10^6$) for RWMH, TRWMH, ETAIS-RW and ETAIS-TRW.]
89. Constrained approximation: Simple Example (recap)
[Figure repeated from slide 43: CMA approximation of the posterior concentrated around the manifold $k_1(k_2+k_3+k_4)/(k_2 k_4) = C$.]
90. Multiscale stochastic reaction network example
[Figure: relative error against number of samples ($10^2$–$10^6$) for MH-logRW, MH-logTRW, ETAIS-logRW and ETAIS-logTRW, with an $O(1/\sqrt{N})$ reference line.]
Figure: Sampling algorithms with a log preconditioner for $\tilde{T}$.
92. References
SLC, I. Kevrekidis, P. Russell, "Transport map accelerated adaptive importance sampling, and application to inverse problems arising from multiscale stochastic reaction networks", arXiv preprint arXiv:1901.11269, 2019; submitted to SIAM/ASA JUQ.
M. Parno, Y. Marzouk, "Transport Map Accelerated Markov Chain Monte Carlo", SIAM/ASA Journal on Uncertainty Quantification, 2018.
C. Cotter, SLC, P. Russell, "Ensemble transport adaptive importance sampling", SIAM/ASA Journal on Uncertainty Quantification, 2019.
S. Reich, "A non-parametric ensemble transform method for Bayesian inference", SISC, 2013.
SLC, "Constrained approximation of effective generators for multiscale stochastic reaction networks and application to conditioned path sampling", Journal of Computational Physics, 2016.
94. Conclusions
Multiple possible reasons for extortionate costs in Bayesian inference:
High cost of likelihood evaluations due to numerical approximation of PDEs
Surrogates can sample efficiently from an approximation of the posterior
Does $\pi_{\mathrm{approx}}(\theta \mid d)$ converge to $\pi(\theta \mid d)$? In what sense?
How is the error $\|\pi(\theta \mid d) - \pi_{\mathrm{approx}}(\theta \mid d)\|$ affected by the error between the solution to the forward model and the chosen surrogate, $\|T(\theta) - T_{\mathrm{approx}}(\theta)\|$?
Large number of MCMC samples required due to complex posterior structure:
Many posteriors are concentrated on lower-dimensional manifolds
Optimal transport maps can simplify target distributions
Ensemble adaptive importance sampling schemes can be stabilised and accelerated for lower ensemble sizes