- 1. Bayesian computation using INLA. Thiago G. Martins, Norwegian University of Science and Technology, Trondheim, Norway. AS 2013, Ribno, Slovenia, September 2013.
- 2. Part I: Latent Gaussian models and INLA methodology
- 3. Outline: Latent Gaussian models; Are latent Gaussian models important?; Bayesian computing; INLA method
- 4. Hierarchical Bayesian models. Hierarchical models are an extremely useful tool in Bayesian model building. Three parts: Observations (y): encode information about the observed data, including design and collection issues. The latent process (x): the unobserved process; may be the focus of the study, or may be included to reduce autocorrelation, e.g. to encode spatial and/or temporal dependence. The parameter model (θ): models for all of the parameters in the observation and latent processes.
- 7. Latent Gaussian models. A latent Gaussian model is a Bayesian hierarchical model of the following form:
Observed data y: yi | xi ∼ π(yi | xi, θ)
Latent Gaussian field: x ∼ N(·, Σ(θ))
Hyperparameters: θ ∼ π(θ) (e.g. variability, length/strength of dependence, parameters in the likelihood)
The joint posterior is
π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i∈I} π(yi | xi, θ)
- 11. Precision matrix. The precision matrix of the latent field, Q(θ) = Σ(θ)⁻¹, plays a key role! Two issues: building models through conditioning ("hierarchical models"), and computational benefits.
- 13. Building models through conditioning. If x ∼ N(0, Qx⁻¹) and y | x ∼ N(x, Qy⁻¹), then
Q_(x,y) = [ Qx + Qy   −Qy ]
          [  −Qy       Qy ]
The corresponding expressions in terms of covariance matrices are not as nice.
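As a numerical sanity check of this block structure, here is a minimal base-R sketch; the two precision matrices are arbitrary illustrative choices. It builds Q_(x,y) from Qx and Qy and verifies that its inverse matches the joint covariance implied by the model (Cov(x) = Qx⁻¹, Cov(x, y) = Qx⁻¹, Cov(y) = Qx⁻¹ + Qy⁻¹):

# Joint precision of (x, y) when x ~ N(0, Qx^-1) and y | x ~ N(x, Qy^-1).
Qx <- diag(2) + 0.5            # an arbitrary 2x2 positive-definite precision
Qy <- diag(c(4, 4))            # precision of y given x
Qjoint <- rbind(cbind(Qx + Qy, -Qy),
                cbind(-Qy,      Qy))
Sx <- solve(Qx)
Sjoint <- rbind(cbind(Sx, Sx),                 # joint covariance from the model
                cbind(Sx, Sx + solve(Qy)))
max(abs(solve(Qjoint) - Sjoint))               # ~ 0 up to rounding error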
- 14. Computational benefits. Precision matrices encode conditional independence: xi ⊥ xj | x−ij ⇔ Qij = 0. We are interested in models with sparse precision matrices: x ∼ N(·, Σ(θ)) with sparse Q(θ) = Σ(θ)⁻¹. Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs), and they have good computational properties through numerical algorithms for sparse matrices.
- 17. Numerical algorithms for sparse matrices: scaling properties. Factorization cost by model type: temporal O(n), spatial O(n^{3/2}), spatio-temporal O(n²). This is to be compared with general O(n³) algorithms for dense matrices. (A sparse-matrix illustration follows below.)
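The sparsity that drives these scalings is easy to see in R with the Matrix package (shipped with every standard R installation); the AR(1) example below is illustrative. The precision of an AR(1) process is tridiagonal, and its Cholesky factor stays sparse:

library(Matrix)
n   <- 1000
phi <- 0.9                      # illustrative AR(1) coefficient
Q <- bandSparse(n, k = c(-1, 0, 1),
                diagonals = list(rep(-phi, n - 1),
                                 c(1, rep(1 + phi^2, n - 2), 1),
                                 rep(-phi, n - 1)))
Q <- forceSymmetric(Q)
L <- chol(Q)                    # sparse Cholesky factorization
c(nnzero(Q), nnzero(L))         # O(n) nonzeros, versus n^2 for a dense matrix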
- 19. Outline: Latent Gaussian models; Are latent Gaussian models important?; Bayesian computing; INLA method
- 20. Example (I): Mixed-effects model.
yij | ηij, θ1 ∼ π(yij | ηij, θ1), i = 1, ..., N, j = 1, ..., M
ηij = µ + cij β + ui + vj + wij
where u, v and w are "random effects". If we assign Gaussian priors to µ, β, u and v, then x | θ2 = (µ, β, u, v, η) | θ2 is jointly Gaussian, with θ = (θ1, θ2).
- 22. Example (I) - cont. We can reinterpret the model as
θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(yi | ηi, θ)
dim(x) could be large (10²-10⁵); dim(θ) is small (1-5).
- 24. Example (I) - cont. [Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β), with N = 100, M = 5.]
- 25. Example (II): Time-series model. Smoothing of a binary time series: the data are a sequence of 0s and 1s, and the probability of a 1 at time t, pt, depends on time:
pt = exp(ηt) / (1 + exp(ηt))
with linear predictor ηt = µ + βct + ut + vt, t = 1, ..., n.
- 28. Example (II) - cont. Prior models: µ and β are Normal; u is an AR model, e.g. ut = φu_{t−1} + εt with parameters (φ, σ²); v is an unstructured term or "random effect". This gives that x | θ = (µ, β, u, v, η) is jointly Gaussian, with hyperparameters θ = (φ, σ², σv²).
- 34. Example (II) - cont. We can reinterpret the model as
θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(yi | ηi, θ)
dim(x) could be large (10²-10⁵); dim(θ) is small (1-5).
- 36. Example (II) - cont. [Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β), n = 100.]
- 37. Example (III): Disease mapping.
Data: yi ∼ Poisson(Ei exp(ηi))
Log-relative risk: ηi = µ + ui + vi + f(ci)
[Maps/plots: structured component u, unstructured component v, smooth effect of a covariate c.]
- 42. Example (III) - cont. We can reinterpret the model as
θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(yi | ηi, θ)
dim(x) could be large (10²-10⁵); dim(θ) is small (1-5).
- 43. Example (III) - cont. [Figure: sparsity pattern of the precision matrix of (η, u, v, µ, f).]
- 44. What we have learned so far. The latent Gaussian model construct
θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(yi | ηi, θ)
occurs in many, seemingly unrelated, statistical models: GLM/GAM/GLMM/GAMM/++
- 45. Further Examples: dynamic linear models; stochastic volatility; generalized linear (mixed) models; generalized additive (mixed) models; spline smoothing; semi-parametric regression; space-varying (semi-parametric) regression models; disease mapping; log-Gaussian Cox processes; model-based geostatistics (*); spatio-temporal models; survival analysis; +++
- 58. Outline: Latent Gaussian models; Are latent Gaussian models important?; Bayesian computing; INLA method
- 59. Bayesian computing. We are interested in posterior marginal quantities like π(xi | y) and π(θj | y). This requires the evaluation of integrals of the form
π(xi | y) ∝ ∫_θ ∫_{x_{−i}} π(y | x, θ) π(x | θ) π(θ) dx_{−i} dθ
The computation of massively high-dimensional integrals is at the core of Bayesian computing.
- 62. But surely we can already do this. Markov chain Monte Carlo (MCMC) is widely used by the applied community, and there are generic tools available for it (OpenBUGS, JAGS, Stan) as well as tools for specific models, like BayesX. Still, the issue of Bayesian computing is not "solved" even though MCMC is available: hierarchical models are more difficult for MCMC, with strong dependencies and bad mixing. A main obstacle for Bayesian modeling is still "Bayesian computing".
- 65. So what's wrong with MCMC? This is actually a problem with any Monte Carlo scheme. Error in expectations: the Monte Carlo error is
E(f(X)) − (1/N) ∑_{i=1}^{N} f(xi) = O_P(1/√N)
In practical terms, to reduce the error to O(10⁻ᵖ) you need O(10²ᵖ) samples! And this can be optimistic!
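The √N rate is easy to demonstrate empirically; the toy target below (E(X) = 0.5 for X ~ Uniform(0, 1)) is chosen purely for illustration. Quadrupling the sample size roughly halves the root-mean-square error:

set.seed(1)
rmse <- function(N, reps = 2000) {
  est <- replicate(reps, mean(runif(N)))   # N-sample Monte Carlo estimates
  sqrt(mean((est - 0.5)^2))                # error around the true value 0.5
}
round(sapply(c(100, 400, 1600, 6400), rmse), 4)  # each value ~ half the previous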
- 66. Be more narrow. MCMC: it 'works' for everything, but it is not usually optimal when we focus on a specific class of models. It works for latent Gaussian models, but it is too slow; (unfortunately) sometimes it is the only thing we can do. INLA: Integrated Nested Laplace Approximations. A deterministic, rather than stochastic, algorithm (unlike MCMC), specially designed for latent Gaussian models. Accurate results in a small fraction of the computational time of MCMC.
- 72. Comparing results with MCMC. When comparing the results of R-INLA with MCMC, it is important to use the same model. Here we compare the EPIL example results with those obtained using JAGS via the rjags package.
- 74. [Figures: posterior marginals for the EPIL example (intercept a0, alpha.Age, log(tau.Ind)/log(tau.b1) and log(tau.Rand)/log(tau.b)), comparing INLA with MCMC runs of 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64 and 120 minutes.]
- 85. Outline: Latent Gaussian models; Are latent Gaussian models important?; Bayesian computing; INLA method
- 86. Main aim. Posterior:
π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i∈I} π(yi | xi, θ)
Compute the posterior marginals:
π(xi | y) = ∫ π(θ | y) π(xi | θ, y) dθ
π(θj | y) = ∫ π(θ | y) dθ_{−j}
- 88. Tasks.
1. Build an approximation to π(θ | y): π̃(θ | y)
2. Build an approximation to π(xi | θ, y): π̃(xi | θ, y)
These feed into the marginals
π(xi | y) = ∫ π(θ | y) π(xi | θ, y) dθ and π(θj | y) = ∫ π(θ | y) dθ_{−j}
3. Do the integration wrt θ numerically.
- 93. Task 1: π(θ | y). The Laplace approximation for π(θ | y) is
π(θ | y) = π(x, θ | y) / π(x | θ, y)
         ∝ π(θ) π(x | θ) π(y | x, θ) / π(x | θ, y)
         ≈ π(θ) π(x | θ) π(y | x, θ) / πG(x | θ, y) |_{x = x*(θ)}
where πG(x | θ, y) is the Gaussian approximation of π(x | θ, y) and x*(θ) is its mode.
- 94. The GMRF approximation.
π(x | y) ∝ exp( −(1/2) xᵀ Q x + ∑_i log π(yi | xi) )
        ≈ exp( −(1/2) (x − µ)ᵀ (Q + diag(ci)) (x − µ) )
        = π̃(x | y)
Constructed as follows: locate the mode x* and expand the log-density to second order. Markov and computational properties are preserved. (A toy implementation follows below.)
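A minimal sketch of this construction for a toy model; both the tridiagonal AR(1)-type Q and the Poisson likelihood yi | xi ~ Poisson(exp(xi)) are illustrative choices, not taken from the slides. Newton iterations expand ∑i log π(yi | xi) to second order around the current mean and solve the resulting Gaussian system:

n   <- 50
phi <- 0.8
Q <- diag(c(1, rep(1 + phi^2, n - 2), 1))        # AR(1)-type precision (dense here)
Q[cbind(1:(n - 1), 2:n)] <- Q[cbind(2:n, 1:(n - 1))] <- -phi
set.seed(2)
y <- rpois(n, lambda = 1)                         # illustrative counts

mu <- rep(0, n)
for (it in 1:20) {
  ci <- exp(mu)                                   # -d^2/dx^2 of log pi(y_i | x_i)
  bi <- y - exp(mu) + ci * mu                     # linearized gradient term
  mu.new <- solve(Q + diag(ci), bi)               # Newton step
  if (max(abs(mu.new - mu)) < 1e-8) break
  mu <- mu.new
}
# Gaussian approximation: x | y approx N(mu, (Q + diag(ci))^-1), with mu at the mode x*.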
- 95. Remarks. The Laplace approximation of π(θ | y) turns out to be accurate: x | y, θ appears almost Gaussian in most cases, since x is a priori Gaussian, y is typically not very informative, and the observational model is usually 'well-behaved'. Note: π(θ | y) itself does not look Gaussian!
- 100. Task 2: π(xi | y, θ). This task is more challenging, since the dimension n of x is large and there are potentially n (or at least O(n)) marginals to compute. Here we present three options: 1. Gaussian approximation; 2. Laplace approximation; 3. Simplified Laplace approximation. There is a trade-off between accuracy and complexity.
- 102. π(xi | y, θ) - 1. Gaussian approximation. An obvious, simple and fast alternative is to use the GMRF approximation π̃G(x | y, θ):
π̃(xi | θ, y) = N(xi; µi(θ), σi²(θ))
It is the fastest option; we only need to compute the diagonal of Q(θ)⁻¹. It can, however, present errors in location and asymmetry.
- 104. π(xi | y, θ) - 2. Laplace approximation. The Laplace approximation:
π̃(xi | y, θ) ≈ π(x, θ | y) / π̃GG(x_{−i} | xi, y, θ) |_{x_{−i} = x*_{−i}(xi, θ)}
Again, the approximation is very good, as x_{−i} | xi, θ is 'almost Gaussian', but it is expensive: to get the n marginals we must perform n optimizations and n factorizations of (n − 1) × (n − 1) matrices.
- 107. π(xi | y, θ) - 3. Simplified Laplace approximation. Taylor expansions of the Laplace approximation of π(xi | θ, y): computationally much faster, and they correct the Gaussian approximation for errors in shift and skewness:
log π̃(xi | θ, y) = −(1/2) xi² + b xi + (1/6) d xi³ + ···
Fit a skew-Normal density 2φ(xi)Φ(a xi): sufficiently accurate for most applications.
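The skew-Normal family used in this correction is easy to inspect in base R; the values of a below are arbitrary (a = 0 recovers the standard Normal):

dsn <- function(x, a) 2 * dnorm(x) * pnorm(a * x)    # skew-Normal density
curve(dsn(x, a = 0), from = -4, to = 4, ylab = "density")
curve(dsn(x, a = 2), add = TRUE, lty = 2)            # right-skewed
curve(dsn(x, a = -2), add = TRUE, lty = 3)           # left-skewed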
- 111. Task 3: numerical integration wrt θ. Now that we know how to compute π̃(θ | y) (Laplace approximation) and π̃(xi | θ, y) (1. Gaussian, 2. Laplace, 3. Simplified Laplace), let's see how INLA works.
- 113. The integrated nested Laplace approximation (INLA) I. Step I: explore π̃(θ | y): locate the mode, use the Hessian to construct new variables, and do a grid-search (see the sketch below).
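A minimal base-R sketch of this exploration step, for a made-up two-dimensional log-posterior lpost (every name and value here is illustrative): locate the mode with optim, use the Hessian to define standardized coordinates, and lay a grid of integration points in those coordinates.

lpost <- function(th) -0.5 * (th[1]^2 + 2 * th[2]^2 + th[1] * th[2])

opt <- optim(c(1, 1), function(th) -lpost(th), hessian = TRUE)
mode <- opt$par                       # 1. the mode of lpost
H <- opt$hessian                      # negative Hessian of lpost at the mode

e <- eigen(H)                         # 2. new variables: standardized z-coordinates
to.theta <- function(z) mode + e$vectors %*% (z / sqrt(e$values))

zgrid <- as.matrix(expand.grid(z1 = seq(-3, 3, 1), z2 = seq(-3, 3, 1)))
lp <- apply(zgrid, 1, function(z) lpost(to.theta(z)))
keep <- lp - max(lp) > log(0.001)     # 3. keep points with appreciable mass
sum(keep)                             # number of integration points retained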
- 117. The integrated nested Laplace approximation (INLA) II. Step II: for each θj and each i, evaluate the Laplace approximation for selected values of xi, and build a skew-Normal or log-spline corrected Gaussian, N(xi; µi, σi²) × exp(spline), to represent the conditional marginal density.
- 120. The integrated nested Laplace approximation (INLA) III. Step III: for each i, sum out θj:
π̃(xi | y) ∝ ∑_j π̃(xi | y, θj) × π̃(θj | y)
Build a log-spline corrected Gaussian, N(xi; µi, σi²) × exp(spline), to represent π̃(xi | y).
- 123. Computing posterior marginals for θj (I). Main idea: use the integration points to build an interpolant, and use numerical integration on that interpolant (see the toy version below).
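A one-dimensional toy version of this idea in base R (the target density is a stand-in): interpolate the log-density at the integration points, then normalize numerically on a fine grid.

theta <- seq(-3, 3, by = 0.75)               # integration points
ldens <- dnorm(theta, log = TRUE)            # stand-in for log pi(theta_j | y)

interp <- splinefun(theta, ldens)            # interpolant of the log-density
grid <- seq(-3, 3, length.out = 401)
d <- exp(interp(grid))
d <- d / (sum(d) * diff(grid)[1])            # numerical normalization
# 'd' now approximates the posterior marginal of theta along 'grid'.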
- 125. How can we assess the error in the approximations? Tool 1: compare a sequence of improved approximations: 1. Gaussian approximation; 2. Simplified Laplace; 3. Laplace.
- 126. How can we assess the error in the approximations? Tool 2: estimate the "effective" number of parameters, as defined in the Deviance Information Criterion,
pD(θ) = E[D(x; θ)] − D(E[x]; θ)
and compare this with the number of observations; a low ratio is good. This criterion has theoretical justification.
- 127. Part II: The R-INLA package
- 128. Outline: INLA implementation; R-INLA - model specification; Some examples; Model evaluation; Controlling hyperparameters and priors; Some more advanced features; More examples; Extras
- 129. Implementing INLA. All procedures required to perform INLA need to be carefully implemented to achieve good speed; it is easy to implement a slow version of INLA. Three layers:
The GMRFLib library: a basic library written in C for fast computations with GMRFs.
The inla program: defines latent Gaussian models and interfaces with the GMRFLib library; models are defined using .ini files, and the inla program writes all the results (E/Var/marginals) to files.
The INLA package for R: an R interface to the inla program (that is why it is not on CRAN); it converts "formula" statements into .ini-file definitions, runs the inla program, and gets the results back into R.
Happily, the R package is all we need to learn!!!
- 134. The INLA package for R. [Diagram: 1. a data frame and formula produce the input/.ini files; 2. the inla program runs on them; 3. the results are collected back into an R list object, from which one can get summaries, plots, etc.]
- 135. R-INLA. Visit the web site www.r-inla.org and follow the instructions; the site contains source code, examples, reports +++. The first time, do
> source("http://www.math.ntnu.no/inla/givmeINLA.R")
Later, you can upgrade the package by doing
> inla.upgrade()
or, if you want the test version (which you want),
> inla.upgrade(testing=TRUE)
Available for Linux, Windows and Mac.
- 138. Outline: INLA implementation; R-INLA - model specification; Some examples; Model evaluation; Controlling hyperparameters and priors; Some more advanced features; More examples; Extras
- 139. The structure of an R program using INLA. There are essentially three parts to an INLA program: 1. the data organization; 2. the formula (notation inherited from R's native glm function); 3. the call to the INLA program.
- 140. The inla function. This is all that's needed for a basic call:
> result <- inla(formula = y ~ 1 + x,      # describes your latent field
                 family = "gaussian",      # the likelihood distribution
                 data = data.frame(y, x))  # a list or data frame
- 141. The simplest case: linear regression.
n = 100
x = sort(runif(n))
y = 1 + x + rnorm(n, sd = 0.1)
plot(x, y)
formula = y ~ 1 + x
result = inla(formula, data = data.frame(x, y), family = "gaussian")
summary(result)
plot(result)
- 142. Call: inla(formula = formula, family = "gaussian", data = data.frame(x, y))
Time used: Pre-processing 0.0805, Running inla 0.0302, Post-processing 0.0192, Total 0.1299
Fixed effects:
            mean      sd         0.025quant 0.5quant  0.975quant kld
(Intercept) 0.9690533 0.01849785 0.9327319  0.9690531 1.005387   0
x           1.0426582 0.03126996 0.9812582  1.0426580 1.104079   0
The model has no random effects
Model hyperparameters:
                                        mean   sd    0.025quant 0.5quant 0.975quant
Precision for the Gaussian observations 127.45 18.10 95.14      126.37   166.11
Expected number of effective parameters(std dev): 2.209(0.02362)
Number of equivalent replicates: 45.27
Marginal Likelihood: 88.01
- 143. Likelihood functions - family argument.
result = inla(formula, data = data.frame(x, y), family = "gaussian")
Available families include: "binomial", "coxph", "exponential", "gaussian", "gev", "laplace", "sn" (skew Normal), "stochvol", "stochvol.nig", "stochvol.t", "T", "weibull". Many others: go to http://r-inla.org/
- 145. A more general model. Assume the following model:
y ∼ π(y | η)
η = g(λ) = β0 + β1 x1 + β2 x2 + f(x3)
where x1 and x2 are covariates with linear effects, βi ∼ N(0, τ1⁻¹); x3 can be the index of a spatial effect, random effect, etc., with f = {f1, f2, ...} ∼ N(0, Q_f⁻¹(τ2)).
- 147. A more general model (cont.)
> formula = y ~ x1 + x2 + f(x3, ...)
With y = (y1, ..., yn)ᵀ linked to η = (η1, ..., ηn)ᵀ through g, the linear predictor decomposes as
η = β0 · (1, ..., 1)ᵀ + β1 · (x11, ..., x1n)ᵀ + β2 · (x21, ..., x2n)ᵀ + (f_{x31}, ..., f_{x3n})ᵀ
- 150. Model specification - INLA package. The model is specified in R through a formula, similar to glm:
> formula = y ~ x1 + x2 + f(x3, ...)
y is the name of your response variable in your data frame. An intercept is fitted automatically! Use -1 in your formula to avoid it. The fixed effects (β0, β1 and β2) are taken as i.i.d. normal with zero mean and small precision (this can be changed). The f() function contains the random-effect specifications. Some models:
iid, iid1d, iid2d, iid3d: random effects
rw1, rw2, ar1: smooth effect of covariates or time effect
seasonal: seasonal effect
besag: spatial effect (CAR model)
generic: user-defined precision matrix
- 156. Specifying random effects. Random effects are added to the formula through the function
f(name, model="...", hyper = ..., replicate = ..., constr = FALSE, cyclic = FALSE)
name - the name of the random effect; also refers to the values in data which are used for various things, usually indexes, e.g. for space or time.
model - the latent model, e.g. "iid", "rw2", "ar1", etc.
hyper - specify the prior on the hyperparameters (a hedged example follows below).
constr - sum-to-zero constraint?
cyclic - is the effect cyclic? (rw1, rw2 and ar1)
There are more advanced options, which we will see later.
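As an illustration of the hyper argument, here is a hedged sketch of an rw2 term with an explicit prior on its log-precision; the names follow the usual R-INLA convention of calling this hyperparameter "prec", and the Gamma(1, 0.01) values are arbitrary, not recommendations:

formula = y ~ x1 + x2 +
    f(x3, model = "rw2",
      hyper = list(prec = list(prior = "loggamma", param = c(1, 0.01))),
      constr = TRUE)    # sum-to-zero constraint on the rw2 effect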
- 162. Outline: INLA implementation; R-INLA - model specification; Some examples; Model evaluation; Controlling hyperparameters and priors; Some more advanced features; More examples; Extras
- 163. EPIL example. Seizure counts in a randomized trial of anti-convulsant therapy in epilepsy (from the WinBUGS manual).
Patient  y1 y2 y3 y4  Trt  Base  Age
1        5  3  3  3   0    11    31
2        3  5  3  3   0    11    30
...
59       1  4  3  2   1    12    37
- 164. EPIL example (cont.) Mixed model with repeated Poisson counts:
yjk ∼ Poisson(µjk), j = 1, ..., 59, k = 1, ..., 4
log(µjk) = α0 + α1 log(Basej/4) + α2 Trtj + α3 Trtj log(Basej/4) + α4 Agej + α5 V4 + Indj + βjk
αi ∼ N(0, τα), with τα known
Indj ∼ N(0, τInd), with τInd ∼ Gamma(a1, b1)
βjk ∼ N(0, τβ), with τβ ∼ Gamma(a2, b2)
- 165. EPIL example (cont.) The Epil data frame:
y  Trt  Base  Age  V4  rand  Ind
5  0    11    31   0   1     1
3  0    11    31   0   2     1
...
Specifying the model:
formula = y ~ log(Base/4) + Trt + I(Trt * log(Base/4)) + log(Age) + V4 + f(Ind, model = "iid") + f(rand, model = "iid")
so that η = (η1, ..., η_{4·59})ᵀ = β0 · 1 + ... + (the patient effect f^Ind_j, repeated over each patient's four visits) + (the observation-level effect f^Rand_1, ..., f^Rand_{4·59}).
- 168.
data(Epil)
my.center = function(x) (x - mean(x))
Epil$CTrt    = my.center(Epil$Trt)
Epil$ClBase4 = my.center(log(Epil$Base/4))
Epil$CV4     = my.center(Epil$V4)
Epil$ClAge   = my.center(log(Epil$Age))
formula = y ~ ClBase4*CTrt + ClAge + CV4 + f(Ind, model="iid") + f(rand, model="iid")
result = inla(formula, family="poisson", data = Epil)
summary(result)
plot(result)
- 169. Epil example from Win/Open-BUGS. [Figures: posterior marginals for α0 and for τβ.]
- 171. EPIL example (cont.) Access results.
Summaries (mean, sd, [0.025, 0.5, 0.975]-quantiles, kld):
result$summary.fixed
result$summary.random$Ind
result$summary.random$rand
result$summary.hyperpar
Posterior marginals (matrix with x- and y-axis):
result$marginals.fixed
result$marginals.random$Ind
result$marginals.random$rand
result$marginals.hyperpar
(A sketch of post-processing these marginals follows below.)
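These stored marginals can be post-processed with the inla.*marginal utilities from the package; a brief sketch, assuming the EPIL fit above:

m <- result$marginals.fixed[["(Intercept)"]]
inla.emarginal(function(x) x, m)      # posterior mean computed from the marginal
inla.qmarginal(c(0.025, 0.975), m)    # posterior quantiles
plot(inla.smarginal(m), type = "l")   # smoothed marginal, ready for plotting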
- 173. Smoothing binary time series. [Figure: number of days in Tokyo with rainfall above 1 mm in 1983-84.] We want to estimate the probability of rain, pt, for calendar day t = 1, ..., 366.
- 174. Smoothing binary time series. Model with a time-series component:
yt ∼ Binomial(nt, pt), t = 1, ..., 366
pt = exp(ηt) / (1 + exp(ηt))
ηt = f(t)
f = {f1, ..., f366} ∼ cyclic RW2(τ)
τ ∼ Gamma(1, 0.0001)
- 175. Smoothing binary time series. The Tokyo data frame:
y  n  time
0  2  1
0  2  2
1  2  3
...
Specifying the model:
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1
so that η = (η1, ..., η366)ᵀ = (f^time_1, ..., f^time_366)ᵀ.
- 178.
data(Tokyo)
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1
result = inla(formula, family="binomial", Ntrials=n, data=Tokyo)
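To report the smoothed rain probabilities on the p-scale, one option is to push the marginals of the temporal effect through the inverse logit; a hedged sketch, assuming inla.tmarginal from the package (here ηt = f(t), since the model has no other terms):

p.mean <- sapply(result$marginals.random$time, function(m) {
  pm <- inla.tmarginal(function(eta) 1 / (1 + exp(-eta)), m)  # marginal of p_t
  inla.emarginal(function(p) p, pm)                           # posterior mean of p_t
})
plot(p.mean, type = "l", xlab = "day", ylab = "P(rain)")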
- 179. Posterior for the temporal effect. [Figure: posterior mean and 0.025/0.5/0.975 quantiles of the time effect over t = 1, ..., 366.]
- 180. Posterior for the precision. [Figure: posterior density of the precision for time.]
- 181. Disease mapping in Germany. Larynx cancer mortality counts are observed in the 544 districts of Germany from 1986 to 1990, together with the level of smoking consumption (100 possible values). [Maps: mortality counts and smoking consumption.]
- 182. yi, i = 1, ..., 544: counts of cancer mortality in region i. Ei, i = 1, ..., 544: known variable accounting for demographic variation in region i. ci, i = 1, ..., 544: level of smoking consumption registered in region i. [Maps: mortality counts and smoking consumption.]
- 183. The model.
yi ∼ Poisson{Ei exp(ηi)}, i = 1, ..., 544
ηi = µ + f(ci) + fs(si) + fu(si)
where:
f(ci) is a smooth effect of the covariate, f = {f1, ..., f100} ∼ RW2(τf)
fs(si) is a spatial effect modeled as an intrinsic GMRF:
fs(s) | fs(s'), s' ≠ s, τfs ∼ N( (1/ns) ∑_{s'∼s} fs(s'), 1/(ns τfs) )
fu(si) is a random effect, fu = {fu(s1), ..., fu(s544)} ∼ N(0, τfu⁻¹ I)
µ is an intercept term, µ ∼ N(0, 0.0001)
- 188. For identifiability we define a sum-to-zero constraint for all intrinsic models, so Σs fs(s) = 0 and Σi fi = 0. 78 / 140
- 189. The Germany data frame:
region  E          Y   x
0       7.965008   8   56
1       22.836219  22  65
...
The model is ηi = µ + f(ci) + fs(si) + fu(si). The data set has to contain one separate column for each term specified through f(), so in this case we have to add one column:
> Germany = cbind(Germany, region.struct=Germany$region)
We also need the graph file where the neighborhood structure is specified: germany.graph
79 / 140
- 192. The new data set is:
region  E          Y   x   region.struct
0       7.965008   8   56  0
1       22.836219  22  65  1
...
Then the formula is
formula <- Y ~ f(region.struct, model="besag", graph="germany.graph") +
    f(x, model="rw2") + f(region)
The sum-to-zero constraint is the default in the inla function for all intrinsic models. The location of the graph file has to be provided here (the graph file cannot be loaded in R).
80 / 140
- 197. The germany.graph file:
544           (total number of nodes in the graph)
1 2 4 12      (identifier for the node, number of neighbors, identifiers for the neighbors)
...
81 / 140
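A small sketch for inspecting such a file from R; inla.read.graph is part of the INLA package, and the field names below are those of the returned inla.graph object, to the best of my knowledge:
g = inla.read.graph(system.file("demodata/germany.graph", package="INLA"))
g$n          # total number of nodes (544 here)
g$nnbs[1]    # number of neighbors of node 1
g$nbs[[1]]   # identifiers of the neighbors of node 1
summary(g)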
- 202.
data(Germany)
g = system.file("demodata/germany.graph", package="INLA")
source(system.file("demodata/Bym-map.R", package="INLA"))
Germany = cbind(Germany, region.struct=Germany$region)
# standard BYM model
formula1 = Y ~ f(region.struct, model="besag", graph=g) + f(region, model="iid")
# with linear covariate
formula2 = Y ~ f(region.struct, model="besag", graph=g) + f(region, model="iid") + x
# with smooth covariate
formula3 = Y ~ f(region.struct, model="besag", graph=g) + f(region, model="iid") + f(x, model="rw2")
82 / 140
- 203.
result1 = inla(formula1, family="poisson", data=Germany, E=E, control.compute=list(dic=TRUE))
result2 = inla(formula2, family="poisson", data=Germany, E=E, control.compute=list(dic=TRUE))
result3 = inla(formula3, family="poisson", data=Germany, E=E, control.compute=list(dic=TRUE))
83 / 140
- 204. Other graph specifications. It is also possible to define the graph structure of your model using:
a symmetric (dense or sparse) matrix, where the non-zero pattern of the matrix defines the graph;
an inla.graph object.
See the FAQ on the webpage for more information.
84 / 140
- 205. Outline INLA implementation R-INLA - Model speciﬁcation Some examples Model evaluation Controlling hyperparameters and priors Some more advanced features More examples Extras 85 / 140
- 206. Model evaluation
Deviance Information Criterion (DIC):
result = inla(..., control.compute = list(dic = TRUE))
result$dic$dic
Conditional predictive ordinate (CPO) and probability integral transform (PIT):
CPOi = π(yi | y−i)
PITi = Prob(Yi ≤ yi,obs | y−i)
result = inla(..., control.compute = list(cpo = TRUE))
result$cpo$cpo
result$cpo$pit
86 / 140
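A minimal sketch of using these quantities (it assumes models fitted with control.compute = list(dic=TRUE, cpo=TRUE), e.g. result1 and result2 from the Germany example):
c(result1$dic$dic, result2$dic$dic)        # smaller DIC suggests a better fit/complexity trade-off
-mean(log(result1$cpo$cpo), na.rm=TRUE)    # a cross-validated log-score; smaller is better
hist(result1$cpo$pit, breaks=20)           # PIT values should look roughly uniform if calibrated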
- 207. Outline INLA implementation R-INLA - Model speciﬁcation Some examples Model evaluation Controlling hyperparameters and priors Some more advanced features More examples Extras 87 / 140
- 208. Controlling θ. We often need to set our own priors and use our own parameters in them. These can be set in two ways:
Old style, using prior=.., param=..., initial=..., fixed=...
New style, using hyper = list(prec = list(initial=2, fixed=TRUE, ....))
The old style is there for backward compatibility only. The two styles can also be mixed.
88 / 140
- 211. Example - New style:
hyper = list(
  prec = list(
    prior = "loggamma",
    param = c(2, 0.1),
    initial = 3,
    fixed = FALSE
  )
)
formula = y ~ f(i, model="iid", hyper = hyper) + ...
Old style:
formula = y ~ f(i, model="iid", prior = "loggamma", param = c(2, 0.1), initial = 3, fixed = FALSE) + ...
89 / 140
- 212. Internal and external scale. Hyperparameters, like the precision τ, are represented internally using a "good" transformation, like θ1 = log(τ). Initial values are given on the internal scale. The to.theta and from.theta functions can be used to map between the external and internal scale. 90 / 140
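A sketch of moving a hyperparameter marginal from the internal to an external scale (here from the log-precision to the standard deviation; inla.tmarginal and inla.zmarginal are INLA utilities, and the marginal picked is illustrative):
m.int = result$internal.marginals.hyperpar[[1]]               # marginal of theta1 = log(tau)
m.sd  = inla.tmarginal(function(theta) exp(-theta/2), m.int)  # transform to the sd scale
inla.zmarginal(m.sd)                                          # posterior summary of the sd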
- 215. Example: AR1 model hyperparameter specification:
theta1: name = log precision; short.name = prec; prior = loggamma; param = (1, 5e-05); initial = 4; fixed = FALSE; plus to.theta/from.theta mappings
theta2: name = logit lag one correlation; short.name = rho; prior = normal; param = (0, 0.15); initial = 2; fixed = FALSE; plus to.theta/from.theta mappings
(the full specification also lists the model properties constr, nrow.ncol, augmented, aug.factor, aug.constr, n.div.by, n.required, set.default.values and pdf = ar1)
91 / 140
- 216. Outline INLA implementation R-INLA - Model speciﬁcation Some examples Model evaluation Controlling hyperparameters and priors Some more advanced features More examples Extras 92 / 140
- 217. Feature: replicate. "replicate" generates iid replicates from the same model with the same hyperparameters. If x | θ ∼ AR(1), then nrep=3 makes x = (x1, x2, x3) with mutually independent xi's from AR(1) with the same θ. Most f()-models can be replicated. 93 / 140
- 218. Example: replicate
n = 100
x1 = arima.sim(n, model=list(ar=0.9)) + 1
x2 = arima.sim(n, model=list(ar=0.9)) - 1
y1 = rpois(n, exp(x1))
y2 = rpois(n, exp(x2))
y = c(y1, y2)
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family = "poisson", data = data.frame(y=y, i=i, r=r))
94 / 140
- 219. Example: replicate (cont.)
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
The linear predictor for observation t in replicate r is ηt,r = ft,r + β0,1·1(r=1) + β0,2·1(r=2), where f·,1 and f·,2 are independent AR(1) replicates.
95 / 140
- 220. Feature: More than one family
Every observation could have its own likelihood!
The response is a matrix or list.
Each "column" defines a separate "family".
Each "family" has its own hyperparameters.
96 / 140
- 221.
n = 100
phi = 0.9
x1 = 1 + arima.sim(n, model=list(ar=phi))
x2 = 0.5 + arima.sim(n, model=list(ar=phi))
y1 = rbinom(n, size=1, prob=exp(x1)/(1+exp(x1)))
y2 = rpois(n, exp(x2))
y = matrix(NA, 2*n, 2)
y[1:n, 1] = y1
y[n+1:n, 2] = y2
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
Ntrials = c(rep(1,n), rep(NA,n))
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family = c("binomial", "poisson"), Ntrials = Ntrials, data = data.frame(y, i, r))
97 / 140
- 222. The response matrix: column 1 holds the binomial observations y1 (rows 1 to n) and column 2 the Poisson observations y2 (rows n+1 to 2n), with NA everywhere else. Each row has linear predictor ηt,r = ft,r + β0,1·1(r=1) + β0,2·1(r=2), as in the replicate example. 98 / 140
- 223. More than one family - More examples. Some rather advanced examples on www.r-inla.org using this feature:
Preferential sampling, geostatistics (marked point process)
Weibull-survival data and "longitudinal" data
99 / 140
- 224. Feature: copy. The model formula = y ~ f(i, ...) + ... allows only ONE element from each sub-model to contribute to the linear predictor for each observation. Sometimes this is not sufficient. 100 / 140
- 225. Feature: copy. Suppose ηi = ui + ui+1 + ... Then we can code this as
formula = y ~ f(i, model="iid") + f(i.plus, copy="i")
The copy feature creates an additional sub-model which is ε-close to the target. Many copies are allowed, and a copy can have an unknown scaling (the default scaling is fixed to 1). A sketch of this construction is given below.
101 / 140
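A minimal simulated sketch of the copy construction for ηi = ui + ui+1 (variable names are illustrative):
n = 50
u = rnorm(n + 1)
y = u[1:n] + u[2:(n+1)] + rnorm(n, sd=0.1)
i      = 1:n          # picks u_i
i.plus = 2:(n+1)      # picks u_{i+1} from the copied field
formula = y ~ f(i, model="iid", n=n+1) + f(i.plus, copy="i") - 1
result = inla(formula, data = data.frame(y, i, i.plus))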
- 226. Feature: copy. Suppose that ηi = ai + bi zi + ..., where (ai, bi) iid ∼ N2(0, Σ). Simulate data:
library(mvtnorm)   # for rmvnorm
n = 100
Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2)
z = runif(n)
ab = rmvnorm(n, sigma = Sigma)
a = ab[, 1]
b = ab[, 2]
eta = a + b * z
s = 0.1
y = eta + rnorm(n, sd=s)
102 / 140
- 227.
i = 1:n
j = 1:n + n
formula = y ~ f(i, model="iid2d", n = 2*n) + f(j, z, copy="i") - 1
r = inla(formula, data = data.frame(y, i, j))
This gives ηi = ai + bi·zi: the first n elements of the iid2d field hold the ai's, and the copied elements, weighted by z, supply the bi·zi terms.
103 / 140
- 228. Feature: Linear-combinations Possible to extract extra information from the model through linear combinations of the latent ﬁeld, say v = Bx for a k × n matrix B. 104 / 140
- 229. Feature: Linear-combinations (cont.) Two different approaches:
1. Most "correct" is to do the computations on the enlarged field x̃ = (x, v). But this often leads to a denser precision matrix.
2. The second option is to compute these "offline", conditionally on θ:
Var(v1) = Var(b1ᵀ x) ≈ b1ᵀ Q⁻¹GMRFapprox b1 and E(v1) = b1ᵀ E(x),
then approximate the density of v1 with a Normal.
105 / 140
- 231.
formula = y ~ ClBase4*CTrt + ClAge + CV4 + f(Ind, model="iid") + f(rand, model="iid")
## Now I want the posterior for
## 1) 2*CTrt - CV4
## 2) Ind[2] - rand[2]
lc1 = inla.make.lincomb(CTrt = 2, CV4 = -1)
names(lc1) = "lc1"
lc2 = inla.make.lincomb(Ind = c(NA,1), rand = c(NA,-1))
names(lc2) = "lc2"
## default is to derive the marginals from lc's without changing the latent field
result1 = inla(formula, family="poisson", data = Epil, lincomb = c(lc1, lc2))
## but the lincombs can also be additionally included into the latent field for increased accuracy...
result2 = inla(formula, family="poisson", data = Epil, lincomb = c(lc1, lc2),
    control.inla = list(lincomb.derived.only = FALSE))
106 / 140
- 232. Get the results:
result$summary.lincomb.derived     # results of the default method
result$marginals.lincomb.derived
result$summary.lincomb             # alternative method
result$marginals.lincomb
Posterior correlation matrix between all the linear combinations:
control.inla = list(lincomb.derived.correlation.matrix = TRUE)
result$misc$lincomb.derived.correlation.matrix
Many linear combinations at once: use inla.make.lincombs()
107 / 140
- 233. A-matrix in the linear predictor (I) Usual formula η = ... and yi ∼ π(yi | ηi , ...) 108 / 140
- 234. A-matrix in the linear predictor (II). Extended formula: η = ..., η* = Aη and yi ∼ π(yi | ηi*, ...). Implemented as:
A = matrix(...)
A = sparseMatrix(...)
result = inla(formula, ..., control.predictor = list(A = A))
109 / 140
- 236. A-matrix in the linear predictor (III)
Can really simplify model formulations.
Duplicates to some extent the "copy" feature.
Really useful for some models; the A-matrix need not be a square matrix (see the sketch below).
110 / 140
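A sketch of a non-square A: k = n−1 pseudo-observations, each averaging two neighboring elements of η (sparseMatrix is from the Matrix package; the averaging construction itself is just an illustration):
library(Matrix)
n = 10
A = sparseMatrix(i = rep(1:(n-1), each=2),
                 j = c(rbind(1:(n-1), 2:n)),
                 x = 0.5)    # row k encodes eta*_k = (eta_k + eta_{k+1}) / 2
## result = inla(formula, ..., control.predictor = list(A = A))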
- 237. Feature: remote computing. For large/huge models, it is more convenient to run the computations on a remote (Linux/Mac) computational server: inla(...., inla.call="remote"), using ssh (and Cygwin on Windows). 111 / 140
- 238. Control statements. The control.xxx statements control various parts of the INLA program:
control.predictor: A, the "A matrix" or "observation matrix" linking the latent field to the data.
control.mode: x, theta, result, for giving modes to INLA; restart = TRUE tells INLA to try to improve on the supplied mode.
control.compute: dic, mlik, cpo, for computing measures of fit.
control.inla: strategy and int.strategy contain useful advanced features.
Various others; see help!
112 / 140
- 239. Outline INLA implementation R-INLA - Model speciﬁcation Some examples Model evaluation Controlling hyperparameters and priors Some more advanced features More examples Extras 113 / 140
- 240. Space-varying regression. Number of (insurance-type) losses Nkt in 431 municipalities/regions of Norway in relation to one weather covariate Wkt. The likelihood is
Nkt ∼ Poisson(Akt pkt), k = 1, . . . , 431, t = 1, . . . , 10
The model for log pkt is log pkt = β0 + βk Wkt, where βk is the regression coefficient for municipality k.
114 / 140
- 241. Borrow strength.. Few losses in each region; high variability in the estimates. Borrow strength by letting {β1, . . . , β431} be smooth in space:
{β1, . . . , β431} ∼ CAR(τβ)
115 / 140
- 243. The data set:
     y  region  W
1    0  1       0.4
2    0  1       0.4
...
10   0  1       0.4
11   1  2       0.2
12   0  2       0.2
...
20   0  2       0.2
116 / 140
- 244. The second argument in f() is the weight, which defaults to 1: ηi = ... + wi fi + ... is represented as f(i, w, ...). No need for a sum-to-zero constraint!
norway = read.table("norway.dat", header=TRUE)
formula = y ~ 1 + f(region, W, model="besag", graph.file="norway.graph", constr=FALSE)
result = inla(formula, family="poisson", data=norway)
117 / 140
- 245. Survival models. Times of infection from the time of insertion of the catheter for 38 kidney patients using portable dialysis equipment. There are 2 observations for each of the 38 patients; each time can be an event (infection) or a censoring (no infection):
patient  time    event  age     sex
1        8, 16   1, 1   28, 28  0
2        23, 13  1, 0   48, 48  1
3        22, 18  1, 1   32, 32  0
118 / 140
- 246. The Kidney data frame:
time  event  age  sex  ID
8     1      28   0    1
16    1      28   0    1
23    1      48   1    2
13    0      48   1    2
22    1      32   0    3
28    1      32   0    3
119 / 140
- 247.
data(Kidney)
formula = inla.surv(time, event) ~ age + sex + f(ID, model="iid")
result1 = inla(formula, family="coxph", data=Kidney)
result2 = inla(formula, family="weibull", data=Kidney)
result3 = inla(formula, family="exponential", data=Kidney)
120 / 140
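A hedged post-processing sketch: covariate effects on the hazard-ratio scale (the marginal names follow the formula above; inla.tmarginal, inla.zmarginal and inla.emarginal are INLA utilities):
hr.age = inla.tmarginal(exp, result2$marginals.fixed$age)   # marginal of exp(beta_age)
inla.zmarginal(hr.age)                                      # posterior summary of the hazard ratio
inla.emarginal(exp, result2$marginals.fixed$sex)            # posterior mean hazard ratio for sex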
- 248. Outline INLA implementation R-INLA - Model speciﬁcation Some examples Model evaluation Controlling hyperparameters and priors Some more advanced features More examples Extras 121 / 140
- 249. A toy-example using copy. State-space model:
yt = xt + vt
xt = 2xt−1 − xt−2 + wt
Rewrite this as
yt = xt + vt
0 = xt − 2xt−1 + xt−2 + wt
and implement this as two families:
1. Observations yt with precision Prec(vt)
2. Observations 0 with precision Prec(wt), or Prec=HIGH.
122 / 140
- 251.
n = 100
m = n-2
y = sin((1:n)*0.2) + rnorm(n, sd=0.1)
formula = Y ~ f(i, model="iid", initial=-10, fixed=TRUE) +
    f(j, w, copy="i") + f(k, copy="i") +
    f(l, model="iid") - 1
Y = matrix(NA, n+m, 2)
Y[1:n, 1] = y
Y[1:m + n, 2] = 0
i = c(1:n, 3:n)              # x_t
j = c(rep(NA,n), 3:n - 1)    # x_t-1
w = c(rep(NA,n), rep(-2,m))  # weights for j
k = c(rep(NA,n), 3:n - 2)    # x_t-2
l = c(rep(NA,n), 1:m)        # w_t
r = inla(formula, data = data.frame(i,j,w,k,l,Y),
    family = c("gaussian", "gaussian"),
    control.data = list(list(), list(initial=10, fixed=TRUE)))
123 / 140
- 252. Stochastic Volatility model
[Figure: log of the daily difference of the pound-dollar exchange rate from October 1st, 1981, to June 28th, 1985]
124 / 140
- 253. Stochastic Volatility model. Simple model:
xt | x1, . . . , xt−1, τ, φ ∼ N(φ xt−1, 1/τ), where |φ| < 1 to ensure a stationary process.
Observations are taken to be yt | x1, . . . , xt, µ ∼ N(0, exp(µ + xt)).
125 / 140
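One way to fit this in R-INLA, as a sketch: it assumes the "stochvol" likelihood (Gaussian observations with variance exp(ηt)) is available in your INLA version, with the latent xt as an AR(1) term plus the intercept µ:
time = 1:length(y)                    # y holds the log-returns
formula = y ~ 1 + f(time, model="ar1")
result = inla(formula, family="stochvol", data=data.frame(y, time))
summary(result)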
- 254. Results. Using only the first 50 data points, which makes the problem much harder. 126 / 140
- 255. Results. [Figure: posterior density of ν = logit(2φ − 1)] 126 / 140
- 256. Results. [Figure: posterior density of log(κx)] 126 / 140
- 257. Using the full dataset. [Figure: the Pound-Dollar data] 127 / 140
- 258. Using the full dataset. [Figure: mean of xt + µ] 128 / 140
- 259. Using the full dataset. [Figure: the posterior marginal for the precision] 129 / 140
- 260. Using the full dataset. [Figure: the posterior marginal for the lag-1 correlation] 130 / 140
- 261. Using the full dataset. [Figure: predictions for µ + xt+k] 131 / 140
- 262. New data-model: Student-tν. Now extend the model to use the Student-tν distribution:
yt | x1, . . . , xt ∼ exp(µ/2 + xt/2) × Student-tν / √(ν/(ν − 2))
132 / 140
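As a sketch, and assuming a Student-t stochastic-volatility likelihood named "stochvol.t" is available in your INLA version, only the family argument changes relative to the Gaussian fit above:
result.t = inla(formula, family="stochvol.t", data=data.frame(y, time))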
- 263. Student-tν. [Figure: posterior marginal for ν] 133 / 140
- 264. Student-tν. [Figure: predictions] 134 / 140
- 265. Student-tν. [Figure: comparing predictions with Student-tν and Gaussian] 135 / 140
- 266. Student-tν. However, there is no support for the Student-tν in the data, as judged by the Bayes factor and the Deviance Information Criterion. 136 / 140
- 267. Disease mapping: The BYM-model
Data: yi ∼ Poisson(Ei exp(ηi))
Log-relative risk: ηi = ui + vi, with structured component u, unstructured component v, and log-precisions log κu and log κv.
[Figure: maps of the structured and unstructured components]
A hard case: Insulin Dependent Diabetes Mellitus in 366 districts of Sardinia. Few counts. dim(θ) = 2.
137 / 140
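A sketch of this model using the built-in "bym" model, which bundles the structured and unstructured components into a single f() term (the data-frame and graph-file names here are illustrative, not from the slides):
formula = y ~ f(region, model="bym", graph="sardinia.graph")
result = inla(formula, family="poisson", E=E, data=Sardinia)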
- 268. Marginals for θ|y 138 / 140
- 270. Marginals for xi |y 139 / 140
- 271. THANK YOU 140 / 140
