
# Bayesian computation with INLA

A short course on Bayesian computation with INLA, given at the AS 2013 conference in Ribno, Slovenia.


1. Bayesian computation using INLA. Thiago G. Martins, Norwegian University of Science and Technology, Trondheim, Norway. AS 2013, Ribno, Slovenia, September 2013.
2. Part I: Latent Gaussian models and the INLA methodology.
3. Outline: Latent Gaussian models; Are latent Gaussian models important?; Bayesian computing; The INLA method.
4. Hierarchical Bayesian models. Hierarchical models are an extremely useful tool in Bayesian model building. Three parts: the observations (y), which encode information about the observed data, including design and collection issues; the latent process (x), the unobserved process, which may be the focus of the study or may be included to reduce autocorrelation, e.g. to encode spatial and/or temporal dependence; and the parameter model (θ), models for all of the parameters in the observation and latent processes.
5. Latent Gaussian models. A latent Gaussian model is a Bayesian hierarchical model of the following form: observed data y, with y_i | x_i ∼ π(y_i | x_i, θ); a latent Gaussian field x ∼ N(·, Σ(θ)); and hyperparameters θ ∼ π(θ) (variability, length/strength of dependence, parameters in the likelihood). The joint posterior is π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i∈I} π(y_i | x_i, θ).
6. Precision matrix. The precision matrix of the latent field, Q(θ) = Σ(θ)⁻¹, plays a key role! Two issues: building models through conditioning ("hierarchical models"), and computational benefits.
7. Building models through conditioning. If x ∼ N(0, Q_x⁻¹) and y | x ∼ N(x, Q_y⁻¹), then the joint precision is Q_(x,y) = [Q_x + Q_y, −Q_y; −Q_y, Q_y]. The corresponding expressions in terms of covariance matrices are not so nice.
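To make the conditioning argument concrete, here is a minimal R sketch (with hypothetical 2-dimensional precisions Qx and Qy) that builds the joint precision block matrix above and recovers the joint covariance by inversion:

```r
library(Matrix)

# Hypothetical precisions: x ~ N(0, Qx^{-1}), y | x ~ N(x, Qy^{-1}).
Qx <- Diagonal(2, 4)
Qy <- Diagonal(2, 10)

# Joint precision of (x, y) from the slide's block formula.
Q.joint <- rbind(cbind(Qx + Qy, -Qy),
                 cbind(-Qy,      Qy))
Q.joint          # sparse and simple

# The covariance route is messier: Cov(x) = Qx^{-1},
# Cov(y) = Qx^{-1} + Qy^{-1}, Cov(x, y) = Qx^{-1}.
solve(Q.joint)   # dense; its blocks match those expressions
```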
8. Computational benefits. Precision matrices encode conditional independence: x_i ⊥ x_j | x_{-ij} ⟺ Q_ij = 0. We are interested in models with sparse precision matrices: x ∼ N(·, Σ(θ)) with sparse Q(θ) = Σ(θ)⁻¹. Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs), and they have good computational properties through numerical algorithms for sparse matrices.
9. Numerical algorithms for sparse matrices: scaling properties. Factorization costs scale as O(n) for temporal models, O(n^{3/2}) for spatial models, and O(n²) for spatio-temporal models. This is to be compared with the general O(n³) algorithms for dense matrices.
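A small sketch of what these scaling properties mean in practice, assuming a banded (temporal, RW1-like) precision matrix with hypothetical entries; the Matrix package's sparse Cholesky is compared against a dense factorization of the same matrix:

```r
library(Matrix)

# Tridiagonal precision matrix of a temporal GMRF (hypothetical values).
n <- 2000
Q <- bandSparse(n, k = c(0, 1),
                diagonals = list(rep(2.01, n), rep(-1, n - 1)),
                symmetric = TRUE)

system.time(for (i in 1:50) chol(Q))   # sparse Cholesky: roughly O(n) here
Qd <- as.matrix(Q)
system.time(for (i in 1:5) chol(Qd))   # dense Cholesky: O(n^3)
```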
10. Outline: Latent Gaussian models; Are latent Gaussian models important?; Bayesian computing; The INLA method.
11. Example (I): Mixed-effect model. y_ij | η_ij, θ_1 ∼ π(y_ij | η_ij, θ_1), i = 1, …, N, j = 1, …, M, with η_ij = µ + c_ij β + u_i + v_j + w_ij, where u, v and w are "random effects". If we assign Gaussian priors to µ, β, u and v, then x | θ_2 = (µ, β, u, v, η) | θ_2 is jointly Gaussian, and θ = (θ_1, θ_2).
12. Example (I) - cont. We can reinterpret the model as θ ∼ π(θ), x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ)), y | x, θ ∼ ∏_i π(y_i | η_i, θ). dim(x) can be large (10²–10⁵), while dim(θ) is small (1–5).
13. Example (I) - cont. [Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β) for N = 100, M = 5.]
14. Example (II): Time-series model. Smoothing of a binary time series: the data are a sequence of 0s and 1s, and the probability of a 1 at time t, p_t, depends on time: p_t = exp(η_t) / (1 + exp(η_t)), with linear predictor η_t = µ + β c_t + u_t + v_t, t = 1, …, n.
15. Example (II) - cont. Prior models: µ and β are Normal; u is an AR model, e.g. u_t = φ u_{t−1} + ε_t, with parameters (φ, σ²); v is an unstructured term, a "random effect". This gives x | θ = (µ, β, u, v, η) jointly Gaussian, with hyperparameters θ = (φ, σ², σ_v²).
16. Example (II) - cont. We can reinterpret the model as θ ∼ π(θ), x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ)), y | x, θ ∼ ∏_i π(y_i | η_i, θ). dim(x) can be large (10²–10⁵), while dim(θ) is small (1–5).
17. Example (II) - cont. [Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β), n = 100.]
18. Example (III): Disease mapping. Data y_i ∼ Poisson(E_i exp(η_i)), with log-relative risk η_i = µ + u_i + v_i + f(c_i): a structured component u, an unstructured component v, and a smooth effect of a covariate c. [Maps of the three components.]
19. Example (III) - cont. We can reinterpret the model as θ ∼ π(θ), x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ)), y | x, θ ∼ ∏_i π(y_i | η_i, θ). dim(x) can be large (10²–10⁵), while dim(θ) is small (1–5).
20. Example (III) - cont. [Figure: sparsity pattern of the precision matrix of (η, u, v, µ, f).]
21. What we have learned so far. The latent Gaussian model construct θ ∼ π(θ), x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ)), y | x, θ ∼ ∏_i π(y_i | η_i, θ) occurs in many, seemingly unrelated, statistical models: GLM/GAM/GLMM/GAMM and more.
22. Further examples: dynamic linear models; stochastic volatility; generalized linear (mixed) models; generalized additive (mixed) models; spline smoothing; semi-parametric regression; space-varying (semi-parametric) regression models; disease mapping; log-Gaussian Cox processes; model-based geostatistics; spatio-temporal models; survival analysis; and more.
23. Outline: Latent Gaussian models; Are latent Gaussian models important?; Bayesian computing; The INLA method.
24. Bayesian computing. We are interested in posterior marginal quantities like π(x_i | y) and π(θ_i | y). This requires the evaluation of integrals of the form π(x_i | y) ∝ ∫∫ π(y | x, θ) π(x | θ) π(θ) dθ dx_{-i}, integrating over θ and x_{-i}. The computation of massively high-dimensional integrals is at the core of Bayesian computing.
25. But surely we can already do this. Markov chain Monte Carlo (MCMC) is widely used by the applied community: there are generic tools available for MCMC (OpenBUGS, JAGS, Stan) and others for specific models (e.g. BayesX). Yet the issue of Bayesian computing is not "solved" even though MCMC is available: hierarchical models are more difficult for MCMC (strong dependencies, bad mixing). A main obstacle for Bayesian modeling is still "Bayesian computing".
26. So what's wrong with MCMC? This is actually a problem with any Monte Carlo scheme. The Monte Carlo error, E(f(X)) − (1/N) Σ_{i=1}^{N} f(x_i), is O(1/√N). In practical terms, to reduce the error to O(10⁻ᵖ) you need O(10²ᵖ) samples, and even this can be optimistic!
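The rate is easy to see empirically; a short sketch estimating E(X) = 0 for X ∼ N(0, 1) with growing sample sizes:

```r
# Monte Carlo error shrinks like 1/sqrt(N): ten times the accuracy
# costs a hundred times the samples.
set.seed(1)
for (N in 10^(2:5)) {
  err <- replicate(200, abs(mean(rnorm(N))))
  cat(sprintf("N = %6d   mean |error| = %.5f   1/sqrt(N) = %.5f\n",
              N, mean(err), 1 / sqrt(N)))
}
```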
27. Be more narrow. MCMC 'works' for everything, but it is not usually optimal when we focus on a specific class of models: it works for latent Gaussian models, but it's too slow, and (unfortunately) sometimes it's the only thing we can do. INLA (Integrated Nested Laplace Approximations) is a deterministic algorithm, rather than a stochastic one like MCMC, specially designed for latent Gaussian models. It gives accurate results in a small fraction of the computational time of MCMC.
28. Comparing results with MCMC. When comparing the results of R-INLA with MCMC, it is important to use the same model. Here we compare the EPIL example results with those obtained using JAGS via the rjags package.
29. [Figures: posterior marginals for the intercept (a0), alpha.Age, log(tau.b1) and log(tau.b); each panel overlays the INLA result on MCMC runs of 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64 and 120 minutes.]
30. Outline: Latent Gaussian models; Are latent Gaussian models important?; Bayesian computing; The INLA method.
31. Main aim. Posterior: π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i∈I} π(y_i | x_i, θ). Compute the posterior marginals π(x_i | y) = ∫ π(θ | y) π(x_i | θ, y) dθ and π(θ_j | y) = ∫ π(θ | y) dθ_{-j}.
32. Tasks. 1. Build an approximation π̃(θ | y) to π(θ | y). 2. Build an approximation π̃(x_i | θ, y) to π(x_i | θ, y); these enter π(x_i | y) = ∫ π(θ | y) π(x_i | θ, y) dθ and π(θ_j | y) = ∫ π(θ | y) dθ_{-j}. 3. Do the integration with respect to θ numerically.
33. Task 1: π(θ | y). The Laplace approximation for π(θ | y) is π(θ | y) = π(x, θ | y) / π(x | θ, y) ∝ π(θ) π(x | θ) π(y | x, θ) / π(x | θ, y) ≈ π(θ) π(x | θ) π(y | x, θ) / π_G(x | θ, y), evaluated at x = x*(θ), where π_G(x | θ, y) is the Gaussian approximation of π(x | θ, y) and x*(θ) is its mode.
34. The GMRF approximation. π(x | y) ∝ exp(−½ xᵀQx + Σ_i log π(y_i | x_i)) ≈ exp(−½ (x − µ)ᵀ (Q + diag(c_i)) (x − µ)) = π̃(x | y). It is constructed as follows: locate the mode x*, then expand to second order. Markov and computational properties are preserved.
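A univariate sketch of this construction ("locate the mode, expand to second order"), for a hypothetical single Poisson count y with a log link and a N(0, 1/τ) prior on x:

```r
# log pi(x | y) = y*x - exp(x) - 0.5*tau*x^2 + const.
tau <- 1; y <- 4
x <- 0
for (iter in 1:20) {            # Newton-Raphson for the mode x*
  grad <- y - exp(x) - tau * x
  hess <- -exp(x) - tau
  x <- x - grad / hess
}
mu   <- x                       # mode x*
prec <- tau + exp(x)            # curvature: the 'Q + diag(c_i)' of the slide
cat(sprintf("Gaussian approximation: N(%.4f, %.4f)\n", mu, 1 / prec))
```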
35. Remarks. The Laplace approximation π̃(θ | y) turns out to be accurate: x | y, θ appears almost Gaussian in most cases, as x is a priori Gaussian, y is typically not very informative, and the observational model is usually 'well-behaved'. Note: π(θ | y) itself does not look Gaussian!
36. Task 2: π(x_i | y, θ). This task is more challenging, since the dimension n of x is large and there are potentially n marginals to compute, or at least O(n). Here we present three options: 1. the Gaussian approximation; 2. the Laplace approximation; 3. the simplified Laplace approximation. There is a trade-off between accuracy and complexity.
37. π(x_i | y, θ) - 1. Gaussian approximation. An obvious, simple and fast alternative is to use the GMRF approximation π_G(x | y, θ): π̃(x_i | θ, y) = N(x_i; µ_i(θ), σ_i²(θ)). It is the fastest option; we only need to compute the diagonal of Q(θ)⁻¹. It can, however, show errors in location and fails to capture asymmetry.
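A sketch of the one computation this option needs, for a small hypothetical tridiagonal precision matrix; INLA obtains these variances efficiently from the sparse Cholesky factor rather than from a full inverse:

```r
library(Matrix)

# Small hypothetical tridiagonal (RW1-like) precision matrix.
Q <- bandSparse(5, k = c(0, 1),
                diagonals = list(rep(2.1, 5), rep(-1, 4)),
                symmetric = TRUE)

# Marginal variances under the Gaussian approximation: diag(Q^{-1}).
sigma2 <- diag(solve(Q))
sigma2
```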
38. π(x_i | y, θ) - 2. Laplace approximation. The Laplace approximation is π(x_i | y, θ) ≈ π(x, θ | y) / π_GG(x_{-i} | x_i, y, θ), evaluated at x_{-i} = x*_{-i}(x_i, θ). Again, the approximation is very good, as x_{-i} | x_i, θ is 'almost Gaussian', but it is expensive: to get the n marginals we must perform n optimizations and n factorizations of (n−1) × (n−1) matrices.
39. π(x_i | y, θ) - 3. Simplified Laplace approximation. Taylor expansions of the Laplace approximation for π(x_i | θ, y): computationally much faster, and they correct the Gaussian approximation for errors in location and skewness. log π(x_i | θ, y) = −½ x_i² + b x_i + (1/6) d x_i³ + ⋯. Fit a skew-normal density, 2 φ(x) Φ(ax); this is sufficiently accurate for most applications.
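For intuition, a sketch of the skew-normal family 2φ(x)Φ(ax) against the standard Gaussian (the skewness parameter a = 3 is an arbitrary illustrative value):

```r
# Skew-normal density used to absorb the third-order (skewness) term.
dskewnorm <- function(x, a) 2 * dnorm(x) * pnorm(a * x)

x <- seq(-4, 4, length.out = 400)
plot(x, dnorm(x), type = "l", lty = 2, ylab = "density")
lines(x, dskewnorm(x, a = 3))   # right-skewed correction of N(0, 1)
```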
40. Task 3: Numerical integration with respect to θ. Now that we know how to compute π̃(θ | y) (Laplace approximation) and π̃(x_i | θ, y) (1. Gaussian, 2. Laplace, 3. simplified Laplace), let's see how INLA works.
41. The integrated nested Laplace approximation (INLA), step I. Explore π̃(θ | y): locate the mode, use the Hessian to construct new variables, and do a grid search.
42. The integrated nested Laplace approximation (INLA), step II. For each θ_j and each i, evaluate the Laplace approximation for selected values of x_i, and build a skew-normal or log-spline corrected Gaussian, N(x_i; µ_i, σ_i²) × exp(spline), to represent the conditional marginal density.
43. The integrated nested Laplace approximation (INLA), step III. For each i, sum out the θ_j: π̃(x_i | y) ∝ Σ_j π̃(x_i | y, θ_j) × π̃(θ_j | y). Build a log-spline corrected Gaussian, N(x_i; µ_i, σ_i²) × exp(spline), to represent π̃(x_i | y).
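A sketch of step III with made-up ingredients: three conditional Gaussian marginals at integration points θ_1, θ_2, θ_3 and their (normalized) weights from π̃(θ_j | y):

```r
x   <- seq(-4, 6, length.out = 300)
mus <- c(0.0, 0.5, 1.0)   # hypothetical conditional means
sds <- c(1.0, 1.1, 1.3)   # hypothetical conditional sds
w   <- c(0.2, 0.5, 0.3)   # hypothetical weights, summing to one

# The posterior marginal is the weighted mixture over the theta_j.
marg <- rowSums(mapply(function(m, s, wj) wj * dnorm(x, m, s), mus, sds, w))
plot(x, marg, type = "l", ylab = "pi(x_i | y)")
```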
44. Computing posterior marginals for θ_j (I). Main idea: use the integration points to build an interpolant, and use numerical integration on that interpolant.
45. How can we assess the error in the approximations? Tool 1: compare a sequence of improved approximations: 1. Gaussian approximation; 2. simplified Laplace; 3. Laplace.
46. How can we assess the error in the approximations? Tool 2: estimate the "effective" number of parameters as defined in the deviance information criterion, p_D(θ) = D̄(x; θ) − D(x̄; θ) (the posterior mean deviance minus the deviance at the posterior mean), and compare this with the number of observations; a low ratio is good. This criterion has theoretical justification.
47. Part II: The R-INLA package.
48. Outline: INLA implementation; R-INLA model specification; some examples; model evaluation; controlling hyperparameters and priors; some more advanced features; more examples; extras.
49. Implementing INLA. All procedures required to perform INLA need to be carefully implemented to achieve good speed; it is easy to implement a slow version of INLA. Three components: the GMRFLib library, a basic library written in C for fast computations with GMRFs; the inla program, which defines latent Gaussian models and interfaces with GMRFLib (models are defined using .ini files, and the inla program writes all the results (E/Var/marginals) to files); and the INLA package for R, an R interface to the inla program (that's why it's not on CRAN), which converts "formula" statements into ".ini"-file definitions, runs the inla program, and gets the results back into R. Happily, the R package is all we need to learn!
50. The INLA package for R. Workflow: 1. the data frame and formula produce the input files (an .ini file); 2. the inla program runs on this input; 3. the results are collected into an R object of type list, from which you can get summaries, plots, etc.
51. R-INLA. Visit the website www.r-inla.org and follow the instructions; it contains source code, examples, reports, and more. The first time, do

```r
source("http://www.math.ntnu.no/inla/givmeINLA.R")
```

Later you can upgrade the package with inla.upgrade(), or, if you want the test version (which you do),

```r
inla.upgrade(testing=TRUE)
```

Available for Linux, Windows and Mac.
52. Outline: INLA implementation; R-INLA model specification; some examples; model evaluation; controlling hyperparameters and priors; some more advanced features; more examples; extras.
53. The structure of an R program using INLA. There are essentially three parts: 1. the data organization; 2. the formula, with notation inherited from R's native glm function; 3. the call to the INLA program.
54. The inla function. This is all that's needed for a basic call:

```r
result <- inla(
    formula = y ~ 1 + x,         # this describes your latent field
    family  = "gaussian",        # the likelihood distribution
    data    = data.frame(y, x)   # a list or a data frame
)
```
55. The simplest case: Linear regression.

```r
n = 100
x = sort(runif(n))
y = 1 + x + rnorm(n, sd = 0.1)
plot(x, y)

formula = y ~ 1 + x
result = inla(formula, data = data.frame(x, y), family = "gaussian")
summary(result)
plot(result)
```
56. The summary output:

```
Call: inla(formula = formula, family = "gaussian", data = data.frame(x, y))

Time used:
 Pre-processing   Running inla   Post-processing   Total
 0.08050394       0.03020334     0.01916695        0.12987423

Fixed effects:
            mean      sd         0.025quant 0.5quant  0.975quant kld
(Intercept) 0.9690533 0.01849785 0.9327319  0.9690531 1.005387   0
x           1.0426582 0.03126996 0.9812582  1.0426580 1.104079   0

The model has no random effects

Model hyperparameters:
                                        mean   sd    0.025quant 0.5quant 0.975quant
Precision for the Gaussian observations 127.45 18.10 95.14      126.37   166.11

Expected number of effective parameters (std dev): 2.209 (0.02362)
Number of equivalent replicates: 45.27
Marginal Likelihood: 88.01
```
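As a quick sanity check (not on the slides), the posterior means of the fixed effects should sit close to the ordinary least-squares fit of the same simulated data:

```r
# Compare with result$summary.fixed from the inla() call above.
coef(summary(lm(y ~ 1 + x)))
```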
57. Likelihood functions - the family argument. The likelihood is chosen through family, e.g. result = inla(formula, data = data.frame(x,y), family = "gaussian"). Available families include "binomial", "coxph", "exponential", "gaussian", "gev", "laplace", "sn" (skew normal), "stochvol", "stochvol.nig", "stochvol.t", "t" and "weibull"; for many others, go to http://r-inla.org/.
58. A more general model. Assume the following model: y ∼ π(y | η), η = g(λ) = β_0 + β_1 x_1 + β_2 x_2 + f(x_3), where x_1, x_2 are covariates with linear effects, β_i ∼ N(0, τ_1⁻¹), and x_3 can be an index for a spatial effect, a random effect, etc., with {f_1, f_2, …} ∼ N(0, Q_f⁻¹(τ_2)).
59. A more general model - cont. In R this becomes formula = y ~ x1 + x2 + f(x3, ...). The data y = (y_1, …, y_n) are linked through g to the linear predictor η = (η_1, …, η_n), which stacks as η = β_0 (1, …, 1)ᵀ + β_1 (x_11, …, x_1n)ᵀ + β_2 (x_21, …, x_2n)ᵀ + (f_{x_31}, …, f_{x_3n})ᵀ.
60. Model specification - the INLA package. The model is specified in R through a formula, similar to glm: formula = y ~ x1 + x2 + f(x3, ...). Here y is the name of the response variable in your data frame. An intercept is fitted automatically! Use -1 in your formula to avoid it. The fixed effects (β_0, β_1 and β_2) are given i.i.d. normal priors with zero mean and small precision (this can be changed). The f() function contains the random-effect specifications. Some models: iid, iid1d, iid2d, iid3d (random effects); rw1, rw2, ar1 (smooth effect of a covariate or a time effect); seasonal (seasonal effect); besag (spatial effect, CAR model); generic (user-defined precision matrix).
61. Specifying random effects. Random effects are added to the formula through the function f(name, model="...", hyper = ..., replicate = ..., constr = FALSE, cyclic = FALSE): name is the name of the random effect, and also refers to the values in the data used for various things, usually indexes, e.g. for space or time; model is the latent model, e.g. "iid", "rw2", "ar1"; hyper specifies the priors on the hyperparameters; constr asks for a sum-to-zero constraint; cyclic makes the effect cyclic (RW1, RW2 and AR1). There are more advanced options, which we will see later.
62. Outline: INLA implementation; R-INLA model specification; some examples; model evaluation; controlling hyperparameters and priors; some more advanced features; more examples; extras.
63. EPIL example. Seizure counts in a randomized trial of anti-convulsant therapy in epilepsy (from the WinBUGS manual).

| Patient | y1 | y2 | y3 | y4 | Trt | Base | Age |
|---------|----|----|----|----|-----|------|-----|
| 1       | 5  | 3  | 3  | 3  | 0   | 11   | 31  |
| 2       | 3  | 5  | 3  | 3  | 0   | 11   | 30  |
| ...     |    |    |    |    |     |      |     |
| 59      | 1  | 4  | 3  | 2  | 1   | 12   | 37  |
64. EPIL example (cont.). Mixed model with repeated Poisson counts: y_jk ∼ Poisson(µ_jk), j = 1, …, 59, k = 1, …, 4, with log(µ_jk) = α_0 + α_1 log(Base_j/4) + α_2 Trt_j + α_3 Trt_j log(Base_j/4) + α_4 Age_j + α_5 V4 + Ind_j + β_jk. Priors: α_i ∼ N(0, τ_α) with τ_α known, Ind_j ∼ N(0, τ_Ind), β_jk ∼ N(0, τ_β), τ_Ind ∼ Gamma(a_1, b_1), τ_β ∼ Gamma(a_2, b_2).
65. EPIL example (cont.). The Epil data frame has columns y, Trt, Base, Age, V4, rand and Ind (first rows: y = 5, 3; Trt = 0, 0; Base = 11, 11; Age = 31, 31; V4 = 0, 0; rand = 1, 2; Ind = 1, 1). Specifying the model:

```r
formula = y ~ log(Base/4) + Trt + I(Trt * log(Base/4)) + log(Age) + V4 +
    f(Ind, model = "iid") + f(rand, model = "iid")
```

The linear predictor η = (η_1, …, η_{4·59}) stacks as η = β_0 · 1 + … + f^Ind + f^rand, where the Ind column repeats each of the 59 individual effects across the four visits and the rand column indexes the 4·59 observation-level effects.
66. Fitting the EPIL model in R:

```r
data(Epil)
my.center = function(x) (x - mean(x))

Epil$CTrt    = my.center(Epil$Trt)
Epil$ClBase4 = my.center(log(Epil$Base/4))
Epil$CV4     = my.center(Epil$V4)
Epil$ClAge   = my.center(log(Epil$Age))

formula = y ~ ClBase4*CTrt + ClAge + CV4 +
    f(Ind, model="iid") + f(rand, model="iid")
result = inla(formula, family="poisson", data = Epil)
summary(result)
plot(result)
```
67. Epil example from Win/Open-BUGS. [Figures: posterior marginals for α_0 and for τ_β.]
68. EPIL example (cont.). Accessing the results. Summaries (mean, sd, the 0.025, 0.5 and 0.975 quantiles, and kld): result$summary.fixed, result$summary.random$Ind, result$summary.random$rand, result$summary.hyperpar. Posterior marginals (a matrix with x- and y-values): result$marginals.fixed, result$marginals.random$Ind, result$marginals.random$rand, result$marginals.hyperpar.
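A sketch of working with one of these marginals, using R-INLA's marginal utility functions inla.emarginal and inla.qmarginal (assumes result from the EPIL fit above):

```r
m <- result$marginals.fixed[["(Intercept)"]]

inla.emarginal(function(x) x, m)     # posterior mean from the marginal
inla.qmarginal(c(0.025, 0.975), m)   # quantiles
plot(m, type = "l")                  # marginals are two-column (x, y) matrices
```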
69. Smoothing binary time series. [Figure: number of days in Tokyo with rainfall above 1 mm in 1983-84, plotted over time.] We want to estimate the probability of rain, p_t, for calendar day t = 1, …, 366.
70. Smoothing binary time series. Model with a time-series component: y_t ∼ Binomial(n_t, p_t), t = 1, …, 366; p_t = exp(η_t) / (1 + exp(η_t)); η_t = f(t); f = {f_1, …, f_366} ∼ cyclic RW2(τ); τ ∼ Gamma(1, 0.0001).
71. Smoothing binary time series. The Tokyo data frame has columns y, n and time (first rows: y = 0, 0, 1; n = 2, 2, 2; time = 1, 2, 3). Specifying the model:

```r
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1
```

so that η = (η_1, …, η_366) = (f^time_1, …, f^time_366).
72. Fitting the model in R:

```r
data(Tokyo)
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1
result = inla(formula, family="binomial", Ntrials=n, data=Tokyo)
```
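To also get the fitted probabilities p_t, one can ask inla() to compute the marginals of the linear predictor (a sketch using the control.predictor argument):

```r
result = inla(formula, family = "binomial", Ntrials = n, data = Tokyo,
              control.predictor = list(compute = TRUE))
p.hat  = result$summary.fitted.values$mean   # posterior mean of p_t
plot(p.hat, type = "l", ylab = "estimated P(rain)")
```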
73. Posterior for the temporal effect. [Figure: posterior mean and 0.025, 0.5 and 0.975 quantiles of the time effect, t = 1, …, 366.]
74. Posterior for the precision. [Figure: posterior density of the precision for time.]
75. Disease mapping in Germany. Larynx cancer mortality counts are observed in the 544 districts of Germany from 1986 to 1990, together with the level of smoking consumption (100 possible values). [Maps of the data.]
76. Variables: y_i, i = 1, …, 544, the counts of cancer mortality in region i; E_i, a known variable accounting for demographic variation in region i; c_i, the level of smoking consumption registered in region i. [Maps.]
183. 183. The model yi ηi ∼ Poisson{Ei exp(ηi )}; i = 1, . . . , 544 = µ + f (ci ) + fs (si ) + fu (si ) where: f (ci ) is a smooth eﬀect of the covariate f = {f1 , . . . , f100 } ∼ RW2(τf ) fs (si ) is a spatial eﬀect modeled as an intrinsic GMRF fs (s)|fs (s ), s = s , λs ∼ N ( 1 ns fs (s ), s∼s τfs ) ns fu (si ) is a random eﬀect f u = {fu (s1 ), . . . , fu (s544 )} ∼ N(0, τfu I) µ is an intercept term µ ∼ N (0, 0.0001) 77 / 140
188. 188. For identifiability we define a sum-to-zero constraint for all intrinsic models, so Σ_s f_s(s) = 0 and Σ_i f_i = 0
78 / 140
189. 189. The Germany data frame:
   region  E          Y   x
   0       7.965008   8   56
   1       22.836219  22  65
The model is: η_i = µ + f(c_i) + f_s(s_i) + f_u(s_i)
The data set has to contain one separate column for each term specified through f(), so in this case we have to add one column:
> Germany = cbind(Germany, region.struct=Germany$region)
We also need the graph file where the neighborhood structure is specified: germany.graph
79 / 140
192. 192. The new data set is:
   region  E          Y   x   region.struct
   0       7.965008   8   56  0
   1       22.836219  22  65  1
Then the formula is
formula <- Y ~ f(region.struct, model="besag", graph="germany.graph") +
    f(x, model="rw2") + f(region)
The sum-to-zero constraint is the default in the inla function for all intrinsic models.
The location of the graph file has to be provided here (the graph file cannot be loaded in R).
80 / 140
197. 197. The graph file
The germany.graph file starts with the total number of nodes in the graph (544); each following line gives the identifier for a node, its number of neighbors, and the identifiers for the neighbors.
81 / 140
202. 202. data(Germany)
g = system.file("demodata/germany.graph", package="INLA")
source(system.file("demodata/Bym-map.R", package="INLA"))
Germany = cbind(Germany, region.struct=Germany$region)

# standard BYM model
formula1 = Y ~ f(region.struct, model="besag", graph=g) +
    f(region, model="iid")

# with linear covariate
formula2 = Y ~ f(region.struct, model="besag", graph=g) +
    f(region, model="iid") + x

# with smooth covariate
formula3 = Y ~ f(region.struct, model="besag", graph=g) +
    f(region, model="iid") + f(x, model="rw2")
82 / 140
203. 203. result1 = inla(formula1, family="poisson", data=Germany, E=E,
    control.compute=list(dic=TRUE))
result2 = inla(formula2, family="poisson", data=Germany, E=E,
    control.compute=list(dic=TRUE))
result3 = inla(formula3, family="poisson", data=Germany, E=E,
    control.compute=list(dic=TRUE))
83 / 140
204. 204. Other graph specification
It is also possible to define the graph structure of your model using:
- A symmetric (dense or sparse) matrix, where the non-zero pattern of the matrix defines the graph.
- An inla.graph object.
See the FAQ on the webpage for more information.
84 / 140
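A small sketch of the matrix option (the 3-node chain graph here is made up for illustration); the non-zero pattern of a symmetric sparse matrix can be used directly, or converted with inla.read.graph:

library(INLA)
library(Matrix)
# nodes 1-2 and 2-3 are neighbors; only the non-zero pattern matters
Q = sparseMatrix(i = c(1, 2, 2, 3), j = c(2, 1, 3, 2), x = 1)
g = inla.read.graph(Q)
summary(g)
# the matrix (or the inla.graph object) can be passed to f():
# f(region, model="besag", graph = Q)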
205. 205. Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
85 / 140
206. 206. Model evaluation
Deviance Information Criterion (DIC):
result = inla(..., control.compute = list(dic = TRUE))
result$dic$dic
Conditional predictive ordinate (CPO) and probability integral transform (PIT):
CPO_i = π(y_i | y_−i)
PIT_i = Prob(Y_i ≤ y_i^obs | y_−i)
result = inla(..., control.compute = list(cpo = TRUE))
result$cpo$cpo
result$cpo$pit
86 / 140
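The CPO values are often summarized as a cross-validated log-score, and the PIT values inspected for uniformity. A sketch, reusing the Germany fit from earlier (smaller log-score means better predictive fit):

result = inla(formula1, family="poisson", data=Germany, E=E,
              control.compute = list(dic=TRUE, cpo=TRUE))
-mean(log(result$cpo$cpo))         # cross-validated log-score
hist(result$cpo$pit, breaks=20)    # roughly uniform if well calibrated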
207. 207. Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
87 / 140
208. 208. Controlling θ
We often need to set our own priors and use our own parameters in them. These can be set in two ways:
Old style, using prior=.., param=..., initial=..., fixed=...
New style, using hyper = list(prec = list(initial=2, fixed=TRUE, ....))
The old style is there for backward compatibility only. The two styles can also be mixed.
88 / 140
211. 211. Example
- New style
hyper = list(
    prec = list(
        prior = "loggamma",
        param = c(2, 0.1),
        initial = 3,
        fixed = FALSE
    )
)
formula = y ~ f(i, model="iid", hyper = hyper) + ...
- Old style
formula = y ~ f(i, model="iid", prior = "loggamma",
                param = c(2, 0.1), initial = 3, fixed = FALSE) + ...
89 / 140
212. 212. Internal and external scale
Hyperparameters, like the precision τ, are represented internally using a “good” transformation, like θ1 = log(τ).
Initial values are given on the internal scale.
The to.theta and from.theta functions can be used to map between the external and internal scales.
90 / 140
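For example, to fix an iid precision at τ = 100 the initial value must be given on the internal log scale; a minimal sketch:

tau = 100
formula = y ~ f(i, model="iid",
                hyper = list(prec = list(initial = log(tau), fixed = TRUE)))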
215. 215. Example: AR1 model
The ar1 model has two hyperparameters, each with name, short.name, prior, param, initial, fixed, and to.theta/from.theta mappings:
theta1: name = log precision, short.name = prec,
        prior = loggamma, param = (1, 5e-05), initial = 4, fixed = FALSE
theta2: name = logit lag one correlation, short.name = rho,
        prior = normal, param = (0, 0.15), initial = 2, fixed = FALSE
Other properties: constr = FALSE, augmented = FALSE, aug.factor = 1, set.default.values = FALSE, pdf = ar1
91 / 140
216. 216. Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
92 / 140
217. 217. Feature: replicate
“replicate” generates iid replicates from the same model with the same hyperparameters.
If x | θ ∼ AR(1), then nrep=3 makes x = (x1, x2, x3) with mutually independent xi's from AR(1) with the same θ.
Most f()-models can be replicated.
93 / 140
218. 218. Example: replicate
n = 100
x1 = arima.sim(n, model=list(ar=0.9)) + 1
x2 = arima.sim(n, model=list(ar=0.9)) - 1
y1 = rpois(n, exp(x1))
y2 = rpois(n, exp(x2))
y = c(y1, y2)
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family = "poisson",
              data = data.frame(y=y, i=i, r=r))
94 / 140
219. 219. Example: replicate
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
For the stacked data y = (y_{1,1}, . . . , y_{n,1}, y_{1,2}, . . . , y_{n,2}) the linear predictor is
η_{t,r} = f^i_{t,r} + β_{0,1} 1[r=1] + β_{0,2} 1[r=2],   t = 1, . . . , n,   r = 1, 2
i.e. each replicate of the AR(1) field gets its own intercept.
95 / 140
220. 220. Feature: More than one family
Every observation could have its own likelihood!
The response is a matrix or list.
Each “column” defines a separate “family”.
Each “family” has its own hyperparameters.
96 / 140
221. 221. n = 100
phi = 0.9
x1 = 1 + arima.sim(n, model=list(ar=phi))
x2 = 0.5 + arima.sim(n, model=list(ar=phi))
y1 = rbinom(n, size=1, prob=exp(x1)/(1+exp(x1)))
y2 = rpois(n, exp(x2))
y = matrix(NA, 2*n, 2)
y[ 1:n, 1] = y1
y[n+1:n, 2] = y2
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
Ntrials = c(rep(1,n), rep(NA,n))
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family = c("binomial", "poisson"),
              Ntrials = Ntrials, data = data.frame(y, i, r))
97 / 140
222. 222. y = matrix(NA, 2*n, 2)
y[ 1:n, 1] = y1
y[n+1:n, 2] = y2
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
Ntrials = c(rep(1,n), rep(NA,n))
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family = c("binomial", "poisson"),
              Ntrials = Ntrials, data = data.frame(y, i, r))
The response matrix holds y1 in rows 1:n of the first column and y2 in rows n+1:2n of the second column (NA elsewhere), so each row is matched to one family, with linear predictor η_{t,r} = f^i_{t,r} + β_{0,r}.
98 / 140
223. 223. More than one family - More examples
Some rather advanced examples on www.r-inla.org use this feature:
Preferential sampling, geostatistics (marked point process)
Weibull survival data and “longitudinal” data
99 / 140
224. 224. Feature: copy
The model
formula = y ~ f(i, ...) + ...
allows only ONE element from each sub-model to contribute to the linear predictor for each observation.
Sometimes this is not sufficient.
100 / 140
225. 225. Feature: copy
Suppose η_i = u_i + u_{i+1} + . . . Then we can code this as
formula = y ~ f(i, model="iid") + f(i.plus, copy="i")
The copy-feature creates an additional sub-model which is ε-close to the target.
Many copies allowed.
Copy with unknown scaling (default scaling is fixed to 1).
(η_1, . . . , η_n)ᵀ = (u_1, . . . , u_n)ᵀ + (u_2, . . . , u_{n+1})ᵀ
101 / 140
226. 226. Feature: copy
Suppose that η_i = a_i + b_i z_i + . . . where iid (a_i, b_i) ∼ N_2(0, Σ)
- Simulate data
library(mvtnorm)   # for rmvnorm
n = 100
Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2)
z = runif(n)
ab = rmvnorm(n, sigma = Sigma)
a = ab[, 1]
b = ab[, 2]
eta = a + b * z
s = 0.1
y = eta + rnorm(n, sd=s)
102 / 140
227. 227. i = 1:n
j = 1:n + n
formula = y ~ f(i, model="iid2d", n = 2*n) + f(j, z, copy="i") - 1
r = inla(formula, data = data.frame(y, i, j))
(η_1, . . . , η_n)ᵀ = (a_1, . . . , a_n)ᵀ + (b_1 z_1, . . . , b_n z_n)ᵀ
103 / 140
228. 228. Feature: Linear-combinations Possible to extract extra information from the model through linear combinations of the latent ﬁeld, say v = Bx for a k × n matrix B. 104 / 140
229. 229. Feature: Linear-combinations (cont.)
Two different approaches:
1. The most “correct” is to do the computations on the enlarged field x* = (x, v). But this often leads to a denser precision matrix.
2. The second option is to compute these “offline”, as (conditionally on θ)
Var(v_1) = Var(b_1ᵀ x) ≈ b_1ᵀ Q⁻¹_GMRFapprox b_1   and   E(v_1) = b_1ᵀ E(x)
and approximate the density of v_1 with a Normal.
105 / 140
231. 231. formula = y ~ ClBase4*CTrt + ClAge + CV4 +
    f(Ind, model="iid") + f(rand, model="iid")

## Now I want the posterior for
##
## 1) 2*CTrt - CV4
## 2) Ind[2] - rand[2]
##
lc1 = inla.make.lincomb(CTrt = 2, CV4 = -1)
names(lc1) = "lc1"
lc2 = inla.make.lincomb(Ind = c(NA,1), rand = c(NA,-1))
names(lc2) = "lc2"

## default is to derive the marginals from lc's without changing the
## latent field
result1 = inla(formula, family="poisson", data = Epil,
               lincomb = c(lc1, lc2))

## but the lincombs can also be additionally included into the latent
## field for increased accuracy...
result2 = inla(formula, family="poisson", data = Epil,
               lincomb = c(lc1, lc2),
               control.inla = list(lincomb.derived.only = FALSE))
106 / 140
232. 232. - Get the results
result$summary.lincomb.derived     # results of the default method
result$marginals.lincomb.derived
result$summary.lincomb             # alternative method
result$marginals.lincomb
- Posterior correlation matrix between all the linear combinations:
control.inla = list(lincomb.derived.correlation.matrix = TRUE)
result$misc$lincomb.derived.correlation.matrix
- Many linear combinations at once: use inla.make.lincombs()
107 / 140
233. 233. A-matrix in the linear predictor (I) Usual formula η = ... and yi ∼ π(yi | ηi , ...) 108 / 140
234. 234. A-matrix in the linear predictor (II)
Extended formula
η = ...
η* = Aη
and y_i ∼ π(y_i | η*_i, ...)
Implemented as
A = matrix(...)
A = sparseMatrix(...)
result = inla(formula, ..., control.predictor = list(A = A))
109 / 140
236. 236. A-matrix in the linear predictor (III)
Can really simplify model formulations.
Duplicates to some extent the “copy” feature.
Really useful for some models; the A-matrix need not be a square matrix...
110 / 140
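A tiny sketch of the mechanics (the dimensions, weights, and data below are made up for illustration): with n = 3 latent predictors and m = 2 observations, each observation sees a weighted sum of the predictors, so A is 2 × 3:

library(INLA)
library(Matrix)
n = 3; m = 2
# eta* = A %*% eta, one row per observation
A = sparseMatrix(i = c(1, 1, 2, 2), j = c(1, 2, 2, 3),
                 x = c(1, 0.5, 0.5, 1), dims = c(m, n))
y = c(1.2, 0.7)    # toy data, one observation per row of A
idx = 1:n
formula = y ~ -1 + f(idx, model="iid")
result = inla(formula, data = list(y=y, idx=idx),
              control.predictor = list(A = A))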
237. 237. Feature: remote computing
For large/huge models, it is more convenient to run the computations on a remote (Linux/Mac) computational server
inla(...., inla.call="remote")
using ssh (and Cygwin on Windows).
111 / 140
238. 238. Control statements
The control.xxx statements control various parts of the INLA program:
control.predictor: A, the “A matrix” or “observation matrix” linking the latent field to the data.
control.mode: x, theta, result, which give modes to INLA; restart = TRUE tells INLA to try to improve on the supplied mode.
control.compute: dic, mlik, cpo, which compute measures of fit.
control.inla: strategy and int.strategy contain useful advanced features.
Various others: see the help!
112 / 140
239. 239. Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
113 / 140
240. 240. Space-varying regression
Number of (insurance-type) losses N_kt in 431 municipalities/regions of Norway in relation to one weather covariate W_kt. The likelihood is
N_kt ∼ Poisson(A_kt p_kt);   k = 1, . . . , 431,   t = 1, . . . , 10
The model for log p_kt is
log p_kt = β_0 + β_k W_kt
where β_k is the regression coefficient for each municipality.
114 / 140
241. 241. Borrow strength..
Few losses in each region; high variability in the estimates.
Borrow strength by letting {β_1, . . . , β_431} be smooth in space:
{β_1, . . . , β_431} ∼ CAR(τ_β)
115 / 140
243. 243. The data set:
      y   region  W
1     0   1       0.4
2     0   1       0.4
...
10    0   1       0.4
11    1   2       0.2
12    0   2       0.2
...
20    0   2       0.2
116 / 140
244. 244. The second argument in f() is the weight, which defaults to 1:
η_i = ... + w_i f_i + ...   is represented as   f(i, w, ...)
No need for a sum-to-zero constraint!
norway = read.table("norway.dat", header=TRUE)
formula = y ~ 1 + f(region, W, model="besag",
    graph.file="norway.graph", constr=FALSE)
result = inla(formula, family="poisson", data=norway)
117 / 140
245. 245. Survival models
patient  time    event  age     sex
1        8, 16   1, 1   28, 28  0
2        23, 13  1, 0   48, 48  1
3        22, 18  1, 1   32, 32  0
Times of infection from the time of insertion of the catheter on 38 kidney patients using portable dialysis equipment.
Two observations for each patient (38 patients).
Each time can be an event (infection) or a censoring (no infection).
118 / 140
246. 246. The Kidney data
The Kidney data frame:
time  event  age  sex  ID
8     1      28   0    1
16    1      28   0    1
23    1      48   1    2
13    0      48   1    2
22    1      32   0    3
28    1      32   0    3
119 / 140
247. 247. data(Kidney)
formula = inla.surv(time, event) ~ age + sex + f(ID, model="iid")
result1 = inla(formula, family="coxph", data=Kidney)
result2 = inla(formula, family="weibull", data=Kidney)
result3 = inla(formula, family="exponential", data=Kidney)
120 / 140
248. 248. Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
121 / 140
249. 249. A toy-example using copy
State-space model
y_t = x_t + v_t
x_t = 2x_{t−1} − x_{t−2} + w_t
Rewrite this as
y_t = x_t + v_t
0 = x_t − 2x_{t−1} + x_{t−2} + w_t
and implement this as two families:
1. Observations y_t with precision Prec(v_t)
2. Observations 0 with precision Prec(w_t), or Prec=HIGH.
122 / 140
251. 251. n = 100
m = n-2
y = sin((1:n)*0.2) + rnorm(n, sd=0.1)
formula = Y ~ f(i, model="iid", initial=-10, fixed=TRUE) +
    f(j, w, copy="i") + f(k, copy="i") + f(l, model="iid") - 1
Y = matrix(NA, n+m, 2)
Y[1:n, 1] = y
Y[1:m + n, 2] = 0
i = c(1:n, 3:n)             # x_t
j = c(rep(NA,n), 3:n - 1)   # x_t-1
w = c(rep(NA,n), rep(-2,m)) # weights for j
k = c(rep(NA,n), 3:n - 2)   # x_t-2
l = c(rep(NA,n), 1:m)       # w_t (the family-2 noise is fixed HIGH)
r = inla(formula, data = data.frame(i,j,w,k,l,Y),
         family = c("gaussian", "gaussian"),
         control.data = list(list(), list(initial=10, fixed=TRUE)))
123 / 140
252. 252. Stochastic Volatility model
[Figure: log of the daily difference of the pound-dollar exchange rate from October 1st, 1981, to June 28th, 1985]
124 / 140
253. 253. Stochastic Volatility model
Simple model
x_t | x_1, . . . , x_{t−1}, τ, φ ∼ N(φ x_{t−1}, 1/τ)
where |φ| < 1 to ensure a stationary process. Observations are taken to be
y_t | x_1, . . . , x_t, µ ∼ N(0, exp(µ + x_t))
125 / 140
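A sketch of fitting this in R-INLA: the latent AR(1) field gives x_t, the intercept gives µ, and the stochvol likelihood takes y_t | η_t ∼ N(0, exp(η_t)) (assuming the returns are in a vector y):

n = length(y)
time = 1:n
formula = y ~ 1 + f(time, model="ar1")
result = inla(formula, family="stochvol",
              data = data.frame(y=y, time=time))
summary(result)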
254. 254. Results
Using just the first 50 data points, which makes the problem much harder.
126 / 140
255. 255. Results [Figure: posterior marginal for ν = logit(2φ − 1)] 126 / 140
256. 256. Results [Figure: posterior marginal for log(κ_x)] 126 / 140
257. 257. Using the full dataset [Figure: the Pound-Dollar data] 127 / 140
258. 258. Using the full dataset [Figure: mean of x_t + µ] 128 / 140
259. 259. Using the full dataset [Figure: the posterior marginal for the precision] 129 / 140
260. 260. Using the full dataset [Figure: the posterior marginal for the lag-1 correlation] 130 / 140
261. 261. Using the full dataset [Figure: predictions for µ + x_{t+k}] 131 / 140
262. 262. New data-model: Student-t_ν
Now extend the model to use the Student-t_ν distribution:
y_t | x_1, . . . , x_t ∼ exp((µ + x_t)/2) × Student-t_ν / √(ν/(ν − 2))
132 / 140
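In R-INLA this amounts to swapping the likelihood; a sketch, assuming the Student-t stochastic volatility family is available under the name "stochvol.t" (check names(inla.models()$likelihood) for the exact name in your version):

result.t = inla(formula, family="stochvol.t",
                data = data.frame(y=y, time=time))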
263. 263. Student-t_ν [Figure: posterior marginal for ν] 133 / 140
264. 264. Student-t_ν [Figure: predictions] 134 / 140
265. 265. Student-t_ν [Figure: comparing predictions with Student-t_ν and Gaussian] 135 / 140
266. 266. Student-t_ν
However, there is no support for the Student-t_ν in the data, judged by:
Bayes factor
Deviance Information Criterion
136 / 140
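These comparisons can be read off the fitted objects; a sketch, assuming result (Gaussian) and result.t (Student-t) are fits with control.compute=list(dic=TRUE, mlik=TRUE):

# log Bayes factor (Gaussian vs Student-t) = difference in marginal log-likelihoods
result$mlik - result.t$mlik
# Deviance Information Criterion (smaller is better)
c(gaussian = result$dic$dic, student.t = result.t$dic$dic)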
267. 267. Disease mapping: The BYM-model
Data y_i ∼ Poisson(E_i exp(η_i))
Log-relative risk η_i = u_i + v_i
Structured component u; unstructured component v
Log-precisions log κ_u and log κ_v
[Maps: structured and unstructured components]
A hard case: Insulin Dependent Diabetes Mellitus in 366 districts of Sardinia. Few counts. dim(θ) = 2.
137 / 140
268. 268. Marginals for θ|y 138 / 140
270. 270. Marginals for xi |y 139 / 140
271. 271. THANK YOU 140 / 140