Bayesian computation using INLA
Thiago G. Martins
Norwegian University of Science and Technology
Trondheim, Norway
AS 2013, Ribno, Slovenia, September 2013

Part I
Latent Gaussian models and INLA methodology
Outline

Latent Gaussian models
Are latent Gaussian models important?
Bayesian computing
INLA method

Hierarchical Bayesian models

Hierarchical models are an extremely useful tool in Bayesian model building. Three parts:

Observations (y): encode information about the observed data, including design and collection issues.
The latent process (x): the unobserved process. It may be the focus of the study, or may be included to reduce autocorrelation, e.g. to encode spatial and/or temporal dependence.
The parameter model (θ): models for all of the parameters in the observation and latent processes.
Latent Gaussian models

A latent Gaussian model is a Bayesian hierarchical model of the following form:

Observed data y, with yi | xi ∼ π(yi | xi, θ)
Latent Gaussian field x ∼ N(·, Σ(θ))
Hyperparameters θ ∼ π(θ), controlling variability, length/strength of dependence, and parameters in the likelihood.

π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i∈I} π(yi | xi, θ)
Precision matrix

The precision matrix of the latent field

Q(θ) = Σ(θ)⁻¹

plays a key role! Two issues:

Building models through conditioning ("hierarchical models")
Computational benefits
Building models through conditioning

If

x ∼ N(0, Qx⁻¹)
y | x ∼ N(x, Qy⁻¹)

then

Q(x,y) = [ Qx + Qy   −Qy ]
         [ −Qy        Qy ]

Not so nice expressions using the covariance matrix.
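To make this concrete, here is a small R sketch (not from the slides) that checks the 2x2 joint precision above against the implied covariance in the scalar case; the values of Qx and Qy are arbitrary:

## Scalar case: x ~ N(0, 1/Qx), y | x ~ N(x, 1/Qy).
Qx <- 2; Qy <- 3
Q.joint <- matrix(c(Qx + Qy, -Qy,
                    -Qy,      Qy), 2, 2, byrow = TRUE)
## Implied covariance: Var(x) = 1/Qx, Cov(x, y) = 1/Qx, Var(y) = 1/Qx + 1/Qy.
Sigma <- matrix(c(1/Qx, 1/Qx,
                  1/Qx, 1/Qx + 1/Qy), 2, 2, byrow = TRUE)
max(abs(solve(Sigma) - Q.joint))   # numerically zero

Note how Q.joint is built directly from the two conditional precisions, while the covariance entries all mix Qx and Qy.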
Computational benefits

Precision matrices encode conditional independence:

xi ⊥ xj | x−ij  ⟺  Qij = 0

We are interested in models with sparse precision matrices: x ∼ N(·, Σ(θ)) with sparse Q(θ) = Σ(θ)⁻¹. Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs), and they have good computational properties through numerical algorithms for sparse matrices.
Numerical algorithms for sparse matrices: scaling properties

Temporal models: O(n)
Spatial models: O(n^{3/2})
Spatio-temporal models: O(n²)

This is to be compared with general O(n³) algorithms for dense matrices.
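A minimal sketch of the sparse-vs-dense gap (assuming the Matrix package; the AR(1)-type precision matrix is an illustration, not from the slides):

library(Matrix)
n <- 2000
## Tridiagonal precision of an AR(1) with phi = 0.9 (diagonally dominant).
Q <- bandSparse(n, k = c(0, 1),
                diagonals = list(rep(1.81, n), rep(-0.9, n - 1)),
                symmetric = TRUE)
system.time(Cholesky(Q))          # sparse factorization: essentially O(n) here
system.time(chol(as.matrix(Q)))   # dense factorization: O(n^3), much slower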
Are latent Gaussian models important?
Example (I): Mixed-effect model

yij | ηij, θ1 ∼ π(yij | ηij, θ1),  i = 1, …, N,  j = 1, …, M
ηij = µ + cij β + ui + vj + wij

where u, v and w are "random effects". If we assign Gaussian priors on µ, β, u and v, then x | θ2 = (µ, β, u, v, η) | θ2 is jointly Gaussian. θ = (θ1, θ2).
Example (I) - cont.

We can reinterpret the model as

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(yi | ηi, θ)

dim(x) could be large (10²-10⁵), while dim(θ) is small (1-5).
Example (I) - cont.

[Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β), N = 100, M = 5.]
Example (II): Time-series model

Smoothing of a binary time series:

The data is a sequence of 0s and 1s.
The probability of a 1 at time t, pt, depends on time:

pt = exp(ηt) / (1 + exp(ηt))

Linear predictor: ηt = µ + βct + ut + vt,  t = 1, …, n
Example (II) - cont.

Prior models:

µ and β are Normal.
u is an AR-model, like ut = φ u_{t−1} + ε_t, with parameters (φ, σ²).
v is an unstructured term or a "random effect".

This gives that x | θ = (µ, β, u, v, η) is jointly Gaussian, with hyperparameters θ = (φ, σ², σv²).
Example (II) - cont.

Again the model can be reinterpreted in the generic form: θ ∼ π(θ), x | θ ∼ N(0, Q⁻¹(θ)), y | x, θ ∼ ∏_i π(yi | ηi, θ), with dim(x) large (10²-10⁵) and dim(θ) small (1-5).
Example (II) - cont.

[Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β), n = 100.]
Example (III): Disease mapping

Data: yi ∼ Poisson(Ei exp(ηi))
Log-relative risk: ηi = µ + ui + vi + f(ci)

Structured component u, unstructured component v, and a smooth effect f of a covariate c.

[Maps of the structured component, the unstructured component, and the smooth covariate effect.]
Yet Another Example (III)

Once more, the model fits the generic form: θ ∼ π(θ), x | θ ∼ N(0, Q⁻¹(θ)), y | x, θ ∼ ∏_i π(yi | ηi, θ), with dim(x) large (10²-10⁵) and dim(θ) small (1-5).
Example (III) - cont.

[Figure: sparsity pattern of the precision matrix of (η, u, v, µ, f).]
What we have learned so far

The latent Gaussian model construct

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(yi | ηi, θ)

occurs in many, seemingly unrelated, statistical models: GLM/GAM/GLMM/GAMM/++.
Further Examples

Dynamic linear models
Stochastic volatility
Generalized linear (mixed) models
Generalized additive (mixed) models
Spline smoothing
Semi-parametric regression
Space-varying (semi-parametric) regression models
Disease mapping
Log-Gaussian Cox-processes
Model-based geostatistics (*)
Spatio-temporal models
Survival analysis
+++
Bayesian computing

We are interested in posterior marginal quantities like π(xi | y) and π(θj | y). This requires the evaluation of integrals of the form

π(xi | y) ∝ ∫∫ π(y | x, θ) π(x | θ) π(θ) dθ dx_{−i}

The computation of massively high-dimensional integrals is at the core of Bayesian computing.
But surely we can already do this

Markov chain Monte Carlo (MCMC) is widely used by the applied community. There are generic tools available for MCMC (OpenBUGS, JAGS, STAN) and others for specific models, like BayesX. Still, the issue of Bayesian computing is not "solved" even though MCMC is available:

Hierarchical models are more difficult for MCMC: strong dependencies, bad mixing.
A main obstacle for Bayesian modeling is still the issue of "Bayesian computing".
So what's wrong with MCMC?

This is actually a problem with any Monte-Carlo scheme.

Error in expectations: the Monte-Carlo error

E(f(X)) − (1/N) ∑_{i=1}^{N} f(xi)

has standard deviation O(1/√N). In practical terms, to reduce the error to O(10⁻ᵖ) you need O(10²ᵖ) samples, and this can be optimistic!
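A quick R experiment (a toy illustration with arbitrary sample sizes, not from the slides) shows this O(1/√N) behaviour:

set.seed(1)
for (N in c(1e2, 1e4, 1e6)) {
  est <- replicate(50, mean(rnorm(N)))   # 50 Monte-Carlo estimates of E(X) = 0
  cat("N =", N, " sd of estimate =", signif(sd(est), 2),
      " 1/sqrt(N) =", signif(1/sqrt(N), 2), "\n")
}

Each extra digit of accuracy costs a factor of 100 in samples.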
Be more narrow

MCMC: MCMC "works" for everything, but it is not usually optimal when we focus on a specific class of models. It works for latent Gaussian models, but it is too slow; (unfortunately) sometimes it is the only thing we can do.

INLA: Integrated Nested Laplace Approximations. A deterministic algorithm, rather than a stochastic one like MCMC. Specially designed for latent Gaussian models. Accurate results in a small fraction of the computational time of MCMC.
Comparing results with MCMC

When comparing the results of R-INLA with MCMC, it is important to use the same model. Here we have compared the EPIL example results with those obtained using JAGS via the rjags package.
[Figures: posterior marginals for the EPIL example (intercept a0, alpha.Age, log(tau.Ind), log(tau.Rand), log(tau.b1), log(tau.b)). Each panel overlays the R-INLA result with JAGS runs of 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64 and 120 minutes.]
INLA method
Main aim

Posterior:

π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i∈I} π(yi | xi, θ)

Compute the posterior marginals:

π(xi | y) = ∫ π(θ | y) π(xi | θ, y) dθ
π(θj | y) = ∫ π(θ | y) dθ_{−j}
Tasks

1. Build an approximation π̃(θ | y) to π(θ | y).
2. Build an approximation π̃(xi | θ, y) to π(xi | θ, y):

π̃(xi | y) = ∫ π̃(θ | y) π̃(xi | θ, y) dθ
π̃(θj | y) = ∫ π̃(θ | y) dθ_{−j}

3. Do the integration with respect to θ numerically.
Task 1: π(θ | y)

The Laplace approximation for π(θ | y) is

π(θ | y) = π(x, θ | y) / π(x | θ, y)
         ∝ π(θ) π(x | θ) π(y | x) / π(x | θ, y)
         ≈ π(θ) π(x | θ) π(y | x) / π̃G(x | θ, y) |_{x = x*(θ)}

where π̃G(x | θ, y) is the GMRF-approximation below, evaluated at its mode x*(θ).
The GMRF-approximation

π(x | y, θ) ∝ exp( −(1/2) xᵀQx + ∑_i log π(yi | xi) )
           ≈ exp( −(1/2) (x − µ)ᵀ (Q + diag(ci)) (x − µ) )
           = π̃G(x | y, θ)

where µ and the ci come from a second-order Taylor expansion of ∑_i log π(yi | xi) around the mode.
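The following toy sketch (an illustration under stated assumptions, not the INLA internals) computes such a Gaussian approximation by Newton-Raphson for a Poisson likelihood with log link and an identity prior precision:

set.seed(1)
n <- 50
Q <- diag(n)                        # toy prior precision (sparse in real models)
y <- rpois(n, lambda = 1)
x <- rep(0, n)
for (iter in 1:25) {
  ci <- exp(x)                      # -(d^2/dx^2) log-likelihood, term by term
  b  <- ci * x + y - exp(x)         # linearized gradient contribution
  x.new <- solve(Q + diag(ci), b)   # Newton step = mean of the approximation
  if (max(abs(x.new - x)) < 1e-8) break
  x <- x.new
}
## pi_G(x | y, theta) = N(mean = x, precision = Q + diag(ci))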
Remarks

The Laplace approximation π̃(θ | y) turns out to be accurate: x | y, θ appears almost Gaussian in most cases, as x is a priori Gaussian.
Task 2: π(xi | y, θ)

This task is more challenging, since the dimension n of x is large, and there are potentially n marginals to compute.
π(xi | y, θ) - 1. Gaussian approximation

An obvious, simple and fast alternative is to use the marginals of the GMRF-approximation π̃G(x | y, θ).
π(xi | y, θ) - 2. Laplace approximation

The Laplace approximation:

π̃(xi | y, θ) ∝ π(x, θ | y) / π̃GG(x_{−i} | xi, y, θ) |_{x_{−i} = x*_{−i}(xi, θ)}

where π̃GG is a Gaussian approximation to x_{−i} | xi, y, θ, evaluated at its mode x*_{−i}(xi, θ).
π(xi | y, θ) - 3. Simplified Laplace approximation

Taylor expansions of the Laplace approximation for π(xi | θ, y):

computationally much faster;
corrects the Gaussian approximation for location and skewness.
Task 3: Numerical integration with respect to θ

Now that we know how to compute

π̃(θ | y): the Laplace approximation, and
π̃(xi | θ, y): 1. Gaussian, 2. Laplace, or 3. simplified Laplace approximation,

it remains to do the integration over θ numerically.
The integrated nested Laplace approximation (INLA) I

Step I: Explore π̃(θ | y).

Locate the mode.
Use the Hessian to construct new variables.
Grid-search in the new variables to find the integration points {θj}.
The integrated nested Laplace approximation (INLA) II

Step II: For each θj and each i, evaluate the Laplace approximation π̃(xi | y, θj) for selected values of xi.
The integrated nested Laplace approximation (INLA) III

Step III: Sum out the θj. For each i, sum out θ:

π̃(xi | y) ∝ ∑_j π̃(xi | y, θj) π̃(θj | y) Δj

where Δj is the integration weight of point θj.
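Numerically, Step III is just a finite weighted mixture. A toy illustration with made-up densities and weights (all numbers hypothetical):

xx   <- seq(-4, 4, length.out = 201)
## Conditional marginals pi(x_i | y, theta_j) at three integration points:
cond <- cbind(dnorm(xx, -0.5, 1.0), dnorm(xx, 0, 1.2), dnorm(xx, 0.5, 1.0))
w    <- c(0.25, 0.5, 0.25)          # ~ pi(theta_j | y) * Delta_j, normalized
marg <- as.vector(cond %*% w)       # approximate pi(x_i | y) on the grid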
Computing posterior marginals for θj (I)

Main idea: use the integration points to build an interpolant to π̃(θ | y), and use numerical integration of the interpolant to obtain π̃(θj | y).
How can we assess the error in the approximations?

Tool 1: Compare a sequence of improved approximations: 1. Gaussian approximation, 2. simplified Laplace, 3. Laplace.
Tool 2: Estimate the "effective" number of parameters as defined in the Deviance Information Criterion (DIC) and compare it with the number of observations.
Part II
R-INLA package
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
…
Implementing INLA

All the procedures required to perform INLA need to be carefully implemented to achieve good speed; easy to …
The INLA package for R

[Diagram: the R interface turns a data frame and a formula into input files and an ini file, runs the INLA program, and returns the results as an inla object.]
R-INLA

Visit the web site www.r-inla.org and follow the instructions. The site contains source code, examples, reports +++
R-INLA - Model specification
The structure of an R program using INLA

There are essentially three parts to an INLA program:
1. The data organization.
2. The model specification through a formula.
3. The call to the inla() function.
The inla function

This is all that is needed for a basic call:

result <- inla(formula = y ~ 1 + x,       # describes your model
               family = "gaussian",       # the likelihood
               data = data.frame(x, y))   # the data
The simplest case: Linear regression

library(INLA)
n = 100
x = sort(runif(n))
y = 1 + x + rnorm(n, sd = 0.1)
plot(x, y)
formula = y ~ 1 + x
result = inla(formula, family = "gaussian", data = data.frame(x, y))
summary(result)

The summary starts with the call and the time used:

Call:
inla(formula = formula, family = "gaussian", data = data.frame(x, y))
Time used:
Pre-processing 0.0805 …
Likelihood functions - family argument

result = inla(formula,
              data = data.frame(x, y),
              family = "gaussian")

Other choices include "binomial", "coxph", "poisson", …
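The full list of implemented likelihoods can be inspected from R; a hedged one-liner (assuming the inla.models() accessor):

names(inla.models()$likelihood)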
A more general model

Assume the following model:

y ∼ π(y | η)
η = g(λ) = β0 + β1 x1 + β2 x2 + f(x3)

where x1 and x2 are covariates with linear effects βi, and x3 is a covariate whose effect f(·) is smooth.
A more general model (cont.)

In R-INLA this model is written as, e.g.,

> formula = y ~ x1 + x2 + f(x3, model = "rw2")
Model specification - INLA package

The model is specified in R through a formula, similar to glm:

> formula = y ~ x1 + x2 + f(x3, …)
Specifying random effects

Random effects are added to the formula through the function

f(name, model = "...", hyper = ..., replicate = ..., ...)

where name is the index variable in the data frame and model selects the latent model; see the examples below.
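Some f() terms used elsewhere in these slides, collected here for reference (variable names as they appear in the respective examples):

formula1 = y ~ f(Ind,  model = "iid")                  # exchangeable random effect
formula2 = y ~ f(time, model = "rw2", cyclic = TRUE)   # smooth cyclic trend
formula3 = y ~ f(i,    model = "ar1", replicate = r)   # replicated AR(1)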
Some examples
EPIL example

Seizure counts in a randomized trial of anti-convulsant therapy in epilepsy (from the WinBUGS manual).

[Table: seizure counts per patient over four visits, with treatment, baseline and age covariates.]
EPIL example (cont.)

Mixed model with repeated Poisson counts:

yjk ∼ Poisson(µjk);  j = 1, …, 59;  k = 1, …, 4
log(µjk) = linear effects of treatment, baseline, age and visit + Ind_j + rand_jk

with iid Gaussian random effects Ind_j (per patient) and rand_jk (per observation).
EPIL example (cont.)

The Epil data frame:

y  Trt  Base  Age  V4  rand  Ind
5  0    11    31   0   1     1
3  0    11    31   0   2     1
...

Specifying the model:
data(Epil)
my.center = function(x) (x - mean(x))
Epil$CTrt    = my.center(Epil$Trt)
Epil$ClBase4 = my.center(log(Epil$Base/4))
Epil$CV4     = my.center(Epil$V4)
Epil$ClAge   = my.center(log(Epil$Age))
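A hedged sketch of the corresponding fit, using the formula that appears later in these slides:

formula = y ~ ClBase4*CTrt + ClAge + CV4 +
          f(Ind, model="iid") + f(rand, model="iid")
result = inla(formula, family = "poisson", data = Epil)
summary(result)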
[Figures: Epil example, posterior marginals for α0 and τβ compared with Win/Open-BUGS.]
EPIL example (cont.)

Access results.

Summaries (mean, sd, [0.025, 0.5, 0.975]-quantiles, kld):

result$summary.fixed
result$summary.random
result$summary.hyperpar

Posterior marginals:

result$marginals.fixed
result$marginals.random
result$marginals.hyperpar
Smoothing binary time series

[Figure: number of days in Tokyo with rainfall above a threshold, for each calendar day over two years.]
Smoothing binary time series

Model with a time-series component:

yt ∼ Binomial(nt, pt);  t = 1, …, n
pt = exp(ηt) / (1 + exp(ηt))
ηt = f(t)
f = a smooth, cyclic time effect (here a cyclic RW2)
τ ∼ prior on the precision of f
Smoothing binary time series

The Tokyo data frame:

y  n  time
0  2  1
0  2  2
1  2  3
...

Specifying the model:
data(Tokyo)
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1
result = inla(formula, family="binomial", Ntrials=n, data=Tokyo)
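The fitted effect can then be extracted and plotted; a minimal sketch (the column names of summary.random output are an assumption about the installed R-INLA version):

f.hat = result$summary.random$time
plot(f.hat$ID, f.hat$mean, type = "l", xlab = "time", ylab = "f(time)")
lines(f.hat$ID, f.hat$"0.025quant", lty = 2)
lines(f.hat$ID, f.hat$"0.975quant", lty = 2)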
Posterior for temporal effect

[Figure: posterior mean and 0.025, 0.5, 0.975 quantiles of the temporal effect over time.]
Posterior for precision

[Figure: posterior density of the precision for time.]
Disease mapping in Germany

Larynx cancer mortality counts are observed in the 544 districts of Germany from 1986 to 1990, and …

yi, i = 1, …, 544: counts of cancer mortality in region i.
Ei, i = 1, …, 544: known variables accounting for demographic …
The model

yi ∼ Poisson{Ei exp(ηi)};  i = 1, …, 544
ηi = µ + f(ci) + fs(si) + fu(si)

where:
f(ci) is a smooth effect of the covariate ci,
fs(si) is a spatially structured effect for district si,
fu(si) is an unstructured (iid) effect.
For identifiability we impose a sum-to-zero constraint on all intrinsic models, so

∑_s fs(s) = 0  and  ∑_i fi = 0
The Germany data frame:

region  E          Y   x
0       7.965008   8   56
1       22.836219  22  65
...

The model is

ηi = µ + f(ci) + fs(si) + fu(si)
The same index cannot be used in two f() terms, so a copy of the region column (region.struct) is added. The new data set is:

region  E          Y   x   region.struct
0       7.965008   8   56  0
1       22.836219  22  65  1
...

Then the formula is

formula1 <- Y ~ f(region.struct, model="besag", graph=g) +
                f(region, model="iid") + f(x, model="rw2")
The graph file

The germany.graph file (first entry: the total number of nodes in the graph; then, for each node, its index, its number of neighbours, and the list of its neighbours):

544
1 …
2 …
3 …
…
data(Germany)
g = system.file("demodata/germany.graph", package="INLA")
source(system.file("demodata/Bym-map.R", package="INLA"))

result1 = inla(formula1, family="poisson", data=Germany, E=E,
               control.compute=list(dic=TRUE))
result2 = inla(formula2, family="poisson", data=Germany, E=E,
               control.compute=list(dic=TRUE))
Other graph specifications

It is also possible to define the graph structure of your model using:

a symmetric (dense or sparse) adjacency matrix, …
Model evaluation

Deviance Information Criterion (DIC):

result = inla(..., control.compute = list(dic = TRUE))
result$dic$dic
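For example, the two Germany fits above both requested the DIC and can be compared directly:

c(model1 = result1$dic$dic, model2 = result2$dic$dic)   # smaller is better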
Controlling hyperparameters and priors
Controlling θ

We often need to set our own priors, using our own parameters in them. These can be set in two ways:

Old style: …
New style: through the hyper argument, as in the example below.
Example - New style

hyper = list(
    prec = list(
        prior   = "loggamma",
        param   = c(2, 0.1),
        initial = 3,
        fixed   = FALSE
    )
)
formula = y ~ f(x, model = "...", hyper = hyper)
Internal and external scale

Hyperparameters, like the precision τ, are represented internally using a "good" transformation, e.g. θ = log(τ). The prior, the initial value and the fixed flag all refer to the internal scale.
Example: AR1 model

hyper
  theta1:
    name:       log precision
    short.name: prec
    prior:      loggamma
    param:      …
    initial:    …
    fixed:      …
    to.theta:   …
    from.theta: …
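The default hyperparameter specification of a latent model can be inspected from R; a hedged one-liner (assuming the inla.models() accessor):

str(inla.models()$latent$ar1$hyper)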
Feature: replicate

"replicate" generates iid replicates from the same model with the same hyperparameters. If x | θ ∼ AR(1), then three replicates make x = (x1, x2, x3), with mutually independent xi's drawn from the AR(1) model with the same θ.
Example: replicate

n = 100
x1 = arima.sim(n, model=list(ar=0.9)) + 1
x2 = arima.sim(n, model=list(ar=0.9)) - 1
y1 = rpois(n, lambda = exp(x1))
y2 = rpois(n, lambda = exp(x2))
y  = c(y1, y2)
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family="poisson",
              data = data.frame(y, i, r, intercept))
Feature: More than one family

Every observation could have its own likelihood! The response is then a matrix or a list; each "column" corresponds to one likelihood family.
n = 100
phi = 0.9
x1 = 1   + arima.sim(n, model=list(ar=phi))
x2 = 0.5 + arima.sim(n, model=list(ar=phi))
y1 = rbinom(n, size=1, prob = exp(x1)/(1+exp(x1)))
y2 = rpois(n, lambda = exp(x2))

y = matrix(NA, 2*n, 2)
y[  1:n, 1] = y1
y[n+1:n, 2] = y2
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
Ntrials = c(rep(1, n), rep(NA, n))
More than one family - More examples

There are some rather advanced examples on www.r-inla.org using this feature, e.g. preferential sampling, …
Feature: copy

In the model

formula = y ~ f(i, ...) + ...

only ONE element from each sub-model is allowed to contribute to the linear predictor of each observation. The copy feature removes this restriction.
Feature: copy

Suppose

ηi = ui + u_{i+1} + …

Then we can code this as

formula = y ~ f(i, model="iid") + f(i.plus, copy="i")
Feature: copy

Suppose that

ηi = ai + bi zi + …

where (ai, bi) ∼ iid N2(0, Σ).

Simulate data:

n = 100
Sigma = matrix(…, 2, 2)
…

i = 1:n
j = 1:n + n
formula = y ~ f(i, model="iid2d", n = 2*n) + f(j, z, copy="i") - 1
r = inla(formula, data = data.frame(y, z, i, j))
Feature: Linear-combinations

It is possible to extract extra information from the model through linear combinations of the latent field.
Feature: Linear-combinations (cont.)

Two different approaches:
1. The most "correct" one is to do the computations on the enlarged model, where the linear combinations are added to the latent field.
2. The default one derives the marginals of the linear combinations from the fitted model afterwards (the "derived" results below).
formula = y ~ ClBase4*CTrt + ClAge + CV4 +
          f(Ind, model="iid") + f(rand, model="iid")
## Now I want the posterior for
## …

Get the results:

result$summary.lincomb.derived
result$marginals.lincomb.derived  # results of the default method
result$summary.lincomb
result$marginals.lincomb          # results on the enlarged model
A-matrix in the linear predictor (I)

Usual formulation:

η = …

and

yi ∼ π(yi | ηi, …)
A-matrix in the linear predictor (II)

Extended formulation:

η = …
η* = Aη

and

yi ∼ π(yi | ηi*, …)

Implemented as

A = matrix(…, m, n)
result = inla(formula, data = ...,
              control.predictor = list(A = A))
A-matrix in the linear predictor (III)

This can really simplify model formulations, and it duplicates to some extent the "copy" feature.
Feature: remote computing

For large/huge models, it is more convenient to run the computations on a remote (Linux/Mac) computer.
Control statements

The control.xxx statements control various parts of the INLA program:

control.predictor: A (the "A matrix"), …
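A hedged example combining control statements already used in these slides (control.predictor = list(compute = TRUE) requests marginals for the linear predictor; an assumption beyond what the slides show):

result = inla(formula, family = "poisson", data = Germany, E = E,
              control.compute   = list(dic = TRUE),      # model evaluation
              control.predictor = list(compute = TRUE))  # marginals for eta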
Space-varying regression

Number of (insurance-type) losses Nkt in 431 municipalities/regions of Norway, in relation to …
Borrow strength…

Few losses occur in each region, so the region-wise estimates are highly variable. Borrow strength by letting {β1, …, β431} vary smoothly over neighbouring regions.
The data set:

    y  region  W
1   0  1       0.4
2   0  1       0.4
...
10  0  1       0.4
11  1  2       0.2
12  0  2       0.2
...
20  0  2       0.2
The second argument of f() is a weight, which defaults to 1:

ηi = … + wi fi + …

is represented as f(i, w, ...). No need for …
Survival models

Times of infection for kidney patients (two recurrence times per patient):

patient  time    event  age     sex
1        8, 16   1, 1   28, 28  0
2        23, 13  1, 0   48, 48  1
3        22, 28  1, 1   32, 32  0
The Kidney data

The Kidney data frame:

time  event  age  sex  ID
8     1      28   0    1
16    1      28   0    1
23    1      48   1    2
13    0      48   1    2
22    1      32   0    3
28    1      32   0    3
data(Kidney)
formula = inla.surv(time, event) ~ age + sex + f(ID, model="iid")
result1 = inla(formula, family="coxph", data=Kidney)
A toy example using copy

State-space model:

yt = xt + vt
xt = 2x_{t−1} − x_{t−2} + wt

Rewrite this as

yt = xt + vt
0 = xt − 2x_{t−1} + x_{t−2} − wt

where the second equation is treated as zero-valued "observations".
n = 100
m = n-2
y = sin((1:n)*0.2) + rnorm(n, sd=0.1)
formula = Y ~ f(i, model="iid", initial=-10, fixed=TRUE) +
              f(j, w, copy="i") + …
Stochastic Volatility model

[Figure: log of the daily difference of the pound-dollar exchange rate.]
Stochastic Volatility model

Simple model:

xt | x1, …, x_{t−1}, τ, φ ∼ N(φ x_{t−1}, 1/τ)

where |φ| < 1 to ensure a stationary process.
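A hedged sketch of such a fit in R-INLA; the likelihood name "stochvol" and the variable names are assumptions, and y is taken to hold the log-returns:

data = data.frame(y = y, time = 1:length(y))
formula = y ~ 1 + f(time, model = "ar1")
result = inla(formula, family = "stochvol", data = data)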
Results

Using only the first 50 data points, which makes the problem much harder.
[Figures: posterior marginals of ν = logit(2φ − 1) and log(κx), based on the first 50 observations.]
−2

0

2

4

Using the full dataset

0

200

400

600

800

100

The Pound-Dollar data.
127 / 140
[Figure: posterior mean of xt + µ over time.]
[Figures: posterior marginals transformed back to the user scale, e.g. via exp for the variance and a transformation for φ.]
[Figure: predictions for µ + x_{t+k}.]
New data-model: Student-tν

Now extend the model to use the Student-tν distribution:

yt | x1, …, xt ∼ exp(µ/2 + xt/2) × Tν

where Tν denotes a standardized Student-t variable with ν degrees of freedom.
Student-tν

[Figure: posterior marginal of the degrees-of-freedom parameter.]
[Figure: predictions under the Student-tν model.]
[Figure: comparing predictions with the Student-tν and Gaussian data models.]
Student-tν

However, there is no support for the Student-tν in the data, judging by the Bayes factor and the Deviance Information Criterion.
Disease mapping: The BYM-model

Data yi ∼ Poisson(Ei exp(ηi)); log-relative risk ηi = ui + vi, with structured component u and unstructured component v.

[Maps of the two components.]
Marginals for θ | y

[Figures: posterior marginals for the hyperparameters.]

Marginals for xi | y

[Figures: posterior marginals for components of the latent field.]
THANK YOU
Upcoming SlideShare
Loading in …5
×

Bayesian computation with INLA

1,616 views

Published on

Short-course about Bayesian computation with INLA given on the AS2013 conference in Ribno, Slovenia.

Published in: Education, Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,616
On SlideShare
0
From Embeds
0
Number of Embeds
10
Actions
Shares
0
Downloads
96
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Bayesian computation with INLA

  1. 1. Bayesian computation using INLA Thiago G. Martins Norwegian University of Science and Technology Trondheim, Norway AS 2013, Ribno, Slovenia September, 2013 1 / 140
  2. 2. Parte I Latent Gaussian models and INLA methodology 2 / 140
  3. 3. Outline Latent Gaussian models Are latent Gaussian models important? Bayesian computing INLA method 3 / 140
  4. 4. Hierarchical Bayesian models Hierarchical models are an extremely useful tool in Bayesian model building. Three parts: Observations (y): Encodes information about observed data, including design and collection issues. The latent process (x): The unobserved process. May be the focus of the study, or may be included to reduce autocorrelation. E.g., encode spatial and/or temporal dependence. The Parameter model (θ): Models for all of the parameters in the observation and latent processes. 4 / 140
  5. 5. Hierarchical Bayesian models Hierarchical models are an extremely useful tool in Bayesian model building. Three parts: Observations (y): Encodes information about observed data, including design and collection issues. The latent process (x): The unobserved process. May be the focus of the study, or may be included to reduce autocorrelation. E.g., encode spatial and/or temporal dependence. The Parameter model (θ): Models for all of the parameters in the observation and latent processes. 4 / 140
  6. 6. Hierarchical Bayesian models Hierarchical models are an extremely useful tool in Bayesian model building. Three parts: Observations (y): Encodes information about observed data, including design and collection issues. The latent process (x): The unobserved process. May be the focus of the study, or may be included to reduce autocorrelation. E.g., encode spatial and/or temporal dependence. The Parameter model (θ): Models for all of the parameters in the observation and latent processes. 4 / 140
  7. 7. Latent Gaussian models A latent Gaussian model is a Bayesian hierarchical model of the following form Observed data y, yi |xi ∼ π(yi |xi , θ) Latent Gaussian field x ∼ N (·, Σ(θ)) Hyperparameters θ ∼ π(θ) variability length/strength of dependence parameters in the likelihood π(x, θ|y) ∝ π(θ) π(x|θ) π(yi |xi , θ) i∈I 5 / 140
  8. 8. Latent Gaussian models A latent Gaussian model is a Bayesian hierarchical model of the following form Observed data y, yi |xi ∼ π(yi |xi , θ) Latent Gaussian field x ∼ N (·, Σ(θ)) Hyperparameters θ ∼ π(θ) variability length/strength of dependence parameters in the likelihood π(x, θ|y) ∝ π(θ) π(x|θ) π(yi |xi , θ) i∈I 5 / 140
  9. 9. Latent Gaussian models A latent Gaussian model is a Bayesian hierarchical model of the following form Observed data y, yi |xi ∼ π(yi |xi , θ) Latent Gaussian field x ∼ N (·, Σ(θ)) Hyperparameters θ ∼ π(θ) variability length/strength of dependence parameters in the likelihood π(x, θ|y) ∝ π(θ) π(x|θ) π(yi |xi , θ) i∈I 5 / 140
  10. 10. Latent Gaussian models A latent Gaussian model is a Bayesian hierarchical model of the following form Observed data y, yi |xi ∼ π(yi |xi , θ) Latent Gaussian field x ∼ N (·, Σ(θ)) Hyperparameters θ ∼ π(θ) variability length/strength of dependence parameters in the likelihood π(x, θ|y) ∝ π(θ) π(x|θ) π(yi |xi , θ) i∈I 5 / 140
  11. 11. Precision matrix The precision matrix of the latent field Q(θ) = Σ(θ)−1 plays a key role! Two issues Building models through conditioning (“hierarchical models”) Computational benefits 6 / 140
  12. 12. Precision matrix The precision matrix of the latent field Q(θ) = Σ(θ)−1 plays a key role! Two issues Building models through conditioning (“hierarchical models”) Computational benefits 6 / 140
  13. 13. Building models through conditioning If x ∼ N (0, Q−1 ) x y|x ∼ N (x, Q−1 ) y then Q(x,y) = Qx + Qy −Qy −Qy Qy Not so nice expressions using the Covariance-matrix 7 / 140
  14. 14. Computational benefits Precision matrices encodes conditional independence: xi ⊥ xj |x−ij ⇐⇒ Qij = 0 We are interested in models with sparse precision matrices. x ∼ N (·, Σ(θ)) with sparse Q(θ) = Σ(θ)−1 Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs) Good computational properties through numerical algorithms for sparse matrices 8 / 140
  15. 15. Computational benefits Precision matrices encodes conditional independence: xi ⊥ xj |x−ij ⇐⇒ Qij = 0 We are interested in models with sparse precision matrices. x ∼ N (·, Σ(θ)) with sparse Q(θ) = Σ(θ)−1 Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs) Good computational properties through numerical algorithms for sparse matrices 8 / 140
  16. 16. Computational benefits Precision matrices encodes conditional independence: xi ⊥ xj |x−ij ⇐⇒ Qij = 0 We are interested in models with sparse precision matrices. x ∼ N (·, Σ(θ)) with sparse Q(θ) = Σ(θ)−1 Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs) Good computational properties through numerical algorithms for sparse matrices 8 / 140
  17. 17. Numerical algorithms for sparse matrices: scaling properties Time: O(n) Space: O(n3/2 ) Space-time: O(n2 ) This is to be compared with general O(n3 ) algorithms for dense matrices. 9 / 140
  18. 18. Numerical algorithms for sparse matrices: scaling properties Time: O(n) Space: O(n3/2 ) Space-time: O(n2 ) This is to be compared with general O(n3 ) algorithms for dense matrices. 9 / 140
  19. 19. Outline Latent Gaussian models Are latent Gaussian models important? Bayesian computing INLA method 10 / 140
  20. 20. Example (I): Mixed-effect model yij |ηij , θ 1 ∼ π(yij |ηij , θ 1 ), i = 1, . . . , N, j = 1, . . . , M ηij = µ + cij β + ui + vj + wij where u, v and w are “random effects”. If we assign Gaussian priors on µ, β, u and v, then x|θ 2 = (µ, β, u, v, η)|θ 2 is jointly Gaussian. θ = (θ 1 , θ 2 ) 11 / 140
  21. 21. Example (I): Mixed-effect model yij |ηij , θ 1 ∼ π(yij |ηij , θ 1 ), i = 1, . . . , N, j = 1, . . . , M ηij = µ + cij β + ui + vj + wij where u, v and w are “random effects”. If we assign Gaussian priors on µ, β, u and v, then x|θ 2 = (µ, β, u, v, η)|θ 2 is jointly Gaussian. θ = (θ 1 , θ 2 ) 11 / 140
  22. 22. Example (I) - cont. We can reinterpret the model as θ ∼ π(θ) x|θ ∼ π(x|θ) = N (0, Q−1 (θ)) y|x, θ ∼ π(yi |ηi , θ) i dim(x) could be large 102 -105 dim(θ) is small 1-5 12 / 140
  23. 23. Example (I) - cont. We can reinterpret the model as θ ∼ π(θ) x|θ ∼ π(x|θ) = N (0, Q−1 (θ)) y|x, θ ∼ π(yi |ηi , θ) i dim(x) could be large 102 -105 dim(θ) is small 1-5 12 / 140
  24. 24. Example (I) - cont. 0.0 0.2 0.4 0.6 0.8 1.0 Precision matrix (η, u, v, µ, β) N = 100, M = 5. 0.0 0.2 0.4 0.6 0.8 1.0 13 / 140
  25. 25. Example (II): Time-series model Smoothing of binary time-series Data is sequence of 0 and 1s Probability for a 1 at time t, pt , depends on time pt = exp(ηt ) 1 + exp(ηt ) Linear predictor ηt = µ + βct + ut + vt , t = 1, . . . , n 14 / 140
  26. 26. Example (II): Time-series model Smoothing of binary time-series Data is sequence of 0 and 1s Probability for a 1 at time t, pt , depends on time pt = exp(ηt ) 1 + exp(ηt ) Linear predictor ηt = µ + βct + ut + vt , t = 1, . . . , n 14 / 140
  27. 27. Example (II): Time-series model Smoothing of binary time-series Data is sequence of 0 and 1s Probability for a 1 at time t, pt , depends on time pt = exp(ηt ) 1 + exp(ηt ) Linear predictor ηt = µ + βct + ut + vt , t = 1, . . . , n 14 / 140
  28. 28. Example (II) - cont. Prior models µ and β are Normal u AR-model, like ut = φut−1 + with parameters t (φ, σ 2 ). v is an unstructured term or a “random effect” gives x|θ = (µ, β, u, v, η) is jointly Gaussian. Hyperparameters 2 θ = (φ, σ 2 , σv ) 15 / 140
  29. 29. Example (II) - cont. Prior models µ and β are Normal u AR-model, like ut = φut−1 + with parameters t (φ, σ 2 ). v is an unstructured term or a “random effect” gives x|θ = (µ, β, u, v, η) is jointly Gaussian. Hyperparameters 2 θ = (φ, σ 2 , σv ) 15 / 140
  30. 30. Example (II) - cont. Prior models µ and β are Normal u AR-model, like ut = φut−1 + with parameters t (φ, σ 2 ). v is an unstructured term or a “random effect” gives x|θ = (µ, β, u, v, η) is jointly Gaussian. Hyperparameters 2 θ = (φ, σ 2 , σv ) 15 / 140
  31. 31. Example (II) - cont. Prior models µ and β are Normal u AR-model, like ut = φut−1 + with parameters t (φ, σ 2 ). v is an unstructured term or a “random effect” gives x|θ = (µ, β, u, v, η) is jointly Gaussian. Hyperparameters 2 θ = (φ, σ 2 , σv ) 15 / 140
  32. 32. Example (II) - cont. Prior models µ and β are Normal u AR-model, like ut = φut−1 + with parameters t (φ, σ 2 ). v is an unstructured term or a “random effect” gives x|θ = (µ, β, u, v, η) is jointly Gaussian. Hyperparameters 2 θ = (φ, σ 2 , σv ) 15 / 140
  33. 33. Example (II) - cont. Prior models µ and β are Normal u AR-model, like ut = φut−1 + with parameters t (φ, σ 2 ). v is an unstructured term or a “random effect” gives x|θ = (µ, β, u, v, η) is jointly Gaussian. Hyperparameters 2 θ = (φ, σ 2 , σv ) 15 / 140
  34. 34. Example (II) - cont. We can reinterpret the model as θ ∼ π(θ) x|θ ∼ π(x|θ) = N (0, Q−1 (θ)) y|x, θ ∼ π(yi |ηi , θ) i dim(x) could be large 102 -105 dim(θ) is small 1-5 16 / 140
  35. 35. Example (II) - cont. We can reinterpret the model as θ ∼ π(θ) x|θ ∼ π(x|θ) = N (0, Q−1 (θ)) y|x, θ ∼ π(yi |ηi , θ) i dim(x) could be large 102 -105 dim(θ) is small 1-5 16 / 140
  36. 36. Example (II) - cont. 0.0 0.2 0.4 0.6 0.8 1.0 Precision matrix (η, u, v, µ, β), n = 100. 0.0 0.2 0.4 0.6 0.8 1.0 17 / 140
  37. 37. Example (III): Disease mapping Data yi ∼ Poisson(Ei exp(ηi )) Log-relative risk ηi = µ + ui + vi + f (ci ) Structured component u 0.98 0.71 0.44 Unstructured component v 0.17 −0.1 −0.37 Smooth effect of a covariate c −0.63 18 / 140
  38. 38. Example (III): Disease mapping Data yi ∼ Poisson(Ei exp(ηi )) Log-relative risk ηi = µ + ui + vi + f (ci ) Structured component u 0.98 0.71 0.44 Unstructured component v 0.17 −0.1 −0.37 Smooth effect of a covariate c −0.63 18 / 140
  39. 39. Example (III): Disease mapping Data yi ∼ Poisson(Ei exp(ηi )) Log-relative risk ηi = µ + ui + vi + f (ci ) Structured component u 0.98 0.71 0.44 Unstructured component v 0.17 −0.1 −0.37 Smooth effect of a covariate c −0.63 18 / 140
  40. 40. Example (III): Disease mapping Data yi ∼ Poisson(Ei exp(ηi )) Log-relative risk ηi = µ + ui + vi + f (ci ) Structured component u 0.98 0.71 0.44 Unstructured component v 0.17 −0.1 −0.37 Smooth effect of a covariate c −0.63 18 / 140
  41. 41. Example (III): Disease mapping Data yi ∼ Poisson(Ei exp(ηi )) Log-relative risk ηi = µ + ui + vi + f (ci ) Structured component u 0.98 0.71 0.44 Unstructured component v 0.17 −0.1 −0.37 Smooth effect of a covariate c −0.63 18 / 140
  42. 42. Yet Another Example (III) We can reinterpret the model as θ ∼ π(θ) x|θ ∼ π(x|θ) = N (0, Q−1 (θ)) y|x, θ ∼ π(yi |ηi , θ) i dim(x) could be large 102 -105 dim(θ) is small 1-5 19 / 140
  43. 43. Example (III) - cont. 0.0 0.2 0.4 0.6 0.8 1.0 Precision matrix (η, u, v, µ, f) 0.0 0.2 0.4 0.6 0.8 1.0 20 / 140
  44. 44. What we have learned so far The latent Gaussian model construct θ ∼ π(θ) x|θ ∼ π(x|θ) = N (0, Q−1 (θ)) y|x, θ ∼ π(yi |ηi , θ) i occurs in many, seemingly unrelated, statistical models. GLM/GAM/GLMM/GAMM/++ 21 / 140
  45. 45. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  46. 46. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  47. 47. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  48. 48. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  49. 49. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  50. 50. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  51. 51. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  52. 52. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  53. 53. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  54. 54. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  55. 55. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  56. 56. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  57. 57. Further Examples Dynamic linear models Stochastic volatility Generalized linear (mixed) models Generalized additive (mixed) models Spline smoothing Semi-parametric regression Space-varying (semi-parametric) regression models Disease mapping Log-Gaussian Cox-processes Model-based geostatistics (*) Spatio-temporal models Survival analysis +++ 22 / 140
  58. 58. Outline Latent Gaussian models Are latent Gaussian models important? Bayesian computing INLA method 23 / 140
  59. 59. Bayesian computing We are interested in the posterior marginal quantities like π(xi |y) and π(θi |y). This requires the evaluation of integrals of the form π(xi |y) ∝ π(y |x, θ)π(x|θ)π(θ) dθ dx{−i} x{−i} θ The computation of massively high dimensional integrals is at the core of Bayesian computing. 24 / 140
  60. 60. Bayesian computing We are interested in the posterior marginal quantities like π(xi |y) and π(θi |y). This requires the evaluation of integrals of the form π(xi |y) ∝ π(y |x, θ)π(x|θ)π(θ) dθ dx{−i} x{−i} θ The computation of massively high dimensional integrals is at the core of Bayesian computing. 24 / 140
  61. 61. Bayesian computing We are interested in the posterior marginal quantities like π(xi |y) and π(θi |y). This requires the evaluation of integrals of the form π(xi |y) ∝ π(y |x, θ)π(x|θ)π(θ) dθ dx{−i} x{−i} θ The computation of massively high dimensional integrals is at the core of Bayesian computing. 24 / 140
  62. 62. But surely we can already do this Markov Chain Monte Carlo (MCMC) is widely used by the applied community. There are generic tools available for MCMC, OpenBUGS, JAGS, STAN and others for specific models, like BayesX. The issue of Bayesian computing is not “solved” even though MCMC is available Hierarchical models are more difficult for MCMC Strong dependencies, bad mixing. A main obstacle for Bayesian modeling is still the issue of “Bayesian computing” 25 / 140
  63. 63. But surely we can already do this Markov Chain Monte Carlo (MCMC) is widely used by the applied community. There are generic tools available for MCMC, OpenBUGS, JAGS, STAN and others for specific models, like BayesX. The issue of Bayesian computing is not “solved” even though MCMC is available Hierarchical models are more difficult for MCMC Strong dependencies, bad mixing. A main obstacle for Bayesian modeling is still the issue of “Bayesian computing” 25 / 140
  64. 64. But surely we can already do this Markov Chain Monte Carlo (MCMC) is widely used by the applied community. There are generic tools available for MCMC, OpenBUGS, JAGS, STAN and others for specific models, like BayesX. The issue of Bayesian computing is not “solved” even though MCMC is available Hierarchical models are more difficult for MCMC Strong dependencies, bad mixing. A main obstacle for Bayesian modeling is still the issue of “Bayesian computing” 25 / 140
  65. 65. So what’s wrong with MCMC? This is actually a problem with any Monte-Carlo scheme. Error in expectations The Monte-Carlo error is Var E(f (X )) − 1 N N f (xi ) i=1 =O 1 √ N In practical terms, to reduce the variance to O(10−p ) you need O(102p ) samples! This can be optimistic! 26 / 140
  66. 66. Be more narrow MCMC MCMC ‘works’ for everything, but it is not usually optimal when we focus on a specific class of models. It works for latent Gaussian models, but it’s too slow. (Unfortunately) sometimes it’s the only thing we can do. INLA Integrated Nested Laplace Approximations Deterministic rather than stochastic algorithm, like MCMC. Specially designed for latent Gaussian models. Accurate results in a small fraction of computational time, when compared to MCMC. 27 / 140
  67. 67. Be more narrow MCMC MCMC ‘works’ for everything, but it is not usually optimal when we focus on a specific class of models. It works for latent Gaussian models, but it’s too slow. (Unfortunately) sometimes it’s the only thing we can do. INLA Integrated Nested Laplace Approximations Deterministic rather than stochastic algorithm, like MCMC. Specially designed for latent Gaussian models. Accurate results in a small fraction of computational time, when compared to MCMC. 27 / 140
  68. 68. Be more narrow MCMC MCMC ‘works’ for everything, but it is not usually optimal when we focus on a specific class of models. It works for latent Gaussian models, but it’s too slow. (Unfortunately) sometimes it’s the only thing we can do. INLA Integrated Nested Laplace Approximations Deterministic rather than stochastic algorithm, like MCMC. Specially designed for latent Gaussian models. Accurate results in a small fraction of computational time, when compared to MCMC. 27 / 140
  69. 69. Be more narrow MCMC MCMC ‘works’ for everything, but it is not usually optimal when we focus on a specific class of models. It works for latent Gaussian models, but it’s too slow. (Unfortunately) sometimes it’s the only thing we can do. INLA Integrated Nested Laplace Approximations Deterministic rather than stochastic algorithm, like MCMC. Specially designed for latent Gaussian models. Accurate results in a small fraction of computational time, when compared to MCMC. 27 / 140
  70. 70. Be more narrow MCMC MCMC ‘works’ for everything, but it is not usually optimal when we focus on a specific class of models. It works for latent Gaussian models, but it’s too slow. (Unfortunately) sometimes it’s the only thing we can do. INLA Integrated Nested Laplace Approximations Deterministic rather than stochastic algorithm, like MCMC. Specially designed for latent Gaussian models. Accurate results in a small fraction of computational time, when compared to MCMC. 27 / 140
  71. 71. Be more narrow MCMC MCMC ‘works’ for everything, but it is not usually optimal when we focus on a specific class of models. It works for latent Gaussian models, but it’s too slow. (Unfortunately) sometimes it’s the only thing we can do. INLA Integrated Nested Laplace Approximations Deterministic rather than stochastic algorithm, like MCMC. Specially designed for latent Gaussian models. Accurate results in a small fraction of computational time, when compared to MCMC. 27 / 140
  72. 72. Comparing results with MCMC When comparing the results of R-INLA with MCMC, it is important to use the same model. Here we have compared the EPIL example results with those obtained using JAGS via the rjags package 28 / 140
  73. 73. Comparing results with MCMC When comparing the results of R-INLA with MCMC, it is important to use the same model. Here we have compared the EPIL example results with those obtained using JAGS via the rjags package 28 / 140
[Figure: posterior marginals for the intercept (a0), the Age effect (alpha.Age), log(tau.Ind) and log(tau.Rand) in the EPIL example; the INLA results are overlaid on MCMC runs of 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64 and 120 minutes.]

29 / 140
Outline

Latent Gaussian models
Are latent Gaussian models important?
Bayesian computing
INLA method

30 / 140
Main aim

Posterior:
π(x, θ|y) ∝ π(θ) π(x|θ) ∏_{i∈I} π(yi|xi, θ)

Compute the posterior marginals:
π(xi|y) = ∫ π(θ|y) π(xi|θ, y) dθ
π(θj|y) = ∫ π(θ|y) dθ−j

31 / 140
Tasks

1. Build an approximation to π(θ|y): π̃(θ|y)
2. Build an approximation to π(xi|θ, y): π̃(xi|θ, y)
3. Do the integration with respect to θ numerically:
   π(xi|y) = ∫ π(θ|y) π(xi|θ, y) dθ
   π(θj|y) = ∫ π(θ|y) dθ−j

32 / 140
Task 1: π(θ|y)

The Laplace approximation for π(θ|y) is
π(θ|y) = π(x, θ|y) / π(x|θ, y)
       ∝ π(θ) π(x|θ) π(y|x, θ) / π(x|θ, y)
       ≈ π(θ) π(x|θ) π(y|x, θ) / π̃G(x|θ, y) |_{x = x*(θ)}
where π̃G(x|θ, y) is the Gaussian approximation of π(x|θ, y) and x*(θ) is its mode.

33 / 140
The GMRF-approximation

π(x|y) ∝ exp( −(1/2) xᵀQx + Σi log π(yi|xi) )
       ≈ exp( −(1/2) (x − µ)ᵀ (Q + diag(ci)) (x − µ) )
       = π̃(x|y)

Constructed as follows:
Locate the mode x*
Expand to second order
Markov and computational properties are preserved

34 / 140
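As an illustration of this construction, here is a minimal sketch (not R-INLA internals) of the GMRF-approximation for a Poisson likelihood, yi ∼ Poisson(exp(xi)) with x ∼ N(0, Q⁻¹): Newton iterations locate the mode, and the negative Hessian Q + diag(ci) becomes the precision of π̃(x|y). The function name and the dense solve() are illustrative only; a real implementation would use a sparse Cholesky factorization.

gmrf.approx = function(Q, y, niter = 20) {
    ## log pi(x|y) = -0.5 x'Qx + sum_i (y_i x_i - exp(x_i)) + const
    x = rep(0, length(y))
    for (it in 1:niter) {
        grad = as.vector(-(Q %*% x)) + (y - exp(x))  # score of log pi(x|y)
        H = Q + diag(exp(x))                         # Q + diag(c_i)
        x = x + as.vector(solve(H, grad))            # Newton step
    }
    list(mode = x, prec = Q + diag(exp(x)))  # pi~(x|y) = N(mode, prec^-1)
}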
Remarks

The Laplace approximation π̃(θ|y) turns out to be accurate:
x|y, θ appears almost Gaussian in most cases, as x is a priori Gaussian;
y is typically not very informative;
the observational model is usually 'well-behaved'.
Note: π(θ|y) itself does not look Gaussian!

35 / 140
Task 2: π(xi|y, θ)

This task is more challenging, since the dimension n of x is large and there are potentially n marginals to compute, or at least O(n). Here we present three options:
1. Gaussian approximation
2. Laplace approximation
3. Simplified Laplace approximation
There is a trade-off between accuracy and complexity.

36 / 140
π(xi|y, θ) - 1. Gaussian approximation

An obvious, simple and fast alternative is to use the GMRF-approximation π̃G(x|y, θ):
π̃(xi|θ, y) = N(xi; µi(θ), σi²(θ))
It is the fastest option, since we only need to compute the diagonal of Q(θ)⁻¹. However, it can show errors in location and fails to capture asymmetry.

37 / 140
π(xi|y, θ) - 2. Laplace approximation

The Laplace approximation:
π̃(xi|y, θ) ≈ π(x, θ|y) / π̃GG(x−i|xi, y, θ) |_{x−i = x*−i(xi, θ)}
Again, the approximation is very good, as x−i|xi, θ is 'almost Gaussian', but it is expensive: to get the n marginals we must perform n optimizations and n factorizations of (n−1) × (n−1) matrices.

38 / 140
π(xi|y, θ) - 3. Simplified Laplace approximation

Taylor expansions of the Laplace approximation for π(xi|θ, y):
computationally much faster;
corrects the Gaussian approximation for errors in location and skewness:
log π(xi|θ, y) = −(1/2) xi² + b xi + (1/6) d xi³ + · · ·
Fit a skew-Normal density 2φ(x)Φ(ax).
Sufficiently accurate for most applications.

39 / 140
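The skew-Normal form is easy to picture; the sketch below (plain R, not package code) just evaluates the standardized density 2φ(x)Φ(ax) for two choices of the skewness parameter a, where a = 0 recovers the standard Normal. Location and scale, which the fit also matches, are omitted here.

dskewnorm = function(x, a) 2 * dnorm(x) * pnorm(a * x)  # 2 phi(x) Phi(ax)

x = seq(-4, 4, length.out = 200)
plot(x, dskewnorm(x, a = 0), type = "l")  # symmetric: standard Normal
lines(x, dskewnorm(x, a = 3), lty = 2)    # right-skewed correction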
Task 3: Numerical integration wrt θ

Now we know how to compute:
π(θ|y) - Laplace approximation
π(xi|θ, y) - 1. Gaussian, 2. Laplace, 3. Simplified Laplace
Let's see how INLA works.

40 / 140
The integrated nested Laplace approximation (INLA) I

Step I: Explore π̃(θ|y)
Locate the mode
Use the Hessian to construct new variables
Grid-search

41 / 140
The integrated nested Laplace approximation (INLA) II

Step II: For each θj
For each i, evaluate the Laplace approximation for selected values of xi
Build a skew-Normal or log-spline corrected Gaussian
N(xi; µi, σi²) × exp(spline)
to represent the conditional marginal density.

42 / 140
The integrated nested Laplace approximation (INLA) III

Step III: Sum out θj
For each i, sum out θ:
π̃(xi|y) ∝ Σj π̃(xi|y, θj) × π̃(θj|y)
Build a log-spline corrected Gaussian
N(xi; µi, σi²) × exp(spline)
to represent π̃(xi|y).

43 / 140
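The summation in Step III is plain numerical integration over the grid of θ-values. A minimal sketch, assuming a hypothetical function cond.marg(xg, theta) that returns π̃(xi|y, θ) on a fixed grid xg, with weights w proportional to π̃(θj|y) times the grid volume:

marginal.xi = function(xg, theta.grid, w, cond.marg) {
    dens = rep(0, length(xg))
    for (j in seq_along(theta.grid))
        dens = dens + w[j] * cond.marg(xg, theta.grid[j])
    dens / sum(dens * mean(diff(xg)))  # renormalize on the grid
}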
Computing posterior marginals for θj (I)

Main idea:
Use the integration points and build an interpolant
Use numerical integration on that interpolant

44 / 140
How can we assess the error in the approximations?

Tool 1: Compare a sequence of improved approximations
1. Gaussian approximation
2. Simplified Laplace
3. Laplace

45 / 140
How can we assess the error in the approximations?

Tool 2: Estimate the "effective" number of parameters as defined in the Deviance Information Criterion:
pD(θ) = D̄(x; θ) − D(x̄; θ)
and compare this with the number of observations. A low ratio is good. This criterion has theoretical justification.

46 / 140
Part II
R-INLA package

47 / 140
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras

48 / 140
Implementing INLA

All procedures required to perform INLA need to be carefully implemented to achieve good speed; it is easy to implement a slow version of INLA.
The GMRFLib library: basic library written in C for fast computations on GMRFs.
The inla program: defines latent Gaussian models and interfaces with the GMRFLib library; models are defined using .ini files; the inla program writes all the results (E/Var/marginals) to files.
The INLA package for R: R interface to the inla program (that's why it's not on CRAN); converts "formula" statements into ".ini"-file definitions, runs the inla program, and gets the results back into R.
Happily, the R package is all we need to learn!

49 / 140
The INLA package for R

1. The data frame and formula are converted into input files (.ini file).
2. The inla program is run on these input files.
3. The results are collected into an R object of class list, from which you can get summaries, plots, etc.

50 / 140
R-INLA

Visit the site www.r-inla.org and follow the instructions. The site contains source code, examples, reports, and more.
The first time, do
> source("http://www.math.ntnu.no/inla/givmeINLA.R")
Later, you can upgrade the package with
> inla.upgrade()
or, if you want the test version (you do),
> inla.upgrade(testing=TRUE)
Available for Linux, Windows and Mac.

51 / 140
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras

52 / 140
The structure of an R program using INLA

There are essentially three parts to an INLA program:
1. The data organization.
2. The formula - notation inherited from R's native glm function.
3. The call to the INLA program.

53 / 140
The inla function

This is all that's needed for a basic call:

result <- inla(
    formula = y ~ 1 + x,     # describes your latent field
    family = "gaussian",     # the likelihood distribution
    data = data.frame(y, x)  # a list or data frame
)

54 / 140
The simplest case: Linear regression

n = 100
x = sort(runif(n))
y = 1 + x + rnorm(n, sd = 0.1)
plot(x, y)

formula = y ~ 1 + x
result = inla(formula, data = data.frame(x, y), family = "gaussian")
summary(result)
plot(result)

55 / 140
Call: inla(formula = formula, family = "gaussian", data = data.frame(x, y))

Time used:
 Pre-processing  Running inla  Post-processing  Total
 0.0805          0.0302        0.0192           0.1299

Fixed effects:
             mean    sd      0.025quant  0.5quant  0.975quant  kld
(Intercept)  0.9691  0.0185  0.9327      0.9691    1.0054      0
x            1.0427  0.0313  0.9813      1.0427    1.1041      0

The model has no random effects

Model hyperparameters:
                                         mean    sd     0.025quant  0.5quant  0.975quant
Precision for the Gaussian observations  127.45  18.10  95.14       126.37    166.11

Expected number of effective parameters (std dev): 2.209 (0.0236)
Number of equivalent replicates: 45.27
Marginal Likelihood: 88.01

56 / 140
Likelihood functions - family argument

result = inla(formula, data = data.frame(x, y), family = "gaussian")

"binomial", "coxph", "Exponential", "gaussian", "gev", "laplace", "sn" (Skew Normal), "stochvol", "stochvol.nig", "stochvol.t", "T", "weibull"
Many others: go to http://r-inla.org/

57 / 140
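To see the families supported by the version you actually have installed, one option (assuming the inla.models() helper present in current versions of the package) is:

library(INLA)
names(inla.models()$likelihood)  # all available likelihood families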
A more general model

Assume the following model:
y ∼ π(y|η)
η = g(λ) = β0 + β1 x1 + β2 x2 + f(x3)
where
x1, x2 are covariates with a linear effect, βi ∼ N(0, τ1⁻¹);
x3 can be the index for a spatial effect, random effect, etc., with {f1, f2, . . .} ∼ N(0, Qf⁻¹(τ2)).

58 / 140
A more general model (cont.)

Assume the following model:
y ∼ π(y|η)
η = g(λ) = β0 + β1 x1 + β2 x2 + f(x3)

> formula = y ~ x1 + x2 + f(x3, ...)

Each observation yi is linked to ηi through g, and the linear predictor stacks up as
ηi = β0 + β1 x1i + β2 x2i + f_{x3i},  i = 1, . . . , n,
that is, η = β0 1 + β1 x1 + β2 x2 + (f_{x31}, . . . , f_{x3n})ᵀ.

59 / 140
Model specification - INLA package

The model is specified in R through a formula, similar to glm:
> formula = y ~ x1 + x2 + f(x3, ...)
y is the name of your response variable in your data frame.
An intercept is fitted automatically! Use -1 in your formula to avoid it.
The fixed effects (β0, β1 and β2) are taken as i.i.d. normal with zero mean and small precision. (This can be changed; see the sketch after this list.)
The f() function contains the random-effect specifications. Some models:
iid, iid1d, iid2d, iid3d: random effects
rw1, rw2, ar1: smooth effect of covariates or time effect
seasonal: seasonal effect
besag: spatial effect (CAR model)
generic: user-defined precision matrix

60 / 140
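A sketch of changing the default fixed-effect prior through the control.fixed argument of inla(); the fields shown (mean, prec, mean.intercept, prec.intercept) follow the package documentation, but verify them against your installed version:

result = inla(formula, family = "gaussian", data = data.frame(x, y),
              control.fixed = list(mean = 0, prec = 0.01,
                                   mean.intercept = 0,
                                   prec.intercept = 0.001))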
Specifying random effects

Random effects are added to the formula through the function
f(name, model="...", hyper = ..., replicate = ..., constr = FALSE, cyclic = FALSE)
name - the name of the random effect; also refers to the values in data which are used for various things, usually indexes, e.g. for space or time.
model - the latent model, e.g. "iid", "rw2", "ar1", etc.
hyper - specify the prior on the hyperparameters.
constr - sum-to-zero constraint?
cyclic - is the effect cyclic? (RW1, RW2 and AR1)
There are more advanced options, which we will see later.

61 / 140
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras

62 / 140
EPIL example

Seizure counts in a randomized trial of anti-convulsant therapy in epilepsy. From the WinBUGS manual.

Patient  y1  y2  y3  y4  Trt  Base  Age
1         5   3   3   3    0    11   31
2         3   5   3   3    0    11   30
....
59        1   4   3   2    1    12   37

63 / 140
EPIL example (cont.)

Mixed model with repeated Poisson counts:
yjk ∼ Poisson(µjk);  j = 1, . . . , 59;  k = 1, . . . , 4
log(µjk) = α0 + α1 log(Basej/4) + α2 Trtj + α3 Trtj log(Basej/4) + α4 Agej + α5 V4 + Indj + βjk
αi ∼ N(0, τα), τα known
Indj ∼ N(0, τInd), τInd ∼ Gamma(a1, b1)
βjk ∼ N(0, τβ), τβ ∼ Gamma(a2, b2)

64 / 140
EPIL example (cont.)

The Epil data frame:

y  Trt  Base  Age  V4  rand  Ind
5    0    11   31   0     1    1
3    0    11   31   0     2    1
. . .

Specifying the model:
formula = y ~ log(Base/4) + Trt + I(Trt * log(Base/4)) + log(Age) + V4 +
    f(Ind, model = "iid") + f(rand, model = "iid")

The linear predictor η = (η1, . . . , η4·59)ᵀ stacks the intercept and fixed effects together with the patient effect (f1^Ind, . . . , f59^Ind) and the observation-level effect (f1^Rand, . . . , f_{4·59}^Rand).

65 / 140
data(Epil)
my.center = function(x) (x - mean(x))

Epil$CTrt    = my.center(Epil$Trt)
Epil$ClBase4 = my.center(log(Epil$Base/4))
Epil$CV4     = my.center(Epil$V4)
Epil$ClAge   = my.center(log(Epil$Age))

formula = y ~ ClBase4*CTrt + ClAge + CV4 +
    f(Ind, model="iid") + f(rand, model="iid")
result = inla(formula, family="poisson", data = Epil)
summary(result)
plot(result)

66 / 140
[Figure: Epil example compared with Win/OpenBUGS - marginals for α0.]

[Figure: Epil example compared with Win/OpenBUGS - marginals for τβ.]

67 / 140
EPIL example (cont.)

Access results
- Summaries (mean, sd, [0.025, 0.5, 0.975]-quantiles, kld):
result$summary.fixed
result$summary.random$Ind
result$summary.random$rand
result$summary.hyperpar
- Posterior marginals (matrix with x- and y-axis):
result$marginals.fixed
result$marginals.random$Ind
result$marginals.random$rand
result$marginals.hyperpar

68 / 140
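The stored marginals can be post-processed with the inla.*marginal helper functions shipped with the package; a small sketch (function names as in current R-INLA, so verify against your version):

marg = result$marginals.fixed[["(Intercept)"]]
inla.emarginal(function(x) x, marg)     # posterior mean of alpha_0
inla.qmarginal(c(0.025, 0.975), marg)   # posterior quantiles
plot(inla.smarginal(marg), type = "l")  # smoothed density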
Smoothing binary time series

[Figure: number of days in Tokyo with rainfall above 1 mm in 1983-84.]
We want to estimate the probability of rain pt for calendar day t = 1, . . . , 366.

69 / 140
Smoothing binary time series

Model with a time-series component:
yt ∼ Binomial(nt, pt);  t = 1, . . . , 366
pt = exp(ηt) / (1 + exp(ηt))
ηt = f(t)
f = {f1, . . . , f366} ∼ cyclic RW2(τ)
τ ∼ Gamma(1, 0.0001)

70 / 140
Smoothing binary time series

The Tokyo data frame:

y  n  time
0  2     1
0  2     2
1  2     3
. . .

Specifying the model:
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1
so that ηt = f_t^time for t = 1, . . . , 366.

71 / 140
data(Tokyo)
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1
result = inla(formula, family="binomial", Ntrials=n, data=Tokyo)

72 / 140
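To visualize the estimated rain probability, a crude plug-in sketch: this maps the posterior summaries of ηt through the inverse logit (not the full posterior of pt), and the column names assume the usual summary.random layout:

f.time = result$summary.random$time
plot(f.time$ID, plogis(f.time$mean), type = "l",
     xlab = "day", ylab = "P(rain)")
lines(f.time$ID, plogis(f.time$"0.025quant"), lty = 2)
lines(f.time$ID, plogis(f.time$"0.975quant"), lty = 2)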
[Figure: posterior for the temporal effect - posterior mean with 0.025, 0.5 and 0.975 quantiles over time.]

73 / 140

[Figure: posterior density for the precision of the time effect.]

74 / 140
Disease mapping in Germany

Larynx cancer mortality counts are observed in the 544 districts of Germany from 1986 to 1990, together with the level of smoking consumption (100 possible values).

[Figure: maps of the mortality counts and the smoking covariate.]

75 / 140
yi, i = 1, . . . , 544: counts of cancer mortality in region i
Ei, i = 1, . . . , 544: known variable accounting for demographic variation in region i
ci, i = 1, . . . , 544: level of smoking consumption registered in region i

[Figure: maps of the data.]

76 / 140
The model

yi ∼ Poisson{Ei exp(ηi)};  i = 1, . . . , 544
ηi = µ + f(ci) + fs(si) + fu(si)
where:
f(ci) is a smooth effect of the covariate: f = {f1, . . . , f100} ∼ RW2(τf);
fs(si) is a spatial effect modeled as an intrinsic GMRF:
fs(s) | fs(s′), s′ ≠ s, τfs ∼ N( (1/ns) Σ_{s′∼s} fs(s′), 1/(ns τfs) );
fu(si) is a random effect: fu = {fu(s1), . . . , fu(s544)} ∼ N(0, τfu⁻¹ I);
µ is an intercept term: µ ∼ N(0, 0.0001).

77 / 140
For identifiability we define a sum-to-zero constraint for all intrinsic models, so
Σs fs(s) = 0  and  Σi fi = 0

78 / 140
The Germany data frame:

region  E          Y   x
0        7.965008   8  56
1       22.836219  22  65

The model is ηi = µ + f(ci) + fs(si) + fu(si). The data set has to contain one separate column for each term specified through f(), so in this case we have to add one column:
> Germany = cbind(Germany, region.struct=Germany$region)
We also need the graph file where the neighborhood structure is specified: germany.graph

79 / 140
The new data set is:

region  E          Y   x   region.struct
0        7.965008   8  56  0
1       22.836219  22  65  1

Then the formula is
formula <- Y ~ f(region.struct, model="besag", graph="germany.graph") +
    f(x, model="rw2") + f(region)
The sum-to-zero constraint is the default in the inla function for all intrinsic models. The location of the graph file has to be provided here (the graph file cannot be loaded in R).

80 / 140
The graph file

The germany.graph file starts with the total number of nodes in the graph (544); each following line gives the identifier for a node, its number of neighbors, and the identifiers of those neighbors:

544
<node id> <number of neighbors> <neighbor ids>
. . .

81 / 140
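A graph file can also be read and inspected from R, assuming the inla.read.graph helper and its usual fields:

g = inla.read.graph("germany.graph")
summary(g)    # number of nodes and neighbor structure
g$nnbs[1:5]   # number of neighbors of the first five nodes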
data(Germany)
g = system.file("demodata/germany.graph", package="INLA")
source(system.file("demodata/Bym-map.R", package="INLA"))
Germany = cbind(Germany, region.struct=Germany$region)

# standard BYM model
formula1 = Y ~ f(region.struct, model="besag", graph=g) +
    f(region, model="iid")

# with linear covariate
formula2 = Y ~ f(region.struct, model="besag", graph=g) +
    f(region, model="iid") + x

# with smooth covariate
formula3 = Y ~ f(region.struct, model="besag", graph=g) +
    f(region, model="iid") + f(x, model="rw2")

82 / 140
result1 = inla(formula1, family="poisson", data=Germany, E=E,
               control.compute=list(dic=TRUE))
result2 = inla(formula2, family="poisson", data=Germany, E=E,
               control.compute=list(dic=TRUE))
result3 = inla(formula3, family="poisson", data=Germany, E=E,
               control.compute=list(dic=TRUE))

83 / 140
Other graph specifications

It is also possible to define the graph structure of your model using:
a symmetric (dense or sparse) matrix, where the non-zero pattern of the matrix defines the graph;
an inla.graph object.
See the FAQ on the webpage for more information.

84 / 140
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras

85 / 140
Model evaluation

Deviance Information Criterion (DIC):
result = inla(..., control.compute = list(dic = TRUE))
result$dic$dic

Conditional predictive ordinate (CPO) and probability integral transform (PIT):
CPOi = π(yi | y−i)
PITi = Prob(Yi ≤ yi^obs | y−i)
result = inla(..., control.compute = list(cpo = TRUE))
result$cpo$cpo
result$cpo$pit

86 / 140
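As a quick calibration check, the PIT values should look roughly uniform for a well-calibrated model (only roughly, for discrete data), and the CPO values give a cross-validated log-score; a sketch using the fields above on the EPIL model:

result = inla(formula, family = "poisson", data = Epil,
              control.compute = list(cpo = TRUE))
hist(result$cpo$pit)       # roughly uniform if well calibrated
-sum(log(result$cpo$cpo))  # log-score; smaller is better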
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras

87 / 140
Controlling θ

We often need to set our own priors and use our own parameters in them. These can be set in two ways:
Old style, using prior=..., param=..., initial=..., fixed=...
New style, using hyper = list(prec = list(initial=2, fixed=TRUE, ....))
The old style is there for backward compatibility only. The two styles can also be mixed.

88 / 140
Example

- New style:

hyper = list(
    prec = list(
        prior = "loggamma",
        param = c(2, 0.1),
        initial = 3,
        fixed = FALSE
    )
)
formula = y ~ f(i, model="iid", hyper = hyper) + ...

- Old style:

formula = y ~ f(i, model="iid", prior = "loggamma",
                param = c(2, 0.1), initial = 3, fixed = FALSE) + ...

89 / 140
Internal and external scale

Hyperparameters, like the precision τ, are represented internally using a "good" transformation, like θ1 = log(τ).
Initial values are given on the internal scale.
The to.theta and from.theta functions can be used to map between the external and internal scales.

90 / 140
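A sketch of mapping between the two scales, assuming the hyperparameter definitions exposed by inla.models() (here for the lag-one correlation of the "ar1" model; check the structure in your installed version):

h = inla.models()$latent$ar1$hyper$theta2
h$to.theta(0.9)   # external rho -> internal scale
h$from.theta(2)   # internal value -> external rho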
Example: AR1 model

The hyperparameter definitions of the "ar1" model:
theta1: name = "log precision", short.name = "prec", prior = "loggamma", param = (1, 5e-05), initial = 4, fixed = FALSE, with to.theta/from.theta maps;
theta2: name = "logit lag one correlation", short.name = "rho", prior = "normal", param = (0, 0.15), initial = 2, fixed = FALSE, with to.theta/from.theta maps;
plus bookkeeping fields (constr, nrow.ncol, augmented, aug.factor, aug.constr, n.div.by, n.required, set.default.values, pdf = "ar1").

91 / 140
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras

92 / 140
Feature: replicate

"replicate" generates iid replicates from the same model with the same hyperparameters.
If x | θ ∼ AR(1), then nrep=3 makes x = (x1, x2, x3), with mutually independent xi's from an AR(1) with the same θ.
Most f()-models can be replicated.

93 / 140
Example: replicate

n = 100
x1 = arima.sim(n, model=list(ar=0.9)) + 1
x2 = arima.sim(n, model=list(ar=0.9)) - 1
y1 = rpois(n, exp(x1))
y2 = rpois(n, exp(x2))

y = c(y1, y2)
i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)

formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family = "poisson",
              data = data.frame(y=y, i=i, r=r))

94 / 140
Example: replicate

i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1

The stacked linear predictor is η_{t,r} = f_{t,r} + β_{0,r}: the two blocks f_{·,1} and f_{·,2} are independent AR(1) replicates sharing the same hyperparameters, each with its own intercept.

95 / 140
Feature: More than one family

Every observation could have its own likelihood!
The response is a matrix or list.
Each "column" defines a separate "family".
Each "family" has its own hyperparameters.

96 / 140
n = 100
phi = 0.9
x1 = 1 + arima.sim(n, model=list(ar=phi))
x2 = 0.5 + arima.sim(n, model=list(ar=phi))
y1 = rbinom(n, size=1, prob=exp(x1)/(1+exp(x1)))
y2 = rpois(n, exp(x2))

y = matrix(NA, 2*n, 2)
y[1:n, 1] = y1
y[n + 1:n, 2] = y2

i = rep(1:n, 2)
r = rep(1:2, each=n)
intercept = as.factor(r)
Ntrials = c(rep(1, n), rep(NA, n))

formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family = c("binomial", "poisson"),
              Ntrials = Ntrials, data = data.frame(y, i, r))

97 / 140
The response matrix has NA in the column not in use: rows 1, . . . , n carry the binomial data in column 1, and rows n+1, . . . , 2n carry the Poisson data in column 2. The stacked linear predictor is again η_{t,r} = f_{t,r} + β_{0,r}.

98 / 140
More than one family - More examples

Some rather advanced examples on www.r-inla.org using this feature:
preferential sampling, geostatistics (marked point process);
Weibull survival data and "longitudinal" data.

99 / 140
Feature: copy

The model
formula = y ~ f(i, ...) + ...
only allows ONE element from each sub-model to contribute to the linear predictor of each observation. Sometimes this is not sufficient.

100 / 140
Feature: copy

Suppose ηi = ui + ui+1 + . . . Then we can code this as
formula = f(i, model="iid") + f(i.plus, copy="i")
The copy feature creates an additional sub-model which is ε-close to the target.
Many copies are allowed.
Copy with unknown scaling is possible (the default scaling is fixed to 1).
In vector form: (η1, . . . , ηn)ᵀ = (u1, . . . , un)ᵀ + (u2, . . . , un+1)ᵀ.

101 / 140
Feature: copy

Suppose that ηi = ai + bi zi + . . . , where (ai, bi) ∼ iid N2(0, Σ).

- Simulate data:

library(mvtnorm)  # for rmvnorm
n = 100
Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2)
z = runif(n)
ab = rmvnorm(n, sigma = Sigma)
a = ab[, 1]
b = ab[, 2]
eta = a + b * z
s = 0.1
y = eta + rnorm(n, sd = s)

102 / 140
i = 1:n
j = 1:n + n
formula = y ~ f(i, model="iid2d", n = 2*n) + f(j, z, copy="i") - 1
r = inla(formula, data = data.frame(y, i, j))

In vector form: (η1, . . . , ηn)ᵀ = (a1, . . . , an)ᵀ + (b1 z1, . . . , bn zn)ᵀ.

103 / 140
Feature: Linear combinations

It is possible to extract extra information from the model through linear combinations of the latent field, say
v = Bx
for a k × n matrix B.

104 / 140
Feature: Linear combinations (cont.)

Two different approaches:
1. The most "correct" way is to do the computations on the enlarged field x̃ = (x, v), but this often leads to a denser precision matrix.
2. The second option is to compute these "offline": conditionally on θ,
Var(v1) = Var(b1ᵀ x) ≈ b1ᵀ Q⁻¹_GMRFapprox b1  and  E(v1) = b1ᵀ E(x),
and approximate the density of v1 with a Normal.

105 / 140
formula = y ~ ClBase4*CTrt + ClAge + CV4 +
    f(Ind, model="iid") + f(rand, model="iid")

## Now I want the posterior for
##   1) 2*CTrt - CV4
##   2) Ind[2] - rand[2]
lc1 = inla.make.lincomb(CTrt = 2, CV4 = -1)
names(lc1) = "lc1"
lc2 = inla.make.lincomb(Ind = c(NA, 1), rand = c(NA, -1))
names(lc2) = "lc2"

## default is to derive the marginals from the lc's without changing
## the latent field
result1 = inla(formula, family="poisson", data = Epil,
               lincomb = c(lc1, lc2))

## but the lincombs can also be included in the latent field for
## increased accuracy...
result2 = inla(formula, family="poisson", data = Epil,
               lincomb = c(lc1, lc2),
               control.inla = list(lincomb.derived.only = FALSE))

106 / 140
- Get the results:

result$summary.lincomb.derived    # results of the default method
result$marginals.lincomb.derived

result$summary.lincomb            # alternative method
result$marginals.lincomb

- Posterior correlation matrix between all the linear combinations:

control.inla = list(lincomb.derived.correlation.matrix = TRUE)
result$misc$lincomb.derived.correlation.matrix

- Many linear combinations at once: use inla.make.lincombs()

107 / 140
A-matrix in the linear predictor (I)

Usual formula:
η = . . .  and  yi ∼ π(yi | ηi, . . .)

108 / 140
A-matrix in the linear predictor (II)

Extended formula:
η = . . . ,  η* = Aη  and  yi ∼ π(yi | ηi*, . . .)

Implemented as:

A = matrix(...)   # or A = sparseMatrix(...)
result = inla(formula, ..., control.predictor = list(A = A))

109 / 140
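A toy sketch of building such an A-matrix: here each observation sees the average of two latent predictor elements, ηi* = (η2i−1 + η2i)/2; the dimensions and weights are illustrative only.

library(Matrix)
m = 10     # number of observations
n = 2 * m  # length of the latent linear predictor
A = sparseMatrix(i = rep(1:m, each = 2), j = 1:n,
                 x = 0.5, dims = c(m, n))
## result = inla(formula, ..., control.predictor = list(A = A))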
A-matrix in the linear predictor (III)

Can really simplify model formulations.
Duplicates, to some extent, the "copy" feature.
Really useful for some models; the A-matrix need not be a square matrix...

110 / 140
Feature: remote computing

For large/huge models, it is more convenient to run the computations on a remote (Linux/Mac) computational server:
inla(...., inla.call="remote")
using ssh (and Cygwin on Windows).

111 / 140
Control statements

The control.xxx statements control various parts of the INLA program:
control.predictor: A - the "A matrix" or "observation matrix" linking the latent field to the data.
control.mode: x, theta, result - gives modes to INLA; restart = TRUE tells INLA to try to improve on the supplied mode.
control.compute: dic, mlik, cpo - compute measures of fit.
control.inla: strategy and int.strategy contain useful advanced features.
Various others - see the help!

112 / 140
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras

113 / 140
Space-varying regression

Number of (insurance-type) losses Nkt in 431 municipalities/regions of Norway in relation to one weather covariate Wkt. The likelihood is
Nkt ∼ Poisson(Akt pkt);  k = 1, . . . , 431;  t = 1, . . . , 10
The model for log pkt is
log pkt = β0 + βk Wkt
where βk is the regression coefficient for each municipality.

114 / 140
Borrow strength..

Few losses in each region means high variability in the estimates. Borrow strength by letting {β1, . . . , β431} be smooth in space:
{β1, . . . , β431} ∼ CAR(τβ)

115 / 140
The data set:

     y  region    W
1    0       1  0.4
2    0       1  0.4
. . .
10   0       1  0.4
11   1       2  0.2
12   0       2  0.2
. . .
20   0       2  0.2

116 / 140
The second argument in f() is the weight, which defaults to 1:
ηi = . . . + wi fi + . . .  is represented as  f(i, w, ...)
No need for a sum-to-zero constraint!

norway = read.table("norway.dat", header=TRUE)
formula = y ~ 1 + f(region, W, model="besag",
                    graph.file="norway.graph", constr=FALSE)
result = inla(formula, family="poisson", data=norway)

117 / 140
Survival models

patient  time    event  age     sex
1        8, 16   1, 1   28, 28  0
2        23, 13  1, 0   48, 48  1
3        22, 18  1, 1   32, 32  0

Times of infection from the time of insertion of catheter on 38 kidney patients using portable dialysis equipment. Two observations for each patient (38 patients). Each time can be an event (infection) or a censoring (no infection).

118 / 140
The Kidney data

The Kidney data frame:

time  event  age  sex  ID
8     1      28   0    1
16    1      28   0    1
23    1      48   1    2
13    0      48   1    2
22    1      32   0    3
28    1      32   0    3

119 / 140
data(Kidney)
formula = inla.surv(time, event) ~ age + sex + f(ID, model="iid")
result1 = inla(formula, family="coxph", data=Kidney)
result2 = inla(formula, family="weibull", data=Kidney)
result3 = inla(formula, family="exponential", data=Kidney)

120 / 140
Outline

INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras

121 / 140
A toy example using copy

State-space model:
yt = xt + vt
xt = 2xt−1 − xt−2 + wt
Rewrite this as
yt = xt + vt
0 = xt − 2xt−1 + xt−2 + wt
and implement it with two families:
1. Observations yt with precision Prec(vt)
2. Observations 0 with precision Prec(wt), or Prec = HIGH

122 / 140
n = 100
m = n - 2
y = sin((1:n)*0.2) + rnorm(n, sd=0.1)

formula = Y ~ f(i, model="iid", initial=-10, fixed=TRUE) +
    f(j, w, copy="i") + f(k, copy="i") + f(l, model="iid") - 1

Y = matrix(NA, n+m, 2)
Y[1:n, 1] = y
Y[1:m + n, 2] = 0

i = c(1:n, 3:n)               # x_t
j = c(rep(NA, n), 3:n - 1)    # x_{t-1}
w = c(rep(NA, n), rep(-2, m)) # weights for j
k = c(rep(NA, n), 3:n - 2)    # x_{t-2}
l = c(rep(NA, n), 1:m)        # w_t

r = inla(formula, data = data.frame(i, j, w, k, l, Y),
         family = c("gaussian", "gaussian"),
         control.data = list(list(), list(initial=10, fixed=TRUE)))

123 / 140
Stochastic Volatility model

[Figure: log of the daily difference of the pound-dollar exchange rate from October 1st, 1981, to June 28th, 1985.]

124 / 140
Stochastic Volatility model

Simple model:
xt | x1, . . . , xt−1, τ, φ ∼ N(φ xt−1, 1/τ)
where |φ| < 1 to ensure a stationary process. Observations are taken to be
yt | x1, . . . , xt, µ ∼ N(0, exp(µ + xt))

125 / 140
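A minimal sketch of fitting this model, assuming the "stochvol" likelihood (listed among the families earlier) models yt ∼ N(0, exp(ηt)), so that the linear predictor is the log-variance µ + xt; the simulated data and set-up are illustrative only:

n = 200
x = as.numeric(arima.sim(n = n, model = list(ar = 0.95), sd = 0.3))
y = rnorm(n, sd = exp((-1 + x)/2))  # true mu = -1
time = 1:n
formula = y ~ 1 + f(time, model = "ar1")
r = inla(formula, family = "stochvol", data = data.frame(y, time))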
Results

Using only the first 50 data points, which makes the problem much harder.

[Figure: posterior marginal for ν = logit(2φ − 1).]

[Figure: posterior marginal for log(κx).]

126 / 140
Using the full dataset

[Figure: the Pound-Dollar data.]
[Figure: mean of xt + µ.]
[Figure: the posterior marginal for the precision.]
[Figure: the posterior marginal for the lag-1 correlation.]
[Figure: predictions for µ + xt+k.]

127-131 / 140
New data model: Student-tν

Now extend the model to use a Student-tν distribution:
yt | x1, . . . , xt ∼ exp(µ/2 + xt/2) × Student-tν / √(ν/(ν − 2))

132 / 140
Student-tν

[Figure: posterior marginal for ν.]
[Figure: predictions.]
[Figure: comparing predictions with Student-tν and Gaussian.]

133-135 / 140
Student-tν

However, there is no support for the Student-tν in the data:
Bayes factor
Deviance Information Criterion

136 / 140
Disease mapping: The BYM model

Data: yi ∼ Poisson(Ei exp(ηi))
Log-relative risk: ηi = ui + vi
Structured component u; unstructured component v; log-precisions log κu and log κv.
A hard case: Insulin Dependent Diabetes Mellitus in 366 districts of Sardinia. Few counts. dim(θ) = 2.

[Figure: maps of the structured and unstructured components.]

137 / 140
[Figure: marginals for θ|y.]

138 / 140

[Figure: marginals for xi|y.]

139 / 140
THANK YOU

140 / 140
