Stats sem 2013

Definitions

Estimation

Inference

Challenges and open questions

Generalized linear mixed models: overview and open questions
Ben Bolker

McMaster University, Mathematics & Statistics and Biology
12 November 2013

Ben Bolker
GLMMs

Outline
1. Examples and definitions
2. Estimation
     Overview
     Methods
3. Inference
4. Challenges and open questions

(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous predictors, and interactions
Response distributions in the exponential family (binomial, Poisson, and extensions)
Any smooth, monotonic link function (e.g. logistic, exponential models)
Flexible combinations of blocking factors (clustering; random effects)
Applications in ecology, neurobiology, behaviour, epidemiology, real estate, ...
Examples
ecology: survival, predation, etc. (experimental plots)
genomics: presence/absence of polymorphisms, gene expression (individuals)
educational assessment: student scores (students × teachers)
psychology/sensometrics: decisions, responses to stimuli (individuals)
epidemiology: disease prevalence (postal codes, provinces, countries)

Coral protection by symbionts
[Figure: number of blocks by number of predation events, for the symbiont treatments none, shrimp, crabs, and both.]

Environmental stress: Glycera cell survival
[Figure: panels by Anoxia/Normoxia and Osm = 12.8, 22.4, 32, 41.6, 51.2; axes H2S (0, 0.03, 0.1, 0.32) and Copper (0, 33.3, 66.6, 133.3); scale from 0.0 to 1.0.]

Arabidopsis response to fertilization and clipping
[Figure: Log(1+fruit set) for unclipped and clipped plants; panel: nutrient (1, 8), color: genotype.]
Coral demography
[Figure: mortality probability versus previous size (cm); panels: Before, Experimental; treatment: Present, Removed.]

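Technical definition
\[
\begin{aligned}
Y_i &\sim \mathrm{Distr}\left(g^{-1}(\eta_i),\ \phi\right) \\
\eta &= X\beta + Zb \\
b &\sim \mathrm{MVN}\left(0,\ \Sigma(\theta)\right)
\end{aligned}
\]
Y_i: response, with conditional distribution Distr; g^{-1}: inverse link function; φ: scale parameter
η: linear predictor; Xβ: fixed effects; Zb: random effects
b: conditional modes; Σ(θ): variance-covariance matrix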
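
To make the three lines of the definition concrete, here is a minimal simulation sketch in Python. It assumes the simplest case: a Bernoulli response with a logit link, one continuous predictor, and a single random intercept per group (so Σ(θ) reduces to one variance σ², and there is no free scale parameter φ). The function name `simulate_glmm` and all parameter values are hypothetical, chosen only for illustration.

```python
import math
import random

def inv_logit(eta):
    """Inverse logit link g^{-1}: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_glmm(n_groups, n_per_group, beta0, beta1, sigma, seed=1):
    """Simulate (group, x, y) rows from a random-intercept logistic GLMM."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        b_g = rng.gauss(0.0, sigma)            # b ~ N(0, sigma^2), one per group
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)         # hypothetical continuous predictor
            eta = beta0 + beta1 * x + b_g      # eta = X beta + Z b
            y = 1 if rng.random() < inv_logit(eta) else 0  # Y ~ Bernoulli(g^{-1}(eta))
            rows.append((g, x, y))
    return rows

# Illustrative parameter values only (not taken from any of the data sets shown).
data = simulate_glmm(n_groups=10, n_per_group=20, beta0=-0.5, beta1=1.0, sigma=1.0)
print(len(data), "observations,", sum(y for _, _, y in data), "successes")
```

Larger values of `sigma` make groups differ more from each other, which is the clustering that the random effects are meant to capture.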

Overview

Maximum likelihood estimation
\[
L(Y_i \mid \theta, \beta) = \int \cdots \int L(Y_i \mid \beta, b) \times L(b \mid \Sigma(\theta)) \, db
\]
left-hand side: likelihood; L(Y_i | β, b): data | random effects; L(b | Σ(θ)): random effects
Best fit is a compromise between two components (consistency of data with β and random effects, consistency of random effect with RE distribution)
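
For a single cluster with one random intercept the integral is one-dimensional and can be evaluated by brute force. The sketch below assumes a hypothetical cluster with k successes in n Bernoulli trials, a logit link, and b ~ N(0, σ²); it multiplies the two components (data | random effect, and random effect) on a grid and sums them with the trapezoid rule. All names and numbers are illustrative assumptions.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def cond_lik(b, k, n, beta0):
    """L(y | beta, b): k successes in n Bernoulli trials, logit link."""
    p = inv_logit(beta0 + b)
    return p ** k * (1.0 - p) ** (n - k)

def normal_pdf(b, sigma):
    """L(b | sigma): density of the random intercept."""
    return math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_lik(k, n, beta0, sigma, half_width=10.0, steps=4000):
    """Trapezoid-rule approximation of the integral over b in +/- half_width * sigma."""
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        b = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0   # end points get half weight
        total += w * cond_lik(b, k, n, beta0) * normal_pdf(b, sigma)
    return total * h

# Hypothetical cluster: 8 successes out of 10, beta0 = 0, sigma = 1.5.
print(marginal_lik(k=8, n=10, beta0=0.0, sigma=1.5))
```

This grid approach does not scale beyond one or two random effects per cluster, which is why the approximations in the Methods section exist.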

Integrated (marginal) likelihood
[Figure: scaled probability versus conditional mode value (u), with curves L(x|b, β), L(b|σ²), and their product (L prod).]

Shrinkage
[Figure: Arabidopsis block estimates; mean(log) fruit set by genotype.]


Methods

Estimation methods
deterministic: precision vs. computational cost: penalized quasi-likelihood, Laplace approximation, adaptive Gauss-Hermite quadrature (Breslow, 2004) ...
stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)

Penalized quasi-likelihood (PQL)
alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM fit (Breslow, 2004)
flexible (allows spatial/temporal correlations, crossed REs)
biased for small unit samples (e.g. counts < 5, binary or low-survival data)
widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈ 90% of small-unit-sample cases
descendants: higher-order PQL, hierarchical GLM

Breslow (2004) on PQL
As usual when software for complicated statistical
inference procedures is broadly disseminated, there is
potential for abuse and misinterpretation. In spite of the
fact that PQL was initially advertised as a procedure for
approximate inference in GLMMs, and its tendency to
give seriously biased estimates of variance components
and a fortiori regression parameters with binary outcome
data was emphasized in multiple publications [5, 6, 24],
some statisticians seemed to ignore these warnings and to
think of PQL as synonymous with GLMM.

Laplace approximation
for given β, θ (RE parameters), find conditional modes by penalized, iteratively reweighted least squares; then use second-order Taylor expansion around the conditional modes
more accurate than PQL
reasonably fast and flexible
lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder)
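
The two steps above can be sketched for the simplest possible case. This is a toy illustration, not the algorithm used by any of the listed packages: it assumes one cluster with k successes in n Bernoulli trials, a logit link, and a scalar random intercept b ~ N(0, σ²), and it finds the conditional mode by plain Newton iteration (the scalar analogue of penalized IRLS). All names and values are hypothetical.

```python
import math

def log_joint(b, k, n, beta0, sigma):
    """log[ L(y | beta, b) * L(b | sigma) ] for one cluster (logit link)."""
    eta = beta0 + b
    # log p = -log(1 + exp(-eta)); log(1 - p) = -log(1 + exp(eta))
    log_lik = -k * math.log1p(math.exp(-eta)) - (n - k) * math.log1p(math.exp(eta))
    log_prior = -0.5 * (b / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return log_lik + log_prior

def conditional_mode(k, n, beta0, sigma, tol=1e-10, max_iter=100):
    """Newton iteration for the b that maximises log_joint; also returns the curvature."""
    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = k - n * p - b / sigma ** 2              # first derivative in b
        hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2   # second derivative (always < 0)
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    curvature = n * p * (1.0 - p) + 1.0 / sigma ** 2   # minus the second derivative at the mode
    return b, curvature

def laplace_marginal_lik(k, n, beta0, sigma):
    """Second-order Taylor expansion of log_joint around the mode, integrated exactly."""
    b_hat, curvature = conditional_mode(k, n, beta0, sigma)
    return math.exp(log_joint(b_hat, k, n, beta0, sigma)) * math.sqrt(2.0 * math.pi / curvature)

# Hypothetical cluster: 8 successes out of 10, beta0 = 0, sigma = 1.5.
b_hat, curvature = conditional_mode(8, 10, 0.0, 1.5)
approx = laplace_marginal_lik(8, 10, 0.0, 1.5)
print(b_hat, approx)
```

In a full fit this calculation would sit inside an outer optimisation over β and θ, repeated for every cluster.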

Gauss-Hermite quadrature (GHQ)
as above, but compute additional terms in the integral (typically 8, but often up to 20)
most accurate
slowest, hence not flexible (2–3 RE at most, maybe only 1)
lme4:glmer, glmmML, repeated
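
As a rough illustration of the quadrature idea, the sketch below builds a 5-point Gauss-Hermite rule from its closed form (5 points are an assumption made to keep the code dependency-free; the slide's typical value is 8) and uses it to average a conditional likelihood over b ~ N(0, σ²). This is the plain, non-adaptive version: nodes are placed according to the random-effect distribution only. The model (k successes in n Bernoulli trials, logit link, scalar random intercept) and all names are hypothetical.

```python
import math

def gauss_hermite_5():
    """Nodes and weights of the 5-point rule for integrals of exp(-x^2) * g(x)."""
    r10 = math.sqrt(10.0)
    x_inner = math.sqrt((5.0 - r10) / 2.0)   # roots of H_5 satisfy x^2 = (5 -/+ sqrt(10)) / 2
    x_outer = math.sqrt((5.0 + r10) / 2.0)
    nodes = [-x_outer, -x_inner, 0.0, x_inner, x_outer]

    def h4(x):  # physicists' Hermite polynomial H_4
        return 16.0 * x ** 4 - 48.0 * x ** 2 + 12.0

    # w_i = 2^(n-1) n! sqrt(pi) / (n^2 H_{n-1}(x_i)^2) with n = 5
    weights = [2.0 ** 4 * math.factorial(5) * math.sqrt(math.pi) / (25.0 * h4(x) ** 2)
               for x in nodes]
    return nodes, weights

def ghq_expectation(g, sigma):
    """Approximate E[g(b)] for b ~ N(0, sigma^2) via the substitution b = sqrt(2) * sigma * x."""
    nodes, weights = gauss_hermite_5()
    total = sum(w * g(math.sqrt(2.0) * sigma * x) for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)

def cond_lik(b, k, n, beta0):
    """L(y | beta, b): k successes in n Bernoulli trials, logit link."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    return p ** k * (1.0 - p) ** (n - k)

def ghq_marginal_lik(k, n, beta0, sigma):
    return ghq_expectation(lambda b: cond_lik(b, k, n, beta0), sigma)

# Hypothetical cluster: 8 successes out of 10, beta0 = 0, sigma = 1.5.
print(ghq_marginal_lik(8, 10, 0.0, 1.5))
```

The cost is one likelihood evaluation per node per cluster; with q random effects per cluster a product rule needs (number of nodes)^q evaluations, which is why the method is limited to very few random effects.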

Adaptive vs. non-adaptive GHQ

Adaptive GHQ is more expensive at a given n, but makes up for it in accuracy
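
A small sketch of the difference, under the same toy assumptions as a single-cluster random-intercept logistic model (k successes in n trials, b ~ N(0, σ²)): the non-adaptive rule places its nodes around 0 on the scale of σ, while the adaptive rule first finds the conditional mode and curvature (the extra cost) and then centres and scales the nodes there. A 5-point rule is used for both; function names and values are hypothetical.

```python
import math

def gauss_hermite_5():
    """Closed-form 5-point Gauss-Hermite rule (weight function exp(-x^2))."""
    r10 = math.sqrt(10.0)
    xi, xo = math.sqrt((5.0 - r10) / 2.0), math.sqrt((5.0 + r10) / 2.0)
    nodes = [-xo, -xi, 0.0, xi, xo]
    h4 = lambda x: 16.0 * x ** 4 - 48.0 * x ** 2 + 12.0
    weights = [16.0 * 120.0 * math.sqrt(math.pi) / (25.0 * h4(x) ** 2) for x in nodes]
    return nodes, weights

def log_lik(b, k, n, beta0):
    eta = beta0 + b
    return -k * math.log1p(math.exp(-eta)) - (n - k) * math.log1p(math.exp(eta))

def log_prior(b, sigma):
    return -0.5 * (b / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def conditional_mode(k, n, beta0, sigma, tol=1e-10, max_iter=100):
    """Newton iteration for the mode of log_lik + log_prior, plus the curvature there."""
    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        step = (k - n * p - b / sigma ** 2) / (-n * p * (1.0 - p) - 1.0 / sigma ** 2)
        b -= step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    return b, n * p * (1.0 - p) + 1.0 / sigma ** 2

def plain_ghq(k, n, beta0, sigma):
    """Non-adaptive: nodes at sqrt(2) * sigma * x, centred on 0."""
    nodes, weights = gauss_hermite_5()
    total = sum(w * math.exp(log_lik(math.sqrt(2.0) * sigma * x, k, n, beta0))
                for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)

def adaptive_ghq(k, n, beta0, sigma):
    """Adaptive: nodes centred at the conditional mode, scaled by the local curvature."""
    nodes, weights = gauss_hermite_5()
    b_hat, curvature = conditional_mode(k, n, beta0, sigma)
    scale = math.sqrt(2.0 / curvature)
    total = 0.0
    for x, w in zip(nodes, weights):
        b = b_hat + scale * x
        # exp(x^2) undoes the rule's built-in exp(-x^2) weight
        total += w * math.exp(x * x + log_lik(b, k, n, beta0) + log_prior(b, sigma))
    return scale * total

# Hypothetical cluster: 8 successes out of 10, beta0 = 0, sigma = 1.5.
print(plain_ghq(8, 10, 0.0, 1.5), adaptive_ghq(8, 10, 0.0, 1.5))
```

With a single node the adaptive rule reduces to the Laplace approximation, so the two methods form one family indexed by the number of nodes.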
Stochastic approaches
Mostly Bayesian (Bayesian computation handles high-dimensional integration)
various flavours: Gibbs sampling, MCMC, MCEM, etc.
generally slower but more flexible
simplifies many inferential problems
must specify priors, assess convergence/error
specialized: bernor, glmmAK, MCMCglmm (Hadfield, 2010), INLA
general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS)
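
The simplest stochastic counterpart to the deterministic integrals is plain Monte Carlo integration: draw the random effect from its distribution and average the conditional likelihood. The sketch below does only that, for a hypothetical single cluster (k successes in n Bernoulli trials, logit link, b ~ N(0, σ²)). It is not Gibbs sampling, MCMC, or MCEM, and it is not how any of the packages listed above work; it only shows why sampling sidesteps explicit integration.

```python
import math
import random

def cond_lik(b, k, n, beta0):
    """L(y | beta, b): k successes in n Bernoulli trials, logit link."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    return p ** k * (1.0 - p) ** (n - k)

def mc_marginal_lik(k, n, beta0, sigma, n_draws=100000, seed=1):
    """Average of L(y | beta, b) over draws b ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += cond_lik(rng.gauss(0.0, sigma), k, n, beta0)
    return total / n_draws

# Hypothetical cluster: 8 successes out of 10, beta0 = 0, sigma = 1.5.
estimate = mc_marginal_lik(8, 10, 0.0, 1.5)
print(estimate)
```

The Monte Carlo error shrinks only as one over the square root of the number of draws, which is one reason these approaches are generally slower.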

Estimation: example (McKeon et al., 2012)
[Figure: log-odds of predation for the terms Symbiont, Crab vs. Shrimp, and Added symbiont, estimated by GLM (fixed), GLM (pooled), PQL, Laplace, and AGQ.]

Wald tests
Wald tests (e.g. typical results of summary)
based on information matrix
assume quadratic log-likelihood surface
exact for regular linear models; only asymptotically OK for GLM(M)s
computationally cheap
approximation is sometimes awful (Hauck-Donner effect)
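
For reference, the arithmetic behind a Wald test is short. The sketch below assumes an estimate and a standard error (taken from the inverse information matrix) are already available; the numbers in the example are hypothetical, not results from any model in these slides.

```python
import math

def wald_test(estimate, std_error, null_value=0.0):
    """Wald z statistic, two-sided normal p-value, and 95% confidence interval."""
    z = (estimate - null_value) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))     # P(|Z| > |z|) for standard normal Z
    half_width = 1.959963984540054 * std_error       # 97.5% normal quantile
    return z, p_value, (estimate - half_width, estimate + half_width)

# Hypothetical coefficient on the log-odds scale.
z, p, ci = wald_test(estimate=-3.8, std_error=1.5)
print(z, p, ci)
```

The symmetric interval is exactly where the quadratic-surface assumption enters: if the log-likelihood is strongly asymmetric around the estimate, both the interval and the p-value can be misleading.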

2D profiles for coral predation
[Figure: scatter plot matrix of profiles for .sig01, (Intercept), tttcrabs, tttshrimp, and tttboth.]

Likelihood ratio tests

better, but still have to deal with two finite-size problems:
  when scale parameter is free (Gamma, etc.), deviance is ∼ F rather than ∼ χ², with poorly defined denominator df
  in GLM(M) case, numerator is only asymptotically χ² anyway
Bartlett corrections (Cordeiro and Ferrari, 1998; Cordeiro et al., 1994), higher-order asymptotics: cond [neither extended to GLMMs!]
Profile confidence intervals: moderately difficult/fragile
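
The asymptotic version of the test is a one-liner once the two maximised log-likelihoods are known. The sketch below covers only the one-degree-of-freedom case (two nested models differing by a single parameter), where the χ² tail probability has a closed form; the log-likelihood values are hypothetical. It applies none of the finite-size corrections discussed above.

```python
import math

def lrt_df1(loglik_null, loglik_full):
    """Likelihood ratio statistic and asymptotic chi-square(1) p-value."""
    stat = 2.0 * (loglik_full - loglik_null)
    # For 1 df: P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical maximised log-likelihoods for nested null and full models.
stat, p = lrt_df1(loglik_null=-120.4, loglik_full=-117.1)
print(stat, p)
```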

Parametric bootstrapping

fit null model to data
simulate data from null model
fit null and working model, compute likelihood difference
repeat to estimate null distribution
should be OK but ??? not well tested (assumes estimated parameters are sufficiently good)
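
The steps above can be sketched end to end on a deliberately simple stand-in. To keep both fits in closed form, the example below is not a GLMM: the null model is a single binomial probability shared by two groups and the working model gives each group its own probability. With a real GLMM the two `fit` steps would be calls to a mixed-model fitter, but the loop has the same shape. All names and data values are hypothetical.

```python
import math
import random

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def binom_loglik(k, n, p):
    """Binomial log-likelihood up to a constant that cancels in the ratio."""
    return xlogy(k, p) + xlogy(n - k, 1.0 - p)

def lr_stat(k1, n1, k2, n2):
    """2 * (loglik of working model - loglik of null model), both at their MLEs."""
    p0 = (k1 + k2) / (n1 + n2)                       # null: common probability
    ll_null = binom_loglik(k1, n1, p0) + binom_loglik(k2, n2, p0)
    ll_full = binom_loglik(k1, n1, k1 / n1) + binom_loglik(k2, n2, k2 / n2)
    return max(0.0, 2.0 * (ll_full - ll_null))

def parametric_bootstrap_p(k1, n1, k2, n2, n_boot=999, seed=1):
    rng = random.Random(seed)
    observed = lr_stat(k1, n1, k2, n2)
    p0 = (k1 + k2) / (n1 + n2)                       # step 1: fit null model to data
    exceed = 0
    for _ in range(n_boot):                          # step 4: repeat
        s1 = sum(rng.random() < p0 for _ in range(n1))   # step 2: simulate from null
        s2 = sum(rng.random() < p0 for _ in range(n2))
        if lr_stat(s1, n1, s2, n2) >= observed:      # step 3: refit both, compare
            exceed += 1
    return (exceed + 1) / (n_boot + 1)               # bootstrap p-value

# Hypothetical data: 3/20 successes in one group, 9/20 in the other.
print(parametric_bootstrap_p(3, 20, 9, 20))
```

The `+ 1` in numerator and denominator counts the observed data set as one draw from the null, which keeps the estimated p-value away from exactly zero.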

Parametric bootstrap results
[Figure: inferred p value versus true p value; panels: Osm, Cu, H2S, Anoxia.]
Bayesian approaches

Provided that we have a good sample from the posterior
distribution (Markov chains have converged etc. etc.) we get
most of the inferences we want for free by summarizing the
marginal posteriors
Model selection is still an open question: reversible-jump
MCMC, deviance information criterion

Next steps

Dealing with complex random effects: regularization, model selection, penalized methods (lasso/fence)
Flexible correlation structures: spatial, temporal, phylogenetic
hybrid and improved MCMC methods (mcmcsamp, Stan)
Reliable assessment of out-of-sample performance

Glycera estimates
[Figure: estimated effect on survival for Osm, Cu, H2S, Anoxia, and their two-, three-, and four-way interactions, compared across MCMCglmm, glmer(OD:2), glmer(OD), glmmML, and glmer.]

Acknowledgments

lme4: Doug Bates, Martin Mächler, Steve Walker
Data: Adrian Stier (UBC/OSU), Sea McKeon (Smithsonian), David Julian (UF), Jada-Simone White (Univ Hawai'i)
NSERC (Discovery)
SHARCnet

References

Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265–285. doi:10.1111/1467-9868.00176.
Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.
Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261–269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.
Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de Statistique, 62(2):257–274. ISSN 03067734. doi:10.2307/1403512.
Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.
McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095–1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.
Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289–296. doi:10.1007/BF00140873.
Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356–362. ISSN 0012-9658.
Sung, Y.J., 2007. The Annals of Statistics, 35(3):990–1011. ISSN 0090-5364. doi:10.1214/009053606000001389.
