Ben Bolker_Generalized Linear Models Overview and Open Questions.pdf
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous
predictors, and interactions
Response distributions in the exponential family
(binomial, Poisson, and extensions)
Ben Bolker_Generalized Linear Models Overview and Open Questions.pdf
1.
Denitions Estimation InferenceChallenges open questions References
Generalized linear mixed models: overview and
open questions
Ben Bolker
McMaster University, Mathematics Statistics and Biology
12 November 2013
Ben Bolker
GLMMs
2.
Denitions Estimation InferenceChallenges open questions References
Outline
1 Examples and denitions
2 Estimation
Overview
Methods
3 Inference
4 Challenges open questions
Ben Bolker
GLMMs
3.
Denitions Estimation InferenceChallenges open questions References
Outline
1 Examples and denitions
2 Estimation
Overview
Methods
3 Inference
4 Challenges open questions
Ben Bolker
GLMMs
4.
Denitions Estimation InferenceChallenges open questions References
(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous
predictors, and interactions
Response distributions in the exponential family
(binomial, Poisson, and extensions)
Any smooth, monotonic link function
(e.g. logistic, exponential models)
Flexible combinations of blocking factors
(clustering; random eects)
Applications in ecology, neurobiology, behaviour, epidemiology, real
estate, . . .
Ben Bolker
GLMMs
5.
Denitions Estimation InferenceChallenges open questions References
(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous
predictors, and interactions
Response distributions in the exponential family
(binomial, Poisson, and extensions)
Any smooth, monotonic link function
(e.g. logistic, exponential models)
Flexible combinations of blocking factors
(clustering; random eects)
Applications in ecology, neurobiology, behaviour, epidemiology, real
estate, . . .
Ben Bolker
GLMMs
6.
Denitions Estimation InferenceChallenges open questions References
(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous
predictors, and interactions
Response distributions in the exponential family
(binomial, Poisson, and extensions)
Any smooth, monotonic link function
(e.g. logistic, exponential models)
Flexible combinations of blocking factors
(clustering; random eects)
Applications in ecology, neurobiology, behaviour, epidemiology, real
estate, . . .
Ben Bolker
GLMMs
7.
Denitions Estimation InferenceChallenges open questions References
(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous
predictors, and interactions
Response distributions in the exponential family
(binomial, Poisson, and extensions)
Any smooth, monotonic link function
(e.g. logistic, exponential models)
Flexible combinations of blocking factors
(clustering; random eects)
Applications in ecology, neurobiology, behaviour, epidemiology, real
estate, . . .
Ben Bolker
GLMMs
8.
Denitions Estimation InferenceChallenges open questions References
Examples
ecology survival, predation, etc. (experimental plots)
genomics presence/absence of polymorphisms, gene expression
(individuals)
educational assessment student scores (students × teachers)
psychology/sensometrics decisions, responses to stimuli
(individuals)
epidemiology disease prevalence (postal codes, provinces, countries)
Ben Bolker
GLMMs
9.
Denitions Estimation InferenceChallenges open questions References
Examples
ecology survival, predation, etc. (experimental plots)
genomics presence/absence of polymorphisms, gene expression
(individuals)
educational assessment student scores (students × teachers)
psychology/sensometrics decisions, responses to stimuli
(individuals)
epidemiology disease prevalence (postal codes, provinces, countries)
Ben Bolker
GLMMs
10.
Denitions Estimation InferenceChallenges open questions References
Examples
ecology survival, predation, etc. (experimental plots)
genomics presence/absence of polymorphisms, gene expression
(individuals)
educational assessment student scores (students × teachers)
psychology/sensometrics decisions, responses to stimuli
(individuals)
epidemiology disease prevalence (postal codes, provinces, countries)
Ben Bolker
GLMMs
11.
Denitions Estimation InferenceChallenges open questions References
Examples
ecology survival, predation, etc. (experimental plots)
genomics presence/absence of polymorphisms, gene expression
(individuals)
educational assessment student scores (students × teachers)
psychology/sensometrics decisions, responses to stimuli
(individuals)
epidemiology disease prevalence (postal codes, provinces, countries)
Ben Bolker
GLMMs
12.
Denitions Estimation InferenceChallenges open questions References
Examples
ecology survival, predation, etc. (experimental plots)
genomics presence/absence of polymorphisms, gene expression
(individuals)
educational assessment student scores (students × teachers)
psychology/sensometrics decisions, responses to stimuli
(individuals)
epidemiology disease prevalence (postal codes, provinces, countries)
Ben Bolker
GLMMs
13.
Denitions Estimation InferenceChallenges open questions References
Coral protection by symbionts
none shrimp crabs both
Number of predation events
Symbionts
Number
of
blocks
0
2
4
6
8
10
1
2
0
1
2
0
2
0
1
2
Ben Bolker
GLMMs
Denitions Estimation InferenceChallenges open questions References
Technical denition
Yi
|{z}
response
∼
conditional
distribution
z}|{
Distr (g
−1(ηi )
| {z }
inverse
link
function
, φ
|{z}
scale
parameter
)
η
|{z}
linear
predictor
= Xβ
|{z}
xed
eects
+ Zb
|{z}
random
eects
b
|{z}
conditional
modes
∼ MVN(0, Σ(θ)
|{z}
variance-
covariance
matrix
)
Ben Bolker
GLMMs
18.
Denitions Estimation InferenceChallenges open questions References
Outline
1 Examples and denitions
2 Estimation
Overview
Methods
3 Inference
4 Challenges open questions
Ben Bolker
GLMMs
19.
Denitions Estimation InferenceChallenges open questions References
Overview
Maximum likelihood estimation
L(Yi |θ, β)
| {z }
likelihood
=
Z
· · ·
Z
L(Yi |β, b)
| {z }
data|random eects
× L(b|Σ(θ))
| {z }
random eects
d b
Best t is a compromise between two components
(consistency of data with β and random eects, consistency of
random eect with RE distribution)
Ben Bolker
GLMMs
20.
Denitions Estimation InferenceChallenges open questions References
Overview
Integrated (marginal) likelihood
−10 −5 0 5 10
0.0
0.2
0.4
0.6
0.8
1.0
conditional mode value (u)
Scaled
probability
L(b|σ2
)
L(x|b, β)
Lprod
Ben Bolker
GLMMs
Denitions Estimation InferenceChallenges open questions References
Methods
Estimation methods
deterministic: precision vs. computational cost:
penalized quasi-likelihood, Laplace approximation, adaptive
Gauss-Hermite quadrature (Breslow, 2004) . . .
stochastic (Monte Carlo): frequentist and Bayesian (Booth
and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)
Ben Bolker
GLMMs
23.
Denitions Estimation InferenceChallenges open questions References
Methods
Penalized quasi-likelihood (PQL)
alternate steps of estimating GLM using known RE variances
to calculate weights; estimate LMMs given GLM t (Breslow,
2004)
exible (allows spatial/temporal correlations, crossed REs)
biased for small unit samples (e.g. counts 5, binary or
low-survival data)
widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈
90% of small-unit-sample cases
descendants: higher-order PQL, hierarchical GLM
Ben Bolker
GLMMs
24.
Denitions Estimation InferenceChallenges open questions References
Methods
Penalized quasi-likelihood (PQL)
alternate steps of estimating GLM using known RE variances
to calculate weights; estimate LMMs given GLM t (Breslow,
2004)
exible (allows spatial/temporal correlations, crossed REs)
biased for small unit samples (e.g. counts 5, binary or
low-survival data)
widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈
90% of small-unit-sample cases
descendants: higher-order PQL, hierarchical GLM
Ben Bolker
GLMMs
25.
Denitions Estimation InferenceChallenges open questions References
Methods
Penalized quasi-likelihood (PQL)
alternate steps of estimating GLM using known RE variances
to calculate weights; estimate LMMs given GLM t (Breslow,
2004)
exible (allows spatial/temporal correlations, crossed REs)
biased for small unit samples (e.g. counts 5, binary or
low-survival data)
widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈
90% of small-unit-sample cases
descendants: higher-order PQL, hierarchical GLM
Ben Bolker
GLMMs
26.
Denitions Estimation InferenceChallenges open questions References
Methods
Penalized quasi-likelihood (PQL)
alternate steps of estimating GLM using known RE variances
to calculate weights; estimate LMMs given GLM t (Breslow,
2004)
exible (allows spatial/temporal correlations, crossed REs)
biased for small unit samples (e.g. counts 5, binary or
low-survival data)
widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈
90% of small-unit-sample cases
descendants: higher-order PQL, hierarchical GLM
Ben Bolker
GLMMs
27.
Denitions Estimation InferenceChallenges open questions References
Methods
Breslow (2004) on PQL
As usual when software for complicated statistical
inference procedures is broadly disseminated, there is
potential for abuse and misinterpretation. In spite of the
fact that PQL was initially advertised as a procedure for
approximate inference in GLMMs, and its tendency to
give seriously biased estimates of variance components
and a fortiori regression parameters with binary outcome
data was emphasized in multiple publications [5, 6, 24],
some statisticians seemed to ignore these warnings and to
think of PQL as synonymous with GLMM.
Ben Bolker
GLMMs
28.
Denitions Estimation InferenceChallenges open questions References
Methods
Laplace approximation
for given β, θ (RE parameters), nd conditional modes by
penalized, iterated reweighted least squares;
then use second-order Taylor expansion around the conditional
modes
more accurate than PQL
reasonably fast and exible
lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder)
Ben Bolker
GLMMs
29.
Denitions Estimation InferenceChallenges open questions References
Methods
Gauss-Hermite quadrature (GHQ)
as above, but compute additional terms in the integral
(typically 8, but often up to 20)
most accurate
slowest, hence not exible (23 RE at most, maybe only 1)
lme4:glmer, glmmML, repeated
Ben Bolker
GLMMs
30.
Denitions Estimation InferenceChallenges open questions References
Methods
Adaptive vs. non-adaptive GHQ
Adaptive GHQ is more expensive at a given n,
but makes up for it in accuracy
Ben Bolker
GLMMs
31.
Denitions Estimation InferenceChallenges open questions References
Methods
Stochastic approaches
Mostly Bayesians (Bayesian computation handles
high-dimensional integration)
various avours: Gibbs sampling, MCMC, MCEM, etc.
generally slower but more exible
simplies many inferential problems
must specify priors, assess convergence/error
specialized: glmmAK, MCMCglmm (Hadeld, 2010), INLA,
bernor
general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS),
R2jags, rjags (JAGS)
Ben Bolker
GLMMs
32.
Denitions Estimation InferenceChallenges open questions References
Methods
Estimation: example (McKeon et al., 2012)
Log−odds of predation
−6 −4 −2 0 2
Symbiont
Crab vs. Shrimp
Added symbiont
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
GLM (fixed)
GLM (pooled)
PQL
Laplace
AGQ
Ben Bolker
GLMMs
33.
Denitions Estimation InferenceChallenges open questions References
Outline
1 Examples and denitions
2 Estimation
Overview
Methods
3 Inference
4 Challenges open questions
Ben Bolker
GLMMs
34.
Denitions Estimation InferenceChallenges open questions References
Wald tests
Wald tests (e.g. typical results of summary)
based on information matrix
assume quadratic log-likelihood surface
exact for regular linear models;
only asymptotically OK for GLM(M)s
computationally cheap
approximation is sometimes awful (Hauck-Donner eect)
Ben Bolker
GLMMs
Denitions Estimation InferenceChallenges open questions References
Likelihood ratio tests
better, but still have to deal with two nite-size problems:
when scale parameter is free (Gamma, etc.), deviance is ∼ F
rather than ∼ χ2, with poorly dened denominator df
in GLM(M) case, numerator is only asymptotically χ2 anyway
Bartlett corrections (Cordeiro and Ferrari, 1998; Cordeiro
et al., 1994), higher-order asymptotics: cond [neither extended
to GLMMs!]
Prole condence intervals: moderately dicult/fragile
Ben Bolker
GLMMs
37.
Denitions Estimation InferenceChallenges open questions References
Parametric bootstrapping
t null model to data
simulate data from null model
t null and working model, compute likelihood dierence
repeat to estimate null distribution
should be OK but ??? not well tested
(assumes estimated parameters are suciently good)
Ben Bolker
GLMMs
38.
Denitions Estimation InferenceChallenges open questions References
Parametric bootstrap results
True p value
Inferred
p
value
0.02
0.04
0.06
0.08
0.02 0.06
Osm Cu
H2S
0.02 0.06
0.02
0.04
0.06
0.08
Anoxia
Ben Bolker
GLMMs
39.
Denitions Estimation InferenceChallenges open questions References
Bayesian approaches
Provided that we have a good sample from the posterior
distribution (Markov chains have converged etc. etc.) we get
most of the inferences we want for free by summarizing the
marginal posteriors
Model selection is still an open question: reversible-jump
MCMC, deviance information criterion
Ben Bolker
GLMMs
40.
Denitions Estimation InferenceChallenges open questions References
Outline
1 Examples and denitions
2 Estimation
Overview
Methods
3 Inference
4 Challenges open questions
Ben Bolker
GLMMs
41.
Denitions Estimation InferenceChallenges open questions References
Next steps
Dealing with complex random eects:
regularization, model selection, penalized methods
(lasso/fence)
Flexible correlation structures:
spatial, temporal, phylogenetic
hybrid improved MCMC methods (mcmcsamp, Stan)
Reliable assessment of out-of-sample performance
Ben Bolker
GLMMs
Denitions Estimation InferenceChallenges open questions References
Acknowledgments
lme4: Doug Bates, Martin
Mächler, Steve Walker
Data: Adrian Stier (UBC/OSU),
Sea McKeon (Smithsonian),
David Julian (UF), Jada-Simone
White (Univ Hawai'i)
NSERC (Discovery)
SHARCnet
Ben Bolker
GLMMs
44.
Denitions Estimation InferenceChallenges open questions References
Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265285.
doi:10.1111/1467-9868.00176.
Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle
symposium in biostatistics: Analysis of correlated data, pages 122. Springer. ISBN 0387208623.
Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference,
71(1-2):261269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.
Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue
Internationale de Statistique, 62(2):257274. ISSN 03067734. doi:10.2307/1403512.
Hadeld, J.D., 2010. Journal of Statistical Software, 33(2):122. ISSN 1548-7660.
McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):10951103. ISSN 0029-8549.
doi:10.1007/s00442-012-2275-2.
Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289296.
doi:10.1007/BF00140873.
Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356362. ISSN 0012-9658.
Sung, Y.J., 2007. The Annals of Statistics, 35(3):9901011. ISSN 0090-5364.
doi:10.1214/009053606000001389.
Ben Bolker
GLMMs