Definitions Estimation Inference Challenges & open questions References
Generalized linear mixed models: overview and
open questions
Ben Bolker
McMaster University, Mathematics & Statistics and Biology
12 November 2013
Ben Bolker
GLMMs
Outline
1 Examples and definitions
2 Estimation
   Overview
   Methods
3 Inference
4 Challenges & open questions
(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous predictors, and interactions
Response distributions in the exponential family (binomial, Poisson, and extensions)
Any smooth, monotonic link function (e.g. logistic, exponential models)
Flexible combinations of blocking factors (clustering; random effects)
Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . .
Examples
ecology: survival, predation, etc. (experimental plots)
genomics: presence/absence of polymorphisms, gene expression (individuals)
educational assessment: student scores (students × teachers)
psychology/sensometrics: decisions, responses to stimuli (individuals)
epidemiology: disease prevalence (postal codes, provinces, countries)
Coral protection by symbionts
[Figure: number of blocks (0 to 10) by number of predation events (0, 1, 2) for each symbiont treatment (none, shrimp, crabs, both).]
Environmental stress: Glycera cell survival
[Figure: response on a 0.0 to 1.0 scale against H2S (0, 0.03, 0.1, 0.32) and copper (0, 33.3, 66.6, 133.3); panels: Osm = 12.8, 22.4, 32, 41.6, 51.2 under normoxia and under anoxia.]
Arabidopsis response to fertilization & clipping
[Figure: log(1 + fruit set) (0 to 5) for unclipped vs. clipped plants; panel: nutrient (1, 8), color: genotype.]
Coral demography
[Figure: mortality probability (0 to 1) against previous size (cm, 0 to 50); panels: Before, Experimental; treatment: Present vs. Removed.]
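Technical definition

$$
\underbrace{Y_i}_{\text{response}} \sim \overbrace{\text{Distr}}^{\text{conditional distribution}}\big(\underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\ \underbrace{\phi}_{\text{scale parameter}}\big)
$$

$$
\underbrace{\eta}_{\text{linear predictor}} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zb}_{\text{random effects}}
$$

$$
\underbrace{b}_{\text{conditional modes}} \sim \text{MVN}\big(0,\ \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}}\big)
$$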
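To make the definition concrete, here is a minimal simulation sketch for one special case: a Bernoulli response, a logit link, one continuous predictor, and a single scalar random intercept per group. The function name, the parameter values, and the uniform predictor are illustrative assumptions, not taken from the talk.

```python
import math
import random


def simulate_glmm(n_groups, n_per_group, beta0, beta1, sigma_b, seed=1):
    """Simulate binary responses from a random-intercept logistic GLMM.

    eta = X beta + Z b with b_j ~ Normal(0, sigma_b^2), and
    Y_ij ~ Bernoulli(g^{-1}(eta_ij)) where g is the logit link.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        b_j = rng.gauss(0.0, sigma_b)          # random effect for group j
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)         # one continuous predictor (assumed)
            eta = beta0 + beta1 * x + b_j      # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))  # inverse link function
            y = 1 if rng.random() < mu else 0  # draw from the conditional distribution
            rows.append((j, x, y))
    return rows


data = simulate_glmm(n_groups=10, n_per_group=8, beta0=-0.5, beta1=1.0, sigma_b=1.0)
print(len(data), data[0])
```

Each row is a (group, predictor, response) triple; setting sigma_b close to zero collapses the model to an ordinary logistic regression.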
Overview
Maximum likelihood estimation
$$
\underbrace{L(Y_i \mid \theta, \beta)}_{\text{likelihood}} = \int \cdots \int \underbrace{L(Y_i \mid \beta, b)}_{\text{data} \mid \text{random effects}} \times \underbrace{L(b \mid \Sigma(\theta))}_{\text{random effects}} \, db
$$

Best fit is a compromise between two components
(consistency of data with β and random effects, consistency of random effects with RE distribution)
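For a single cluster with one scalar random intercept the integral is one-dimensional and can be evaluated by brute force. The sketch below assumes a binomial response with a logit link and uses a plain trapezoid rule; the function names and the example numbers are illustrative only.

```python
import math


def cond_lik(y, n, beta0, b):
    """L(y | beta, b): binomial likelihood of y successes in n trials, logit link."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def re_density(b, sigma):
    """L(b | sigma^2): Normal(0, sigma^2) density of the random effect."""
    return math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def marginal_lik(y, n, beta0, sigma, lo=-10.0, hi=10.0, steps=4000):
    """Integrate cond_lik * re_density over b with the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        b = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * cond_lik(y, n, beta0, b) * re_density(b, sigma)
    return total * h


# One cluster with 3 successes out of 5 trials.
print(marginal_lik(3, 5, beta0=0.0, sigma=1.0))
# As sigma shrinks toward 0 the result approaches the plain binomial likelihood.
print(marginal_lik(3, 5, beta0=0.0, sigma=0.01), cond_lik(3, 5, 0.0, 0.0))
```

With several clusters the marginal likelihood is the product of such integrals, and with several random effects per cluster each integral becomes multi-dimensional, which is what motivates the approximations discussed in the talk.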
Integrated (marginal) likelihood
[Figure: scaled probability (0 to 1) against conditional mode value u (−10 to 10); curves: L(b|σ²), L(x|b, β), and their product L_prod.]
Shrinkage
[Figure: Arabidopsis block estimates; mean(log) fruit set against genotype (0 to 25), with a numeric label (2 to 11) for each genotype.]
Methods
Estimation methods
deterministic: precision vs. computational cost:
penalized quasi-likelihood, Laplace approximation, adaptive
Gauss-Hermite quadrature (Breslow, 2004) . . .
stochastic (Monte Carlo): frequentist and Bayesian (Booth
and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)
Penalized quasi-likelihood (PQL)
alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM fit (Breslow, 2004)
flexible (allows spatial/temporal correlations, crossed REs)
biased for small unit samples (e.g. counts < 5, binary or low-survival data)
widely used: SAS PROC GLIMMIX, R MASS::glmmPQL: in ≈ 90% of small-unit-sample cases
descendants: higher-order PQL, hierarchical GLM
Breslow (2004) on PQL
As usual when software for complicated statistical
inference procedures is broadly disseminated, there is
potential for abuse and misinterpretation. In spite of the
fact that PQL was initially advertised as a procedure for
approximate inference in GLMMs, and its tendency to
give seriously biased estimates of variance components
and a fortiori regression parameters with binary outcome
data was emphasized in multiple publications [5, 6, 24],
some statisticians seemed to ignore these warnings and to
think of PQL as synonymous with GLMM.
Laplace approximation
for given β, θ (RE parameters), find conditional modes by penalized, iteratively reweighted least squares;
then use second-order Taylor expansion around the conditional modes
more accurate than PQL
reasonably fast and flexible
lme4::glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder)
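The two steps (find the conditional mode, then expand to second order around it) can be sketched for one cluster with a scalar random intercept. This is a toy illustration under stated assumptions: binomial response, logit link, and plain Newton iterations standing in for penalized iteratively reweighted least squares. It is not how any of the packages listed above are implemented.

```python
import math


def log_joint(b, y, n, beta0, sigma):
    """log[ L(y | beta, b) * L(b | sigma^2) ] for one cluster with a logit link."""
    eta = beta0 + b
    log_p = -math.log1p(math.exp(-eta))        # log of inverse-logit(eta)
    log_q = -math.log1p(math.exp(eta))         # log of 1 - inverse-logit(eta)
    log_binom = math.log(math.comb(n, y)) + y * log_p + (n - y) * log_q
    log_norm = -0.5 * (b / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return log_binom + log_norm


def laplace_marginal(y, n, beta0, sigma, iters=50):
    """Return (conditional mode, Laplace approximation to the marginal likelihood)."""
    b_hat = 0.0
    for _ in range(iters):                     # Newton steps toward the conditional mode
        p = 1.0 / (1.0 + math.exp(-(beta0 + b_hat)))
        grad = (y - n * p) - b_hat / sigma**2
        curv = n * p * (1.0 - p) + 1.0 / sigma**2   # minus the second derivative
        b_hat += grad / curv
    p = 1.0 / (1.0 + math.exp(-(beta0 + b_hat)))
    curv = n * p * (1.0 - p) + 1.0 / sigma**2
    # A second-order Taylor expansion of log_joint at b_hat leaves a Gaussian integral.
    peak = math.exp(log_joint(b_hat, y, n, beta0, sigma))
    return b_hat, peak * math.sqrt(2.0 * math.pi / curv)


def brute_force_marginal(y, n, beta0, sigma, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoid-rule reference value for the same integral."""
    h = (hi - lo) / steps
    vals = [math.exp(log_joint(lo + k * h, y, n, beta0, sigma)) for k in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


b_hat, approx = laplace_marginal(3, 5, beta0=0.0, sigma=1.0)
print(b_hat, approx, brute_force_marginal(3, 5, beta0=0.0, sigma=1.0))
```

Comparing the last two printed numbers gives a feel for the approximation error in a small cluster; the error generally shrinks as the number of observations per cluster grows.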
Gauss-Hermite quadrature (GHQ)
as above, but compute additional terms in the integral (typically 8, but often up to 20)
most accurate
slowest, hence not flexible (2–3 RE at most, maybe only 1)
lme4::glmer, glmmML, repeated
Adaptive vs. non-adaptive GHQ
Adaptive GHQ is more expensive at a given n,
but makes up for it in accuracy
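A small sketch of the two variants for one cluster with a scalar random intercept, assuming a binomial response and a logit link. The quadrature rule is built from scratch (bisection on interlacing roots of the Hermite polynomials orthonormal under N(0, 1), with Christoffel weights) so that the block needs only the standard library; all function names and example values are illustrative assumptions.

```python
import math


def hermite_values(k, x):
    """p_0(x)..p_k(x): Hermite polynomials orthonormal under the N(0, 1) density."""
    vals = [1.0]
    if k >= 1:
        vals.append(x)
    for j in range(1, k):
        vals.append((x * vals[j] - math.sqrt(j) * vals[j - 1]) / math.sqrt(j + 1))
    return vals


def gh_rule(n):
    """Nodes z and weights w with E[f(Z)] ~ sum(w * f(z)) for Z ~ N(0, 1)."""
    roots = []
    for k in range(1, n + 1):
        bound = math.sqrt(4 * k + 2)           # roots of degree k lie inside +/- bound
        edges = [-bound] + roots + [bound]     # roots of degree k-1 bracket the new ones
        new_roots = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            f_lo = hermite_values(k, lo)[k]
            for _ in range(80):                # plain bisection on each bracket
                mid = 0.5 * (lo + hi)
                f_mid = hermite_values(k, mid)[k]
                if (f_mid > 0) == (f_lo > 0):
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    weights = [1.0 / sum(p * p for p in hermite_values(n - 1, z)) for z in roots]
    return roots, weights


def cond_lik(y, n, beta0, b):
    """Binomial likelihood of y successes in n trials given random intercept b."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def ghq_marginal(y, n, beta0, sigma, n_points):
    """Non-adaptive GHQ: nodes centred at 0 and scaled by the RE standard deviation."""
    nodes, weights = gh_rule(n_points)
    return sum(w * cond_lik(y, n, beta0, sigma * z) for z, w in zip(nodes, weights))


def agq_marginal(y, n, beta0, sigma, n_points):
    """Adaptive GHQ: nodes centred at the conditional mode, scaled by its curvature."""
    b_hat = 0.0
    for _ in range(50):                        # Newton steps for the conditional mode
        p = 1.0 / (1.0 + math.exp(-(beta0 + b_hat)))
        b_hat += ((y - n * p) - b_hat / sigma**2) / (n * p * (1 - p) + 1 / sigma**2)
    p = 1.0 / (1.0 + math.exp(-(beta0 + b_hat)))
    scale = 1.0 / math.sqrt(n * p * (1 - p) + 1 / sigma**2)
    nodes, weights = gh_rule(n_points)
    total = 0.0
    for z, w in zip(nodes, weights):
        b = b_hat + scale * z
        prior = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        # Divide the integrand by the N(0, 1) density that the weights already include.
        total += w * cond_lik(y, n, beta0, b) * prior * math.sqrt(2 * math.pi) * math.exp(0.5 * z * z)
    return scale * total


for k in (1, 3, 9):
    print(k, ghq_marginal(45, 50, 0.0, 2.0, k), agq_marginal(45, 50, 0.0, 2.0, k))
```

With one quadrature point the adaptive rule reduces to the Laplace approximation, which is why Laplace is often described as the cheapest member of the same family.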
Stochastic approaches
Mostly Bayesian (Bayesian computation handles high-dimensional integration)
various flavours: Gibbs sampling, MCMC, MCEM, etc.
generally slower but more flexible
simplifies many inferential problems
must specify priors, assess convergence/error
specialized: glmmAK, MCMCglmm (Hadfield, 2010), INLA, bernor
general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS)
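The simplest stochastic alternative to deterministic integration is naive Monte Carlo: draw random effects from their distribution and average the conditional likelihood. The sketch below does this for one binomial cluster with a scalar random intercept. It is deliberately not MCMC, Gibbs sampling, or MCEM, and it is not what any of the packages above do; the names and values are illustrative assumptions.

```python
import math
import random


def cond_lik(y, n, beta0, b):
    """Binomial likelihood of y successes in n trials given random intercept b."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def mc_marginal_lik(y, n, beta0, sigma, draws=20000, seed=42):
    """Monte Carlo estimate of E_b[ L(y | beta, b) ] with b ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        b = rng.gauss(0.0, sigma)              # sample a random effect
        total += cond_lik(y, n, beta0, b)      # evaluate the conditional likelihood
    return total / draws


print(mc_marginal_lik(3, 5, beta0=0.0, sigma=1.0))
```

The Monte Carlo error falls only as the square root of the number of draws, but unlike quadrature the cost does not explode with the number of random effects, which is the sense in which stochastic methods are slower but more flexible.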
Estimation: example (McKeon et al., 2012)
[Figure: estimated log-odds of predation (−6 to 2) for three terms (Symbiont, Crab vs. Shrimp, Added symbiont), comparing GLM (fixed), GLM (pooled), PQL, Laplace, and AGQ.]
Wald tests
Wald tests (e.g. typical results of summary)
based on information matrix
assume quadratic log-likelihood surface
exact for regular linear models;
only asymptotically OK for GLM(M)s
computationally cheap
approximation is sometimes awful (Hauck-Donner effect)
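The Hauck-Donner effect can be seen in a toy case with no random effects at all: testing whether the log-odds of a single binomial proportion is zero. The example (25 trials, null value 0.5) is an illustrative assumption, not taken from the talk.

```python
import math


def wald_z(y, n):
    """Wald z statistic for H0: logit(p) = 0, from y successes in n trials (0 < y < n)."""
    p_hat = y / n
    estimate = math.log(p_hat / (1.0 - p_hat))             # MLE of the log-odds
    std_err = 1.0 / math.sqrt(n * p_hat * (1.0 - p_hat))   # from the information matrix
    return estimate / std_err


def lrt_stat(y, n):
    """Likelihood ratio statistic for the same null hypothesis (p = 0.5)."""
    p_hat = y / n
    return 2.0 * (y * math.log(p_hat / 0.5) + (n - y) * math.log((1.0 - p_hat) / 0.5))


for y in (20, 22, 23, 24):
    print(y, round(wald_z(y, 25), 3), round(lrt_stat(y, 25), 2))
```

As the data move further from the null (24 rather than 23 successes), the likelihood ratio statistic keeps growing but the Wald statistic drops, because the standard error inflates faster than the estimate when the log-likelihood surface is far from quadratic.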
2D profiles for coral predation
[Figure: scatter plot matrix of 2D likelihood profiles for the parameters .sig01, (Intercept), tttcrabs, tttshrimp, tttboth.]
Likelihood ratio tests
better, but still have to deal with two finite-size problems:
when scale parameter is free (Gamma, etc.), deviance is ∼ F rather than ∼ χ², with poorly defined denominator df
in GLM(M) case, numerator is only asymptotically χ² anyway
Bartlett corrections (Cordeiro and Ferrari, 1998; Cordeiro et al., 1994), higher-order asymptotics: cond [neither extended to GLMMs!]
Profile confidence intervals: moderately difficult/fragile
Parametric bootstrapping
fit null model to data
simulate data from null model
fit null and working model, compute likelihood difference
repeat to estimate null distribution
should be OK but ??? not well tested
(assumes estimated parameters are sufficiently good)
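The four steps can be sketched on a deliberately simplified stand-in: two binomial groups with closed-form maximum likelihood fits and no random effects. For a real GLMM each "fit" would be a call to a mixed-model fitting routine; everything named here is an illustrative assumption.

```python
import math
import random


def binom_loglik(y, n, p):
    """Binomial log-likelihood without the constant term; safe when p is 0 or 1."""
    ll = 0.0
    if y > 0:
        ll += y * math.log(p)
    if y < n:
        ll += (n - y) * math.log(1.0 - p)
    return ll


def lr_stat(y1, n1, y2, n2):
    """2 * (log-likelihood of working model - log-likelihood of null model)."""
    p0 = (y1 + y2) / (n1 + n2)                 # null model: one common probability
    ll_null = binom_loglik(y1, n1, p0) + binom_loglik(y2, n2, p0)
    ll_work = binom_loglik(y1, n1, y1 / n1) + binom_loglik(y2, n2, y2 / n2)
    return 2.0 * (ll_work - ll_null)


def parametric_bootstrap_p(y1, n1, y2, n2, n_sim=2000, seed=7):
    """Bootstrap p value for the group difference, simulating under the null fit."""
    rng = random.Random(seed)
    observed = lr_stat(y1, n1, y2, n2)
    p0 = (y1 + y2) / (n1 + n2)                 # step 1: fit null model to data
    exceed = 0
    for _ in range(n_sim):                     # step 4: repeat
        s1 = sum(rng.random() < p0 for _ in range(n1))   # step 2: simulate from null
        s2 = sum(rng.random() < p0 for _ in range(n2))
        if lr_stat(s1, n1, s2, n2) >= observed - 1e-12:  # step 3: refit both, compare
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


print(parametric_bootstrap_p(3, 20, 9, 20))
```

The returned value is the share of simulated likelihood differences at least as large as the observed one, with the usual add-one correction so the estimate is never exactly zero.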
Parametric bootstrap results
[Figure: inferred p value against true p value (0.02 to 0.08); panels: Osm, Cu, H2S, Anoxia.]
Bayesian approaches
Provided that we have a good sample from the posterior
distribution (Markov chains have converged etc. etc.) we get
most of the inferences we want for free by summarizing the
marginal posteriors
Model selection is still an open question: reversible-jump
MCMC, deviance information criterion
Next steps
Dealing with complex random effects:
regularization, model selection, penalized methods (lasso/fence)
Flexible correlation structures:
spatial, temporal, phylogenetic
hybrid & improved MCMC methods (mcmcsamp, Stan)
Reliable assessment of out-of-sample performance
Glycera estimates
[Figure: estimated effects on survival (−60 to 60) for Osm, Cu, H2S, Anoxia and all of their two-, three- and four-way interactions, comparing MCMCglmm, glmer(OD:2), glmer(OD), glmmML, and glmer.]
Acknowledgments
lme4: Doug Bates, Martin
Mächler, Steve Walker
Data: Adrian Stier (UBC/OSU),
Sea McKeon (Smithsonian),
David Julian (UF), Jada-Simone
White (Univ Hawai'i)
NSERC (Discovery)
SHARCnet
References
Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265–285. doi:10.1111/1467-9868.00176.
Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.
Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261–269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.
Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de Statistique, 62(2):257–274. ISSN 03067734. doi:10.2307/1403512.
Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.
McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095–1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.
Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289–296. doi:10.1007/BF00140873.
Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356–362. ISSN 0012-9658.
Sung, Y.J., 2007. The Annals of Statistics, 35(3):990–1011. ISSN 0090-5364. doi:10.1214/009053606000001389.
Ben Bolker
GLMMs

Ben Bolker_Generalized Linear Models Overview and Open Questions.pdf

  • 1.
    Denitions Estimation InferenceChallenges open questions References Generalized linear mixed models: overview and open questions Ben Bolker McMaster University, Mathematics Statistics and Biology 12 November 2013 Ben Bolker GLMMs
  • 2.
    Denitions Estimation InferenceChallenges open questions References Outline 1 Examples and denitions 2 Estimation Overview Methods 3 Inference 4 Challenges open questions Ben Bolker GLMMs
  • 3.
    Denitions Estimation InferenceChallenges open questions References Outline 1 Examples and denitions 2 Estimation Overview Methods 3 Inference 4 Challenges open questions Ben Bolker GLMMs
  • 4.
    Denitions Estimation InferenceChallenges open questions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: Linear combinations of categorical and continuous predictors, and interactions Response distributions in the exponential family (binomial, Poisson, and extensions) Any smooth, monotonic link function (e.g. logistic, exponential models) Flexible combinations of blocking factors (clustering; random eects) Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . . Ben Bolker GLMMs
  • 5.
    Denitions Estimation InferenceChallenges open questions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: Linear combinations of categorical and continuous predictors, and interactions Response distributions in the exponential family (binomial, Poisson, and extensions) Any smooth, monotonic link function (e.g. logistic, exponential models) Flexible combinations of blocking factors (clustering; random eects) Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . . Ben Bolker GLMMs
  • 6.
    Denitions Estimation InferenceChallenges open questions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: Linear combinations of categorical and continuous predictors, and interactions Response distributions in the exponential family (binomial, Poisson, and extensions) Any smooth, monotonic link function (e.g. logistic, exponential models) Flexible combinations of blocking factors (clustering; random eects) Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . . Ben Bolker GLMMs
  • 7.
    Denitions Estimation InferenceChallenges open questions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: Linear combinations of categorical and continuous predictors, and interactions Response distributions in the exponential family (binomial, Poisson, and extensions) Any smooth, monotonic link function (e.g. logistic, exponential models) Flexible combinations of blocking factors (clustering; random eects) Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . . Ben Bolker GLMMs
  • 8.
    Denitions Estimation InferenceChallenges open questions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 9.
    Denitions Estimation InferenceChallenges open questions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 10.
    Denitions Estimation InferenceChallenges open questions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 11.
    Denitions Estimation InferenceChallenges open questions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 12.
    Denitions Estimation InferenceChallenges open questions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 13.
    Denitions Estimation InferenceChallenges open questions References Coral protection by symbionts none shrimp crabs both Number of predation events Symbionts Number of blocks 0 2 4 6 8 10 1 2 0 1 2 0 2 0 1 2 Ben Bolker GLMMs
  • 14.
    Denitions Estimation InferenceChallenges open questions References Environmental stress: Glycera cell survival H2S Copper 0 33.3 66.6 133.3 0 0.03 0.1 0.32 Osm=12.8 Normoxia Osm=22.4 Normoxia 0 0.03 0.1 0.32 Osm=32 Normoxia Osm=41.6 Normoxia 0 0.03 0.1 0.32 Osm=51.2 Normoxia Osm=12.8 Anoxia 0 0.03 0.1 0.32 Osm=22.4 Anoxia Osm=32 Anoxia 0 0.03 0.1 0.32 Osm=41.6 Anoxia 0 33.3 66.6 133.3 Osm=51.2 Anoxia 0.0 0.2 0.4 0.6 0.8 1.0 Ben Bolker GLMMs
  • 15.
    Denitions Estimation InferenceChallenges open questions References Arabidopsis response to fertilization clipping panel: nutrient, color: genotype Log(1+fruit set) 0 1 2 3 4 5 unclipped clipped ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● : nutrient 1 unclipped clipped ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● : nutrient 8 Ben Bolker GLMMs
  • 16.
    Denitions Estimation InferenceChallenges open questions References Coral demography Before Experimental ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 0.25 0.50 0.75 1.00 0 10 20 30 40 50 0 10 20 30 40 50 Previous size (cm) Mortality probability Treatment ● ● Present Removed Ben Bolker GLMMs
  • 17.
    Denitions Estimation InferenceChallenges open questions References Technical denition Yi |{z} response ∼ conditional distribution z}|{ Distr (g −1(ηi ) | {z } inverse link function , φ |{z} scale parameter ) η |{z} linear predictor = Xβ |{z} xed eects + Zb |{z} random eects b |{z} conditional modes ∼ MVN(0, Σ(θ) |{z} variance- covariance matrix ) Ben Bolker GLMMs
  • 18.
    Denitions Estimation InferenceChallenges open questions References Outline 1 Examples and denitions 2 Estimation Overview Methods 3 Inference 4 Challenges open questions Ben Bolker GLMMs
  • 19.
    Denitions Estimation InferenceChallenges open questions References Overview Maximum likelihood estimation L(Yi |θ, β) | {z } likelihood = Z · · · Z L(Yi |β, b) | {z } data|random eects × L(b|Σ(θ)) | {z } random eects d b Best t is a compromise between two components (consistency of data with β and random eects, consistency of random eect with RE distribution) Ben Bolker GLMMs
  • 20.
    Denitions Estimation InferenceChallenges open questions References Overview Integrated (marginal) likelihood −10 −5 0 5 10 0.0 0.2 0.4 0.6 0.8 1.0 conditional mode value (u) Scaled probability L(b|σ2 ) L(x|b, β) Lprod Ben Bolker GLMMs
  • 21.
    Denitions Estimation InferenceChallenges open questions References Overview Shrinkage ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Arabidopsis block estimates Genotype Mean(log) fruit set 0 5 10 15 20 25 −15 −3 0 3 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 3 2 10 8 10 4 3 9 9 4 6 4 2 6 10 5 7 9 4 9 11 2 5 5 Ben Bolker GLMMs
  • 22.
    Denitions Estimation InferenceChallenges open questions References Methods Estimation methods deterministic: precision vs. computational cost: penalized quasi-likelihood, Laplace approximation, adaptive Gauss-Hermite quadrature (Breslow, 2004) . . . stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007) Ben Bolker GLMMs
  • 23.
    Denitions Estimation InferenceChallenges open questions References Methods Penalized quasi-likelihood (PQL) alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM t (Breslow, 2004) exible (allows spatial/temporal correlations, crossed REs) biased for small unit samples (e.g. counts 5, binary or low-survival data) widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈ 90% of small-unit-sample cases descendants: higher-order PQL, hierarchical GLM Ben Bolker GLMMs
  • 24.
    Denitions Estimation InferenceChallenges open questions References Methods Penalized quasi-likelihood (PQL) alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM t (Breslow, 2004) exible (allows spatial/temporal correlations, crossed REs) biased for small unit samples (e.g. counts 5, binary or low-survival data) widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈ 90% of small-unit-sample cases descendants: higher-order PQL, hierarchical GLM Ben Bolker GLMMs
  • 25.
    Denitions Estimation InferenceChallenges open questions References Methods Penalized quasi-likelihood (PQL) alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM t (Breslow, 2004) exible (allows spatial/temporal correlations, crossed REs) biased for small unit samples (e.g. counts 5, binary or low-survival data) widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈ 90% of small-unit-sample cases descendants: higher-order PQL, hierarchical GLM Ben Bolker GLMMs
  • 26.
    Denitions Estimation InferenceChallenges open questions References Methods Penalized quasi-likelihood (PQL) alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM t (Breslow, 2004) exible (allows spatial/temporal correlations, crossed REs) biased for small unit samples (e.g. counts 5, binary or low-survival data) widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈ 90% of small-unit-sample cases descendants: higher-order PQL, hierarchical GLM Ben Bolker GLMMs
  • 27.
    Denitions Estimation InferenceChallenges open questions References Methods Breslow (2004) on PQL As usual when software for complicated statistical inference procedures is broadly disseminated, there is potential for abuse and misinterpretation. In spite of the fact that PQL was initially advertised as a procedure for approximate inference in GLMMs, and its tendency to give seriously biased estimates of variance components and a fortiori regression parameters with binary outcome data was emphasized in multiple publications [5, 6, 24], some statisticians seemed to ignore these warnings and to think of PQL as synonymous with GLMM. Ben Bolker GLMMs
  • 28.
    Denitions Estimation InferenceChallenges open questions References Methods Laplace approximation for given β, θ (RE parameters), nd conditional modes by penalized, iterated reweighted least squares; then use second-order Taylor expansion around the conditional modes more accurate than PQL reasonably fast and exible lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder) Ben Bolker GLMMs
  • 29.
    Gauss-Hermite quadrature (GHQ)
      as above, but compute additional terms in the integral (typically 8, but often up to 20)
      most accurate
      slowest, hence not flexible (2-3 RE at most, maybe only 1)
      lme4:glmer, glmmML, repeated
  • 30.
    Adaptive vs. non-adaptive GHQ
      Adaptive GHQ is more expensive at a given n (number of quadrature points), but makes up for it in accuracy
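The difference between the two variants can be sketched on the same kind of toy problem: one cluster of a random-intercept logistic model with fixed parameters. Everything here is an illustrative assumption (the data, `eta_fixed`, `sigma`, the function names, and the choice of 5 quadrature points, whose nodes and weights are the standard tabulated values for the weight function exp(−x²)). Non-adaptive GHQ places the nodes according to the random-effect prior; adaptive GHQ first finds the conditional mode and curvature, then centres and scales the nodes there. With a single node the adaptive rule reduces to the Laplace approximation.

```python
import math

# Tabulated 5-point Gauss-Hermite rule for the weight function exp(-x^2)
GH_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856]
GH_WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591]

def cond_lik(b, y, eta_fixed):
    """p(y | b): Bernoulli responses, logit link, linear predictor eta_fixed + b."""
    eta = eta_fixed + b
    return math.exp(sum(yi * eta - math.log1p(math.exp(eta)) for yi in y))

def normal_pdf(b, sigma):
    return math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def ghq(y, eta_fixed, sigma):
    """Non-adaptive GHQ: nodes placed according to the N(0, sigma^2) prior."""
    total = sum(w * cond_lik(math.sqrt(2.0) * sigma * x, y, eta_fixed)
                for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)

def adaptive_ghq(y, eta_fixed, sigma, n_newton=50):
    """Adaptive GHQ: nodes centred at the conditional mode and scaled by the
    curvature of the log integrand there."""
    n, b = len(y), 0.0
    for _ in range(n_newton):                      # Newton search for the mode
        p = 1.0 / (1.0 + math.exp(-(eta_fixed + b)))
        curv = n * p * (1.0 - p) + 1.0 / sigma ** 2
        b += (sum(y) - n * p - b / sigma ** 2) / curv
    p = 1.0 / (1.0 + math.exp(-(eta_fixed + b)))
    scale = 1.0 / math.sqrt(n * p * (1.0 - p) + 1.0 / sigma ** 2)
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        bi = b + math.sqrt(2.0) * scale * x
        total += w * math.exp(x * x) * cond_lik(bi, y, eta_fixed) * normal_pdf(bi, sigma)
    return math.sqrt(2.0) * scale * total

def brute_force(y, eta_fixed, sigma, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoid rule on a fine grid, used only as a reference value."""
    h = (hi - lo) / steps
    vals = [cond_lik(lo + i * h, y, eta_fixed) * normal_pdf(lo + i * h, sigma)
            for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Hypothetical toy cluster: 8 binary outcomes
y = [1, 1, 0, 1, 0, 1, 1, 1]
eta_fixed, sigma = -0.5, 1.0

exact = brute_force(y, eta_fixed, sigma)
plain = ghq(y, eta_fixed, sigma)
adapt = adaptive_ghq(y, eta_fixed, sigma)
print(f"reference marginal likelihood: {exact:.6e}")
print(f"non-adaptive GHQ, 5 points:    {plain:.6e}  (log error {math.log(plain / exact):+.2e})")
print(f"adaptive GHQ, 5 points:        {adapt:.6e}  (log error {math.log(adapt / exact):+.2e})")
```

The extra cost of the adaptive rule is the mode search per cluster. Rerunning with a larger `sigma` or more observations per cluster is one way to see how the two rules behave when the integrand moves away from the prior.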
  • 31.
    Stochastic approaches
      Mostly Bayesian (Bayesian computation handles high-dimensional integration)
      various flavours: Gibbs sampling, MCMC, MCEM, etc.
      generally slower but more flexible
      simplifies many inferential problems
      must specify priors, assess convergence/error
      specialized: glmmAK, MCMCglmm (Hadfield, 2010), INLA, bernor
      general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS)
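As a rough illustration of how sampling replaces explicit integration, the sketch below runs a random-walk Metropolis sampler for a single random intercept and compares its posterior mean with a grid-based reference. It is a deliberately reduced setting: the fixed part `eta_fixed` and the RE standard deviation `sigma` are treated as known, whereas a real Bayesian GLMM fit would put priors on them and sample everything jointly. The data, tuning values, and function names are hypothetical.

```python
import math
import random

def log_post(b, y, eta_fixed, sigma):
    """Unnormalized log posterior of the random intercept b:
    Bernoulli/logit likelihood plus a Normal(0, sigma^2) prior."""
    eta = eta_fixed + b
    loglik = sum(yi * eta - math.log1p(math.exp(eta)) for yi in y)
    return loglik - 0.5 * (b / sigma) ** 2

def metropolis(y, eta_fixed, sigma, n_iter=20000, burn_in=1000, step=1.0, seed=1):
    """Random-walk Metropolis; returns post-burn-in draws and the acceptance rate."""
    rng = random.Random(seed)
    b = 0.0
    lp = log_post(b, y, eta_fixed, sigma)
    draws, accepted = [], 0
    for i in range(n_iter):
        prop = b + rng.gauss(0.0, step)
        lp_prop = log_post(prop, y, eta_fixed, sigma)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            b, lp = prop, lp_prop
            accepted += 1
        if i >= burn_in:
            draws.append(b)
    return draws, accepted / n_iter

def grid_posterior_mean(y, eta_fixed, sigma, lo=-12.0, hi=12.0, steps=24000):
    """Deterministic reference: posterior mean by summation on a fine grid."""
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    dens = [math.exp(log_post(b, y, eta_fixed, sigma)) for b in grid]
    return sum(b * d for b, d in zip(grid, dens)) / sum(dens)

# Hypothetical toy cluster: 8 binary outcomes
y = [1, 1, 0, 1, 0, 1, 1, 1]
eta_fixed, sigma = -0.5, 1.0

draws, accept_rate = metropolis(y, eta_fixed, sigma)
mcmc_mean = sum(draws) / len(draws)
grid_mean = grid_posterior_mean(y, eta_fixed, sigma)
print(f"acceptance rate:       {accept_rate:.2f}")
print(f"posterior mean (MCMC): {mcmc_mean:.3f}")
print(f"posterior mean (grid): {grid_mean:.3f}")
```

The acceptance rate and the agreement with the grid value are crude stand-ins for the convergence and error checks that the slide lists as obligations of these methods.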
  • 32.
    Estimation: example (McKeon et al., 2012)
      [Figure: estimated log-odds of predation (axis from −6 to 2) for the terms Symbiont, Crab vs. Shrimp, and Added symbiont, comparing GLM (fixed), GLM (pooled), PQL, Laplace, and AGQ]
  • 33.
    Outline
      1 Examples and definitions
      2 Estimation
        Overview
        Methods
      3 Inference
      4 Challenges & open questions
  • 34.
    Wald tests
      Wald tests (e.g. typical results of summary) based on information matrix
      assume quadratic log-likelihood surface
      exact for regular linear models; only asymptotically OK for GLM(M)s
      computationally cheap
      approximation is sometimes awful (Hauck-Donner effect)
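A small worked example of the Hauck-Donner effect, using an intercept-only binomial model rather than a GLMM so that everything has a closed form (an assumption made purely to keep the sketch short; the function names and the choice of n = 20 trials are hypothetical). The null hypothesis is that the log-odds are zero. As the observed proportion approaches 1, the Wald z statistic starts to shrink even though the evidence against the null keeps growing, while the likelihood ratio statistic continues to increase.

```python
import math

def wald_z(k, n):
    """Wald statistic for H0: logit(p) = 0, with k successes in n trials.
    The standard error comes from the inverse information at the estimate."""
    beta_hat = math.log(k / (n - k))
    se = math.sqrt(1.0 / k + 1.0 / (n - k))
    return beta_hat / se

def lr_stat(k, n):
    """Likelihood ratio statistic for the same null hypothesis (p = 0.5)."""
    p_hat = k / n
    ll_hat = k * math.log(p_hat) + (n - k) * math.log(1.0 - p_hat)
    ll_null = n * math.log(0.5)
    return 2.0 * (ll_hat - ll_null)

n = 20
for k in (15, 18, 19):
    print(f"k = {k:2d}: Wald z = {wald_z(k, n):.3f}, LR statistic = {lr_stat(k, n):.2f}")
```

Going from 18 to 19 successes, the estimate moves further from zero but its standard error grows faster, so the Wald statistic falls; this is the non-quadratic log-likelihood surface at work.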
  • 35.
    2D profiles for coral predation
      [Figure: scatter plot matrix of pairwise likelihood profiles for .sig01, (Intercept), tttcrabs, tttshrimp, and tttboth]
  • 36.
    Likelihood ratio tests
      better, but still have to deal with two finite-size problems:
        when scale parameter is free (Gamma, etc.), deviance is ∼ F rather than ∼ χ², with poorly defined denominator df
        in GLM(M) case, numerator is only asymptotically χ² anyway
      Bartlett corrections (Cordeiro and Ferrari, 1998; Cordeiro et al., 1994), higher-order asymptotics: cond [neither extended to GLMMs!]
      Profile confidence intervals: moderately difficult/fragile
  • 37.
    Parametric bootstrapping
      fit null model to data
      simulate data from null model
      fit null and working model, compute likelihood difference
      repeat to estimate null distribution
      should be OK but ??? not well tested (assumes estimated parameters are sufficiently good)
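The steps above can be sketched in a few lines. To stay self-contained, this example swaps in a much simpler model than a GLMM: a two-group Poisson comparison whose null model (one common mean) and working model (one mean per group) both have closed-form fits. The counts, the number of simulations, and all function names are hypothetical; with a GLMM, the two closed-form fits would be replaced by calls to a mixed-model fitting routine.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_loglik(counts):
    """Maximized Poisson log-likelihood for one common mean. The log(y!) terms
    are dropped because they cancel in the likelihood difference."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total / len(counts)) - total

def lr_statistic(group_a, group_b):
    """Twice the log-likelihood difference between working and null model."""
    ll_null = poisson_loglik(group_a + group_b)
    ll_full = poisson_loglik(group_a) + poisson_loglik(group_b)
    return 2.0 * (ll_full - ll_null)

def parametric_bootstrap(group_a, group_b, n_sim=2000, seed=1):
    rng = random.Random(seed)
    observed = lr_statistic(group_a, group_b)
    pooled = group_a + group_b
    lam_null = sum(pooled) / len(pooled)           # 1. fit null model to data
    null_stats = []
    for _ in range(n_sim):                         # 4. repeat
        sim_a = [poisson_draw(lam_null, rng) for _ in group_a]   # 2. simulate from null
        sim_b = [poisson_draw(lam_null, rng) for _ in group_b]
        null_stats.append(lr_statistic(sim_a, sim_b))            # 3. refit both, compare
    exceed = sum(s >= observed for s in null_stats)
    return observed, null_stats, (1 + exceed) / (1 + n_sim)

# Hypothetical counts for two treatment groups
group_a = [2, 0, 3, 1, 4, 2]
group_b = [5, 3, 6, 4, 7, 5]

observed, null_stats, p_value = parametric_bootstrap(group_a, group_b)
print(f"observed LR statistic: {observed:.3f}")
print(f"bootstrap p value:     {p_value:.4f}")
```

The p value uses the (1 + exceedances) / (1 + simulations) convention so that it can never be exactly zero. The caveat from the slide still applies: the simulated null distribution is only as good as the fitted null parameters.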
  • 38.
    Parametric bootstrap results
      [Figure: inferred p value against true p value (both roughly 0.02 to 0.08), in four panels: Osm, Cu, H2S, Anoxia]
  • 39.
    Bayesian approaches
      Provided that we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors
      Model selection is still an open question: reversible-jump MCMC, deviance information criterion
  • 40.
    Outline
      1 Examples and definitions
      2 Estimation
        Overview
        Methods
      3 Inference
      4 Challenges & open questions
  • 41.
    Next steps
      Dealing with complex random effects: regularization, model selection, penalized methods (lasso/fence)
      Flexible correlation structures: spatial, temporal, phylogenetic
      Hybrid and improved MCMC methods (mcmcsamp, Stan)
      Reliable assessment of out-of-sample performance
  • 42.
    Glycera estimates
      [Figure: estimated effects on survival (axis from −60 to 60) for Osm, Cu, H2S, Anoxia and all of their two-, three-, and four-way interactions, comparing MCMCglmm, glmer(OD:2), glmer(OD), glmmML, and glmer]
  • 43.
    Acknowledgments
      lme4: Doug Bates, Martin Mächler, Steve Walker
      Data: Adrian Stier (UBC/OSU), Sea McKeon (Smithsonian), David Julian (UF), Jada-Simone White (Univ Hawai'i)
      NSERC (Discovery)
      SHARCnet
  • 44.
    References
      Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265-285. doi:10.1111/1467-9868.00176.
      Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1-22. Springer. ISBN 0387208623.
      Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261-269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.
      Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de Statistique, 62(2):257-274. ISSN 03067734. doi:10.2307/1403512.
      Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1-22. ISSN 1548-7660.
      McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095-1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.
      Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289-296. doi:10.1007/BF00140873.
      Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356-362. ISSN 0012-9658.
      Sung, Y.J., 2007. The Annals of Statistics, 35(3):990-1011. ISSN 0090-5364. doi:10.1214/009053606000001389.