Generalized linear mixed models: overview and open questions

Ben Bolker
McMaster University, Mathematics & Statistics and Biology
12 November 2013

Outline

1. Examples and definitions
2. Estimation
   Overview
   Methods
3. Inference
4. Challenges & open questions

1. Examples and definitions

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

- Linear combinations of categorical and continuous predictors, and interactions
- Response distributions in the exponential family (binomial, Poisson, and extensions)
- Any smooth, monotonic link function (e.g. logistic, exponential models)
- Flexible combinations of blocking factors (clustering; random effects)

Applications in ecology, neurobiology, behaviour, epidemiology, real estate, ...

Examples

- ecology: survival, predation, etc. (experimental plots)
- genomics: presence/absence of polymorphisms, gene expression (individuals)
- educational assessment: student scores (students × teachers)
- psychology/sensometrics: decisions, responses to stimuli (individuals)
- epidemiology: disease prevalence (postal codes, provinces, countries)

Coral protection by symbionts

[Figure: number of blocks against number of predation events, for each symbiont treatment (none, shrimp, crabs, both).]

Environmental stress: Glycera cell survival

[Figure: panels by oxygen treatment (Anoxia, Normoxia) and osmolarity (Osm = 12.8, 22.4, 32, 41.6, 51.2); axes: H2S (0, 0.03, 0.1, 0.32) and Copper (0, 33.3, 66.6, 133.3); scale from 0.0 to 1.0.]

Arabidopsis response to fertilization & clipping

[Figure: Log(1 + fruit set) for unclipped and clipped plants; panel: nutrient (1, 8), color: genotype.]

Coral demography

[Figure: mortality probability (0.00 to 1.00) against previous size (cm, 0 to 50); panels: Before, Experimental; treatment: Present, Removed.]

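Technical definition

\[
\underbrace{Y_i}_{\text{response}} \sim \overbrace{\text{Distr}}^{\text{conditional distribution}}\bigl(\underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\ \underbrace{\phi}_{\text{scale parameter}}\bigr)
\]
\[
\underbrace{\eta}_{\text{linear predictor}} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zb}_{\text{random effects}}
\]
\[
\underbrace{b}_{\text{conditional modes}} \sim \text{MVN}\bigl(0,\ \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}}\bigr)
\]
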
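
To make the three-part definition concrete, here is a minimal simulation sketch in Python (standard library only). It assumes the simplest possible case, which is not taken from the slides: a logit link, a Bernoulli response (so no free scale parameter φ), one continuous predictor, and a single random intercept per group, so that Z just picks out each observation's group and Σ(θ) reduces to one variance. The function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_glmm(n_groups=10, n_per_group=20, beta=(-0.5, 1.0), sigma_b=1.0, seed=1):
    """Simulate binary responses from a random-intercept logistic GLMM."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        b_g = rng.gauss(0.0, sigma_b)            # b ~ N(0, sigma_b^2), one value per group
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)           # continuous predictor
            eta = beta[0] + beta[1] * x + b_g    # eta = X beta + Z b
            mu = 1.0 / (1.0 + math.exp(-eta))    # inverse link g^{-1}(eta)
            y = 1 if rng.random() < mu else 0    # Y_i ~ Bernoulli(mu)
            rows.append((g, x, y))
    return rows

data = simulate_glmm()
print(len(data), "observations in", len({g for g, _, _ in data}), "groups")
```

Observations in the same group share one draw of b, which is what induces the within-group correlation that the random effects are meant to capture.
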
2. Estimation

Overview

Maximum likelihood estimation

\[
\underbrace{L(Y_i \mid \theta, \beta)}_{\text{likelihood}} = \int \cdots \int \underbrace{L(Y_i \mid \beta, b)}_{\text{data} \mid \text{random effects}} \times \underbrace{L(b \mid \Sigma(\theta))}_{\text{random effects}} \, db
\]

Best fit is a compromise between two components (consistency of data with β and random effects, consistency of random effects with RE distribution)

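
As a rough illustration of the integral above, the following Python sketch evaluates the marginal likelihood for a single cluster by brute-force numerical integration. Everything specific here is an assumption made for the example: one binomial cluster (18 successes in 20 trials), a logit link, an intercept-only fixed effect `beta0`, a scalar random intercept with standard deviation `sigma`, and a plain trapezoid rule on a wide grid.

```python
import math

def binom_lik(y, n, eta):
    """Binomial likelihood of y successes in n trials at logit-scale value eta."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)

def normal_pdf(b, sigma):
    """Density of the random effect, b ~ N(0, sigma^2)."""
    return math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_lik(y, n, beta0, sigma, lo=-12.0, hi=12.0, steps=4000):
    """Trapezoid-rule integral of L(y | beta0, b) * L(b | sigma) over b."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        b = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0      # half weight at the two end points
        total += w * binom_lik(y, n, beta0 + b) * normal_pdf(b, sigma)
    return total * h

L_marg = marginal_lik(y=18, n=20, beta0=0.0, sigma=2.0)
print(f"marginal likelihood: {L_marg:.4f}")
```

A grid like this is only practical for one or two random effects per cluster; the approximations in the Methods slides exist because the integral becomes high-dimensional in realistic models.
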
Integrated (marginal) likelihood

[Figure: scaled probability (0.0 to 1.0) against conditional mode value u (−10 to 10), showing L(x|b, β), L(b|σ²) and their product (L prod).]

Shrinkage

[Figure: Arabidopsis block estimates; mean(log) fruit set by genotype.]

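
The slide shows shrinkage only as a figure. As a hedged numerical illustration, the sketch below uses the standard closed form for a linear mixed model with a single random intercept and known variances, where each group estimate is pulled toward the grand mean by a weight that depends on group size. The function name and the numbers are hypothetical and are not taken from the Arabidopsis data.

```python
def shrunken_mean(group_mean, n, grand_mean, var_between, var_within):
    """Conditional estimate of a group mean in a random-intercept LMM with known variances."""
    # weight on the group's own data: close to 1 for large n, close to 0 for small n
    weight = var_between / (var_between + var_within / n)
    return grand_mean + weight * (group_mean - grand_mean)

small = shrunken_mean(group_mean=5.0, n=2, grand_mean=3.0, var_between=1.0, var_within=4.0)
large = shrunken_mean(group_mean=5.0, n=20, grand_mean=3.0, var_between=1.0, var_within=4.0)
print(f"n = 2: {small:.3f}   n = 20: {large:.3f}")
```

Both groups have the same raw mean, but the group with only two observations is pulled much further toward the grand mean.
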
Methods

Estimation methods

- deterministic: precision vs. computational cost: penalized quasi-likelihood, Laplace approximation, adaptive Gauss-Hermite quadrature (Breslow, 2004) ...
- stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)

Penalized quasi-likelihood (PQL)

- alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM fit (Breslow, 2004)
- flexible (allows spatial/temporal correlations, crossed REs)
- biased for small unit samples (e.g. counts < 5, binary or low-survival data)
- widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈ 90% of small-unit-sample cases
- descendants: higher-order PQL, hierarchical GLM

Breslow (2004) on PQL

"As usual when software for complicated statistical inference procedures is broadly disseminated, there is potential for abuse and misinterpretation. In spite of the fact that PQL was initially advertised as a procedure for approximate inference in GLMMs, and its tendency to give seriously biased estimates of variance components and a fortiori regression parameters with binary outcome data was emphasized in multiple publications [5, 6, 24], some statisticians seemed to ignore these warnings and to think of PQL as synonymous with GLMM."

Laplace approximation

- for given β, θ (RE parameters), find conditional modes by penalized, iterated reweighted least squares; then use second-order Taylor expansion around the conditional modes
- more accurate than PQL
- reasonably fast and flexible
- lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder)

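
The two steps of the Laplace approximation can be sketched for the smallest possible case. This Python example assumes one binomial cluster (18 successes in 20 trials), a logit link, a fixed intercept `beta0` and a scalar random intercept with standard deviation `sigma`; these values are hypothetical. With a single random effect the penalized iteratively reweighted least squares step reduces to scalar Newton iterations, which is what the sketch uses. It is not how lme4 or the other packages implement the method.

```python
import math

def log_joint(b, y, n, beta0, sigma):
    """log[ L(y | beta0, b) * L(b | sigma) ] for one binomial cluster."""
    eta = beta0 + b
    log_lik = math.log(math.comb(n, y)) + y * eta - n * math.log1p(math.exp(eta))
    log_re = -0.5 * (b / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return log_lik + log_re

def curvature(b, n, beta0, sigma):
    """Second derivative of log_joint in b (always negative here)."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    return -n * p * (1.0 - p) - 1.0 / sigma**2

def laplace_marginal(y, n, beta0, sigma, iters=50):
    b = 0.0
    for _ in range(iters):                           # step 1: find the conditional mode
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = y - n * p - b / sigma**2
        b -= grad / curvature(b, n, beta0, sigma)
    # step 2: a second-order Taylor expansion of log_joint around the mode turns
    # the integral over b into a Gaussian integral with a closed form
    hess = curvature(b, n, beta0, sigma)
    log_marg = log_joint(b, y, n, beta0, sigma) + 0.5 * math.log(2.0 * math.pi / -hess)
    return b, math.exp(log_marg)

b_hat, L_laplace = laplace_marginal(y=18, n=20, beta0=0.0, sigma=2.0)
print(f"conditional mode: {b_hat:.3f}, Laplace marginal likelihood: {L_laplace:.4f}")
```

For this skewed integrand the Laplace value is a few percent below what a fine numerical integration gives, which is the kind of error that quadrature with more points is meant to remove.
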
Gauss-Hermite quadrature (GHQ)

- as above, but compute additional terms in the integral (typically 8, but often up to 20)
- most accurate
- slowest, hence not flexible (2-3 RE at most, maybe only 1)
- lme4:glmer, glmmML, repeated

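
A minimal sketch of (non-adaptive) Gauss-Hermite quadrature for one scalar random intercept follows, in Python with the standard library only. The node-and-weight routine uses a textbook Newton iteration on orthonormal Hermite polynomials; the cluster data (18 successes in 20 trials), `beta0` and `sigma` are hypothetical, and none of this is meant to mirror the internals of glmer or glmmML.

```python
import math

def gauss_hermite(n, tol=1e-14, max_iter=100):
    """Nodes x_k, weights w_k with sum_k w_k f(x_k) ~ integral of exp(-x^2) f(x) dx."""
    x, w = [0.0] * n, [0.0] * n
    z = 0.0
    for i in range((n + 1) // 2):
        # starting guess for the i-th largest root
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** (-1.0 / 6.0)
        elif i == 1:
            z -= 1.14 * n**0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * x[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * x[1]
        else:
            z = 2.0 * z - x[i - 2]
        for _ in range(max_iter):                    # Newton refinement of the root
            p1, p2 = math.pi ** -0.25, 0.0
            for j in range(n):                       # three-term Hermite recurrence
                p1, p2 = z * math.sqrt(2.0 / (j + 1)) * p1 - math.sqrt(j / (j + 1)) * p2, p1
            dp = math.sqrt(2.0 * n) * p2             # derivative of the degree-n polynomial
            z_old = z
            z = z_old - p1 / dp
            if abs(z - z_old) <= tol:
                break
        x[i], x[n - 1 - i] = z, -z                   # roots are symmetric about zero
        w[i] = w[n - 1 - i] = 2.0 / (dp * dp)
    return x, w

def binom_lik(y, n, eta):
    p = 1.0 / (1.0 + math.exp(-eta))
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)

def ghq_marginal(y, n, beta0, sigma, n_points=8):
    """Integrate L(y | beta0, b) against N(0, sigma^2) via the substitution b = sqrt(2)*sigma*x."""
    nodes, weights = gauss_hermite(n_points)
    total = sum(wk * binom_lik(y, n, beta0 + math.sqrt(2.0) * sigma * xk)
                for xk, wk in zip(nodes, weights))
    return total / math.sqrt(math.pi)

L_ghq8 = ghq_marginal(18, 20, 0.0, 2.0, n_points=8)
L_ghq20 = ghq_marginal(18, 20, 0.0, 2.0, n_points=20)
print(f"8 points: {L_ghq8:.4f}   20 points: {L_ghq20:.4f}")
```

The cost grows as the number of points raised to the number of random effects per cluster, which is why the slide limits GHQ to very few random effects.
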
Adaptive vs. non-adaptive GHQ

Adaptive GHQ is more expensive at a given n, but makes up for it in accuracy

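
To show why adaptivity pays off, this sketch compares the two variants with a deliberately tiny 3-point rule (whose nodes and weights are known in closed form) against a fine trapezoid-rule reference. The setup is assumed for illustration: one binomial cluster (18 of 20 successes), logit link, hypothetical `BETA0` and `SIGMA`. The adaptive version first finds the conditional mode and the curvature there, then centres and scales the nodes accordingly; that extra step is the added expense the slide refers to.

```python
import math

# 3-point Gauss-Hermite rule for integrals of exp(-x^2) f(x)
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6.0, 2.0 * math.sqrt(math.pi) / 3.0, math.sqrt(math.pi) / 6.0)

Y, N, BETA0, SIGMA = 18, 20, 0.0, 2.0    # hypothetical cluster data and parameters

def joint(b):
    """L(y | beta0, b) * L(b | sigma) for the single cluster."""
    p = 1.0 / (1.0 + math.exp(-(BETA0 + b)))
    lik = math.comb(N, Y) * p**Y * (1.0 - p) ** (N - Y)
    dens = math.exp(-0.5 * (b / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))
    return lik * dens

def ghq(center, sd):
    """Approximate the integral of joint(b) with nodes at center + sqrt(2)*sd*x."""
    scale = math.sqrt(2.0) * sd
    return scale * sum(w * math.exp(x * x) * joint(center + scale * x)
                       for x, w in zip(NODES, WEIGHTS))

def conditional_mode():
    """Newton iterations for the mode of log joint(b), plus the curvature-based sd."""
    b = 0.0
    for _ in range(50):
        p = 1.0 / (1.0 + math.exp(-(BETA0 + b)))
        hess = -N * p * (1.0 - p) - 1.0 / SIGMA**2
        b -= (Y - N * p - b / SIGMA**2) / hess
    p = 1.0 / (1.0 + math.exp(-(BETA0 + b)))
    hess = -N * p * (1.0 - p) - 1.0 / SIGMA**2
    return b, math.sqrt(-1.0 / hess)

def trapezoid(lo=-12.0, hi=12.0, steps=4000):
    h = (hi - lo) / steps
    inner = sum(joint(lo + k * h) for k in range(1, steps))
    return h * (0.5 * joint(lo) + inner + 0.5 * joint(hi))

reference = trapezoid()
plain = ghq(0.0, SIGMA)                  # nodes fixed by the RE distribution
b_hat, tau = conditional_mode()
adaptive = ghq(b_hat, tau)               # nodes moved to where the integrand has its mass
print(f"reference {reference:.4f}  non-adaptive {plain:.4f}  adaptive {adaptive:.4f}")
```

With the same three function evaluations, the non-adaptive rule mostly misses the peak of the integrand, while the adaptive rule lands within a few percent of the reference.
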
Stochastic approaches

- Mostly Bayesian (Bayesian computation handles high-dimensional integration)
- various flavours: Gibbs sampling, MCMC, MCEM, etc.
- generally slower but more flexible
- simplifies many inferential problems
- must specify priors, assess convergence/error
- specialized: glmmAK, MCMCglmm (Hadfield, 2010), INLA, bernor
- general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS)

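
The simplest stochastic alternative to quadrature is plain Monte Carlo integration: draw random effects from their distribution and average the conditional likelihood. The Python sketch below does this for an assumed single binomial cluster (18 of 20 successes) with hypothetical `beta0` and `sigma`. It is far cruder than MCEM or MCMC as implemented in the packages listed above, and is only meant to show the idea of replacing an integral by an average over draws.

```python
import math
import random

def binom_lik(y, n, eta):
    p = 1.0 / (1.0 + math.exp(-eta))
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)

def mc_marginal(y, n, beta0, sigma, draws=20000, seed=1):
    """Monte Carlo estimate of the integral of L(y | beta0, b) over b ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        b = rng.gauss(0.0, sigma)        # draw from the random-effects distribution
        total += binom_lik(y, n, beta0 + b)
    return total / draws

L_mc = mc_marginal(y=18, n=20, beta0=0.0, sigma=2.0)
print(f"Monte Carlo marginal likelihood: {L_mc:.4f}")
```

The estimate carries Monte Carlo error that shrinks only with the square root of the number of draws, one reason these approaches are generally slower.
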
Estimation: example (McKeon et al., 2012)

[Figure: estimated log-odds of predation (−6 to 2) for Symbiont, Crab vs. Shrimp and Added symbiont, by method: GLM (fixed), GLM (pooled), PQL, Laplace, AGQ.]

3. Inference

Wald tests

- Wald tests (e.g. typical results of summary)
- based on information matrix
- assume quadratic log-likelihood surface
- exact for regular linear models; only asymptotically OK for GLM(M)s
- computationally cheap
- approximation is sometimes awful (Hauck-Donner effect)

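
The Hauck-Donner effect is easy to reproduce in a toy setting. The sketch below is an assumption-laden illustration rather than a GLMM: a single binomial sample, testing p = 0.5 on the logit scale, with the Wald standard error taken from the inverse Fisher information and the likelihood ratio statistic computed directly. The function names and counts are hypothetical.

```python
import math

def wald_z(y, n, p0=0.5):
    """Wald z statistic for the log-odds, using the information-based standard error."""
    p_hat = y / n
    estimate = math.log(p_hat / (1.0 - p_hat)) - math.log(p0 / (1.0 - p0))
    se = math.sqrt(1.0 / (n * p_hat * (1.0 - p_hat)))
    return estimate / se

def lrt_stat(y, n, p0=0.5):
    """Likelihood ratio statistic, 2 * (loglik at MLE - loglik at p0)."""
    p_hat = y / n

    def loglik(p):
        return y * math.log(p) + (n - y) * math.log(1.0 - p)

    return 2.0 * (loglik(p_hat) - loglik(p0))

for y in (90, 99):                       # requires 0 < y < n
    print(f"y = {y}/100: Wald z = {wald_z(y, 100):.2f}, LRT = {lrt_stat(y, 100):.1f}")
```

Moving from 90 to 99 successes out of 100 is stronger evidence against p = 0.5, and the likelihood ratio statistic grows accordingly, yet the Wald statistic shrinks because the standard error blows up faster than the estimate.
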
2D profiles for coral predation

[Figure: scatter plot matrix of pairwise profiles for .sig01, (Intercept), tttcrabs, tttshrimp and tttboth.]

Likelihood ratio tests

- better, but still have to deal with two finite-size problems:
  - when scale parameter is free (Gamma, etc.), deviance is ∼ F rather than ∼ χ², with poorly defined denominator df
  - in GLM(M) case, numerator is only asymptotically χ² anyway
- Bartlett corrections (Cordeiro and Ferrari, 1998; Cordeiro et al., 1994), higher-order asymptotics: cond [neither extended to GLMMs!]
- Profile confidence intervals: moderately difficult/fragile

Parametric bootstrapping

- fit null model to data
- simulate data from null model
- fit null and working model, compute likelihood difference
- repeat to estimate null distribution
- should be OK but ??? not well tested (assumes estimated parameters are sufficiently good)

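
The four steps above can be sketched in Python. To keep the example self-contained and runnable with the standard library, it makes a loud simplification: the models are plain Poisson GLMs with closed-form fits (a common mean under the null, separate group means under the working model), not GLMMs. The data, function names and number of simulations are hypothetical; with a real GLMM each fit would be a call to a mixed-model routine.

```python
import math
import random

def rpois(rng, mu):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def mean(ys):
    return sum(ys) / len(ys)

def pois_loglik(ys, mu):
    """Poisson log-likelihood up to a constant (log y! terms cancel in differences)."""
    if mu == 0.0:                        # only reached when every count is zero
        return 0.0
    return sum(y * math.log(mu) - mu for y in ys)

def lr_stat(group_a, group_b):
    """Twice the log-likelihood difference: separate group means vs. one common mean."""
    pooled = group_a + group_b
    ll_null = pois_loglik(pooled, mean(pooled))
    ll_full = pois_loglik(group_a, mean(group_a)) + pois_loglik(group_b, mean(group_b))
    return 2.0 * (ll_full - ll_null)

def bootstrap_p(group_a, group_b, n_sim=500, seed=1):
    rng = random.Random(seed)
    observed = lr_stat(group_a, group_b)
    mu0 = mean(group_a + group_b)                        # fit null model to data
    exceed = 0
    for _ in range(n_sim):                               # repeat to build the null distribution
        sim_a = [rpois(rng, mu0) for _ in group_a]       # simulate data from null model
        sim_b = [rpois(rng, mu0) for _ in group_b]
        if lr_stat(sim_a, sim_b) >= observed:            # refit both models, compare
            exceed += 1
    return (exceed + 1) / (n_sim + 1)

counts_a = [2, 3, 1, 4, 2]
counts_b = [8, 7, 9, 10, 6]
p_boot = bootstrap_p(counts_a, counts_b)
print(f"LR statistic: {lr_stat(counts_a, counts_b):.2f}, bootstrap p: {p_boot:.4f}")
```

Adding one to both numerator and denominator keeps the estimated p value away from exactly zero when no simulated statistic exceeds the observed one.
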
Parametric bootstrap results

[Figure: inferred p value against true p value (0.02 to 0.08); panels: H2S, Anoxia, Osm, Cu.]

Bayesian approaches

- Provided that we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors
- Model selection is still an open question: reversible-jump MCMC, deviance information criterion

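
As a small illustration of "summarizing the marginal posteriors", the sketch below runs a random-walk Metropolis sampler and reports a posterior mean and a 95% interval. It is heavily simplified and the setup is assumed: a single binomial cluster (18 of 20 successes), a logit link, `beta0` and `sigma` held fixed at hypothetical values, so the N(0, sigma²) random-effects distribution plays the role of the prior and the only unknown is the random effect b. A real analysis would also sample β and θ and would check convergence.

```python
import math
import random

def log_post(b, y=18, n=20, beta0=0.0, sigma=2.0):
    """Unnormalized log posterior of the random effect b."""
    eta = beta0 + b
    return y * eta - n * math.log1p(math.exp(eta)) - 0.5 * (b / sigma) ** 2

def metropolis(n_iter=20000, step=1.0, seed=1):
    rng = random.Random(seed)
    b, lp = 0.0, log_post(0.0)
    draws = []
    for _ in range(n_iter):
        proposal = b + rng.gauss(0.0, step)              # symmetric random-walk proposal
        lp_prop = log_post(proposal)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            b, lp = proposal, lp_prop                    # accept the move
        draws.append(b)
    return draws[n_iter // 10:]                          # drop the first 10% as burn-in

draws = sorted(metropolis())
post_mean = sum(draws) / len(draws)
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
print(f"posterior mean {post_mean:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

Once the draws are in hand, any summary (means, quantiles, probabilities of exceeding a threshold) is a one-line computation, which is the sense in which the inferences come "for free".
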
4. Challenges & open questions

Next steps

- Dealing with complex random effects: regularization, model selection, penalized methods (lasso/fence)
- Flexible correlation structures: spatial, temporal, phylogenetic
- hybrid & improved MCMC methods (mcmcsamp, Stan)
- Reliable assessment of out-of-sample performance

Glycera estimates

[Figure: effect on survival (−60 to 60) for Osm, Cu, H2S, Anoxia and their two-, three- and four-way interactions, by method: glmer, glmmML, glmer(OD), glmer(OD:2), MCMCglmm.]

Acknowledgments

- lme4: Doug Bates, Martin Mächler, Steve Walker
- Data: Adrian Stier (UBC/OSU), Sea McKeon (Smithsonian), David Julian (UF), Jada-Simone White (Univ Hawai'i)
- NSERC (Discovery)
- SHARCnet

References

Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265-285. doi:10.1111/1467-9868.00176.
Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1-22. Springer. ISBN 0387208623.
Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261-269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.
Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de Statistique, 62(2):257-274. ISSN 03067734. doi:10.2307/1403512.
Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1-22. ISSN 1548-7660.
McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095-1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.
Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289-296. doi:10.1007/BF00140873.
Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356-362. ISSN 0012-9658.
Sung, Y.J., 2007. The Annals of Statistics, 35(3):990-1011. ISSN 0090-5364. doi:10.1214/009053606000001389.


Stats sem 2013

  • 1.
    Denitions Estimation Inference Challenges openquestions Generalized linear mixed models: overview and open questions Ben Bolker McMaster University, Mathematics Statistics and Biology 12 November 2013 Ben Bolker GLMMs References
  • 2.
  • 3.
  • 4.
    Denitions Estimation Inference Challenges openquestions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: Linear combinations of categorical and continuous predictors, and interactions Response distributions in the exponential family (binomial, Poisson, and extensions) Any smooth, monotonic link function (e.g. logistic, exponential models) Flexible combinations of blocking factors (clustering; random eects) Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . . Ben Bolker GLMMs
  • 5.
    Denitions Estimation Inference Challenges openquestions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: Linear combinations of categorical and continuous predictors, and interactions Response distributions in the exponential family (binomial, Poisson, and extensions) Any smooth, monotonic link function (e.g. logistic, exponential models) Flexible combinations of blocking factors (clustering; random eects) Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . . Ben Bolker GLMMs
  • 6.
    Denitions Estimation Inference Challenges openquestions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: Linear combinations of categorical and continuous predictors, and interactions Response distributions in the exponential family (binomial, Poisson, and extensions) Any smooth, monotonic link function (e.g. logistic, exponential models) Flexible combinations of blocking factors (clustering; random eects) Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . . Ben Bolker GLMMs
  • 7.
    Denitions Estimation Inference Challenges openquestions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: Linear combinations of categorical and continuous predictors, and interactions Response distributions in the exponential family (binomial, Poisson, and extensions) Any smooth, monotonic link function (e.g. logistic, exponential models) Flexible combinations of blocking factors (clustering; random eects) Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . . Ben Bolker GLMMs
  • 8.
    Denitions Estimation Inference Challenges openquestions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 9.
    Denitions Estimation Inference Challenges openquestions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 10.
    Denitions Estimation Inference Challenges openquestions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 11.
    Denitions Estimation Inference Challenges openquestions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 12.
    Denitions Estimation Inference Challenges openquestions References Examples ecology survival, predation, etc. (experimental plots) genomics presence/absence of polymorphisms, gene expression (individuals) educational assessment student scores (students × teachers) psychology/sensometrics decisions, responses to stimuli (individuals) epidemiology disease prevalence (postal codes, provinces, countries) Ben Bolker GLMMs
  • 13.
    Denitions Estimation Inference Challenges openquestions Coral protection by symbionts Number of predation events Number of blocks 10 8 6 2 2 2 2 1 1 4 0 2 0 shrimp crabs 0 1 0 none Symbionts Ben Bolker GLMMs both References
  • 14.
    Denitions Estimation Inference Environmental stress: 0 Anoxia Osm=12.8 0.03 Challenges open questions References Glycera cell survival 0.1 0.32 0 Anoxia Osm=22.4 Anoxia Osm=32 0.03 0.1 0.32 Anoxia Osm=41.6 Anoxia Osm=51.2 1.0 133.3 66.6 0.8 33.3 0.6 Copper 0 Normoxia Osm=12.8 Normoxia Osm=22.4 Normoxia Osm=32 Normoxia Osm=41.6 Normoxia Osm=51.2 0.4 133.3 66.6 0.2 33.3 0.0 0 0 0.03 0.1 0.32 0 0.03 0.1 H2S Ben Bolker GLMMs 0.32 0 0.03 0.1 0.32
  • 15.
    Denitions Estimation Inference Challenges openquestions Arabidopsis response to fertilization clipping panel: nutrient, color: genotype nutrient : 1 nutrient : 8 q q q q Log(1+fruit set) 3 2 1 0 Ben Bolker GLMMs q q q q q q q q q q q q q q q q q q q q q unclipped 4 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 5 clipped unclipped clipped q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q References
  • 16.
    Denitions Estimation Inference Challenges openquestions References Coral demography Before q Mortality probability 1.00 qqq q Experimental q q q q q q q qqqqqqqqq qq qq q qq q q qqq qq q q qqqq qq q q q qq qq qq q q q q qq q q q q q 0.75 Treatment q 0.25 q q qq q qq q qqq q qqq q q q qq q q q q q qq qq qqqq q q q q q qq q q 0.00 0 10 q q q q 20 30 q q q q qq q q q qq qq qq q q qq q q q q qq qqq qq qq q q q qqq qqqqq q q q q q q q q qq q 40 50 0 10 Previous size (cm) Ben Bolker GLMMs 20 q q q 30 qq q 40 50 Present q 0.50 Removed
  • 17.
    Denitions Estimation Inference Challenges openquestions Technical denition conditional distribution Yi ∼ Distr response η linear predictor b conditional modes Ben Bolker GLMMs = Xβ xed eects (g −1 (η ), i φ ) scale inverse parameter link function + Zb random eects ∼ MVN(0, Σ(θ) ) variancecovariance matrix References
  • 18.
  • 19.
    Denitions Estimation Inference Challenges openquestions References Overview Maximum likelihood estimation L(Y |θ, β) = i L(Y |β, b ) ··· i likelihood data|random eects × L(b |Σ(θ)) d b random eects Best t is a compromise between two components (consistency of data with β and random eects, consistency of random eect with RE distribution) Ben Bolker GLMMs
  • 20.
    Denitions Estimation Inference Challenges openquestions Overview Integrated (marginal) likelihood L (x|b, β) Scaled probability 1.0 0.8 L prod 0.6 L (b |σ2) 0.4 0.2 0.0 −10 −5 0 5 conditional mode value (u ) Ben Bolker GLMMs 10 References
  • 21.
    Denitions Estimation Inference Challenges openquestions Overview Shrinkage Mean(log) fruit set Arabidopsis block estimates 5 9 11 2 5 10 5 7 9 4 q q 4 2 6 q q q q q q 4 6 9 q q 3 9 q q q 4 q q q 10 q q 8 q q 3 2 10 q q q q q 3 0 −3 −15 q q 0 5 10 15 Genotype Ben Bolker GLMMs 20 25 References
  • 22.
    Denitions Estimation Inference Challenges openquestions Methods Estimation methods deterministic: precision vs. computational cost: penalized quasi-likelihood, Laplace approximation, adaptive Gauss-Hermite quadrature (Breslow, 2004) . . . stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007) Ben Bolker GLMMs References
  • 23.
    Denitions Estimation Inference Challenges openquestions References Methods Penalized quasi-likelihood (PQL) alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM t (Breslow, 2004) exible (allows spatial/temporal correlations, crossed REs) biased for small unit samples (e.g. counts 5, binary or low-survival data) widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: 90% of small-unit-sample cases descendants: higher-order PQL, hierarchical GLM Ben Bolker GLMMs in ≈
  • 24.
    Denitions Estimation Inference Challenges openquestions References Methods Penalized quasi-likelihood (PQL) alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM t (Breslow, 2004) exible (allows spatial/temporal correlations, crossed REs) biased for small unit samples (e.g. counts 5, binary or low-survival data) widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: 90% of small-unit-sample cases descendants: higher-order PQL, hierarchical GLM Ben Bolker GLMMs in ≈
  • 25.
    Denitions Estimation Inference Challenges openquestions References Methods Penalized quasi-likelihood (PQL) alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM t (Breslow, 2004) exible (allows spatial/temporal correlations, crossed REs) biased for small unit samples (e.g. counts 5, binary or low-survival data) widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: 90% of small-unit-sample cases descendants: higher-order PQL, hierarchical GLM Ben Bolker GLMMs in ≈
  • 26.
    Denitions Estimation Inference Challenges openquestions References Methods Penalized quasi-likelihood (PQL) alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM t (Breslow, 2004) exible (allows spatial/temporal correlations, crossed REs) biased for small unit samples (e.g. counts 5, binary or low-survival data) widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: 90% of small-unit-sample cases descendants: higher-order PQL, hierarchical GLM Ben Bolker GLMMs in ≈
  • 27.
    Denitions Estimation Inference Challenges openquestions Methods Breslow (2004) on PQL As usual when software for complicated statistical inference procedures is broadly disseminated, there is potential for abuse and misinterpretation. In spite of the fact that PQL was initially advertised as a procedure for approximate inference in GLMMs, and its tendency to give seriously biased estimates of variance components and a fortiori regression parameters with binary outcome data was emphasized in multiple publications [5, 6, 24], some statisticians seemed to ignore these warnings and to think of PQL as synonymous with GLMM. Ben Bolker GLMMs References
  • 28.
    Denitions Estimation Inference Challenges openquestions References Methods Laplace approximation for given β, θ (RE parameters), nd conditional modes by penalized, iterated reweighted least squares; then use second-order Taylor expansion around the conditional modes more accurate than PQL reasonably fast and exible lme4:glmer, glmmML, glmmADMB, R2ADMB Ben Bolker GLMMs (AD Model Builder)
  • 29.
    Denitions Estimation Inference Challenges openquestions Methods Gauss-Hermite quadrature (GHQ) as above, but compute additional terms in the integral (typically 8, but often up to 20) most accurate slowest, hence not exible (23 RE at most, maybe only 1) lme4:glmer, glmmML, repeated Ben Bolker GLMMs References
  • 30.
    Denitions Estimation Inference Challenges openquestions Methods Adaptive vs. non-adaptive GHQ Adaptive GHQ is more expensive at a given n , but makes up for it in accuracy Ben Bolker GLMMs References
  • 31.
    Denitions Estimation Inference Challenges openquestions References Methods Stochastic approaches Mostly Bayesians (Bayesian computation handles high-dimensional integration) various avours: Gibbs sampling, MCMC, MCEM, etc. generally slower but more exible simplies many inferential problems must specify priors, assess convergence/error specialized: bernor glmmAK, MCMCglmm (Hadeld, 2010), INLA, glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS) general: Ben Bolker GLMMs
  • 32.
    Denitions Estimation Inference Challenges openquestions Methods Estimation: example (McKeon et al., 2012) Log−odds of predation −6 −4 −2 0 2 q q q q q Added symbiont q q q q q Crab vs. Shrimp q Symbiont Ben Bolker GLMMs q q q q GLM (fixed) GLM (pooled) PQL Laplace AGQ References
  • 33.
  • 34.
    Denitions Estimation Inference Challenges openquestions Wald tests Wald tests (e.g. typical results of summary) based on information matrix assume quadratic log-likelihood surface exact for regular linear models; only asymptotically OK for GLM(M)s computationally cheap approximation is sometimes awful (Hauck-Donner eect) Ben Bolker GLMMs References
  • 35.
    Denitions Estimation Inference Challenges openquestions 2D proles for coral predation −2 −4 −6 −8 tttboth −10 −12 0 1 2 3 −2 −6 −4 −2 −4 tttshrimp −6 −8 −10 0 1 2 3 15 0 −4 −2 0 −2 −4 tttcrabs −6 −8 0 1 2 3 −10 10 15 10 (Intercept) 5 0 0 1 2 3 2 4 6 8 101214 .sig01 0 −1 −2 −3 Scatter Plot Matrix Ben Bolker GLMMs References
  • 36.
    Likelihood ratio tests
      better, but still have to deal with two finite-size problems:
        when scale parameter is free (Gamma, etc.), deviance is ∼ F rather than ∼ χ², with poorly defined denominator df
        in GLM(M) case, numerator is only asymptotically χ² anyway
      Bartlett corrections (Cordeiro and Ferrari, 1998; Cordeiro et al., 1994), higher-order asymptotics: cond [neither extended to GLMMs!]
      Profile confidence intervals: moderately difficult/fragile
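For reference, a minimal sketch of the basic likelihood ratio test for one extra parameter. It uses the asymptotic χ² reference distribution with 1 df, which is exactly the approximation the slide cautions about. The function names and the one-sample binomial example are assumptions made for illustration.

```python
import math

def lrt_1df(loglik_null, loglik_full):
    """LRT statistic and asymptotic chi-square(1) p-value for nested models."""
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value

def binomial_loglik(successes, trials, p):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

def logit_lrt(successes, trials):
    """LRT of logit(p) = 0 (p = 0.5) for one binomial sample (0 < successes < trials)."""
    full = binomial_loglik(successes, trials, successes / trials)
    null = binomial_loglik(successes, trials, 0.5)
    return lrt_1df(null, full)
```

In this toy case the statistic increases steadily as the sample moves away from p = 0.5 (compare `logit_lrt(90, 100)` and `logit_lrt(99, 100)`), which is one reason the test is regarded as better behaved than a Wald test near the boundary.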
  • 37.
    Parametric bootstrapping
      fit null model to data
      simulate data from null model
      fit null and working model to the simulated data, compute likelihood difference
      repeat to estimate null distribution
      should be OK but ??? not well tested (assumes estimated parameters are sufficiently good)
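The steps above can be sketched in a few lines. To stay self-contained, the example swaps the GLMM for a much simpler fixed-effect pair of models: a null model with one success probability shared by all groups, and a working model with one probability per group. That substitution, and all function names, are assumptions for illustration. With a real GLMM the two fitting calls would be replaced by the mixed-model fits.

```python
import math
import random

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def binom_loglik(successes, trials, p):
    return xlogy(successes, p) + xlogy(trials - successes, 1.0 - p)

def fit_models(successes, trials):
    """Maximised log-likelihoods under the null (common p) and working (per-group p) models."""
    p_null = sum(successes) / sum(trials)
    ll_null = sum(binom_loglik(s, n, p_null) for s, n in zip(successes, trials))
    ll_work = sum(binom_loglik(s, n, s / n) for s, n in zip(successes, trials))
    return p_null, ll_null, ll_work

def parametric_bootstrap(successes, trials, n_sim=2000, seed=1):
    """Observed likelihood-ratio statistic and its parametric bootstrap p-value."""
    rng = random.Random(seed)
    # Step 1: fit the null model to the data (the working fit gives the observed statistic).
    p_null, ll_null, ll_work = fit_models(successes, trials)
    observed = 2.0 * (ll_work - ll_null)
    exceed = 0
    for _ in range(n_sim):
        # Step 2: simulate data from the fitted null model.
        sim = [sum(rng.random() < p_null for _ in range(n)) for n in trials]
        # Step 3: refit both models to the simulated data, compute the likelihood difference.
        _, ll0, ll1 = fit_models(sim, trials)
        if 2.0 * (ll1 - ll0) >= observed - 1e-12:
            exceed += 1
    # Step 4: the simulated statistics estimate the null distribution.
    return observed, (exceed + 1) / (n_sim + 1)
```

The `+ 1` in the numerator and denominator counts the observed data set as one draw from the null, so the estimated p-value can never be exactly zero. The caveat on the slide applies unchanged: everything is simulated from the estimated `p_null`, so the result is only as good as that estimate.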
  • 38.
    Parametric bootstrap results
      [Figure: inferred p value vs. true p value; panels: Osm, Cu, H2S, Anoxia]
  • 39.
    Bayesian approaches
      Provided that we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors
      Model selection is still an open question: reversible-jump MCMC, deviance information criterion
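What "summarizing the marginal posteriors" means in practice can be sketched as below, assuming `draws` is a list of posterior samples for one parameter from a chain that has already been checked for convergence. The function names and the particular summaries (mean, median, equal-tailed interval, probability of a positive effect) are illustrative choices.

```python
import math

def quantile(sorted_draws, q):
    """Linearly interpolated quantile of an already sorted sample."""
    pos = q * (len(sorted_draws) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_draws) - 1)
    frac = pos - lower
    return sorted_draws[lower] * (1.0 - frac) + sorted_draws[upper] * frac

def summarize_posterior(draws, level=0.95):
    """Point estimates, an equal-tailed credible interval, and P(parameter > 0)."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    return {
        "mean": sum(ordered) / len(ordered),
        "median": quantile(ordered, 0.5),
        "interval": (quantile(ordered, tail), quantile(ordered, 1.0 - tail)),
        "prob_positive": sum(d > 0 for d in ordered) / len(ordered),
    }
```

No extra model fitting is needed for any of these quantities, which is the sense in which the inferences come for free once a trustworthy sample exists.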
  • 40.
  • 41.
    Next steps
      Dealing with complex random effects: regularization, model selection, penalized methods (lasso/fence)
      Flexible correlation structures: spatial, temporal, phylogenetic
      Hybrid & improved MCMC methods (mcmcsamp, Stan)
      Reliable assessment of out-of-sample performance
  • 42.
    Glycera estimates
      [Figure: estimated effect on survival for Osm, Cu, H2S, Anoxia and all their interactions, compared across MCMCglmm, glmer(OD:2), glmer(OD), glmmML, and glmer]
  • 43.
    Acknowledgments
      lme4: Doug Bates, Martin Mächler, Steve Walker
      Data: Adrian Stier (UBC/OSU), Sea McKeon (Smithsonian), David Julian (UF), Jada-Simone White (Univ Hawai'i)
      NSERC (Discovery), SHARCnet
  • 44.
    References
      Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265–285. doi:10.1111/1467-9868.00176.
      Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.
      Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261–269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.
      Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de Statistique, 62(2):257–274. ISSN 03067734. doi:10.2307/1403512.
      Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.
      McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095–1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.
      Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289–296. doi:10.1007/BF00140873.
      Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356–362. ISSN 0012-9658.
      Sung, Y.J., 2007. The Annals of Statistics, 35(3):990–1011. ISSN 0090-5364. doi:10.1214/009053606000001389.