lme4: interface, testing, and community issues
Ben Bolker, McMaster University
Departments of Mathematics & Statistics and Biology
15 April 2014
Outline
Introduction
Interface issues
User guidance
Testing
Future directions
lme4
R package for mixed models
linear, generalized, nonlinear
speed and generality
alternatives (also see http://glmm.wikidot.com/pkg-comparison)
R: MCMCglmm, glmmADMB, hglm, others
other: AD Model Builder, Stata (GLLAMM, xtmixed, xtmelogit), ASReml, MLwiN, HLM, SAS PROC GLIMMIX/MIXED/NLMIXED, NIMBLE (http://www.slideshare.net/dlebauer/de-valpine-nimble)
Bayesian frameworks: INLA, BUGS (JAGS: glm module), Stan
Features
formula interface
scalar and vector random effects
GLMMs: basic + user-specified family/link functions
extract deviance function
standard accessors: fixed and random coefficients, residuals, etc.
predict and simulate methods
likelihood profiling and parametric bootstrapping
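The features listed above can be sketched on the sleepstudy data that ships with lme4. This is a minimal illustration, assuming a reasonably recent lme4 (1.1 or later, where predict() takes re.form); the small bootstrap size and the seeds are arbitrary choices made for this example.

```r
library(lme4)

# Vector-valued random effect: correlated intercept and slope per Subject
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

fixef(fit)                 # fixed-effect coefficients
head(ranef(fit)$Subject)   # conditional modes of the random effects
head(residuals(fit))

# Deviance function only (here the REML criterion as a function of theta)
devfun <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
               devFunOnly = TRUE)
devfun(getME(fit, "theta"))

# Population-level predictions (random effects set to zero) and simulation
newdat <- data.frame(Days = 0:2)
pred <- predict(fit, newdata = newdat, re.form = NA)
sims <- simulate(fit, nsim = 2, seed = 101)

# Profile-likelihood intervals for the fixed effects and a small
# parametric bootstrap (nsim = 20 is far too few for real inference)
ci <- confint(fit, parm = "beta_", method = "profile")
boot <- bootMer(fit, FUN = fixef, nsim = 20, seed = 101)
```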
Downstream packages
afex
agridat
AICcmodavg
aod
aods3
arm
BayesFactor
Bayesthresh
BBRecapture
benchmark
blme
boss
BradleyTerry2
car
catdata
clusterPower
DAAG
difR
dlnm
doBy
effects
expp
ez
flexmix
gamm4
glmulti
gmodels
GWAF
HLMdiag
HSAUR
HSAUR2
influence.ME
irtrees
kulife
kyotil
languageR
lava
lme4
LMERConvenienceFunctions
lmerTest
longpower
lsmeans
mediation
MEMSS
metafor
Metatron
MethComp
mi
mice
miceadds
mixAK
mixlm
MixMAP
mlmRev
MPDiR
multcomp
multiDimBio
MuMIn
NanoStringNorm
nonrandom
ordinal
pamm
pan
papeR
PBImisc
pbkrtest
pedigreemm
phia
phmm
polytomous
prLogistic
R2admb
R2STATS
RcmdrPlugin.NMBU
refund
RLRsim
robustlmm
RVAideMemoire
SASmixed
sirt
spacom
SPOT
Surrogate
texreg
TripleR
ZeligMultilevel
Challenges
Wide range of users/developers
Evolving goals
R is a hacker language . . .
choice of object-orientation systems: (S3/S4/ref class)
fortunes::fortune(121)
Rolf Turner: If you want to simultaneously handcuff
yourself, strap yourself into a strait jacket, and tie
yourself in knots, and moreover write code which is
incomprehensible to the human mind, then S4
methods are indeed the way to go.
Goals
Simplicity for end-users (formula interface)
Flexibility for downstream developers (modular chunks)
wrappers (ez, afex)
inference and diagnostics (pbkrtest, lmerTest)
extended models (pedigreemm, blme)
Modularity for core development/maintenance
Stability
Layers
i linear algebra: RcppEigen/CHOLMOD
ii PWRSS/PIRLS computations
iii nonlinear optimization
iv API/formula interface, higher-level functions (profiling, bootstrap, etc.)
Modular structure
(g)lFormula: formula plus data → model elements (model frame, X, reTrms = {Zt, Lambdat, Lind, . . . })
mk(Gl|L)merDevfun: model elements → deviance function (layers i and ii)
optimize(Gl|L)mer: deviance function + starting conditions → estimates of θ and β (layer iii)
mkMerMod: optimization results → merMod object
getME: general-purpose accessor function
Modularity in action
lmod <- lFormula(Reaction ~ Days + (Days | Subject),
                 sleepstudy)
names(lmod)
## [1] "fr"      "X"       "reTrms"  "REML"
## [5] "formula"
devfun <- do.call(mkLmerDevfun, lmod)
(opt <- optimizeLmer(devfun))
## parameter estimates: 0.967 0.0152 0.231
## objective: 1744
## number of function evaluations: 98
result <- mkMerMod(environment(devfun), opt, lmod$reTrms,
                   fr = lmod$fr)
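One way to check that the modular steps reproduce the ordinary interface is to compare the assembled object with a direct lmer() call. The sketch below assumes only the lme4 functions named on this slide; the tolerance is an arbitrary choice for illustration.

```r
library(lme4)

# Modular route: formula -> model elements -> deviance function -> optimum
lmod <- lFormula(Reaction ~ Days + (Days | Subject), sleepstudy)
devfun <- do.call(mkLmerDevfun, lmod)
opt <- optimizeLmer(devfun)
result <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)

# Direct route through the formula interface
direct <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# The two fits should agree up to optimizer tolerance
same_fixef <- all.equal(fixef(result), fixef(direct), tolerance = 1e-4)
same_fixef
```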
Fit with pseudo-fixed effects
lmod2 <- lFormula(Reaction ~ Days + (1 | Subject) +
                  (0 + Days | Subject), sleepstudy)
devfun2 <- do.call(mkLmerDevfun, lmod2)
tmpf <- function(th) devfun2(c(20, th))
minqa::bobyqa(par = 1, fn = tmpf, lower = 0)
## parameter estimates: 0.248
## objective: 1824
## number of function evaluations: 22
Is it working?
most downstream packages successfully ported to v 1.0
most users weaned from @ accessors (?)
development seems easier
will we be able to make large internal changes?
Design issues
Prevent/warn of silly usage
Unidentifiable models
(e.g. rank-deficient, single level per random effect)
Ill-advised models
(e.g. small number of random effect levels)
Prevent/warn of bad fits
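As an illustration of a pre-fit check, the sketch below builds an unidentifiable model with one grouping level per observation and captures whatever condition lmer() signals. The obs column is a hypothetical variable created only for this example, and the exact message text depends on the lme4 version.

```r
library(lme4)

# One grouping level per observation: in a linear mixed model the
# random-intercept variance is confounded with the residual variance
d <- transform(sleepstudy, obs = factor(seq_len(nrow(sleepstudy))))

res <- tryCatch(
  lmer(Reaction ~ Days + (1 | obs), data = d),
  error = function(e) e
)

# With default control settings the fit is stopped before optimization
if (inherits(res, "error")) conditionMessage(res)
```

The strictness of these checks can be adjusted through lmerControl() (for example its check.nobs.vs.nlev argument) when an unusual use case really needs such a model.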
Recent changes
v. ???: move from nlminb to other default optimizers (no more false convergence warnings)
v. ???: introduce pre-fit checking
v. 1.0-1: loosen pre-fit checks
v. 1.0-5: introduce convergence checks
soon: loosen/restructure convergence checks (use relative rather than absolute gradients)
Open questions: gradient, Hessian calculations?
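The difference between absolute and relative gradients can be sketched as follows. This assumes an lme4 version that stores finite-difference derivatives in the optinfo slot of the fitted object (the default when calc.derivs = TRUE); scaling the gradient by the Hessian is one common choice of "relative" and is not necessarily the exact rule lme4 applies.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Gradient and Hessian of the objective at the optimum
# (assumption: present in optinfo for this lme4 version)
derivs <- fit@optinfo$derivs

# Absolute gradient: depends on the scale of the objective function
abs_grad <- max(abs(derivs$gradient))

# Relative (scaled) gradient: solve H x = g, i.e. a Newton step size
rel_grad <- max(abs(solve(derivs$Hessian, derivs$gradient)))

c(absolute = abs_grad, relative = rel_grad)
```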
Problems
Computational overhead (e.g. rank-checking)
Unusual use cases
Detecting and identifying fitting problems
Model use issues
Inference for mixed models is tough
(e.g. the great degrees-of-freedom debate)
Ethics: should you provide questionable, imperfect, or poorly understood methods?
(e.g. Wald intervals; standard errors on predictions)
. . . or should you let your users flounder?
[Image credit: Roz Chast]
Testing
Computational core is all floating-point
Small differences between platforms, compilers, etc. . . .
. . . but there are many unstable cases
have to go beyond unit tests
test examples are large, slow, and sometimes confidential
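Because results differ slightly across platforms and optimizers, tests of fitted values usually compare against a tolerance rather than exactly. A minimal sketch, assuming the two optimizer names below are available in the installed lme4; the tolerance is an arbitrary choice for illustration.

```r
library(lme4)

# Exact comparison of floating-point results is fragile
x <- 0.1 + 0.2
identical(x, 0.3)          # FALSE
isTRUE(all.equal(x, 0.3))  # TRUE: equal within numerical tolerance

# Same model, two different nonlinear optimizers
form <- Reaction ~ Days + (Days | Subject)
fit_bobyqa <- lmer(form, sleepstudy,
                   control = lmerControl(optimizer = "bobyqa"))
fit_nm <- lmer(form, sleepstudy,
               control = lmerControl(optimizer = "Nelder_Mead"))

# Compare estimates up to a tolerance instead of bit-for-bit
agree <- all.equal(fixef(fit_bobyqa), fixef(fit_nm), tolerance = 1e-4)
agree
```

Tolerance-based checks like this catch gross regressions but not the unstable cases, which is why they complement rather than replace larger test examples.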
Model extensions
non-linear fitting (present but underdeveloped)
negative binomial, zero-inflated models:
EM/iterative algorithms or add to level III
flexible variance structures: flexLambda branch
structure in residuals (R-side)
Open questions
restore post hoc MCMC sampling?
other (faster) methods for inference and
limitations of formula interface
how important is GHQ?
The really big picture
Switch to Julia, or ??
Commodities
Computational linear algebra
Nonlinear optimizers
Language-switching: interface friction
Advantages of established framework

Google lme4