Google lme4


Published in: Science, Technology

  1. 1. lme4: interface, testing, and community issues
      Ben Bolker, McMaster University, Departments of Mathematics & Statistics and Biology
      15 April 2014
  2. 2. Outline
      • Introduction
      • Interface issues
      • User guidance
      • Testing
      • Future directions
  3. 3. lme4
      • R package for mixed models: linear, generalized, nonlinear
      • speed and generality
      • alternatives (also see
          • R: MCMCglmm, glmmADMB, hglm, others
          • other: AD Model Builder, Stata (GLLAMM, xtmixed, xtmelogit), ASReml, MLwiN, HLM, SAS PROC GLIMMIX/MIXED/NLMIXED, NIMBLE (http: //
          • Bayesian frameworks: INLA, BUGS (JAGS: glm module), Stan
  4. 4. Features
      • formula interface
      • scalar and vector random effects
      • GLMMs: basic + user-specified family/link functions
      • extract deviance function
      • standard accessors: fixed and random coefficients, residuals, etc.
      • predict and simulate methods
      • likelihood profiling and parametric bootstrapping
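As a rough illustration of the features listed above, the sketch below fits the sleepstudy model that appears later in the talk and exercises a few of the accessors. It assumes a reasonably recent lme4 release; the object names `fm`, `dev`, `pred`, and `sims` are illustrative only and are not from the slides.

```r
library(lme4)

# Formula interface: correlated random intercept and slope (a vector random effect)
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Standard accessors
fixef(fm)                  # fixed-effect coefficients
head(ranef(fm)$Subject)    # conditional modes of the random effects
head(residuals(fm))

# Extract the deviance function instead of fitting the model
dev <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
            devFunOnly = TRUE)
dev(getME(fm, "theta"))    # REML criterion evaluated at the fitted theta

# Population-level predictions (random effects set to zero)
pred <- predict(fm, newdata = data.frame(Days = 0:2), re.form = NA)

# Simulate new response vectors from the fitted model
sims <- simulate(fm, nsim = 2, seed = 101)
```

Profiling and parametric bootstrapping are available through `profile()`, `bootMer()`, and `confint()`; they are slower, so they are left out of this quick tour.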
  5. 5. Downstream packages: afex agridat AICcmodavg aod aods3 arm BayesFactor Bayesthresh BBRecapture benchmark blme boss BradleyTerry2 car catdata clusterPower DAAG difR dlnm doBy effects expp ez flexmix gamm4 glmulti gmodels GWAF HLMdiag HSAUR HSAUR2 influence.ME irtrees kulife kyotil languageR lava lme4 LMERConvenienceFunctions lmerTest longpower lsmeans mediation MEMSS metafor Metatron MethComp mi mice miceadds mixAK mixlm MixMAP mlmRev MPDiR multcomp multiDimBio MuMIn NanoStringNorm nonrandom ordinal pamm pan papeR PBImisc pbkrtest pedigreemm phia phmm polytomous prLogistic R2admb R2STATS RcmdrPlugin.NMBU refund RLRsim robustlmm RVAideMemoire SASmixed sirt spacom SPOT Surrogate texreg TripleR ZeligMultilevel
  6. 6. Outline
      • Introduction
      • Interface issues
      • User guidance
      • Testing
      • Future directions
  7. 7. Challenges
      • Wide range of users/developers
      • Evolving goals
      • R is a hacker language . . .
      • choice of object-orientation systems (S3/S4/ref class)
      • fortunes::fortune(121), Rolf Turner: "If you want to simultaneously handcuff yourself, strap yourself into a strait jacket, and tie yourself in knots, and moreover write code which is incomprehensible to the human mind, then S4 methods are indeed the way to go."
  8. 8. Goals
      • Simplicity for end-users (formula interface)
      • Flexibility for downstream developers (modular chunks)
          • wrappers (ez, afex)
          • inference and diagnostics (pbkrtest, lmerTest)
          • extended models (pedigreemm, blme)
      • Modularity for core development/maintenance
      • Stability
  9. 9. Layers
      i.   linear algebra: RcppEigen/CHOLMOD
      ii.  PWRSS/PIRLS computations
      iii. nonlinear optimization
      iv.  API/formula interface, higher-level functions (profiling, bootstrap, etc.)
  10. 10. Modular structure
      • (g)lFormula: formula plus data → model elements (model frame, X, reTrms = {Zt, Lambdat, Lind, . . .})
      • mk(Gl|L)merDevfun: model elements → deviance function (layers i and ii)
      • optimize(Gl|L)mer: deviance function + starting conditions → estimates of θ and β (layer iii)
      • mkMerMod: optimization results → merMod object
      • getME: general-purpose accessor function
  11. 11. Modularity in action
      lmod <- lFormula(Reaction ~ Days + (Days | Subject), sleepstudy)
      names(lmod)
      ## [1] "fr"      "X"       "reTrms"  "REML"
      ## [5] "formula"
      devfun <- do.call(mkLmerDevfun, lmod)
      (opt <- optimizeLmer(devfun))
      ## parameter estimates: 0.967 0.0152 0.231
      ## objective: 1744
      ## number of function evaluations: 98
      result <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)
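The GLMM counterparts of these modular steps can be sketched in the same way. The example below follows the pattern in lme4's `?modular` help page, using the `cbpp` data set that ships with the package; the two-stage optimization (a rough fit over θ, then a joint fit over θ and β) is how current lme4 releases organize it and may differ from the version current when the talk was given.

```r
library(lme4)

# 1. Parse formula and data into model elements
glmod <- glFormula(cbind(incidence, size - incidence) ~ period + (1 | herd),
                   data = cbpp, family = binomial)

# 2. Build the deviance function (initially a function of theta only)
devfun <- do.call(mkGlmerDevfun, glmod)

# 3. Rough optimization over theta
opt <- optimizeGlmer(devfun)

# 4. Update the deviance function so that it depends on theta and beta
devfun <- updateGlmerDevfun(devfun, glmod$reTrms)

# 5. Optimize over theta and beta jointly
opt <- optimizeGlmer(devfun, stage = 2)

# 6. Package the results as a merMod object
gfit <- mkMerMod(environment(devfun), opt, glmod$reTrms, fr = glmod$fr)
```

The object name `gfit` is illustrative. The result should be comparable to calling `glmer()` directly with the same formula, data, and family.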
  12. 12. Fit with pseudo-fixed effects
      lmod2 <- lFormula(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)
      devfun2 <- do.call(mkLmerDevfun, lmod2)
      tmpf <- function(th) devfun2(c(20, th))
      minqa::bobyqa(par = 1, fn = tmpf, lower = 0)
      ## parameter estimates: 0.248
      ## objective: 1824
      ## number of function evaluations: 22
  13. 13. Is it working?
      • most downstream packages successfully ported to v 1.0
      • most users weaned from @ accessors (?)
      • development seems easier
      • will we be able to make large internal changes?
  14. 14. Outline
      • Introduction
      • Interface issues
      • User guidance
      • Testing
      • Future directions
  15. 15. Design issues
      • Prevent/warn of silly usage
          • Unidentifiable models (e.g. rank-deficient, single level per random effect)
          • Ill-advised models (e.g. small number of random effect levels)
      • Prevent/warn of bad fits
  16. 16. Recent changes
      • v. ???: move from nlminb to other default optimizers (no more "false convergence" warnings)
      • v. ???: introduce pre-fit checking
      • v. 1.0-1: loosen pre-fit checks
      • v. 1.0-5: introduce convergence checks
      • soon: loosen/restructure convergence checks (use relative rather than absolute gradients)
      • Open questions: gradient, Hessian calculations?
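The pre-fit and convergence checks described above are exposed to users through `lmerControl()`. The following is a minimal sketch, assuming a recent lme4 release in which the `check.nlev.gtr.1`, `check.nlev.gtr.5`, and `check.conv.grad` arguments and the `.makeCC()` helper are available; the toy data frame `d` and the tolerance value are made up for illustration.

```r
library(lme4)

# Pre-fit check: a grouping factor with a single level is unidentifiable,
# and the default setting (check.nlev.gtr.1 = "stop") refuses to fit it.
set.seed(1)
d <- data.frame(y = rnorm(10), g = factor(rep("a", 10)))
res <- tryCatch(lmer(y ~ 1 + (1 | g), data = d), error = function(e) e)
inherits(res, "error")

# Checks can be tightened or loosened through lmerControl()
ctrl <- lmerControl(
  optimizer = "bobyqa",
  check.nlev.gtr.5 = "warning",   # warn when a grouping factor has few levels
  check.conv.grad = .makeCC("warning", tol = 2e-3, relTol = NULL)
)
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
           control = ctrl)
```

Setting a check to `"ignore"` switches it off, which is one way to handle the unusual use cases where a check is known to be too strict.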
  17. 17. Problems
      • Computational overhead (e.g. rank-checking)
      • Unusual use cases
      • Detecting and identifying fitting problems
  18. 18. Model use issues
      • Inference for mixed models is tough (e.g. the great degrees-of-freedom debate)
      • Ethics: should you provide questionable, imperfect, or poorly understood methods? (e.g. Wald intervals; standard errors on predictions)
      • . . . or should you let your users flounder?
      Roz Chast
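To make the trade-off concrete, the sketch below computes confidence intervals for the fixed effects of the sleepstudy model in three ways. It assumes a recent lme4 release in which `confint()` accepts `method = "Wald"`, `"profile"`, and `"boot"`; the small `nsim` is chosen only to keep the example quick and is far too low for real use.

```r
library(lme4)

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Wald intervals: fast, but rely on a quadratic approximation of the likelihood
ci_wald <- confint(fm, parm = "beta_", method = "Wald")

# Profile-likelihood intervals: slower, usually more trustworthy
ci_prof <- confint(fm, parm = "beta_", method = "profile")

# Parametric bootstrap intervals: slowest of the three
set.seed(101)
ci_boot <- confint(fm, parm = "beta_", method = "boot", nsim = 50)

ci_wald
ci_prof
ci_boot
```

When the three sets of intervals agree closely, the cheap Wald intervals are probably adequate; large discrepancies suggest that the cheaper approximation should not be trusted for that model.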
  19. 19. Outline
      • Introduction
      • Interface issues
      • User guidance
      • Testing
      • Future directions
  20. 20. Testing
      • Computational core is all floating-point
      • Small differences between platforms, compilers, etc. . . .
      • . . . but there are many unstable cases
      • have to go beyond unit tests
      • test examples are large, slow, and sometimes confidential
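One common way to cope with platform-level floating-point differences is to compare estimates against stored reference values within a tolerance rather than exactly. The sketch below is an illustration of that idea, not lme4's actual test suite; the reference values are the rounded parameter estimates shown on the "Modularity in action" slide, and the tolerance is an arbitrary choice.

```r
library(lme4)

# Exact comparison of floating-point results is fragile ...
identical(0.1 + 0.2, 0.3)            # FALSE with IEEE-754 doubles
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: equal within a small tolerance

# ... so a regression test compares against stored values with a tolerance.
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
theta_ref <- c(0.967, 0.0152, 0.231)   # rounded estimates from the slide
theta_ok <- isTRUE(all.equal(unname(getME(fm, "theta")), theta_ref,
                             tolerance = 5e-3))
theta_ok
```

Choosing the tolerance is the hard part: too tight and the test fails on a different compiler or BLAS, too loose and it stops catching real regressions in the unstable cases.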
  21. 21. Outline
      • Introduction
      • Interface issues
      • User guidance
      • Testing
      • Future directions
  22. 22. Model extensions
      • non-linear fitting (present but underdeveloped)
      • negative binomial, zero-inflated models: EM/iterative algorithms or add to level III
      • flexible variance structures: flexLambda branch
      • structure in residuals (R-side)
  23. 23. Open questions
      • restore post hoc MCMC sampling?
      • other (faster) methods for inference and limitations of formula interface
      • how important is GHQ (Gauss-Hermite quadrature)?
  24. 24. The really big picture
      • Switch to Julia, or ??
      • Commodities: computational linear algebra, nonlinear optimizers
      • Language-switching: interface friction
      • Advantages of established framework