Precursors           GLMMs                Results                   Conclusions                   References




             Open-source tools for estimation and inference
                using generalized linear mixed models

                                      Ben Bolker

                                   McMaster University
                    Departments of Mathematics & Statistics and Biology


                                      7 April 2011




Ben Bolker                           McMaster University Departments of Mathematics & Statistics and Biology
Open-source GLMMs




Outline
       1 Precursors
             Examples
             Definitions
       2 GLMMs
             Estimation
             Inference: tests
             Inference: confidence intervals
       3 Results
             Glycera
             Arabidopsis
       4 Conclusions




Examples




Coral protection by symbionts

[Figure: number of blocks (y-axis, 0 to 10) by symbiont treatment (x-axis: none, shrimp, crabs, both); bar segments labelled by number of predation events (0, 1, 2).]




Environmental stress: Glycera cell survival
[Figure: cell survival (colour scale, 0.0 to 1.0) as a function of H2S (x-axis: 0, 0.03, 0.1, 0.32) and copper (y-axis: 0, 33.3, 66.6, 133.3); panels: oxygen (anoxia, normoxia) by osmolarity (Osm = 12.8, 22.4, 32, 41.6, 51.2).]




Arabidopsis response to fertilization & clipping
[Figure: log(1 + fruit set) (y-axis, 0 to 5) for unclipped vs. clipped plants (x-axis); panels: nutrient level (1, 8); colour: genotype.]





Glossary: data


       Fixed effects: predictors where interest is in the specific levels
       Random effects (RE): predictors where interest is in the distribution
                  of the levels rather than the levels themselves (e.g. blocks)
                  (Gelman, 2005)
       Crossed RE: multiple REs where levels of one occur in more than
                  one level of another (e.g. block × year; cf. nested);
                  see http://lme4.r-forge.r-project.org/book/ and
                  Pinheiro and Bates (2000)






Data challenges


           Estimation                        Computation      Inference
           Small # of RE levels (< 5–6)      Large n          Small N (< 40)
           Overdispersion                    Multiple REs     Small n
           Crossed REs                       Crossed REs
           Spatial/temporal correlation
           Unusual distributions
             (Gamma, neg. binomial, ...)







Definitions




Generalized linear models


             Distributions from exponential family
             (Poisson, binomial, Gaussian, Gamma,
             neg. binomial (known k) . . . )
             Means = linear functions of predictors
             on scale of link function (identity, log, logit, . . . )

                                     $Y \sim D(g^{-1}(X\beta), \phi)$
             scale parameter $\phi$ often set to 1 (Poisson, binomial) except for
             quasilikelihood approaches
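
As a rough illustration of the GLM definition above, the following sketch fits a Poisson GLM with a log link and a single predictor by iteratively reweighted least squares (IRLS). The function name `fit_poisson_glm` and the toy data are assumptions made for this example; they are not from the talk, and in practice one would use an established GLM routine.

```python
import math

def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(mu) = b0 + b1 * x by IRLS (hypothetical helper, one predictor)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0        # start at the intercept-only fit
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]           # linear predictor X beta
        mu = [math.exp(e) for e in eta]            # inverse link g^{-1}(eta)
        # working response and weights for the Poisson family with log link
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # weighted least squares for a straight line (2x2 normal equations)
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sz = sum(wi * zi for wi, zi in zip(w, z))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * sxx - sx * sx
        b0 = (sxx * sz - sx * sxz) / det
        b1 = (sw * sxz - sx * sz) / det
    return b0, b1

# Toy counts (made up for illustration)
x_toy = [0, 1, 2, 3, 4, 5]
y_toy = [1, 3, 2, 6, 9, 14]
b0_toy, b1_toy = fit_poisson_glm(x_toy, y_toy)
print(b0_toy, b1_toy)
```

At convergence the fitted means satisfy the Poisson score equations, so with an intercept in the model the residuals `y - mu` sum to zero.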




Generalized linear mixed models



       Add random effects:
                            $Y \sim D(g^{-1}(X\beta + Zu), \phi)$
                            $u \sim \mathrm{MVN}(0, \Sigma)$

       Synonyms: multilevel, hierarchical models
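
To make the notation concrete, here is a minimal simulation sketch of one GLMM of this form: a Bernoulli response with a logit link, one fixed-effect predictor, and a random intercept per block. All names and parameter values (`simulate_glmm`, `beta0`, `beta1`, `sigma_u`, the block sizes) are assumptions chosen for illustration, not values from the talk.

```python
import math
import random

def simulate_glmm(n_blocks=10, n_per_block=20, beta0=-0.5, beta1=1.0,
                  sigma_u=0.8, seed=1):
    """Simulate y ~ Bernoulli(logit^{-1}(beta0 + beta1 * x + u_block))."""
    rng = random.Random(seed)
    # random effects: one intercept shift per block, u ~ N(0, sigma_u^2)
    u = [rng.gauss(0.0, sigma_u) for _ in range(n_blocks)]
    rows = []
    for j in range(n_blocks):
        for _ in range(n_per_block):
            x = rng.random()                       # fixed-effect predictor
            eta = beta0 + beta1 * x + u[j]         # X beta + Z u
            p = 1.0 / (1.0 + math.exp(-eta))       # inverse logit link
            y = 1 if rng.random() < p else 0
            rows.append((j, x, y))
    return u, rows

u_sim, rows_sim = simulate_glmm()
print(len(u_sim), len(rows_sim))
```

Here the matrix Z is implicit: each observation picks up the random effect of the block it belongs to.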






Marginal likelihood


       Likelihood (Prob(data|parameters)): requires integrating over
       possible values of the REs to get the marginal likelihood, e.g.:
             likelihood of the i-th obs. in block j is $L(x_{ij} \mid \theta_j, \sigma_w^2)$
             likelihood of a particular block mean $\theta_j$ is $L(\theta_j \mid 0, \sigma_b^2)$
             marginal likelihood is
                 $\int L(x_{ij} \mid \theta_j, \sigma_w^2) \, L(\theta_j \mid 0, \sigma_b^2) \, d\theta_j$
       Balances the dispersion of the REs around 0 against the dispersion
       of the data conditional on the REs
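
The integral can be checked numerically in the one case where it has a simple closed form. The sketch below assumes both likelihoods are Gaussian (an assumption for this example; the slide does not fix the distributions), integrates over the block mean with the trapezoid rule, and compares the result with the known marginal N(0, sigma_w^2 + sigma_b^2). Function names and the numeric values are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def marginal_likelihood(x_ij, sigma_w, sigma_b, n_grid=2001, width=8.0):
    """Trapezoid-rule integral of L(x_ij | theta_j) * L(theta_j | 0) over theta_j."""
    lo = -width * sigma_b
    h = 2 * width * sigma_b / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        theta = lo + k * h
        f = normal_pdf(x_ij, theta, sigma_w) * normal_pdf(theta, 0.0, sigma_b)
        total += f * (0.5 if k in (0, n_grid - 1) else 1.0)   # half weight at the ends
    return total * h

def closed_form(x_ij, sigma_w, sigma_b):
    # In the all-Gaussian case the marginal is N(0, sigma_w^2 + sigma_b^2)
    return normal_pdf(x_ij, 0.0, math.sqrt(sigma_w ** 2 + sigma_b ** 2))

print(marginal_likelihood(1.2, 1.0, 2.0), closed_form(1.2, 1.0, 2.0))
```

For non-Gaussian responses no such closed form exists in general, which is why approximations to this integral are needed.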






Shrinkage

[Figure: Arabidopsis block estimates. y-axis: mean(log) fruit set; x-axis: genotype (0 to 25).]




RE examples


             Coral symbionts: simple experimental blocks, RE affects
             intercept (overall probability of predation in block)
             Glycera: applied to cells from 10 individuals, RE again affects
             intercept (cell survival prob.)
             Arabidopsis: region (3 levels, treated as fixed) / population /
             genotype: affects intercept (overall fruit set) as well as
             treatment effects (nutrients, herbivory, interaction)







Estimation




Penalized quasi-likelihood (PQL)


             alternates two steps: estimate a GLM, using known RE variances
             to calculate weights; then estimate LMMs given the GLM fit
             (Breslow, 2004)
             flexible (allows spatial/temporal correlations, crossed REs)
             biased for small unit samples (e.g. counts < 5, binary or
             low-survival data)
             widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈
             90% of small-unit-sample cases





Laplace approximation


             approximate marginal likelihood
             for given β, θ (RE parameters), find conditional modes by
             penalized iteratively reweighted least squares; then use a
             second-order Taylor expansion around the conditional modes
             more accurate than PQL
             reasonably fast and flexible
             lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder)
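
The idea can be sketched for the simplest possible case: one block, a Poisson response, and a single random intercept u ~ N(0, sigma^2). This is an illustrative toy under those assumptions, not the algorithm as implemented in any of the packages listed; the function names, the data, and the Newton iteration used to find the conditional mode are all choices made for this example.

```python
import math

def log_joint(u, y, beta, sigma):
    """log of L(y | u) * L(u | 0, sigma^2) for one block (Poisson, log link)."""
    eta = beta + u
    loglik = sum(yi * eta - math.exp(eta) - math.lgamma(yi + 1) for yi in y)
    logprior = -0.5 * math.log(2 * math.pi * sigma ** 2) - u ** 2 / (2 * sigma ** 2)
    return loglik + logprior

def laplace_loglik(y, beta, sigma, n_iter=50):
    """Laplace approximation to the log marginal likelihood; also returns the mode."""
    n, u = len(y), 0.0
    for _ in range(n_iter):                        # Newton steps to the conditional mode
        mu = math.exp(beta + u)
        grad = sum(y) - n * mu - u / sigma ** 2
        hess = -n * mu - 1.0 / sigma ** 2          # always negative: log_joint is concave
        u -= grad / hess
    hess = -n * math.exp(beta + u) - 1.0 / sigma ** 2
    # second-order Taylor expansion around the mode turns the integral into a Gaussian one
    return log_joint(u, y, beta, sigma) + 0.5 * math.log(2 * math.pi / -hess), u

def brute_force_loglik(y, beta, sigma, n_grid=4001, width=10.0):
    """Trapezoid-rule reference value for the same integral."""
    lo = -width * sigma
    h = 2 * width * sigma / (n_grid - 1)
    vals = [math.exp(log_joint(lo + k * h, y, beta, sigma)) for k in range(n_grid)]
    return math.log(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))

y_block = [2, 4, 3]                                # made-up counts for one block
approx, u_hat = laplace_loglik(y_block, math.log(2.0), 1.0)
print(approx, brute_force_loglik(y_block, math.log(2.0), 1.0), u_hat)
```

The two log-likelihood values should agree closely but not exactly; the gap is the approximation error that quadrature with more evaluation points reduces.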






Adaptive Gauss-Hermite quadrature (AGQ)



             as above, but compute additional terms in the integral
             (typically 8, but often up to 20)
             most accurate
             slowest, hence not flexible (2–3 RE at most, maybe only 1)
             lme4:glmer, glmmML, repeated
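
A minimal sketch of adaptive Gauss-Hermite quadrature for a single Poisson block with one random intercept follows. The model, data, and function names are assumptions made for illustration, and the 5-point nodes and weights are hard-coded standard values for the weight function exp(-x^2). Using a single node at the mode reproduces the Laplace approximation, which is the sense in which quadrature computes additional terms.

```python
import math

# 5-point Gauss-Hermite nodes and weights for the weight function exp(-x^2)
GH_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856]
GH_WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591]

def log_joint(u, y, beta, sigma):
    """log of L(y | u) * L(u | 0, sigma^2) for one block (Poisson, log link)."""
    eta = beta + u
    loglik = sum(yi * eta - math.exp(eta) - math.lgamma(yi + 1) for yi in y)
    return loglik - 0.5 * math.log(2 * math.pi * sigma ** 2) - u ** 2 / (2 * sigma ** 2)

def conditional_mode(y, beta, sigma, n_iter=50):
    """Newton iteration for the mode of log_joint, plus the curvature-based scale."""
    n, u = len(y), 0.0
    for _ in range(n_iter):
        mu = math.exp(beta + u)
        u -= (sum(y) - n * mu - u / sigma ** 2) / (-n * mu - 1.0 / sigma ** 2)
    scale = 1.0 / math.sqrt(n * math.exp(beta + u) + 1.0 / sigma ** 2)
    return u, scale

def agq_loglik(y, beta, sigma, nodes=GH_NODES, weights=GH_WEIGHTS):
    """Adaptive quadrature: nodes are centred at the mode and scaled by the curvature."""
    u_hat, s = conditional_mode(y, beta, sigma)
    total = 0.0
    for x, w in zip(nodes, weights):
        u = u_hat + math.sqrt(2.0) * s * x
        total += w * math.exp(x * x + log_joint(u, y, beta, sigma))
    return math.log(math.sqrt(2.0) * s * total)

def reference_loglik(y, beta, sigma, n_grid=4001, width=10.0):
    """Fine trapezoid-rule reference value for the same integral."""
    lo = -width * sigma
    h = 2 * width * sigma / (n_grid - 1)
    vals = [math.exp(log_joint(lo + k * h, y, beta, sigma)) for k in range(n_grid)]
    return math.log(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))

y_block = [2, 4, 3]                                # made-up counts for one block
one_point = agq_loglik(y_block, math.log(2.0), 1.0, [0.0], [math.sqrt(math.pi)])
five_point = agq_loglik(y_block, math.log(2.0), 1.0)
print(one_point, five_point, reference_loglik(y_block, math.log(2.0), 1.0))
```

The cost grows quickly with the number of random effects, because the nodes have to be laid out on a grid in every RE dimension; this is the practical reason the method is limited to a few REs.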






Bayesian approaches

             Bayesians have to do nasty integrals anyway (to normalize the
             posterior probability density)
             various flavours of stochastic Bayesian computation (Gibbs
             sampling, MCMC, etc.)
             generally slower but more flexible
             solves many problems of assessing confidence intervals
             must specify priors, assess convergence
             specialized: glmmAK, MCMCglmm (Hadfield, 2010), INLA
             general: glmmBUGS, R2WinBUGS, BRugs
             (WinBUGS/OpenBUGS), R2jags, rjags (JAGS)



Overdispersion (slight tangent)



       Variance greater than expected from statistical model
             Quasi-likelihood approaches: MASS:glmmPQL
             Extended distributions (e.g. negative binomial): glmmADMB
             Observation-level random effects (e.g. lognormal-Poisson):
             lme4
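
One common rough check for overdispersion, not taken from the talk, is the Pearson chi-square statistic divided by the residual degrees of freedom; under a well-specified Poisson model it should be near 1. The sketch below assumes a Poisson model, for which the expected variance equals the mean; the helper name and the counts are made up for illustration.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square / residual df, assuming Poisson variance (var = mu)."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

counts = [0, 3, 1, 12, 2, 9, 0, 7]                 # toy counts
mean_count = sum(counts) / len(counts)             # intercept-only Poisson fit
ratio = pearson_dispersion(counts, [mean_count] * len(counts), n_params=1)
print(ratio)   # values well above 1 suggest overdispersion
```

For mixed models the appropriate degrees of freedom are less clear-cut, so the ratio is best read as a diagnostic rather than a formal test.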






Comparison of coral symbiont results

[Figure: regression estimates (x-axis, −6 to 2) for the parameters Symbiont, Crab vs. Shrimp, and Added symbiont, compared across methods: GLM (fixed), GLM (pooled), PQL, Laplace, AGQ.]




Inference: tests


Wald tests [non-quadratic likelihood surfaces]


                   For OLS/linear models, the log-likelihood surface is quadratic;
                   this is only asymptotically true for GLM(M)s
                   Wald tests (e.g. typical results of summary) assume a
                   quadratic surface, based on curvature (information matrix)
                   always approximate, sometimes awful (Hauck-Donner effect)
                   do model comparison (F, score or likelihood ratio tests [LRT])
                   instead
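A small sketch of how badly the quadratic approximation can fail: in the made-up binary data below, every observation with x = 1 is a success, so the slope estimate is pushed toward infinity, its standard error explodes, and the Wald p-value becomes meaningless, while the likelihood ratio test still detects the effect. This is an extreme case in the spirit of the Hauck-Donner effect; the data and names are assumptions for illustration, and a plain GLM stands in for a GLMM.

```r
# x = 0: 10 successes out of 20; x = 1: 20 successes out of 20
x <- rep(c(0, 1), each = 20)
y <- c(rep(c(0, 1), each = 10), rep(1, 20))

# glm() warns about fitted probabilities of 0 or 1 for fit1; that is expected here
fit0 <- glm(y ~ 1, family = binomial)
fit1 <- suppressWarnings(glm(y ~ x, family = binomial))

# Wald test: estimate / standard error, as reported by summary()
p_wald <- coef(summary(fit1))["x", "Pr(>|z|)"]

# likelihood ratio test: drop in deviance compared with chi-square on 1 df
lrt_stat <- deviance(fit0) - deviance(fit1)
p_lrt <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

c(wald = p_wald, lrt = p_lrt)
```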
       But . . .


Conditional F tests [Uncertainty in scale parameters]

                   Model comparison: in general
                   $-2 \log L = D = \sum_i \mathrm{deviance}_i / \phi$
                   Classical linear models: deviance and $\hat{\phi}$ are both $\chi^2$
                   distributed, so $D \sim F(\nu_1, \nu_2)$
                   Denominator degrees of freedom (df) ($\nu_2$) for complex
                   (unbalanced, crossed, R-side effects) models?
                   Approximations: Satterthwaite, Kenward-Roger (Kenward
                   and Roger, 1997; Schaalje et al., 2002)
                   Is $D$ really $\sim F$ in these situations?
       Scale parameters usually not estimated in GLMMs (Gamma,
       quasi-likelihood cases only).
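To make the role of the estimated scale parameter concrete, here is a minimal sketch for the simplest setting, a quasi-Poisson GLM with no random effects, where the denominator df is just the residual df of the larger model. The simulated data and variable names are assumptions for illustration; in the complex models discussed above, choosing the denominator df is exactly the hard part, and this sketch does not address it.

```r
set.seed(42)
n <- 80
x <- runif(n)
y <- rnbinom(n, mu = exp(0.5 + x), size = 2)    # overdispersed counts

fit0 <- glm(y ~ 1, family = quasipoisson)       # null model
fit1 <- glm(y ~ x, family = quasipoisson)       # working model

phi_hat <- summary(fit1)$dispersion             # Pearson-based estimate of the scale parameter
nu1 <- df.residual(fit0) - df.residual(fit1)    # numerator df (here 1)
nu2 <- df.residual(fit1)                        # denominator df (clear-cut only in simple cases)

# scaled change in deviance, referred to an F(nu1, nu2) distribution
F_stat <- ((deviance(fit0) - deviance(fit1)) / nu1) / phi_hat
p_F <- pf(F_stat, nu1, nu2, lower.tail = FALSE)
c(F = F_stat, p = p_F)
```

In base R, `anova(fit0, fit1, test = "F")` carries out this kind of comparison for nested GLMs.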
       But . . .

Likelihood ratio tests [non-normality of likelihood]



                   What about cases where $\phi$ is specified (e.g. $\phi \equiv 1$)?
                   in GLM(M) case, numerator is only asymptotically $\chi^2$ anyway
                   Bartlett corrections (Cordeiro et al., 1994; Cordeiro and
                   Ferrari, 1998), higher-order asymptotics: cond [neither
                   extended to GLMMs!]




Tests of random effects [boundary problems]


                   LRT depends on null hypothesis being within the parameter’s
                   feasible range (Goldman and Whelan, 2000; Molenberghs and
                   Verbeke, 2007)
                   violated e.g. by $H_0: \sigma^2 = 0$
                   In simple cases null distribution is a mixture of $\chi^2$
                   (e.g. $0.5\chi^2_0 + 0.5\chi^2_1$) (emdbook:dchibarsq)
                   ignoring this leads to conservative tests (e.g. true p-value =
                   $\frac{1}{2} \cdot$ nominal p-value)
                   simulation-based testing: RLRsim
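For the simple 50:50 mixture case, the correction can be sketched in a few lines of base R. The likelihood ratio statistic below is a made-up value chosen for illustration; because the chi-square with 0 df is a point mass at zero, it contributes nothing to the tail probability for a positive statistic, so the mixture p-value is half the naive one.

```r
# hypothetical LRT statistic for H0: sigma^2 = 0 (illustrative value, not from the talk)
lrt_stat <- 2.9

# naive p-value: treats the null distribution as chi-square with 1 df
p_naive <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

# mixture p-value: 0.5 * P(chisq_0 > stat) + 0.5 * P(chisq_1 > stat),
# where the chisq_0 term is zero for any stat > 0
p_mixture <- 0.5 * 0 + 0.5 * pchisq(lrt_stat, df = 1, lower.tail = FALSE)

c(naive = p_naive, mixture = p_mixture)
```

With this statistic the naive test does not reject at the 0.05 level while the mixture-based test does, which is the sense in which ignoring the boundary makes the test conservative.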


Information-theoretic approaches


                   Above issues apply, but less well understood (Greven, 2008;
                   Greven and Kneib, 2010)
                   AIC is asymptotic
                   “corrected” AIC (AICc) (Hurvich and Tsai, 1989) derived
                   for linear models, widely used but not tested elsewhere
                   (Richards, 2005)
                   For comparing models with different REs,
                   or for AICc, what is p (the number of parameters)?
                   AICcmodavg, MuMIn
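For reference, the usual small-sample correction adds 2p(p + 1)/(n − p − 1) to the AIC. The helper below is a hypothetical illustration, not the implementation used by AICcmodavg or MuMIn; it makes explicit that both p and n must be chosen by the analyst, which is exactly where the ambiguity for random effects enters.

```r
# aicc(): hypothetical helper for illustration
#   loglik: maximized log-likelihood
#   p:      number of estimated parameters (ambiguous when random effects are present)
#   n:      number of observations
aicc <- function(loglik, p, n) {
  aic <- -2 * loglik + 2 * p
  aic + (2 * p * (p + 1)) / (n - p - 1)
}

# same log-likelihood and sample size, different parameter counts
aicc(loglik = -47, p = 3, n = 20)
aicc(loglik = -47, p = 8, n = 20)
```

The correction grows quickly as p approaches n, so the answer to "what is p?" can change the model ranking in small data sets.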


Parametric bootstrapping

                   fit null model to data
                   simulate “data” from null model
                   fit null and working model, compute likelihood difference
                   repeat to estimate null distribution
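       > library(lme4)  # supplies simulate() and refit() methods for fitted mixed models
       > ## m0: fitted null model; m1: fitted working model
       > pboot <- function(m0, m1) {
            s <- simulate(m0)              # simulate a response from the null model
            L0 <- logLik(refit(m0, s))     # refit the null model to the simulated response
            L1 <- logLik(refit(m1, s))     # refit the working model to the same response
            2 * as.numeric(L1 - L0)        # likelihood ratio statistic as a plain number
        }
       > ## null distribution of the statistic (fm2: null fit, fm1: working fit)
       > nulldist <- replicate(1000, pboot(fm2, fm1))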
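The simulated null distribution still has to be turned into a p-value by comparing it with the statistic observed on the real data. The self-contained sketch below shows one common way to do that, using plain Poisson GLMs as a stand-in for mixed models so that it runs with base R only. The data, the helper name `lrt`, and the number of replicates are assumptions for illustration.

```r
set.seed(1)
n <- 40
dat <- data.frame(x = runif(n))
dat$y <- rpois(n, lambda = exp(0.3 + 0.8 * dat$x))

# likelihood ratio statistic (working vs. null model) for a given response vector
lrt <- function(y_new) {
  d <- transform(dat, y = y_new)
  f0 <- glm(y ~ 1, family = poisson, data = d)
  f1 <- glm(y ~ x, family = poisson, data = d)
  2 * as.numeric(logLik(f1) - logLik(f0))
}

obs <- lrt(dat$y)                                   # statistic for the observed data
fit0 <- glm(y ~ 1, family = poisson, data = dat)    # null model fitted to the data

# simulate responses from the fitted null model and recompute the statistic each time
null_stats <- replicate(500, lrt(rpois(n, lambda = fitted(fit0))))

# proportion of simulated statistics at least as large as the observed one;
# adding 1 to numerator and denominator avoids reporting a p-value of exactly zero
p_boot <- (sum(null_stats >= obs) + 1) / (length(null_stats) + 1)
p_boot
```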

Finite-sample problems


       How far are we from “asymptopia”?
                   How much data
                   (number of samples, number of RE levels)?
                   How many parameters
                   (number of fixed-effect parameters, number of RE levels,
                   number of RE parameters)?
       Hope (#data) − (#parameters) ≫ 1, but if not?




Levels of focus



                   how many parameters does a RE take?
                   Somewhere between q and r (e.g., 1 and the number of levels
                   for a variance) . . . shrinkage
                   Conditional vs. marginal AIC
                   Similar issues with Deviance Information Criterion
                   (Spiegelhalter et al., 2002)




Inference: confidence intervals


Wald tests



               a sometimes-crude approximation
               computationally easy, especially for many-parameter models
               use Wald Z (assume “residual df” large)? Or t, guessing at
               the residual df?
               Available from most packages
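A minimal sketch of the Wald interval, estimate ± multiplier × standard error, using a base-R Poisson GLM as a stand-in for a mixed model. The simulated data and names are assumptions for illustration. The Z version uses a normal quantile; a t version would swap in `qt()` with a guessed residual df, shown here only to make the difference in multipliers visible.

```r
set.seed(2)
x <- runif(50)
y <- rpois(50, lambda = exp(0.2 + 0.7 * x))
fit <- glm(y ~ x, family = poisson)

est <- coef(fit)
se <- sqrt(diag(vcov(fit)))     # standard errors from the inverse information matrix

# Wald Z intervals: assume the residual df is effectively infinite
z_mult <- qnorm(0.975)
wald_z <- cbind(lower = est - z_mult * se, upper = est + z_mult * se)

# multiplier for a t-based interval, with the residual df as a guess at the right df
t_mult <- qt(0.975, df = df.residual(fit))

wald_z
c(z = z_mult, t = t_mult)
```

For GLMs, `confint.default(fit)` in base R returns the same Z-based intervals.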




Profile confidence intervals




               Tedious to program
               Computationally challenging
               Inherits finite-sample problems from LRT
               lme4a (in development/soon!)




Bayesian posterior intervals



               Marginal quantile or highest posterior density intervals
               Computationally “free” with results of stochastic Bayesian
               computation
               Easily extended to confidence intervals on predictions, etc.
               Post hoc Markov chain Monte Carlo sampling available for
               some packages (glmmADMB, R2ADMB, eventually lme4a)
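Once posterior samples are available, both kinds of interval reduce to simple operations on the vector of draws. In the sketch below, gamma-distributed random numbers stand in for MCMC samples of a single parameter, and `hpd` is a hypothetical helper that searches for the shortest interval containing the requested fraction of the sorted draws. It assumes a unimodal posterior and is not taken from any of the packages named above.

```r
set.seed(3)
draws <- rgamma(4000, shape = 2, rate = 1)   # stand-in for MCMC samples of one parameter

# marginal quantile (equal-tailed) 95% interval
quantile_ci <- quantile(draws, c(0.025, 0.975))

# highest posterior density interval: shortest window covering `prob` of the sorted draws
hpd <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  m <- floor(prob * n)                       # window spans m + 1 consecutive draws
  widths <- x[(m + 1):n] - x[1:(n - m)]      # width of every candidate window
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + m])
}
hpd_ci <- hpd(draws)

quantile_ci
hpd_ci
```

For a skewed posterior like this one the two intervals differ noticeably, with the HPD interval shifted toward the mode and shorter than the quantile interval.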




Summary



               Large data
                       computation can be limiting
                       asymptotics better
               Small data
                       RE variances may be poorly estimated or set to zero
                       (informative priors can help)
                       inference tricky




Glycera




       [Figure: estimated effect on survival (x-axis −60 to 60) for each term, from the main effects (Osm, Cu, H2S, Anoxia) through all two- and three-way interactions to Osm:Cu:H2S:Anoxia, compared across MCMCglmm, glmer(OD:2), glmer(OD), glmmML, and glmer fits]






       [Figure: effect on survival (x-axis −20 to 30) for each term, grouped as main effects (Oxygen, Osm, Cu, H2S), 2-way interactions, 3-way interactions, and the 4-way term Osm : Cu : H2S : Oxygen]




Parametric bootstrap results
       [Figure: inferred p value vs. true p value (both axes 0.02 to 0.08) in four panels: H2S, Anoxia, Osm, Cu]


Arabidopsis


Arabidopsis: AIC comparison of REs

       [Figure: ∆AIC (x-axis 0 to 6) for eleven random-effects structures: nointeract; int(popu); int(gen) X int(popu); int(gen) X nut(popu); int(gen) X clip(popu); nut(gen) X int(popu); nut(gen) X nut(popu); nut(gen) X clip(popu); clip(gen) X int(popu); clip(gen) X nut(popu); clip(gen) X clip(popu)]


Arabidopsis: fits with and without nutrient(genotype)

       [Figure: regression estimates (x-axis −1.0 to 1.5) for nutrient8:amdclipped, statusTransplant, statusPetri.Plate, rack2, amdclipped, and nutrient8, with two estimates per term (fits with and without nutrient(genotype))]




Primary tools



             lme4: multiple/crossed REs, (profiling): fast
             MCMCglmm: Bayesian, very flexible
             glmmADMB: negative binomial, zero-inflated etc.
             Most flexible: R2ADMB/AD Model Builder,
             R2WinBUGS/WinBUGS/R2jags/JAGS, INLA




Loose ends


             Overdispersion and zero-inflation: MCMCglmm, glmmADMB
             Spatial and temporal correlation (R-side effects):
             MASS:glmmPQL (sort of), GLMMarp, INLA;
             WinBUGS, AD Model Builder
             Additive models: amer, gamm4, mgcv
             Penalized methods (Jiang, 2008) (?)
             Hierarchical GLMs: hglm, HGLMMM
             Marginal models: geepack, gee



To be done



             Many holes in knowledge (but what can be done?)
             Faster algorithms, more parallel computation
             Lots of implementation and clean-up
             Benefits & costs of staying within the GLMM framework
             Benefits & costs of diversity
       More info: glmm.wikidot.com




Acknowledgements



             Data: Josh Banta and Massimo Pigliucci (Arabidopsis);
             Adrian Stier and Sea McKeon (coral symbionts); Courtney
             Kagan, Jocelynn Ortega, David Julian (Glycera);
             Co-authors: Mollie Brooks, Connie Clark, Shane Geange, John
             Poulsen, Hank Stevens, Jada White




References
       Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in
           biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.
       Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261–269. ISSN
           0378-3758. doi:10.1016/S0378-3758(98)00005-6.
       Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de
           Statistique, 62(2):257–274. ISSN 03067734. doi:10.2307/1403512.
       Gelman, A., 2005. Annals of Statistics, 33(1):1–53. doi:10.1214/009053604000001048.
       Goldman, N. and Whelan, S., 2000. Molecular Biology and Evolution, 17(6):975–978.
       Greven, S., 2008. Non-Standard Problems in Inference for Additive and Linear Mixed Models. Cuvillier Verlag,
           Göttingen, Germany. ISBN 3867274916.
       Greven, S. and Kneib, T., 2010. Biometrika, 97(4):773–789.
       Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.
       Hurvich, C.M. and Tsai, C., 1989. Biometrika, 76(2):297–307. doi:10.1093/biomet/76.2.297.
       Jiang, J., 2008. The Annals of Statistics, 36(4):1669–1692. ISSN 0090-5364. doi:10.1214/07-AOS517.
       Kenward, M.G. and Roger, J.H., 1997. Biometrics, 53(3):983–997.
       Molenberghs, G. and Verbeke, G., 2007. The American Statistician, 61(1):22–27.
           doi:10.1198/000313007X171322.
       Pinheiro, J.C. and Bates, D.M., 2000. Mixed-effects models in S and S-PLUS. Springer, New York. ISBN
           0-387-98957-9.
       Richards, S.A., 2005. Ecology, 86(10):2805–2814. doi:10.1890/05-0074.
       Schaalje, G., McBride, J., and Fellingham, G., 2002. Journal of Agricultural, Biological & Environmental Statistics,
           7(4):512–524.
       Spiegelhalter, D.J., Best, N., et al., 2002. Journal of the Royal Statistical Society B, 64:583–640.
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Recently uploaded (20)

Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

open-source GLMM tools

  • 1. Precursors GLMMs Results Conclusions References Open-source tools for estimation and inference using generalized linear mixed models Ben Bolker McMaster University Departments of Mathematics & Statistics and Biology 7 April 2011 Ben Bolker McMaster University Departments of Mathematics & Statistics and Biology Open-source GLMMs
  • 2. Outline
      1. Precursors: Examples; Definitions
      2. GLMMs: Estimation; Inference: tests; Inference: confidence intervals
      3. Results: Glycera; Arabidopsis
      4. Conclusions
  • 3. Outline (current section: Examples)
  • 4. Coral protection by symbionts [Figure: number of blocks by symbiont treatment (none, shrimp, crabs, both), broken down by number of predation events]
  • 5. Environmental stress: Glycera cell survival [Figure: panels by oxygen condition (Anoxia, Normoxia) and osmolarity (Osm = 12.8, 22.4, 32, 41.6, 51.2); axes: H2S (0, 0.03, 0.1, 0.32) and Copper (0, 33.3, 66.6, 133.3); scale from 0.0 to 1.0]
  • 6. Arabidopsis response to fertilization & clipping [Figure: Log(1 + fruit set) for unclipped vs. clipped plants; panel: nutrient (1, 8); color: genotype]
  • 7. Glossary: data
      - Fixed effects: predictors where interest is in specific levels
      - Random effects (RE): predictors where interest is in distribution rather than levels (blocks) (Gelman, 2005)
      - Crossed RE: multiple REs where levels of one occur in more than one level of another (ex.: block × year; cf. nested)
      - http://lme4.r-forge.r-project.org/book/, Pinheiro and Bates (2000)
  • 8. Data challenges (Estimation / Computation / Inference): Small # RE levels (<5–6); Large n; Small N (< 40); Overdispersion; Multiple REs; Small n; Crossed REs; Crossed REs; Spatial/temporal correlation; Unusual distributions (Gamma, neg. binom., ...)
  • 9. Outline (current section: Definitions)
  • 10. Generalized linear models
      - Distributions from exponential family (Poisson, binomial, Gaussian, Gamma, neg. binomial (known k), ...)
      - Means = linear functions of predictors on scale of link function (identity, log, logit, ...)
      - $Y \sim D(g^{-1}(X\beta), \phi)$
      - $\phi$ often set to 1 (Poisson, binomial) except for quasilikelihood approaches
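As a rough illustration of the definition above, the sketch below simulates Poisson data with a log link and fits the matching GLM in base R. The sample size, coefficients, and variable names are made up for the example and are not taken from the talk.

```r
# Simulated data: all names and parameter values here are illustrative.
set.seed(101)
n <- 500
x <- runif(n, -1, 1)
beta <- c(0.5, 1.0)                # assumed "true" intercept and slope
eta <- beta[1] + beta[2] * x       # linear predictor, X %*% beta
mu <- exp(eta)                     # inverse link g^{-1} for the log link
y <- rpois(n, lambda = mu)         # D = Poisson, with phi fixed at 1

# Fit the corresponding GLM; estimates are reported on the link (log) scale
fit_glm <- glm(y ~ x, family = poisson(link = "log"))
coef(fit_glm)
```

The fitted coefficients should land close to the assumed values in `beta`, since the fitted model matches the simulation.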
  • 11. Generalized linear mixed models
      - Add random effects: $Y \sim D(g^{-1}(X\beta + Zu), \phi)$, $u \sim \mathrm{MVN}(0, \Sigma)$
      - Synonyms: multilevel, hierarchical models
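A minimal sketch of the same idea with a random intercept: binary data are simulated from the GLMM definition with one block-level effect, then fitted with `lme4::glmer` if that package is installed. The block counts, coefficients, and the RE standard deviation are assumptions chosen only for illustration.

```r
# Simulate from Y ~ Bernoulli(plogis(X beta + Z u)), u ~ N(0, sigma_b^2)
set.seed(102)
n_block <- 20
n_per <- 10
block <- factor(rep(seq_len(n_block), each = n_per))
x <- runif(n_block * n_per, -1, 1)
beta <- c(-0.5, 1.0)                       # assumed fixed effects
sigma_b <- 0.8                             # assumed among-block SD
u <- rnorm(n_block, 0, sigma_b)            # one random intercept per block
eta <- beta[1] + beta[2] * x + u[as.integer(block)]   # X beta + Z u
y <- rbinom(length(eta), size = 1, prob = plogis(eta))
d <- data.frame(y, x, block)

# Fit only if lme4 is available; (1 | block) is the random intercept
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_glmm <- lme4::glmer(y ~ x + (1 | block), family = binomial, data = d)
  print(lme4::fixef(fit_glmm))
  print(lme4::VarCorr(fit_glmm))
}
```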
  • 12. Marginal likelihood
      - Likelihood (Prob(data|parameters)): requires integrating over possible values of REs to get marginal likelihood
      - e.g.: likelihood of the $i$th obs. in block $j$ is $L(x_{ij} \mid \theta_j, \sigma_w^2)$
      - likelihood of a particular block mean $\theta_j$ is $L(\theta_j \mid 0, \sigma_b^2)$
      - marginal likelihood is $\int L(x_{ij} \mid \theta_j, \sigma_w^2)\, L(\theta_j \mid 0, \sigma_b^2)\, d\theta_j$
      - Balance (dispersion of RE around 0) with (dispersion of data conditional on RE)
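The integral above can be checked numerically in a case where the answer is known. The sketch below assumes Gaussian data within one block and a Gaussian block mean, takes the product of the conditional likelihoods over the observations in the block, integrates over the block mean with base R's `integrate`, and compares the result with the closed-form multivariate normal density. The data values and standard deviations are invented for the example.

```r
x <- c(1.2, 0.7, 1.9, 1.1, 1.5)   # made-up observations from one block
sigma_w <- 1                      # assumed within-block SD
sigma_b <- 2                      # assumed among-block SD

# Integrand: conditional likelihood of the block's data given theta,
# times the density of the block mean theta around 0
integrand <- function(theta) {
  sapply(theta, function(th) {
    prod(dnorm(x, mean = th, sd = sigma_w)) * dnorm(th, mean = 0, sd = sigma_b)
  })
}
marg_num <- integrate(integrand, lower = -Inf, upper = Inf,
                      rel.tol = 1e-8)$value

# Closed form in this Gaussian case: x ~ MVN(0, sigma_w^2 I + sigma_b^2 J)
n <- length(x)
Sigma <- sigma_w^2 * diag(n) + sigma_b^2 * matrix(1, n, n)
log_marg <- -0.5 * (n * log(2 * pi) +
                    as.numeric(determinant(Sigma)$modulus) +
                    drop(crossprod(x, solve(Sigma, x))))
c(numerical = marg_num, closed_form = exp(log_marg))
```

For non-Gaussian responses no closed form is generally available, which is why the approximation methods on the estimation slides are needed.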
  • 14. Shrinkage: Arabidopsis block estimates [Figure: Mean(log) fruit set (y-axis) against Genotype (x-axis, 0 to 25); points annotated with numbers]
  • 15. RE examples
      - Coral symbionts: simple experimental blocks, RE affects intercept (overall probability of predation in block)
      - Glycera: applied to cells from 10 individuals, RE again affects intercept (cell survival prob.)
      - Arabidopsis: region (3 levels, treated as fixed) / population / genotype: affects intercept (overall fruit set) as well as treatment effects (nutrients, herbivory, interaction)
  • 16. Outline (current section: Estimation)
  • 17. Penalized quasi-likelihood (PQL)
      - alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM fit (Breslow, 2004)
      - flexible (allows spatial/temporal correlations, crossed REs)
      - biased for small unit samples (e.g. counts < 5, binary or low-survival data)
      - widely used: SAS PROC GLIMMIX, R MASS:glmmPQL: in ≈ 90% of small-unit-sample cases
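A hedged sketch of a PQL fit with `MASS::glmmPQL`, using simulated Poisson counts with a random block intercept. The data are invented, and the mean count is deliberately set well above 5 so that the example stays in the regime where the slide suggests PQL bias is less of a concern.

```r
set.seed(103)
n_block <- 20
n_per <- 10
block <- factor(rep(seq_len(n_block), each = n_per))
x <- runif(n_block * n_per, -1, 1)
u <- rnorm(n_block, 0, 0.5)                        # assumed block effects
y <- rpois(length(x), exp(2 + 0.5 * x + u[as.integer(block)]))
d <- data.frame(y, x, block)

# glmmPQL takes an lme-style random-effects formula; it needs MASS and nlme
if (requireNamespace("MASS", quietly = TRUE) &&
    requireNamespace("nlme", quietly = TRUE)) {
  fit_pql <- MASS::glmmPQL(y ~ x, random = ~ 1 | block,
                           family = poisson, data = d, verbose = FALSE)
  print(nlme::fixef(fit_pql))
}
```

Because PQL is a quasi-likelihood method, the fitted object does not provide a true likelihood, so likelihood-based comparisons should not be read off it.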
  • 21. Laplace approximation
      - approximate marginal likelihood
      - for given β, θ (RE parameters), find conditional modes by penalized, iterated reweighted least squares; then use second-order Taylor expansion around the conditional modes
      - more accurate than PQL
      - reasonably fast and flexible
      - lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder)
  • 22. Gauss-Hermite quadrature (AGQ)
      - as above, but compute additional terms in the integral (typically 8, but often up to 20)
      - most accurate
      - slowest, hence not flexible (2–3 RE at most, maybe only 1)
      - lme4:glmer, glmmML, repeated
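In current releases of lme4, the choice between the Laplace approximation and adaptive Gauss-Hermite quadrature is made through the `nAGQ` argument of `glmer`. The sketch below fits the same simulated binary data both ways; the data and parameter values are assumptions for illustration, and the fit is skipped if lme4 is not installed.

```r
set.seed(104)
n_block <- 30
n_per <- 8
block <- factor(rep(seq_len(n_block), each = n_per))
x <- runif(n_block * n_per, -1, 1)
u <- rnorm(n_block, 0, 1)                          # assumed block effects
y <- rbinom(length(x), 1, plogis(-0.5 + x + u[as.integer(block)]))
d <- data.frame(y, x, block)

if (requireNamespace("lme4", quietly = TRUE)) {
  # nAGQ = 1 is the Laplace approximation (the glmer default)
  fit_laplace <- lme4::glmer(y ~ x + (1 | block), family = binomial,
                             data = d, nAGQ = 1)
  # nAGQ = 8 uses 8 quadrature points; only allowed with a single scalar RE
  fit_agq <- lme4::glmer(y ~ x + (1 | block), family = binomial,
                         data = d, nAGQ = 8)
  print(rbind(Laplace = lme4::fixef(fit_laplace),
              AGQ = lme4::fixef(fit_agq)))
}
```

With a single scalar random effect as here, the two sets of estimates are usually close; differences tend to grow with few observations per block.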
  • 23. Bayesian approaches
      - Bayesians have to do nasty integrals anyway (to normalize the posterior probability density)
      - various flavours of stochastic Bayesian computation (Gibbs sampling, MCMC, etc.)
      - generally slower but more flexible
      - solves many problems of assessing confidence intervals
      - must specify priors, assess convergence
      - specialized: glmmAK, MCMCglmm (Hadfield, 2010), INLA
      - general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS)
  • 24. Overdispersion (slight tangent)
      - Variance greater than expected from statistical model
      - Quasi-likelihood approaches: MASS:glmmPQL
      - Extended distributions (e.g. negative binomial): glmmADMB
      - Observation-level random effects (e.g. lognormal-Poisson): lme4
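One way to see overdispersion, and the observation-level random effect remedy listed above, is the following sketch. It simulates lognormal-Poisson counts, fits an ordinary Poisson GLM, and computes the ratio of the Pearson chi-square to the residual degrees of freedom, a common rough diagnostic that should be near 1 without overdispersion. The data and the variable name `obs` are assumptions for the example.

```r
set.seed(105)
n <- 200
x <- runif(n, -1, 1)
eps <- rnorm(n, 0, 1)                       # extra observation-level noise
y <- rpois(n, exp(1 + 0.5 * x + eps))       # lognormal-Poisson counts
d <- data.frame(y, x, obs = factor(seq_len(n)))

# Plain Poisson GLM ignores the extra variation
fit_pois <- glm(y ~ x, family = poisson, data = d)
disp <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
disp                                        # values well above 1 flag overdispersion

# Observation-level random effect: one RE level per row of data
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_olre <- lme4::glmer(y ~ x + (1 | obs), family = poisson, data = d)
  print(lme4::VarCorr(fit_olre))
}
```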
  • 25. Comparison of coral symbiont results [Figure: regression estimates (−6 to 2) for Symbiont, Crab vs. Shrimp, and Added symbiont, compared across GLM (fixed), GLM (pooled), PQL, Laplace, and AGQ]
  • 26. Outline (current section: Inference: tests)
  • 27. Wald tests [non-quadratic likelihood surfaces]
      - For OLS/linear models, likelihood surface is quadratic; only asymptotically true for GLM(M)s
      - Wald tests (e.g. typical results of summary) assume quadratic, based on curvature (information matrix)
      - always approximate, sometimes awful (Hauck-Donner effect)
      - do model comparison (F, score or likelihood ratio tests [LRT]) instead
      - But ...
  • 28. Conditional F tests [Uncertainty in scale parameters]
      - Model comparison: in general $-2 \log L = D = \text{deviance}_i/\phi$
      - Classical linear models: deviance and $\hat{\phi}$ are both $\chi^2$ distributed so $D \sim F(\nu_1, \nu_2)$
      - Denominator degrees of freedom (df) ($\nu_2$) for complex (unbalanced, crossed, R-side effects) models?
      - Approximations: Satterthwaite, Kenward-Roger (Kenward and Roger, 1997; Schaalje et al., 2002)
      - Is $D$ really $\sim F$ in these situations?
      - Scale parameters usually not estimated in GLMMs (Gamma, quasi-likelihood cases only).
      - But ...
  • 29. Likelihood ratio tests [non-normality of likelihood]
      - What about cases where $\phi$ is specified (e.g. ≡ 1)?
      - in GLM(M) case, numerator is only asymptotically $\chi^2$ anyway
      - Bartlett corrections (Cordeiro et al., 1994; Cordeiro and Ferrari, 1998), higher-order asymptotics: cond [neither extended to GLMMs!]
  • 30. Tests of random effects [boundary problems]
      - LRT depends on null hypothesis being within the parameter’s feasible range (Goldman and Whelan, 2000; Molenberghs and Verbeke, 2007)
      - violated e.g. by $H_0: \sigma^2 = 0$
      - In simple cases null distribution is a mixture of $\chi^2$ (e.g. $0.5\chi^2_0 + 0.5\chi^2_1$) (emdbook:dchibarsq)
      - ignoring this leads to conservative tests (e.g. true p-value = $\frac{1}{2}$ · nominal p-value)
      - simulation-based testing: RLRsim
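The halving of the p-value in the simple one-variance case follows directly from the mixture, as this base R sketch shows. The likelihood ratio statistic used here is a made-up number, not a result from the talk.

```r
lrt <- 2.71                                   # made-up LRT statistic

# Naive p-value: treats the null distribution as chi-square with 1 df
p_naive <- pchisq(lrt, df = 1, lower.tail = FALSE)

# Mixture 0.5 * chi^2_0 + 0.5 * chi^2_1: chi^2_0 is a point mass at zero,
# so for lrt > 0 only the chi^2_1 half contributes to the tail area
p_mix <- 0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)

c(naive = p_naive, mixture = p_mix)
```

This shortcut applies only to the simple case of testing a single variance; more complicated random-effects structures call for the simulation-based approaches mentioned on the slide.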
  • 31. Information-theoretic approaches
      - Above issues apply, but less well understood (Greven, 2008; Greven and Kneib, 2010)
      - AIC is asymptotic
      - “corrected” AIC (AICc) (Hurvich and Tsai, 1989) derived for linear models, widely used but not tested elsewhere (Richards, 2005)
      - For comparing models with different REs, or for AICc, what is p?
      - AICcmodavg, MuMIn
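For reference, the usual small-sample correction is AICc = AIC + 2k(k+1)/(n − k − 1), where k is the number of estimated parameters and n the number of observations. The sketch below applies it to an ordinary linear model on R's built-in `cars` data, where k is unambiguous; the helper name `aicc` is hypothetical. For a mixed model, deciding what k (the slide's p) and n should be is exactly the open question raised above.

```r
# Hypothetical helper: AICc for a fitted model with a well-defined logLik
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")   # number of estimated parameters
  n <- nobs(fit)                 # number of observations
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

fit_lm <- lm(dist ~ speed, data = cars)   # built-in dataset, n = 50
c(AIC = AIC(fit_lm), AICc = aicc(fit_lm))
```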
  • 32. Parametric bootstrapping
      - fit null model to data
      - simulate “data” from null model
      - fit null and working model, compute likelihood difference
      - repeat to estimate null distribution

      library(lme4)  # provides refit() for fitted mixed models
      # m0: fitted null model; m1: fitted working (alternative) model
      pboot <- function(m0, m1) {
          s <- simulate(m0)[[1]]          # one simulated response from the null model
          L0 <- logLik(refit(m0, s))      # refit null model to the simulated response
          L1 <- logLik(refit(m1, s))      # refit working model to the same response
          as.numeric(2 * (L1 - L0))       # likelihood ratio statistic
      }
      # fm2 (null) and fm1 (working) are previously fitted models
      replicate(1000, pboot(fm2, fm1))
  • 33. Finite-sample problems
      - How far are we from “asymptopia”?
      - How much data (number of samples, number of RE levels)?
      - How many parameters (number of fixed-effect parameters, number of RE levels, number of RE parameters)?
      - Hope (#data) − (#parameters) ≫ 1, but if not?
  • 34. Levels of focus
      - how many parameters does a RE take? Somewhere between q and r (e.g., 1 and the number of levels for a variance) ... shrinkage
      - Conditional vs. marginal AIC
      - Similar issues with Deviance Information Criterion (Spiegelhalter et al., 2002)
  • 35. Outline (current section: Inference: confidence intervals)
  • 36. Wald tests
      - a sometimes-crude approximation
      - computationally easy, especially for many-parameter models
      - use Wald Z (assume “residual df” large)? Or t, guessing at the residual df?
      - Available from most packages
  • 37. Profile confidence intervals
      - Tedious to program
      - Computationally challenging
      - Inherits finite-size sample problems from LRT
      - lme4a (in development/soon!)
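The difference between Wald and profile intervals can be seen even without random effects. The sketch below uses an ordinary GLM in base R rather than a GLMM, which is a simplification relative to the slides: `confint.default` gives Wald intervals from the curvature at the estimate, while `confint` on a `glm` object profiles the likelihood. The simulated data are assumptions for illustration.

```r
set.seed(106)
n <- 40                                     # deliberately small sample
x <- runif(n, -1, 1)
y <- rbinom(n, 1, plogis(0.5 + 1.5 * x))
fit <- glm(y ~ x, family = binomial)

# Wald intervals: estimate +/- 1.96 * SE, assuming a quadratic log-likelihood
ci_wald <- confint.default(fit)

# Profile-likelihood intervals: based on the actual likelihood surface
ci_prof <- suppressMessages(confint(fit))

list(wald = ci_wald, profile = ci_prof)
```

Wald intervals are symmetric around the estimate by construction, while profile intervals can be asymmetric when the likelihood surface is not quadratic.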
  • 38. Bayesian posterior intervals
      - Marginal quantile or highest posterior density intervals
      - Computationally “free” with results of stochastic Bayesian computation
      - Easily extended to confidence intervals on predictions, etc.
      - Post hoc Markov chain Monte Carlo sampling available for some packages (glmmADMB, R2ADMB, eventually lme4a)
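Both kinds of interval can be computed directly from posterior draws. In the sketch below the draws come from a skewed gamma distribution standing in for MCMC output, and `hpd` is a hypothetical helper written for this example that finds the shortest interval containing the requested fraction of the sorted samples.

```r
set.seed(107)
draws <- rgamma(4000, shape = 2, rate = 1)   # stand-in for MCMC samples

# Marginal quantile (equal-tailed) 95% interval
ci_quant <- quantile(draws, c(0.025, 0.975))

# Hypothetical helper: shortest interval containing `prob` of the samples
hpd <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  m <- ceiling(prob * n)                     # number of samples to cover
  width <- x[m:n] - x[1:(n - m + 1)]         # width of every candidate window
  i <- which.min(width)                      # narrowest window wins
  c(lower = x[i], upper = x[i + m - 1])
}
ci_hpd <- hpd(draws)

rbind(quantile = unname(ci_quant), hpd = unname(ci_hpd))
```

For a skewed posterior like this one, the highest posterior density interval is shifted toward the mode and is shorter than the equal-tailed interval.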
  • 39. Summary
      - Large data: computation can be limiting; asymptotics better
      - Small data: RE variances may be poorly estimated/set to zero (informative priors can help); inference tricky
  • 40. Outline (current section: Glycera)
  • 41. Glycera [Figure: estimated effects on survival (−60 to 60) for Osm, Cu, H2S, Anoxia and all their interactions, compared across glmer, glmmML, glmer(OD), glmer(OD:2), and MCMCglmm]
  • 42. Glycera [Figure: effects on survival (−20 to 30), grouped as main effects (H2S, Cu, Osm, Oxygen), 2-way interactions, 3-way interactions, and Osm : Cu : H2S : Oxygen]
  • 43. Glycera: parametric bootstrap results [Figure: inferred p value against true p value (0.02 to 0.08); panels: H2S, Anoxia, Osm, Cu]
  • 44. Outline (current section: Arabidopsis)
  • 45. Arabidopsis: AIC comparison of REs [Figure: ∆AIC (0 to 6) for random-effects structures: nointeract; int(popu); and each of int(gen), nut(gen), clip(gen) crossed with each of int(popu), nut(popu), clip(popu)]
  • 46. Arabidopsis: fits with and without nutrient(genotype) [Figure: regression estimates (−1.0 to 1.5) for nutrient8:amdclipped, statusTransplant, statusPetri.Plate, rack2, amdclipped, nutrient8]
  • 47. Primary tools
      - lme4: multiple/crossed REs, (profiling): fast
      - MCMCglmm: Bayesian, very flexible
      - glmmADMB: negative binomial, zero-inflated etc.
      - Most flexible: R2ADMB/AD Model Builder, R2WinBUGS/WinBUGS/R2jags/JAGS, INLA
  • 48. Loose ends
      - Overdispersion and zero-inflation: MCMCglmm, glmmADMB
      - Spatial and temporal correlation (R-side effects): MASS:glmmPQL (sort of), GLMMarp, INLA; WinBUGS, AD Model Builder
      - Additive models: amer, gamm4, mgcv
      - Penalized methods (Jiang, 2008) (?)
      - Hierarchical GLMs: hglm, HGLMMM
      - Marginal models: geepack, gee
  • 49. To be done
      - Many holes in knowledge (but what can be done?)
      - Faster algorithms, more parallel computation
      - Lots of implementation and clean-up
      - Benefits & costs of staying within the GLMM framework
      - Benefits & costs of diversity
      - More info: glmm.wikidot.com
  • 50. Acknowledgements
      - Data: Josh Banta and Massimo Pigliucci (Arabidopsis); Adrian Stier and Sea McKeon (coral symbionts); Courtney Kagan, Jocelynn Ortega, David Julian (Glycera)
      - Co-authors: Mollie Brooks, Connie Clark, Shane Geange, John Poulsen, Hank Stevens, Jada White
  • 51. References
      - Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.
      - Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261–269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.
      - Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de Statistique, 62(2):257–274. ISSN 03067734. doi:10.2307/1403512.
      - Gelman, A., 2005. Annals of Statistics, 33(1):1–53. doi:10.1214/009053604000001048.
      - Goldman, N. and Whelan, S., 2000. Molecular Biology and Evolution, 17(6):975–978.
      - Greven, S., 2008. Non-Standard Problems in Inference for Additive and Linear Mixed Models. Cuvillier Verlag, Göttingen, Germany. ISBN 3867274916.
      - Greven, S. and Kneib, T., 2010. Biometrika, 97(4):773–789.
      - Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.
      - Hurvich, C.M. and Tsai, C., 1989. Biometrika, 76(2):297–307. doi:10.1093/biomet/76.2.297.
      - Jiang, J., 2008. The Annals of Statistics, 36(4):1669–1692. ISSN 0090-5364. doi:10.1214/07-AOS517.
      - Kenward, M.G. and Roger, J.H., 1997. Biometrics, 53(3):983–997.
      - Molenberghs, G. and Verbeke, G., 2007. The American Statistician, 61(1):22–27. doi:10.1198/000313007X171322.
      - Pinheiro, J.C. and Bates, D.M., 2000. Mixed-effects models in S and S-PLUS. Springer, New York. ISBN 0-387-98957-9.
      - Richards, S.A., 2005. Ecology, 86(10):2805–2814. doi:10.1890/05-0074.
      - Schaalje, G., McBride, J., and Fellingham, G., 2002. Journal of Agricultural, Biological & Environmental Statistics, 7(14):512–524.
      - Spiegelhalter, D.J., Best, N., et al., 2002. Journal of the Royal Statistical Society B, 64:583–640.