- 1. Multiplicative Interaction Models in R
  - Heather Turner and David Firth
  - Department of Statistics, The University of Warwick, UK
  - 8th August 2005
- 2. Multiplicative Interaction Models
  - Interaction terms in generalized linear models can be over-complex
    - use multiplicative interactions instead
  - e.g. Goodman's row-column association model: $\log \mu_{rc} = \alpha_r + \beta_c + \gamma_r \delta_c$
  - e.g. UNIDIFF model: $\log \mu_{ijk} = \alpha_{ik} + \beta_{jk} + \gamma_k \delta_{ij}$
  - e.g. GAMMI model: $\mathrm{link}(\mu_{rc}) = \alpha_r + \beta_c + \sum_{k=1}^{K} \sigma_k \gamma_{kr} \delta_{kc}$
- 3. Fitting Multiplicative Interaction Models
  - Multiplicative interactions are non-linear
    - difficult to fit using standard software
  - Llama (Log Linear and Multiplicative Analysis) written by Firth (1998)
    - log link only
    - categorical variables only
    - standard multiplicative interactions
    - XLisp package
- 4. The gnm R Package
  - Provides framework for estimating generalised nonlinear models
    - in-built mechanism for multiplicative terms
    - works with "plug-in" functions to fit other types of nonlinear terms
  - Designed to be glm-like
    - common arguments, returned objects, methods, etc.
  - Estimates model parameters using maximum likelihood
  - Uses over-parameterised representations of models
- 5. Working with Over-Parameterised Models
  - gnm does not impose any identifiability constraints on the nonlinear parameters
    - the same model can be represented by an infinite number of parameterisations, e.g. $\log \mu_{rc} = \alpha_r + \beta_c + \gamma_r \delta_c = \alpha_r + \beta_c + (2\gamma_r)(0.5\delta_c) = \alpha_r + \beta_c + \gamma'_r \delta'_c$
    - gnm will return one of these parameterisations, at random
  - General rule for applying constraints not required
  - Fitting algorithm must be able to handle singular matrices
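The equivalence of parameterisations can be checked numerically. The following is a minimal base R sketch, not gnm code: the helper `eta()` and all parameter values are made up for illustration. It rescales the multiplicative parameters as on the slide, and also shifts $\gamma_r$ by a constant while absorbing the change into $\beta_c$.

```r
# Hypothetical parameter values for a 3 x 4 table (illustration only).
alpha <- c(0.1, -0.3, 0.5)
beta  <- c(0.2, 0.0, -0.4, 0.9)
gamma <- c(1.0, -0.5, 2.0)
delta <- c(0.3, 1.2, -0.7, 0.4)

# Predictor of the row-column association model:
# eta[r, c] = alpha[r] + beta[c] + gamma[r] * delta[c]
eta <- function(alpha, beta, gamma, delta) {
  outer(alpha, beta, "+") + outer(gamma, delta)
}

eta_original <- eta(alpha, beta, gamma, delta)

# Rescaling: (2 * gamma) * (0.5 * delta) leaves every product unchanged.
eta_rescaled <- eta(alpha, beta, 2 * gamma, 0.5 * delta)

# Shifting: (gamma + 3) * delta = gamma * delta + 3 * delta,
# so subtracting 3 * delta from beta compensates exactly.
eta_shifted <- eta(alpha, beta - 3 * delta, gamma + 3, delta)

same_after_rescaling <- isTRUE(all.equal(eta_original, eta_rescaled))
same_after_shifting  <- isTRUE(all.equal(eta_original, eta_shifted))
```

Both comparisons hold, so the data cannot distinguish between these parameter vectors; only functions of the parameters that are invariant to such transformations are identifiable.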
- 6. Parameter Estimation
  - Wish to estimate the predictor $\eta = \eta(\beta)$, which is nonlinear, so we have a local design matrix $X(\beta) = \dfrac{\partial \eta}{\partial \beta}$, where $\mathrm{rank}(X) < p$, the number of parameters, due to over-parameterisation
  - Use maximum likelihood estimation: want to solve the likelihood score equations $U(\beta) = \nabla l(\beta) = 0$
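The rank deficiency of the local design matrix can be seen on a small case. This is a base R sketch under stated assumptions: a 4 x 4 table, the row-column association predictor $\eta_{rc} = \alpha_r + \beta_c + \gamma_r \delta_c$ with no constraints, and made-up values for $\gamma$ and $\delta$. The Jacobian columns are written out by hand rather than taken from gnm.

```r
n_row <- 4
n_col <- 4
cells <- expand.grid(row = seq_len(n_row), col = seq_len(n_col))

# Hypothetical current values of the multiplicative parameters.
gamma <- c(-1.0, 0.5, 0.2, 1.3)
delta <- c(0.7, -0.4, 1.1, 0.1)

# Indicator matrices: one column per row level / column level.
row_ind <- outer(cells$row, seq_len(n_row), "==") * 1
col_ind <- outer(cells$col, seq_len(n_col), "==") * 1

# Local design matrix X = d eta / d (alpha, beta, gamma, delta):
#   d eta_rc / d alpha_r = 1,  d eta_rc / d beta_c  = 1,
#   d eta_rc / d gamma_r = delta_c,  d eta_rc / d delta_c = gamma_r.
X <- cbind(
  row_ind,
  col_ind,
  row_ind * delta[cells$col],
  col_ind * gamma[cells$row]
)

p  <- ncol(X)      # number of parameters: 16
rk <- qr(X)$rank   # numerical rank of the local design matrix
c(parameters = p, rank = rk)
```

For these values the rank is 12, four fewer than the 16 parameters, which is why $X^T W X$ is singular and cannot be inverted in the ordinary way.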
- 7. Fitting Algorithm
  - Use a two-stage procedure:
    1. one-parameter-at-a-time Newton method to update nonlinear parameters
    2. full Newton-Raphson to update all parameters, but with the Moore-Penrose pseudoinverse $(X^T W X)^{-}$, since $X$ is not of full rank
  - Starting values are obtained in two ways
    - for the linear parameters, use estimates from a glm fit
    - for the nonlinear parameters, generate randomly
    - parameterisation determined by starting values for nonlinear parameters
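The role of the pseudoinverse in the second stage can be illustrated on a simpler problem. The sketch below is not gnm's algorithm: it fits an ordinary log-linear independence model by iteratively reweighted least squares, with a deliberately over-parameterised design (an intercept plus a full set of dummies for each factor). The helper `pinv()` is a hypothetical name, and the data are assumed to be the `occupationalStatus` table from R's `datasets` package.

```r
# Moore-Penrose pseudoinverse via the singular value decomposition.
pinv <- function(A, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

occ <- as.data.frame(datasets::occupationalStatus)
y <- occ$Freq

# Over-parameterised design: intercept + all 8 origin and all 8 destination
# dummies, so X has 17 columns but rank 15.
X <- cbind(1,
           model.matrix(~ 0 + origin, occ),
           model.matrix(~ 0 + destination, occ))
design_rank <- qr(X)$rank

# Scoring iterations for a Poisson model with log link.
eta <- log(y + 0.5)                       # crude starting predictor
for (iter in 1:25) {
  mu <- exp(eta)
  z  <- eta + (y - mu) / mu               # working response
  W  <- mu                                # working weights
  XtWX <- crossprod(X, W * X)             # singular: X is rank deficient
  beta <- pinv(XtWX) %*% crossprod(X, W * z)
  eta  <- drop(X %*% beta)
}

# Reference fit with the usual identifiability constraints applied by glm.
fit <- glm(Freq ~ origin + destination, family = poisson, data = occ)
fits_agree <- isTRUE(all.equal(unname(fitted(fit)), exp(eta),
                               tolerance = 1e-5))
```

The coefficient vector `beta` differs from `coef(fit)` because no constraints were imposed, but the fitted values agree. That is the property the over-parameterised approach relies on.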
- 8. Model Specification
  - Models in R are usually specified by a symbolic formula, such as

        y ~ a + b + a:b

  - gnm introduces two functions to specify nonlinear terms
    - Mult for standard multiplicative interactions, e.g.

          counts ~ row + column + Mult(-1 + row, -1 + column)

    - Nonlin for other terms that require a "plug-in" function, e.g.

          counts ~ row + column + Nonlin(MultHomog(row, column))
- 9. Example: Occupational Status Data
  - Study of occupational status taken from Goodman (1979)
  - Cross-classified by occupational status of father (origin) and son (destination)

        > occupationalStatus
              destination
        origin  1  2   3   4  5   6   7   8
             1 50 19  26   8  7  11   6   2
             2 16 40  34  18 11  20   8   3
             3 12 35  65  66 35  88  23  21
             4 11 20  58 110 40 183  64  32
             5  2  8  12  23 25  46  28  12
             6 12 28 102 162 90 554 230 177
             7  0  6  19  40 21 158 143  71
             8  0  3  14  32 15 126  91 106
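The model calls on the following slides use a response named `Freq`, which is the count column obtained when the table is converted to long format. A minimal sketch, assuming the `occupationalStatus` table shipped in R's `datasets` package is the data shown here (the gnm calls on the slides pass the table directly, so this conversion is only to make the variables visible):

```r
# occupationalStatus is an 8 x 8 contingency table.
tab <- datasets::occupationalStatus

# Long format: one row per (origin, destination) cell, counts in Freq.
occ <- as.data.frame(tab)

str(occ)     # factors origin and destination, plus the counts Freq
tab[1, 1]    # count for origin 1, destination 1
```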
- 10. Homogeneous Row-Column Association Model

        Call:
        gnm(formula = Freq ~ origin + destination + Diag(origin, destination) +
            Nonlin(MultHomog(origin, destination)), family = poisson,
            data = occupationalStatus)

        Coefficients:
                             (Intercept)                           origin2
                                 0.01031                           0.52684
        ...
                                 origin8                      destination2
                                 1.29563                           0.94586
        ...
                            destination8        Diag(origin, destination)1
                                 1.87101                           1.52667
        ...
              Diag(origin, destination)8  MultHomog(origin, destination).1
                                 0.38848                          -1.54112
        ...
        MultHomog(origin, destination).8
                                 1.04786
- 11. Homogeneous Row-Column Association Model (Contd.)

        Deviance:            32.56098
        Pearson chi-squared: 31.20718
        Residual df:         34

  - Compare to model with heterogeneous multiplicative effects

        > RC <- gnm(Freq ~ origin + destination + Diag(origin, destination) +
        +     Mult(origin, destination), family = poisson,
        +     data = occupationalStatus)
        > RChomog$dev - RC$dev
        [1] 3.411823
        > RChomog$df.residual - RC$df.residual
        [1] 6
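One way to read the comparison is as a likelihood-ratio test of the homogeneous model against the heterogeneous one. The sketch below assumes the usual chi-squared approximation for the deviance difference between nested models; the two numbers are copied from the slide rather than recomputed.

```r
dev_diff <- 3.411823   # RChomog$dev - RC$dev, as reported on the slide
df_diff  <- 6          # RChomog$df.residual - RC$df.residual

# Upper-tail probability of a chi-squared variate with 6 degrees of freedom.
p_value <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
round(p_value, 2)
```

The resulting p-value is about 0.76, so under this approximation there is no evidence that the extra six parameters of the heterogeneous model are needed.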
- 12. Estimate Identifiable Contrasts
  - Unconstrained parameters of homogeneous multiplicative factor not identifiable
  - Contrast effects with lowest level

        > round(getContrasts(RChomog, rev(coefs.of.interest))[[1]], 5)
                                         estimate      se
        MultHomog(origin, destination).8  2.58898 0.18869
        MultHomog(origin, destination).7  2.34541 0.17263
        MultHomog(origin, destination).6  1.92927 0.15737
        MultHomog(origin, destination).5  1.41751 0.17195
        MultHomog(origin, destination).4  1.40034 0.16029
        MultHomog(origin, destination).3  0.81646 0.16665
        MultHomog(origin, destination).2  0.21829 0.23465
        MultHomog(origin, destination).1  0.00000 0.00000
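The contrast estimates are differences from the lowest level of the unconstrained estimates. A small check in base R, using only the two MultHomog coefficients that appear in the fitted-model output on slide 10 (levels 2 to 7 are elided there, so they are not reproduced here):

```r
# Unconstrained estimates for levels 1 and 8, copied from the model output.
est <- c(level1 = -1.54112, level8 = 1.04786)

# Contrast each level with the lowest level.
contrast_with_first <- est - est[["level1"]]
round(contrast_with_first, 5)
```

The level 8 contrast comes out as 2.58898, matching the first row of the table. The standard errors cannot be recovered this way, since they depend on the covariance matrix of the estimates, which the slides do not show.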
- 13. Concluding Remarks
  - gnm has many more useful features
    - for multiplicative models: functional factors, Exp, multiplicity
    - other: symmetric interactions, diagonal reference models, eliminate, ...
  - More details can be found on http://www.warwick.ac.uk/go/dfirth/software/gnm
  - Available on CRAN http://cran.r-project.org