
Multiplicative Interaction Models in R


  1. Multiplicative Interaction Models in R
     Heather Turner and David Firth
     Department of Statistics, The University of Warwick, UK
     8th August 2005
  2. Multiplicative Interaction Models
     - Interaction terms in generalized linear models can be over-complex
         - use multiplicative interactions instead
     - e.g. Goodman's row-column association model: $\log \mu_{rc} = \alpha_r + \beta_c + \gamma_r \delta_c$
     - e.g. UNIDIFF model: $\log \mu_{ijk} = \alpha_{ik} + \beta_{jk} + \gamma_k \delta_{ij}$
     - e.g. GAMMI model: $\mathrm{link}(\mu_{rc}) = \alpha_r + \beta_c + \sum_{k=1}^{K} \sigma_k \gamma_{kr} \delta_{kc}$
  3. Fitting Multiplicative Interaction Models
     - Multiplicative interactions are non-linear
         - difficult to fit using standard software
     - Llama (Log Linear and Multiplicative Analysis) written by Firth (1998)
         - log link only
         - categorical variables only
         - standard multiplicative interactions
         - XLisp package
  4. The gnm R Package
     - Provides framework for estimating generalised nonlinear models
         - in-built mechanism for multiplicative terms
         - works with "plug-in" functions to fit other types of nonlinear terms
     - Designed to be glm-like
         - common arguments, returned objects, methods, etc.
     - Estimates model parameters using maximum likelihood
     - Uses over-parameterised representations of models
  5. Working with Over-Parameterised Models
     - gnm does not impose any identifiability constraints on the nonlinear parameters
         - the same model can be represented by an infinite number of parameterisations, e.g.
           $\log \mu_{rc} = \alpha_r + \beta_c + \gamma_r \delta_c = \alpha_r + \beta_c + (2\gamma_r)(0.5\delta_c) = \alpha_r + \beta_c + \gamma'_r \delta'_c$
         - gnm will return one of these parameterisations, at random
     - General rule for applying constraints not required
     - Fitting algorithm must be able to handle singular matrices
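The rescaling identity on this slide can be checked numerically. The following is a minimal base R sketch, not taken from the slides: the table size and all parameter values are made up for illustration, and `predictor` is a hypothetical helper name.

```r
# Hypothetical 3 x 4 table: every parameter value below is made up.
alpha <- c(0.2, -0.1, 0.4)        # row main effects
beta  <- c(1.0, 0.5, -0.3, 0.8)   # column main effects
gamma <- c(-1.2, 0.3, 0.9)        # row multipliers
delta <- c(0.7, -0.4, 1.1, 0.2)   # column multipliers

# eta[r, c] = alpha[r] + beta[c] + gamma[r] * delta[c]
predictor <- function(alpha, beta, gamma, delta) {
  outer(alpha, beta, "+") + outer(gamma, delta)
}

eta_original <- predictor(alpha, beta, gamma, delta)
# Doubling gamma while halving delta leaves every product unchanged.
eta_rescaled <- predictor(alpha, beta, 2 * gamma, 0.5 * delta)

same_predictor <- isTRUE(all.equal(eta_original, eta_rescaled))
print(same_predictor)
```

Any nonzero scaling factor would work in place of 2, which is one reason the unconstrained estimates of the multipliers are not interpretable on their own.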
  6. Parameter Estimation
     - Wish to estimate the predictor $\eta = \eta(\beta)$, which is nonlinear, so we have a local design matrix
       $X(\beta) = \partial\eta / \partial\beta$
       where $\mathrm{rank}(X) < p$, the number of parameters, due to over-parameterisation
     - Use maximum likelihood estimation: want to solve the likelihood score equations
       $U(\beta) = \nabla l(\beta) = 0$
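To illustrate the rank deficiency of the local design matrix, here is a small base R sketch for the row-column association predictor $\eta_{rc} = \alpha_r + \beta_c + \gamma_r \delta_c$ on a hypothetical 4 x 4 table. The table size, the random parameter values and all variable names are assumptions made for this example. The derivatives are written out by hand rather than taken from gnm.

```r
set.seed(1)
n_row <- 4
n_col <- 4
cells <- expand.grid(r = seq_len(n_row), c = seq_len(n_col))  # one row per cell

gamma <- rnorm(n_row)  # made-up current values of the row multipliers
delta <- rnorm(n_col)  # made-up current values of the column multipliers

# Indicator matrices: which row / column each cell belongs to
row_ind <- outer(cells$r, seq_len(n_row), "==") * 1
col_ind <- outer(cells$c, seq_len(n_col), "==") * 1

# Columns of the local design matrix X = d eta / d beta:
#   d eta / d alpha_r = 1[row = r]        d eta / d beta_c  = 1[col = c]
#   d eta / d gamma_r = delta_c 1[row=r]  d eta / d delta_c = gamma_r 1[col=c]
X <- cbind(row_ind,
           col_ind,
           row_ind * delta[cells$c],
           col_ind * gamma[cells$r])

p <- ncol(X)            # number of parameters in the over-parameterised form
rank_X <- qr(X)$rank    # numerical rank of the local design matrix
cat("parameters:", p, " rank:", rank_X, "\n")
```

The rank falls short of `p` because several directions in parameter space (for example, rescaling `gamma` and `delta` in opposite directions) leave the predictor unchanged.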
  7. Fitting Algorithm
     - Use a two-stage procedure:
         1. one-parameter-at-a-time Newton method to update nonlinear parameters
         2. full Newton-Raphson to update all parameters, but with the Moore-Penrose pseudoinverse $(X^T W X)^{-}$ since $X$ is not of full rank
     - Starting values are obtained in two ways:
         - for the linear parameters, use estimates from a glm fit
         - for the nonlinear parameters, generate randomly
             - parameterisation determined by starting values for nonlinear parameters
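The role of the pseudoinverse in the second stage can be sketched on a toy problem. This is not gnm's implementation: it is a simplified illustration that assumes a Poisson log-linear model with a deliberately over-parameterised but linear design (an intercept plus a dummy column for every factor level), made-up counts, and `MASS::ginv` as the Moore-Penrose pseudoinverse.

```r
library(MASS)  # ginv(): Moore-Penrose pseudoinverse

# Made-up counts with a three-level factor, coded WITHOUT dropping a level,
# so the design matrix has 4 columns but only rank 3.
y <- c(12, 15, 30, 28, 7, 9)
g <- factor(c("a", "a", "b", "b", "c", "c"))
X <- cbind(1, model.matrix(~ g - 1))

# Start from the overall mean on the log scale.
beta <- c(log(mean(y)), 0, 0, 0)

for (iter in 1:50) {
  mu    <- exp(drop(X %*% beta))
  score <- crossprod(X, y - mu)   # U(beta) = X^T (y - mu) for Poisson, log link
  info  <- crossprod(X, mu * X)   # X^T W X with W = diag(mu); singular here
  step  <- drop(ginv(info) %*% score)
  beta  <- beta + step
  if (max(abs(step)) < 1e-10) break
}

fitted_pinv <- unname(exp(drop(X %*% beta)))
fitted_glm  <- unname(fitted(glm(y ~ g, family = poisson)))
print(all.equal(fitted_pinv, fitted_glm, tolerance = 1e-6))
```

The coefficients found this way are only one of infinitely many equivalent solutions, but the fitted values are determined uniquely, which is why they can be compared with an ordinary `glm` fit that uses a full-rank coding.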
  8. Model Specification
     - Models in R are usually specified by a symbolic formula, such as
           y ~ a + b + a:b
     - gnm introduces two functions to specify nonlinear terms
         - Mult for standard multiplicative interactions, e.g.
               counts ~ row + column + Mult(-1 + row, -1 + column)
         - Nonlin for other terms that require a "plug-in" function, e.g.
               counts ~ row + column + Nonlin(MultHomog(row, column))
  9. Example: Occupational Status Data
     - Study of occupational status taken from Goodman (1979)
     - Cross-classified by occupational status of father (origin) and son (destination)
           > occupationalStatus
                 destination
           origin   1   2   3   4   5   6   7   8
                1  50  19  26   8   7  11   6   2
                2  16  40  34  18  11  20   8   3
                3  12  35  65  66  35  88  23  21
                4  11  20  58 110  40 183  64  32
                5   2   8  12  23  25  46  28  12
                6  12  28 102 162  90 554 230 177
                7   0   6  19  40  21 158 143  71
                8   0   3  14  32  15 126  91 106
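The model formulas on the following slides refer to `Freq`, `origin` and `destination`. A short base R sketch shows where those names come from, assuming the copy of `occupationalStatus` shipped in R's built-in datasets package is the table printed above. The name `occ_df` is chosen here for illustration.

```r
# occupationalStatus is a two-way contingency table in the 'datasets' package.
# Converting it to long format gives one row per cell, with the count in 'Freq'.
occ_df <- as.data.frame(occupationalStatus)

str(occ_df)    # two factors (origin, destination) and the count column Freq
head(occ_df)
```

Each of the 64 cells of the 8 x 8 table becomes one observation, which is the unit that a Poisson model for the counts works with.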
  10. Homogeneous Row-Column Association Model
           Call:
           gnm(formula = Freq ~ origin + destination + Diag(origin, destination) +
               Nonlin(MultHomog(origin, destination)), family = poisson,
               data = occupationalStatus)
           Coefficients:
                                (Intercept)                           origin2
                                    0.01031                           0.52684
           ...
                                    origin8                      destination2
                                    1.29563                           0.94586
           ...
                               destination8        Diag(origin, destination)1
                                    1.87101                           1.52667
           ...
                 Diag(origin, destination)8  MultHomog(origin, destination).1
                                    0.38848                          -1.54112
           ...
           MultHomog(origin, destination).8
                                    1.04786
  11. Homogeneous Row-Column Association Model (Contd.)
           Deviance:            32.56098
           Pearson chi-squared: 31.20718
           Residual df:         34
     - Compare to model with heterogeneous multiplicative effects
           > RC <- gnm(Freq ~ origin + destination + Diag(origin, destination) +
           +           Mult(origin, destination), family = poisson,
           +           data = occupationalStatus)
           > RChomog$dev - RC$dev
           [1] 3.411823
           > RChomog$df.residual - RC$df.residual
           [1] 6
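One common way to read these two differences is as a likelihood-ratio comparison of the nested models, referring the deviance difference to a chi-squared distribution on the difference in residual degrees of freedom. The slides do not show this step, so the sketch below is an assumption about how the numbers would be used. The two values are copied from the output above.

```r
# Values copied from the slide output
deviance_difference <- 3.411823  # RChomog$dev - RC$dev
df_difference       <- 6         # RChomog$df.residual - RC$df.residual

# Upper-tail probability of a chi-squared variable on 6 degrees of freedom
p_value <- pchisq(deviance_difference, df = df_difference, lower.tail = FALSE)
print(p_value)
```

A large p-value here would indicate that the heterogeneous model does not fit appreciably better than the homogeneous one.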
  12. Estimate Identifiable Contrasts
     - Unconstrained parameters of homogeneous multiplicative factor not identifiable
     - Contrast effects with lowest level
           > round(getContrasts(RChomog, rev(coefs.of.interest))[[1]], 5)
                                            estimate      se
           MultHomog(origin, destination).8  2.58898 0.18869
           MultHomog(origin, destination).7  2.34541 0.17263
           MultHomog(origin, destination).6  1.92927 0.15737
           MultHomog(origin, destination).5  1.41751 0.17195
           MultHomog(origin, destination).4  1.40034 0.16029
           MultHomog(origin, destination).3  0.81646 0.16665
           MultHomog(origin, destination).2  0.21829 0.23465
           MultHomog(origin, destination).1  0.00000 0.00000
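One way to see why the raw multipliers are not identifiable while their contrasts with a reference level are: in a predictor of the form $\alpha_r + \beta_c + \gamma_r \gamma_c$, adding a constant to every $\gamma$ can be absorbed into the main effects. The base R sketch below checks this numerically. All parameter values, the shift `k` and the variable names are made up, and the model is simplified (no diagonal term).

```r
# Hypothetical 4 x 4 table with a homogeneous multiplicative term
alpha <- c(0.3, -0.2, 0.5, 0.1)   # made-up row main effects
beta  <- c(0.0, 0.4, -0.6, 0.9)   # made-up column main effects
gamma <- c(-1.5, 0.2, 0.9, 1.1)   # made-up common multipliers
k <- 3                            # arbitrary shift applied to every gamma

eta <- outer(alpha, beta, "+") + outer(gamma, gamma)

# Shift gamma by k and compensate in the main effects:
# (gamma_r + k)(gamma_c + k) = gamma_r gamma_c + k gamma_r + k gamma_c + k^2
gamma_shift <- gamma + k
alpha_shift <- alpha - k * gamma - k^2
beta_shift  <- beta - k * gamma
eta_shift <- outer(alpha_shift, beta_shift, "+") + outer(gamma_shift, gamma_shift)

same_predictor <- isTRUE(all.equal(eta, eta_shift))
# Contrasts with the first level are unaffected by the shift
same_contrasts <- isTRUE(all.equal(gamma - gamma[1], gamma_shift - gamma_shift[1]))
print(c(same_predictor, same_contrasts))
```

The individual values of `gamma` change with `k`, but the differences from the first level do not, which matches the slide's choice to report contrasts with the lowest level fixed at zero.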
  13. Concluding Remarks
     - gnm has many more useful features
         - for multiplicative models: functional factors, Exp, multiplicity
         - other: symmetric interactions, diagonal reference models, eliminate, ...
     - More details can be found on http://www.warwick.ac.uk/go/dfirth/software/gnm
     - Available on CRAN http://cran.r-project.org
