1. Multiplicative Interaction Models in R
Heather Turner and David Firth
Department of Statistics
The University of Warwick, UK
8th August 2005
2. Multiplicative Interaction Models
– Interaction terms in generalized linear models can be over-complex
ª use multiplicative interactions instead
– e.g. Goodman's row-column association model
$$\log \mu_{rc} = \alpha_r + \beta_c + \gamma_r \delta_c$$
– e.g. UNIDIFF model
$$\log \mu_{ijk} = \alpha_{ik} + \beta_{jk} + \gamma_k \delta_{ij}$$
– e.g. GAMMI model
$$\mathrm{link}(\mu_{rc}) = \alpha_r + \beta_c + \sum_{k=1}^{K} \sigma_k \gamma_{kr} \delta_{kc}$$
3. Fitting Multiplicative Interaction Models
– Multiplicative interactions are non-linear
ª difficult to fit using standard software
– Llama (Log Linear and Multiplicative Analysis) written by Firth (1998)
ª log link only
ª categorical variables only
ª standard multiplicative interactions
ª XLisp package
4. The gnm R Package
– Provides framework for estimating generalised nonlinear models
ª in-built mechanism for multiplicative terms
ª works with “plug-in” functions to fit other types of nonlinear terms
– Designed to be glm-like
ª common arguments, returned objects, methods, etc
– Estimates model parameters using maximum likelihood
– Uses over-parameterised representations of models
5. Working with Over-Parameterised Models
– gnm does not impose any identifiability constraints on the nonlinear parameters
ª the same model can be represented by an infinite number of
parameterizations, e.g.
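$$
\begin{aligned}
\log \mu_{rc} &= \alpha_r + \beta_c + \gamma_r \delta_c \\
&= \alpha_r + \beta_c + (2\gamma_r)(0.5\delta_c) \\
&= \alpha_r + \beta_c + \gamma'_r \delta'_c
\end{aligned}
$$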
ª gnm will return one of these parameterisations, at random
– General rule for applying constraints not required
– Fitting algorithm must be able to handle singular matrices
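The rescaling example above can be checked numerically. The following is a minimal base R sketch; the table size, seed, and the helper name `rc_predictor` are illustrative assumptions and are not part of gnm.

```r
set.seed(42)
nr <- 3; nc <- 4                       # illustrative table dimensions
alpha <- rnorm(nr); beta <- rnorm(nc)  # arbitrary row and column main effects
gamma <- rnorm(nr); delta <- rnorm(nc) # arbitrary multiplicative scores

# Predictor log(mu_rc) = alpha_r + beta_c + gamma_r * delta_c as an nr x nc matrix.
rc_predictor <- function(alpha, beta, gamma, delta) {
  outer(alpha, beta, "+") + outer(gamma, delta)
}

eta1 <- rc_predictor(alpha, beta, gamma, delta)
eta2 <- rc_predictor(alpha, beta, 2 * gamma, 0.5 * delta)  # rescaled scores

# Both parameter sets give the same predictor, hence the same fitted model.
same_fit <- isTRUE(all.equal(eta1, eta2))
```

Because the two parameter vectors differ while the predictor does not, the likelihood cannot distinguish them without a constraint.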
6. Parameter Estimation
– Wish to estimate the predictor
$$\eta = \eta(\beta)$$
which is nonlinear, so we have a local design matrix
$$X(\beta) = \frac{\partial \eta}{\partial \beta}$$
where rank(X) < p, the no. of parameters, due to over-parameterisation
– Use maximum likelihood estimation: want to solve the likelihood score equations
$$U(\beta) = \nabla l(\beta) = 0$$
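As a rough illustration of the rank deficiency, the base R sketch below builds the local design matrix of the row-column association model, with predictor alpha_r + beta_c + gamma_r * delta_c, at arbitrary parameter values and compares its numerical rank with the number of parameters. The table size, seed, and variable names are assumptions made for the example; this is not how gnm constructs the matrix internally.

```r
set.seed(1)
nr <- 4; nc <- 5                          # illustrative table dimensions
gamma <- rnorm(nr); delta <- rnorm(nc)    # arbitrary nonlinear parameter values

# One row of the design matrix per cell (r, c) of the table.
grid <- expand.grid(r = seq_len(nr), c = seq_len(nc))

# Derivatives of eta_rc with respect to each block of parameters.
A <- outer(grid$r, seq_len(nr), "==") * 1  # d eta / d alpha: row indicators
B <- outer(grid$c, seq_len(nc), "==") * 1  # d eta / d beta: column indicators
G <- A * delta[grid$c]                     # d eta / d gamma_r = delta_c
D <- B * gamma[grid$r]                     # d eta / d delta_c = gamma_r

X <- cbind(A, B, G, D)                     # local design matrix X(beta)
p <- ncol(X)                               # number of parameters
rank_X <- qr(X)$rank                       # numerical rank, smaller than p
```

Since `rank_X` falls short of `p`, the matrix X^T W X built from it cannot be inverted in the usual way.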
7. Fitting Algorithm
– Use a two stage procedure:
1. one-parameter-at-a-time Newton method to update nonlinear parameters
2. full Newton-Raphson to update all parameters but with the Moore-Penrose
pseudoinverse $(X^T W X)^{-}$ since X is not of full rank
– Starting values are obtained in two ways
ª for the linear parameters: use estimates from a glm fit
ª for the nonlinear parameters: generate randomly
ª parameterization determined by starting values for nonlinear parameters
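The second stage can be sketched in base R as a single scoring-type update that uses a pseudoinverse of the singular matrix X^T W X. The sketch assumes a Poisson response with log link, for which W = diag(mu) and the score is X^T (y - mu), and it uses simulated counts. The helper names (`pinv`, `split_theta`, `rc_eta`, `rc_design`, `scoring_step`) are hypothetical; none of this reproduces gnm's own code, and the one-parameter-at-a-time first stage is omitted.

```r
# Moore-Penrose pseudoinverse via the singular value decomposition (base R only).
pinv <- function(A, tol = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)           # drop numerically zero singular values
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# theta stacks (alpha, beta, gamma, delta) for an nr x nc table.
split_theta <- function(theta, nr, nc) {
  list(alpha = theta[1:nr], beta = theta[nr + 1:nc],
       gamma = theta[nr + nc + 1:nr], delta = theta[2 * nr + nc + 1:nc])
}

# Predictor eta_rc = alpha_r + beta_c + gamma_r * delta_c, one value per cell.
rc_eta <- function(theta, grid, nr, nc) {
  p <- split_theta(theta, nr, nc)
  p$alpha[grid$r] + p$beta[grid$c] + p$gamma[grid$r] * p$delta[grid$c]
}

# Local design matrix: derivatives of eta with respect to every parameter.
rc_design <- function(theta, grid, nr, nc) {
  p <- split_theta(theta, nr, nc)
  A <- outer(grid$r, seq_len(nr), "==") * 1
  B <- outer(grid$c, seq_len(nc), "==") * 1
  cbind(A, B, A * p$delta[grid$c], B * p$gamma[grid$r])
}

# One update of all parameters: theta + (X'WX)^- X'(y - mu).
scoring_step <- function(theta, y, grid, nr, nc) {
  mu <- exp(rc_eta(theta, grid, nr, nc))
  X <- rc_design(theta, grid, nr, nc)
  info <- crossprod(X * sqrt(mu))        # t(X) %*% W %*% X, not of full rank
  score <- crossprod(X, y - mu)          # t(X) %*% (y - mu)
  list(theta = theta + drop(pinv(info) %*% score), info = info)
}

set.seed(2)
nr <- 3; nc <- 4
grid <- expand.grid(r = seq_len(nr), c = seq_len(nc))
y <- rpois(nrow(grid), lambda = 10)           # simulated counts, not real data
theta0 <- rnorm(2 * nr + 2 * nc, sd = 0.5)    # random starting values
upd <- scoring_step(theta0, y, grid, nr, nc)
```

In practice such a step would be repeated until the score is close to zero, usually with some safeguard on the step length.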
8. Model Specification
– Models in R are usually specified by a symbolic formula, such as
y ~ a + b + a:b
– gnm introduces two functions to specify nonlinear terms
ª Mult for standard multiplicative interactions, e.g.
counts ~ row + column +
    Mult(-1 + row, -1 + column)
ª Nonlin for other terms that require a “plug-in” function, e.g.
counts ~ row + column +
    Nonlin(MultHomog(row, column))
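A formula containing a function call such as `Mult(...)` is an ordinary R formula object, and base R can flag such calls through the `specials` argument of `terms()`. The sketch below shows only this general mechanism; whether gnm processes its terms this way is an assumption, and the formula is never evaluated, so neither gnm nor the variables need to exist.

```r
# The formula from the slide, stored as an unevaluated R object.
f <- counts ~ row + column + Mult(-1 + row, -1 + column)

# Ask terms() to mark any call to "Mult" as a special term.
tt <- terms(f, specials = "Mult")

term_labels <- attr(tt, "term.labels")   # one label per term on the right-hand side
mult_index <- attr(tt, "specials")$Mult  # position of the Mult() call among the variables
```

The arguments inside `Mult(...)` are left untouched by `terms()`, so a fitting function is free to interpret `-1 + row` in its own way.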
9. Example: Occupational Status Data
– Study of occupational status taken from Goodman (1979)
– Cross-classified by occupational status of father (origin) and son (destination)
> occupationalStatus
      destination
origin  1  2   3   4  5   6   7   8
     1 50 19  26   8  7  11   6   2
     2 16 40  34  18 11  20   8   3
     3 12 35  65  66 35  88  23  21
     4 11 20  58 110 40 183  64  32
     5  2  8  12  23 25  46  28  12
     6 12 28 102 162 90 554 230 177
     7  0  6  19  40 21 158 143  71
     8  0  3  14  32 15 126  91 106
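The `occupationalStatus` table ships with base R, so a baseline fit needs no extra packages. The sketch below fits the independence model (main effects only) with `glm`, the kind of linear fit from which starting values for the linear parameters could be taken. That gnm uses exactly this model for its starting values is an assumption, and the object names are illustrative.

```r
# Convert the 8 x 8 table to long format: columns origin, destination, Freq.
occ <- as.data.frame(occupationalStatus)

# Independence model: log(mu) = intercept + origin effect + destination effect.
indep <- glm(Freq ~ origin + destination, family = poisson, data = occ)

start_linear <- coef(indep)         # candidate starting values for linear terms
resid_df <- df.residual(indep)      # 64 cells minus 15 parameters
resid_dev <- deviance(indep)        # lack of fit left for interaction terms
```

A multiplicative interaction term then has to account for the association that this main-effects model leaves unexplained.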
13. Concluding Remarks
– gnm has many more useful features
ª for multiplicative models: functional factors, Exp, multiplicity
ª other: symmetric interactions, diagonal reference models, eliminate, ...
– More details can be found at
http://www.warwick.ac.uk/go/dfirth/software/gnm
– Available on CRAN: http://cran.r-project.org