From L to N: Nonlinear Predictors in
                          Generalized Models

                                          Heather Turner

                                Independent Statistical/R Consultant


                                           owing much to
                                 David Firth, University of Warwick




Heather Turner (Independent Consultant)       From L to N             GLM 40 Years On 2012   1 / 32
From L to N

  In a GLM we have

                                   g(µ) = β0 + β1 x1 + ... + βp xp

  and
                                          Var(Y ) = φV (µ)
  A generalized nonlinear model (GNM) is the same as a GLM
  except that we have
                          g(µ) = η(x; β)
  where η(x; β) is nonlinear in the parameters β.



Motivation

  GNMs may be thought of as...
  ... an extension of Nonlinear Least Squares

          using a nonlinear function of a continuous variable to model a
          non-Gaussian response

  ... an extension of GLMs

          using nonlinear functions of parameters to produce a more
          parsimonious and interpretable model.




Example: Mental Health Status

  The following contingency table cross-classifies a sample of 1660
  residents of Manhattan by child’s mental impairment and parents’
  socioeconomic status (Agresti, 2002).

  ##     MHS
  ## SES well mild moderate impaired
  ##   A     64  94      58       46
  ##   B     57  94      54       40
  ##   C     57 105      65       60
  ##   D     72 141      77       94
  ##   E     36  97      54       78
  ##   F     21  71      54       71




Independence

  A simple analysis of these data might be to test for independence of
  MHS and SES using a chi-squared test.
  This is equivalent to testing the goodness-of-fit of the independence
  model
                             log(µrc ) = αr + βc

  Such a test compares the independence model to the saturated model

                                          log(µrc ) = αr + βc + γrc

  which may be over-complex.
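
  As an illustrative sketch (not part of the original slides): for a two-way table the independence model has closed-form fitted values, (row total × column total) / N, so its deviance can be computed without iterative fitting. The function name `independence_deviance` is hypothetical; the counts are those of the mental health table.

```python
import math

# Mental health counts (rows: SES A-F; columns: well, mild, moderate, impaired).
counts = [
    [64, 94, 58, 46],
    [57, 94, 54, 40],
    [57, 105, 65, 60],
    [72, 141, 77, 94],
    [36, 97, 54, 78],
    [21, 71, 54, 71],
]

def independence_deviance(table):
    """Return (deviance, residual df) of the independence model for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    deviance = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            fitted = row_totals[r] * col_totals[c] / n
            if observed > 0:  # a zero cell contributes nothing to the sum
                deviance += 2.0 * observed * math.log(observed / fitted)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return deviance, df

deviance, df = independence_deviance(counts)
print(round(deviance, 1), df)
```

  The result should agree with the residual deviance and degrees of freedom reported for the model Freq ~ SES + MHS in the analysis of deviance table for these data.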



Row-column Association
  One intermediate model is the Row-Column association model:

                                      log(µrc ) = αr + βc + φr ψc

  (Goodman, 1979), an example of a multiplicative interaction model.
  For the Mental Health data:
  ##    Analysis of Deviance Table
  ##
  ##    Model 1: Freq ~ SES + MHS
  ##    Model 2: Freq ~ SES + MHS + Mult(SES, MHS)
  ##    Model 3: Freq ~ SES + MHS + SES:MHS
  ##      Resid. Df Resid. Dev Df Deviance Pr(>Chi)
  ##    1        15       47.4
  ##    2         8        3.6  7     43.8  2.3e-07
  ##    3         0        0.0  8      3.6     0.89




Parameterisation

  The independence model was defined earlier in an over-parameterised
  form:

  \[ \log(\mu_{rc}) = \alpha_r + \beta_c = (\alpha_r + 1) + (\beta_c - 1) = \alpha_r^* + \beta_c^* \]

  Identifiability constraints may be imposed
       to fix a one-to-one mapping between parameter values and
       distributions
       to enable interpretation of parameters
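
  A minimal numerical sketch of the over-parameterisation, using arbitrary made-up parameter values rather than estimates: shifting every row effect up by 1 and every column effect down by 1 leaves all fitted means unchanged, so the data cannot distinguish the two parameter sets.

```python
import math

def fitted_means(alpha, beta):
    """Fitted means exp(alpha_r + beta_c) of the independence model."""
    return [[math.exp(a + b) for b in beta] for a in alpha]

# Arbitrary illustrative parameter values (not estimates from the slides).
alpha = [0.2, -0.5, 1.0]
beta = [1.5, 0.3]

# Shift the row effects up by 1 and the column effects down by 1.
alpha_star = [a + 1.0 for a in alpha]
beta_star = [b - 1.0 for b in beta]

mu = fitted_means(alpha, beta)
mu_star = fitted_means(alpha_star, beta_star)

# Compare the two sets of fitted means cell by cell.
same_fit = all(
    math.isclose(m, m_star)
    for row, row_star in zip(mu, mu_star)
    for m, m_star in zip(row, row_star)
)
print(same_fit)
```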



Standard Implementation

  The standard approach of all major statistical software packages is to
  apply the identifiability constraints in the construction of the model

                                          g(µ) = Xβ

  so that rank(X) is equal to the number of parameters p.
  Then the inverse in the score equations of the IWLS algorithm

  \[ \beta^{(r+1)} = \left(X^T W^{(r)} X\right)^{-1} X^T W^{(r)} z^{(r)} \]

  exists.


Alternative Implementation

  An alternative is to keep models in their over-parameterised form, so
  that rank(X) < p, and use the generalised inverse in the IWLS
  updates:

  \[ \beta^{(r+1)} = \left(X^T W^{(r)} X\right)^{-} X^T W^{(r)} z^{(r)} \]

  This approach is more useful for GNMs, since in this case it is much
  harder to define standard rules for specifying identifiability
  constraints.
  Rather, identifiability constraints can be applied post-fitting for
  inference and interpretation.



Estimation of GNMs
  GNMs present further technical difficulties vs. GLMs

          automatic generation of starting values is hard
          the likelihood may have multiple optima

  The default approach used in the gnm package for R is as follows:

          generate starting values randomly for nonlinear parameters and
          using a GLM fit for linear parameters
          use a one-parameter-at-a-time Newton method to update
          nonlinear parameters
          use the generalized IWLS to update all parameters

  Consequently, the parameterisation returned is random.

Parameterisation of RC Model
  The RC model is invariant to changes in scale or location of the
  interaction parameters:
                          log(µrc ) = αr + βc + φr ψc
                                    = αr + βc + (2φr )(0.5ψc )
                                    = αr + (βc − ψc ) + (φr + 1)(ψc )
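  One way to constrain these parameters is as follows

  \[ \phi_r^* = \frac{\phi_r - \dfrac{\sum_r w_r \phi_r}{\sum_r w_r}}{\sqrt{\sum_r w_r \left(\phi_r - \dfrac{\sum_r w_r \phi_r}{\sum_r w_r}\right)^2}} \]

  where w_r is the row probability, say, so that

  \[ \sum_r w_r \phi_r^* = 0, \qquad \sum_r w_r (\phi_r^*)^2 = 1 \]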
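
  The constraint can be sketched in a few lines of Python. This is an illustration only: the function name `normalise_scores` and the unconstrained scores `phi` are made up, while the weights are the row proportions of the mental health table.

```python
import math

def normalise_scores(phi, w):
    """Centre and scale scores so that sum(w * phi*) = 0 and sum(w * phi*^2) = 1."""
    total = sum(w)
    mean = sum(wi * p for wi, p in zip(w, phi)) / total
    centred = [p - mean for p in phi]
    scale = math.sqrt(sum(wi * c * c for wi, c in zip(w, centred)))
    return [c / scale for c in centred]

# Row totals of the mental health table, converted to row proportions.
row_totals = [262, 245, 287, 384, 265, 217]
w = [t / 1660 for t in row_totals]

# Hypothetical unconstrained row scores (not estimates from the slides).
phi = [0.9, 0.95, 0.4, 0.1, -0.5, -1.1]

phi_star = normalise_scores(phi, w)
weighted_sum = sum(wi * p for wi, p in zip(w, phi_star))
weighted_ss = sum(wi * p * p for wi, p in zip(w, phi_star))
print(weighted_sum, weighted_ss)
```

  Up to rounding error the two printed values are 0 and 1, whatever location and scale the unconstrained scores happened to have.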

Row and Column Scores
  The row and column scores for the RC model are

  ##                             Estimate Std. Error
  ##    Mult(., MHS).SESA            1.11       0.30
  ##    Mult(., MHS).SESB            1.12       0.31
  ##    Mult(., MHS).SESC            0.37       0.32
  ##    Mult(., MHS).SESD           -0.03       0.27
  ##    Mult(., MHS).SESE           -1.01       0.31
  ##    Mult(., MHS).SESF           -1.82       0.28
  ##                             Estimate Std. Error
  ##    Mult(SES, .).MHSwell         1.68       0.19
  ##    Mult(SES, .).MHSmild         0.14       0.20
  ##    Mult(SES, .).MHSmoderate    -0.14       0.28
  ##    Mult(SES, .).MHSimpaired    -1.41       0.17


  As one might expect, the scores are ordered for both factors,
  suggesting the model for the dependence structure might be
  simplified further.

Biplot Model


  Biplots are graphical displays of data arrays which represent the
  objects that index all dimensions of the array on the same plot.
  So for a two-way table, a biplot represents both the rows and
  columns at the same time.
  The biplot is constructed from a rank-2 representation of the data.
  Here we consider the generalized bilinear model

                                          g(µij ) = α1i β1j + α2i β2j




Example: Leaf Blotch Data


  The proportion of leaf area affected by leaf blotch was recorded for
  10 varieties of barley grown at nine sites (Gabriel, 1998).
  Thus the response is a continuous variable in [0, 1].
  Wedderburn (1974) suggested modelling these data using a logit link
  and a variance proportional to the square of that of the binomial, i.e.
  V(µ) = µ²(1 − µ)², which gives a quasi-likelihood model.




Geometrical Interpretation
  Given the bilinear model

                                     logit(µij ) = α1i β1j + α2i β2j

  the effect of site i can be represented by the point

                                                   (α1i , α2i )

  in the space spanned by the linearly independent basis vectors

                                          a1 = (α11 , α12 , . . . α19 )T
                                          a2 = (α21 , α22 , . . . α29 )T




Visualising Sites and Varieties
  Thus we can represent the sites and varieties separately as follows
  [Figure: two scatter plots, "Site Effects" (sites A-I) and "Variety Effects" (varieties 1-9 and X), each showing Component 2 against Component 1 on axes from −4 to 4.]

Obtaining Orthogonal Bases


  Given the SVD of the matrix of predictors

                                              η = U DV T

  matrices of orthogonal basis vectors on the same scale are given by

  \[ A = U D^{1/2}, \qquad B = D^{1/2} V^T \]

  The model stays the same, but the parametrization changes.
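
  A short check of this claim, assuming the usual SVD conventions (not restated on the slide) that U and V have orthonormal columns and D is diagonal with non-negative entries:

```latex
\begin{align*}
AB     &= U D^{1/2} D^{1/2} V^T = U D V^T = \eta, \\
A^T A  &= D^{1/2} U^T U D^{1/2} = D, \\
B B^T  &= D^{1/2} V^T V D^{1/2} = D.
\end{align*}
```

  So the predictor is unchanged, the columns of A (and the rows of B) are mutually orthogonal, and both sets of basis vectors share the same scale D.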




Biplot
  [Figure: "Biplot for barley data", shown twice. Sites A-I and varieties 1-9, X are plotted together, Component 2 against Component 1, on axes from −4 to 4. The second panel adds rotated axes labelled h-axis and v-axis.]

Model Refinement
  The biplot suggests that the sites could be represented by points
  along a line, with co-ordinates

                                                (γi , δ0 ).

  and the varieties by points on two lines perpendicular to the site line:

                                     (ν0 + ν1 I(j ∈ {2, 3, 6}), ωj )

  This corresponds to the following simplification of the bilinear model:

                                α1i β1j + α2i β2j
                               ≈ γi (ν0 + ν1 I(j ∈ {2, 3, 6})) + δ0 ωj

  or equivalently, absorbing the constant δ0 into ωj,

                                  γi (ν0 + ν1 I(j ∈ {2, 3, 6})) + ωj .

Double Additive Model

  Gabriel (1998) described the model derived from the biplot as the
  double additive model.
  An analysis of deviance confirms that this model is adequate for the
  leaf blotch data
  ## Analysis of Deviance Table
  ##
  ## Model 1: y ~ 0 + Mult(site, variety, inst = 1) + Mult(site, variety, inst = 2)
  ## Model 2: y ~ variety + Mult(site, variety.binary) - 1
  ##   Resid. Df Resid. Dev  Df Deviance Pr(>Chi)
  ## 1        56         41
  ## 2        71         51 -15    -9.94      0.8




Stereotype Model

  The stereotype model (Anderson, 1984) is suitable for ordered
  categorical data. It is a special case of the multinomial logistic model:

  \[ \mathrm{pr}(y_i = c \mid x_i) = \frac{\exp(\beta_{0c} + \beta_c^T x_i)}{\sum_r \exp(\beta_{0r} + \beta_r^T x_i)} \]

  in which only the scale of the relationship with the covariates changes
  between categories:

  \[ \mathrm{pr}(y_i = c \mid x_i) = \frac{\exp(\beta_{0c} + \gamma_c \beta^T x_i)}{\sum_r \exp(\beta_{0r} + \gamma_r \beta^T x_i)} \]
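
  To make the structure concrete, here is a minimal Python sketch of the stereotype probabilities for a single covariate vector. The function name `stereotype_probs` and all parameter values are hypothetical; the point is that every category shares the same linear combination of the covariates and only the multiplier differs.

```python
import math

def stereotype_probs(beta0, gamma, beta, x):
    """Category probabilities under the stereotype model for one covariate vector x."""
    linear = sum(b * xi for b, xi in zip(beta, x))  # beta^T x, shared by all categories
    scores = [b0 + g * linear for b0, g in zip(beta0, gamma)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [wt / total for wt in weights]

# Hypothetical parameter values for three categories and two covariates.
beta0 = [0.0, 0.5, -0.2]
gamma = [0.0, 0.6, 1.0]
beta = [1.2, -0.7]

probs = stereotype_probs(beta0, gamma, beta, x=[1.0, 2.0])
baseline = stereotype_probs(beta0, gamma, beta, x=[0.0, 0.0])
print(probs)
```

  With all covariates at zero the probabilities depend only on the intercepts, which gives a quick sanity check on the implementation.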




Poisson Trick
  The stereotype model can be fitted as a GNM by re-expressing the
  categorical data as category counts Yi = (Yi1 , . . . , Yik ).
  Assuming a Poisson distribution for Yic , the joint distribution of Yi is
  Multinomial(Ni , pi1 , . . . , pik ) conditional on the total count Ni .
  The expected counts are then µic = Ni pic and the parameters of the
  stereotype model can be estimated through fitting

  \[ \log \mu_{ic} = \log(N_i) + \log(p_{ic}) = \alpha_i + \beta_{0c} + \gamma_c \sum_r \beta_r x_{ir} \]

  where the “nuisance” parameters αi ensure that the multinomial
  denominators are reproduced exactly, as required.

Augmented Least Squares
  A disadvantage of using the Poisson trick is that the number of
  nuisance parameters can be large, making computation slow.
  The algorithm can be adapted using augmented least squares.
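  For an ordinary least squares model,

  \[ \left((y|X)^T (y|X)\right)^{-1} = \begin{pmatrix} y^T y & y^T X \\ X^T y & X^T X \end{pmatrix}^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \]

  where A_{11}, A_{12} and A_{22} are functions of y^T y, X^T y and X^T X.
  Then it can be shown that

  \[ \hat{\beta} = (X^T X)^{-1} X^T y = -\frac{A_{21}}{A_{11}} \]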
  requiring only the first row (column) of the inverse to be found.
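
  The identity can be checked numerically. The sketch below is an illustration in plain Python, not the gnm implementation: it inverts the augmented cross-product matrix with a small hand-written Gauss-Jordan routine (`invert`, a hypothetical helper) and reads the coefficients off the first column. It assumes the fit is not perfect, so that the residual sum of squares, and hence the augmented matrix, is non-singular.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def augmented_ols(y, X):
    """Least squares coefficients from the first column of ((y|X)^T (y|X))^{-1}."""
    yx = [[yi] + list(row) for yi, row in zip(y, X)]  # the augmented matrix (y | X)
    k = len(yx[0])
    cross = [[sum(r[i] * r[j] for r in yx) for j in range(k)] for i in range(k)]
    inv = invert(cross)
    a11 = inv[0][0]                         # equals 1 / residual sum of squares
    a21 = [inv[i][0] for i in range(1, k)]  # rest of the first column
    return [-a / a11 for a in a21]

# Small made-up data set: an intercept column plus one covariate.
y = [1.0, 2.0, 2.0, 4.0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
beta_hat = augmented_ols(y, X)
print(beta_hat)
```

  For these data ordinary least squares gives an intercept of 0.9 and a slope of 0.9, and the augmented approach reproduces both up to rounding error.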
Application to Nuisance Parameters I
  The same approach can be applied to the IWLS algorithm, letting

  \[ \tilde{X} = W^{1/2} (z|X) \]

  Now let

  \[ \tilde{X} = (U|V) \]
  where V is the part of the design matrix corresponding to the
  nuisance factor.
  U is an nk × p matrix, where n is the number of nuisance parameters,
  k is the number of categories and p is the number of model
  parameters, typically with n >> p.
  V is an nk × n matrix of dummy variables identifying each individual.

Application to Nuisance Parameters II

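  Then

  \[ (\tilde{X}^T \tilde{X})^{-} = \begin{pmatrix} U^T U & U^T V \\ V^T U & V^T V \end{pmatrix}^{-} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \]

  Again, only the first row (column) of this generalised inverse is
  required to estimate \hat{\beta}, so we are only interested in B_{11} and B_{12}.

  \[ B_{11} = \left(U^T U - U^T V (V^T V)^{-1} V^T U\right)^{-} \]
  \[ B_{12} = -(V^T V)^{-1} V^T U B_{11} \]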




Elimination of the Nuisance Factor

  U^T U is p × p, therefore not expensive to compute.
  V^T V and V^T U can be computed without constructing the large
  nk × n matrix V, due to the structure of V:
       V^T V is diagonal and the non-zero elements can be computed
       directly
       V^T U is equivalent to aggregating the rows of U by levels of the
       nuisance factor

  Thus we only need to construct the U matrix, saving memory and
  reducing the computational burden
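
  The two shortcuts can be sketched as follows. This is an illustration with made-up numbers, not the gnm code; it ignores the working weights (which would rescale the rows) and treats V as a plain 0/1 dummy matrix. The function name `cross_products_without_v` is hypothetical.

```python
from collections import Counter

def cross_products_without_v(ids, U):
    """Return diag(V^T V) and V^T U using only the nuisance factor levels `ids`."""
    levels = sorted(set(ids))
    index = {level: pos for pos, level in enumerate(levels)}
    counts = Counter(ids)
    vtv_diag = [counts[level] for level in levels]   # number of rows per level
    vtu = [[0.0] * len(U[0]) for _ in levels]
    for level, row in zip(ids, U):                   # sum the rows of U by level
        target = vtu[index[level]]
        for j, value in enumerate(row):
            target[j] += value
    return vtv_diag, vtu

# Hypothetical example: 3 individuals, 2 categories each, 2 columns in U.
ids = [1, 1, 2, 2, 3, 3]
U = [[1.0, 0.5], [0.0, 2.0], [1.0, 1.0], [3.0, 0.0], [2.0, 2.0], [1.0, 4.0]]
vtv_diag, vtu = cross_products_without_v(ids, U)

# The explicit dummy matrix V, built here only to check the shortcut.
V = [[1.0 if i == level else 0.0 for level in sorted(set(ids))] for i in ids]
vtu_explicit = [[sum(V[r][a] * U[r][b] for r in range(len(U))) for b in range(2)]
                for a in range(3)]
print(vtv_diag, vtu)
```

  The shortcut touches each row of U once and never stores V, whose size grows with the number of individuals.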



Example: Back Pain Data

  For 101 patients, 3 prognostic variables were recorded at baseline,
  then after 3 weeks the level of back pain was recorded (Anderson,
  1984)
  These data were converted to counts, for example for the first record:


  ##        x1 x2 x3                 pain count id
  ##    1    1  1  1                worse     0  1
  ##    1.1  1  1  1                 same     1  1
  ##    1.2  1  1  1   slight.improvement     0  1
  ##    1.3  1  1  1 moderate.improvement     0  1
  ##    1.4  1  1  1   marked.improvement     0  1
  ##    1.5  1  1  1      complete.relief     0  1
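
  The conversion to counts can be sketched in Python as below. This is an illustrative reimplementation with a hypothetical function name, `expand_to_counts`, not the routine used to produce the output above; each patient becomes one row per pain category, with a count of 1 for the observed category and 0 elsewhere.

```python
PAIN_LEVELS = [
    "worse", "same", "slight.improvement",
    "moderate.improvement", "marked.improvement", "complete.relief",
]

def expand_to_counts(records, levels=PAIN_LEVELS):
    """Expand one row per patient into one row per (patient, category) with a 0/1 count."""
    expanded = []
    for patient_id, record in enumerate(records, start=1):
        for level in levels:
            # Copy the covariates, then add the category, its count and the patient id.
            row = {k: v for k, v in record.items() if k != "pain"}
            row.update(pain=level, count=int(record["pain"] == level), id=patient_id)
            expanded.append(row)
    return expanded

# First record of the back pain data: covariates (1, 1, 1), observed outcome "same".
records = [{"x1": 1, "x2": 1, "x3": 1, "pain": "same"}]
rows = expand_to_counts(records)
for row in rows:
    print(row)
```

  The `id` column is the nuisance factor: one parameter per patient fixes each patient's total count at 1.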




Back Pain Model
  In this example, the expanded data is not that long (606 records) and
  the total number of parameters is only 115 (9 nonlinear), so the
  model does not take long to fit (< 1s!).
  However, eliminating the linear parameters reduces the computation
  time by almost two-thirds, showing the potential of this technique.
  Compare the stereotype model to the multinomial logistic model:
  ##    Analysis of Deviance Table
  ##
  ##    Model 1: count ~ pain + Mult(pain, x1 + x2 + x3) - 1
  ##    Model 2: count ~ pain + pain:x1 + pain:x2 + pain:x3 - 1
  ##      Resid. Df Resid. Dev Df Deviance Pr(>Chi)
  ##    1       493        303
  ##    2       485        299  8     4.08     0.85




Identifiability Constraints

  In order to make the category-specific multipliers identifiable, we
  must constrain both the location and scale.
  A simple way to do this is to set the first multiplier to zero and fix
  the coefficient of the first covariate to one.
  ##                            estimate    SE quasiSE quasiVar
  ##    worse                      0.000 0.000  1.7797  3.16745
  ##    same                      -3.710 1.826  0.4281  0.18330
  ##    slight.improvement        -3.510 1.792  0.4025  0.16198
  ##    moderate.improvement      -2.633 1.669  0.5519  0.30454
  ##    marked.improvement        -4.612 1.895  0.3133  0.09817
  ##    complete.relief           -5.372 2.000  0.4920  0.24202


  Quasi standard errors (Firth and de Menezes, 2004) are invariant to
  reference class
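
  As a sketch of how quasi-variances are used: the variance of the difference between two estimates is approximated by the sum of their quasi-variances, so a standard error for any contrast follows from the table without refitting under a new reference class. The function name `contrast_se` is hypothetical; the quasi-variances are copied from the table.

```python
import math

# Quasi-variances copied from the table above.
quasi_var = {
    "worse": 3.16745,
    "same": 0.18330,
    "slight.improvement": 0.16198,
    "moderate.improvement": 0.30454,
    "marked.improvement": 0.09817,
    "complete.relief": 0.24202,
}

def contrast_se(level_a, level_b, qv=quasi_var):
    """Approximate standard error of the difference between two multipliers."""
    return math.sqrt(qv[level_a] + qv[level_b])

se_same_vs_slight = contrast_se("same", "slight.improvement")
se_same_vs_worse = contrast_se("same", "worse")
print(se_same_vs_slight, se_same_vs_worse)
```

  The contrast of `same` with the reference class `worse` comes out at about 1.83, close to the conventional standard error of 1.826 in the table; the small gap reflects that quasi-variances are an approximation.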


Comparison Intervals
  [Figure: "Intervals based on quasi standard errors". The estimate (axis from −6 to 4) is plotted against pain category: worse, same, slight improvement, moderate improvement, marked improvement, complete relief.]

Summary



  Moving from GLMs to GNMs presents some technical difficulties, but
  provides a framework that covers several useful models.
  Further examples can be found in the help files and manual
  accompanying the gnm package which is available on CRAN.




References
  Agresti, A. (2002). Categorical Data Analysis (2nd ed.). New York: Wiley.
  Anderson, J. A. (1984). Regression and Ordered Categorical Variables. J.
    R. Statist. Soc. B 46 (1), 1–30.
  Firth, D. and R. X. de Menezes (2004). Quasi-variances. Biometrika 91,
     65–80.
  Gabriel, K. R. (1998). Generalised bilinear regression. Biometrika 85,
    689–700.
  Goodman, L. A. (1979). Simple models for the analysis of association in
    cross-classifications having ordered categories. J. Amer. Statist.
    Assoc. 74, 537–552.
  Wedderburn, R. W. M. (1974). Quasi-likelihood Functions, Generalized
   Linear Models, and the Gauss-Newton Method. Biometrika 61,
   439–447.


Heather Turner (Independent Consultant)     From L to N   GLM 40 Years On 2012   32 / 32

More Related Content

What's hot

Random Matrix Theory and Machine Learning - Part 2
Random Matrix Theory and Machine Learning - Part 2Random Matrix Theory and Machine Learning - Part 2
Random Matrix Theory and Machine Learning - Part 2
Fabian Pedregosa
 
tensor-decomposition
tensor-decompositiontensor-decomposition
tensor-decomposition
Kenta Oono
 
Tensor Decomposition and its Applications
Tensor Decomposition and its ApplicationsTensor Decomposition and its Applications
Tensor Decomposition and its Applications
Keisuke OTAKI
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3
Fabian Pedregosa
 
Adaptive Signal and Image Processing
Adaptive Signal and Image ProcessingAdaptive Signal and Image Processing
Adaptive Signal and Image Processing
Gabriel Peyré
 
Digital Signal Processing[ECEG-3171]-Ch1_L02
Digital Signal Processing[ECEG-3171]-Ch1_L02Digital Signal Processing[ECEG-3171]-Ch1_L02
Digital Signal Processing[ECEG-3171]-Ch1_L02
Rediet Moges
 
Rabbit challenge 3 DNN Day1
Rabbit challenge 3 DNN Day1Rabbit challenge 3 DNN Day1
Rabbit challenge 3 DNN Day1
TOMMYLINK1
 
Signal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsSignal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier Transforms
Arvind Devaraj
 
The convenience yield implied by quadratic volatility smiles presentation [...
The convenience yield implied by quadratic volatility smiles   presentation [...The convenience yield implied by quadratic volatility smiles   presentation [...
The convenience yield implied by quadratic volatility smiles presentation [...
yigalbt
 
Tensor representations in signal processing and machine learning (tutorial ta...
Tensor representations in signal processing and machine learning (tutorial ta...Tensor representations in signal processing and machine learning (tutorial ta...
Tensor representations in signal processing and machine learning (tutorial ta...
Tatsuya Yokota
 
Image transforms 2
Image transforms 2Image transforms 2
Image transforms 2
Ali Baig
 
Decimation in time and frequency
Decimation in time and frequencyDecimation in time and frequency
Decimation in time and frequency
SARITHA REDDY
 
Chapter 9 computation of the dft
Chapter 9 computation of the dftChapter 9 computation of the dft
Chapter 9 computation of the dft
mikeproud
 
Fast Fourier Transform
Fast Fourier TransformFast Fourier Transform
Fast Fourier Transform
op205
 
Dsp U Lec10 DFT And FFT
Dsp U   Lec10  DFT And  FFTDsp U   Lec10  DFT And  FFT
Dsp U Lec10 DFT And FFT
taha25
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
Fft presentation
Fft presentationFft presentation
Fft presentation
ilker Şin
 
Brief Introduction About Topological Interference Management (TIM)
Brief Introduction About Topological Interference Management (TIM)Brief Introduction About Topological Interference Management (TIM)
Brief Introduction About Topological Interference Management (TIM)
Pei-Che Chang
 
Discrete Fourier Transform
Discrete Fourier TransformDiscrete Fourier Transform
Discrete Fourier Transform
Abhishek Choksi
 
On the stability and accuracy of finite difference method for options pricing
On the stability and accuracy of finite difference method for options pricingOn the stability and accuracy of finite difference method for options pricing
On the stability and accuracy of finite difference method for options pricing
Alexander Decker
 

From L to N: Nonlinear Predictors in Generalized Models

  • 1. From L to N: Nonlinear Predictors in Generalized Models. Heather Turner, Independent Statistical/R Consultant, owing much to David Firth, University of Warwick. GLM 40 Years On, 2012.
  • 2. From L to N
    In a GLM we have
    $$g(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
    and
    $$\mathrm{Var}(Y) = \phi V(\mu)$$
    A generalized nonlinear model (GNM) is the same as a GLM except that we have
    $$g(\mu) = \eta(x; \beta)$$
    where $\eta(x; \beta)$ is nonlinear in the parameters $\beta$.
  • 3. Motivation
    GNMs may be thought of as...
    ... an extension of Nonlinear Least Squares
        using a nonlinear function of a continuous variable to model a non-Gaussian response
    ... an extension of GLMs
        using nonlinear functions of parameters to produce a more parsimonious and interpretable model.
  • 4. Example: Mental Health Status
    The following contingency table cross-classifies a sample of 1660 residents of Manhattan by child’s mental impairment and parents’ socioeconomic status (Agresti, 2002).
    ##     MHS
    ## SES well mild moderate impaired
    ##   A   64   94       58       46
    ##   B   57   94       54       40
    ##   C   57  105       65       60
    ##   D   72  141       77       94
    ##   E   36   97       54       78
    ##   F   21   71       54       71
  • 5. Independence
    A simple analysis of these data might be to test for independence of MHS and SES using a chi-squared test. This is equivalent to testing the goodness-of-fit of the independence model
    $$\log(\mu_{rc}) = \alpha_r + \beta_c$$
    Such a test compares the independence model to the saturated model
    $$\log(\mu_{rc}) = \alpha_r + \beta_c + \gamma_{rc}$$
    which may be over-complex.
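The goodness-of-fit comparison described on this slide can be reproduced by hand. The following is a minimal sketch in plain Python (not the R code used for the talk): it computes the expected counts under independence for the Mental Health table, together with the Pearson chi-squared statistic and the deviance against the saturated model. The function name `independence_fit` is an illustrative choice, not part of any package.

```python
import math

# Counts from the Mental Health Status slide: rows are SES A-F,
# columns are MHS well, mild, moderate, impaired.
counts = [
    [64, 94, 58, 46],
    [57, 94, 54, 40],
    [57, 105, 65, 60],
    [72, 141, 77, 94],
    [36, 97, 54, 78],
    [21, 71, 54, 71],
]


def independence_fit(table):
    """Return (expected, Pearson X^2, deviance G^2, df) under independence."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Fitted means of the independence model: row total * column total / n.
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    cells = [(o, e) for obs, exp in zip(table, expected)
             for o, e in zip(obs, exp)]
    pearson = sum((o - e) ** 2 / e for o, e in cells)
    # Poisson deviance against the saturated model (fitted totals match).
    deviance = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return expected, pearson, deviance, df


expected, pearson, deviance, df = independence_fit(counts)
print(f"df = {df}, X^2 = {pearson:.2f}, G^2 = {deviance:.2f}")
```

The deviance and degrees of freedom printed here correspond to the residual deviance of a Poisson log-linear independence model for the same table.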
  • 6. Row-column Association
    One intermediate model is the Row-Column association model:
    $$\log(\mu_{rc}) = \alpha_r + \beta_c + \phi_r \psi_c$$
    (Goodman, 1979), an example of a multiplicative interaction model. For the Mental Health data:
    ## Analysis of Deviance Table
    ##
    ## Model 1: Freq ~ SES + MHS
    ## Model 2: Freq ~ SES + MHS + Mult(SES, MHS)
    ## Model 3: Freq ~ SES + MHS + SES:MHS
    ##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
    ## 1        15       47.4
    ## 2         8        3.6  7     43.8  2.3e-07
    ## 3         0        0.0  8      3.6     0.89
  • 7. Parameterisation
    The independence model was defined earlier in an over-parameterised form:
    $$\log(\mu_{rc}) = \alpha_r + \beta_c = (\alpha_r + 1) + (\beta_c - 1) = \alpha_r^* + \beta_c^*$$
    Identifiability constraints may be imposed
        to fix a one-to-one mapping between parameter values and distributions
        to enable interpretation of parameters
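A small numerical illustration of this point, as a sketch with made-up parameter values: shifting every row effect by +1 and every column effect by −1 leaves the fitted means unchanged, and a constraint such as "first row effect equals zero" (one possible choice, assumed here for illustration) maps both parameter sets to the same representative.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
alpha = [0.5, 1.0, -0.3]         # row effects
beta = [2.0, 1.5, 0.0, -1.0]     # column effects


def fitted_means(alpha, beta):
    """Fitted means of the independence model, log(mu_rc) = alpha_r + beta_c."""
    return [[math.exp(a + b) for b in beta] for a in alpha]


def constrain_first_to_zero(alpha, beta):
    """Apply the constraint alpha[0] == 0 by moving the shift into beta."""
    shift = alpha[0]
    return [a - shift for a in alpha], [b + shift for b in beta]


# The re-parameterisation from the slide: alpha* = alpha + 1, beta* = beta - 1.
alpha_star = [a + 1 for a in alpha]
beta_star = [b - 1 for b in beta]

mu = fitted_means(alpha, beta)
mu_star = fitted_means(alpha_star, beta_star)
same_means = all(math.isclose(m, ms)
                 for row, row_star in zip(mu, mu_star)
                 for m, ms in zip(row, row_star))

constrained = constrain_first_to_zero(alpha, beta)
constrained_star = constrain_first_to_zero(alpha_star, beta_star)
print(same_means)
print(constrained)
print(constrained_star)
```

Both parameter sets describe the same distribution, so the data cannot distinguish them; only after the constraint is applied do the individual parameter values become interpretable.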
  • 8. Standard Implementation
    The standard approach of all major statistical software packages is to apply the identifiability constraints in the construction of the model
    $$g(\mu) = X\beta$$
    so that rank($X$) is equal to the number of parameters $p$. Then the inverse in the score equations of the IWLS algorithm
    $$\beta^{(r+1)} = \left(X^T W^{(r)} X\right)^{-1} X^T W^{(r)} z^{(r)}$$
    exists.
  • 9. Alternative Implementation
    An alternative is to keep models in their over-parameterised form, so that rank($X$) < $p$, and use the generalised inverse in the IWLS updates:
    $$\beta^{(r+1)} = \left(X^T W^{(r)} X\right)^{-} X^T W^{(r)} z^{(r)}$$
    This approach is more useful for GNMs, since in this case it is much harder to define standard rules for specifying identifiability constraints. Rather, identifiability constraints can be applied post-fitting for inference and interpretation.
  • 10. Estimation of GNMs
    GNMs present further technical difficulties vs. GLMs:
        automatic generation of starting values is hard
        the likelihood may have multiple optima
    The default approach used in the gnm package for R is as follows:
        generate starting values randomly for nonlinear parameters and using a GLM fit for linear parameters
        use one-parameter-at-a-time Newton method to update nonlinear parameters
        use the generalized IWLS to update all parameters
    Consequently, the parameterisation returned is random.
  • 11. Parameterisation of RC Model
    The RC model is invariant to changes in scale or location of the interaction parameters:
    $$\log(\mu_{rc}) = \alpha_r + \beta_c + \phi_r \psi_c = \alpha_r + \beta_c + (2\phi_r)(0.5\psi_c) = \alpha_r + (\beta_c - \psi_c) + (\phi_r + 1)(\psi_c)$$
    One way to constrain these parameters is as follows:
    $$\phi_r^* = \frac{\phi_r - \dfrac{\sum_r w_r \phi_r}{\sum_r w_r}}{\sqrt{\sum_r w_r \left(\phi_r - \dfrac{\sum_r w_r \phi_r}{\sum_r w_r}\right)^2}}$$
    where $w_r$ is the row probability, say, so that
    $$\sum_r w_r \phi_r^* = 0 \qquad \sum_r w_r (\phi_r^*)^2 = 1$$
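The constraint on this slide amounts to a weighted centring and scaling of the scores. The sketch below assumes the weights are the row proportions of the Mental Health table and uses invented raw scores (`phi_raw` is hypothetical, not a fitted value); it checks that the standardised scores satisfy both constraints and are unchanged when the raw scores are shifted and rescaled by a positive factor.

```python
import math


def standardise_scores(phi, w):
    """Centre and scale scores so that sum(w * phi*) = 0 and sum(w * phi*^2) = 1.

    Assumes the weights w sum to one (e.g. row probabilities).
    """
    mean = sum(wi * p for wi, p in zip(w, phi)) / sum(w)
    scale = math.sqrt(sum(wi * (p - mean) ** 2 for wi, p in zip(w, phi)))
    return [(p - mean) / scale for p in phi]


# Row probabilities: row totals of the Mental Health table divided by 1660.
row_totals = [262, 245, 287, 384, 265, 217]
w = [t / sum(row_totals) for t in row_totals]

# Hypothetical unconstrained row scores, as might be returned by a fit
# started from random values.
phi_raw = [0.30, 0.35, 0.10, -0.05, -0.40, -0.70]
phi_std = standardise_scores(phi_raw, w)

# A different but equivalent parameterisation: shift by 3, scale by 2.
phi_alt = standardise_scores([2 * p + 3 for p in phi_raw], w)

weighted_mean = sum(wi * p for wi, p in zip(w, phi_std))
weighted_ss = sum(wi * p ** 2 for wi, p in zip(w, phi_std))
print(weighted_mean, weighted_ss)
```

Rescaling by a negative factor would flip the sign of every standardised score, so the orientation of the scores still has to be fixed by convention.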
  • 12. Row and Column Scores
    The row and column scores for the RC model are
    ##                   Estimate Std. Error
    ## Mult(., MHS).SESA     1.11       0.30
    ## Mult(., MHS).SESB     1.12       0.31
    ## Mult(., MHS).SESC     0.37       0.32
    ## Mult(., MHS).SESD    -0.03       0.27
    ## Mult(., MHS).SESE    -1.01       0.31
    ## Mult(., MHS).SESF    -1.82       0.28
    ##                          Estimate Std. Error
    ## Mult(SES, .).MHSwell         1.68       0.19
    ## Mult(SES, .).MHSmild         0.14       0.20
    ## Mult(SES, .).MHSmoderate    -0.14       0.28
    ## Mult(SES, .).MHSimpaired    -1.41       0.17
    As one might expect, the scores are ordered for both factors, suggesting the model for the dependence structure might be simplified further.
  • 13. Biplot Model
    Biplots are graphical displays of data arrays which represent the objects that index all dimensions of the array on the same plot. So for a two-way table, a biplot represents both the rows and columns at the same time. The biplot is constructed from a rank-2 representation of the data. Here we consider the generalized bilinear model
    $$g(\mu_{ij}) = \alpha_{1i} \beta_{1j} + \alpha_{2i} \beta_{2j}$$
  • 14. Example: Leaf Blotch Data
    The proportion of leaf area affected by leaf blotch was recorded for 10 varieties of barley grown at nine sites (Gabriel, 1998). Thus the response is a continuous variable in [0, 1]. Wedderburn (1974) suggested modelling these data using a logit link and a variance proportional to the square of that of the binomial, i.e.
    $$V(\mu) = \mu^2 (1 - \mu)^2$$
    This is a quasi-likelihood model.
  • 15. Geometrical Interpretation
    Given the bilinear model
    $$\mathrm{logit}(\mu_{ij}) = \alpha_{1i} \beta_{1j} + \alpha_{2i} \beta_{2j}$$
    the effect of site $i$ can be represented by the point $(\alpha_{1i}, \alpha_{2i})$ in the space spanned by the linearly independent basis vectors
    $$a_1 = (\alpha_{11}, \alpha_{12}, \dots, \alpha_{19})^T \qquad a_2 = (\alpha_{21}, \alpha_{22}, \dots, \alpha_{29})^T$$
  • 16. Visualising Sites and Varieties
    Thus we can represent the sites and varieties separately as follows.
    [Figure: two panels, "Site Effects" and "Variety Effects", each plotting Component 2 against Component 1.]
  • 17. Obtaining Orthogonal Bases
    Given the SVD of the matrix of predictors
    $$\eta = U D V^T$$
    matrices of orthogonal basis vectors on the same scale are given by
    $$A = U D^{1/2} \qquad B = D^{1/2} V^T$$
    The model stays the same, but the parameterisation changes.
  • 18. Biplot
    [Figure: "Biplot for barley data", Component 2 against Component 1, with sites labelled A to I and varieties labelled 1 to 9 and X. A second version of the plot adds an h-axis and a v-axis.]
  • 19. Model Refinement
    The biplot suggests that the sites could be represented by points along a line, with co-ordinates
    $$(\gamma_i, \delta_0)$$
    and the varieties by points on two lines perpendicular to the site line:
    $$(\nu_0 + \nu_1 I(j \in \{2, 3, 6\}), \omega_j)$$
    This corresponds to the following simplification of the bilinear model:
    $$\alpha_{1i} \beta_{1j} + \alpha_{2i} \beta_{2j} \approx \gamma_i (\nu_0 + \nu_1 I(j \in \{2, 3, 6\})) + \delta_0 \omega_j$$
    or equivalently, absorbing the constant $\delta_0$ into $\omega_j$,
    $$\gamma_i (\nu_0 + \nu_1 I(j \in \{2, 3, 6\})) + \omega_j$$
  • 20. Double Additive Model
    Gabriel (1998) described the model derived from the biplot as the double additive model. An analysis of deviance confirms that this model is adequate for the leaf blotch data.
    ## Analysis of Deviance Table
    ##
    ## Model 1: y ~ 0 + Mult(site, variety, inst = 1) + Mult(site,
    ##     variety, inst = 2)
    ## Model 2: y ~ variety + Mult(site, variety.binary) - 1
    ##   Resid. Df Resid. Dev  Df Deviance Pr(>Chi)
    ## 1        56         41
    ## 2        71         51 -15    -9.94      0.8
  • 21. Stereotype Model
    The stereotype model (Anderson, 1984) is suitable for ordered categorical data. It is a special case of the multinomial logistic model:
    $$\mathrm{pr}(y_i = c \mid x_i) = \frac{\exp(\beta_{0c} + \beta_c^T x_i)}{\sum_r \exp(\beta_{0r} + \beta_r^T x_i)}$$
    in which only the scale of the relationship with the covariates changes between categories:
    $$\mathrm{pr}(y_i = c \mid x_i) = \frac{\exp(\beta_{0c} + \gamma_c \beta^T x_i)}{\sum_r \exp(\beta_{0r} + \gamma_r \beta^T x_i)}$$
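To make the "special case" relationship concrete, here is a minimal sketch with invented parameter values (none are estimates from the talk). It evaluates the stereotype probabilities directly and checks that they coincide with a multinomial logistic model whose category-specific coefficient vectors are $\beta_c = \gamma_c \beta$. Function names are illustrative.

```python
import math


def softmax(eta):
    """Convert linear predictors to probabilities (numerically stable)."""
    m = max(eta)
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [wt / total for wt in weights]


def multinomial_logit_probs(x, beta0, betas):
    """Multinomial logistic model: one coefficient vector per category."""
    eta = [b0 + sum(b * xi for b, xi in zip(bc, x))
           for b0, bc in zip(beta0, betas)]
    return softmax(eta)


def stereotype_probs(x, beta0, gamma, beta):
    """Stereotype model: a shared direction beta scaled by gamma_c."""
    score = sum(b * xi for b, xi in zip(beta, x))
    return softmax([b0 + g * score for b0, g in zip(beta0, gamma)])


# Hypothetical values for a 4-category response and 3 covariates.
x = [1.0, 0.0, 2.0]
beta0 = [0.0, 0.4, -0.2, 0.1]
gamma = [0.0, 0.5, 1.2, 2.0]
beta = [1.0, -0.6, 0.3]

p_stereo = stereotype_probs(x, beta0, gamma, beta)
p_multi = multinomial_logit_probs(
    x, beta0, [[g * b for b in beta] for g in gamma])
print(p_stereo)
```

With $k$ categories and $p$ covariates, the multinomial logistic model has a separate $p$-vector for every category, whereas the stereotype model shares one direction $\beta$ and varies only the multipliers $\gamma_c$.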
  • 22. Poisson Trick
    The stereotype model can be fitted as a GNM by re-expressing the categorical data as category counts $Y_i = (Y_{i1}, \dots, Y_{ik})$. Assuming a Poisson distribution for $Y_{ic}$, the joint distribution of $Y_i$ is Multinomial($N_i, p_{i1}, \dots, p_{ik}$) conditional on the total count $N_i$. The expected counts are then $\mu_{ic} = N_i p_{ic}$ and the parameters of the stereotype model can be estimated through fitting
    $$\log \mu_{ic} = \log(N_i) + \log(p_{ic}) = \alpha_i + \beta_{0c} + \gamma_c \sum_r \beta_r x_{ir}$$
    where the “nuisance” parameters $\alpha_i$ ensure that the multinomial denominators are reproduced exactly, as required.
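The distributional fact behind the Poisson trick can be checked numerically. The sketch below uses arbitrary illustrative means and counts: it compares the probability of a vector of independent Poisson counts, conditional on their total, with the multinomial probability whose cell probabilities are $\mu_c / \sum_c \mu_c$.

```python
import math


def poisson_pmf(y, mu):
    """Probability of count y under a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def multinomial_pmf(ys, probs):
    """Probability of counts ys under a multinomial with total sum(ys)."""
    coef = math.factorial(sum(ys))
    for y in ys:
        coef //= math.factorial(y)
    return coef * math.prod(p ** y for p, y in zip(probs, ys))


# Hypothetical Poisson means and observed counts for one individual.
mu = [1.2, 0.7, 2.1]
ys = [2, 0, 3]

# Joint probability of independent Poisson counts ...
joint = math.prod(poisson_pmf(y, m) for y, m in zip(ys, mu))
# ... divided by the probability of the total, which is Poisson(sum(mu)).
conditional = joint / poisson_pmf(sum(ys), sum(mu))

multinomial = multinomial_pmf(ys, [m / sum(mu) for m in mu])
print(conditional, multinomial)
```

Because the conditional distribution does not depend on the overall level $\sum_c \mu_c$, the nuisance parameter $\alpha_i$ can absorb that level without affecting inference on the remaining parameters.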
  • 23. Augmented Least Squares
    A disadvantage of using the Poisson trick is that the number of nuisance parameters can be large, making computation slow. The algorithm can be adapted using augmented least squares. For an ordinary least squares model,
    $$\left((y|X)^T (y|X)\right)^{-1} = \begin{pmatrix} y^T y & y^T X \\ X^T y & X^T X \end{pmatrix}^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
    where $A_{11}$, $A_{12}$ and $A_{22}$ are functions of $y^T y$, $X^T y$ and $X^T X$. Then it can be shown that
    $$\hat{\beta} = (X^T X)^{-1} X^T y = -\frac{A_{21}}{A_{11}}$$
    requiring only the first row (column) of the inverse to be found.
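The identity $\hat{\beta} = -A_{21}/A_{11}$ can be verified on a toy regression. This is a sketch only: the data are invented, and the small Gauss-Jordan routine `invert` stands in for whatever linear algebra a real implementation would use. It assumes the response is not an exact linear combination of the columns of $X$, so that the augmented cross-product matrix is invertible.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cross_products(cols_a, cols_b):
    """Matrix of inner products between two lists of column vectors."""
    return [[sum(p * q for p, q in zip(a, b)) for b in cols_b] for a in cols_a]


# Toy data: intercept plus one covariate (values are illustrative only).
y = [1.0, 2.1, 2.9, 4.2, 4.8]
X_cols = [[1.0] * 5, [0.0, 1.0, 2.0, 3.0, 4.0]]

# Direct solution: (X'X)^{-1} X'y.
XtX_inv = invert(cross_products(X_cols, X_cols))
Xty = [row[0] for row in cross_products(X_cols, [y])]
beta_direct = [sum(a * b for a, b in zip(row, Xty)) for row in XtX_inv]

# Augmented solution: invert (y|X)'(y|X) and use only its first column.
aug_cols = [y] + X_cols
A = invert(cross_products(aug_cols, aug_cols))
beta_augmented = [-A[i][0] / A[0][0] for i in range(1, len(aug_cols))]

print(beta_direct)
print(beta_augmented)
```

Only the first column of the inverse is read off, which is what makes the partitioned-inverse shortcut on the following slides worthwhile when most columns belong to a nuisance factor.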
  • 24. Application to Nuisance Parameters I
    The same approach can be applied to the IWLS algorithm, letting
    $$\tilde{X} = W^{1/2} (z|X)$$
    Now let
    $$\tilde{X} = (U|V)$$
    where $V$ is the part of the design matrix corresponding to the nuisance factor.
    $U$ is an $nk \times p$ matrix, where $n$ is the number of nuisance parameters, $k$ is the number of categories and $p$ is the number of model parameters, typically with $n \gg p$.
    $V$ is an $nk \times n$ matrix of dummy variables identifying each individual.
  • 25. Application to Nuisance Parameters II
    Then
    $$(\tilde{X}^T \tilde{X})^{-} = \begin{pmatrix} U^T U & U^T V \\ V^T U & V^T V \end{pmatrix}^{-} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$
    Again, only the first row (column) of this generalised inverse is required to estimate $\hat{\beta}$, so we are only interested in $B_{11}$ and $B_{12}$.
    $$B_{11} = \left(U^T U - U^T V (V^T V)^{-1} V^T U\right)^{-}$$
    $$B_{12} = -(V^T V)^{-1} V^T U B_{11}$$
  • 26. Elimination of the Nuisance Factor
    $U^T U$ is $p \times p$, therefore not expensive to compute.
    $V^T V$ and $V^T U$ can be computed without constructing the large $nk \times n$ matrix $V$, due to the structure of $V$:
        $V^T V$ is diagonal and the non-zero elements can be computed directly
        $V^T U$ is equivalent to aggregating the rows of $U$ by levels of the nuisance factor
    Thus we only need to construct the $U$ matrix, saving memory and reducing the computational burden.
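The two shortcuts listed on this slide can be checked against the explicit matrix products on a tiny example. The sketch below makes a simplifying assumption that all working weights equal one, so that $V$ is a plain 0/1 indicator matrix; the data and function names are illustrative.

```python
def explicit_products(ids, U, n_levels):
    """Build the indicator matrix V and form V'V and V'U the expensive way."""
    rows = range(len(ids))
    p = len(U[0])
    V = [[1.0 if ids[r] == j else 0.0 for j in range(n_levels)] for r in rows]
    VtV = [[sum(V[r][a] * V[r][b] for r in rows) for b in range(n_levels)]
           for a in range(n_levels)]
    VtU = [[sum(V[r][a] * U[r][c] for r in rows) for c in range(p)]
           for a in range(n_levels)]
    return VtV, VtU


def aggregated_products(ids, U, n_levels):
    """Obtain the diagonal of V'V and the matrix V'U without constructing V."""
    p = len(U[0])
    diag = [0.0] * n_levels                       # rows per nuisance level
    sums = [[0.0] * p for _ in range(n_levels)]   # column sums within level
    for level, row in zip(ids, U):
        diag[level] += 1.0
        for c, value in enumerate(row):
            sums[level][c] += value
    return diag, sums


# Three individuals (nuisance levels), three category rows each, p = 2.
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
U = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0],
     [2.0, 2.0], [1.0, 0.0], [0.0, 4.0],
     [5.0, 1.0], [1.0, 1.0], [2.0, 3.0]]

VtV, VtU = explicit_products(ids, U, 3)
diag, sums = aggregated_products(ids, U, 3)
print(diag)
print(sums)
```

The aggregated version touches each row of $U$ once and stores only $n$ counts and an $n \times p$ matrix, instead of the $nk \times n$ indicator matrix.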
  • 27. Example: Back Pain Data
    For 101 patients, 3 prognostic variables were recorded at baseline, then after 3 weeks the level of back pain was recorded (Anderson, 1984). These data were converted to counts, for example for the first record:
    ##     x1 x2 x3                 pain count id
    ## 1    1  1  1                worse     0  1
    ## 1.1  1  1  1                 same     1  1
    ## 1.2  1  1  1   slight.improvement     0  1
    ## 1.3  1  1  1 moderate.improvement     0  1
    ## 1.4  1  1  1   marked.improvement     0  1
    ## 1.5  1  1  1      complete.relief     0  1
  • 28. Back Pain Model
    In this example, the expanded data is not that long (606 records) and the total number of parameters is only 115 (9 nonlinear), so the model does not take long to fit (< 1s!). However, eliminating the linear parameters reduces the computation time by almost two-thirds, showing the potential of this technique.
    Compare the stereotype model to the multinomial logistic model:
    ## Analysis of Deviance Table
    ##
    ## Model 1: count ~ pain + Mult(pain, x1 + x2 + x3) - 1
    ## Model 2: count ~ pain + pain:x1 + pain:x2 + pain:x3 - 1
    ##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
    ## 1       493        303
    ## 2       485        299  8     4.08     0.85
  • 29. Identifiability Constraints
    In order to make the category-specific multipliers identifiable, we must constrain both the location and scale. A simple way to do this is to set the first multiplier to zero and fix the coefficient of the first covariate to one.
    ##                      estimate    SE quasiSE quasiVar
    ## worse                   0.000 0.000  1.7797  3.16745
    ## same                   -3.710 1.826  0.4281  0.18330
    ## slight.improvement     -3.510 1.792  0.4025  0.16198
    ## moderate.improvement   -2.633 1.669  0.5519  0.30454
    ## marked.improvement     -4.612 1.895  0.3133  0.09817
    ## complete.relief        -5.372 2.000  0.4920  0.24202
    Quasi standard errors (Firth and de Menezes, 2004) are invariant to the choice of reference class.
  • 30. Comparison Intervals
    [Figure: "Intervals based on quasi standard errors", showing the estimate for each pain category (worse, same, slight improvement, moderate improvement, marked improvement, complete relief).]
  • 31. Summary
    Moving from GLMs to GNMs presents some technical difficulties, but provides a framework that covers several useful models. Further examples can be found in the help files and manual accompanying the gnm package, which is available on CRAN.
  • 32. References
    Agresti, A. (2002). Categorical Data Analysis (2nd ed.). New York: Wiley.
    Anderson, J. A. (1984). Regression and Ordered Categorical Variables. J. R. Statist. Soc. B 46 (1), 1–30.
    Firth, D. and R. X. de Menezes (2004). Quasi-variances. Biometrika 91, 65–80.
    Gabriel, K. R. (1998). Generalised bilinear regression. Biometrika 85, 689–700.
    Goodman, L. A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. J. Amer. Statist. Assoc. 74, 537–552.
    Wedderburn, R. W. M. (1974). Quasi-likelihood Functions, Generalized Linear Models, and the Gauss-Newton Method. Biometrika 61, 439–447.