Custom Functions for Specifying Nonlinear Terms to gnm
Heather Turner, David Firth and Andy Batchelor
Department of Statistics
University of Warwick, UK
H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 1 / 17
Motivating Application: Marriage Data
We wish to investigate the propensity to marry for women living
in Ireland, based on the Living in Ireland Surveys (1994-2001).
For women born between 1950 and 1975 we have:
year of (first) marriage
year and month of birth
social class
highest level of education attained
year highest level of education was attained
Proposed Model
We wish to model the hazard of marriage occurring at time t,
$h(t) = P(T = t \mid T \ge t)$,
using the discrete-time model
$\operatorname{logit}(h(t)) = c + \beta_l \log(\mathrm{age}_{it} - \alpha_l) + \beta_r \log(\alpha_r - \mathrm{age}_{it}) + x_{it}\beta = \eta$
The parameters can be estimated by logistic regression of a
marriage indicator on η, which is a generalized nonlinear model.
We need a custom "nonlin" function to specify the “log-excess” terms
in the formula argument to gnm.
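As a rough illustration of the model above, the hazard can be evaluated in R for given parameter values. This is only a sketch: the helper name `hazard_sketch` and every numeric value below are made up for illustration, and the covariate term x_it β is left out.

```r
# Hypothetical helper: discrete-time hazard from the two log-excess terms.
hazard_sketch <- function(age, c0, beta_l, alpha_l, beta_r, alpha_r) {
  # Both log arguments must be positive: alpha_l < age < alpha_r
  stopifnot(all(age > alpha_l), all(age < alpha_r))
  eta <- c0 + beta_l * log(age - alpha_l) + beta_r * log(alpha_r - age)
  plogis(eta)  # inverse logit, so the result is a probability
}

# Illustrative values only (not estimates from the marriage data)
age <- 16:44
h <- hazard_sketch(age, c0 = -20, beta_l = 2, alpha_l = 14,
                   beta_r = 4, alpha_r = 60)
```

With positive slopes on both log terms, the hazard rises from the left endpoint, peaks, and falls towards the right endpoint.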
Variables and Predictors
A "nonlin" function creates a list of arguments for the internal
function nonlinTerms
First define the variables and predictors in the term
$\beta_l \log(\mathrm{age}_{it} - \alpha_l)$
Start to build "nonlin" function as follows
LogExcess <- function(age){
    list(predictors = list(beta = ~1, alpha = ~1),
         variables = list(substitute(age)))
}
class(LogExcess) <- "nonlin"
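Calling the skeleton directly shows what it hands on. The sketch below uses only base R; `myAge` is a made-up variable name that does not need to exist, because `substitute` captures it unevaluated.

```r
# Skeleton "nonlin" function from the slide
LogExcess <- function(age) {
  list(predictors = list(beta = ~1, alpha = ~1),
       variables = list(substitute(age)))
}
class(LogExcess) <- "nonlin"

# `myAge` is hypothetical; it is stored as a symbol, not evaluated
spec <- LogExcess(myAge)
comp_names <- names(spec)                 # names of the list components
var_label <- deparse(spec$variables[[1]]) # the captured variable as text
```

Here `comp_names` holds the two component names and `var_label` is the string "myAge".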
Term-specific Issues
Want to use same function for both log-excess terms, so add
argument
side = "left"
To avoid taking logs of negative values, we let
$\alpha_l = \mathrm{age}_{[\min]} - 10^{-5} - \exp(\alpha_l^*)$
$\alpha_r = \mathrm{age}_{[\max]} + 10^{-5} + \exp(\alpha_r^*)$
and estimate $\alpha_l^*$ and $\alpha_r^*$ instead. In LogExcess define
constraint <- ifelse(side == "left",
min(age) - 1e-5, max(age) + 1e-5)
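A quick numerical check of why this works, as a sketch with made-up ages and made-up values of the starred parameters: whatever real value the starred parameter takes, the implied endpoint lies strictly outside the observed age range, so both log arguments stay positive.

```r
age <- 15:45                # hypothetical observed ages
alpha_star <- c(-5, 0, 5)   # arbitrary real values of the estimated parameter

left  <- min(age) - 1e-5    # constraint used when side == "left"
right <- max(age) + 1e-5    # constraint used when side == "right"

alpha_l <- left  - exp(alpha_star)  # always below min(age)
alpha_r <- right + exp(alpha_star)  # always above max(age)

# For each parameter value, check that every log argument is positive
ok <- sapply(seq_along(alpha_star), function(i)
  all(age - alpha_l[i] > 0) && all(alpha_r[i] - age > 0))
```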
Term Definition
The term argument of nonlinTerms takes labels for the
predictors and variables and returns a deparsed expression of the
term:
term = function(predLabels, varLabels) {
    paste(predLabels[1], " * log(",
          " -"[side == "right"], varLabels[1], " + ",
          " -"[side == "left"], constraint,
          " + exp(", predLabels[2], "))")
}
So e.g.
> term(c("beta", "alpha*"), "age")
[1] "beta * log( age + - 14.99999 + exp( alpha* ))"
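To see that the generated string is ordinary R code, it can be parsed and evaluated. The sketch below wraps the term function in a hypothetical factory, `make_term`, so that `side` and `constraint` are in scope; the bounds 15 and 45 and the values in `vals` are illustrative assumptions.

```r
# Hypothetical factory supplying `side` and `constraint` to the term function
make_term <- function(side, constraint) {
  function(predLabels, varLabels) {
    paste(predLabels[1], " * log(",
          " -"[side == "right"], varLabels[1], " + ",
          " -"[side == "left"], constraint,
          " + exp(", predLabels[2], "))")
  }
}

left_term  <- make_term("left", 15 - 1e-5)
right_term <- make_term("right", 45 + 1e-5)

# Plain labels ("b", "a") so that the string parses as R code
vals <- list(b = 2, a = 0, age = 20)
left_val  <- eval(parse(text = left_term(c("b", "a"), "age")), vals)
right_val <- eval(parse(text = right_term(c("b", "a"), "age")), vals)
```

The left term evaluates b * log(age - constraint + exp(a)) and the right term evaluates b * log(-age + constraint + exp(a)), matching the two log-excess terms of the model.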
Parameter Labels
Default parameter labels are taken from the predictor names,
here beta and alpha
To make parameter labels unique, save call to LogExcess:
call <- sys.call()
and specify call argument to nonlinTerms
call = as.expression(call)
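The effect of saving the call can be seen with a toy function in base R. `show_call` is a made-up name and is not part of gnm; it only records how it was called.

```r
# Toy function (not part of gnm) that returns its own call as an expression
show_call <- function(age, side = "left") {
  call <- sys.call()
  as.expression(call)
}

# `age` is never evaluated, so it need not exist here
lab_left  <- deparse(show_call(age, side = "left")[[1]])
lab_right <- deparse(show_call(age, side = "right")[[1]])
```

The two labels differ because the calls differ, which is what lets two instances of the same "nonlin" term carry distinct parameter labels.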
Complete Function
LogExcess <- function(age, side = "left"){
    # Save the call so that parameter labels are unique to this term
    call <- sys.call()
    # Bound that keeps the argument of log() positive
    constraint <- ifelse(side == "left",
                         min(age) - 1e-5, max(age) + 1e-5)
    list(predictors = list(beta = ~1, alpha = ~1),
         variables = list(substitute(age)),
         term = function(predLabels, varLabels) {
             paste(predLabels[1], " * log(",
                   " -"[side == "right"], varLabels[1], " + ",
                   " -"[side == "left"], constraint,
                   " + exp(", predLabels[2], "))")
         },
         call = as.expression(call))
}
class(LogExcess) <- "nonlin"
Summary of Baseline Model
Call:
gnm(formula = marriages/lives ~ LogExcess(age, side = "left") +
LogExcess(age, side = "right"), family = binomial, data = fulldata,
weights = lives, start = c(-20, 3, 0, 3, 0))
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8098  -0.4441  -0.3224  -0.1528   4.0483

                                      Estimate Std. Error z value Pr(>|z|)
(Intercept)                          -118.5395         NA      NA       NA
LogExcess(age, side = "left")beta       3.6928         NA      NA       NA
LogExcess(age, side = "left")alpha     -0.1432         NA      NA       NA
LogExcess(age, side = "right")beta     24.8623         NA      NA       NA
LogExcess(age, side = "right")alpha     4.0247         NA      NA       NA
Std. Error is NA where coefficient has been constrained or is unidentified
Residual deviance: 12553 on 31004 degrees of freedom
AIC: 12748
Number of iterations: 76
Example ‘Recoil’ Plot
[Figure: recoil plot of Probability of Marriage (0.00 to 0.12) against Age (10 to 50), built up over three overlays.]
Re-parameterization
The problem with aliasing can be overcome by re-parameterizing
the baseline model:
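$\gamma - \delta(\nu - \alpha_l)\log\left(\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l}\right) - \delta(\alpha_r - \nu)\log\left(\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}_{it}}\right)$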
A new nonlin function, Bell, is needed to specify this term
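A sketch of why the re-parameterized predictor is the baseline model in disguise, assuming δ > 0 and α_l < ν < α_r. Matching the coefficients of the two log terms and requiring the predictor to peak at age ν with value γ gives:

```latex
\begin{aligned}
\beta_l &= \delta(\nu - \alpha_l), \qquad \beta_r = \delta(\alpha_r - \nu), \\
\frac{d}{d\,\mathrm{age}}\bigl[\beta_l \log(\mathrm{age} - \alpha_l)
   + \beta_r \log(\alpha_r - \mathrm{age})\bigr]
 &= \frac{\beta_l}{\mathrm{age} - \alpha_l} - \frac{\beta_r}{\alpha_r - \mathrm{age}}
 = \delta - \delta = 0 \quad \text{at } \mathrm{age} = \nu, \\
\gamma &= c + \beta_l \log(\nu - \alpha_l) + \beta_r \log(\alpha_r - \nu).
\end{aligned}
```

As a numerical check against the reported estimates: with δ ≈ 0.32917, ν ≈ 25.39354 and a left endpoint of about 14.17508, δ(ν − α_l) ≈ 3.69, which agrees with the baseline estimate 3.6928 for the left beta.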
Interpretation of Parameters
The parameters of the new parameterization have a more useful
interpretation than before:
[Figure: Probability of Marriage against Age (years), with expit(γ) marked on the vertical axis and αL, ν and αR marked on the age axis.]
Recoil Plots for Reparameterised Model
[Figure: five recoil plots of Probability of Marriage against Age (years), one per perturbed parameter, each showing the Original Model, Perturbed Model and Re-fitted Model. Perturbations: γ −2.09 → −1.95; ν 25.39 → 28; δ 0.34 → 0.15; αL −0.14 → 0.035; αR 4.02 → 20.]
Summary of Re-parameterized Model
Call:
gnm(formula = marriages/lives ~ Bell(age), family = binomial,
data = fulldata, weights = lives,
start = c(NA, 26, 0.2, NA, NA))
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8098  -0.4441  -0.3224  -0.1528   4.0483

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -2.09446    0.03313 -63.225  < 2e-16
Bell(age)peak      25.39354    0.30405  83.517  < 2e-16
Bell(age)fallOff    0.32917    0.09065   3.631 0.000282
Bell(age)leftAdj   -0.14321    0.89350  -0.160 0.872663
Bell(age)rightAdj   4.02470    1.73753   2.316 0.020540
(Dispersion parameter for binomial family taken to be 1)
Residual deviance: 12553 on 31004 degrees of freedom
AIC: 12748
Number of iterations: 14
Further Analysis
We can write a simple function to compute the endpoints and
their standard errors
> BellEndpoints(bell.mod)
[,1] [,2]
left 14.17508 0.7742838
right 100.92183 97.2381849
Adding theoretically important covariates produces an even higher
estimate of the right endpoint,
so we use a simpler model with an infinite right endpoint.
Residual analysis suggests that the location of the hazard depends on
education level,
so we extend the model to allow ν to depend on covariates.
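The slides do not show BellEndpoints itself. As a sketch of how such endpoints and standard errors could be computed, assume each endpoint is a bound minus or plus exp(adjustment), as in LogExcess, and that the standard error comes from the delta method. The helper name `endpoint_sketch` and the age bounds 15 and 45 are assumptions, not values stated in the slides.

```r
# Hypothetical helper, not the authors' BellEndpoints
endpoint_sketch <- function(adj, se_adj, bound, side = c("left", "right")) {
  side <- match.arg(side)
  sgn <- if (side == "left") -1 else 1
  c(estimate = bound + sgn * exp(adj),  # endpoint on the age scale
    se = exp(adj) * se_adj)             # delta method: |d endpoint / d adj| * se
}

# Adjustment estimates and standard errors are taken from the summary of the
# re-parameterized model; the bounds are assumed values
left  <- endpoint_sketch(-0.14321, 0.89350, bound = 15 - 1e-5, side = "left")
right <- endpoint_sketch(4.02470, 1.73753, bound = 45 + 1e-5, side = "right")
```

Under these assumptions the standard errors come out at roughly 0.774 and 97.24, in line with the second column of the BellEndpoints output. The endpoint estimates depend on the assumed bounds and need not match the first column exactly.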
Final Model
For women born in 1950
[Figure: two panels against Age (years), showing Probability of marriage (0.00 to 0.20) and Proportion never married (0.0 to 1.0), with one curve per education level (years): 8, 11, 14, 15, 16, 17.]
References/Acknowledgements
More information about gnm can be found on
www.warwick.ac.uk/go/gnm
A working paper on the marriage application is available at
www.warwick.ac.uk/go/crism/research/2007
The marriage data are from The Economic and Social Research
Institute Living in Ireland Survey Microdata File (© Economic
and Social Research Institute).
We gratefully acknowledge Carmel Hannan for introducing us to
this application and providing background on the data.