### Custom Functions for Specifying Nonlinear Terms to gnm

1. Custom Functions for Specifying Nonlinear Terms to gnm Heather Turner, David Firth and Andy Batchelor Department of Statistics University of Warwick, UK H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 1 / 17
2. Motivating Application: Marriage Data We wish to investigate the propensity to marry for women living in Ireland based on the Living in Ireland Surveys (1994-2001) For women born between 1950 and 1975 we have year of (ﬁrst) marriage year and month of birth social class highest level of education attained year highest level of education was attained H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 2 / 17
3. Proposed Model We wish to model the hazard of marriage occuring at time t h(t) = P (T = t|T ≥ t) using the discrete-time model logit(h(t)) = c + βl log(ageit − αl ) + βr log(αr − ageit ) + xit β =η The parameters can be estimated by logistic regression of a marriage indicator on η — a generalized nonlinear model need custom "nonlin" function to specify “log-excess” terms in the formula argument to gnm H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 3 / 17
4. Variables and Predictors A "nonlin" function creates a list of arguments for the internal function nonlinTerms First deﬁne the variables and predictors in the term βl log(ageit − αl ) Start to build "nonlin" function as follows LogExcess <- function(age){ list(predictors = list(beta = ~1, alpha = ~1), variables = list(substitute(age))) } class(LogExcess) <- "nonlin" H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 4 / 17
5. Term-speciﬁc Issues Want to use same function for both log-excess terms, so add argument side = "left" To avoid taking logs of negative values, we let αl = age[min] − 10−5 − exp(αl ) ∗ αr = age[max] + 10−5 + exp(αr ) ∗ ∗ ∗ and estimate αl and αr instead. In LogExcess deﬁne constraint <- ifelse(side == "left", min(age) - 1e-5, max(age) + 1e-5) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 5 / 17
6. Term Deﬁnition The term argument of nonlinTerms takes labels for the predictors and variables and returns a deparsed expression of the term: term = function(predLabels, varLabels) { paste(predLabels[1], " * log(", " -"[side == "right"], varLabels[1], " + ", " -"[side == "left"], constraint, " + exp(", predLabels[2], "))") } So e.g. > term(c("beta", "alpha*"), "age") [1] "beta * log( age + - 14.99999 + exp( alpha* ))" H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 6 / 17
7. Parameter Labels Default parameter labels are taken from the predictor names, here beta and alpha To make parameter labels unique, save call to LogExcess: call <- sys.call() and specify call argument to nonlinTerms call = as.expression(call) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 7 / 17
8. Complete Function LogExcess <- function(age, side = "left"){ call <- sys.call() constraint <- ifelse(side == "left", min(age) - 1e-5, max(age) + 1e-5) list(predictors = list(beta = ~1, alpha = ~1), variables = list(substitute(age)), term = function(predLabels, varLabels) { paste(predLabels[1], " * log(", " -"[side == "right"], varLabels[1], " + ", " -"[side == "left"], constraint, " + exp(", predLabels[2], "))") }, call = as.expression(call)) } class(LogExcess) <- "nonlin" H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 8 / 17
9. Summary of Baseline Model Call: gnm(formula = marriages/lives ~ LogExcess(age, side = "left") + LogExcess(age, side = "right"), family = binomial, data = fulldata, weights = lives, start = c(-20, 3, 0, 3, 0)) Deviance Residuals: Min 1Q Median 3Q Max -0.8098 -0.4441 -0.3224 -0.1528 4.0483 Estimate Std. Error z value Pr(>|z|) (Intercept) -118.5395 NA NA NA LogExcess(age, side = "left")beta 3.6928 NA NA NA LogExcess(age, side = "left")alpha -0.1432 NA NA NA LogExcess(age, side = "right")beta 24.8623 NA NA NA LogExcess(age, side = "right")alpha 4.0247 NA NA NA Std. Error is NA where coefficient has been constrained or is unidentified Residual deviance: 12553 on 31004 degrees of freedom AIC: 12748 Number of iterations: 76 H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 9 / 17
10. Example ‘Recoil’ Plot 0.12 Probability of Marriage 0.08 0.04 0.00 10 20 30 40 50 Age H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 10 / 17
11. Example ‘Recoil’ Plot 0.12 Probability of Marriage 0.08 0.04 0.00 10 20 30 40 50 Age H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 10 / 17
12. Example ‘Recoil’ Plot 0.12 qq Probability of Marriage q q q q q q 0.08 q q q q q q q 0.04 q q q q q q q q qq qq 0.00 q q 10 20 30 40 50 Age H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 10 / 17
13. Re-parameterization The problem with aliasing can be overcome by re-parameterizing the baseline model: ν − αl γ − δ (ν − αl ) log ageit − αl αr − ν + δ (αr − ν) log αr − ageit A new nonlin function, Bell, is needed to specify this term H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 11 / 17
14. Interpretation of Parameters The parameters of the new parameterization have a more useful interpretation than before: expit(γ) Probability of Marriage αL ν αR Age (years) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 12 / 17
15. Recoil Plots for Reparameterised Model 0.15 qqq q q qqqqqq 0.10 q q q pred (x) q q q pred (x) q q q q q q q q q q q 0.05 q q q q q q γ q q q ν q q q −2.09 → −1.95 q q q 25.39 → 28 q q qq q q 0.00 qqqq q qq qqqq qqqqqq q Probability of Marriage x qqqq x qqqqq q q q q pred (x) pred (x) q q q q q q q q q q q q q q q q q q q q δ q q αL q q q q q q 0.34 → 0.15 q q q −0.14 → 0.035 q qq qq q qqqq q qqqqqq qqq qq 0.15 10 20 30 40 50 qqqq x x 0.10 rep(0, 41) q q pred (x) q q q q q q Original Model 0.05 q q q Perturbed Model q αR q q q Re−fitted Model q q 4.02 → 20 qq qq q 0.00 qqq qqq qqqqqq 10 20 30 40 50 Age (years) x 10:50 H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 13 / 17
16. Summary of Re-parameterized Model Call: gnm(formula = marriages/lives ~ Bell(age), family = binomial, data = fulldata, weights = lives, start = c(NA, 26, 0.2, NA, NA)) Deviance Residuals: Min 1Q Median 3Q Max -0.8098 -0.4441 -0.3224 -0.1528 4.0483 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.09446 0.03313 -63.225 < 2e-16 Bell(age)peak 25.39354 0.30405 83.517 < 2e-16 Bell(age)fallOff 0.32917 0.09065 3.631 0.000282 Bell(age)leftAdj -0.14321 0.89350 -0.160 0.872663 Bell(age)rightAdj 4.02470 1.73753 2.316 0.020540 (Dispersion parameter for binomial family taken to be 1) Residual deviance: 12553 on 31004 degrees of freedom AIC: 12748 Number of iterations: H. Turner, D. Firth and A. Batchelor () 14 Custom “nonlin” Functions UseR August 2008 14 / 17
17. Further Analysis We can write a simple function to compute the endpoints and their standard errors > BellEndpoints(bell.mod) [,1] [,2] left 14.17508 0.7742838 right 100.92183 97.2381849 Adding theoretically important covariates produces even higher estimate of right endpoint use simpler model with inﬁnite right endpoint Residual analysis suggests location of hazard depends on education level extend model to allow ν to depend on covariates H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 15 / 17
18. Final Model For women born in 1950 0.20 1.0 Education level (years) Proportion never married Probability of marriage 0.8 0.15 8 11 14 0.6 0.10 15 16 0.4 17 0.05 0.2 0.00 0.0 15 20 25 30 35 40 45 10 20 30 40 50 Age (years) Age (years) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 16 / 17
19. References/Acknowledgements More information about gnm can be found on www.warwick.ac.uk/go/gnm A working paper on the marriage application is available at www.warwick.ac.uk/go/crism/research/2007 The marriage data are from The Economic and Social Research Institute Living in Ireland Survey Microdata File ( c Economic and Social Research Institute). We gratefully acknowledge Carmel Hannan for introducing us to this application and providing background on the data. H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 17 / 17