### Nonlinear Discrete-time Hazard Models for Entry into Marriage

1. Nonlinear Discrete-time Hazard Models for Entry into Marriage
   - Heather Turner, Andy Batchelor, David Firth
   - Department of Statistics, University of Warwick, UK
   - 8th March 2010
2. Motivating Application: The LII Survey
   - The Living in Ireland Surveys were conducted 1994-2001.
   - For five 5-year cohorts of women born between 1950 and 1975 we have the following data:
     - year of (first) marriage
     - year and month of birth
     - social class
     - highest level of education attained
     - year in which the highest level of education was attained
3. When do women get married?
   - We can use methods from survival analysis to model the timing of marriage.
   - Taking time to start from the legal age of marriage, the survival time $T$ is the time until a person marries.
   - The time of marriage is recorded to the nearest year, so we use a discrete-time analysis.
4. Discrete-time Hazard Models
   - In discrete time, the hazard of marriage occurring at time $t$ is defined as
     $$h(t) = P(T = t \mid T \ge t)$$
   - We are interested in the shape of the hazard over the life course and in how the hazard is affected by covariates.
5. Cox Proportional Odds Model
   - A popular choice is the proportional odds model proposed by Cox (JRSSB, 1972):
     $$\frac{h(t \mid x_{it})}{1 - h(t \mid x_{it})} = \frac{h_0(t)}{1 - h_0(t)} \exp(x_{it}\beta)$$
     where $h_0(t)$ is the baseline hazard.
   - Taking logs we obtain
     $$\operatorname{logit}(h(t \mid x_{it})) = \operatorname{logit}(h_0(t)) + x_{it}\beta = l_t + x_{it}\beta$$
   - The model is semi-parametric: it makes no assumption about the shape of the hazard function.
6. Episode-splitting
   - A simple way to estimate the proportional odds model is to generate an event history for each observation.
   - Pseudo-observations are created at each time point from time 0 up to marriage or censoring; this is known as episode-splitting.
   - The parameters of the proportional odds model can then be estimated by fitting a logistic regression model to a binary indicator of marriage at each time point (married = 1, unmarried = 0).
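
Episode-splitting followed by logistic regression can be sketched in base R as below. The `women` data frame, its column names and its values are hypothetical toy data rather than the LII data, and time is indexed from 1 for convenience. With one intercept per time point and no covariates, the fitted hazards should simply reproduce the observed proportion marrying among those still at risk.

```r
# Hypothetical toy data (not the LII data): one row per woman.
# time  = years from the start of follow-up to marriage or censoring
# event = 1 if the woman married at `time`, 0 if she was censored
women <- data.frame(id    = 1:20,
                    time  = rep(1:4, each = 5),
                    event = rep(c(1, 1, 1, 0, 1), times = 4))

# Episode-splitting: one pseudo-observation per woman per year at risk
long <- data.frame(id    = rep(women$id,    times = women$time),
                   t     = sequence(women$time),
                   time  = rep(women$time,  times = women$time),
                   event = rep(women$event, times = women$time))

# Binary indicator: 1 only in the year in which marriage occurs
long$married <- as.integer(long$t == long$time & long$event == 1)

# Logistic regression with a separate baseline term l_t for each time point
fit <- glm(married ~ 0 + factor(t), family = binomial, data = long)

# Estimated discrete-time hazard at each time point
hazard_hat <- plogis(coef(fit))
```

Covariates would enter by adding them to the right-hand side of the formula; time-varying covariates take one value per pseudo-observation.
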
7. Cox Proportional Odds Model
   - [Figure: probability of marriage against age (years), ages 15 to 43.]
8. Sidenote: interval-censored data
   - A similar model can be obtained by assuming that the data are interval-censored observations from a continuous-time proportional hazards model.
   - The coefficients in the model
     $$\operatorname{cloglog}(h(t \mid x_{it})) = l_t + x_{it}\beta$$
     are then the coefficients of the proportional hazards model.
   - This relationship breaks down, however, if $l_t$ is replaced by a parametric function.
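
The link to the complementary log-log model can be sketched as follows. The continuous-time baseline hazard $\lambda_0(u)$ is a symbol introduced here for illustration, and covariates are assumed constant within each interval $(t-1, t]$. Under proportional hazards, the probability of marrying in the interval, given survival to its start, is

$$h(t \mid x_{it}) = 1 - \exp\left\{-\exp(x_{it}\beta)\int_{t-1}^{t}\lambda_0(u)\,du\right\}$$

so that

$$\log\{-\log(1 - h(t \mid x_{it}))\} = \log\int_{t-1}^{t}\lambda_0(u)\,du + x_{it}\beta = l_t + x_{it}\beta$$

Here $l_t$ is the log of the integrated baseline hazard over the interval, which is why $\beta$ carries over unchanged from the continuous-time model.
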
9. Blossfeld and Huinink Model
   - Blossfeld and Huinink (Am. J. Sociol., 1991) propose the following parametric baseline:
     $$\operatorname{logit}(h_0(t \mid \mathrm{age}_{it})) = l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - 15) + \beta_r \log(45 - \mathrm{age}_{it})$$
   - It describes the nature of the time dependence.
   - It fixes the support of the hazard to be 15 to 45 years.
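
Because the endpoints 15 and 45 are fixed, the BH baseline is linear in $(c, \beta_l, \beta_r)$ and an ordinary logistic regression can fit it. The sketch below illustrates this on hypothetical aggregated data: the parameter values, the `bh` data frame and its columns are invented for illustration, and the counts are rounded expected values rather than survey data.

```r
# Hypothetical "true" BH parameters, used only to build illustrative data
c0 <- -21.5; beta_l <- 3; beta_r <- 4

# Aggregated data: one row per year of age strictly inside (15, 45),
# with `lives` women at risk at each age
bh <- data.frame(age = 16:44, lives = 1e7)

# Expected number of marriages at each age under the BH baseline (rounded)
bh$marriages <- round(bh$lives *
  plogis(c0 + beta_l * log(bh$age - 15) + beta_r * log(45 - bh$age)))

# The two log terms are ordinary covariates, so glm() can fit the baseline
fit <- glm(cbind(marriages, lives - marriages) ~ log(age - 15) + log(45 - age),
           family = binomial, data = bh)

# Estimates of (c, beta_l, beta_r); these should be close to the values above
est <- unname(coef(fit))
```

Ages at or beyond the fixed endpoints cannot be included, since the log terms are undefined there; this is the sense in which the support is fixed.
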
10. BH Model
    - [Figure: probability of marriage against age (years), 10 to 50, with plotted points.]
11. Effect of Endpoints
    - [Figure: probability of marriage against age (years), 10 to 50, for two choices of hazard support: 15-45 years and 12-75 years.]
12. Nonlinear Discrete-time Hazard Model
    - An obvious extension of the BH model is to treat the endpoints as parameters:
      $$l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - \alpha_l) + \beta_r \log(\alpha_r - \mathrm{age}_{it})$$
    - The model is nonlinear, so available software needs to be extended.
    - There is near-aliasing between parameters, so the model needs to be reparameterised.
13. Developing the Nonlinear Model
    - First, analyse using the BH model as a reference.
    - Then analyse using the extended model and illustrate near-aliasing.
    - Finally, analyse using a re-parameterised nonlinear discrete-time model:
      - compare to the BH model
      - refine the model for the LII data
14. BH Models
    - The BH models can be fitted using the `glm` function in R.
    - Following the model building strategy of Blossfeld & Huinink (1991), we select
      - a cohort factor
      - a time-varying indicator of educational status (in/out)
    - For the 1970-1974 cohort the conditional odds of marriage are 24% of those for the 1950-1954 cohort.
    - For women in education the conditional odds of marriage are 11% of those for women not in education.
15. Selected BH Model
    - [Figure: probability of marriage against age (years), 15 to 45, with one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
    - Deviance = 12073; residual d.f. = 31001
16. Nonlinear Discrete-time Hazard Models
    - The nonlinear discrete-time hazard model is an example of a generalised nonlinear model, which can be fitted using the gnm package in R (Turner and Firth, R News, 2007):
      - parameters are estimated by a modified IWLS algorithm
      - certain nonlinear terms are built in, e.g. `Mult`, `Exp`
      - our terms cannot be expressed using these functions, so we need to write a custom "nonlin" function
17. Custom "nonlin" Function

    ```r
    # Custom "nonlin" term for gnm: beta * log(excess of age beyond an estimated endpoint)
    LogExcess <- function(age, side = "left") {
      call <- sys.call()
      # Bound that keeps the argument of log() positive at every observed age
      constraint <- ifelse(side == "left", min(age) - 1e-5, max(age) + 1e-5)
      list(predictors = list(beta = ~1, alpha = ~1),
           variables = list(substitute(age)),
           # Builds the term as text; the endpoint offset enters as exp(alpha)
           term = function(predLabels, varLabels) {
             paste(predLabels[1], " * log(",
                   " -"[side == "right"], varLabels[1], " + ",
                   " -"[side == "left"], constraint,
                   " + exp(", predLabels[2], "))")
           },
           call = as.expression(call))
    }
    class(LogExcess) <- "nonlin"
    ```
18. Summary of Baseline Model

    ```
    Call:
    gnm(formula = marriages/lives ~ LogExcess(age, side = "left") +
        LogExcess(age, side = "right"), family = binomial, data = fulldata,
        weights = lives, start = c(-20, 3, 0, 3, 0))

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -0.8098  -0.4441  -0.3224  -0.1528   4.0483

    Coefficients:
                                         Estimate Std. Error z value Pr(>|z|)
    (Intercept)                         -118.5395   201.6387  -0.588  0.55661
    LogExcess(age, side = "left")beta      3.6928     1.1913   3.100  0.00194
    LogExcess(age, side = "left")alpha    -0.1432     0.8935  -0.160  0.87267
    LogExcess(age, side = "right")beta    24.8623    38.5743   0.645  0.51923
    LogExcess(age, side = "right")alpha    4.0247     1.7376   2.316  0.02054

    Std. Error is NA where coefficient has been constrained or is unidentified

    Residual deviance: 12553 on 31004 degrees of freedom
    AIC: 12748

    Number of iterations: 76
    ```
19. Parameter Correlations

    |            | $c$      | $\beta_l$ | $\alpha_l$ | $\beta_r$ | $\alpha_r$ |
    |------------|----------|-----------|------------|-----------|------------|
    | $c$        | 1.00000  |           |            |           |            |
    | $\beta_l$  | -0.92563 | 1.00000   |            |           |            |
    | $\alpha_l$ | -0.80861 | 0.95844   | 1.00000    |           |            |
    | $\beta_r$  | -0.99999 | 0.92688   | 0.80989    | 1.00000   |            |
    | $\alpha_r$ | -0.99833 | 0.90319   | 0.77910    | 0.99808   | 1.00000    |
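
One way to see where the near-aliasing comes from (a sketch added for illustration, not taken from the slides): when the right endpoint $\alpha_r$ lies well beyond the observed ages, the right-hand log term is roughly linear in age,

$$c + \beta_r \log(\alpha_r - \mathrm{age}) = (c + \beta_r \log\alpha_r) + \beta_r \log\left(1 - \frac{\mathrm{age}}{\alpha_r}\right) \approx (c + \beta_r \log\alpha_r) - \frac{\beta_r}{\alpha_r}\,\mathrm{age}$$

To this order the data mainly determine the combinations $c + \beta_r\log\alpha_r$ and $\beta_r/\alpha_r$, so an increase in $\beta_r$ can be offset almost exactly by a decrease in $c$. This is consistent with the correlation of -0.99999 between $c$ and $\beta_r$.
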
20. Example 'Recoil' Plot
    - [Figure: probability of marriage against age, 10 to 50.]
21. Example 'Recoil' Plot
    - [Figure: probability of marriage against age, 10 to 50.]
22. Example 'Recoil' Plot
    - [Figure: probability of marriage against age, 10 to 50, with plotted points added.]
23. Is Near-aliasing a Problem?
    - The extended model can still be used as a baseline hazard:
      $$\operatorname{logit}(h(t \mid x_{it})) = l(\mathrm{age}_{it}) + x_{it}\beta$$
    - Near-aliasing will make models harder to fit, particularly with several covariates.
    - Not all parameters are interpretable.
24. Re-parameterizing the Nonlinear Model
    - The nonlinear hazard model can be re-parameterized as follows:
      $$l(\mathrm{age}_{it}) = \gamma - \delta(\nu - \alpha_l)\log\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l} + \delta(\alpha_r - \nu)\log\frac{\alpha_r - \mathrm{age}_{it}}{\alpha_r - \nu}$$
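
The correspondence between the two parameterisations can be checked numerically. The sketch below assumes the mapping $\beta_l = \delta(\nu - \alpha_l)$, $\beta_r = \delta(\alpha_r - \nu)$, with $\gamma$ the value of the baseline at age $\nu$. This mapping is derived here from the extended BH form rather than quoted from the slides, and all parameter values are hypothetical.

```r
# Extended BH baseline: c + beta_l * log(age - alpha_l) + beta_r * log(alpha_r - age)
l_orig <- function(age, c0, beta_l, alpha_l, beta_r, alpha_r) {
  c0 + beta_l * log(age - alpha_l) + beta_r * log(alpha_r - age)
}

# Reparameterised baseline: peak height gamma, peak location nu, fall-off delta
l_repar <- function(age, gamma, nu, delta, alpha_l, alpha_r) {
  gamma - delta * ((nu - alpha_l) * log((nu - alpha_l) / (age - alpha_l)) +
                   (alpha_r - nu) * log((alpha_r - nu) / (alpha_r - age)))
}

# Hypothetical parameter values, chosen only for illustration
c0 <- -21.5; beta_l <- 3; alpha_l <- 15; beta_r <- 4; alpha_r <- 45

# Assumed mapping from (c, beta_l, beta_r) to (gamma, nu, delta)
nu    <- (beta_l * alpha_r + beta_r * alpha_l) / (beta_l + beta_r)
delta <- (beta_l + beta_r) / (alpha_r - alpha_l)
gamma <- l_orig(nu, c0, beta_l, alpha_l, beta_r, alpha_r)

# Both forms should trace the same curve over the observed ages
age <- 16:44
max_diff <- max(abs(l_orig(age, c0, beta_l, alpha_l, beta_r, alpha_r) -
                    l_repar(age, gamma, nu, delta, alpha_l, alpha_r)))

# Integer age at which the hazard is largest, and the peak hazard expit(gamma)
peak_age    <- age[which.max(l_orig(age, c0, beta_l, alpha_l, beta_r, alpha_r))]
peak_hazard <- plogis(gamma)
```

Under this mapping the baseline is unchanged; only the coordinates used to describe it differ.
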
25. Interpretation of Parameters
    - The parameters of the new parameterisation have a more useful interpretation than before.
    - [Figure: schematic hazard curve, probability of marriage against age (years), annotated with the peak height $\operatorname{expit}(\gamma)$ and with $\alpha_L$, $\nu$ and $\alpha_R$ marked on the age axis.]
26. New Parameter Correlations

    |            | $\gamma$ | $\nu$    | $\delta$ | $\alpha_l$ | $\alpha_r$ |
    |------------|----------|----------|----------|------------|------------|
    | $\gamma$   | 1.00000  |          |          |            |            |
    | $\nu$      | 0.12956  | 1.00000  |          |            |            |
    | $\delta$   | 0.21943  | -0.69849 | 1.00000  |            |            |
    | $\alpha_l$ | 0.27236  | -0.42848 | 0.91425  | 1.00000    |            |
    | $\alpha_r$ | 0.03231  | -0.75428 | 0.93696  | 0.77910    | 1.00000    |

    Table: Correlations between the estimated parameters of the reparameterized baseline model.
27. Recoil Plots for Reparameterised Model
    - [Figure: five panels of probability of marriage against age, 10 to 50, one per parameter. Legend: Original Model, Perturbed Model, Re-fitted Model.]
    - Panel labels:
      - peak height ($\gamma$): −2.09 → −1.95
      - peak location ($\nu$): 25.39 → 28
      - fall off ($\delta$): 0.34 → 0.15
      - left endpoint ($\alpha_L$): 14.17 → 15.04
      - right endpoint ($\alpha_R$): 100.66 → 47.68
28. Analysis with the Reparameterised Model
    - We can now repeat the previous analysis using the nonlinear baseline hazard instead of the BH hazard function.
    - The model selection is qualitatively unchanged.
    - The residual deviance is reduced by about 20 at the expense of 2 d.f.
    - There is a lot of uncertainty about the right end-point: in the final model it is estimated as 400 years with a large standard error.
29. Infinite Right End-point
    - It seems more appropriate to define the baseline hazard so that the right end-point tends to infinity:
      $$l(\mathrm{age}_{it}) = \gamma - \delta(\nu - \alpha_l)\log\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l} - \delta(\mathrm{age}_{it} - \nu)$$
    - Re-fitting the final model with this baseline increases the deviance by a negligible amount.
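
The limiting form can be checked term by term. This is a sketch that assumes the finite-endpoint baseline contains the right-hand term $-\delta(\alpha_r - \nu)\log\{(\alpha_r - \nu)/(\alpha_r - \mathrm{age})\}$. Writing the ratio inside the logarithm as one plus a small quantity,

$$(\alpha_r - \nu)\log\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}} = -(\alpha_r - \nu)\log\left(1 + \frac{\nu - \mathrm{age}}{\alpha_r - \nu}\right) \longrightarrow \mathrm{age} - \nu \quad \text{as } \alpha_r \to \infty$$

since $x\log(1 + a/x) \to a$ as $x \to \infty$. The right-hand log term therefore becomes linear in age, and the derivative of the resulting baseline is still zero at $\mathrm{age} = \nu$, so $\nu$ remains the peak location and $\gamma$ the peak height on the logit scale.
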
30. Comparing Models
    - [Figure: two panels of probability of marriage against age (years), 15 to 45, each with one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
    - First panel: Deviance = 12073; residual d.f. = 31001
    - Second panel: Deviance = 12051; residual d.f. = 31000
31. Refining the Model
    - The model building strategy so far has been similar to that of Blossfeld and Huinink (1991), for comparison.
    - Careful consideration of the fit of the model suggests that improvements can be made.
32. Final Model with New Baseline
    - [Figure: probability of marriage against age (years), 15 to 45, with one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
    - Deviance = 12051; residual d.f. = 31000
33. Cohort Effect
    - We can investigate the cohort effect further by replacing the cohort factor by a year-of-birth factor and plotting the resultant effects.
    - [Figure: year-of-birth effect (about −2.5 to 0.0) against year of birth, 1955 to 1970, shown as points.]
34. Year-of-birth Effect
    - The plot suggests a more appropriate model:
      $$\theta \exp(\lambda(\mathrm{yrb}_i - 1950))$$
    - Replacing the year-of-birth factor with this nonlinear term reduces the deviance by 19 whilst gaining 2 d.f.
35. Checking the Fit
    - The new year-of-birth term takes account of the effect of this factor on the magnitude of the hazard.
    - To check for other effects on the hazard, we can group the data by year of age and cohort, then plot the corresponding observed and fitted proportions.
36. Fit over Cohorts
    - [Figure: proportion married against age (years), 15 to 45, in one panel per cohort. Panel titles, with the count shown under each: (1949, 1954] (5211); (1955, 1959) (6283); (1959, 1964] (6560); (1965, 1969] (6289); (1969, 1974] (6666).]
37. Fit over Education Levels
    - [Figure: proportion married against age (years), 15 to 45, in one panel per education level. Panel titles, with the count shown under each: No attainment/primary (2366); Lower secondary (7900); Upper secondary (11507); College (4829); University (4407). Legend: Observed; Model 13 (common peak); Model 14 (separate peaks).]
38. Linear Dependence of Peak Location
    - Quantifying the education level by a dynamic measure of years in education, $\mathrm{ed}$, we incorporate a linear dependence of peak location on $\mathrm{ed}$:
      $$l(x_{it}) = \gamma - \delta(\nu_0 + \nu_1\mathrm{ed}_i - \alpha_l)\log\frac{\nu_0 + \nu_1\mathrm{ed}_i - \alpha_l}{\mathrm{age}_{it} - \alpha_l} + \delta\{\nu_0 + \nu_1\mathrm{ed}_i - \mathrm{age}_{it}\}$$
    - This results in a non-proportional hazards model.
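
A minimal sketch of what a peak location that depends linearly on education implies, assuming the infinite right end-point baseline with $\nu$ replaced by $\nu_0 + \nu_1\,\mathrm{ed}$. The function `l_ed`, the parameter values in `pars` and the two education levels (8 and 16 years) are hypothetical, chosen only for illustration.

```r
# Baseline with infinite right end-point and peak location nu0 + nu1 * ed
l_ed <- function(age, ed, gamma, delta, nu0, nu1, alpha_l) {
  nu <- nu0 + nu1 * ed   # peak location shifts linearly with years in education
  gamma - delta * ((nu - alpha_l) * log((nu - alpha_l) / (age - alpha_l)) +
                   age - nu)
}

# Hypothetical parameter values
pars <- list(gamma = -2, delta = 0.3, nu0 = 18, nu1 = 0.6, alpha_l = 15)

# Numerically locate the age of maximum hazard for a given education level
peak_for <- function(ed) {
  optimize(function(a) do.call(l_ed, c(list(age = a, ed = ed), pars)),
           interval = c(15.01, 60), maximum = TRUE)$maximum
}
peaks <- sapply(c(8, 16), peak_for)   # approximately 22.8 and 27.6

# Log odds ratio between the two education levels at a given age
log_or <- function(a) {
  do.call(l_ed, c(list(age = a, ed = 16), pars)) -
    do.call(l_ed, c(list(age = a, ed = 8), pars))
}
log_or_change <- log_or(20) - log_or(35)   # non-zero: the odds ratio varies with age
```

Because the log odds ratio between education levels changes with age, the effect of education is not a constant shift on the logit scale, which is the sense in which the model is non-proportional.
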
39. Years Post-Education
    - Checking the fit against years post-education:
    - [Figure: proportion married (0.00 to 0.15) against years post education (−10 to 30).]
    - Features annotated on the plot:
      - lower rate of increase in the first 3 years post-education
      - sharp change at 7 years post-education
      - outlying points
40. Early Career Effect
    - The lower rate of increase during the first 3 years post-education may be explained by an early career effect.
    - This can be incorporated in the model by including an appropriate indicator variable, which significantly reduces the deviance.
    - The deviance does not significantly increase when the left endpoint is constrained to 15 years.
41. Effect of Education
    - Peak location varies from 20.78 years (primary education) to 26.89 years (university graduates).
    - [Figure: probability of marriage against age (years), 10 to 50, with one curve per education level: Primary, Lower sec., Upper sec., PLC, IT, University.]
42. Effect of Year-of-birth
    - Peak hazard varies from 0.17 (b. 1950) through 0.15 (b. 1960) to 0.07 (b. 1970).
    - [Figure: probability of marriage against age (years), 10 to 50, with one curve per year of birth: 1950, 1960, 1970.]
43. Summary
    - Estimating the support of the hazard function improves fit.
    - Near-aliasing can occur in nonlinear models, but can be overcome by re-parameterisation.
    - Our proposed model has more interpretable parameters:
      - particularly the location and magnitude of the maximum hazard
      - we can investigate the effect of covariates on these features
    - The parametric form does impose some restrictions on the shape of the hazard curve.
44. References
    - A comprehensive manual is distributed with the package at http://www.cran.r-project.org/package=gnm
    - A working paper on the marriage application is available at www.warwick.ac.uk/go/crism/research/2007
45. Acknowledgements
    - The data are from The Economic and Social Research Institute Living in Ireland Survey Microdata File (©Economic and Social Research Institute).
    - We gratefully acknowledge Carmel Hannan for introducing us to this application and providing background on the data.