Nonlinear Discrete-time Hazard Models for Entry into Marriage

  1. Nonlinear Discrete-time Hazard Models for Entry into Marriage
     Heather Turner, Andy Batchelor, David Firth
     Department of Statistics, University of Warwick, UK
     8th March 2010
  2. Motivating Application: The LII Survey
     The Living in Ireland (LII) Surveys were conducted 1994-2001.
     For five 5-year cohorts of women born between 1950 and 1975 we have the following data:
       • year of (first) marriage
       • year and month of birth
       • social class
       • highest level of education attained
       • year highest level of education was attained
  3. When do women get married?
     We can use methods from survival analysis to model the timing of marriage.
     Taking time to start at the legal age of marriage, the survival time T is the time until a person marries.
     The time of marriage is recorded to the nearest year, so we will use a discrete-time analysis.
  4. Discrete-time Hazard Models
     In discrete time, the hazard of marriage occurring at time t is defined as
     \[ h(t) = P(T = t \mid T \ge t) \]
     We are interested in the shape of the hazard over the life course and in how the hazard is affected by covariates.
  5. Cox Proportional Odds Model
     A popular choice is the proportional odds model proposed by Cox (JRSSB, 1972):
     \[ \frac{h(t \mid x_{it})}{1 - h(t \mid x_{it})} = \frac{h_0(t)}{1 - h_0(t)} \exp(x_{it}\beta) \]
     where $h_0(t)$ is the baseline hazard. Taking logs we obtain
     \[ \mathrm{logit}(h(t \mid x_{it})) = \mathrm{logit}(h_0(t)) + x_{it}\beta = l_t + x_{it}\beta \]
     The model is semi-parametric: it makes no assumption about the shape of the hazard function.
  6. Episode-splitting
     A simple way to estimate the proportional odds model is to generate an event history for each observation.
     Pseudo-observations are created at each time point from time 0 up to marriage or censoring; this is known as episode-splitting.
     The parameters in the proportional odds model can then be estimated by fitting a logistic regression model to a binary indicator of marriage at each time point (married = 1, unmarried = 0).
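
The episode-splitting step can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame `women`, its columns, the starting age of 15 and the helper `split_episodes()` are all hypothetical and are not taken from the LII data or from the authors' code.

```r
# Toy data (hypothetical): age at marriage or at censoring, and an event flag.
women <- data.frame(
  id      = 1:6,
  end_age = c(22, 25, 19, 30, 27, 24),  # age at marriage, or at last interview
  married = c(1, 1, 1, 0, 1, 0)         # 1 = married at end_age, 0 = censored
)
start_age <- 15  # assumed start of exposure

# Episode-splitting: one pseudo-observation per woman per year at risk.
split_episodes <- function(d, start) {
  rows <- lapply(seq_len(nrow(d)), function(i) {
    ages <- start:d$end_age[i]
    data.frame(
      id  = d$id[i],
      age = ages,
      # the indicator is 1 only in the final year of a woman who married
      event = as.integer(ages == d$end_age[i] & d$married[i] == 1)
    )
  })
  do.call(rbind, rows)
}

pp <- split_episodes(women, start_age)

# Logistic regression on the pseudo-observations; with an intercept only,
# the fitted hazard is simply events divided by person-years at risk.
fit <- glm(event ~ 1, family = binomial, data = pp)
hazard <- unname(plogis(coef(fit)))
```

With a realistic sample, the intercept-only formula would be replaced by something like `event ~ factor(age) + covariates`, where the age factor plays the role of the baseline terms $l_t$.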
  7. Cox Proportional Odds Model
     [Figure: estimated probability of marriage against age (years), ages 15 to 43.]
  8. Sidenote: interval-censored data
     A similar model can be obtained by assuming that the data are interval-censored observations from a continuous-time proportional hazards model.
     The coefficients in the model
     \[ \mathrm{cloglog}(h(t \mid x_{it})) = l_t + x_{it}\beta \]
     are then the coefficients of the proportional hazards model.
     This relationship breaks down, however, if $l_t$ is replaced by a parametric function.
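
The link between the complementary log-log model and proportional hazards can be made explicit with a short derivation. It is a standard argument, sketched here under the assumption that covariates are constant within each interval; the symbols $\lambda_0$ and $\Lambda_0$ (continuous-time baseline hazard and its cumulative version) are introduced only for this illustration.

```latex
\begin{align*}
\lambda(u \mid x_{it}) &= \lambda_0(u)\exp(x_{it}\beta)
  && \text{(continuous-time proportional hazards)} \\
h(t \mid x_{it}) &= 1 - \exp\!\left\{-\left[\Lambda_0(t) - \Lambda_0(t-1)\right]\exp(x_{it}\beta)\right\}
  && \text{(event in } (t-1, t] \text{ given survival to } t-1\text{)} \\
\log\{-\log(1 - h(t \mid x_{it}))\} &= \log\left[\Lambda_0(t) - \Lambda_0(t-1)\right] + x_{it}\beta
  = l_t + x_{it}\beta
\end{align*}
```

The same $\beta$ appears in both models, with $l_t$ absorbing the integrated baseline hazard over each interval.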
  9. Blossfeld and Huinink Model
     Blossfeld and Huinink (Am. J. Sociol., 1991) propose the following parametric baseline:
     \[ \mathrm{logit}(h_0(t \mid \mathrm{age}_{it})) = l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - 15) + \beta_r \log(45 - \mathrm{age}_{it}) \]
       • describes the nature of the time dependence
       • fixes the support of the hazard to be 15 to 45 years
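
Because the endpoints 15 and 45 are fixed, this baseline is linear in its parameters and can be fitted with `glm`. The sketch below uses made-up grouped data generated from hypothetical parameter values (`c0`, `beta_l`, `beta_r`); it is not the authors' analysis and only shows the mechanics.

```r
# Hypothetical "true" BH parameters, used only to generate toy data.
c0 <- -16; beta_l <- 2; beta_r <- 3

# Ages strictly inside the support (15, 45), so both logs are finite.
d <- data.frame(age = 16:44, lives = 1e5)
eta <- c0 + beta_l * log(d$age - 15) + beta_r * log(45 - d$age)
d$marriages <- round(d$lives * plogis(eta))  # expected counts, rounded

# Grouped binomial fit of the BH baseline on the logit scale.
bh_fit <- glm(cbind(marriages, lives - marriages) ~
                log(age - 15) + log(45 - age),
              family = binomial, data = d)

est <- unname(coef(bh_fit))                   # estimates of c, beta_l, beta_r
peak_age <- d$age[which.max(fitted(bh_fit))]  # age with the highest fitted hazard
```

For this functional form the hazard peaks where $\beta_l/(\mathrm{age} - 15) = \beta_r/(45 - \mathrm{age})$, which for the toy values above is age 27.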
  10. BH Model
     [Figure: probability of marriage against age (years), age axis 10 to 50, with plotted points.]
  11. Effect of Endpoints
     [Figure: probability of marriage against age (years), age axis 10 to 50, comparing hazard supports of 15−45 years and 12−75 years.]
  12. Nonlinear Discrete-time Hazard Model
     An obvious extension of the BH model is to treat the endpoints as parameters:
     \[ l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - \alpha_l) + \beta_r \log(\alpha_r - \mathrm{age}_{it}) \]
       • nonlinear: need to extend available software
       • near-aliasing between parameters: need to reparameterise
  13. Developing the Nonlinear Model
     First analyse using the BH model as a reference.
     Then analyse using the extended model and illustrate near-aliasing.
     Finally analyse using a re-parameterised nonlinear discrete model:
       • compare to BH model
       • refine model for the LII data
  14. BH Models
     The BH models can be fitted using the glm function in R.
     Following the model building strategy of Blossfeld & Huinink (1991), we select
       • a cohort factor
       • a time-varying indicator of educational status (in/out)
     For the 1970-1974 cohort the conditional odds of marriage are 24% of those for the 1950-1954 cohort.
     For women in education the conditional odds of marriage are 11% of those for women not in education.
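
To see what an odds ratio of 24% means on the probability scale, a small worked example may help. The baseline hazard of 0.10 is a hypothetical value chosen for illustration; only the odds ratio 0.24 comes from the slide.

```r
baseline_hazard <- 0.10  # hypothetical hazard for the 1950-1954 cohort at some age
odds_ratio      <- 0.24  # from the slide: 1970-1974 versus 1950-1954 cohort

beta <- log(odds_ratio)  # corresponding coefficient on the logit scale

# Shift the baseline on the logit scale, then transform back to a probability.
hazard <- plogis(qlogis(baseline_hazard) + beta)
hazard_ratio <- hazard / baseline_hazard
```

Here the hazard falls from 0.10 to 2/77 (about 0.026), so the ratio of hazards is close to, but not exactly, the ratio of odds.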
  15. Selected BH Model
     [Figure: probability of marriage against age (years), ages 15 to 45, one curve per cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
     Deviance = 12073, residual d.f. = 31001
  16. Nonlinear Discrete-time Hazard Models
     The nonlinear discrete-time hazard model is an example of a generalised nonlinear model, which can be fitted using the gnm package in R (Turner and Firth, R News, 2007):
       • parameters are estimated by a modified IWLS algorithm
       • certain nonlinear terms are inbuilt, e.g. Mult, Exp
       • our terms cannot be expressed in terms of these functions, so we need to write a custom "nonlin" function
  17. Custom "nonlin" Function

        # Term of the form beta * log(distance of age from an estimated endpoint).
        # The endpoint is written as constraint -/+ exp(alpha), which keeps it
        # strictly outside the observed ages so that the log is always defined.
        LogExcess <- function(age, side = "left"){
            call <- sys.call()
            constraint <- ifelse(side == "left", min(age) - 1e-5,
                                 max(age) + 1e-5)
            list(predictors = list(beta = ~1, alpha = ~1),
                 variables = list(substitute(age)),
                 # Build the term as a string from the predictor and variable labels
                 term = function(predLabels, varLabels) {
                     paste(predLabels[1], " * log(",
                           " -"[side == "right"], varLabels[1], " + ",
                           " -"[side == "left"], constraint,
                           " + exp(", predLabels[2], "))")
                 },
                 call = as.expression(call))
        }
        class(LogExcess) <- "nonlin"
  18. Summary of Baseline Model

        Call:
        gnm(formula = marriages/lives ~ LogExcess(age, side = "left") +
            LogExcess(age, side = "right"), family = binomial, data = fulldata,
            weights = lives, start = c(-20, 3, 0, 3, 0))

        Deviance Residuals:
            Min       1Q   Median       3Q      Max
        -0.8098  -0.4441  -0.3224  -0.1528   4.0483

        Coefficients:
                                              Estimate Std. Error z value Pr(>|z|)
        (Intercept)                          -118.5395   201.6387  -0.588  0.55661
        LogExcess(age, side = "left")beta       3.6928     1.1913   3.100  0.00194
        LogExcess(age, side = "left")alpha     -0.1432     0.8935  -0.160  0.87267
        LogExcess(age, side = "right")beta     24.8623    38.5743   0.645  0.51923
        LogExcess(age, side = "right")alpha     4.0247     1.7376   2.316  0.02054

        Std. Error is NA where coefficient has been constrained or is unidentified

        Residual deviance: 12553 on 31004 degrees of freedom
        AIC: 12748

        Number of iterations: 76
  19. Parameter Correlations

                 c         βl        αl        βr        αr
        c     1.00000
        βl   -0.92563   1.00000
        αl   -0.80861   0.95844   1.00000
        βr   -0.99999   0.92688   0.80989   1.00000
        αr   -0.99833   0.90319   0.77910   0.99808   1.00000
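
A correlation matrix of this kind can be obtained from a fitted model by rescaling the estimated covariance matrix of the coefficients. The sketch below does this for a `glm` fit to made-up data with fixed endpoints (hypothetical parameter values again); the same two calls should also apply to a gnm fit, assuming its `vcov` method, but that is not shown in the slides.

```r
# Toy grouped data from a hypothetical BH baseline with fixed endpoints.
d <- data.frame(age = 16:44, lives = 1e5)
eta <- -16 + 2 * log(d$age - 15) + 3 * log(45 - d$age)
d$marriages <- round(d$lives * plogis(eta))

fit <- glm(cbind(marriages, lives - marriages) ~
             log(age - 15) + log(45 - age),
           family = binomial, data = d)

# Rescale the covariance matrix of the estimates to a correlation matrix.
param_cor <- cov2cor(vcov(fit))
print(round(param_cor, 3))
```

Off-diagonal entries close to 1 in absolute value, as in the table above, indicate that the data can barely distinguish changes in one parameter from compensating changes in another.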
  20. Example 'Recoil' Plot
     [Figure: probability of marriage against age, age axis 10 to 50.]
  21. Example 'Recoil' Plot
     [Figure: probability of marriage against age, age axis 10 to 50.]
  22. Example 'Recoil' Plot
     [Figure: probability of marriage against age, age axis 10 to 50, with plotted points added.]
  23. Is Near-aliasing a Problem?
     The extended model can still be used as a baseline hazard:
     \[ \mathrm{logit}(h(t \mid x_{it})) = l(\mathrm{age}_{it}) + x_{it}\beta \]
       • Near-aliasing will make models harder to fit, particularly with several covariates
       • Not all parameters are interpretable
  24. Re-parameterizing the Nonlinear Model
     The nonlinear hazard model can be re-parameterized as follows:
     \[ l(\mathrm{age}_{it}) = \gamma - \delta(\nu - \alpha_l)\log\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l} - \delta(\alpha_r - \nu)\log\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}_{it}} \]
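
One way to arrive at a parameterisation in terms of peak location and peak height is sketched below. It starts from the extended model with free endpoints and is offered as a plausible derivation, not necessarily the authors' own; it assumes $\delta > 0$ so that the stationary point is a maximum.

```latex
\begin{align*}
l(\mathrm{age}) &= c + \beta_l \log(\mathrm{age} - \alpha_l) + \beta_r \log(\alpha_r - \mathrm{age}) \\
l'(\nu) = 0 \;&\Longrightarrow\; \frac{\beta_l}{\nu - \alpha_l} = \frac{\beta_r}{\alpha_r - \nu} =: \delta
  \;\Longrightarrow\; \beta_l = \delta(\nu - \alpha_l), \quad \beta_r = \delta(\alpha_r - \nu) \\
\gamma := l(\nu) &= c + \delta(\nu - \alpha_l)\log(\nu - \alpha_l) + \delta(\alpha_r - \nu)\log(\alpha_r - \nu) \\
l(\mathrm{age}) &= \gamma - \delta(\nu - \alpha_l)\log\frac{\nu - \alpha_l}{\mathrm{age} - \alpha_l}
  - \delta(\alpha_r - \nu)\log\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}}
\end{align*}
```

Both log terms vanish at $\mathrm{age} = \nu$, so $\gamma$ is the value of the baseline at its peak.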
  25. Interpretation of Parameters
     The parameters of the new parameterisation have a more useful interpretation than before.
     [Figure: probability of marriage against age (years), annotated with the peak height expit(γ), the peak location ν and the endpoints αL and αR.]
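
The interpretation of γ and ν can be checked numerically. The following sketch codes the reparameterised baseline as a plain R function and evaluates it at hypothetical parameter values chosen only for illustration; `l_reparam()` and `baseline()` are names introduced here, not part of gnm or of the authors' code.

```r
# Reparameterised baseline on the logit scale (finite right endpoint).
l_reparam <- function(age, gamma, nu, delta, alpha_l, alpha_r) {
  gamma -
    delta * (nu - alpha_l) * log((nu - alpha_l) / (age - alpha_l)) -
    delta * (alpha_r - nu) * log((alpha_r - nu) / (alpha_r - age))
}

# Hypothetical parameter values, for illustration only.
p <- list(gamma = -2, nu = 25, delta = 0.3, alpha_l = 14, alpha_r = 60)
baseline <- function(age) do.call(l_reparam, c(list(age = age), p))

# Locate the maximum numerically: it should sit at nu with height expit(gamma).
peak <- optimize(baseline, interval = c(15, 45), maximum = TRUE)
peak_location <- peak$maximum
peak_height   <- plogis(peak$objective)

# The same curve written in the (c, beta_l, beta_r) form with free endpoints.
beta_l <- p$delta * (p$nu - p$alpha_l)
beta_r <- p$delta * (p$alpha_r - p$nu)
c0 <- p$gamma - beta_l * log(p$nu - p$alpha_l) - beta_r * log(p$alpha_r - p$nu)
ages <- 15:45
same_curve <- all.equal(
  baseline(ages),
  c0 + beta_l * log(ages - p$alpha_l) + beta_r * log(p$alpha_r - ages)
)
```

The two forms describe the same family of curves; only the coordinates used to index a curve change, which is what alters the correlations between the estimates.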
  26. New Parameter Correlations

                 γ         ν         δ         αl        αr
        γ     1.00000
        ν     0.12956   1.00000
        δ     0.21943  -0.69849   1.00000
        αl    0.27236  -0.42848   0.91425   1.00000
        αr    0.03231  -0.75428   0.93696   0.77910   1.00000

     Table: Correlations between the estimated parameters of the reparameterized baseline model.
  27. Recoil Plots for Reparameterised Model
     [Figure: five recoil-plot panels of probability of marriage against age (10 to 50), each showing the original model, the perturbed model and the re-fitted model. Panel titles and the parameter changes shown:]
       • peak height (γ): −2.09 → −1.95
       • peak location (ν): 25.39 → 28
       • fall off (δ): 0.34 → 0.15
       • left endpoint (αL): 14.17 → 15.04
       • right endpoint (αR): 100.66 → 47.68
  28. Analysis with the Reparameterised Model
     We can now repeat the previous analysis using the nonlinear baseline hazard instead of the BH hazard function.
       • The model selection is qualitatively unchanged
       • The residual deviance is reduced by about 20 at the expense of 2 d.f.
       • There is a lot of uncertainty about the right end-point: in the final model it is estimated as 400 years with a large standard error
  29. Infinite Right End-point
     It seems more appropriate to define the baseline hazard so that the right end-point tends to infinity:
     \[ l(\mathrm{age}_{it}) = \gamma - \delta(\nu - \alpha_l)\log\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l} - \delta(\mathrm{age}_{it} - \nu) \]
     Re-fitting the final model with this baseline increases the deviance by a negligible amount.
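
The linear term in age arises as the limit of the right-hand log term of the finite-endpoint baseline. A brief sketch of that limit, using $\log(1 - u) \approx -u$ for small $u$:

```latex
\begin{align*}
(\alpha_r - \nu)\log\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}}
  &= -(\alpha_r - \nu)\log\!\left(1 - \frac{\mathrm{age} - \nu}{\alpha_r - \nu}\right) \\
  &\longrightarrow \mathrm{age} - \nu \qquad \text{as } \alpha_r \to \infty
\end{align*}
```

The peak stays at $\nu$ with height $\mathrm{expit}(\gamma)$, and one parameter fewer has to be estimated.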
  30. Comparing Models
     [Figure: two panels of probability of marriage against age (years), ages 15 to 45, each with one curve per cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
       • First panel: Deviance = 12073, residual d.f. = 31001
       • Second panel: Deviance = 12051, residual d.f. = 31000
  31. Refining the Model
     The model building strategy so far has been similar to Blossfeld and Huinink (1991), for comparison.
     Careful consideration of the fit of the model suggests that improvements can be made.
  32. Final Model with New Baseline
     [Figure: probability of marriage against age (years), ages 15 to 45, one curve per cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
     Deviance = 12051, residual d.f. = 31000
  33. Cohort Effect
     We can investigate the cohort effect further by replacing the cohort factor by a year-of-birth factor and plotting the resultant effects.
     [Figure: estimated year-of-birth effect against year of birth.]
  34. Year-of-birth Effect
     The plot suggests a more appropriate model:
     \[ \theta \exp(\lambda(\mathrm{yrb}_i - 1950)) \]
     Replacing the year-of-birth factor with this nonlinear term reduces the deviance by 19 whilst gaining 2 d.f.
  35. Checking the Fit
     The new year-of-birth term takes account of the effect of this factor on the magnitude of the hazard.
     To check for other effects on the hazard, we can group the data by year of age and cohort, then plot the corresponding observed and fitted proportions.
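
The grouping step might look like the following in base R. The pseudo-observation data frame `pp`, its tiny size and the simple `glm` used to produce fitted values are all hypothetical stand-ins for the real data and the nonlinear model.

```r
# Toy pseudo-observations (hypothetical): one row per woman-year at risk.
pp <- data.frame(
  age    = c(20, 20, 20, 21, 21, 21, 20, 20, 21, 21),
  cohort = rep(c("A", "B"), times = c(6, 4)),
  event  = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 0)
)

# Stand-in model; in the application this would be the fitted hazard model.
fit <- glm(event ~ age + cohort, family = binomial, data = pp)
pp$fitted <- fitted(fit)

# Observed and fitted proportions married, grouped by cohort and year of age.
obs_prop <- tapply(pp$event,  list(pp$cohort, pp$age), mean)
fit_prop <- tapply(pp$fitted, list(pp$cohort, pp$age), mean)
```

Plotting `obs_prop` as points and `fit_prop` as lines, one panel per cohort, gives a display of the kind described on the following slides.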
  36. Fit over Cohorts
     [Figure: proportion married against age (years), ages 15 to 45, one panel per cohort. Panel titles, with the number shown in brackets under each title:]
       • (1949, 1954] (5211)
       • (1955, 1959) (6283)
       • (1959, 1964] (6560)
       • (1965, 1969] (6289)
       • (1969, 1974] (6666)
  37. Fit over Education Levels
     [Figure: proportion married against age (years), ages 15 to 45, one panel per education level, showing observed proportions, Model 13 (common peak) and Model 14 (separate peaks). Panel titles, with the number shown in brackets under each title:]
       • No attainment/primary (2366)
       • Lower secondary (7900)
       • Upper secondary (11507)
       • College (4829)
       • University (4407)
  38. Linear Dependence of Peak Location
     Quantifying the education level by a dynamic measure of years in education, ed, we incorporate a linear dependence of peak location on ed:
     \[ l(x_{it}) = \gamma - \delta(\nu_0 + \nu_1 \mathrm{ed}_i - \alpha_l)\log\frac{\nu_0 + \nu_1 \mathrm{ed}_i - \alpha_l}{\mathrm{age}_{it} - \alpha_l} - \delta\{\mathrm{age}_{it} - \nu_0 - \nu_1 \mathrm{ed}_i\} \]
     This results in a non-proportional hazards model.
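
A small numerical sketch shows how letting the peak location depend on education moves the peak and breaks proportionality. The function `l_ed()`, the helper `peak_for()` and all parameter values are hypothetical and chosen only for illustration; the form assumes the infinite right end-point baseline with $\nu = \nu_0 + \nu_1\,\mathrm{ed}$.

```r
# Baseline with infinite right end-point and education-dependent peak location.
l_ed <- function(age, ed, gamma, delta, nu0, nu1, alpha_l) {
  nu <- nu0 + nu1 * ed  # peak location shifts linearly with years in education
  gamma -
    delta * (nu - alpha_l) * log((nu - alpha_l) / (age - alpha_l)) -
    delta * (age - nu)
}

# Hypothetical parameter values, for illustration only.
p <- list(gamma = -2, delta = 0.3, nu0 = 18, nu1 = 0.5, alpha_l = 15)

# Numerically locate the peak for a given number of years in education.
peak_for <- function(ed) {
  optimize(function(a) do.call(l_ed, c(list(age = a, ed = ed), p)),
           interval = c(15.5, 45), maximum = TRUE)$maximum
}
peaks <- sapply(c(8, 12, 16), peak_for)

# Non-proportionality: the logit difference between two education levels
# changes with age instead of being a constant shift.
ages <- 16:45
gap <- do.call(l_ed, c(list(age = ages, ed = 16), p)) -
       do.call(l_ed, c(list(age = ages, ed = 8), p))
```

In a proportional odds model `gap` would be the same at every age; here it varies because the covariate changes the shape of the curve, not just its level.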
  39. Years Post-Education
     Checking the fit against years post-education:
     [Figure: proportion married against years post-education (−10 to 30), annotated as follows:]
       • lower rate of increase in first 3 years post-education
       • sharp change at 7 years post-education
       • outlying points
  40. Early Career Effect
     The lower rate of increase during the first 3 years post-education may be explained by an early career effect.
     This can be incorporated in the model by including an appropriate indicator variable, which significantly reduces the deviance.
     The deviance does not significantly increase when the left endpoint is constrained to 15 years.
  41. Effect of Education
     Peak location varies from 20.78 years (primary education) to 26.89 years (university graduates).
     [Figure: probability of marriage against age (years), age axis 10 to 50, one curve per education level: Primary, Lower sec., Upper sec., PLC, IT, University.]
  42. Effect of Year-of-birth
     Peak hazard varies from 0.17 (b. 1950) through 0.15 (b. 1960) to 0.07 (b. 1970).
     [Figure: probability of marriage against age (years), age axis 10 to 50, one curve per year of birth: 1950, 1960, 1970.]
  43. Summary
     Estimating the support of the hazard function improves fit.
     Near-aliasing can occur in nonlinear models, but can be overcome by re-parameterisation.
     Our proposed model has more interpretable parameters, particularly the location and magnitude of the maximum hazard:
       • can investigate the effect of covariates on these features
     The parametric form does impose some restrictions on the shape of the hazard curve.
  44. References
     A comprehensive manual is distributed with the package at http://www.cran.r-project.org/package=gnm
     A working paper on the marriage application is available at www.warwick.ac.uk/go/crism/research/2007
  45. Acknowledgements
     The data are from The Economic and Social Research Institute Living in Ireland Survey Microdata File (©Economic and Social Research Institute).
     We gratefully acknowledge Carmel Hannan for introducing us to this application and providing background on the data.
