Nonlinear Discrete-time Hazard Models for Entry into Marriage
Heather Turner, Andy Batchelor, David Firth
Department of Statistics
University of Warwick, UK
8th March 2010
Motivating Application: The LII Survey
The Living in Ireland Surveys were conducted 1994-2001
For five 5-year cohorts of women born between 1950 and
1975 we have the following data
year of (first) marriage
year and month of birth
social class
highest level of education attained
year highest level of education was attained
When do women get married?
We can use methods from survival analysis to model the
timing of marriage
Taking time to start at the legal age of marriage, the
survival time T is the time until a person marries
The time of marriage is recorded to the nearest year, so
we will use a discrete-time analysis
Discrete-time Hazard Models
In discrete time, the hazard of marriage occurring at time
t is defined as
h(t) = P(T = t | T ≥ t)
We are interested in the shape of the hazard over the life
course and how the hazard is affected by covariates
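As a minimal sketch of this definition, the hazard at each age can be estimated from a life table as the number marrying divided by the number still at risk. The counts below are invented purely for illustration (they are not LII data) and assume no censoring.

```r
# Hypothetical life table: women still unmarried at the start of each year
# of age, and the number marrying during that year (invented counts).
life_table <- data.frame(
  age     = 15:20,
  at_risk = c(1000, 998, 990, 970, 930, 870),
  married = c(2, 8, 20, 40, 60, 70)
)

# Discrete-time hazard h(t) = P(T = t | T >= t): events / number at risk
life_table$hazard <- life_table$married / life_table$at_risk

# Survivor function: probability of still being unmarried after age t,
# S(t) = product over s <= t of (1 - h(s))
life_table$survival <- cumprod(1 - life_table$hazard)

print(life_table)
```

The survivor column shows how the hazards accumulate over the life course: it is the running product of the probabilities of not marrying at each age.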
Cox Proportional Odds Model
A popular choice is the proportional odds model proposed
by Cox (JRSSB, 1972):
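\[ \frac{h(t \mid x_{it})}{1 - h(t \mid x_{it})} = \frac{h_0(t)}{1 - h_0(t)} \exp(x_{it}\beta) \]
where h_0(t) is the baseline hazard
Taking logs we obtain
\[ \operatorname{logit}(h(t \mid x_{it})) = \operatorname{logit}(h_0(t)) + x_{it}\beta = l_t + x_{it}\beta \]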
semi-parametric - makes no assumption about the shape
of the hazard function
Episode-splitting
A simple way to estimate the proportional odds model is
to generate an event history for each observation
Pseudo observations are created at each time point from
time 0 up to marriage or censoring - this is known as
episode-splitting
The parameters in the proportional odds model can then
be estimated by fitting a logistic regression model to a
binary indicator of marriage at each time point (married
= 1, unmarried = 0)
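The episode-splitting step can be sketched in base R as follows. The data are simulated, and the names `persons`, `person_period`, `x`, `base_logit` and `beta` are hypothetical, introduced only for this illustration.

```r
set.seed(1)
n <- 500
x <- rbinom(n, 1, 0.5)                         # hypothetical binary covariate
max_time <- 6                                  # everyone is censored after 6 years
base_logit <- c(-3, -2.5, -2, -1.5, -1.5, -2)  # hypothetical baseline l_t
beta <- -0.7                                   # hypothetical covariate effect

# Simulate one record per person: duration and whether it ended in marriage
duration <- integer(n)
married <- integer(n)
for (i in seq_len(n)) {
  h <- plogis(base_logit + beta * x[i])        # hazard at each time point
  event_time <- which(rbinom(max_time, 1, h) == 1)[1]
  married[i] <- as.integer(!is.na(event_time))
  duration[i] <- if (is.na(event_time)) max_time else event_time
}
persons <- data.frame(id = seq_len(n), x, duration, married)

# Episode-splitting: one pseudo-observation per person per time point at risk
person_period <- persons[rep(seq_len(n), persons$duration), c("id", "x")]
person_period$time <- sequence(persons$duration)
# The indicator is 1 only at the final time point of a spell ending in marriage
person_period$event <- as.integer(
  person_period$time == rep(persons$duration, persons$duration) &
    rep(persons$married, persons$duration) == 1
)

# Logistic regression with a separate intercept l_t for each time point
fit <- glm(event ~ 0 + factor(time) + x, family = binomial, data = person_period)
print(coef(fit))
```

Using `0 + factor(time)` gives one free intercept per time point, which is what keeps the baseline hazard unrestricted in this sketch.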
Cox Proportional Odds Model
[Figure: probability of marriage plotted against age (years), ages 15 to 43]
Sidenote: interval-censored data
A similar model can be obtained by assuming that the
data are interval-censored observations of a
continuous-time proportional hazards model
The coefficients in the model
\[ \operatorname{cloglog}(h(t \mid x_{it})) = l_t + x_{it}\beta \]
are then the coefficients of the proportional hazards model
This relationship breaks down, however, if l_t is replaced by
a parametric function
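A short derivation of why the complementary log-log link appears, given as a sketch. The notation is assumed here rather than taken from the slides: interval t covers (a_{t-1}, a_t], \Lambda_0 is the cumulative baseline hazard of the continuous-time model, and x_{it} is constant within interval t.

```latex
\begin{align*}
1 - h(t \mid x_{it})
  &= P(T > a_t \mid T > a_{t-1}, x_{it})
   = \exp\left\{-\left[\Lambda_0(a_t) - \Lambda_0(a_{t-1})\right]\exp(x_{it}\beta)\right\} \\
\log\left\{-\log\left(1 - h(t \mid x_{it})\right)\right\}
  &= \log\left[\Lambda_0(a_t) - \Lambda_0(a_{t-1})\right] + x_{it}\beta
   = l_t + x_{it}\beta
\end{align*}
```

Under these assumptions l_t is the log of the baseline hazard integrated over interval t, so \beta is unchanged by the grouping into intervals.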
Blossfeld and Huinink Model
Blossfeld and Huinink (Am. J. Sociol., 1991) propose the
following parametric baseline
\[ \operatorname{logit}(h_0(t \mid \mathrm{age}_{it})) = l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - 15) + \beta_r \log(45 - \mathrm{age}_{it}) \]
describes the nature of the time dependence
fixes the support of the hazard to be 15 to 45 years
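To make the shape of this baseline concrete, here is a small sketch that evaluates it and locates its maximum. The parameter values are hypothetical, chosen only to give a plausible-looking curve, and the peak formula assumes both slope parameters are positive.

```r
# Blossfeld-Huinink baseline on the logit scale; only defined for ages
# strictly between 15 and 45.
bh_logit <- function(age, c, beta_l, beta_r) {
  c + beta_l * log(age - 15) + beta_r * log(45 - age)
}

# Hypothetical parameter values (not estimates from the LII data)
c0 <- -20
beta_l <- 2
beta_r <- 4

age <- seq(15.5, 44.5, by = 0.5)
hazard <- plogis(bh_logit(age, c0, beta_l, beta_r))

# Setting the derivative to zero, beta_l / (age - 15) = beta_r / (45 - age),
# gives the age of maximum hazard in closed form.
peak_age <- 15 + 30 * beta_l / (beta_l + beta_r)

print(peak_age)
print(age[which.max(hazard)])
```

The peak location depends only on the ratio of the two slope parameters and on the fixed endpoints, which is one reason the choice of 15 and 45 matters.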
BH Model
[Figure: probability of marriage plotted against age (years), ages 10 to 50, with plotted points]
Effect of Endpoints
[Figure: probability of marriage against age (years), ages 10 to 50, comparing hazard support of 15−45 years with 12−75 years]
Nonlinear Discrete-time Hazard Model
An obvious extension of the BH model is to treat the
endpoints as parameters
\[ l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - \alpha_l) + \beta_r \log(\alpha_r - \mathrm{age}_{it}) \]
nonlinear - need to extend available software
near-aliasing between parameters - need to
reparameterise
Developing the Nonlinear Model
First analyse using the BH model as a reference
Then analyse using the extended model and illustrate
near-aliasing
Finally analyse using a re-parameterised nonlinear discrete
model
compare to BH model
refine model for the LII data
BH Models
The BH models can be fitted using the glm function in R.
Following the model building strategy of Blossfeld &
Huinink (1991), we select
a cohort factor
a time-varying indicator of educational status (in/out)
For the 1970-1974 cohort the conditional odds of
marriage are 24% of those for the 1950-1954 cohort
For women in education the conditional odds of marriage
are 11% of those for women not in education
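A sketch of how such a BH model might be fitted with glm, using simulated aggregate data. The data frame `grid`, the variable names and the true coefficients are all hypothetical; the coefficients were picked so that the simulated odds ratios are of a similar size to those quoted above.

```r
set.seed(2)
# Hypothetical aggregate data: one row per age x cohort x education status
grid <- expand.grid(
  age    = 16:44,
  cohort = factor(c("1950-54", "1970-74")),
  in_ed  = c(0, 1)
)
grid$lives <- 400   # invented number of women at risk in each cell

# Invented "true" model used only to simulate marriages
true_logit <- -20 + 2 * log(grid$age - 15) + 4 * log(45 - grid$age) -
  1.4 * (grid$cohort == "1970-74") - 2.2 * grid$in_ed
grid$marriages <- rbinom(nrow(grid), grid$lives, plogis(true_logit))

# BH baseline plus a cohort factor and an in-education indicator
fit <- glm(marriages / lives ~ log(age - 15) + log(45 - age) + cohort + in_ed,
           family = binomial, weights = lives, data = grid)

# exp(coefficient) is the multiplicative effect on the conditional odds
odds_ratios <- exp(coef(fit)[c("cohort1970-74", "in_ed")])
print(round(odds_ratios, 2))
```

Because the endpoints 15 and 45 are fixed, both log terms are ordinary covariates and the model is linear in its parameters, which is why glm suffices here.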
Selected BH Model
[Figure: probability of marriage against age (years), 15 to 45, one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974]]
Deviance = 12073 Residual d.f. = 31001
Nonlinear Discrete-time Hazard Models
The nonlinear discrete-time hazard model is an example of
a generalised nonlinear model, which can be fitted using
the gnm package in R (Turner and Firth, R News, 2007)
parameters estimated by a modified IWLS algorithm
certain nonlinear terms inbuilt e.g. Mult, Exp
our terms cannot be expressed in terms of these
functions, so we need to write a custom "nonlin" function
Custom "nonlin" Function
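LogExcess <- function(age, side = "left"){
    call <- sys.call()
    # Bound just outside the observed ages, so the log argument stays positive
    constraint <- ifelse(side == "left",
                         min(age) - 1e-5, max(age) + 1e-5)
    # Two parameters: multiplier beta, and alpha, which places the endpoint at
    # constraint - exp(alpha) (left) or constraint + exp(alpha) (right)
    list(predictors = list(beta = ~1, alpha = ~1),
         variables = list(substitute(age)),
         # The term is returned as a character string (a deparsed expression)
         term = function(predLabels, varLabels) {
             paste(predLabels[1], " * log(",
                   " -"[side == "right"], varLabels[1], " + ",
                   " -"[side == "left"], constraint,
                   " + exp(", predLabels[2], "))")
         },
         call = as.expression(call))
}
class(LogExcess) <- "nonlin"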
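To see what the term component produces, the sketch below copies it into a stand-alone helper so the generated expression can be inspected without loading gnm. The helper `make_term`, the fixed constraint values 15 and 45, and the labels "beta", "alpha" and "age" are hypothetical stand-ins for what gnm would supply.

```r
# Stand-alone copy of the 'term' component of LogExcess; 'constraint' is
# passed in directly here instead of being computed from the data.
make_term <- function(side, constraint) {
  function(predLabels, varLabels) {
    paste(predLabels[1], " * log(",
          " -"[side == "right"], varLabels[1], " + ",
          " -"[side == "left"], constraint,
          " + exp(", predLabels[2], "))")
  }
}

# Hypothetical labels for the two predictors and the variable
left_term  <- make_term("left", 15)(c("beta", "alpha"), "age")
right_term <- make_term("right", 45)(c("beta", "alpha"), "age")

print(left_term)
print(right_term)

# Both strings parse as R expressions in beta, alpha and age
left_value  <- eval(parse(text = left_term),  list(beta = 1, alpha = 0, age = 20))
right_value <- eval(parse(text = right_term), list(beta = 1, alpha = 0, age = 20))
```

On the left side the expression is beta * log(age - constraint + exp(alpha)), so the implied endpoint is constraint - exp(alpha); on the right side it is constraint + exp(alpha). Writing the endpoints this way keeps them outside the observed age range for any real value of alpha.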
Summary of Baseline Model
Call:
gnm(formula = marriages/lives ~ LogExcess(age, side = "left") +
    LogExcess(age, side = "right"), family = binomial, data = fulldata,
    weights = lives, start = c(-20, 3, 0, 3, 0))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8098  -0.4441  -0.3224  -0.1528   4.0483

Coefficients:
                                      Estimate Std. Error z value Pr(>|z|)
(Intercept)                          -118.5395   201.6387  -0.588  0.55661
LogExcess(age, side = "left")beta       3.6928     1.1913   3.100  0.00194
LogExcess(age, side = "left")alpha     -0.1432     0.8935  -0.160  0.87267
LogExcess(age, side = "right")beta     24.8623    38.5743   0.645  0.51923
LogExcess(age, side = "right")alpha     4.0247     1.7376   2.316  0.02054

Std. Error is NA where coefficient has been constrained or is unidentified

Residual deviance: 12553 on 31004 degrees of freedom
AIC: 12748

Number of iterations: 76
Parameter Correlations
         c         βl        αl        βr        αr
c    1.00000
βl  -0.92563   1.00000
αl  -0.80861   0.95844   1.00000
βr  -0.99999   0.92688   0.80989   1.00000
αr  -0.99833   0.90319   0.77910   0.99808   1.00000
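One heuristic way to see where correlations this close to ±1 can come from (a sketch, not taken from the slides): when the right endpoint lies well beyond the observed ages, the right-hand log term is nearly linear in age. Here \bar{a} denotes a central age in the observed range, a symbol introduced only for this illustration.

```latex
\begin{align*}
\log(\alpha_r - \mathrm{age})
  &= \log(\alpha_r - \bar{a}) + \log\left(1 - \frac{\mathrm{age} - \bar{a}}{\alpha_r - \bar{a}}\right)
   \approx \log(\alpha_r - \bar{a}) - \frac{\mathrm{age} - \bar{a}}{\alpha_r - \bar{a}} \\
c + \beta_r \log(\alpha_r - \mathrm{age})
  &\approx \left\{c + \beta_r \log(\alpha_r - \bar{a})\right\}
   - \frac{\beta_r}{\alpha_r - \bar{a}}\,(\mathrm{age} - \bar{a})
\end{align*}
```

To this order the data mainly determine the constant in braces and the slope \beta_r / (\alpha_r - \bar{a}), so raising \beta_r and \alpha_r together while lowering c leaves the fit almost unchanged. That pattern matches the signs of the correlations among c, \beta_r and \alpha_r in the table.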
Example 'Recoil' Plot
[Figure, built up over three slides: probability of marriage against age, 10 to 50; the final build adds plotted points]
Is Near-aliasing a Problem?
Extended model can still be used as baseline hazard
\[ \operatorname{logit}(h(t \mid x_{it})) = l(\mathrm{age}_{it}) + x_{it}\beta \]
Near-aliasing will make models harder to fit - particularly
with several covariates
Not all parameters are interpretable
Re-parameterizing the Nonlinear Model
The nonlinear hazard model can be re-parameterized as
follows:
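\[ l(\mathrm{age}_{it}) = \gamma - \delta(\nu - \alpha_l) \log\left(\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l}\right) - \delta(\alpha_r - \nu) \log\left(\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}_{it}}\right) \]
This corresponds to β_l = δ(ν − α_l) and β_r = δ(α_r − ν) in the
extended BH model, with l(ν) = γ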
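The sketch below checks numerically that a re-parameterised baseline of this kind is the same curve as the extended BH baseline c + β_l log(age − α_l) + β_r log(α_r − age), taking β_l = δ(ν − α_l) and β_r = δ(α_r − ν), with both log terms entering the re-parameterised form with a minus sign so that the slopes are positive. The function names are hypothetical and the parameter values are illustrative rounded numbers, not a definitive fit.

```r
# Extended BH parameterisation on the logit scale
l_original <- function(age, c, beta_l, alpha_l, beta_r, alpha_r) {
  c + beta_l * log(age - alpha_l) + beta_r * log(alpha_r - age)
}

# Re-parameterised form: gamma = logit of the peak hazard, nu = age at the
# peak, delta = fall-off, alpha_l and alpha_r = endpoints of the support
l_reparam <- function(age, gamma, nu, delta, alpha_l, alpha_r) {
  gamma -
    delta * (nu - alpha_l) * log((nu - alpha_l) / (age - alpha_l)) -
    delta * (alpha_r - nu) * log((alpha_r - nu) / (alpha_r - age))
}

# Illustrative values only
gamma <- -2.09
nu <- 25.39
delta <- 0.34
alpha_l <- 14.17
alpha_r <- 100.66

# Implied parameters of the extended BH form
beta_l <- delta * (nu - alpha_l)
beta_r <- delta * (alpha_r - nu)
c0 <- gamma - beta_l * log(nu - alpha_l) - beta_r * log(alpha_r - nu)

age <- 16:44
new_curve <- l_reparam(age, gamma, nu, delta, alpha_l, alpha_r)
old_curve <- l_original(age, c0, beta_l, alpha_l, beta_r, alpha_r)

print(max(abs(new_curve - old_curve)))   # difference between the two forms
print(plogis(gamma))                     # hazard at the peak age nu
```

Both log terms vanish at age = ν, so γ is the logit of the maximum hazard and ν is its location, which is what makes these two parameters directly interpretable.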
Interpretation of Parameters
The parameters of the new parameterisation have a more
useful interpretation than before:
[Figure: schematic hazard curve of probability of marriage against age (years), with expit(γ) marked on the vertical axis and αl, ν and αr marked on the age axis]
New Parameter Correlations
         γ         ν         δ         αl        αr
γ    1.00000
ν    0.12956   1.00000
δ    0.21943  -0.69849   1.00000
αl   0.27236  -0.42848   0.91425   1.00000
αr   0.03231  -0.75428   0.93696   0.77910   1.00000
Table: Correlations between the estimated parameters of the
reparameterised baseline model
Recoil Plots for Reparameterised Model
[Figure: five panels, one per parameter, each plotting probability of marriage against age (10 to 50 years) for the original model, the perturbed model and the re-fitted model (points)]
Panel titles, with the values as printed:
peak height (γ): −2.09 → −1.95
peak location (ν): 25.39 → 28
fall off (δ): 0.34 → 0.15
left endpoint (αl): 14.17 → 15.04
right endpoint (αr): 100.66 → 47.68
Analysis with the Reparameterised Model
We can now repeat the previous analysis using the
nonlinear baseline hazard instead of the BH hazard
function
The model selection is qualitatively unchanged
The residual deviance is reduced by about 20 at the
expense of 2 d.f.
There is a lot of uncertainty about the right end-point -
in the final model it is estimated as 400 years with a
large standard error.
Infinite Right End-point
It seems more appropriate to define the baseline hazard in
which the right end-point tends to infinity:
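\[ l(\mathrm{age}_{it}) = \gamma - \delta(\nu - \alpha_l) \log\left(\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l}\right) - \delta(\mathrm{age}_{it} - \nu) \]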
Re-fitting the final model with this baseline increases the
deviance by a negligible amount
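A sketch of the limiting argument, assuming the right-hand term of the finite-endpoint baseline enters as −δ(α_r − ν) log{(α_r − ν)/(α_r − age_it)}, that is, as +β_r log(α_r − age_it) with β_r = δ(α_r − ν) > 0. The symbol u is introduced here only for the derivation.

```latex
\begin{align*}
-\delta(\alpha_r - \nu)\log\left(\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}_{it}}\right)
  &= \delta(\alpha_r - \nu)\log(1 - u),
  \qquad u = \frac{\mathrm{age}_{it} - \nu}{\alpha_r - \nu} \\
  &= \delta(\alpha_r - \nu)\left\{-u + O(u^2)\right\}
  \;\longrightarrow\; -\delta(\mathrm{age}_{it} - \nu)
  \quad \text{as } \alpha_r \to \infty
\end{align*}
```

In the limit the right-hand side of the baseline becomes linear in age on the logit scale, with slope −δ, while the peak location ν and peak height γ keep their interpretation.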
Comparing Models
[Figure: two panels, each showing probability of marriage against age (years), 15 to 45, with one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974]]
Left panel: Deviance = 12073, Residual d.f. = 31001
Right panel: Deviance = 12051, Residual d.f. = 31000
Refining the Model
The model building strategy so far has been similar to
Blossfeld and Huinink (1991) for comparison
Careful consideration of the fit of the model suggests that
improvements can be made
Final Model with New Baseline
[Figure: probability of marriage against age (years), 15 to 45, one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974]]
Deviance = 12051 Residual d.f. = 31000
Cohort Effect
We can investigate the cohort effect further by replacing
the cohort factor by a year-of-birth factor and plotting the
resultant effects
[Figure: year-of-birth effect (vertical axis from about −2.5 to 0.0) plotted against year of birth (1955 to 1970)]
Year-of-birth Effect
The plot suggests a more appropriate model
\[ \theta \exp(\lambda(\mathrm{yrb}_i - 1950)) \]
Replacing the year-of-birth factor with this nonlinear term
reduces the deviance by 19 whilst gaining 2 d.f.
Checking the Fit
The new year-of-birth term takes account of the effect of
this factor on the magnitude of the hazard
To check for other effects on the hazard, we can group
the data by year of age and cohort then plot the
corresponding observed and fitted proportions
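A minimal sketch of this grouping check, using simulated person-period data and a deliberately simple logistic model. The data frame `pp`, its columns and the model are hypothetical; they stand in for the fitted hazard model and are not the LII analysis.

```r
set.seed(3)
# Simulated person-period data (hypothetical; not the LII data)
pp <- expand.grid(id = 1:400, age = 16:30)
pp$cohort <- factor(ifelse(pp$id <= 200, "1950-54", "1970-74"))
true_logit <- -4 + 0.15 * (pp$age - 16) - 0.8 * (pp$cohort == "1970-74")
pp$event <- rbinom(nrow(pp), 1, plogis(true_logit))

# Keep each woman only up to and including her first event (the risk set)
first_event <- tapply(ifelse(pp$event == 1, pp$age, Inf), pp$id, min)
pp <- pp[pp$age <= first_event[as.character(pp$id)], ]

# Stand-in model; fitted values are the estimated hazards
fit <- glm(event ~ age + cohort, family = binomial, data = pp)
pp$fitted <- fitted(fit)

# Observed and fitted proportions by year of age and cohort
check <- aggregate(cbind(event, fitted) ~ age + cohort, data = pp, FUN = mean)
names(check)[3:4] <- c("observed", "fitted")

print(head(check))
```

Plotting `observed` and `fitted` against `age` separately for each cohort then shows where the model misses systematic structure, such as a shift in peak location between groups.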
Linear Dependence of Peak Location
Quantifying the education level by a dynamic measure of
years in education ed, we incorporate a linear dependence
of peak location on ed:
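\[ l(x_{it}) = \gamma - \delta(\nu_0 + \nu_1 \mathrm{ed}_i - \alpha_l) \log\left(\frac{\nu_0 + \nu_1 \mathrm{ed}_i - \alpha_l}{\mathrm{age}_{it} - \alpha_l}\right) - \delta\{\mathrm{age}_{it} - (\nu_0 + \nu_1 \mathrm{ed}_i)\} \]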
This results in a non-proportional hazards model
Years Post-Education
Checking the fit against years post-education:
[Figure: proportion married plotted against years post-education (−10 to 30), with the following annotations]
lower rate of increase in first 3 years post-education
sharp change at 7 years post-education
outlying points
Early Career Effect
The lower rate of increase during the first 3 years
post-education may be explained by an early career effect
This can be incorporated in the model by including an
appropriate indicator variable, significantly reducing the
deviance
The deviance does not significantly increase when the left
endpoint is constrained to 15 years
Effect of Education
Peak location varies from 20.78 years (primary education)
to 26.89 years (university graduates)
[Figure: probability of marriage against age (years), 10 to 50, one curve per education level: Primary, Lower sec., Upper sec., PLC, IT, University]
Effect of Year-of-birth
Peak hazard varies from 0.17 (b. 1950) through 0.15 (b.
1960) to 0.07 (b. 1970)
[Figure: probability of marriage against age (years), 10 to 50, one curve per year of birth: 1950, 1960, 1970]
Summary
Estimating the support of the hazard function improves fit
Near-aliasing can occur in nonlinear models, but can be
overcome by re-parameterisation
Our proposed model has more interpretable parameters,
particularly location and magnitude of the maximum
hazard
can investigate effect of covariates on these features
The parametric form does impose some restrictions on
the shape of the hazard curve
References
A comprehensive manual is distributed with the package
at http://www.cran.r-project.org/package=gnm
A working paper on the marriage application is available
at www.warwick.ac.uk/go/crism/research/2007