Thesis Defense

EMPIRICAL LIKELIHOOD INFERENCE FOR THE ACCELERATED FAILURE TIME MODEL USING KENDALL ESTIMATING EQUATION

Transcript

  • 1. Georgia State University
    EMPIRICAL LIKELIHOOD INFERENCE FOR THE ACCELERATED FAILURE TIME MODEL USING KENDALL ESTIMATING EQUATION
    By Yinghua Lu
    June 29th 2009
  • 2. Contents
    • Introduction
    • Main Procedure
    • Simulation Study
    • Real Application
    • Conclusion
  • Introduction – AFT Model
    Accelerated Failure Time (AFT) Model:
    • Very popular.
    • Similar to classic linear regression: $Y_i = \beta^{T} Z_i + \varepsilon_i$, where Y = ln(T).
    Different estimation methods have been developed:
    • OLS
    • Non-monotone estimating equations
    • Monotone estimating equations with normal approximation
  • Introduction – Kendall’s Tau
    Let {X1,Y1} and {X2, Y2} be two observations of two variables.
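    Kendall’s tau coefficient is defined as: $\tau = (n_c - n_d) / [n(n-1)/2]$,
    where n_c is the number of concordant pairs [sign(X1-X2) = sign(Y1-Y2)], n_d is the number of discordant pairs [sign(X1-X2) = -sign(Y1-Y2)], and n is the number of observations.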
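    Sen (1968) proposed the estimating function $U(b) = \sum_{i<j} \mathrm{sign}(X_i - X_j)\,\mathrm{sign}(\varepsilon_i(b) - \varepsilon_j(b))$,
    where ε(b) = Y - bX is the residual at slope b.
    U(b) is non-increasing in b.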
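The monotonicity claim can be checked numerically. The sketch below is illustrative only: it assumes uncensored data, uses the tau-a form of Kendall's tau (no tie correction), and takes U(b) to be the sum over pairs of sign(Xi - Xj) * sign(εi(b) - εj(b)). The function names `kendall_tau` and `sen_u` are hypothetical, not from the thesis.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n choose 2)."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)


def sen_u(b, x, y):
    """Assumed form of U(b): concordance between X and the residuals Y - bX."""
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    return sum(sign(x[i] - x[j]) * sign(resid[i] - resid[j])
               for i, j in combinations(range(len(x)), 2))


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
tau = kendall_tau(x, y)
# U(b) evaluated on an increasing grid of slopes b = -2.0, -1.5, ..., 3.0
u_values = [sen_u(b / 2, x, y) for b in range(-4, 7)]
```

Because U(b) is a step function that only moves downward as b grows, a root (or sign change) can be located by bisection, which is the practical appeal of a monotone estimating equation.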
  • Introduction – Empirical Likelihood
    • A nonparametric method
    • Based on a data-driven likelihood ratio function
    • Does not require specifying a parametric family of distributions for the data
    • The shape of the confidence regions is determined by the data
    • Combines the reliability of nonparametric methods with the efficiency of likelihood methods
  • Introduction – Empirical Likelihood
    For X1,X2,…,Xn, the likelihood function is defined by
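    Let X1, X2, …, Xn be n independent observations; the empirical cumulative distribution function (ECDF) at x is $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1(X_i \le x)$.
    The nonparametric likelihood of a CDF F can be defined as $L(F) = \prod_{i=1}^{n} [F(X_i) - F(X_i^-)]$.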
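As a small illustration of the two definitions on this slide, the sketch below builds the ECDF of a sample and evaluates a nonparametric likelihood as the product of the probability masses a candidate distribution places on the observations. It assumes distinct observations; `ecdf` and `nonparametric_likelihood` are hypothetical helper names.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Return F_n, where F_n(x) is the fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(xs, x) / n

    return F


def nonparametric_likelihood(weights):
    """Product of the probability masses placed on each observation."""
    return math.prod(weights)


F = ecdf([3, 1, 4, 2])
uniform = nonparametric_likelihood([0.25] * 4)            # masses 1/n, as in the ECDF
other = nonparametric_likelihood([0.1, 0.2, 0.3, 0.4])    # another probability vector
```

With distinct observations, the ECDF puts mass 1/n on each point, and no other probability vector on the sample gives a larger product, which is why the ECDF serves as the reference in the likelihood ratio.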
  • Introduction – Empirical Likelihood
    Likelihood ratio:
    Owen (2001) proved
  • Introduction – Brief History
    • Traced back to Thomas and Grunkemeier (1975)
    • Summarized and discussed in Owen (1988, 1990, 1991, 2001)
    • Qin and Jing (2001) and Li and Wang (2003): the limiting distribution of the EL ratio is a weighted chi-square distribution.
    • Zhou (2005) and Zhou and Li (2008): Logrank and Gehan estimators, and the Buckley-James estimator.
  • Main Procedure – Preliminaries
    Let T1, …, Tn be a sequence of positive random variables (Ti > 0), and let Z1, …, Zn be the corresponding sequence of covariates.
    Z and β are p×1 vectors.
    We observe and
    Define
    We employ the following estimating equation:
  • Main Procedure – Preliminaries
    We can rewrite it as a U-statistic with symmetric kernel,
    Similar to Fygenson and Ritov (1994),
    where R and J are defined as in Fygenson and Ritov (1994).
  • Main Procedure – Preliminaries
    The asymptotic variance of the generalized estimate of β is
    The numerator can be estimated by
    The denominator can be estimated by
    Then we can construct the confidence interval as
  • Main Procedure – Empirical Likelihood
    Let and
    Applying the idea of Sen (1960), we define
    where the W’s are independently distributed.
  • Main Procedure – Empirical Likelihood
    Let $p = (p_1, \dots, p_n)$ be a probability vector. Then the empirical likelihood function at the value β is given by
    Without the estimating-equation constraint, $\prod_{i=1}^{n} p_i$ reaches its maximum when $p_i = 1/n$ for all i.
    Thus, the empirical likelihood ratio at β is defined by
  • Main Procedure – Empirical Likelihood
    Applying the Lagrange multiplier method to the logarithm of the above equation, we write
    Setting the partial derivative of G with respect to p to 0, we have
    then
  • Main Procedure – Empirical Likelihood
    Plugging this into the previous equation, we obtain
    So, for all the p’s
    We have
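The Lagrange multiplier step can be sketched numerically for the scalar case (p = 1). The code below is a generic empirical likelihood computation, not the thesis implementation: it assumes the quantities W_i are already available as plain numbers, and it uses the standard results p_i = 1 / (n(1 + λW_i)), with λ solving Σ W_i / (1 + λW_i) = 0 and test statistic 2 Σ log(1 + λW_i). The function name `el_ratio_statistic` and the bisection solver are illustrative choices.

```python
import math


def el_ratio_statistic(w, tol=1e-12, max_iter=200):
    """Return (lambda, weights p, -2 log EL ratio) for scalar values w."""
    w_min, w_max = min(w), max(w)
    if not (w_min < 0 < w_max):
        raise ValueError("0 must lie strictly inside the range of w")

    # All weights stay positive only for lambda in (-1/w_max, -1/w_min).
    lo, hi = -1.0 / w_max, -1.0 / w_min
    margin = 1e-10 * (hi - lo)
    lo, hi = lo + margin, hi - margin

    def g(lam):
        # Strictly decreasing in lam; its root is the Lagrange multiplier.
        return sum(wi / (1.0 + lam * wi) for wi in w)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break

    lam = 0.5 * (lo + hi)
    n = len(w)
    p = [1.0 / (n * (1.0 + lam * wi)) for wi in w]
    stat = 2.0 * sum(math.log(1.0 + lam * wi) for wi in w)
    return lam, p, stat


lam_sym, p_sym, stat_sym = el_ratio_statistic([-1.0, 1.0])
lam, p, stat = el_ratio_statistic([-1.0, 0.5, 2.0])
```

In a confidence-interval setting, the statistic would be recomputed over a grid of candidate β values (each giving its own W_i) and compared with a chi-square quantile; that outer loop is omitted here.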
  • Main Procedure – Empirical Likelihood
    Theorem 1. Under the above conditions, the empirical likelihood ratio statistic at the true β converges in distribution to $\chi^2_p$, where $\chi^2_p$ is a chi-square random variable with p degrees of freedom.
    Confidence region for β is given by
    EL confidence region for a q-dimensional sub-vector of β:
    Theorem 2. Under the above conditions, the profile empirical likelihood ratio statistic for the sub-vector converges in distribution to $\chi^2_q$, where $\chi^2_q$ is a chi-square random variable with q degrees of freedom.
    The confidence region for the sub-vector is given by
  • Simulation Study – EL vs. NA
    Consider the AFT model:
    Model 1: (skewed error distribution)
    • Z ~ Uniform distribution on [-1, 1].
    • The censoring time C ~ Uniform distribution on [0, c], where c controls the censoring rate.
    • The error term has the standard extreme value distribution, which is skewed to the right.
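A data generator for Model 1 might look like the sketch below. Several details are assumptions, since the model equation and the true coefficient are not in the transcript: the model is taken as ln(T) = βZ + ε with a hypothetical β = 1, censoring is applied on the time scale, and the error uses the right-skewed (Gumbel-type) form -log(-log(U)) to match the slide's description. The names `simulate_model1` and `censoring_rate` are illustrative.

```python
import math
import random


def right_skewed_gumbel(rng):
    """Draw from the right-skewed standard extreme value (Gumbel) form."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_model1(n, c, beta=1.0, seed=0):
    """Return a list of (z, observed_time, delta); delta = 1 if uncensored.

    beta=1.0 is a hypothetical true coefficient, not taken from the slides.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)
        t = math.exp(beta * z + right_skewed_gumbel(rng))  # failure time
        cens = rng.uniform(0.0, c)                         # censoring time
        data.append((z, min(t, cens), 1 if t <= cens else 0))
    return data


def censoring_rate(data):
    """Fraction of observations that are censored."""
    return sum(1 - delta for _, _, delta in data) / len(data)


sample = simulate_model1(500, c=5.0, seed=1)
rate_small_c = censoring_rate(simulate_model1(500, c=2.0, seed=1))
rate_large_c = censoring_rate(simulate_model1(500, c=20.0, seed=1))
```

Increasing c lengthens the censoring times and so lowers the censoring rate, which is how c can be tuned to reach a target rate.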
  • Simulation Study – EL vs. NA
    Model 2: (symmetric error distribution)
    • Z ~ Uniform distribution on [0.5, 1.5].
    • The censoring time C is defined as 2exp(1)+c.
    • The error term has the standard Normal distribution N(0,1), which is symmetric.
    Setting:
    Repetition: 10000
  • Simulation Study – EL vs. NA
    Results for model 1: [shown across four slides; not captured in the transcript]
  • Simulation Study – EL vs. NA
    Results for model 2: [shown across four slides; not captured in the transcript]
  • Simulation Study – EL vs. NA
    Summary:
    • As the sample size increases, the coverage probabilities (CP) for both methods increase.
    • As the censoring rate increases, the coverage probabilities (CP) for both methods decrease.
    • When the sample size is small, the CP for the EL is better than that for the NA; under very heavy censoring, however, neither is good enough.
  • Simulation Study – EL vs. NA
    Summary:
    • The average CI length for the EL is slightly longer than that for the NA in all cases.
    • The EL shows slight over-coverage.
    • The NA shows under-coverage.
  • Simulation Study – Kendall vs. others
    Consider the following AFT model:
    We observe and
    Model 3:
    • Z ~ Normal distribution N(1, 0.5²).
    • The censoring time C ~ Normal distribution N(µ, 4²), where µ is chosen to produce censoring rates of 10%, 30%, 50%, and 75%.
    • The error term has the Normal distribution N(0, 0.5²).
    • Sample size: 50, 100, and 200
    • Repetitions: 5000
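Because every component of Model 3 is normal, the value of µ for a target censoring rate can be worked out in closed form. The sketch below rests on assumptions not confirmed by the transcript: the model is Y = βZ + ε on the log scale with a hypothetical β = 1, C is on the same scale, and an observation is censored when C < Y. Under these assumptions Y - C ~ N(β - µ, β²·0.5² + 0.5² + 4²), so the censoring rate is Φ((β - µ)/s). The function names are illustrative.

```python
import math
from statistics import NormalDist


def _sd_of_difference(beta):
    # Var(Y - C) = beta^2 * Var(Z) + Var(eps) + Var(C)
    return math.sqrt(beta ** 2 * 0.5 ** 2 + 0.5 ** 2 + 4.0 ** 2)


def mu_for_censoring_rate(rate, beta=1.0):
    """Mean of C giving P(C < Y) = rate (beta=1.0 is hypothetical)."""
    mean_y = beta * 1.0  # E[Z] = 1 and E[eps] = 0
    return mean_y - _sd_of_difference(beta) * NormalDist().inv_cdf(rate)


def implied_censoring_rate(mu, beta=1.0):
    """P(C < Y) for a given censoring mean mu."""
    return NormalDist().cdf((beta * 1.0 - mu) / _sd_of_difference(beta))


targets = [0.10, 0.30, 0.50, 0.75]
mus = [mu_for_censoring_rate(r) for r in targets]
round_trip = [implied_censoring_rate(m) for m in mus]
```

A higher target censoring rate requires a smaller µ, since earlier censoring times cut off more observations.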
  • Simulation Study – Kendall
    Results for model 3:
  • Simulation Study – Kendall
    Results for model 3:
    • When the sample size is small (n=50) and the censoring rate is heavy, Kendall’s rank regression estimator is better than all the other estimators.
    • In other cases, Kendall’s rank regression estimator is also comparable.
  • Real Application
    Bone marrow transplants are a standard treatment for acute leukemia.
    A total of 137 patients were treated.
    For simplicity, the model contains only one covariate at a time, where Ti is the time to death.
    The response variable, Time to Death, takes values from 1 day to 2640 days, with a mean of 839.16 days.
  • Real Application
    We consider the following four variables:
    Disease Group (3 groups)
    Waiting Time to Transplant in Days (from 24 to 2616 days, mean=275 days)
    Recipient and Donor Age (from 7 to 52 and from 2 to 56)
    French-American-British (FAB): classification based on standard morphological criteria.
  • Real Application
  • Real Application
    Results:
    The two methods show similar results.
    The two exceptions may be due to the asymmetric CIs of the EL.
    The average lengths of the EL CIs are slightly longer than those of the NA, consistent with the simulation study.
  • Conclusion & Discussion
    • The average lengths of the EL CIs are slightly longer than those of the NA.
    • The coverage probabilities of the EL are closer to the nominal levels than those of the NA, especially when the sample size is very small and censoring is heavy.
    • Kendall’s rank regression estimator is better than the Buckley-James, Logrank and Gehan estimators in terms of coverage probabilities.
  • Conclusion & Discussion
    • The combination of the Kendall estimating equation and the EL CI has strong advantages over the other considered approaches in the case of small sample size and heavy censoring rate.
    • The combination shows a problem of over-coverage.
    • A smoothing kernel is suggested as future work to eliminate this problem.
  • Thank you!