
Thesis Defense

EMPIRICAL LIKELIHOOD INFERENCE FOR THE ACCELERATED FAILURE TIME MODEL USING KENDALL ESTIMATING EQUATION


1. 1. Georgia State University<br />EMPIRICAL LIKELIHOOD INFERENCE FOR THE ACCELERATED FAILURE TIME MODEL USING KENDALL ESTIMATING EQUATION<br />By Yinghua Lu<br />June 29th 2009<br />
2. 2. Contents<br /><ul><li> Introduction
3. 3. Main Procedure
4. 4. Simulation Study
5. 5. Real Application
6. 6. Conclusion</li></li></ul><li>Introduction – AFT Model<br />Accelerated Failure Time (AFT) Model: <br /><ul><li> Very popular.
7. 7. Similar to the classic linear regression:</li></ul>where Y=ln(T).<br />Several estimation methods have been developed:<br /><ul><li> OLS
8. 8. Non-monotone estimating equations
9. 9. Monotone estimating equations with normal approximation.</li></li></ul><li>Introduction – Kendall’s Tau <br />Let {X1, Y1} and {X2, Y2} be two observations of a pair of variables (X, Y). <br />Kendall’s tau coefficient is defined as:<br />where nc is the number of concordant pairs, for which sign(X1-X2) = sign(Y1-Y2), and nd is the number of discordant pairs, for which sign(X1-X2) = -sign(Y1-Y2).<br />Sen (1968) proposed<br />ε(b)=Y-bX<br />U(b) is non-increasing in b.<br />
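The formulas on this slide were not preserved in the transcript. As a minimal sketch, assuming the usual tau-a definition (nc - nd) / (n(n-1)/2) and assuming Sen's statistic takes the standard pairwise form U(b) = sum over pairs of sign(Xi - Xj) * sign(εi(b) - εj(b)), the two quantities can be computed as below. The function names and the toy data are illustrative only and do not come from the thesis.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau-a: (n_c - n_d) / (n(n-1)/2), ties counted in neither."""
    n = len(x)
    nc = nd = 0
    for i, j in combinations(range(n), 2):
        s = sign(x[i] - x[j]) * sign(y[i] - y[j])
        if s > 0:
            nc += 1  # concordant pair
        elif s < 0:
            nd += 1  # discordant pair
    return (nc - nd) / (n * (n - 1) / 2)


def sen_u(b, x, y):
    """Assumed form of U(b): pairwise sign products using residuals y - b*x."""
    e = [yi - b * xi for xi, yi in zip(x, y)]
    return sum(
        sign(x[i] - x[j]) * sign(e[i] - e[j])
        for i, j in combinations(range(len(x)), 2)
    )


# Toy data (hypothetical): y increases with x, so every pair is concordant.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.9, 5.3]
tau = kendall_tau(x, y)
u_values = [sen_u(b, x, y) for b in (0.0, 0.5, 1.0, 1.5, 2.0)]
```

Under this form, each pair contributes the sign of its pairwise slope minus b, which is why U(b) can only step downward as b grows.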
10. 10. Introduction – Empirical Likelihood<br /><ul><li>A nonparametric method
11. 11. Based on a data-driven likelihood ratio function
12. 12. Without specifying a parametric family of distributions for the data.
13. 13. The shape of confidence regions is determined by the data
14. 14. Combines the reliability of nonparametric methods with the efficiency of likelihood methods.</li></li></ul><li>Introduction – Empirical Likelihood<br />For X1,X2,…,Xn, the likelihood function is defined by<br />Let X1,X2,…,Xn be n independent observations; the empirical cumulative distribution function (ECDF) at x is<br />The nonparametric likelihood of the CDF can be defined as <br />
15. 15. Introduction – Empirical Likelihood<br />Likelihood ratio:<br />Owen (2001) proved <br />
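The displayed formulas on the two Empirical Likelihood slides above were not preserved in the transcript. For orientation, the standard definitions from Owen (2001) are sketched below for a scalar sample; the slides presumably follow them, but the exact notation used in the thesis is an assumption here.

```latex
% Standard forms following Owen (2001); the slides' own notation may differ.
\begin{align*}
F_n(x) &= \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)
  && \text{(ECDF)} \\
L(F) &= \prod_{i=1}^{n}\bigl\{F(X_i) - F(X_i-)\bigr\}
  && \text{(nonparametric likelihood of a CDF } F\text{)} \\
R(F) &= \frac{L(F)}{L(F_n)}
  && \text{(likelihood ratio; } F_n \text{ maximizes } L\text{)} \\
\mathcal{R}(\mu) &= \max\Bigl\{\prod_{i=1}^{n} n p_i :
  \sum_{i=1}^{n} p_i X_i = \mu,\ p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1\Bigr\}
  && \text{(profile ratio for the mean)} \\
-2\log \mathcal{R}(\mu_0) &\xrightarrow{d} \chi^2_{1}
  && \text{(i.i.d. } X_i \text{ with mean } \mu_0 \text{ and finite positive variance)}
\end{align*}
```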
16. 16. Introduction – Brief History<br /><ul><li>Traced back to Thomas and Grunkemeier (1975)
17. 17. Summarized and discussed in Owen (1988, 1990, 1991, 2001)
18. 18. Qin and Jing (2001) and Li and Wang (2003): the limiting distribution of the EL ratio is a weighted chi-square distribution.
19. 19. Zhou (2005) and Zhou and Li (2008): Logrank and Gehan estimators, and Buckley-James estimator.</li></li></ul><li>Main Procedure – Preliminaries<br />Let T1,…,Tn be a sequence of random variables with Ti &gt; 0, and let Z1,…,Zn be the corresponding sequence of covariates.<br />Z and β are p×1 vectors. <br />We observe and<br />Define <br />We employ the following estimating equation:<br />
20. 20. Main Procedure – Preliminaries<br />We can rewrite it as a U-statistic with a symmetric kernel,<br />Similar to Fygenson and Ritov (1994), <br />where R and J are defined as in Fygenson and Ritov (1994). <br />
21. 21. Main Procedure – Preliminaries<br />The asymptotic variance of the generalized estimate of β is <br />The numerator can be estimated by<br />The denominator can be estimated by<br />Then we can construct the confidence interval as<br />
22. 22. Main Procedure – Empirical Likelihood<br />Let and<br />Applying the idea of Sen (1960), we define<br />where the W’s are independently distributed.<br />
23. 23. Main Procedure – Empirical Likelihood<br />Let be a probability vector. Then the empirical likelihood function at the value β is given by<br />For this function, reaches its maximum when <br />Thus, the empirical likelihood ratio at β is defined by<br />
24. 24. Main Procedure – Empirical Likelihood<br />Applying the Lagrange multiplier method to the logarithm of the above equation, we write <br />Setting the partial derivative of G with respect to p to 0, we have<br />then <br />
25. 25. Main Procedure – Empirical Likelihood<br />Plugging this into the previous equation, we obtain<br />So, for all the p’s<br />We have<br />
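The equations for the Lagrange multiplier steps are missing from the transcript. The derivation below is a sketch of the standard empirical likelihood argument, which the slides appear to follow. It assumes the constraint is that the weighted W's sum to zero, writes W_i for the slide's W's evaluated at β, and uses λ and γ as multiplier names; these symbols are assumptions, and only G is named on the slide.

```latex
% Standard EL Lagrange-multiplier steps; \lambda and \gamma are assumed names.
\begin{align*}
G &= \sum_{i=1}^{n} \log(n p_i)
     - n\lambda^{\top}\sum_{i=1}^{n} p_i W_i
     + \gamma\Bigl(\sum_{i=1}^{n} p_i - 1\Bigr), \\
\frac{\partial G}{\partial p_i}
  &= \frac{1}{p_i} - n\lambda^{\top} W_i + \gamma = 0, \\
\sum_{i=1}^{n} p_i \frac{\partial G}{\partial p_i}
  &= n - n\lambda^{\top}\sum_{i=1}^{n} p_i W_i + \gamma = n + \gamma = 0
  \quad\Longrightarrow\quad \gamma = -n, \\
p_i &= \frac{1}{n\,(1 + \lambda^{\top} W_i)}, \qquad i = 1,\dots,n, \\
-2\log R(\beta) &= -2\sum_{i=1}^{n}\log(n p_i)
  = 2\sum_{i=1}^{n}\log\bigl(1 + \lambda^{\top} W_i\bigr), \\
\text{where } \lambda \text{ solves}\quad
0 &= \frac{1}{n}\sum_{i=1}^{n}\frac{W_i}{1 + \lambda^{\top} W_i}.
\end{align*}
```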
26. 26. Main Procedure – Empirical Likelihood<br />Theorem 1 Under the above conditions, converges in distribution to , where is a chi-square random variable with p degrees of freedom.<br />Confidence region for β is given by<br />EL confidence region for the q sub-vector <br />Of<br />Theorem 2 Under the above conditions, converges in distribution to , where is a chi-square random variable with q degrees of freedom.<br />confidence region for is given by<br />
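To illustrate how a confidence region of this kind is checked in practice, here is a minimal sketch for the scalar case (p = 1). It assumes the statistic has the standard form 2 * sum(log(1 + λ W_i)), with λ solving sum(W_i / (1 + λ W_i)) = 0. The thesis builds its W_i from the Kendall estimating function, and that construction is not preserved in the transcript, so the stand-in W_i = X_i - μ0 (the mean case) is used instead. The data, function name and constant name are hypothetical.

```python
import math

CHI2_1_95 = 3.841  # 0.95 quantile of the chi-square distribution, 1 df


def neg2_log_el_ratio(w):
    """Return 2 * sum(log(1 + lam * w_i)) with lam solving
    sum(w_i / (1 + lam * w_i)) = 0. Scalar w_i only."""
    if min(w) >= 0 or max(w) <= 0:
        # Zero is not strictly inside the range of the w_i, so the
        # empirical likelihood ratio is 0 and the statistic is infinite.
        return math.inf
    # lam must keep every 1 + lam * w_i strictly positive.
    lo = -1.0 / max(w)
    hi = -1.0 / min(w)
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        return sum(wi / (1.0 + lam * wi) for wi in w)

    # score is strictly decreasing in lam, so bisection locates its root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * wi) for wi in w)


# Hypothetical data; a value mu0 is in the 95% region if the statistic
# does not exceed the chi-square quantile.
data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.7]
stat_at_3 = neg2_log_el_ratio([x - 3.0 for x in data])
in_region = stat_at_3 <= CHI2_1_95
```

Bisection is used here only because the scalar score function is monotone; the vector case in Theorems 1 and 2 would need a multivariate solver such as Newton's method.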
27. 27. Simulation Study – EL vs. NA<br />Consider the AFT model:<br />Model 1: (skewed error distribution)<br /><ul><li>Z ~ Uniform distribution in [-1, 1].
28. 28. The censoring time C ~ Uniform distribution in [0, c], where c controls the censoring rate.
29. 29. The error term has the standard extreme value distribution, which is skewed to the right.</li></li></ul><li>Simulation Study – EL vs. NA<br />Model 2: (symmetric error distribution ).<br /><ul><li>Z ~ Uniform distribution in [0.5, 1.5].
30. 30. The censoring time C is defined as 2exp(1)+c.
31. 31. The error term has the standard Normal distribution N(0,1), which is symmetric.</li></ul>Setting:<br />Repetition: 10000<br />
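As an illustration of the simulation design, the following sketch generates one censored sample in the style of Model 1. Several details are assumptions because the model equation and settings are missing from the transcript: the model is taken to be ln T = βZ + ε with a hypothetical β = 1, the extreme value error is drawn in its right-skewed (Gumbel maximum) form to match the slide's description, censoring is applied on the time scale, and c = 5 is an arbitrary choice.

```python
import math
import random


def simulate_model1(n, beta=1.0, c=5.0, seed=0):
    """Draw n triples (observed_time, delta, z) under assumed Model 1 settings."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)  # covariate Z ~ Uniform[-1, 1]
        # Clamp u away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        eps = -math.log(-math.log(u))  # right-skewed extreme value draw
        t = math.exp(beta * z + eps)  # failure time, assuming ln T = beta*Z + eps
        cens = rng.uniform(0.0, c)  # censoring time C ~ Uniform[0, c]
        delta = 1 if t <= cens else 0  # 1 = failure observed, 0 = censored
        sample.append((min(t, cens), delta, z))
    return sample


sample = simulate_model1(200)
censoring_rate = 1.0 - sum(d for _, d, _ in sample) / len(sample)
```

In a full study, c would be tuned until `censoring_rate` matches the target level, and the whole procedure would be repeated (10000 times in the slides) to estimate coverage probabilities.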
32. 32. Simulation Study – EL vs. NA<br />Results for model 1:<br />
33. 33. Simulation Study – EL vs. NA<br />Results for model 1:<br />
34. 34. Simulation Study – EL vs. NA<br />Results for model 1:<br />
35. 35. Simulation Study – EL vs. NA<br />Results for model 1:<br />
36. 36. Simulation Study – EL vs. NA<br />Results for model 2:<br />
37. 37. Simulation Study – EL vs. NA<br />Results for model 2:<br />
38. 38. Simulation Study – EL vs. NA<br />Results for model 2:<br />
39. 39. Simulation Study – EL vs. NA<br />Results for model 2:<br />
40. 40. Simulation Study – EL vs. NA<br />Summary:<br /><ul><li>As the sample size increases, the coverage probabilities (CP) for both methods increase.
41. 41. As the censoring rate increases, the coverage probabilities (CP) for both methods decrease.
42. 42. When the sample size is small, the CP of the EL is better than that of the NA; under very heavy censoring, however, neither is good enough.</li></li></ul><li>Simulation Study – EL vs. NA<br />Summary:<br /><ul><li>The average length of the EL intervals is slightly longer than that of the NA intervals in all cases.
43. 43. The EL shows slight over-coverage.
44. 44. The NA shows under-coverage.</li></li></ul><li>Simulation Study – Kendall vs. others<br />Consider the following AFT model:<br />We observe and<br />Model 3:<br /><ul><li>Z ~ Normal distribution N(1, 0.5²).
45. 45. The censoring time C ~ Normal distribution N(µ, 4²), where µ is chosen to produce samples with censoring rates of 10%, 30%, 50% and 75%.
46. 46. The error term has the Normal distribution N(0, 0.5²).
47. 47. Sample Size: 50, 100 and 200
48. 48. Repetition: 5000</li></li></ul><li>Simulation Study – Kendall<br />Results for model 3:<br />
49. 49. Simulation Study – Kendall<br />Results for model 3:<br /><ul><li>When the sample size is small (n=50) and the censoring rate is heavy, Kendall’s rank regression estimator is better than all the other estimators.
50. 50. In other cases, Kendall’s rank regression estimator is also competitive.</li></li></ul><li>Real Application<br />Bone marrow transplants are a standard treatment for acute leukemia.<br />A total of 137 patients were treated.<br />For simplicity, the model contains only one covariate at a time, which is where Ti is Time to Death. <br />The response variable Time to Death takes values from 1 day to 2640 days, with a mean of 839.16 days. <br />
51. 51. Real Application<br />We consider the following four variables:<br />Disease Group (3 groups)<br />Waiting Time to Transplant in Days (from 24 to 2616 days, mean=275 days)<br />Recipient and Donor Age (from 7 to 52 and from 2 to 56)<br />French-American-British (FAB): classification based on standard morphological criteria.<br />
52. 52. Real Application<br />
53. 53. Real Application<br />Results:<br />The two methods show similar results.<br />The two exceptions may be due to the asymmetric CIs of the EL.<br />The average lengths of the EL CIs are slightly longer than those of the NA, consistent with the simulation study.<br />
54. 54. Conclusion & Discussion<br /><ul><li>The average length of the EL CIs is slightly longer than that of the NA CIs.
55. 55. The coverage probabilities of the EL are closer to the nominal levels than those of the NA, especially when the sample size is very small and the censoring rate is heavy.
56. 56. Kendall’s rank regression estimator is better than the Buckley-James, Logrank and Gehan estimators in terms of coverage probabilities. </li></li></ul><li>Conclusion & Discussion<br /><ul><li>The combination of the Kendall estimating equation and the EL CI has strong advantages over the other considered approaches in the case of small sample size and heavy censoring rate.
57. 57. The combination shows a problem of over-coverage.
58. 58. A smoothing kernel is suggested as future work to eliminate this problem.</li></li></ul><li>Thank you!<br />