1. Survival Analysis on the REVEAL-HBV Dataset
Assessing the risk of liver cancer in Hepatitis B patients
Lin Han (1) and James Stinecipher (2)
(1) Stony Brook University, Stony Brook, NY
(2) California State University, Fresno, Fresno, CA
July 23, 2012
Han / Stinecipher (SUNYSB / CSUF) Survival Analysis July 23, 2012 1 / 13
7. Introduction Hepatitis B and REVEAL-HBV
Hepatitis B: Facts
• Chronic hepatitis B is a life-long liver disease caused by infection with
the hepatitis B virus (HBV).
• Approximately 400 million people have chronic infection with HBV,
which can lead to liver cirrhosis, liver failure, and liver cancer.
• 75% of these individuals are of Asian or Pacific Islander descent.
• REVEAL-HBV Study: Risk Evaluation and Viral Load Elevation and
Associated Liver Disease/Cancer-Hepatitis B Virus Study.
• Assessed effect of hepatitis B virus DNA (HBV-DNA) on hepatocellular
carcinoma (HCC) risk in Taiwanese hepatitis B patients.
• We use the same subset of 3,656 individuals as in the original study.
8. Introduction Hepatitis B and REVEAL-HBV
The REVEAL-HBV Dataset
10. Analyzing Data Initial Analyses
Starting Points - Nonparametric Analyses
Define the survival time as the time from the date of acceptance into the study to
liver cancer diagnosis, death, or the date last seen, whichever came first.
Participants who died without HCC, or who had not developed HCC when last seen,
were considered “censored.”
Definition (Kaplan-Meier / Product-Limit Estimator)
\[
\hat{S}(t) = \prod_{j:\, t_j \le t} \left( 1 - \frac{d_j}{r_j} \right)
\]
where $\hat{S}(t)$ is the estimated survival rate at time $t$,
$t_j$ is the $j$th distinct diagnosis time,
$d_j$ is the number of people diagnosed with HCC at $t_j$ and
$r_j$ is the number of individuals at risk for HCC at $t_j$.
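
The product-limit definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function name kaplan_meier and the follow-up times are hypothetical and are not REVEAL-HBV data. At each distinct diagnosis time it multiplies the running estimate by 1 - d_j / r_j, then removes everyone who left the risk set at that time.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_j, S_hat(t_j))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if HCC was diagnosed at that time, 0 if censored
    """
    d = Counter(t for t, e in zip(times, events) if e == 1)  # d_j: diagnoses at t_j
    leaving = Counter(times)   # subjects leaving the risk set at each time
    at_risk = len(times)       # r_j: starts as the whole cohort
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if d[t] > 0:
            survival *= 1.0 - d[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # events and censored subjects both leave
    return curve

# Hypothetical follow-up times in years (illustration only).
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:2d}  S(t) = {s:.4f}")
```

Censored subjects never reduce the estimate directly; they only shrink the risk set used at later diagnosis times.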
11. Analyzing Data Initial Analyses
Starting Points - Nonparametric Analyses
• For example, ALT (alanine aminotransferase) is an enzyme found in the
liver. We can look at the survival curves for individuals with low (< 45
units/liter) and high (≥ 45 units/liter) ALT levels.
15. Analyzing Data Initial Analyses
Starting Points - Nonparametric Analyses
Apply the log-rank test to each variable to see which show significant
differences in survival time across strata.
Test of Equality over Strata

Variable                          Log-Rank Chi-Square   DF   Pr > Chi-Square
Gender                                        45.7463    1   <.0001
Age                                           87.0001    3   <.0001
Cigarette Smoking Habit                       18.7793    1   <.0001
Alcohol Drinking Habit                        27.3548    1   <.0001
Family History of HCC                         15.9015    1   <.0001
ALT (U/L) at Baseline                         72.9174    1   <.0001
HBV DNA Level at Baseline                    254.7498    4   <.0001
HBeAg Serostatus at Baseline                 157.0193    1   <.0001
Liver Cirrhosis at Study Entry               446.0566    1   <.0001
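
For a variable with two strata (for example low vs. high ALT), the log-rank chi-square compares the observed number of diagnoses in one group with the number expected if both groups shared the same survival curve. The sketch below assumes the standard two-group form of the statistic with 1 degree of freedom; variables with more strata, such as Age or HBV DNA level, need the multi-group version. The function name and the data are hypothetical and are not REVEAL-HBV data.

```python
def logrank_chi_square(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times  -- follow-up time for each subject
    events -- 1 if HCC was diagnosed at that time, 0 if censored
    groups -- 0 or 1 for each subject (e.g. low vs. high ALT)
    """
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Risk-set sizes just before time t, overall and in group 1.
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        # Diagnoses at time t, overall and in group 1.
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance; zero when one subject remains
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance

# Hypothetical data: every subject has an event, group 1 fails earlier.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]
chi2 = logrank_chi_square(times, events, groups)
print(f"log-rank chi-square = {chi2:.4f}")
```

The statistic is compared with a chi-square distribution on 1 degree of freedom to obtain a p-value like those in the table.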
20. Analyzing Data Proportional Hazards Analyses
Applying the Cox Model to REVEAL
The Cox Proportional Hazards (PH) Model provides the following benefits:
• We can quantify the effect of each covariate on the hazard, and so on survival time.
• We get a unified equation that includes all covariates of interest.
Definition (Cox Proportional Hazards Model)
\[
h(t \mid Z) = h_0(t)\, e^{\beta Z}
\]
where $h(t \mid Z)$ is the hazard at time $t$,
$Z$ is a vector of covariate values,
$\beta$ is a vector of parameters, and
$h_0(t)$ is the baseline hazard function (unspecified).
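
The name "proportional hazards" follows directly from this definition. As a short worked step, take two individuals whose covariate vectors $Z$ and $Z^{*}$ are identical except that the $k$th covariate is one unit higher in $Z^{*}$ (the symbols $Z^{*}$ and $k$ are introduced here for illustration only):

```latex
\[
\frac{h(t \mid Z^{*})}{h(t \mid Z)}
  = \frac{h_0(t)\, e^{\beta Z^{*}}}{h_0(t)\, e^{\beta Z}}
  = e^{\beta (Z^{*} - Z)}
  = e^{\beta_k}
\]
```

The unspecified baseline hazard $h_0(t)$ cancels, so the ratio does not depend on $t$. The quantity $e^{\beta_k}$ is the hazard ratio for the $k$th covariate.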
26. Analyzing Data Proportional Hazards Analyses
Time-Dependence - How and Why?
• Problem: Not everyone who was diagnosed with liver cirrhosis (LC, scarring
of the liver) had the condition at study entry, so LC status, and the risk
associated with it, is not constant over follow-up.
• Instead, we will make liver cirrhosis into a time-dependent variable.
• This allows us to model how risk changes if and when LC develops.
• We create a secondary data set from the original, splitting those
individuals who developed LC into two intervals (before and after).
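
The before/after split described above can be sketched as follows. This is an assumed layout, not the authors' actual code: each row covers an interval (start, stop], and the function name, the column order and the subjects are hypothetical. A subject who develops LC during follow-up contributes one row with lc = 0 up to the LC diagnosis and one row with lc = 1 afterwards. Only the last row can carry the HCC event.

```python
def split_at_cirrhosis(subject_id, follow_up, hcc, lc_time=None):
    """Return counting-process rows (id, start, stop, event, lc).

    follow_up -- time from study entry to HCC diagnosis or censoring
    hcc       -- 1 if HCC was diagnosed at follow_up, 0 if censored
    lc_time   -- time from entry to LC diagnosis; None if never
                 diagnosed, 0 if LC was already present at entry
    """
    # Assumption: LC diagnosed at or after the end of follow-up is ignored.
    if lc_time is None or lc_time >= follow_up:
        return [(subject_id, 0, follow_up, hcc, 0)]
    if lc_time <= 0:
        return [(subject_id, 0, follow_up, hcc, 1)]
    return [
        (subject_id, 0, lc_time, 0, 0),          # before LC: no event yet
        (subject_id, lc_time, follow_up, hcc, 1),  # after LC
    ]

# Hypothetical subjects (times in years, illustration only).
rows = []
rows += split_at_cirrhosis("A", 10, 1, lc_time=4)  # LC after 4 years, then HCC
rows += split_at_cirrhosis("B", 7, 0)              # never developed LC
rows += split_at_cirrhosis("C", 5, 1, lc_time=0)   # LC present at entry
for row in rows:
    print(row)
```

Subjects who never develop LC, or who have it at entry, keep a single row, so only those who change status are split.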
27. Analyzing Data Proportional Hazards Analyses
Cox PH Model for REVEAL-HBV Dataset
Changing LC to a time-dependent variable, we get the following:
Analysis of Maximum Likelihood Estimates

Parameter         Level   β Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio
Gender                       0.36948          0.20153       3.3612       0.0668          1.447
Age               2          1.37091          0.25106      29.8166       <.0001          3.939
                  3          1.74920          0.24114      52.6172       <.0001          5.750
                  4          2.32481          0.27678      70.5490       <.0001         10.225
Smoking                      0.15868          0.15476       1.0513       0.3052          1.172
Drinking                     0.55692          0.17451      10.1841       0.0014          1.745
HBeAg                        0.48706          0.20639       5.5690       0.0183          1.628
ALT                          0.04020          0.19157       0.0440       0.8338          1.041
HBV DNA           2         −0.01275          0.31610       0.0016       0.9678          0.987
                  3          0.39902          0.30965       1.6606       0.1975          1.490
                  4          1.12699          0.29801      14.3011       0.0002          3.086
                  5          1.31971          0.31930      17.0825       <.0001          3.742
Liver Cirrhosis              2.73020          0.15805     298.3991       <.0001         15.336
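
The last two numeric columns follow from the first two. The hazard ratio is exp(β), and the chi-square is the Wald statistic (β / SE)^2. A small check on two rows copied from the table is sketched below; the function name wald_summary is hypothetical, and small differences in the last digits come from the rounding of the printed estimates.

```python
import math

def wald_summary(beta, se):
    """Hazard ratio and Wald chi-square for one Cox coefficient."""
    return math.exp(beta), (beta / se) ** 2

# (beta estimate, standard error) copied from the table above.
rows = {
    "Drinking": (0.55692, 0.17451),
    "Liver Cirrhosis": (2.73020, 0.15805),
}
summary = {name: wald_summary(b, se) for name, (b, se) in rows.items()}
for name, (hr, chi2) in summary.items():
    print(f"{name:16s} HR = {hr:.3f}  chi-square = {chi2:.2f}")
```

Read this way, the liver cirrhosis row says that the hazard of HCC after an LC diagnosis is about 15 times the hazard without LC, holding the other covariates fixed.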
28. Analyzing Data Proportional Hazards Analyses
Checking Model Fit
One way we check the goodness of fit of the model is to examine the Schoenfeld
residuals for each covariate. If the proportional hazards assumption holds,
these residuals show no systematic trend over time.
31. Analyzing Data Proportional Hazards Analyses
Checking Model Fit
We also assess our model using the following model fit criteria
(Lower scores correspond to a better fit):
Original Model Fit Statistics

Criterion    Without Covariates   With Covariates
-2 LOG L               4417.608          3898.132
AIC                    4417.608          3924.132
SBC                    4417.608          3971.007

Time-Dependent Model Fit Statistics

Criterion    Without Covariates   With Covariates
-2 LOG L               3519.857          2814.697
AIC                    3519.857          2840.697
SBC                    3519.857          2884.696
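
The AIC rows can be reproduced from the -2 LOG L rows using the usual definition AIC = -2 log L + 2k, where k is the number of estimated coefficients. The value k = 13 below is inferred from the tables rather than stated on the slides: it matches the 13 rows of the coefficient table and the gap of 26 between -2 LOG L and AIC in both models. SBC is not recomputed because it also depends on a sample-size term that the slides do not report.

```python
def aic(neg2_log_l, k):
    """Akaike information criterion from -2 log L and k coefficients."""
    return neg2_log_l + 2 * k

K = 13  # inferred: 1 + 3 + 1 + 1 + 1 + 1 + 4 + 1 coefficients

original = aic(3898.132, K)        # original model, with covariates
time_dependent = aic(2814.697, K)  # time-dependent model, with covariates
print(f"original AIC       = {original:.3f}")
print(f"time-dependent AIC = {time_dependent:.3f}")
```

Without covariates k = 0, which is why all three criteria coincide in the first column of each table.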
34. Conclusion
Summary
• Using the Cox PH model allows us to quantify the risk for HCC
associated with various conditions.
• We determined the following covariates to be significant:
    • Gender
    • Age
    • Drinking
    • Presence of the hepatitis B “e” antigen
    • Family history of HCC
    • Serum HBV DNA level
    • Diagnosis with liver cirrhosis.
• Considering LC as a time-dependent covariate yields a better-fitting model
(lower -2 log L, AIC, and SBC) than the original model, which uses liver
cirrhosis status at study entry.
35. Conclusion Acknowledgements
Acknowledgements
We would like to thank:
• California State University, Fresno and the National Science Foundation
for their financial support (NSF Grant #DMS-1156273)
• The California State University, Fresno Mathematics REU program, and
• Our mentor, Dr. Ke Wu, for his support during the completion of this
project.
Thank You!