Working with medical doctors, we implemented novel data mining techniques to predict the Sustained Virological Response (SVR) to hepatitis C treatment. In order to make the models more interpretable, we used Probability Estimation Trees (PETs).
1. Simone Romano’s Research Activity
Formerly: Research Assistant at the Department of Information Engineering (DEI), Padova, Italy
Now: PhD student at the Department of Computing and Information Systems (CIS), Melbourne, Australia
Simone Romano (University of Melbourne) Simone Romano’s Research Activity March 21st 2012 1 / 8
2. Outline
1 Problem Statement
2 Proposed Solutions
    Probability Estimation Trees
    Cost-sensitive classification
3 Results
4 Conclusions
3. Problem Statement
Problem: Interferon (IFN) and Ribavirin (RBV) therapy for hepatitis C is successful in only 60% of cases. Moreover, this combined treatment has many side effects.
Data: 606 Padova patients + 592 external patients, all already treated with IFN and RBV and with known outcome.
Objective: Predict the Sustained Virological Response (SVR) as early as possible for future subjects.
4. Proposed Solutions
Proposed solutions:
Probability Estimation Trees (PETs);
Cost-Sensitive Classification;
PETs with future doses.
5. Proposed Solutions: Probability Estimation Trees
[Figure: example PET splitting on Gender (Male / Female) and Age [years]. Node sizes: 100; 49 / 51; 40 / 11. Node percentages: 40%; 30% / 50%; 60% / 30%.]
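To illustrate how a PET turns a tree into a probability rather than a hard class label, here is a minimal sketch. It is not the implementation used in this work: the `Node` class, the feature names, the age threshold, and all patient counts are hypothetical, and the Laplace correction at the leaves is an assumption (it is a common choice for PETs, but the slides do not say which estimator was used).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a (hypothetical) probability estimation tree."""
    n_total: int                       # patients reaching this node
    n_positive: int                    # of those, patients who responded
    feature: Optional[str] = None      # None marks a leaf
    threshold: Optional[float] = None  # value <= threshold goes left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf_probability(leaf: Node) -> float:
    """Laplace-corrected response frequency at a leaf (assumed estimator)."""
    return (leaf.n_positive + 1) / (leaf.n_total + 2)


def predict_proba(root: Node, patient: Dict[str, float]) -> float:
    """Route a patient down the tree and return the leaf probability."""
    node = root
    while node.feature is not None:
        go_left = patient[node.feature] <= node.threshold
        node = node.left if go_left else node.right
    return leaf_probability(node)


# Hypothetical tree: split on gender first, then on age for female patients.
tree = Node(
    n_total=100, n_positive=40, feature="is_female", threshold=0.5,
    left=Node(n_total=50, n_positive=15),            # male patients
    right=Node(
        n_total=50, n_positive=25, feature="age", threshold=45.0,
        left=Node(n_total=30, n_positive=18),        # female, age <= 45
        right=Node(n_total=20, n_positive=7),        # female, age > 45
    ),
)

p_young_female = predict_proba(tree, {"is_female": 1, "age": 40})
p_male = predict_proba(tree, {"is_female": 0, "age": 60})
```

Each leaf reports an estimated probability of response for the patients that reach it, which is what makes the model readable by clinicians.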
6. Proposed Solutions: Cost-sensitive classification
Which is worse?
Excluding from therapy a patient who could get better?
Treating a patient who gets no result?
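The trade-off between these two errors can be sketched as an expected-cost comparison. This is only an illustration under stated assumptions: the function names and the cost values are hypothetical, and the slides do not specify how costs were actually assigned.

```python
def treatment_threshold(cost_missed_responder: float,
                        cost_futile_treatment: float) -> float:
    """Probability of response above which treating has lower expected cost.

    Derived from: treat when (1 - p) * cost_futile_treatment
                             <= p * cost_missed_responder.
    """
    return cost_futile_treatment / (cost_futile_treatment + cost_missed_responder)


def should_treat(p_response: float,
                 cost_missed_responder: float,
                 cost_futile_treatment: float) -> bool:
    """Pick the action with the lower expected cost for one patient."""
    expected_cost_if_excluded = p_response * cost_missed_responder
    expected_cost_if_treated = (1.0 - p_response) * cost_futile_treatment
    return expected_cost_if_treated <= expected_cost_if_excluded


# Hypothetical costs: excluding a patient who would have responded is
# assumed to be four times worse than treating one who does not respond.
threshold = treatment_threshold(cost_missed_responder=4.0,
                                cost_futile_treatment=1.0)
decision_high = should_treat(0.30, 4.0, 1.0)
decision_low = should_treat(0.10, 4.0, 1.0)
```

With equal costs the threshold is 0.5; making a missed responder more expensive lowers it, so more patients are treated.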
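7. Results
PET example, with future doses:
[Figure: PET splitting on HCV-RNA at 1st month [log IU/mL]: ≤ 3.97 (379 patients, 63%) vs. > 3.97 (91 patients, 5%; subgroup 1). Subgroup 1 is further split on RBV dose percentage [%]: ≤ 90 (38 patients, 0%) vs. > 90 (53 patients, 9%), p = 0.051; and on IFN dose percentage [%]: ≤ 99 (54 patients, 0%) vs. > 99 (37 patients, 14%), p = 0.005.]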
New stopping criteria:
Criterion                                               Recall [%]  Precision [%]
Standard criterion
  EVR (3rd month)                                       35.3        100.0
New criteria
  HCV-RNA 1st month > 4.90                              40.3        100.0
  HCV-RNA 1st month > 3.97 and (IFN ≤ 99% or RBV ≤ 90%) 48.2        100.0
  ...
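As a sketch of how such a stopping criterion could be evaluated, the snippet below encodes the last rule from the table and computes recall and precision. Two things are assumptions: that the positive class is "non-responder" (so precision 100% means no responder would have been stopped), and the toy patient records, which are entirely hypothetical and not taken from the study data.

```python
from typing import List, Tuple


def stop_rule(hcv_rna_month1: float, ifn_dose_pct: float,
              rbv_dose_pct: float) -> bool:
    """HCV-RNA at 1st month > 3.97 and (IFN <= 99% or RBV <= 90%)."""
    return hcv_rna_month1 > 3.97 and (ifn_dose_pct <= 99 or rbv_dose_pct <= 90)


def recall_precision(flagged: List[bool],
                     non_responder: List[bool]) -> Tuple[float, float]:
    """Recall and precision in percent, with 'non-responder' as positive."""
    tp = sum(f and y for f, y in zip(flagged, non_responder))
    fp = sum(f and not y for f, y in zip(flagged, non_responder))
    fn = sum((not f) and y for f, y in zip(flagged, non_responder))
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, precision


# Hypothetical records: (HCV-RNA 1st month [log IU/mL], IFN %, RBV %, SVR?)
patients = [
    (5.2, 80, 100, False),
    (4.5, 100, 85, False),
    (4.5, 100, 100, False),
    (3.0, 70, 70, True),
    (3.5, 100, 100, False),
    (2.0, 100, 100, True),
]

flagged = [stop_rule(h, ifn, rbv) for h, ifn, rbv, _ in patients]
non_responder = [not svr for *_, svr in patients]
recall, precision = recall_precision(flagged, non_responder)
print(f"recall = {recall:.1f}%, precision = {precision:.1f}%")
```

On this toy data the rule flags two of the four non-responders and no responder, so it prints `recall = 50.0%, precision = 100.0%`.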