Invasive Aspergillosis (IA) is a serious fungal infection and a major cause of mortality in patients undergoing
allogeneic stem cell transplantation or chemotherapy for acute leukaemia. Large amounts of data are collected during the treatment of high-risk haematology patients and we
propose leveraging such data to produce more accurate predictions of IA diagnosis. We describe here the
application of machine learning techniques to predict probability of IA, which can be used to enhance the
interpretation of biomarker results.
Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning
1. Introduction Results Description Conclusions
HISA Big Data 2014 – April 3rd 2014 ( #BD14 )
Simone Romano
simone.romano@unimelb.edu.au
@ialuronico
James Bailey1
Lawrence Cavedon1,2,3
Orla Morrissey4,5
Monica Slavin6,7
Karin Verspoor1,2
1The University of Melbourne, Dept. of Computing and Information Systems
2NICTA (National ICT Aust.) VRL
3School of Computer Science and IT, RMIT University
4Alfred Health 5Monash University
6Peter MacCallum Cancer Centre 7Melbourne Health
Simone Romano The University of Melbourne
Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning
Introduction
Invasive Aspergillosis
Challenging Big Data Task
Results
Diagnostic Model
Description
Machine Learning for Diagnosis
Diagnosis of Invasive Aspergillosis
Conclusions
Summary
Future Work
Invasive Aspergillosis
Invasive Aspergillosis (IA)
Serious fungal infection and major cause of
mortality in patients undergoing allogeneic
stem cell transplantation or chemotherapy
for acute leukaemia.
Figure: Pulmonary IA (source: http://en.wikipedia.org/wiki/Aspergillosis).
Facts
34–43% mortality rate;
culture methods have low sensitivity: only 40–50% of IA cases are identified;
each IA case adds about 7 days of hospital stay and $30,957 in costs.
Diagnosis and Treatment
Cases are classified as Proven IA, Probable IA, or Possible IA.
Current criteria for diagnosing IA are:
1. microbiology, risk factors, and CT scan findings;
2. improved biomarkers such as Aspergillus PCR and Galactomannan
(GM), tested twice a week.
positive biopsy OR (positive CT scan AND single positive PCR/GM)
⇒ Proven IA
≥ 2 consecutive positive PCR/GM results within a 2-week time frame
⇒ Probable IA
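The two rules above can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the input format (a list of dated test results) and the reading of "consecutive" as "adjacent tests in date order" are hypothetical, and the slides do not define the rule for Possible IA, so the sketch returns None in that case.

```python
from datetime import date, timedelta

def classify_case(biopsy_positive, ct_positive, tests):
    """Apply the two rules from the slide (hypothetical helper).

    tests: list of (date, is_positive) PCR/GM results.
    Returns "Proven IA", "Probable IA", or None when neither rule fires.
    """
    tests = sorted(tests)
    any_positive = any(positive for _, positive in tests)
    # Rule 1: positive biopsy OR (positive CT scan AND a single positive PCR/GM)
    if biopsy_positive or (ct_positive and any_positive):
        return "Proven IA"
    # Rule 2: two adjacent positive tests no more than 2 weeks apart
    for (d1, p1), (d2, p2) in zip(tests, tests[1:]):
        if p1 and p2 and (d2 - d1) <= timedelta(days=14):
            return "Probable IA"
    return None

two_in_a_row = [(date(2008, 5, 1), True), (date(2008, 5, 5), True)]
print(classify_case(False, False, two_in_a_row))
```

A single positive test with no biopsy or CT support matches neither rule, which is exactly the ambiguous situation the model in this talk addresses.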
Problem
One single positive biomarker might be a False Positive
⇒ Unnecessary harmful treatment.
Challenging Big Data Task
Big Data task
In a randomised controlled trial comparing the two different strategies for
diagnosing IA, a large amount of data was collected from 240 patients
between Sept. 2005 and Nov. 2009 at six Australian centres.
Objective: leverage such data to produce more accurate predictions of
IA with Machine Learning techniques.
Are we really dealing with Big Data?
All patients tracked for 26 weeks providing rich longitudinal data on
daily and weekly tests for each patient.
240 × 26 × 7 = 43,680 records.
Bed-side interpretation is a challenging task!
Diagnostic Model
Model
Our training set is a collection of 358 single positive biomarker tests that
precede the earliest label of IA.
Figure: timeline from the start of transplant/chemotherapy over the 1st to 5th months, marking the positive biomarkers and the infection.
Just 29 of the positive biomarkers were associated with a Proven IA or
Probable IA label within a week (329 false positives)
We built a model that outputs the probability of infection within a
week;
Validated by a patient-level cross-validation framework.
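Patient-level cross-validation means that all tests from one patient fall on the same side of each train/test split, so the model is never evaluated on a patient it has already seen. A minimal sketch follows, assuming a list of patient identifiers aligned with the 358 tests; the function name, the number of folds and the seed are hypothetical, as the slides do not give these details.

```python
import random
from collections import defaultdict

def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs with no patient on both sides."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    for k in range(n_folds):
        # Every n_folds-th patient (starting at k) is held out in fold k.
        held_out = set(patients[k::n_folds])
        test_idx = [i for p in patients if p in held_out for i in by_patient[p]]
        train_idx = [i for p in patients if p not in held_out for i in by_patient[p]]
        yield train_idx, test_idx

# One patient id per positive biomarker test (toy data).
ids = ["a", "a", "b", "c", "c", "c", "d", "e"]
for train_idx, test_idx in patient_level_folds(ids, n_folds=3):
    print(sorted({ids[i] for i in test_idx}))
```

Splitting by test rather than by patient would let correlated tests from the same person leak between training and evaluation and inflate the reported performance.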
Figure: ROC curve of TPR against 1 − TNR, both axes from 0.0 to 1.0. AUC = 0.63.
An AUC of 0.63 is not too good.
But the model is good at classifying negatives!
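For reference, the AUC can be read as the probability that a randomly chosen true positive test receives a higher score than a randomly chosen false positive, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the toy scores are illustrative assumptions, not the study's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. A modest overall AUC does not rule out a useful operating point at one end of the curve.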
Result
By setting a low threshold on the model output probability to achieve a high
NPV (100%), we were able to identify 95 (26.5%) tests that do not
lead to an IA infection within a week (TNR = 28.9%).
⇒ Doctors can avoid starting treatment in 26.5% of cases!
avoid over-treatment;
reduce drug toxicity;
reduce antifungal drug costs
(e.g. Amphotericin B: $8,260 per patient per week).
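The quoted percentages follow from the counts: 95 of 358 tests is 26.5%, and 95 of the 329 false positives is 28.9%. The sketch below shows one way such a rule-out threshold could be chosen, assuming per-test scores and labels are available. The function name and the toy data are hypothetical. In the study the threshold was assessed under cross-validation, not on the training data as done here.

```python
def rule_out_at_full_npv(scores, labels):
    """Use the lowest score of any true positive as the threshold.

    Tests scoring strictly below it contain no true positives,
    so the NPV among ruled-out tests is 100% by construction.
    """
    threshold = min(s for s, y in zip(scores, labels) if y)
    ruled_out = sum(1 for s in scores if s < threshold)
    negatives = sum(1 for y in labels if not y)
    return {
        "threshold": threshold,
        "ruled_out": ruled_out,
        "fraction_of_tests": ruled_out / len(scores),
        "tnr": ruled_out / negatives,
    }

# Arithmetic from the slide.
print(round(100 * 95 / 358, 1), round(100 * 95 / 329, 1))  # 26.5 28.9

# Toy scores (not study data): 1 = IA within a week, 0 = false positive.
print(rule_out_at_full_npv([0.05, 0.1, 0.2, 0.3, 0.6, 0.9], [0, 0, 1, 0, 0, 1]))
```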
Machine Learning for Diagnosis
Classification Models
Logistic regression;
Decision trees;
Random forest
Figure: random forest. The training set is resampled five times, a random tree is grown on each resample, and the trees are combined by voting.
We chose random forest because:
it can work with heterogeneous features
(categorical/continuous);
it can work with many features.
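The resample-then-vote idea behind a random forest can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' implementation: each "tree" here is a single-split stump on one randomly chosen feature, whereas a real random forest grows deeper trees and samples features at every split. All function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, rng):
    """Fit a one-split stand-in for a random tree on a random feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=5, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap resample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    # Majority vote over the individual predictions.
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]
```

Because each split only compares one feature against a threshold, features on very different scales can be mixed without rescaling, which fits the heterogeneous clinical data described in this talk.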
Diagnosis of Invasive Aspergillosis
Features to use
Known at baseline: gender, age, BMI, smoking status, etc.
Tested daily: neutrophil count, body temperature, amount of
administered steroids, haemoglobin, platelets, white cell count, urea,
creatinine, ALT, AST, GGT, bilirubin, LDH, etc.
Very heterogeneous features!!!
Heterogeneous Features
Features constant throughout treatment: age, gender, etc.
Features that varied over time: neutrophil count, temperature,
corticosteroid doses, etc.
When we have a positive biomarker test, we can use information from the
recent past to predict IA. We define the recent past as the values in the
3-week window prior to a single positive test result.
Figure: temperature (36.5 to 38.5 °C) plotted against date (May to July), with the window marked.
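Selecting the 3-week window can be sketched as follows. The function name and the input format (a mapping from date to measured value) are hypothetical. The choice to exclude the day of the positive test itself is also an assumption, since the slides do not say whether that day belongs to the window.

```python
from datetime import date, timedelta

def recent_window(measurements, positive_test_date, weeks=3):
    """Return (day, value) pairs from the `weeks` weeks before the test.

    The window is [test date - weeks, test date): the test day is excluded.
    """
    start = positive_test_date - timedelta(weeks=weeks)
    return sorted(
        (day, value)
        for day, value in measurements.items()
        if start <= day < positive_test_date
    )

temperature = {
    date(2008, 5, 31): 36.8,
    date(2008, 6, 1): 37.2,
    date(2008, 6, 21): 38.4,
    date(2008, 6, 22): 38.9,
}
print(recent_window(temperature, date(2008, 6, 22)))
```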
Features that varied over time
Duration features: we count the number of days the value of each
parameter lay within a particular range. For example, we divide the
temperature measurements into the intervals [36,37],
(37,38], (38,39], (39,40], and >40 degrees Celsius,
and count the number of days the temperature fell in
each interval;
Trajectories: we select two days in the 3-week window preceding a
positive test and compute the mean value, the standard
deviation, and the relative difference between those values. We
do this for all possible intervals in the window.
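Both feature families can be sketched for one parameter, given its daily values inside the window. The slide leaves some details open, so the following are assumptions: the mean and standard deviation are taken over all values between the two selected days (inclusive), the population standard deviation is used, and the relative difference is (last − first) / first. Function names are hypothetical, and temperatures below 36 fall outside the slide's intervals and are not counted.

```python
from itertools import combinations
from statistics import mean, pstdev

BIN_NAMES = ["[36,37]", "(37,38]", "(38,39]", "(39,40]", ">40"]

def temperature_bin(value):
    """Map a temperature in degrees Celsius to one of the slide's intervals."""
    if value > 40.0:
        return ">40"
    if value > 39.0:
        return "(39,40]"
    if value > 38.0:
        return "(38,39]"
    if value > 37.0:
        return "(37,38]"
    if value >= 36.0:
        return "[36,37]"
    return None  # below 36: not covered by the slide's intervals

def duration_features(daily_values):
    """Count the days spent in each temperature interval."""
    counts = {name: 0 for name in BIN_NAMES}
    for value in daily_values:
        name = temperature_bin(value)
        if name is not None:
            counts[name] += 1
    return counts

def trajectory_features(daily_values):
    """Summarise every interval (i, j) with i < j inside the window."""
    features = {}
    for i, j in combinations(range(len(daily_values)), 2):
        segment = daily_values[i:j + 1]
        first, last = daily_values[i], daily_values[j]
        features[(i, j)] = {
            "mean": mean(segment),
            "std": pstdev(segment),
            "rel_diff": (last - first) / first if first != 0 else None,
        }
    return features

print(duration_features([36.5, 37.5, 38.5]))
print(trajectory_features([36.5, 37.5, 38.5])[(0, 2)])
```

With 21 daily values there are 21 × 20 / 2 = 210 intervals per parameter, so the trajectories alone generate a large number of features, which is one reason a method that copes with many features is attractive.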
Summary
Target: enhance biomarker-based diagnostics for Invasive
Aspergillosis;
Method: random forest for heterogeneous features, with engineered
duration features and trajectory features;
Validation: patient-level cross-validation;
Results: setting a low threshold on the output probability gives NPV =
100% and TNR = 28.9%. Safe avoidance of antifungal
therapy for 26.5% of cases. Savings of around $8K per patient
per week.
Future Work
make the model more accurate in predicting when a positive test is
associated with an immediate infection to trigger the antifungal
treatment earlier in time;
search for alternative diagnoses when the outcomes are equally
probable according to the model;
make the model output more interpretable to clinical practitioners,
e.g. by identifying the trajectories in the data which generate a low
or high probability of IA.
Thank you.
Questions?