1. Focused proteomic profiling for rapid detection of late-onset neonatal sepsis
Daniel Cannon¹, Robin Ohls¹, Carol Hartenberger¹, Hannah Peceny¹, Kari Ballard², Akhil Maheshwari³, Mohan Venkatesh⁴, Subramani Mani¹
¹University of New Mexico, Albuquerque, NM; ²Myriad RBM, Austin, TX; ³University of Illinois, Chicago, IL; ⁴Baylor University, Waco, TX
Background
Sepsis is a frequent and serious problem in neonatal intensive care units (NICUs), particularly among premature and very low birth weight (VLBW) infants.
As many as 20% of VLBW infants experience one or more episodes of late-onset sepsis (LOS), and LOS more than doubles the risk of death before discharge [1].
Despite the frequency with which life-threatening infections are encountered in the NICU, a rapid, sensitive, and specific diagnostic test remains elusive.
Objective
Using recent advances in machine learning and proteomic assay technology, we sought to derive and validate a computer algorithm for reliably detecting late-onset sepsis in preterm neonates using proteomic measurements.
Study Design
Infants were eligible for the study if they had a gestational age of ≤ 32 completed weeks, a birth weight of ≤ 1,500 grams, and a postnatal age of ≥ 5 days.
Infants were enrolled for 42 days, or until discharge, transfer to another institution, or death.
One drop (approximately 10-20 µL) of blood was periodically collected from each infant.
All 45 culture-positive cases were included for analysis, and 90 healthy controls were also included.
Figure 1: Following enrollment, blood samples were periodically collected from each patient. Patient charts were reviewed to determine class labels, and samples from infants with culture-positive sepsis and from healthy controls were included for analysis.
Subject Characteristics
                      Controls      Cases        P
Subjects              88            45           -
Female (%)            46 (50)       25 (60)      0.85
Birth weight [g]
  Mean (σ)            1011 (279)    790 (214)    < 0.001
  IQR                 813 – 1204    650 – 875
  Median              999           740
Gestation [wks.]
  Mean (σ)            28 (2)        26 (2)       < 0.001
  IQR                 26 – 30       24 – 27
  Median              28            25
Analysis
Myriad RBM analyzed the blood samples using a focused proteomic assay of 90 potential biomarkers believed to play a role in inflammation and infection.
For each case, we defined t0 as the day on which the physician first suspected sepsis. For controls, we selected t0 to approximate the t0 distribution of the cases.
Using ten independent trials of stratified ten-fold cross-validation, we assessed the performance of predictive models constructed by applying three machine learning methods (see below) to the post-processed proteomic data.
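The resampling scheme described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the round-robin fold assignment, and the seeding scheme are assumptions, and the label vector only borrows the group sizes from the Subject Characteristics table.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so that overall fold sizes stay balanced.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds

def repeated_stratified_cv(labels, k=10, trials=10, seed=0):
    """Yield (trial, fold, train_indices, test_indices) for every split."""
    for trial in range(trials):
        # A different seed per trial gives an independent shuffle (assumption).
        folds = stratified_folds(labels, k=k, seed=seed + trial)
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield trial, f, train, test

# Hypothetical labels: 1 = case, 0 = control (group sizes only, no real data).
labels = [1] * 45 + [0] * 88
splits = list(repeated_stratified_cv(labels))
```

Each of the 100 resulting splits holds out roughly one tenth of the cases and one tenth of the controls; a model would be trained on `train` and scored on `test`, and the per-trial metrics averaged afterwards.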
We used a nested five-fold cross-validation to identify a threshold at which sensitivity was ≥ 0.80 and accuracy was maximal. If no such threshold could be identified, then the threshold was chosen to maximize accuracy with no constraint on sensitivity.
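The threshold rule above can be illustrated with a short sketch. It applies the rule to a single set of held-out scores and omits the nested cross-validation that would produce them; the function names, the explicit `candidates` grid, and the toy scores are all assumptions made for illustration.

```python
def sensitivity_and_accuracy(scores, labels, threshold):
    """Call a sample a case (label 1) when its score is >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    positives = sum(labels)
    sensitivity = tp / positives if positives else 0.0
    accuracy = (tp + tn) / len(labels)
    return sensitivity, accuracy

def choose_threshold(scores, labels, candidates, min_sensitivity=0.80):
    """Pick the most accurate candidate whose sensitivity meets the floor.

    If no candidate reaches min_sensitivity, fall back to the most accurate
    candidate overall. Ties go to the earliest candidate in the list.
    """
    scored = [(t, *sensitivity_and_accuracy(scores, labels, t))
              for t in candidates]
    eligible = [row for row in scored if row[1] >= min_sensitivity]
    pool = eligible if eligible else scored
    best = max(pool, key=lambda row: row[2])
    return best[0]

# Toy held-out scores (hypothetical, not study data).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

constrained = choose_threshold(scores, labels, [0.25, 0.35, 0.65])
fallback = choose_threshold(scores, labels, [0.65, 0.75])
```

In the first call, 0.65 is excluded because its sensitivity is 0.75, so the rule selects 0.35; in the second call no candidate reaches the sensitivity floor, so the unconstrained fallback selects 0.65.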
Machine Learning Methods
One-R
A naïve univariate approach which outputs a single variable and a cut-off which most accurately partitions the training data into cases and controls.
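A single-variable, single-cut-off rule of the kind described above might be searched for as in the sketch below. The exhaustive midpoint search, the function name `one_r`, and the variable names `marker_a` and `marker_b` are assumptions for illustration; the poster does not state how its One-R implementation chose candidate cut-offs.

```python
def one_r(rows, labels):
    """Find the variable, cut-off, and direction with the best training accuracy.

    rows is a list of dicts mapping variable name to a numeric value;
    labels holds 1 for cases and 0 for controls.
    Returns (accuracy, variable, cutoff, high_is_case), or None when no
    variable has two distinct values.
    """
    best = None
    for var in rows[0]:
        values = sorted(set(r[var] for r in rows))
        # Candidate cut-offs: midpoints between consecutive distinct values.
        cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for cutoff in cutoffs:
            for high_is_case in (True, False):
                correct = sum(
                    1 for r, y in zip(rows, labels)
                    if ((r[var] > cutoff) == high_is_case) == (y == 1)
                )
                accuracy = correct / len(labels)
                if best is None or accuracy > best[0]:
                    best = (accuracy, var, cutoff, high_is_case)
    return best

# Toy training data with hypothetical marker names (not study data).
rows = [
    {"marker_a": 1.0, "marker_b": 5.0},
    {"marker_a": 2.0, "marker_b": 1.0},
    {"marker_a": 8.0, "marker_b": 4.0},
    {"marker_a": 9.0, "marker_b": 2.0},
]
labels = [0, 0, 1, 1]
rule = one_r(rows, labels)
```

On this toy data the search returns `marker_a` with a cut-off of 5.0, where values above the cut-off are called cases.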
Classification and Regression Trees (CART)
The CART [2] approach uses recursive partitioning to generate a decision tree (see Figure 2), which can be applied to determine the likelihood that a test instance is a case or control.
Random Forest
In the Random Forest [3] approach, 1,000 CART trees are generated using random subsets of the training data. Test instances are then classified by aggregating the votes of each tree.
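The resample-and-vote idea can be sketched in a few lines. This is only an illustration under stated assumptions: a one-variable decision stump stands in for a full CART tree, bootstrap sampling with replacement stands in for the "random subsets", and the random selection of candidate variables at each split used by Random Forest is omitted. All function names and data are hypothetical.

```python
import random

def train_stump(values, labels):
    """Stand-in for a CART tree: one cut-off midway between the class means."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    if not cases or not controls:
        # The resample contained one class only: always predict that class.
        only_class = 1 if cases else 0
        return lambda v: only_class
    case_mean = sum(cases) / len(cases)
    control_mean = sum(controls) / len(controls)
    cutoff = (case_mean + control_mean) / 2
    case_is_high = case_mean >= control_mean
    return lambda v: int((v > cutoff) == case_is_high)

def train_ensemble(values, labels, n_trees=1000, seed=0):
    """Fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(values)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(train_stump([values[i] for i in idx],
                                 [labels[i] for i in idx]))
    return trees

def vote_fraction(trees, value):
    """Fraction of trees voting 'case'; usable as a score for thresholding."""
    votes = [tree(value) for tree in trees]
    return sum(votes) / len(votes)

# Toy data (hypothetical): three controls with low values, three cases with high.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1, 1]
trees = train_ensemble(values, labels)
high_score = vote_fraction(trees, 8.5)
low_score = vote_fraction(trees, 1.5)
```

The vote fraction gives a continuous score between 0 and 1, which is what makes a threshold-based operating point and an AUC meaningful for an ensemble of yes/no classifiers.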
Results
Figure 2: A streamgraph depicting the cross-validated performance of Random Forest and the critical variables for prediction at each day relative to the day on which sepsis was suspected (t0). The width of the entire curve represents the AUC of the model for the given t, while the width of each individual band depicts the relative importance of the corresponding variable.
[Bar chart: AUC, sensitivity, and specificity (vertical axis, 0.0 to 1.0) by machine learning method (One-R, CART, Random Forest). Bar labels as extracted: 0.66, 0.64, 0.82, 0.49, 0.77, 0.81, 0.83, 0.35, 0.72.]
Figure 3: A comparison of the average cross-validated performance of models generated by One-R, CART, and Random Forest using data available on t0 (the day on which sepsis was suspected). Error bars represent one standard deviation of the mean.
As is evident in Figure 3, Random Forest substantially outperformed other methods for t0 with an average AUC of 0.82 (σ = 0.01), compared to an AUC of 0.64 (σ = 0.04) obtained by CART and 0.66 (σ = 0.03) obtained by One-R.
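For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it that way and then summarizes a set of per-trial values; the function name, the toy scores, and the per-trial values are hypothetical, and the use of the sample standard deviation is an assumption, since the poster does not say which estimator was used.

```python
from statistics import mean, stdev

def auc(scores, labels):
    """Probability that a case outscores a control; ties count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))

# Toy scores (hypothetical): one of the four case/control pairs is misordered.
toy_auc = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])

# Hypothetical per-trial AUCs, summarized as a mean and standard deviation.
trial_aucs = [0.70, 0.80, 0.90]
summary = (mean(trial_aucs), stdev(trial_aucs))
```

The pairwise form is quadratic in the number of samples, which is acceptable at the scale of a study with a few dozen cases and controls.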
[Line plot: sensitivity, accuracy, and specificity (vertical axis, 0.0 to 1.0) versus days before or after suspicion of sepsis (t), from -10 to 5.]
Figure 4: The average cross-validated accuracy, sensitivity, and specificity of Random Forest models by day (relative to the day sepsis was suspected). Error bars represent one standard deviation of the mean.
When using only data available on the day before sepsis was suspected (t−1), Random Forest averaged a sensitivity of 0.80 (σ = 0.02), a specificity of 0.62 (σ = 0.03), an accuracy of 0.68 (σ = 0.01), and an AUC of 0.78 (σ = 0.01).
Discussion and Conclusions
While univariate diagnostic algorithms are easy to comprehend, more complex algorithms that combine multiple biomarkers are more accurate and likely more robust.
A proteomics-based tool could improve the ability of clinicians to diagnose sepsis in the NICU, and may enable detection of sepsis up to 24 hours earlier.
Of the proteins analyzed, only C-Reactive Protein (CRP) is currently used clinically to diagnose neonatal sepsis, yet our results indicate that other biomarkers may be far more predictive, especially in the early stages of infection.
References
[1] B. J. Stoll, N. Hansen, A. A. Fanaroff, et al., “Late-onset sepsis in very low birth weight neonates: the experience of the NICHD neonatal research network”, Pediatrics, vol. 110, no. 2, Aug. 2002.
[2] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and regression trees. CRC press, 1984.
[3] L. Breiman, “Random forests”, Machine Learning, vol. 45, no. 1, pp. 5–32, 2001, ISSN: 0885-6125. DOI: 10.1023/A:1010933404324.
This research was supported by the Clinical Translation Science Award UL1TR000154 and the Small Business Innovation Research Award R44 GM082038-01.
dccannon <at> salud.unm.edu Translational Informatics Division · Department of Internal Medicine · University of New Mexico http://medicine.unm.edu/informatics