Anjali_Ganguly_Siemens_2014

Towards a Novel Model for the Prognosis of Acute Myeloid Leukemia Patients: An Evaluation
of Cancer Modeling Techniques
Executive Summary
In recent years, as the incidence of cancer continues to climb, much research attention has
been devoted to alternative methods of cancer management, among them computational
modeling. While computational modeling holds great promise for cancer researchers, it also faces
several challenges, including the lack of a robust, clinically applicable cancer model.
This study aims to develop an accurate, comprehensive predictive model for patients with Acute
Myeloid Leukemia, one of the most devastating cancers. This paper presents the initial
development of this model with the evaluation of the predictive potential of two widely used
computational models, the Artificial Neural Network (ANN) and Logistic Regression (LR). Both
the ANN and LR were developed in Python using protein expression data from 191 AML
patients so that, given a patient’s profile at the time of diagnosis, the model could predict
whether the patient would go into remission or exhibit resistance to current therapies for AML.
The predictive potential of these models was measured using five criteria: sensitivity, specificity,
positive predictive value, negative predictive value, and overall accuracy. The ANN exhibited
excellent recognition of patients likely to go into remission, which would be of great value in
the clinic. While the LR did not perform as well, it did yield important statistical information that
makes it a likely supplement within a larger, more robust model. Our results indicate that
integrating the ANN and LR will likely yield the most effective predictive tool, as the two
together compensate for the flaws they exhibit individually. This research is the first step in the
creation of a novel tool that doctors can use to predict a patient’s prognosis at the time of
diagnosis, contributing to a more individualized, effective approach to cancer therapy.

Abstract
This paper presents the initial stages of the development of a predictive model for Acute Myeloid
Leukemia (AML) patients. This study assessed the respective advantages and disadvantages of
two widely used computational models, the Artificial Neural Network (ANN) and Logistic
Regression (LR), in order, first, to evaluate their potential to predict, given a patient’s profile at
diagnosis, whether the patient will enter remission or exhibit primary resistance and, second, to
offer a broader understanding of the current needs of cancer modeling. The ANN and LR
were both developed in Python using proteomic data (231 proteins) obtained from 191 AML
patients. The predictive potential of each model was measured using sensitivity, specificity, positive
predictive value, negative predictive value, and overall accuracy. The ANN exhibited
excellent specificity and positive predictive value, indicating its potential for clinical application;
however, it also exhibited a tendency to overfit the data, confirming one hypothesis of this research.
While the LR was a relatively poor performer, it did yield important statistical information that
makes it a likely supplement within a larger, more robust model. Thus, our results point toward
integrating the two modeling techniques to create the novel predictive tool for physicians that we
envision.

1. Introduction
Cancer persists today as one of the leading causes of mortality worldwide, threatening to
claim almost 600,000 lives in the United States alone this year (ACS 2014). As the
incidence of cancer continues to climb, researchers are prompted to go beyond traditional
techniques of treatment to explore alternative methods of cancer management. Among the
various methods gaining popularity, mathematical and computational modeling has become an
especially promising new avenue for cancer research.
The medical community has mastered the art of data collection with regard to cancer: today,
gathering and storing data about a patient’s clinical and genetic correlates is no longer the
challenge. Instead, extracting valuable information from this wealth of “big data” is
the next large obstacle for cancer researchers to surmount (Quaranta et al. 2005). This is where
computational modeling, or a “top-down” approach to medicine, offers great potential: by
providing researchers with tools for large-scale analysis and with enormous predictive capacity,
computational modeling may improve the management of cancer starting from diagnosis. As the
focus on cancer prediction models has grown, ensuring their proper development and use has
become of the utmost importance (Durrett 2013).
However, cancer modeling faces challenges that currently prevent it from being
widely adopted in clinics. As the type and number of cancer models continue to rise, the
relative lack of assessment of the specific strengths and weaknesses of individual cancer models
has resulted in a plethora of models that are functional but severely limited in scope, rendering
these models useless in the clinic (Freedman et al. 2005). There is a need for a better
fundamental understanding of cancer modeling techniques in order to identify the best strategies
to apply when modeling a specific type of cancer.
This study focuses on Acute Myeloid Leukemia (AML), a cancer originating in bone
marrow and the bone’s soft tissue. A heterogeneous clonal disorder in which hematopoietic
precursors are restricted in early stages, AML has garnered much research attention in recent
years; however, a better understanding of the pathophysiology of AML has not greatly improved the
five-year survival rate, which barely reaches 24% (Stone et al. 2004). Several reasons account for the
low rate of survival, including the rapid progression of the disease (as denoted by the term
“acute”), but more important in limiting recovery is the ineffective use of current
AML therapies. Current therapies, which initially concentrate on chemotherapy coupled with
allogeneic hematopoietic stem cell transplantation, are often followed by relapse in AML patients. Thus,
chemotherapy doses must be repeatedly increased depending on the specific patient’s
reaction, and this process eventually destroys bone marrow and results in the high mortality rate
(Stone et al. 2004). The observed resistance to chemotherapy has in turn contributed to a hypothesis
that the stem cells causing AML are particularly resistant to chemotherapy, eliminating one
heavily used avenue in cancer treatment and highlighting the need for a tool capable of
identifying the best treatment strategy for a given patient.

Thus, the ultimate goal of our research is to address this need by developing a prediction
model that, given a patient’s profile, can predict
whether the patient will be primary resistant (exhibiting a resistance to current therapies should they be
applied) or will reach complete remission (becoming cancer-free for a given amount
of time). Used as a predictive tool at the time of diagnosis, the model’s prediction
(either remission or resistance) would enable doctors to decide
which treatment strategy would be best for the patient under consideration and
to deliver an optimal combination of therapies, in essence creating a
more effective, individualized approach to cancer therapy.
This paper presents the initial stages of this enterprise. Given the current challenges in
the computational modeling of cancer, we sought to specifically evaluate the predictive potential
of widely used modeling techniques and assess their value in order to facilitate the development
of the comprehensive, optimal diagnostic tool we envision.
Figure 1: This figure depicts the dismally low survival rate for AML patients (both male and female) in the U.S.,
highlighting the need for a better treatment strategy for AML. Data collected between 1988 and 2007. Figure
obtained from http://www.medinfographics.com/ and data corroborated from the American Cancer Society
(http://www.cancer.org).

After much theoretical evaluation, we chose to work with two currently used modeling
techniques, the Artificial Neural Network (ANN) and Logistic Regression (LR), to assess their
predictive potential. Prior research indicates that the ANN, a type of machine learning algorithm, is
well suited to a nonlinear, dichotomous outcome, the type of outcome expected for
our models, and that it learns easily, rendering highly accurate output. However, it is expected to
exhibit a tendency to overfit data, given the extensive emphasis it places on “learning” the
training dataset completely (Burke et al. 1997). On the other hand, the LR, a standard model used
in probabilistic statistics, can take in several features and is easy to set up, although it has been
widely recognized by the modeling community to exhibit a tendency to underfit data, given its
relative simplicity (Smith et al. 1982). Thus, we hypothesized that, by assessing the tendency of
the ANN to overfit and the LR to underfit simultaneously, we would gain a better understanding
of how to potentially use the two modeling techniques to complement each other and produce a
more accurate, comprehensive model.
2. Methods
2.1 Overview of Methods
The process of developing and evaluating the code for the cancer models selected can be
divided into two main steps: data and models. The first large body of work this paper presents
regards organizing, visualizing, and sorting the data to prepare a dataset ready for input into the
modeling code. The second body of work, the process of coding the models, used the dataset
prepared by Step 1 for developing and evaluating model performance.
2.2 Organization and Visualization of Data
Data were collected by Dr. Steven M. Kornblau over the course of eight years from 191
patients at the M. D. Anderson Cancer Center in Texas. Each patient’s data consisted of clinical
correlates, including patient demographics, mutation status, and cytogenetic category, and
proteomic measurements from 231 antibodies indicating the level of total or phosphorylated
proteins on a scale of -5 to 5 for each given protein. For the purposes of this study, only protein
data were considered in developing prediction models; it was hypothesized that, as proteins play
an important role in the regulation of apoptosis, protein expression could significantly affect
cancer prognosis. A schematic representation of the format of the original protein data is
depicted in the following table:
Patient   ACTB          AIFM          AKTI1_2_3.pS473   … FN1          … ZNF346
1         0.185844713   -0.21783783   -0.097350442      -0.97929751    -0.15678498
2         0.24250154    0.770726524   0.277496831       -0.34302295    1.260718032
3         0.555895971   0.331599622   -1.614156048      1.626415093    0.147560248
… 191     1.220082392   0.233040465   1.548540573       -0.80614041    -0.30311939
Table 1: This table is a schematic representation of the type of data in the protein dataset. For each of the 231
named proteins, each of the 191 patients had a value between -5 and 5; this value, although initially inconsistent
between patients, corresponds to the level of expression of the protein under consideration. Data obtained from M. D.
Anderson Cancer Center.
Additional visualization techniques, especially heatmaps, were used to better understand the protein
expression level for each patient. The heatmap is interactive and can be
segmented as different groups of proteins are selected, allowing patients to be categorized by
their relative expression levels. While not of critical relevance to the development of the
prediction models, the heatmaps were an important preliminary tool that allowed us to
understand the format of the data more clearly and guided the development of a data-sorting
algorithm (discussed in section 2.3).

2.3 Sorting of Data
After understanding the data, we wanted to eliminate all outliers in order to
have a “clean” dataset. As we were working with human data, no established standard for
comparison for proteomic data existed. To account for this lack of a control, we started
by confirming that the dataset was well modeled by a normal distribution using the Shapiro-Wilk
normality test. This test checks the null hypothesis that the sample comes from a
normally distributed population using the following standard statistic:
$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$ (Eq. 1)
where $x_{(i)}$ is the $i$-th smallest value in the sample, $\bar{x}$ is the sample mean, and the $a_i$ are
the tabulated Shapiro-Wilk coefficients.
After determining that our dataset was indeed consistent with a normally distributed population, we began the
elimination of outliers. A script to remove outliers beyond a 3.5 σ threshold was written in
Python. The following standard formula was utilized to calculate standard deviation values:
(Eq. 2)
Using this equation, which only includes values within a limited threshold, it was confirmed that
none of our data points were far from the median values. The script was run in iterations to
remove all detectable outliers to ensure that the data describe the majority of patients, which is critical to
producing an accurate model. All data points outside the 3.5 σ range for each individual protein
were eliminated. Eliminating those patients from the training data gave us a cleaner and more
statistically sound dataset with which to develop the models. Although clinical data were not used
extensively in this stage of the study, we subdivided the patients by their individual clinical
covariates and noticed that patients with certain clinical characteristics and protein expression
levels tend to exhibit similar trends in remission rates, an interesting finding for
future consideration.
Figure 2: This figure depicts one of the primary forms of data visualization used in this study: the protein
heatmaps. Each ring in the heatmap represents a selected protein; in this heatmap, red corresponds to a higher
relative expression level, and blue corresponds to a lower expression level. Heatmap visualization technique
obtained from Sage Synapse: Dream 9 Challenge.
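The iterative outlier removal described in this section can be sketched as follows. This is a minimal illustration and not the script used in the study: the function name `remove_outliers`, the use of the mean as the center, the population standard deviation, and the sample values are all assumptions made for the example.

```python
import statistics


def remove_outliers(values, threshold=3.5):
    """Repeatedly drop values lying more than `threshold` standard deviations
    from the mean until no further values are removed."""
    kept = list(values)
    while len(kept) > 2:
        mean = statistics.fmean(kept)
        sd = statistics.pstdev(kept)  # population standard deviation (assumption)
        if sd == 0:
            break
        filtered = [v for v in kept if abs(v - mean) <= threshold * sd]
        if len(filtered) == len(kept):
            break  # nothing removed on this pass, so the data are stable
        kept = filtered
    return kept


# Hypothetical expression values for one protein: 20 typical readings
# plus a single extreme reading at the top of the -5 to 5 scale.
expression = [0.1 * (i - 10) for i in range(20)] + [5.0]
cleaned = remove_outliers(expression)
```

Recomputing the standard deviation on each pass matters because an extreme value inflates the spread and can mask milder outliers during the first pass.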
2.4 Model Development
Following the preparation of a cleaned dataset, this dataset was used as the training set to
develop prediction models, both with an Artificial Neural Network and Logistic Regression.
Both models were coded in Python to output a binary value: either 1 to indicate a patient going
into remission, or 0 to indicate primary resistance. Furthermore, both were developed with a
split-sample technique, using 75% of the training dataset for development and 25% for testing
and redesign. The Artificial Neural Network and Logistic Regression were each evaluated
separately to determine the possibility of integration.
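A split-sample partition of this kind can be sketched as below. The helper name `split_sample`, the fixed random seed, and the use of 191 patient identifiers are illustrative assumptions; the cleaned dataset in the study may have contained fewer patients, and the actual partitioning procedure is not described.

```python
import random


def split_sample(patients, train_fraction=0.75, seed=0):
    """Shuffle the patients and split them into development and test subsets."""
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical patient identifiers 1..191.
patient_ids = list(range(1, 192))
train_ids, test_ids = split_sample(patient_ids)
```

Fixing the seed keeps the same patients in the test subset across runs, so changes in model performance are not confounded with changes in the split.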
2.4.1 Artificial Neural Networks (ANN)
To reiterate, an Artificial Neural Network, or ANN, is a type of machine learning model that
seeks to mimic the human brain; specifically, it models neurons and their connections. Each
neuron is referred to as a node, and these nodes connect to form a network that mimics
the biological nervous system. Much like a human child, an ANN learns by practice and thus
needs to be trained on a given dataset once created. During the training process, the ANN learns
to recognize patterns and complexities inherently present in the data before computing an output.
As shown in the oversimplified diagram of Figure 3, an ANN consists of input nodes, or neurons,
which make up the input layer. These inputs are fed to one or more hidden layers, which use the
ANN’s built-in algorithms to process the inputs and link them to an output layer, where the
answer is produced.
Figure 3: This figure is an oversimplified representation of the architecture of a neural network. Input (protein
data) is fed in through the input nodes, after which the network uses the built-in algorithms in its hidden layer to
form connections between the input data points, strengthen connections through backpropagation, and output a
binary outcome.
The ANN used in this study was developed with 231 input nodes, 128 hidden nodes, and 1 output
node. The tanh squashing function was chosen as the activation function of the hidden layer, as it
limits complexities in the learning algorithm. The ANN was trained using supervised learning
via the PyBrain library’s backpropagation algorithm, which calculates the gradient of a loss
function with respect to all weights in the network, propagates it backward through the
network, and updates all parameters so as to minimize error. A stopping threshold was set arbitrarily: the
ANN was trained on the dataset until its error was less than 1e-15. In addition, in order to find
the optimal weights for a given network, networks were created and evaluated 1000 times (an
arbitrary number set through experimental observation), and the network that yielded the highest
overall accuracy was saved and used for further evaluation.
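To make the backpropagation step concrete, the sketch below trains a very small network with a tanh hidden layer in plain Python. It does not use PyBrain and is not the network from this study: the tiny 2-3-1 architecture, the sigmoid output node, the squared-error loss, the learning rate, and the toy samples are all assumptions chosen to keep the example short.

```python
import math
import random


def train_tiny_network(samples, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network (tanh hidden, sigmoid output)
    with per-sample backpropagation; returns (forward, initial_error, final_error)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # w_hidden[j]: weights from every input (plus a bias term) to hidden node j.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    # w_out: weights from every hidden node (plus a bias term) to the output node.
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [math.tanh(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in w_hidden]
        z = sum(w * h for w, h in zip(w_out, hidden + [1.0]))
        return hidden, 1.0 / (1.0 + math.exp(-z))

    def mean_squared_error():
        return sum((forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)

    initial_error = mean_squared_error()
    for _ in range(epochs):
        for x, y in samples:
            hidden, out = forward(x)
            # Error signal at the output (gradient of half the squared error
            # with respect to the output node's pre-activation).
            delta_out = (out - y) * out * (1.0 - out)
            # Propagate the error backward through the tanh hidden layer,
            # using the output weights as they were before this update.
            delta_hidden = [delta_out * w_out[j] * (1.0 - hidden[j] ** 2) for j in range(n_hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_hidden[j] * v
    return forward, initial_error, mean_squared_error()


# Hypothetical two-feature profiles labeled 1 (remission) or 0 (resistance).
toy_samples = [([-1.0, -0.5], 0), ([-0.8, 0.2], 0), ([0.9, 0.4], 1), ([1.2, -0.3], 1)]
forward, initial_error, final_error = train_tiny_network(toy_samples)
```

Driving the training error toward a very small threshold, as described above, is exactly the regime in which a network memorizes its training set, which is why performance on held-out data is reported separately.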
2.4.2 Logistic Regression (LR)
The LR, a standard probabilistic model used in statistics, passes a weighted combination of the
inputs through the logistic function
$$f(t) = \frac{1}{1 + e^{-t}}$$ (Eq. 3)
where $t$ is the weighted sum of the inputs; it assigns coefficients, or weights, to the input features in order to
output a probability value between 0 and 1. For the purposes of this study, 0 would mean the
patient is primary resistant, and 1 would indicate that the patient goes into remission. To reiterate, the LR
does have several advantages: it is much more robust than a linear regression, does not tend to
overfit data, and can take in many input features. However, as previously discussed, it
may in theory be hampered by its assumption of linearity and consequent tendency to underfit. We
hypothesized that, if this drawback were indeed revealed in the course of this research as well, it
would likely need to be accounted for in order to produce a robust model.
The LR in this study was developed using modules from the scikit-learn library in
Python. An L1-norm penalty (L1 regularization) was chosen; although an L1-norm does not
always yield a stable solution, it is robust and has a built-in feature selection method, factors we
considered important when working with protein data of a diverse nature. The regularization
parameter C was experimentally set to 1.0. Following evaluation of the model’s output, the
capacity of the LR to output probability values, a confidence interval, and decision scores for
each of the patients in the test set was explored.
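The relationship between a decision score, a probability, and the final 0/1 label can be sketched in a few lines. The weights, intercept, and profile below are hypothetical values chosen for illustration, and `predict_remission` is a helper name introduced here, not part of scikit-learn. In scikit-learn, the corresponding quantities are available from a fitted `LogisticRegression` through `decision_function` and `predict_proba`.

```python
import math


def logistic(t):
    """Logistic function: maps any real-valued score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_remission(profile, weights, intercept, cutoff=0.5):
    """Return (decision score, probability of remission, predicted label)."""
    decision_score = intercept + sum(w * x for w, x in zip(weights, profile))
    probability = logistic(decision_score)
    label = 1 if probability >= cutoff else 0  # 1 = remission, 0 = primary resistance
    return decision_score, probability, label


# Hypothetical coefficients for three proteins; an L1 penalty tends to set
# some coefficients to exactly zero, which acts as feature selection.
weights = [0.8, 0.0, -1.2]
intercept = 0.1
profile = [0.5, -0.3, -1.0]  # hypothetical expression values for one patient

score, prob, label = predict_remission(profile, weights, intercept)
```

Keeping the probability alongside the label is what allows the model's confidence in an individual patient's prediction to be inspected later.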

2.5 Model Evaluation
Both the ANN and LR models were evaluated following the calculation of True Positive
(TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts using a module we
developed to output a confusion matrix:
                       Positive (actual)    Negative (actual)
Positive (predicted)   True Positive        False Positive
Negative (predicted)   False Negative       True Negative
Calculating the TP, TN, FP, and FN counts allowed us to derive a range of measures; we specifically
focused on the following five criteria, as prior research indicates that they are the most clinically relevant:
1. Sensitivity: the likelihood of the test to correctly identify a patient who will go into
remission
2. Specificity: the likelihood of the test to correctly identify a patient who will not go into
remission
3. Positive Predictive Value (PPV): the probability that a patient who tests positive will
actually go into remission
4. Negative Predictive Value (NPV): the probability that a patient who tests negative will
not go into remission and exhibit resistance
5. Accuracy: the overall accuracy of the model, as computed by the sum of the number of
correctly classified positives and number of correctly classified negatives divided by the
total number of data points in the dataset.
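The five criteria follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions and takes remission (label 1) as the positive class, consistent with the models' binary output; the function names and the counts are hypothetical and are not taken from the study.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels, with 1 as the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(tp, tn, fp, fn):
    """Compute the five evaluation criteria from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # actual positives correctly identified
        "specificity": tn / (tn + fp),  # actual negatives correctly identified
        "ppv": tp / (tp + fp),          # predicted positives that are correct
        "npv": tn / (tn + fn),          # predicted negatives that are correct
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts for a test set of 100 patients.
metrics = evaluate(tp=40, tn=30, fp=10, fn=20)

# Hypothetical label lists showing how the counts are obtained.
counts = confusion_counts(actual=[1, 1, 0, 0, 1], predicted=[1, 0, 0, 1, 1])
```

Sensitivity and specificity are conditioned on the actual outcome, while PPV and NPV are conditioned on the prediction, so the two pairs answer different clinical questions.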
Following the evaluation of the models with the abovementioned criteria, additional avenues for
research were explored, including the ability of the LR to produce information such as a
confidence interval.
Table 2: This table depicts the structure of the confusion matrix used to evaluate results.

3. Results
This study presents the beginning stages of a long-term effort to establish a thorough
basis for researchers in the pursuit of developing a comprehensive model for AML patients.
However, it is hoped that this study and similar ones in the future simultaneously contribute to a
broader understanding of cancer modeling techniques as a whole. During this project, we
evaluated two widely used modeling techniques—the ANN and LR—in view of this long-term
goal; thus, our results are presented in line with their potential implications.
3.1 ANN Results
The following table summarizes the results obtained from the ANN on the test dataset:
Criterion      Value
Sensitivity    0.82
Specificity    0.93
PPV            0.96
NPV            0.72
Accuracy       0.86
Table 3: ANN results with the test dataset with respect to the 5 criteria
Significant features in these results include the high specificity and PPV values, which indicate
to us that the ANN developed succeeds in correctly identifying patients who will indeed go into
remission. Thus, the ANN specifically exhibits the capacity to recognize protein profiles of
patients almost certain to go into remission, a predictive capacity that would be of high clinical
value. However, the ANN exhibits difficulty in recognizing profiles of patients likely to be
resistant, an area that would need to be improved upon to create a fully comprehensive
model for clinical use.
To evaluate the tendency of the ANN to overfit the data, we additionally tested it with its
own training dataset, for which the results are summarized below:
Criterion      Value
Sensitivity    1.0
Specificity    1.0
PPV            1.0
NPV            1.0
Accuracy       1.0
Table 4: ANN results with the training dataset with respect to the 5 criteria
These results confirm the ANN’s tendency to overfit, as revealed by the perfect scores output
when tested on the training data, indicating that, while the ANN is excellent at learning the
trends and patterns inherent in a dataset, it would likely need either a larger dataset or integration
with another, simpler model to increase its accuracy with unknown data. Nevertheless,
this result is significant, as it confirms both the ANN’s major successes and its drawback and
thus highlights the path needed to use an ANN most effectively.
3.2 LR Results
The following table summarizes the results obtained from the LR:
Criterion      Value
Sensitivity    0.71
Specificity    0.43
PPV            0.71
NPV            0.43
Accuracy       0.62
Table 5: LR results with the test dataset with respect to the 5 criteria
As displayed by the LR’s relatively poor performance, especially in
specificity and NPV, it became evident to us that the LR, which likely does underfit as
hypothesized, is unreliable when used as the sole predictive model. Thus, we explored other
means of exploiting the LR’s capacities to use it as a supplementary, or confirmatory, model in
the future. We developed modules to successfully extract probability scores and a confidence
interval for each patient’s profile, a development that is significant, as it allows us to identify
the model’s confidence regarding each individual patient. This information, when integrated into
another model, would establish in essence a system of “checks and balances” where a potentially
incorrect output may be reevaluated and corrected with the additional input of probability and
confidence scores.
4. Application
We believe that this study, which is unique in its exploration of different avenues with the
goal of developing the optimal model, has several implications for cancer research, both
immediately and in the long term. The current results of both models point us towards an
integration of the ANN and LR, a little-investigated approach to cancer modeling that may yield
the most promising results: the ANN, an otherwise powerful predictive tool, can compensate for
its one drawback, its tendency to overfit, by incorporating models such as the LR, which are
unreliable when used alone but provide essential information as input to a greater,
comprehensive model. Perhaps more importantly, in the long term, the models we are continuing
to research and develop are the fundamental steps in creating a novel tool for physicians. This
tool will allow doctors to view an AML patient’s profile at the time of diagnosis and
immediately determine the likelihood that the patient will respond to current therapies. Recognizing
either remission or resistance, the doctor can use the model’s prediction to identify the best
possible treatment path for the patient, creating a much-needed accurate, individualized approach
to cancer therapy.
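One possible shape for such an integration is sketched below, purely as an illustration: the study does not specify how the two models would be combined. The function name `check_prediction`, the confidence margin, and the rule of flagging a prediction when the LR confidently disagrees with the ANN are all assumptions.

```python
def check_prediction(ann_label, lr_probability, confidence_margin=0.25):
    """Flag an ANN prediction for re-evaluation when the LR confidently disagrees.

    ann_label: 1 (remission) or 0 (primary resistance) predicted by the ANN.
    lr_probability: probability of remission estimated by the LR.
    """
    lr_label = 1 if lr_probability >= 0.5 else 0
    # The LR counts as confident when its probability is far from 0.5.
    lr_is_confident = abs(lr_probability - 0.5) >= confidence_margin
    needs_review = lr_is_confident and lr_label != ann_label
    return {"prediction": ann_label, "needs_review": needs_review}


# Hypothetical cases: agreement, confident disagreement, weak disagreement.
agree = check_prediction(ann_label=1, lr_probability=0.9)
conflict = check_prediction(ann_label=1, lr_probability=0.1)
weak_conflict = check_prediction(ann_label=1, lr_probability=0.45)
```

In this sketch the ANN remains the primary predictor and the LR acts only as a confirmatory check, which reflects the relative performance of the two models reported above.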
5. Conclusions and Future Work
Our study provides the first set of code modules and the first evaluation that our mentors’
research group will be using for years to come as part of a larger effort to improve cancer
modeling. This research, which points us to the necessity of integration of more than one
modeling technique, opens up a new path for other researchers to create a more effective model.

We continue to explore additional avenues to improve the process of cancer modeling,
many of which will continue under other researchers. Future steps include devising an algorithm
to quantify clinical correlates so they can corroborate with proteomic data once incorporated into
a computational model. Furthermore, we look to construct a dynamical system for our data—a
model that is different from the traditional computational approach to modeling by closely
emulating human cognition, which happens continuously and linearly rather than cyclically. The
development of these improved models as well as of a website for doctors to use these models
will continue through 2015 into future years.

Works Cited
ACS Home. 2014. Viewed 8 August 2014. Web. http://www.cancer.org/
Burke, Harry B., Philip H. Goodman, David B. Rosen, Donald E. Henson, John N.
Weinstein, Frank E. Harrell, Jr., Jeffrey R. Marks, David P. Winchester, and David G.
Bostwick. 1997. “Artificial Neural Networks Improve the Accuracy of Cancer Survival
Prediction.” American Cancer Society 79(4): 857-862.
Durrett, Rick. 2013. “Cancer Modeling: A Personal Perspective.” Notices of the AMS 60(3):
304-309.
Freedman, Andrew N., Daniela Seminara, Mitchell H. Gail, Patricia Hartge, Graham A.
Colditz, Rachel Ballard-Barbash, and Ruth M. Pfeiffer. 2005. “Cancer Risk Prediction Models:
A Workshop on Development, Evaluation, and Application.” Journal of the National
Cancer Institute 97(10): 715-723.
Quaranta, Vito, Alissa M. Weaver, Peter T. Cummings, and Alexandra R. A. Anderson.
2005. “Mathematical Modeling of Cancer: The Future of Prognosis and Treatment.”
Clinica Chimica Acta 357(2): 173-179.
Smith, Terry L., Edmund A. Gehan, Michael J. Keating, and Emil J. Freireich. 1982.
“Prediction of Remission in Adult Acute Leukemia.” Cancer 50: 466-472.
Stone, Richard M., Margaret R. O’Donnell, and Mikkael A. Sekeres. 2004. “Acute Myeloid
Leukemia.” Hematology 1: 98-117.
Works Consulted
Delen, Dursun, Glenn Walker, and Amit Kadam. 2004. “Predicting Breast Cancer
Survivability: A Comparison of Three Data Mining Methods.” Artificial Intelligence in
Medicine 34: 113-127.
Herst, Patries M., Rebecca A. Howman, Paul J. Neeson, Michael V. Berridge, and David S.
Ritchie. 2010. “The Level of Glycolytic Metabolism in Acute Myeloid Leukemia Blasts
at Diagnosis is Prognostic for Clinical Outcome.” Journal of Leukocyte Biology 89:
51-55.
Li, Lihua, Hong Tang, Zuobao Wu, Jianli Gong, Michael Gruidl, Jun Zou, Melvyn
Tockman, and Robert A. Clark. 2004. “Data Mining Techniques for Cancer Detection
Using Serum Proteomic Profiling.” Artificial Intelligence in Medicine 32: 71-83.
