Novel Methodology for Predicting Synergistic Cancer Drug Pairs Slides

Gene Expression as a Key Component of Predicting Synergistic Drug Combinations
Megan Yin

Background
Limitations of Targeted Therapies
◎Cancer is a multigenic disease
◎Targeted therapies often lose effectiveness because of acquired drug resistance
◎Despite increased government investment, drug discovery is declining
Combination Therapy
◎Can get past the limits of drug resistance by targeting multiple pathways at the same time to kill tumor cells
◎Uses only currently available drugs
◎Can lower drug dosages, since drugs can be more potent in combination
Next Steps with Combination Therapy
◎Combination therapy rests on the assumption that effective drug combinations are known
◎We want drugs that work together synergistically
◎The large combinatorial space makes this hard to explore in vitro
◎There are currently no good methodologies available to predict drug synergy

Literature Review
Median Effect
◎Median-effect-based models depend on the linearity of the median-effect plot, which does not fully hold for cancer drug combinations.
Loewe Additivity
◎Loewe additivity assumes that each drug's dose-response curve is known, but in practice these curves are often uncertain due to noise or insufficient sample size.
Bliss Independence
◎Bliss independence can be inaccurate: it is prone to false-positive results and assumes that drugs act independently, but how drugs actually act in treatment is often not known.
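To make the three reference models above concrete, here is a minimal Python sketch (not part of the original project) of how each one turns dose-response measurements into a synergy call. It assumes effects are fractions between 0 and 1, uses the Chou-Talalay form of the median-effect equation, fa/(1 - fa) = (D/Dm)^m, and uses the Loewe-based combination index CI = d1/Dx1 + d2/Dx2, where Dx is the single-drug dose giving the same effect as the combination. The function names are hypothetical. Fitting the median-effect plot with a straight line is exactly the linearity assumption criticized above.

```python
import numpy as np


def fit_median_effect(doses, fa):
    """Fit the median-effect plot log(fa/(1-fa)) = m*log(D) - m*log(Dm).

    Assumes doses > 0 and 0 < fa < 1. Returns the slope m and the median dose Dm.
    """
    x = np.log(np.asarray(doses, dtype=float))
    fa = np.asarray(fa, dtype=float)
    y = np.log(fa / (1.0 - fa))
    m, intercept = np.polyfit(x, y, 1)  # straight-line fit: the linearity assumption
    return float(m), float(np.exp(-intercept / m))


def dose_for_effect(fa, m, Dm):
    """Single-drug dose that gives fractional effect fa under the median-effect model."""
    return Dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(d1, d2, fa_combo, params1, params2):
    """Loewe-based combination index (< 1 synergy, = 1 additive, > 1 antagonism).

    params1 and params2 are (m, Dm) tuples for each drug alone.
    """
    dx1 = dose_for_effect(fa_combo, *params1)
    dx2 = dose_for_effect(fa_combo, *params2)
    return d1 / dx1 + d2 / dx2


def bliss_excess(e_a, e_b, e_ab):
    """Observed combined effect minus the Bliss-independence expectation (> 0 suggests synergy)."""
    expected = e_a + e_b - e_a * e_b
    return e_ab - expected
```

As a sanity check, a drug "combined" with itself gives CI = 1, the additive baseline. Bliss compares against E_A + E_B - E_A·E_B rather than a fitted dose-response curve, so it needs no curve fit but does assume the drugs act independently.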

Purpose
To develop a novel computational algorithm to predict synergistic cancer drug combinations

Source Data: DREAM Challenge Data
◎Source for known synergy scores
○Largest and most comprehensive data set produced to
date measuring drug synergy
○118 drugs
○85 cell lines
○6,903 drug combinations
○586,705 drug pair on cell line combinations
◎Source for control cell line gene expression
○83 cell lines
○17,000 genes

Source Data: LINCS L1000 Data
◎Source for drug gene expression data
○1,769 drugs
○41 cell lines
○978 landmark genes measured directly
○22,628 genes inferred from these 978
LINCS is the largest dataset produced to date measuring the gene expression response to drugs across a variety of cell lines.
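Because most of the 22,628 genes are inferred rather than measured, it helps to see what "inferred" means. As we understand the published L1000 method, non-measured genes are predicted from the measured genes with a linear model trained on reference expression profiles. The sketch below illustrates that idea with ridge-regularized least squares on synthetic data; the penalty `lam`, the intercept column, and the function names are assumptions, not the LINCS pipeline.

```python
import numpy as np


def _with_intercept(X):
    """Append a column of ones so the linear model has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_inference_model(landmark_train, target_train, lam=1e-3):
    """Ridge least squares from landmark genes to non-measured genes.

    landmark_train: (n_samples, n_landmark); target_train: (n_samples, n_target).
    Returns coefficients of shape (n_landmark + 1, n_target).
    """
    X = _with_intercept(landmark_train)
    gram = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ target_train)


def infer(landmark, coef):
    """Predict non-measured genes for new samples from their landmark genes."""
    return _with_intercept(landmark) @ coef
```

Any error in such a model carries straight into the inferred genes, which is worth keeping in mind when they are used as features downstream.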

Algorithm 1: Regularized Bilinear Regression
◎Linear regression is the usual starting point, but predicting the synergy score of a pair of drugs on a cell line needs a model over two inputs at once, so a bilinear model was used
◎Cell line gene expression data from DREAM Challenge
◎Drug on cell line gene expression data from LINCS L1000 database
◎Very high-dimensional dataset
○20 principal components for each drug and cell line
○400 principal components for each drug pair
◎Tensor product of the D (drug pair) and C (cell line) matrices
◎Objective: learn the weight matrix W and predict Y (synergy score)
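A bilinear model scores a drug-pair feature vector d and a cell-line feature vector c as y ≈ dᵀWc. Since dᵀWc equals the dot product of vec(W) with the tensor (Kronecker) product of d and c, learning W reduces to ordinary ridge regression on tensor-product features, which is one way to read the "tensor product" bullet above. This is a minimal sketch under assumptions: the slide does not state the penalty or solver, so L2 (ridge) regularization with a closed-form solve is assumed, features are assumed to be precomputed (for example, principal components), and the function names are hypothetical.

```python
import numpy as np


def bilinear_features(D, C):
    """Row-wise tensor product: row i is kron(D[i], C[i]), length p_d * p_c."""
    n = D.shape[0]
    return np.einsum("ij,ik->ijk", D, C).reshape(n, -1)


def fit_bilinear_ridge(D, C, y, lam=1.0):
    """Learn W (p_d x p_c) minimising sum_i (y_i - D[i] @ W @ C[i])**2 + lam * ||W||_F**2."""
    X = bilinear_features(D, C)
    gram = X.T @ X + lam * np.eye(X.shape[1])
    w = np.linalg.solve(gram, X.T @ y)
    return w.reshape(D.shape[1], C.shape[1])


def predict_bilinear(W, D, C):
    """Predicted synergy y_i = D[i] @ W @ C[i] for every row i."""
    return np.einsum("ij,jk,ik->i", D, W, C)
```

With 400 drug-pair features and 20 cell-line features, W has 400 × 20 = 8,000 entries, so the regularization term is what keeps the fit stable when training examples are limited.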

Algorithm 2: Up-Down Gene Analysis Regression
◎Regularized Bilinear Regression did not perform very well, so I tried to improve on it by bringing in the biological context of the data
◎For each drug tested on each of the 41 L1000 cell lines, classified its effect on each gene as up-regulation or down-regulation
◎This gives one 41 cell line × 911 gene matrix for each drug
◎Averaged over the cell lines to estimate each gene's probability of being up- or down-regulated
◎111 drug up-down matrices in total
◎For drug pairs, added the two drugs' values
◎Cell line matrix kept the same, as it was the control
◎Objective: learn the W matrix and predict the same synergy scores
◎Performed bilinear regression again
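The up-down encoding above can be sketched as follows. This is one plausible reading, not the original code: it assumes each drug's L1000 response is a cell line × gene matrix of differential expression scores (for example, z-scores), that a gene counts as up- or down-regulated when its score passes a hypothetical threshold of ±1, and that the up and down probabilities are kept as two separate feature blocks. A drug pair's features are the sum of the two drugs' profiles, as on the slide.

```python
import numpy as np


def up_down_profile(z, threshold=1.0):
    """Per-gene probability of up- and down-regulation across cell lines.

    z: (n_cell_lines, n_genes) expression changes for one drug.
    Returns [P(up) for each gene] followed by [P(down) for each gene].
    """
    z = np.asarray(z, dtype=float)
    p_up = (z > threshold).mean(axis=0)     # fraction of cell lines where the gene goes up
    p_down = (z < -threshold).mean(axis=0)  # fraction of cell lines where the gene goes down
    return np.concatenate([p_up, p_down])


def pair_profile(z_a, z_b, threshold=1.0):
    """Drug-pair features: the sum of the two drugs' up-down profiles."""
    return up_down_profile(z_a, threshold) + up_down_profile(z_b, threshold)
```

Summing is symmetric, so the pair (A, B) gets the same features as (B, A), which suits unordered drug pairs.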

Algorithm 3: Neighborhood Predictor
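$$S_{CL,D_1,D_2} = \bar{S} + S_{CL} + S_{DP}$$
$$\bar{S} = \frac{1}{N} \sum_{(i,j) \in S} S_{i,j}$$
$$S_{CL} = \frac{\sum_{j=1}^{k} d_{j,CL}\, S_{j,DP}}{\sum_{j=1}^{k} d_{j,CL}\, S_{j}}$$
$$S_{DP} = \frac{\sum_{j=1}^{k} d_{j,DP}\, S_{j,DP}}{\sum_{j=1}^{k} d_{j,DP}\, S_{j}}$$
$$\bar{S}_{CL} = \frac{1}{M} \sum_{(CL,j) \in S_{\text{train}}} S_{CL,j}$$
$$\bar{S}_{DP} = \frac{1}{P} \sum_{(i,DP) \in S_{\text{train}}} S_{i,DP}$$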
• Built to further improve on the results of the Up-Down Gene Analysis Regression
• This algorithm is completely different from the last two: there is no regression
• Start with a baseline predictor S̄, the mean of all synergy scores in the dataset
• S_CL, the cell line neighborhood adjustment, adjusts the synergy score for the cell line by looking at the synergy scores of the cell lines most similar to the cell line in question (similarity derived from the correlation of gene expression)
• S_DP, the drug pair neighborhood adjustment, adjusts the synergy score for the drug pair by looking at the synergy scores of the drug pairs most similar to the drug pair in question (similarity derived from the correlation of gene expression)
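The neighborhood idea can be sketched as a k-nearest-neighbour predictor. This sketch uses a common formulation that may differ from the exact formula on the slide: both adjustments are similarity-weighted averages of residuals (score minus the global mean S̄) over the k most similar observed neighbours, with similarity given by the Pearson correlation of gene expression profiles and only positively correlated neighbours used. The matrix layout, k, and the function names are assumptions.

```python
import numpy as np


def knn_adjustment(sim_row, residuals, exclude, k):
    """Similarity-weighted mean residual over the k most similar observed neighbours.

    sim_row: similarity of the target to every candidate neighbour.
    residuals: residual score of each candidate (NaN where unobserved).
    Only positively similar neighbours are used; returns 0.0 if there are none.
    """
    candidates = [j for j in range(len(sim_row))
                  if j != exclude and not np.isnan(residuals[j]) and sim_row[j] > 0]
    if not candidates:
        return 0.0
    top = sorted(candidates, key=lambda j: sim_row[j], reverse=True)[:k]
    weights = np.array([sim_row[j] for j in top])
    return float(weights @ residuals[top] / weights.sum())


def predict_synergy(R, cl_expr, dp_expr, cl, dp, k=5):
    """Predict the synergy of drug pair `dp` on cell line `cl`.

    R: (n_cell_lines, n_drug_pairs) training scores, NaN where unobserved.
    cl_expr: (n_cell_lines, n_genes) and dp_expr: (n_drug_pairs, n_features) profiles.
    """
    mu = np.nanmean(R)              # baseline: mean of all observed scores
    E = R - mu                      # residuals relative to the baseline
    sim_cl = np.corrcoef(cl_expr)   # cell line x cell line Pearson correlation
    sim_dp = np.corrcoef(dp_expr)   # drug pair x drug pair Pearson correlation
    s_cl = knn_adjustment(sim_cl[cl], E[:, dp], exclude=cl, k=k)
    s_dp = knn_adjustment(sim_dp[dp], E[cl, :], exclude=dp, k=k)
    return mu + s_cl + s_dp
```

Adding both adjustments on top of the baseline can double count signal that the two neighborhoods share; shrinking each adjustment or learning a weight for each is a common refinement.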

Performance Metrics
◎A test set was used to score algorithm accuracy
○Leaderboard test set
○600 synergy scores
○167 drug pairs
○85 cell lines
○400 of these scores could be used
◎Each algorithm predicted the synergy scores for these 400 drug-pair/cell-line combinations

Results
Performance metrics for all three algorithms. RMSE (root mean squared error) is a standard metric for many statistical and learning problems. In this context, however, it matters less, because what is important is not the exact values but the shape and patterns the real synergy scores take. The Primary Metric and Tiebreak Metric were obtained from the AstraZeneca Drug Combination Prediction DREAM Challenge; both measure the correlation between predicted and actual synergy scores within each cell line. Correlation is the Spearman correlation computed over the whole dataset. Notably, all four metrics improved as the models became more complex and more biological in nature.
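The metrics in the table can be approximated as below. RMSE and whole-dataset Spearman correlation are standard; the within-cell-line score is sketched as an unweighted mean of per-cell-line Pearson correlations, which is an assumption, since the official DREAM Challenge scoring may weight or define its Primary and Tiebreak metrics differently. Function names are hypothetical, and only NumPy is used, so Spearman ranks are computed by hand with ties averaged.

```python
import numpy as np


def rmse(pred, actual):
    """Root mean squared error."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_within_group_pearson(pred, actual, groups, min_size=3):
    """Unweighted mean of Pearson correlations computed separately within each group (cell line)."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    groups = np.asarray(groups)
    scores = []
    for g in np.unique(groups):
        mask = groups == g
        p, a = pred[mask], actual[mask]
        # Skip groups too small or too constant to give a defined correlation.
        if len(p) >= min_size and p.std() > 0 and a.std() > 0:
            scores.append(pearson(p, a))
    return float(np.mean(scores)) if scores else float("nan")
```

A within-cell-line score rewards getting the ordering right inside each cell line even when the overall scale is off, which matches the caption's point that patterns matter more than exact values.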

Results
Graph of the performance of regularized bilinear regression. The x-axis is the predicted synergy score from the algorithm; the y-axis is the actual synergy score from the challenge's leaderboard data. The line through the middle represents y = x, the desired performance of the algorithm. The heavy clumping in the middle shows that the model was not very accurate, but it does show promise that gene expression can be used to predict synergy scores.

Results
Graph of the performance of up-down gene analysis regularized bilinear regression. The x-axis is the predicted synergy score from the algorithm; the y-axis is the actual synergy score from the challenge's leaderboard data. The line through the middle represents y = x, the desired performance of the algorithm. Clumping in the middle indicates relatively poor performance. Only a few data points could be used, because the scope of the training data limited the choice of ground-truth variables.

Results
Graph of the performance of the neighborhood predictor. The x-axis is the predicted synergy score from the algorithm; the y-axis is the actual synergy score from the challenge's leaderboard data. The line through the middle represents y = x, the desired performance of the algorithm. There is a marked difference between the performance of this algorithm and the other two. This algorithm used the most data points of the three, as the feature data did not limit the choice of ground-truth variables. The points follow the y = x line.

Results
Similarity Grouping for Cell Line 22RV1
Based on the cell line similarity matrix used in the Neighborhood Predictor. Shows that the algorithm is not just guessing a relatively random number but also takes the biological context into consideration. The correlation matrix gives the similarity of each cell line to all other 82 cell lines, compared using gene expression values. The similarity metric picked up that the cell lines most similar to each other are in the same disease area. Points are colored by disease area, and several colors, particularly red and green, cluster together.
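The similarity grouping can be reproduced in outline: compute a Pearson correlation matrix over the cell lines' expression profiles, rank the neighbours of a query cell line, and check how many share its disease area. Only the correlation-based similarity comes from the slides; the function names are hypothetical, and any expression values, cell line names, or disease labels fed to this sketch would be the user's own.

```python
import numpy as np


def most_similar(expr, names, query, k=5):
    """Return the k cell lines most correlated with `query`, as (name, correlation) pairs.

    expr: (n_cell_lines, n_genes) baseline expression; names: list of row labels.
    """
    sim = np.corrcoef(expr)          # Pearson correlation between every pair of rows
    i = names.index(query)
    order = [j for j in np.argsort(-sim[i]) if j != i][:k]
    return [(names[j], float(sim[i, j])) for j in order]


def same_area_fraction(neighbours, area_of, query):
    """Fraction of the neighbours that share the query's disease area."""
    same = sum(area_of[name] == area_of[query] for name, _ in neighbours)
    return same / len(neighbours)
```

A high same-area fraction for small k is the numeric counterpart of the color clustering described above.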

Discovery
◎Gene expression is a key component of predicting drug synergy
◎Scientists should dedicate more time to studying gene expression as it relates to synergy
◎Pharmaceutical companies can work to produce more gene expression data on cell lines and drug perturbations
◎Applying a more biological approach improves prediction accuracy
◎Most other submissions to the DREAM Challenge scored only in the 0.1 to 0.15 range
◎Most other submissions relied on a purely mathematical approach, without taking the biological context into consideration

Limitations
◎Cell line control gene expression data
○Only included 83 cell lines across varying tissues of origin
○Next step is to use more diverse datasets with gene expression values for more cell lines from more tissues of origin
◎Drug gene expression data
○L1000 data only tested drugs on 41 cell lines
○Only 6 of these cell lines overlapped with DREAM
○Measured drug-on-cell-line gene expression was therefore unavailable for the other DREAM cell lines
○Inferring drug-on-cell-line gene expression values for other cell lines from only six cell lines introduces error
○If the same cell lines had been used in both DREAM and L1000, the drug and cell line gene expression data could have been compared more directly to the synergy scores

Further Research
◎Next step is to interpret these synergy scores
○What is the cutoff for a synergistic combination?
○What gene expression fold changes lead to higher synergy scores?
◎Interpreting the weight matrix
○Which genes' expression fold changes tend to lead to higher synergy scores
◎Important to test these combinations in vitro to verify synergy
○Bridge between computational and experimental work
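One simple starting point for interpreting the weight matrix is to rank the feature rows of W by the size of their weights. This sketch assumes the rows of W correspond to named drug-side features (for example, per-gene up/down features as in Algorithm 2) and that those features are on comparable scales; for the principal-component features of Algorithm 1, rows would first need to be mapped back to genes through the PCA loadings. The L2 row norm ignores sign, so it shows which features matter, not in which direction. The function name is hypothetical.

```python
import numpy as np


def top_features(W, feature_names, k=10):
    """Rank drug-side features by the L2 norm of their row in the bilinear weight matrix W."""
    W = np.asarray(W, dtype=float)
    scores = np.linalg.norm(W, axis=1)   # one score per row (feature)
    order = np.argsort(-scores)[:k]      # largest norms first
    return [(feature_names[i], float(scores[i])) for i in order]
```

Any genes flagged this way would still be hypotheses to check in vitro, in line with the last bullet above.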


Acknowledgements
I would like to thank Dr. Christina Leslie, my mentor and principal
investigator of the Leslie Computational Biology Lab at Memorial
Sloan Kettering Cancer Center for allowing me to work in her lab as a
high school student and for providing me with so much advice and
guidance through this whole process. I would also like to thank Dr. Hatice
Osmanbeyoglu for always being available to talk about ideas or to
answer my questions and for training me in the beginning on
machine learning and cancer in general so I would be prepared to
work on this project. In general, I would like to thank the Leslie Lab
for welcoming me in as a high school student and for always being
available if I had questions. Lastly, I would like to thank my parents
who have supported me through all the ups and downs of this project
and always encouraged me to keep going.
