International Journal of Pharmaceutical Science Invention (IJPSI)
Integrating Pathway Information with Gene Expression Data to Identify Novel Pathway-specific Cancer Drugs
Charles Pei
Upper Arlington High School
Abstract:
Connectivity Map (CMap), an extensive database of drug-treatment gene expression data
comprising over 6000 experiments across 1300 compounds, has proven to be a valuable asset
in drug repositioning. It has been used to recognize drugs with common mechanisms of action
(MOAs), discover new MOAs and identify new treatments. The goal of this project is to
integrate publicly available pathway information with gene expression data from CMap in order
to discover novel pathway-specific cancer drugs. We selected four major cancer-related
pathways: the p53 signaling, PI3K/AKT signaling, PTEN signaling and Wnt/β-catenin signaling
pathways. We applied a modified CMap algorithm to carry out pathway-specific queries across
the two databases and identified drugs that specifically perturb pathways of interest. We
created a novel method for calculating a connectivity score from non-directional pathway
information and ranked gene expression data. Applying the method, we identified many
drugs significantly affecting the PTEN, PI3K/AKT and p53 pathways, though none were
identified for the Wnt pathway. Some of these results were validated through Venn analysis
against known pathway drugs from Ingenuity Pathway Analysis (IPA), while others are
hypothesized to be novel drug indications.
INTRODUCTION
The average cost to bring a single drug to market has surpassed $5 billion, a statistic
driven by the fact that over 95% of experimental medicines fail due to toxicity or lack of
efficacy1. As a result of this unsustainably high price, an unconventional method known as drug
repositioning, in which established drug compounds are applied to new therapeutic indications,
has gained prominence due to its lower development costs and shorter paths to approval when
compared to traditional drug development.
Our lab has successfully applied CMap to repurpose drugs in various disease areas.
However, the current CMap method assesses effects on whole systems rather than on individual
pathways. Since some drugs may have off-pathway effects, we are interested in finding
pathway-specific drugs. We hypothesized that analyzing diseases at the pathway level, rather
than genome-wide, could yield novel drug indications. We focus on four cancer-related
pathways: the p53 signaling, PI3K/AKT signaling, PTEN signaling and Wnt/β-catenin signaling
pathways, and identify drugs affecting each one.
The p53 pathway is a network of genes and their products that responds to stress signals
affecting the cellular mechanisms that monitor DNA replication, chromosome segregation and
cell division2. In response to a stress signal, the p53 protein is activated and triggers
either cell cycle arrest or apoptosis. Mutations in pathway genes that eliminate functional
p53 protein can therefore lead to cancer. The tumor suppressor PTEN is a negative regulator
of the PI3K signaling pathway, a main regulator of cell growth, metabolism and survival3.
Loss or mutation of PTEN in various cancers leads to hyperactive PI3K signaling. Finally,
deregulation of the Wnt/β-catenin pathway is known to play a major role in human
tumorigenesis4. By investigating the effects of the 1300 compounds in CMap on these four
pathways, we were able to validate previously known cancer treatments and predict new
cancer indications.
MATERIALS AND METHODS
Drug-treatment gene expression data were obtained from Connectivity Map (CMap), a
database of over 6000 experiments across 1300 compounds, using MySQL. The raw expression
data were ranked and read into R. Pathway information and known drug indications for each
pathway were retrieved from Ingenuity Pathway Analysis (IPA), a tool used to model, analyze
and understand complex biological and chemical systems. This information was also read
into R.
Enrichment analysis of the non-directional pathway information was conducted in R using
a rank-based pattern-matching strategy based on the nonparametric Kolmogorov-Smirnov
statistic. This algorithm calculated a pathway enrichment score for each pathway-drug
comparison, which was stored for later use.
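As a minimal sketch of the rank-based Kolmogorov-Smirnov scoring described above, the Python function below follows the commonly published CMap formulation of the statistic. The analysis itself was run in R. The function name, the argument names and the convention that rank 1 is the most up-regulated gene are assumptions made for illustration, and the modified algorithm used in this study may differ.

```python
def ks_enrichment_score(ranked_genes, pathway_genes):
    """Signed KS-style enrichment of a non-directional gene set in a ranked list.

    ranked_genes: gene identifiers ordered from rank 1 (assumed most up-regulated)
                  to rank n (assumed most down-regulated).
    pathway_genes: iterable of gene identifiers belonging to the pathway.
    """
    n = len(ranked_genes)
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    # 1-based ranks of the pathway genes that are present in the list, ascending
    hits = sorted(position[g] for g in set(pathway_genes) if g in position)
    t = len(hits)
    if t == 0:
        return 0.0
    # a: largest excess of pathway genes toward the top of the list
    a = max((j + 1) / t - v / n for j, v in enumerate(hits))
    # b: largest excess of pathway genes toward the bottom of the list
    b = max(v / n - j / t for j, v in enumerate(hits))
    return a if a > b else -b


genes = [f"g{i}" for i in range(1, 11)]
top_score = ks_enrichment_score(genes, {"g1", "g2"})      # set sits at the top
bottom_score = ks_enrichment_score(genes, {"g9", "g10"})  # set sits at the bottom
```

Under these assumptions, a gene set concentrated at the top of the ranking receives a positive score and one concentrated at the bottom receives a negative score.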
A permutation method (10000 random permutations) was used to calculate p-values for
each pathway enrichment score from the prior step. These p-values were then corrected for
multiple hypothesis testing by False Discovery Rate (FDR) adjustment to find significant drug
indications for each pathway. The overlaps between known drugs and predicted drugs were
analyzed using Venn analysis and the hypergeometric test with all CMap drugs as the
background.
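The permutation and FDR steps can be sketched in Python as follows. Several details are assumptions rather than facts from this study: the null distribution is built by drawing random gene sets of the same size, the p-value is two-sided with a +1 correction, and the FDR adjustment is Benjamini-Hochberg. All function names are hypothetical.

```python
import random


def ks_from_ranks(ranks, n_genes):
    """Signed KS-style score for a set of 1-based ranks in a list of n_genes."""
    hits = sorted(ranks)
    t = len(hits)
    a = max((j + 1) / t - v / n_genes for j, v in enumerate(hits))
    b = max(v / n_genes - j / t for j, v in enumerate(hits))
    return a if a > b else -b


def permutation_p_value(observed, n_genes, set_size, n_perm=10000, seed=0):
    """Share of random gene sets whose score is at least as extreme as observed."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        random_ranks = rng.sample(range(1, n_genes + 1), set_size)
        if abs(ks_from_ranks(random_ranks, n_genes)) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator avoids reporting a p-value of exactly 0
    return (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A typical use would compute one p-value per pathway-drug comparison with `permutation_p_value` and then pass the full list for a pathway to `benjamini_hochberg`.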
RESULTS
Inhibitors and activators were defined by a significance cutoff of FDR < 0.2. Using this
cutoff, we predicted p53 inhibitors (n=1272), p53 activators (n=663), PI3K inhibitors (n=241),
PI3K activators (n=95), PTEN activators (n=238) and PTEN inhibitors (n=101). No Wnt
activators or inhibitors were predicted.
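A short Python sketch of how such a cutoff might be applied is given below. It assumes that the sign of the enrichment score separates activators (positive) from inhibitors (negative), which matches the signs reported in Tables 1 and 2 but is not stated explicitly in the methods. The function name and the third example row are hypothetical.

```python
def split_by_direction(experiments, fdr_cutoff=0.2):
    """Split significant experiments into activators and inhibitors by score sign."""
    significant = [e for e in experiments if e["fdr"] < fdr_cutoff]
    activators = [e for e in significant if e["score"] > 0]
    inhibitors = [e for e in significant if e["score"] < 0]
    return activators, inhibitors


rows = [
    {"name": "chlorprothixene", "fdr": 0.109, "score": 0.379},   # from Table 1
    {"name": "sirolimus", "fdr": 0.122, "score": -0.385},        # from Table 2
    {"name": "hypothetical_drug", "fdr": 0.350, "score": 0.300}, # made-up row
]
activators, inhibitors = split_by_direction(rows)
```

Here the made-up third row is dropped because its FDR exceeds the cutoff, regardless of its score.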
Analysis of the overlaps between the known pathway drugs showed that three drugs,
tretinoin, doxorubicin and daunorubicin, are known to affect the PI3K, PTEN and p53 pathways.
While tretinoin is an acne drug repurposed to treat acute promyelocytic leukemia, doxorubicin
and daunorubicin are used to treat a wide range of cancers. All three were predicted to
significantly affect all three pathways, with the exception of daunorubicin in the PI3K
pathway. The method therefore appears to work for drugs with less pathway-specific indications.
When applied to more specific drugs, the method appears to work as well. Venn analysis
of the predicted vs. known drugs shows significant overlap for each pathway except Wnt,
with hypergeometric p-values of 0 for p53, 1.60e-08 for PI3K and 2.68e-05 for PTEN when
tested with all CMap drugs as the background. With such low p-values, the method recovers
known drugs at a much higher rate than expected by chance.

[Figure 1: Venn diagrams of predicted vs. known drugs, one panel per pathway. Counts in the
order extracted from each panel: p53 (995, 0, 8); PI3K (281, 33, 13); PTEN (291, 9, 5);
Wnt (0, 7, 0).]
Figure 1. Predicted vs. known drug indications. 8 p53 drugs, 5 PTEN drugs and 13
PI3K drugs were verified by our analysis. The Wnt pathway had no
statistically significant drugs.
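The hypergeometric overlap test reported above can be sketched in Python with the standard library alone. The function name is hypothetical and the numbers in the example are toy values, not the counts from this study. The test asks how likely an overlap at least this large would be if the predicted drugs were drawn at random from the background.

```python
from math import comb


def hypergeom_overlap_p(background, known, predicted, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(background, known, predicted).

    background: total number of drugs that could have been predicted
    known:      number of background drugs with a known indication
    predicted:  number of drugs predicted by the method
    overlap:    number of drugs that are both known and predicted
    """
    upper = min(known, predicted)
    total = comb(background, predicted)
    tail = sum(
        comb(known, k) * comb(background - known, predicted - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 drugs in the background, 4 known, 3 predicted, 2 in both
toy_p = hypergeom_overlap_p(background=10, known=4, predicted=3, overlap=2)
```

Exact integer arithmetic through `math.comb` keeps the tail sum free of rounding error until the final division, which matters when the resulting p-values are very small.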
rank  FDR    score  name                 dose (M)  cell line
1     0.109  0.379  chlorprothixene      1.14E-05  MCF7
2     0.109  0.377  mephenytoin          1.84E-05  HL60
3     0.109  0.368  metixene             1.16E-05  PC3
4     0.109  0.367  noscapine            9.60E-06  MCF7
5     0.109  0.367  acenocoumarol        1.14E-05  MCF7
6     0.109  0.365  clemastine           8.60E-06  MCF7
7     0.109  0.364  chlorpromazine       1.12E-05  HL60
8     0.109  0.362  R-atenolol           1.50E-05  MCF7
9     0.109  0.358  conessine            1.12E-05  MCF7
10    0.109  0.357  dihydroergocristine  5.60E-06  MCF7
Table 1. Top ten activators for PI3K. Experiments ranked by FDR, then pathway enrichment
score.
rank  FDR    score   name                    dose (M)  cell line
1     0.122  -0.385  sirolimus               1.00E-07  ssMCF7
2     0.122  -0.358  sirolimus               1.00E-07  MCF7
3     0.122  -0.353  cytisine                2.10E-05  MCF7
4     0.122  -0.350  natamycin               6.00E-06  MCF7
5     0.122  -0.349  sulfamethoxypyridazine  1.42E-05  MCF7
6     0.124  -0.347  trioxysalen             1.76E-05  MCF7
7     0.124  -0.344  hexamethonium bromide   1.00E-05  MCF7
8     0.124  -0.334  ciprofloxacin           1.08E-05  MCF7
9     0.124  -0.332  metamizole sodium       1.20E-05  MCF7
10    0.124  -0.329  monorden                1.00E-07  MCF7
Table 2. Top ten inhibitors for PI3K. Experiments ranked by FDR, then pathway enrichment
score.
The predicted drugs were also ranked first by FDR and then by pathway enrichment score,
and the top ten inhibitors and activators for each pathway were examined. Tables 1 and 2
show the top ten activators and inhibitors, respectively, for PI3K. The bolded rows contain
drugs that are also known drug indications. In Table 1, clemastine was both known and
predicted among the top ten activators. Clemastine is currently prescribed as an
antihistamine for allergies, but has also been found to induce apoptosis in cutaneous
T-cell lymphoma cell lines5. In Table 2, two different experiments involving the drug
sirolimus were the two top-ranked inhibitors. Sirolimus is known to have immunosuppressant
and tumor-suppressant properties, and its effects on both the PTEN and PI3K pathways have
already been recorded6.
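The two-key ranking used for Tables 1 and 2 can be illustrated with a short Python sketch. It assumes that ties in FDR are broken by the magnitude of the enrichment score, which is consistent with the order of the rows in both tables. The function name is hypothetical. The example rows are taken from Table 2 and deliberately shuffled.

```python
def rank_experiments(experiments):
    """Order experiments by ascending FDR, then by descending |enrichment score|."""
    return sorted(experiments, key=lambda e: (e["fdr"], -abs(e["score"])))


shuffled = [
    {"name": "trioxysalen", "fdr": 0.124, "score": -0.347},
    {"name": "cytisine", "fdr": 0.122, "score": -0.353},
    {"name": "sirolimus", "fdr": 0.122, "score": -0.385},
]
ranked_names = [e["name"] for e in rank_experiments(shuffled)]
```

Sorting on the absolute score lets the same function serve both activators and inhibitors.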
The other two pathways with significant activators and inhibitors, PTEN and p53, did not
have any known drugs in their top ten activator and inhibitor groupings. This may be due
either to the experiments ranked above them having a stronger effect or to an error in how
the method ranks experiments quantitatively.
CONCLUSIONS
We have developed a pathway-based method of finding new drugs. Based on the
significance of the overlap between known drug indications and predicted indications in three of
the four pathways analyzed, we conclude that some of the predicted drugs without known
indications may be novel cancer drug indications.
Future work on this project would include creating a better enrichment score algorithm
for more pathway specificity and superior accuracy, analyzing the biology of the drugs’ effects
on the pathways and integrating the method with more extensive databases such as the pathway
databases KEGG and Reactome, and the gene-expression database LINCS.
ACKNOWLEDGEMENTS
The authors would like to thank the other members of the Butte lab, the Stanford
Institutes of Medicine Summer Program (SIMR) and Tianyi Wang for facilitating this research.
REFERENCES
1. Herper, Matthew. (2013). “The cost of creating a new drug now $5 billion, pushing big
pharma to change.” Forbes Magazine.
2. http://www.nature.com/onc/journal/v24/n17/full/1208615a.html
3. http://www.nature.com/onc/journal/v27/n41/full/onc2008247a.html
4. http://www.boneandcancer.org/MOLab%20Publications%20in%20PDF%20files/Luu%20et%20al_Targeting%20Wnt%20bCat%20Review_CCDT_4-7-05.pdf
5. http://www.ncbi.nlm.nih.gov/pubmed/23362870
6. http://www.ncbi.nlm.nih.gov/pubmed/16039868