Exploiting drug targets for Immuno-Oncology drug discovery
MURI Summer
1. Identifying and Repurposing Novel Drug Candidates
for Treating Leukemia Using Drug, Protein and
Disease Interaction Networks
Rashell Garretson1, Rut Thakkar2, Zack East2, Bin Peng3,
Dr. Jake Chen4, and Dr. Walter Jessen5
1Department of Biology, Purdue School of Science, IUPUI; 2Neuroscience Program, Purdue School of Science, IUPUI;
3Department of Computer and Information Science, Purdue School of Science, IUPUI; 4Indiana University Center for Systems
Biology and Personalized Medicine, IUPUI; 5Informatics, Covance, Greenfield, IN
Introduction
Taking a drug from discovery to market takes an average of twelve years.
To reduce the time and cost of new drug development, data mining can be
used to survey currently available drugs and their associated data and to
prioritize candidates that can be repurposed to treat other diseases. This
study focuses on three subtypes of leukemia: myelomonocytic leukemia,
acute megakaryoblastic leukemia, and B-cell prolymphocytic leukemia,
which lack sufficient treatments and carry a poor prognosis. The data
mining process begins by gathering information on FDA-approved drugs
and drugs in clinical trials for these subtypes. A network is then built by
curating drug, protein, and disease interactions. Other diseases are
analyzed through disease-to-disease interactions to compile a list of
diseases closely related to our leukemia subtypes of interest. Drugs used
for these closely related diseases are then compared with drugs used for
leukemia on the basis of their protein targets, interactions, and structure
to identify the drugs most likely to be effective against our leukemia
subtypes. Repurposing drugs based on structure, protein interactions, and
target similarity can save substantial time and resources by using drugs
that are already on the market in a novel way, with the ultimate goal of
saving lives.
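One way to make the target-based comparison concrete is to score each candidate drug by how much its protein target set overlaps with the targets of existing leukemia drugs. The sketch below uses Jaccard similarity, which is an assumption chosen for illustration and not a method stated on this poster. The drug names are placeholders, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(reference_targets, candidate_targets):
    """Rank candidate drugs by target overlap with a reference target set."""
    scored = [
        (name, jaccard(reference_targets, targets))
        for name, targets in candidate_targets.items()
    ]
    # Highest overlap first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative input only: UniProt accessions as target identifiers,
# with placeholder drug names (not real drug-target assignments).
leukemia_targets = {"P42574", "P08684", "P33527", "P08183"}
candidates = {
    "drug_a": {"P42574", "P08684", "P04637"},
    "drug_b": {"Q16678"},
}
ranking = rank_candidates(leukemia_targets, candidates)
```

A score of this kind covers only target similarity. Structure and interaction similarity would need their own measures.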
Methods
Defining Subtypes of Interest
• Subtypes were chosen by reviewing articles about prognosis, 5-year
survival rate, and currently available treatments. Subtypes with a poor
prognosis, a low survival rate, and few effective treatments were
prioritized.
• Myelomonocytic leukemia and acute megakaryoblastic leukemia are
subtypes of acute myeloid leukemia (AML), while B-cell
prolymphocytic leukemia is a subtype of chronic lymphocytic
leukemia (CLL). We used the broader AML and CLL categories in our
drug, disease, and protein interaction searches to gather more general
information. Subtype-specific information will be added later to find
drugs that target our subtypes of interest.
Disease to Drug
Drugs are separated into categories using two criteria:
• A D category drug is used for the specific disease of interest, while
an X category drug is currently used for a related disease.
• A level 1 drug is currently FDA approved. A level 2 drug is currently
in clinical trial. A level 3 drug is one whose clinical trial has been
terminated, withdrawn, or suspended or, in this study, any drug in a
clinical trial that has not been updated since 2010.
• D1 and X1: Information about drugs currently on the market to treat
the chosen subtypes or related diseases was collected from cancer.gov
and the Leukemia and Lymphoma Society website.
• D2, D3, X2, and X3: Each subtype and related disease was entered
into clinicaltrials.gov, and all clinical trial information was
downloaded and sorted. Trials listed as terminated, withdrawn, or
suspended, or that had not been updated in the last 5 years, were
labeled level 3. The rest were considered level 2. The drugs from each
trial were then separated, filtered, and listed.
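The level rules above can be sketched as a small classifier. This is a minimal illustration, not the sorting procedure actually used: the function names, the status strings, and the approximation of five years as 5 x 365 days are all assumptions.

```python
from datetime import date, timedelta

# Trial statuses that place a drug in level 3 (per the rules above).
INACTIVE_STATUSES = {"terminated", "withdrawn", "suspended"}


def trial_level(status, last_updated, today, stale_years=5):
    """Return 3 for an inactive or stale trial, otherwise 2."""
    if status.strip().lower() in INACTIVE_STATUSES:
        return 3
    # Assumption: "not updated in the last 5 years" is approximated in days.
    if today - last_updated > timedelta(days=365 * stale_years):
        return 3
    return 2


def drug_label(for_disease_of_interest, level):
    """Combine category (D or X) and level (1, 2, or 3) into a label such as 'D2'."""
    return ("D" if for_disease_of_interest else "X") + str(level)


# Hypothetical trial record for a related disease, evaluated in mid-2015.
level = trial_level("Recruiting", date(2014, 6, 1), today=date(2015, 7, 1))
label = drug_label(False, level)
```

Level 1 (FDA approved) is not derived from trial records here. It would be assigned from the cancer.gov and Leukemia and Lymphoma Society lists.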
Disease to Protein
• Preliminary mutated genes associated with AML and CLL were found
by reviewing articles on PubMed and entries in OMIM.
• Effector genes were identified using the GEO database, which holds
expression data showing which genes are up- or down-regulated in a
disease.
Drug to Protein
• The D1 drugs collected during the disease to drug curation were
looked up in DrugBank and STITCH, which provide the protein
interactions and targets for each drug.
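Once each D1 drug has a list of targets, the most frequently targeted proteins can be tallied by counting how many drugs hit each protein. The sketch below assumes the curated data has already been reduced to a mapping from drug name to target accessions. The drug names and the mapping itself are placeholders, not data from this study.

```python
from collections import Counter


def top_targets(drug_to_targets, n=10):
    """Count how many drugs target each protein and return the n most common."""
    counts = Counter()
    for targets in drug_to_targets.values():
        # set() so a drug counts once per protein even if listed in both databases.
        counts.update(set(targets))
    return counts.most_common(n)


# Placeholder drug-to-target mapping (illustrative only).
d1_targets = {
    "drug_a": ["P42574", "P08684"],
    "drug_b": ["P42574", "P33527", "P42574"],
    "drug_c": ["P42574", "P08684"],
}
top = top_targets(d1_targets, n=2)
```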
Protein to Protein
• Starting from the Disease to Protein interactions, the key proteins
connected with the subtypes of interest were evaluated using the
STRING and HAPPI databases. These interactions were used to create
networks in Cytoscape.
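Interactions pulled from two databases need to be merged before they are loaded into Cytoscape. A minimal sketch follows, assuming each database export has been reduced to pairs of protein names and that the network is written in Cytoscape's simple interaction format (SIF: source, interaction type, target). The edges shown are placeholders and not results from STRING or HAPPI.

```python
def merge_edges(*edge_lists):
    """Merge undirected edges from several sources, dropping duplicates and self-loops."""
    merged = set()
    for edges in edge_lists:
        for a, b in edges:
            if a != b:
                # Sort each pair so (A, B) and (B, A) collapse into one edge.
                merged.add(tuple(sorted((a, b))))
    return sorted(merged)


def to_sif(edges, interaction="pp"):
    """Render edges as tab-delimited SIF lines; 'pp' labels a protein-protein edge."""
    return "\n".join(f"{a}\t{interaction}\t{b}" for a, b in edges)


# Placeholder edge lists (illustrative only).
string_edges = [("TP53", "CASP3"), ("CASP3", "CASP9")]
happi_edges = [("CASP3", "TP53"), ("CASP8", "CASP3")]
edges = merge_edges(string_edges, happi_edges)
sif_text = to_sif(edges)
```

Writing `sif_text` to a `.sif` file gives a network that can be imported into Cytoscape.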
Disease to Disease
• The CMBI and DiseaseConnect databases were used to acquire a list of
all the diseases associated with AML and CLL.
• The list was then analyzed to obtain the top diseases that are similar
to both AML and CLL.
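The selection of top related diseases can be sketched as a filter over two score lists. The 0.3 cutoff comes from the Table 3 caption. How the AML and CLL scores are combined is an assumption here: a disease is kept if it appears in both lists and at least one score exceeds the cutoff, which is one reading of the caption. The three named diseases use scores from Table 3. The other two entries are placeholders.

```python
def top_related(aml_scores, cll_scores, threshold=0.3):
    """Keep diseases present in both lists whose higher score exceeds the threshold."""
    shared = aml_scores.keys() & cll_scores.keys()
    selected = [
        (disease, aml_scores[disease], cll_scores[disease])
        for disease in shared
        if max(aml_scores[disease], cll_scores[disease]) > threshold
    ]
    # Strongest association first.
    return sorted(selected, key=lambda row: max(row[1], row[2]), reverse=True)


aml = {
    "Acute Lymphoblastic Leukemia": 0.3682,
    "Chronic Myeloid Leukemia": 0.3999,
    "Werner Syndrome": 0.2948,
    "placeholder_disease_x": 0.12,   # hypothetical low-scoring entry
    "placeholder_disease_y": 0.45,   # hypothetical entry missing from the CLL list
}
cll = {
    "Acute Lymphoblastic Leukemia": 0.3979,
    "Chronic Myeloid Leukemia": 0.4131,
    "Werner Syndrome": 0.303,
    "placeholder_disease_x": 0.25,
}
related = top_related(aml, cll)
names = [row[0] for row in related]
```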
Conclusion & Future Studies
Current Status of Research
References
• The UniProt Consortium. UniProt: a hub for
protein information. Nucleic Acids Res. 43:
D204-D212 (2015). http://www.uniprot.org
• Jensen LJ, Kuhn M, Stark M, Chaffron S,
Creevey C, Muller J, Doerks T, Julien P, Roth
A, Simonovic M, Bork P, von Mering C.
STRING 8--a global view on proteins and their
functional interactions in 630 organisms.
Nucleic Acids Res. 2009 Jan;37(Database
issue):D412-6. doi: 10.1093/nar/gkn760. Epub
2008 Oct 21. http://string-db.org
• Kuhn M, Szklarczyk D, Pletscher-Frankild S,
Blicher TH, von Mering C, Jensen LJ, Bork P.
STITCH 4: integration of protein-chemical
interactions with user data. Nucleic Acids Res.
2014 Jan;42(Database issue):D401-7. doi:
10.1093/nar/gkt1207. Epub 2013 Nov 28.
http://stitch.embl.de
• Chen JY, Mamidipalli S, Huan T. HAPPI: an
online database of comprehensive human
annotated and predicted protein interactions.
BMC Genomics. 2009 Jul 7;10 Suppl 1:S16.
doi: 10.1186/1471-2164-10-S1-S16.
http://discovery.informatics.iupui.edu/HAPPI/
• Liu CC, Tseng YT, Li W, Wu CY, Mayzus I,
Rzhetsky A, Sun F, Waterman M, Chen JJ,
Chaudhary PM, Loscalzo J, Crandall E, Zhou
XJ. DiseaseConnect: a comprehensive web
server for mechanism-based disease-disease
connections. Nucleic Acids Res. 2014 Jul;42(Web
Server issue):W137-46. doi: 10.1093/nar/gku412.
Epub 2014 Jun 3. http://disease-connect.org
• Law V, Knox C, Djoumbou Y, Jewison T, Guo
AC, Liu Y, Maciejewski A, Arndt D, Wilson M,
Neveu V, Tang A, Gabriel G, Ly C, Adamjee S,
Dame ZT, Han B, Zhou Y, Wishart DS.
DrugBank 4.0: shedding new light on drug
metabolism. Nucleic Acids Res. 2014 Jan
1;42(1):D1091-7. http://www.drugbank.ca
• Bolton E, Wang Y, Thiessen PA, Bryant SH.
PubChem: Integrated Platform of Small
Molecules and Biological Activities. Chapter
12 IN Wheeler RA and Spellmeyer DC, eds.
Annual Reports in Computational Chemistry,
Volume 4. Oxford, UK: Elsevier, 2008, pp.
217-241. doi:10.1016/S1574-1400(08)00012-1.
https://pubchem.ncbi.nlm.nih.gov
• Shannon P, Markiel A, Ozier O, Baliga NS,
Wang JT, Ramage D, Amin N, Schwikowski
B, Ideker T. Cytoscape: a software
environment for integrated models of
biomolecular interaction networks. Genome
Research 2003 Nov; 13(11):2498-504.
http://cytoscape.org
• Edgar R, Domrachev M, Lash AE. Gene
Expression Omnibus: NCBI gene expression
and hybridization array data repository.
Nucleic Acids Res. 2002 Jan 1;30(1):207-10.
http://www.ncbi.nlm.nih.gov/geo/
[Figure 3 chart: Number of Drugs Per Category. x-axis: Category of Drugs (D1, D2, D3, X1, X2, X3); y-axis: Number of Drugs (0 to 600)]
AML and CLL Protein to Protein
Interaction Network
Top Proteins Targeted by D1 Drugs
AML Drug Target     Number of Drugs     CLL Drug Target     Number of Drugs
P42574 (CASP3)      8                   P42574 (CASP3)      6
P08684 (CYP3A4)     8                   P55211 (CASP9)      5
P33527 (ABCC1)      7                   Q14790 (CASP8)      4
P08183 (ABCB1)      6                   P09874 (PARP1)      4
Q14790 (CASP8)      6                   P33527 (ABCC1)      4
P04637 (TP53)       5                   P08183 (ABCB1)      4
Q9UNQ0 (ABCG2)      5                   P08684 (CYP3A4)     4
Q92887 (ABCC2)      4                   P55210 (CASP7)      3
P55211 (CASP9)      4                   Q9UNQ0 (ABCG2)      3
Q16678 (CYP1B1)     4                   P20815 (CYP3A5)     3
Table 2: Top proteins identified as targets in DrugBank and STITCH
from drug to protein interaction. The ten proteins listed for AML and
for CLL are those targeted by the highest number of D1 drugs.
Table 3: Top related diseases from CMBI. Diseases that appeared in the
related disease lists for both AML and CLL and had a CMBI score
greater than 0.3 were chosen as the top related diseases. These diseases
were used to identify X category drugs as candidates for repurposing.
Our team has developed a website that lets us import data into a
database, which we can use to analyze and visualize the data collected
during the data mining process. The data will be stored in a PostgreSQL
database, and Elasticsearch will provide fuzzy searching so that we can
narrow our searches to find key information. The website can also show
the relationships among drug, disease, and protein interactions, which we
can use to create models and networks. The next steps are to improve the
website's functionality so that we can import all the data we have
collected on drugs, diseases, and proteins. We will then use the website to
gather information about the interactions within the imported data and
build a model to identify which of our drug candidates are best suited
for repurposing.
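As an illustration of the fuzzy searching mentioned above, the sketch below builds an Elasticsearch request body that uses a `match` query with the `fuzziness` option, so a misspelled drug name can still find its record. The field name `drug_name` and the helper function are hypothetical. The website's actual index layout is not described here.

```python
import json


def fuzzy_query(field, term, fuzziness="AUTO", size=10):
    """Build an Elasticsearch search body that tolerates small spelling differences."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {"query": term, "fuzziness": fuzziness}
            }
        },
    }


# Hypothetical field name; the misspelled term should still match "methotrexate".
body = fuzzy_query("drug_name", "methotrexat")
print(json.dumps(body, indent=2))
```

The body can be sent to an index's `_search` endpoint. Only the request body is built here, so the example runs without a server.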
Top Associated Diseases
Disease                             AML/CLL CMBI Score     Disease                               AML/CLL CMBI Score
Acute Lymphoblastic Leukemia        0.3682/0.3979          Chronic Myeloid Leukemia              0.3999/0.4131
Mixed lineage leukemia              0.2942/0.3031          T-cell acute lymphocytic leukemia     0.4212/0.3074
Non-bruton agammaglobulinemia       0.335/0.4215           Hemophagocytic lymphohistiocytosis    0.3328/0.3775
Mycosis Fungoides                   0.3083/0.3194          B-cell lymphoma                       0.3129/0.2957
Non-hodgkin lymphoma                0.3092/0.3293          Hodgkin lymphoma                      0.3092/0.3424
Burkitt's lymphoma                  0.3092/0.3622          Werner Syndrome                       0.2948/0.303
Figures 1 and 2: Proteins from disease to protein interaction were
combined with additional interacting proteins using STRING and HAPPI
databases. Cytoscape was used to create a network of connections between
proteins associated with each subtype.
Figure 3: Number of drugs found using clinicaltrials.gov, cancer.gov,
the Leukemia and Lymphoma Society, and articles found on PubMed for
AML and CLL as well as the top associated diseases. D1 = FDA approved
drugs for AML or CLL. D2 = drugs currently in clinical trial for AML or
CLL. D3 = drugs whose clinical trial information has not been updated
in the last five years or whose clinical trials have been suspended,
terminated, or withdrawn. X category drugs follow the same rules as D
category drugs, but they are used for the top associated diseases.
Table 1: Popularity of D category drugs, measured by the number of
PubMed articles found for each drug name.
Popularity of D Category Drugs by PubMed Search
Name of drug       Number of PubMed articles found     Name of drug                     Number of PubMed articles found
Cytoxan            6967                                Etoposide                        3421
Methotrexate       6668                                Mercaptopurine                   2860
Imatinib           6442                                Tetradecanoylphorbol acetate     2759
Aminopterin        5268                                Asparaginase                     2701
Anthracycline      4175                                Cytosar                          2508