This document presents a method for using patterns in clinical trial data to recommend new conditions for drug retesting. The authors analyzed drug retesting patterns across trials on ClinicalTrials.gov and found drugs were often retested in conditions whose trials had similar eligibility criteria. They developed an approach leveraging shared eligibility criteria between conditions to recommend potential new retesting targets. As a proof of concept, they were able to validate one recommendation for ranolazine in myocardial infarction based on a published study. However, more sophisticated models are still needed to fully evaluate this method for drug repurposing.
Recommending New Target Conditions for Drug Retesting Using Temporal Patterns in Clinical Trials
1. Recommending New Target Conditions for Drug Retesting Using Temporal Patterns in Clinical Trials: A Proof of Concept
Zhe He, Chunhua Weng
Department of Biomedical Informatics, Columbia University
2. Disclosure
• Both authors disclose that they have no financial
relationships with commercial interests.
3. Learning Objective
• After attending this session, the learners will be able to:
• Analyze the temporal pattern of drug retesting in
retrospective clinical trials
• Leverage the metadata in clinical trial summaries to
narrow the search for new target conditions
4. Background
• De novo drug discovery
• Drug repurposing: discovery of novel indication of existing drugs
Hopkins AL. Network pharmacology: the next paradigm in drug discovery. Nat
Chem Biol. Nov 2008;4(11):682-690.
• Successful drug repurposing cases were mostly identified by serendipity
• Computational methods have been proposed
• Example: duloxetine, from depression to stress urinary incontinence
5. Approach
• ClinicalTrials.gov & its use for drug repurposing (Zhang et al. 2014)
Zhang P, Wang F, Hu J. Towards Drug Repositioning: A Unified Computational Framework for
Integrating Multiple Aspects of Drug Similarity and Disease Similarity. AMIA Annu Symp Proc. 2014;In
press.
• Drug retesting patterns in drug intervention trials
• Hypothesis
• Drug retesting often occurred in conditions whose trials
employed similar eligibility criteria
• We explore the feasibility of using the data from CT.gov to narrow
the search for drug repurposing targets
6. Configuration for Drug Retesting
1. What drugs were often retested on different conditions?
2. How similar are the eligibility criteria of trials on two conditions, A and B?
3. Can we leverage drug retesting patterns to
recommend new target conditions for existing drugs?
7. Data Preparation (1)
Pipeline:
• Trial summaries from CT.gov
• Extracting metadata of trials
• Indexing trials by conditions
• Extracting common eligibility features
Extracting common eligibility features:
• Extract n-grams from free-text eligibility criteria (EC)
• Does the n-gram partially match a UMLS concept? If yes, normalize it to a UMLS CUI
• Retain CUIs appearing in at least 3% of trials
Common eligibility features, e.g., for Type 2 diabetes trials: Metformin; Contraceptive method; ...
Miotto R, Weng C. Unsupervised Mining of Frequent Tags for Clinical Eligibility Text Indexing. Journal of Biomedical Informatics. 2013;46(6):1145-51.
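The feature-extraction flow above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_LEXICON` is a hypothetical stand-in for a UMLS lookup, its CUIs are placeholders, and the sketch uses exact n-gram matching rather than the partial matching mentioned on the slide.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for UMLS; the CUIs are placeholders.
TOY_LEXICON = {
    "metformin": "C0000001",
    "contraceptive method": "C0000002",
    "pregnancy": "C0000003",
}

def ngrams(text, max_n=3):
    """Yield all word n-grams (n = 1..max_n) of the lower-cased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def trial_cuis(criteria_text, lexicon=TOY_LEXICON):
    """Return the CUIs whose lexicon term occurs as an n-gram (exact match)."""
    return {lexicon[g] for g in ngrams(criteria_text) if g in lexicon}

def common_eligibility_features(trials, min_fraction=0.03):
    """Keep CUIs that occur in at least `min_fraction` of the trials."""
    counts = Counter()
    for text in trials:
        counts.update(trial_cuis(text))
    cutoff = min_fraction * len(trials)
    return {cui for cui, count in counts.items() if count >= cutoff}

# Toy eligibility texts (invented for illustration only).
trials = [
    "Stable dose of metformin. Must use a contraceptive method.",
    "On metformin for 3 months",
    "No pregnancy",
]
# A high cutoff is used here only because the toy corpus is tiny.
cefs = common_eligibility_features(trials, min_fraction=0.5)
```

With three toy trials and a 50% cutoff, only the metformin placeholder CUI survives, since it is the only concept found in at least two trials.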
8. Data Preparation (2)
• 59,716 drug intervention trials between 2003 and 2013
• Included drugs used in >= 5 trials on the same condition in a year
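• Formulated each retesting case as a quintuple (drug, initial condition and the year it was first tested, retested condition and the year it was first tested), e.g.:
• Drug: Duloxetine
• Initial condition: Depression, first tested in 1995
• Retested condition: Stress urinary incontinence, first tested in 2004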
• Excluded “placebo” from the dataset
• # of drugs: 550
• # of conditions: 451
• # of drug-condition pairs: 4,351
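One possible way to represent the quintuple and the inclusion rules is sketched below. The record layout, the names `RetestingCase` and `retesting_cases`, and the reading that the earlier qualifying year marks the initial condition are all assumptions made for illustration; the drug and condition labels are placeholders.

```python
from collections import Counter
from typing import NamedTuple

class RetestingCase(NamedTuple):
    drug: str
    initial_condition: str
    initial_year: int
    retested_condition: str
    retested_year: int

def retesting_cases(trial_records, min_trials=5):
    """Build retesting cases from (drug, condition, year) records, one per trial."""
    # Drop the placebo control, then count trials per (drug, condition, year).
    counts = Counter(r for r in trial_records if r[0].lower() != "placebo")
    # Assumption: a drug counts as tested on a condition in the first year
    # in which it was used in at least `min_trials` trials on that condition.
    first_year = {}
    for (drug, condition, year), n in counts.items():
        if n >= min_trials:
            key = (drug, condition)
            first_year[key] = min(year, first_year.get(key, year))
    cases = []
    for (drug_a, cond_a), year_a in first_year.items():
        for (drug_b, cond_b), year_b in first_year.items():
            if drug_a == drug_b and cond_a != cond_b and year_a < year_b:
                cases.append(RetestingCase(drug_a, cond_a, year_a, cond_b, year_b))
    return sorted(cases)

# Toy records with placeholder names and years.
records = (
    [("drug_x", "condition_a", 2003)] * 5
    + [("drug_x", "condition_b", 2004)] * 5
    + [("placebo", "condition_a", 2003)] * 6
    + [("drug_x", "condition_c", 2005)] * 2   # below the 5-trial cutoff
)
cases = retesting_cases(records)
```

In this toy input only one case is produced: drug_x moves from condition_a (2003) to condition_b (2004). The placebo records and the two-trial condition are filtered out.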
11. Top 20 Most Retested Drugs
Hirsch HA, et al. Metformin selectively targets cancer stem cells, and acts together with chemotherapy to block tumor growth and prolong remission. Cancer Res. 2009;69(19):7507–7511.
12. Most Frequent Initial and Retested Conditions
Top five frequent initial conditions | # of condition pairs | # of retested drugs | Top five frequent retested conditions | # of condition pairs | # of retested drugs
Respiratory tract diseases | 173 | 35 | Skin diseases | 140 | 14
Carcinoma | 167 | 46 | Digestive system diseases | 133 | 30
Vascular disease | 167 | 30 | Gastrointestinal diseases | 133 | 30
Immunoproliferative disorders | 164 | 39 | Urologic diseases | 124 | 10
Lymphoproliferative disorders | 164 | 39 | Neoplasm metastasis | 117 | 19
13. Analysis of Condition Relatedness
• Hypothesis:
• Drug retesting often occurred between conditions whose
trials used similar eligibility criteria
• Similarity: # of shared Common Eligibility Features (CEFs)
• Aggregated the retested drugs investigating the same pair
of conditions
• Analyzed the distribution of # condition pairs over # of
retested drugs
14. Shared CEFs of Conditions involving Drug Retesting
Avg # of CEFs shared by any two conditions: 52
Avg # of CEFs shared by condition pairs involving drug retesting: 139
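The similarity measure (number of shared CEFs) amounts to a set intersection. A minimal sketch, assuming each condition's CEFs are available as a set; the function names and the toy feature labels are hypothetical:

```python
from itertools import combinations
from statistics import mean

def shared_cefs(cefs_by_condition, a, b):
    """Number of common eligibility features shared by conditions a and b."""
    return len(cefs_by_condition[a] & cefs_by_condition[b])

def average_shared(cefs_by_condition, pairs=None):
    """Average # of shared CEFs over the given pairs (default: all pairs)."""
    if pairs is None:
        pairs = combinations(sorted(cefs_by_condition), 2)
    return mean(shared_cefs(cefs_by_condition, a, b) for a, b in pairs)

# Toy data: feature labels are placeholders, not real CUIs.
cefs_by_condition = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f2", "f3", "f4", "f5"},
    "C": {"f5", "f6"},
}
avg_all = average_shared(cefs_by_condition)
avg_retested = average_shared(cefs_by_condition, pairs=[("A", "B")])
```

Comparing `avg_all` with `avg_retested` mirrors the two averages reported on the slide, here computed on toy sets only.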
15. Recommending Drug Retesting Candidate
Drug X will be recommended for Condition B if:
• Drug X was tested for Condition A
• Another drug, Drug Y, was tested for both Condition A and Condition B
• # Shared CEFs between Condition A and Condition B > threshold
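The recommendation rule can be sketched as below. This is an illustrative reading of the slide, not the authors' implementation: the function name, the input layout, and the direction of the retested pair are assumptions. The example values (ranolazine, ticagrelor, 112 shared CEFs, threshold 100) are the ones reported in this deck.

```python
def recommend(tested, retested_pairs, shared, threshold=100):
    """Recommend (drug, condition) pairs.

    tested:         dict drug -> set of conditions the drug was tested for
    retested_pairs: set of (initial, retested) condition pairs observed for some drug
    shared:         dict frozenset({a, b}) -> number of shared CEFs
    """
    recommendations = set()
    for drug, conditions in tested.items():
        for a, b in retested_pairs:
            enough_overlap = shared.get(frozenset((a, b)), 0) > threshold
            # Drug was tested on A, not yet on B, and A and B are similar enough.
            if a in conditions and b not in conditions and enough_overlap:
                recommendations.add((drug, b))
    return sorted(recommendations)

tested = {
    "ranolazine": {"ischemia"},
    "ticagrelor": {"ischemia", "myocardial infarction"},
}
# Assumed direction: ticagrelor went from ischemia to myocardial infarction.
retested_pairs = {("ischemia", "myocardial infarction")}
shared = {frozenset(("ischemia", "myocardial infarction")): 112}

recs = recommend(tested, retested_pairs, shared, threshold=100)
```

Raising the threshold above 112 removes the recommendation, which is why the empirical threshold value matters.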
17. Validated Recommendation
• Ranolazine was tested for ischemia
• Ticagrelor was tested for both ischemia and myocardial infarction
• # Shared CEFs between ischemia and myocardial infarction (112) > threshold (100)
• Recommended: ranolazine for myocardial infarction, confirmed by Hale et al.
Hale SL, Kloner RA. Ranolazine treatment for myocardial infarction? Effects on the development of necrosis, left ventricular function and arrhythmias in experimental models. Cardiovasc Drugs Ther. Oct 2014;28(5):469-475.
18. Limitations
• Does not work for new conditions and drugs
• Common eligibility features are concept-level only
• e.g., cannot capture “myocardial infarction within the last five years”
• Data quality issues in ClinicalTrials.gov
19. Future Work
• Drug retesting path linking multiple conditions over time
• Tuning the parameters, e.g., empirical threshold values
• Enriching the drug repurposing prediction method with
SNOMED CT, DrugBank, OpenFDA
• Formally evaluating the method with precision, recall, and F-measure
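For reference, the evaluation metrics named above could be computed as in this sketch. It assumes a gold-standard set of known drug-condition pairs exists, which the deck does not describe; the pairs below are placeholders.

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall, and balanced F-measure over sets of (drug, condition) pairs."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Placeholder predictions and gold standard.
predicted = {("drug_x", "B"), ("drug_x", "C"), ("drug_y", "B"), ("drug_z", "B")}
relevant = {("drug_x", "B"), ("drug_w", "C")}
precision, recall, f1 = precision_recall_f1(predicted, relevant)
```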
20. Summary
• Drug retesting often occurred between conditions whose
trials used similar eligibility criteria for participant selection
• Leverage the design patterns in drug intervention trials to
recommend potential new conditions for drug retesting.
• Provides a very preliminary proof of concept
• More sophisticated models should be developed to further
test this idea.
http://thedaily.case.edu/news/?p=33147
De novo drug discovery is expensive and time-consuming. It is estimated that it takes up to 17 years and over $800 million to develop a new drug. Failures during development often cost research sponsors a fortune. To accelerate drug discovery while reducing costs, methods have been sought for efficiently discovering novel indications for existing drugs. This process, known as drug repurposing, drug repositioning, or drug re-profiling, promises to accelerate drug discovery because the safety profiles of existing drugs are already known and the risk of failure is lower.
Some drugs have been successfully repurposed. Duloxetine was initially designed to treat depression but was later successfully repurposed by Eli Lilly to treat stress urinary incontinence in women.
Another example is sildenafil, marketed as Viagra, which was initially developed for hypertension and later repurposed by Pfizer for erectile dysfunction.
However, such discoveries have been primarily driven by insights or serendipitous observations.
It was not until recently that computational methods were proposed to predict new indications for existing drugs using network analysis of genetic, proteomic, and metabolic data.
Key examples of recently discovered additional benefits include Viagra and aspirin. Historically, aspirin has been used for headaches and muscle pain, but its use has now extended to prevention of cardiovascular disease and colon cancer.
Previously, the evidence in ClinicalTrials.gov has been used to validate repurposing targets predicted by a similarity-based computational framework.
ClinicalTrials.gov contains over 180,000 trial summaries.
We hypothesize that drugs were often retested among conditions whose trials employed similar eligibility criteria.
We explore the feasibility of using these data to identify temporal patterns in drug retesting to narrow the search for drug repurposing targets.
In this work, we analyzed the drug retesting patterns in drug intervention trials from 2003 to 2013 with a focus on drugs that were used in every pair of different conditions over time.
The pipeline for constructing the COMPACT database can be briefly described as four steps: 1. indexing trials by conditions, 2. extracting metadata of trials, 3. extracting and analyzing categorical/dichotomous features, and 4. extracting and analyzing numeric expressions. The results of each step were stored in a relational database table of COMPACT.
Synonyms of the same condition, such as Heart Attack and Myocardial Infarction, can be consolidated.
Among 59,716 drug intervention trials conducted between 2003 and 2013, 40,167 drugs were used for 1,487 conditions.
We included all the drugs used in at least five trials for the same medical condition in one year. We retained mostly generic drug names.
Out of all 202,950 (451x450) plausible condition pairs, only 12,774 (6.3%) pairs included two different conditions, each testing the same drug in at least five trials in two different years between 2003 and 2013.
Figure 1 visualizes the drug retesting networks for two example conditions, i.e., asthma and hypertension. For example, asthma was the retested condition for four different drugs (i.e., GW685698X, Ciclesonide, Omalizumab, and Budesonide) that were previously tested for seven other conditions. Hypertension was the retested condition for three drugs (i.e., Tadalafil, Sildenafil, and Amlodipine) that were previously tested for five other conditions (i.e., mental disorders, vascular diseases, prostatic diseases, psychotic disorders, and erectile dysfunction). A node indicates a condition, while an arrow represents a drug. The tail and head of each arrow mark the initial and retested conditions, respectively.
For the 10-year time window, we constructed a 10 x 10 matrix.
Row i and column j each correspond to a year in the time window.
d_{i,j} represents the number of distinct drugs that were first studied for one condition in year i and later for a different condition in year j.
c_{i,j} represents the number of distinct pairs of conditions in which a drug was tested for one condition in year i and later for a different condition in year j.
Give an example
The numbers of drugs are consistently smaller than the numbers of condition pairs, indicating that a drug may have been used for more than one condition pair. More retesting cases occurred between 2003 and 2004 than between other pairs of years. Looking at one row at a time, we can see that as the time window widens, the counts of retested drugs and condition pairs decrease.
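A small sketch of how the two year-by-year matrices could be filled from retesting quintuples. The function name, the zero-based year indexing, and the toy cases are assumptions for illustration; a 3-year toy window is used instead of the real one.

```python
def retesting_matrices(cases, first_year, n_years):
    """Return (d, c): per (initial year, retested year) counts of distinct
    drugs and distinct condition pairs.

    cases: iterable of (drug, initial_condition, initial_year,
                        retested_condition, retested_year)
    """
    drugs = [[set() for _ in range(n_years)] for _ in range(n_years)]
    pairs = [[set() for _ in range(n_years)] for _ in range(n_years)]
    for drug, cond_a, year_a, cond_b, year_b in cases:
        i, j = year_a - first_year, year_b - first_year
        if 0 <= i < j < n_years:  # retesting must come after the initial test
            drugs[i][j].add(drug)
            pairs[i][j].add((cond_a, cond_b))
    d = [[len(cell) for cell in row] for row in drugs]
    c = [[len(cell) for cell in row] for row in pairs]
    return d, c

# Toy cases with placeholder drugs, conditions, and years.
cases = [
    ("drug_x", "A", 2003, "B", 2004),
    ("drug_x", "A", 2003, "C", 2004),
    ("drug_x", "D", 2003, "B", 2004),
    ("drug_y", "A", 2003, "B", 2004),
    ("drug_y", "B", 2004, "C", 2005),
]
d, c = retesting_matrices(cases, first_year=2003, n_years=3)
```

In the toy data, the 2003 to 2004 cell holds 2 distinct drugs but 3 distinct condition pairs, illustrating how one drug can contribute several condition pairs.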
http://chemocare.com/chemotherapy/drug-info/#.VRLaTZPF9UM
Non-chemotherapy drugs: Metformin (it was shown to have anti-tumor effects) http://meetinglibrary.asco.org/content/130976-144
Figure 2 displays, for the top 20 drugs retested on the most conditions between 2004 and 2013, the number of different conditions each drug was retested on per year. Each color block represents the number of different conditions on which the drug was retested. The most retested drug (i.e., Bevacizumab) is at the bottom of the figure.
Most retested drugs were used in chemotherapeutic activities. One reason could be that chemotherapy usually uses multiple drugs to kill or control tumor cells. Meanwhile, chemotherapy drugs are often used to treat different types of neoplasms and cancers.
Table 2 shows the most frequent initial conditions and retested conditions, respectively. The second column gives the number of condition pairs in which the initial condition is specified in the first column. The third column shows the number of drugs that were tested for the initial condition specified in the first column and later retested for a different condition. The fifth column gives the number of condition pairs in which the retested condition is specified in the fourth column. The sixth column shows the number of drugs that were previously tested for some other conditions and later retested for the condition specified in fourth column.
We hypothesized that drugs were often retested among conditions whose trials employed similar eligibility criteria.
We aggregated the retested drugs that were investigated with the same pair of conditions and analyzed the distribution of the number of condition pairs over the number of retested drugs.
On average, each condition has 172 CEFs. The average number of CEFs shared by any two conditions is 52, whereas the average number of CEFs shared by condition pairs involving drug retesting is 139.
64.6% of these condition pairs have 100-200 shared CEFs, while only 2.9% condition pairs have fewer than 50 shared CEFs, indicating that drug retesting often occurred between conditions with a large number of shared CEFs.
The average number of shared CEFs increases with the number of retested drugs. This suggests that conditions with more shared CEFs, whose trials tend to use similar criteria for patient recruitment, are more likely to be studied with the same drug as an intervention. For example, 15 drugs (e.g., Bendamustine, Bortezomib, brentuximab vedotin) that were tested for lymphoproliferative disorders were later retested for leukemia. Lymphoproliferative disorders and leukemia share 199 CEFs (e.g., electrocorticogram, alanine transaminase, creatinine clearance).
Figure 4 shows the number of drugs predicted and the number of different conditions for threshold values between 20 and 200. Higher thresholds yielded fewer predictions, which may also be more clinically relevant. The number of drugs is consistently greater than the number of different conditions, showing that a drug may be predicted for multiple conditions.
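The threshold sweep behind such a figure could look roughly like this. The candidate triples and the function name `sweep` are hypothetical; each candidate carries the number of CEFs shared by the condition pair that produced it.

```python
def sweep(candidates, thresholds):
    """For each threshold, count distinct drugs and distinct conditions
    among candidates whose shared-CEF count exceeds the threshold.

    candidates: iterable of (drug, condition, n_shared_cefs)
    """
    rows = []
    for t in thresholds:
        kept = [(drug, cond) for drug, cond, n in candidates if n > t]
        n_drugs = len({drug for drug, _ in kept})
        n_conditions = len({cond for _, cond in kept})
        rows.append((t, n_drugs, n_conditions))
    return rows

# Toy candidates (placeholder names and counts).
candidates = [
    ("drug_x", "B", 112),
    ("drug_x", "C", 60),
    ("drug_y", "B", 150),
    ("drug_z", "B", 30),
]
rows = sweep(candidates, thresholds=[20, 100, 200])
```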
Our analysis has several major limitations. Since the drug indication predictions were made based on retrospective trials, this approach does not work for new conditions and drugs.
Another limitation is that our similarity analysis for conditions was at the concept-level using n-grams; ideally a more sophisticated similarity analysis should be done at the rule level so that we could use more complete meaning such as “myocardial infarction within the last five years” to represent a common eligibility feature.
A third limitation is the data quality issues in ClinicalTrials.gov. Moreover, the “intervention” field for every clinical trial does not specify which drug is primarily tested if multiple drugs are used in a trial. In this work, we removed the control “Placebo” from our analysis but all other drugs listed as intervention for a trial were included in our analysis. Automated techniques are desired to rank the importance of drugs within a trial to produce more precise analysis. The conditions assigned to each trial may not be normalized and hence may introduce condition-indexing errors.
So the question is: how can we analyze commonalities in target populations? Eligibility criteria specify detailed characteristics and medical conditions for patient selection. Because they are largely unstructured, it is necessary to first build a computable repository of discrete data elements. In this stage, we can analyze the frequencies of these features and the value patterns of numeric features. In the next stage, we will enrich the features with contextual and temporal information, for example “HbA1c > 7.0% after insulin”. Then we will identify the relationships between the features. For example, “age >= 65” and “senior” should be aggregated because they have the same meaning. In this line of work, we still face challenges in natural language processing. In this talk, I will discuss our effort in building such a repository of concept-level eligibility features and numeric expressions.
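As a rough illustration of what a numeric expression feature might look like once extracted, the sketch below parses simple comparisons with a regular expression. The pattern and the function name are assumptions and cover only the simplest phrasing; real eligibility text needs far more robust handling.

```python
import re

# Simplified pattern: variable, comparison operator, number, optional unit.
NUMERIC = re.compile(
    r"(?P<var>[A-Za-z][A-Za-z0-9 ]*?)\s*"
    r"(?P<op>>=|<=|>|<|=)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>%|[A-Za-z/]+)?"
)

def parse_numeric(criterion):
    """Return (variable, operator, value, unit) or None if nothing matches."""
    match = NUMERIC.search(criterion)
    if match is None:
        return None
    return (
        match.group("var"),
        match.group("op"),
        float(match.group("value")),
        match.group("unit"),
    )

hba1c = parse_numeric("HbA1c > 7.0% after insulin")
age = parse_numeric("age >= 65")
```

Note that the trailing context ("after insulin") is ignored by this pattern; capturing it is the kind of contextual enrichment described above.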
Additional material:
Aggregating concepts in a semantically and clinically meaningful way.
For example, for the criterion “kidney disease not caused by diabetes”, the two concepts “kidney disease” and “diabetes” are connected by the relationship “caused by” and a negation (“kidney disease” not caused by “diabetes”).
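One way such a relation between two concepts might be stored is sketched below. The class and field names are hypothetical and only illustrate the idea of two concepts, a relationship, and a negation flag.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CriterionRelation:
    subject: str          # first concept, e.g. a condition
    relation: str         # relationship linking the two concepts
    obj: str              # second concept
    negated: bool = False

    def render(self) -> str:
        """Render the relation back into a readable criterion."""
        negation = "not " if self.negated else ""
        return f"{self.subject} {negation}{self.relation} {self.obj}"

criterion = CriterionRelation("kidney disease", "caused by", "diabetes", negated=True)
text = criterion.render()
```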