Chemicals in Context: from SuperTarget and Matador to STITCH



EBI, Hinxton, 04.02.2008

Usage Rights

© All Rights Reserved

Note: the slide show is truncated and contains movies; download the whole Keynote file to see it in full.
Chemicals in Context: from SuperTarget and Matador to STITCH Presentation Transcript

  • 1. Chemicals in Context: from SuperTarget and Matador to STITCH. Michael Kuhn, Peer Bork lab, EMBL Heidelberg
  • 2. Drug-Target Databases
    Nucleic Acids Research, 2008, Vol. 36, Database issue, D919–D922. doi:10.1093/nar/gkm862. Published online 16 October 2007.
    SuperTarget and Matador: resources for exploring drug-target relationships
    Stefan Günther (1), Michael Kuhn (2), Mathias Dunkel (1), Monica Campillos (2), Christian Senger (1), Evangelia Petsalaki (2), Jessica Ahmed (1), Eduardo Garcia Urdiales (2), Andreas Gewiess (3), Lars Juhl Jensen (2), Reinhard Schneider (2), Roman Skoblo (3), Robert B. Russell (2), Philip E. Bourne (4), Peer Bork (2,5) and Robert Preissner (1,*)
    (1) Structural Bioinformatics Group, Institute of Molecular Biology and Bioinformatics, Charité - University Medicine Berlin, Arnimallee 22, 14195 Berlin; (2) EMBL - Biocomputing, Meyerhofstraße 1, 69117 Heidelberg; (3) Institute for Laboratory Medicine, Windscheidstr. 18, 10627 Berlin, Germany; (4) Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla CA 92093, USA; (5) Max-Delbrück-Center for Molecular Medicine (MDC), 13092 Berlin-Buch, Germany
    Received August 15, 2007; Revised September 26, 2007; Accepted September 27, 2007
    ABSTRACT: The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical articles or textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget that integrates drug-related information about medical ...
    INTRODUCTION: Within the past two decades our knowledge about drugs, their mechanisms of action and target proteins has increased rapidly. Nevertheless, knowledge on their molecular effects is far from complete. For some drugs even the primary targets are still unknown, for example, Diloxanide, Niclosamide and Ambroxol are administered successfully although their effect on human metabolism is ...
  • 3. Manual Curation
      • look for abstracts in PubMed/MEDLINE that mention genes and drugs
      • create candidate list
      • annotate candidate list
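The first two curation steps on slide 3 can be illustrated with a small co-mention search. This is only a sketch under stated assumptions: the slide does not say how abstracts were matched, so the whole-word, case-insensitive matching, the function name `build_candidates`, and the toy abstracts, identifiers and name lists below are all hypothetical. The third step (annotation) is manual and is not modelled.

```python
import re
from collections import defaultdict


def build_candidates(abstracts, gene_names, drug_names):
    """Return {(drug, gene): [abstract ids]} for abstracts mentioning both.

    Hypothetical helper: matching is whole-word and case-insensitive,
    which is an assumption, not the method described in the talk.
    """
    def compile_terms(terms):
        return {
            term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for term in terms
        }

    gene_patterns = compile_terms(gene_names)
    drug_patterns = compile_terms(drug_names)

    candidates = defaultdict(list)
    for abstract_id, text in abstracts.items():
        genes = [g for g, pattern in gene_patterns.items() if pattern.search(text)]
        drugs = [d for d, pattern in drug_patterns.items() if pattern.search(text)]
        # Every drug/gene pair co-mentioned in one abstract becomes a candidate.
        for drug in drugs:
            for gene in genes:
                candidates[(drug, gene)].append(abstract_id)
    return dict(candidates)


# Toy input, invented for illustration only (not real PubMed records).
toy_abstracts = {
    "abstract-1": "Imatinib inhibits the ABL1 kinase.",
    "abstract-2": "EGFR signalling was measured in several cell lines.",
    "abstract-3": "Gefitinib and imatinib were compared for EGFR inhibition.",
}
candidates = build_candidates(
    toy_abstracts,
    gene_names=["ABL1", "EGFR"],
    drug_names=["imatinib", "gefitinib"],
)
for pair, ids in sorted(candidates.items()):
    print(pair, ids)
```

A curator would then work through each candidate pair and its supporting abstracts; an abstract that mentions only a gene (such as the second toy abstract) produces no candidate.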
  • 4. Direct Interactions
  • 5. Indirect Interactions
  • 6. Indirect Interactions
  • 7. Interactions with Proteins
  • 8. Interactions with Protein Families
  • 9. Interactions with Protein Families
  • 10. Chemicals in Context
    Nucleic Acids Research, 2008, Vol. 36, Database issue, D684–D688. doi:10.1093/nar/gkm795. Published online 15 December 2007.
    STITCH: interaction networks of chemicals and proteins
    Michael Kuhn (1), Christian von Mering (2), Monica Campillos (1), Lars Juhl Jensen (1,*) and Peer Bork (1,3)
    (1) European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany; (2) University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland; (3) Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
    Received August 14, 2007; Revised September 14, 2007; Accepted September 17, 2007
    ABSTRACT: The knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH ('search tool for interactions of chemicals') integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict ...
    INTRODUCTION (excerpt): ... basis for the integration of knowledge about chemicals themselves, their biological interactions and their phenotypic effects. Thus, many problems in Chemical Biology are now becoming approachable by the academic research community. Valuable information about the biological activity of chemicals is provided by large-scale experiments. Phenotypic effects of chemicals were first made available on a large scale by the US National Cancer Institute (NCI), which conducts anti-cancer drug screens on 60 human tumour cell lines (NCI60) (4). The patterns of growth inhibition in the different cell lines by small molecules can not only be used to judge the efficacy of individual compounds, but also to relate compounds by their ...
  • 11. Content
      • 373 genomes
      • 68,000 chemicals
  • 12. Content
      • 11,800 human genes
      • 373 genomes
      • 38,000 chemicals
      • 68,000 chemicals
      • 2100 drugs
  • 13. Yao and Rzhetsky, Genome Research (page excerpt)
    Figure 1. Distribution of the number of human gene targets per successful drug. The plot is superimposed on a family classification of drug targets.
    The connectivity of a node within a graph is simply the total number of incoming and outgoing arcs (direct molecular interactions, in our case). As has been previously established, the connectivity distributions for real molecular networks are so-called heavy-tail distributions resembling Zipf's (Pareto's or power-law) distribution (Fig. 2A; Barabasi and Bonabeau 2003). The successful drug targets occupy a rather narrow niche within this distribution: their connectivity is significantly higher than that of an average node within the network (in GeneWays it is ∼9.1, P-value = 0.0064 [Fig. 2A,B,F]; in HPRD1 and HPRD2, it is 10.9 and 11.5, P-values = 0 and 0.0001, respectively; the same comparison performed using the smaller Y2H and BIND networks revealed no significant difference [see Table 2]). However, the average connectivity of drug targets is relatively small compared to the maximum connectivity observed in the network (9.1 vs. a maximum of 346 in GeneWays). The most highly connected high-revenue drug targets in the GeneWays network (ABL1, androgen receptor [AR], BCHE, EGFR, INSR, NR3C1, TNF, and VEGFA; see Fig. 2G) are targeted by drugs intended to provide relief for the most life-threatening phenotypes, such as cancer and autoimmune disorders. The successful drugs targeting these highly connected genes and proteins are associated with terrible side effects (think of chemotherapy patients) that are tolerable only in life-or-death situations.
    The betweenness of a network node is defined as the number of times this node appears in the shortest path between two other network nodes, summed over all node pairs in the network and divided by the total number of node pairs (e.g., Noh 2003). The clustering coefficient of a network node is the ratio of the actual number of direct connections between the immediate neighbors of the node to the maximum possible number of such direct arcs between its neighbors (e.g., Holme and Kim 2002). The clustering coefficient is zero if a node's neighbors do not interact directly (e.g., a professor who interacts with many graduate students, but whose students avoid talking to one another). The highest clustering coefficient is attained in a complete graph where every node is connected to every other node. The betweenness values of the drug targets in the GeneWays, BIND, and Y2H networks are not significantly different from those of the rest of genes within the network, although the drug targets in the GeneWays network tend to have slightly higher betweenness values than average (P-value = 0.1943; Fig. 2C). The increased average betweenness of drug targets is most obvious in the HPRD1 and HPRD2 networks (P-values = 0.0004 and 0.004, respectively), suggesting that successful drug targets tend to bridge two or more clusters of relatively closely interacting molecules. The clustering coefficients of drug targets are similar to those of the rest of the network nodes in all five data sets (see Table 2; Fig. 2D).
    We next asked if proteins that are successful drug targets are less polymorphic (considering only human, intraspecies variation) than human genes on average. To answer this question, we used a large set (16,462 genes) of known human single-nucleotide polymorphisms (SNPs) available at dbSNP (Sherry et al. 2001). To reduce any effects of SNP sampling bias (some genes enjoy more attention on the part of the scientific community than others), instead of studying the absolute number of reported SNPs for each gene, we used the ratio (Cratio) of nonsynonymous to synonymous SNPs (with an expected value of 1 for a perfectly neutral mode of SNP accumulation). The assumption underlying this analysis is that sampling bias for a gene affects synonymous and nonsynonymous SNPs equally.
    Our analysis indicates (Fig. 2E,F) that Cratio for successful drug targets is significantly smaller than that for an average human gene (P-value = 0.0007). This result suggests that successful drug targets tend to be less nonsynonymously polymorphic at the human population level than are human genes on average. Furthermore, Cratio is significantly negatively correlated with gene connectivity (Spearman rank correlation coefficient of -0.4841, P-value = 0.0000), consistent with the observation that more highly conserved proteins tend to have higher connectivities (Fraser et al. 2002). Another line of evidence shows that highly expressed genes tend to evolve more slowly than those whose expression is low (Drummond et al. 2005). Furthermore, some experimental techniques, such as yeast two-hybrid protein–protein interaction screening, may detect interactions of highly expressed proteins more readily (Bloom and Adami 2003). Hence, relationships between gene expression level, sequence conservation, and connectivity may involve data biases and should be interpreted with caution.
    We interpret the results of our SNP analysis as follows: a drug designed to target a protein that is polymorphic among ...
    Table 1. Comparison of different human molecular interaction data sets
    Data set    No. of genes/proteins    No. of interactions    No. of drug targets covered
    Y2H         2936                     5722                   49
    BIND        2886                     4964                   157
    GeneWays    4458                     14,124                 197
    HPRD1       7764                     28,149                 304
    HPRD2       9462                     37,107                 318
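The three node-level measures defined in the Yao and Rzhetsky excerpt (connectivity, clustering coefficient, betweenness) can be sketched in plain Python. Several things here are assumptions rather than the paper's implementation: the graph is treated as undirected and unweighted, it is stored as a dict mapping each node to a set of neighbours, pairs with several shortest paths contribute fractionally, and betweenness is normalised by the number of pairs of other nodes. The function names and the toy "professor" graph (borrowed from the excerpt's own analogy) are illustrative.

```python
from collections import deque
from itertools import combinations


def connectivity(graph, node):
    """Number of direct interactions of a node (its degree)."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Actual links among a node's neighbours / maximum possible links."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)


def _bfs(graph, source):
    """Distances and shortest-path counts from source (unweighted graph)."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                paths[w] += paths[u]
    return dist, paths


def betweenness(graph, node):
    """Share of shortest paths between other node pairs that pass through node."""
    info = {v: _bfs(graph, v) for v in graph}
    dist_n, paths_n = info[node]
    others = [v for v in graph if v != node]
    total = 0.0
    for s, t in combinations(others, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s or node not in dist_s or t not in dist_n:
            continue  # s and t (or node) are not connected
        if dist_s[node] + dist_n[t] == dist_s[t]:
            total += paths_s[node] * paths_n[t] / paths_s[t]
    n_pairs = len(others) * (len(others) - 1) / 2
    return total / n_pairs if n_pairs else 0.0


# Toy graph: a professor linked to three students who never talk to each other.
star = {
    "professor": {"s1", "s2", "s3"},
    "s1": {"professor"},
    "s2": {"professor"},
    "s3": {"professor"},
}
# Complete graph on four nodes: every node is linked to every other node.
complete = {v: {w for w in "abcd" if w != v} for v in "abcd"}

print(connectivity(star, "professor"))
print(clustering_coefficient(star, "professor"))
print(betweenness(star, "professor"))
print(clustering_coefficient(complete, "a"))
```

In the star graph the professor has zero clustering but lies on every shortest path between students, which is the "bridging" pattern the excerpt associates with high betweenness; in the complete graph the clustering coefficient reaches its maximum.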
  • 14. Links to Protein World
  • 15. Links to Protein World
  • 16. Links to Protein World
  • 17. Acknowledgements
      • SuperTarget: Robert Preissner group
      • Matador: Rob Russell / Peer Bork groups
      • STITCH: Lars Juhl Jensen, Christian von Mering and lab
      • Data sources: PubChem, DrugBank, KEGG, BindingDB, ...
  • 18. Thank you for your attention!
      • SuperTarget:
      • Matador:
      • STITCH: