Chemical Landscape Analysis – the case of tautomers

Poster at 6th Joint Sheffield Conference on Chemoinformatics
by Nina Jeliazkova, Nikolay T. Kochev, Vedrin Jeliazkov

The Structure-Activity Relationships (SAR) landscape and activity cliff concepts underpin a popular analysis and visualisation technique, with origins in medicinal chemistry and receptor-ligand interaction modelling. While intuitive, the definition of an activity cliff as a “pair of structurally similar compounds with large differences in potency” is commonly recognized as ambiguous. We have recently proposed a new and efficient method for identifying activity cliffs and visualising activity landscapes [1]. The method introduces a probabilistic measure: the likelihood of a compound having a large activity difference compared to other compounds, while being highly similar to them. The likelihood is effectively a quantification of a SAS Map with defined thresholds and does not require storage of the pairwise similarity matrix. The method generates a list of individual compounds, ranked according to their likelihood of being involved in the formation of activity cliffs, and goes beyond characterizing cliffs by structure pairs only. Every compound is associated with zero, one, or more compounds with similarity and activity difference above the defined thresholds. The paired structures can be easily retrieved by a standard similarity query. The arrangement as a graph naturally emerges from the set of top ranked compounds, as they are usually interconnected as activity cliff pairs. The popular matched molecular pairs approach could be considered a special case, but is also improved by being able to identify multiple matching pairs at once. We now extend the landscape analysis and visualisation to datasets where the chemical structures are represented by more than one tautomer, and study the influence of tautomerization on the SAR landscape. The tautomer generation relies on the Ambit-Tautomer open source package, developed by the authors [2].
Finally, the method is implemented as part of an existing open source Ambit package [3] and can be accessed via an OpenTox API compliant web service. The OpenTox API provides a uniform REST web service application programming interface (API) to chemical structures, experimental data and calculated properties, descriptor calculation, model building, validation and reporting [4]. The AMBIT web services package [3] is being developed by Ideaconsult Ltd. and is one of several existing independent implementations of the OpenTox API, providing data sharing and remote calculation capabilities. Visualisation of the ranked activity cliffs by bubble charts is presented, and interactive visualisations are available at http://toxmatch.sf.net.

[1] http://www.ncbi.nlm.nih.gov/pubmed/23110534
[2] http://dx.doi.org/10.1002/minf.201200133
[3] http://www.jcheminf.com/content/3/1/18
[4] http://www.jcheminf.com/content/2/1/7

Presentation Transcript

    • The method generates a ranked list of individual compounds, ordered according to their likelihood of being involved in the formation of activity cliffs. The following examples use the PubChem Thrombin inhibitors assay AID 1215; the Tanimoto similarity is calculated on fingerprints from the CDK library.

References
[1] Jeliazkova N., Jeliazkov V., Chemical Landscape Analysis with the OpenTox Framework, Current Topics in Medicinal Chemistry, 2012, 12(18):1987-2001.
[2] Kochev N., Paskaleva V., Jeliazkova N., Ambit-Tautomer: An Open Source Tool for Tautomer Generation, Molecular Informatics, 2013, 32(5-6):481-504.
[3] Jeliazkova N., Jeliazkov V., AMBIT RESTful web services: an implementation of the OpenTox application programming interface, Journal of Cheminformatics, 2011, 3:18.
[4] AMBIT project, http://ambit.sourceforge.net

Nina Jeliazkova*1, Nikolay T. Kochev2, Vedrin Jeliazkov1
*e-mail: jeliazkova.nina@gmail.com; twitter: @10705013
1 Ideaconsult Ltd, 4 Angel Kanchev Str., Sofia 1000, Bulgaria
2 University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, Bulgaria

What: the Structure-Activity Relationships (SAR) landscape and activity cliffs
Why: analysis and visualisation technique
Origin: medicinal chemistry and receptor-ligand interaction modelling
Activity cliff definition: a “pair of structurally similar compounds with large differences in potency”
State of the art: SAS Maps, network graphs, quantification by SALI, SARI, a number of methods analyzing the pairwise similarity matrix; various extensions
Pros: intuitive
Cons: ambiguous, not scalable to large datasets
Method: We have recently proposed a new and efficient method for identifying activity cliffs and visualising activity landscapes [1].
The method ranks the activity cliffs by a probabilistic measure: the likelihood of a compound having a large activity difference compared to other compounds, while being highly similar to them.

Poster panels: Background; Activity cliffs ranking; The tautomers in the chemical landscape.

Table 1. Conditional probability of events co-occurrence

                                  s (high similarity)   !s (low similarity)
t (large activity difference)     a ~ P(s | t)          b ~ P(!s | t)
!t (small activity difference)    c ~ P(s | !t)         d ~ P(!s | !t)

$$G^2 = a \log \frac{a\,(c + d)}{c\,(a + b)} + b \log \frac{b\,(c + d)}{d\,(a + b)}$$

SAS Map quadrants: I: a, activity cliffs; II: b, nondescript; III: d, scaffold hops; IV: c, smooth.

Table 2. Activity cliffs ranking by G2 (Tanimoto threshold > 0.8 and activity difference > 21.6)

Rank  ID     a  b    c  d    IC50, µM       G2
1     12371  2  216  0  310  50 (inactive)  32.34
2     12413  1  310  1  216  5.84           0.07
3     12439  1  308  1  218  10.90          0.07

Visualisation: Bubble Chart. The area of each circle is proportional to G2. The activity cliffs are as in the Table 2 ranking.

Fig 2. The result of a similarity query for the top ranked compound ID = 12731

We extend the landscape analysis and visualisation to datasets where the chemical structures are represented by more than one tautomer, and study the influence of tautomerization on the SAR landscape. The Thrombin inhibitors dataset (AID 1215) contains 529 structures. The tautomer enriched dataset (generated by the Ambit-Tautomer package [2]) consists of 6145 structures.

Fig 1. SAS Map of PubChem Thrombin inhibitor assay AID 1215 (IC50, μM); Tanimoto similarity on hashed 1024 bit fingerprints (the CDK library). Counts a, b, c, d as in Table 1.

If only the structure pairs between a given compound and all other compounds in the analysed dataset are taken into account, G2 characterizes the likelihood of this particular compound forming activity cliffs with the compounds in the dataset. By estimating G2 for all structures in the dataset, a ranking can be established, thus identifying the most prominent activity cliffs.
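The G2 formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Ambit implementation: the natural logarithm and a pseudocount of 1e-7 added to every count are guesses, chosen only because they approximately reproduce the three G2 values listed in Table 2. The function name `g2` and the parameter `alpha` are hypothetical.

```python
import math


def g2(a, b, c, d, alpha=1e-7):
    """G2 for one compound from its four pair counts (Table 1).

    alpha is an ASSUMED additive-smoothing pseudocount; the poster only
    states that additive (Laplace) smoothing is used for zero counts.
    """
    a, b, c, d = (x + alpha for x in (a, b, c, d))
    return (a * math.log(a * (c + d) / (c * (a + b)))
            + b * math.log(b * (c + d) / (d * (a + b))))


# Counts (ID, a, b, c, d) copied from Table 2.
rows = [(12371, 2, 216, 0, 310), (12413, 1, 310, 1, 216), (12439, 1, 308, 1, 218)]
scores = {cid: g2(a, b, c, d) for cid, a, b, c, d in rows}
for cid, score in scores.items():
    print(cid, round(score, 2))
```

With a different logarithm base or pseudocount the absolute values change, so the agreement with Table 2 should be read as a plausibility check of the reconstructed formula rather than as confirmation of the exact smoothing used by the authors.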
Note that this is a ranking of individual structures, not pairs of structures. This is a significant advantage, especially when processing large datasets, as only the likelihood (or the four counts) needs to be stored per compound, instead of the entire pairwise matrix. Column a gives the number of pairs that form activity cliffs with the compound. The paired structures can be easily retrieved by a standard similarity query. The arrangement as a graph naturally emerges from the set of top ranked compounds, as they are usually interconnected as activity cliff pairs.

Figure panels: The network graph; The bubble chart.

The bubble chart is space efficient and can represent a large number of values in a small space. More (interactive) examples at: http://toxmatch.sf.net

Combined bubble chart of G2 ranked compounds. Similarity threshold 0.8; each color corresponds to a different activity difference threshold. The gray color at the right indicates the structures with count a = 0 but G2 > 0, due to the additive smoothing. These are potential activity cliffs at different similarity thresholds.

PubChem AID 1215, tautomer enriched (panels: The network graph; The bubble chart).

There are 8 activity cliff pairs instead of only one in the original dataset (Fig 2). The bubble chart shows that the G2 ranking is not the same for all the tautomers of the same compound (the sizes of the circles of the same color differ). The enriched dataset contains 8 tautomers for each of the three structures in Fig 2. Blue: tautomers of ID = 12731; red: tautomers of ID = 12413; green: tautomers of ID = 12439.

Fig 3. Activity cliffs ranking of the tautomer enriched AID 1215 dataset

The network graph in Fig 3 shows that the activity cliffs never involve more than one tautomer. Therefore, if the correct combination of tautomers is missing in a particular dataset, the activity cliffs might not be identified.

DSSTox CPDBAS dataset (carcinogenicity). Multiple activities and thresholds (1519 structures).

Fig 4.
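The point about storing only four counts per compound can be illustrated with a small sketch that compares one compound against all others on the fly, without materialising the pairwise matrix. Everything here is an assumption made for illustration: fingerprints are modelled as Python sets of "on" bit positions, the activity difference is taken as the absolute difference of the activity values, and the names `tanimoto` and `cliff_counts` are hypothetical. The default thresholds are the ones quoted in the Table 2 caption.

```python
def tanimoto(fp1, fp2):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0


def cliff_counts(index, fingerprints, activities,
                 sim_threshold=0.8, act_threshold=21.6):
    """Return the counts (a, b, c, d) of Table 1 for the compound at `index`.

    Each other compound is visited once; nothing but the four counters
    is kept, so memory use does not grow with the dataset size.
    """
    a = b = c = d = 0
    for j, (fp, act) in enumerate(zip(fingerprints, activities)):
        if j == index:
            continue
        similar = tanimoto(fingerprints[index], fp) > sim_threshold
        large_diff = abs(activities[index] - act) > act_threshold
        if large_diff and similar:
            a += 1      # activity cliff pair
        elif large_diff:
            b += 1      # nondescript
        elif similar:
            c += 1      # smooth
        else:
            d += 1      # scaffold hop
    return a, b, c, d


# Toy data (made up, not from AID 1215).
fps = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 7}, {10, 11, 12}]
acts = [50.0, 5.0, 45.0, 48.0]
counts = [cliff_counts(i, fps, acts) for i in range(len(fps))]
print(counts)
```

In a real setting the inner loop would typically be replaced by a similarity query against the structure database, which is how the poster describes retrieving the paired structures.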
The tautomers of ID = 12731 (PubChem SID 861943), 1-[5-(4-bromophenyl)-7-(4-methoxyphenyl)-1,7-dihydro-[1,2,4]triazolo[1,5-a]pyrimidin-2-yl]pyrrolidine-2,5-dione

The activity cliffs ranking method is implemented as part of the open source Ambit package [3, 4] and can be accessed via a REST web service (OpenTox API compliant). The user interface and charts are JavaScript based and accessible through modern web browsers.

Finally, each of the original structures is assigned the maximum G2, taken over the set of its tautomers. The structures are then ordered and ranks are assigned. Fig 5 illustrates how the activity cliff ranking changes, compared to the ranking derived from the original structures only.

Fig 5. Activity cliff ranking in the original and tautomer enriched datasets (axes: Rank (original dataset) and Rank (tautomers enriched dataset)).

The method goes beyond finding structure pairs only.

* Additive (Laplace) smoothing is used to deal with zero counts.

The likelihood G2 is effectively a quantification of a SAS Map with defined thresholds. It can be calculated for the entire dataset (Fig. 1), for a selected set of compounds, or for an individual compound.
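The aggregation step described above (maximum G2 over the tautomers of each original structure, followed by ranking) could look roughly like the following sketch. The function name `rank_by_max_g2`, the input layout and the identifiers and scores in the toy data are all hypothetical; how ties are ordered is not stated on the poster, and here they simply keep their first-seen order.

```python
def rank_by_max_g2(tautomer_scores):
    """Rank parent structures by the maximum G2 over their tautomers.

    tautomer_scores: iterable of (parent_id, g2) pairs, one per tautomer.
    Returns a list of (rank, parent_id, max_g2), with rank starting at 1.
    """
    best = {}
    for parent_id, score in tautomer_scores:
        if parent_id not in best or score > best[parent_id]:
            best[parent_id] = score
    ordered = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(rank, pid, score)
            for rank, (pid, score) in enumerate(ordered, start=1)]


# Made-up scores for three hypothetical parent structures.
toy = [("A", 0.5), ("A", 3.2), ("B", 1.1), ("C", 0.0), ("B", 4.0), ("C", 0.2)]
ranking = rank_by_max_g2(toy)
for rank, pid, score in ranking:
    print(rank, pid, score)
```

Comparing the rank each structure gets this way with its rank in the original dataset gives the kind of rank-versus-rank view that Fig 5 describes.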