Improving Cluster Representations by "Fuzzifying"
Maximum Common Substructures
Christian Herhaus
Merck KGaA, Computational...

Poster wellcome-trust-2013-herhaus-fuzzy-mcs



Christian Herhaus, Merck KGaA, Computational Chemistry, Frankfurter Str. 250, 64293 Darmstadt

Introduction

Arranging similar structures in clusters is a typical task in modern chemoinformatics, with high impact on HTS follow-up, the generation of structure-activity relationships (SAR) and the selection of starting points for compound optimization.

Methods for cluster generation are as diverse as the structures to which they are applied [1]; they may be, for example, similarity- or substructure-based. Medicinal chemists typically orient themselves within structure subsets such as clusters with the help of substructures, so-called "scaffolds", which intuitively characterize the structural relationships between the molecules of the subset.

For substructure-based clustering, well-established methods exist for generating Maximum Common Substructures (MCS) that are present in all members of the structure population or in a defined proportion thereof [2]. For similarity-based clusters, however, such an MCS may not exist for the required proportion of the dataset, or the common substructure may be so small that it is no longer representative and therefore carries little information.

Results

The approach presented here allows the generation of MCS also for similarity-based clusters with a given inherent structural diversity. It does so by using query features such as atom lists and query bonds to add "fuzziness" in atom and bond information to the MCS.

As a result, the generated "fuzzy" MCS, while still fully database-searchable, is more meaningful for characterizing clusters because it can cover larger parts of the full structures than a conventional MCS.

The approach was implemented in Pipeline Pilot™ as a proof of concept but is general enough to be transferred to other technical platforms.
Aim

The aim of this approach is to provide descriptive scaffold structures also for clusters whose intrinsic structural diversity is too high to be characterized by conventional Maximum Common Substructure approaches.

[Figure panels: Conventional MCS; "Fuzzy" MCS (atom labels "any", bond labels "s/d")]

Methods

Content of the Pipeline Pilot™ component "Generate Fuzzy MCS" (slightly simplified):

1. Reduced graphs
2. Generalized MCS
3. Generalized MCS mapped onto structures
4. Structures reduced & mappings normalized
5. Collected atom/bond information transferred into queries
6. Query bonds coded as stereo bonds
7. Final "fuzzy" MCS

Atom and bond information from step 5:

Atom No   Elements (collected)   Element (query)
1         O,O,S                  [O,S]
7         N,N,O                  [N,O]
11        C,C,C                  C

Bond No   Types (collected)   Type (query)
3         1,1,1               single
9         1,2,1               single/double
14        2,1,4               aromatic

Where to get / How to contribute

The current version of the component is published on the Accelrys® Pipeline Pilot™ user forum and is available for download there. Contributors are welcome to improve the approach and to transfer the concept to other technology platforms such as RDKit, CDK or KNIME.

Current limitations

• Aromaticity may cause unexpected effects (e.g. single/double vs. aromatic bonds)
• Symmetric substructures may blur atom/bond information (e.g. carboxyl or nitro groups)
• Currently, a single occurrence of an atom/bond feature is enough for it to be included in a query feature. For larger clusters, a percentage threshold may be required
• Different ring sizes and chain lengths cannot yet be described by query patterns

References

[1] Downs GM, Barnard JM: Clustering Methods and Their Uses in Computational Chemistry. In Reviews in Computational Chemistry (18). Chichester: Wiley and Sons; 2003, 1-40.
[2] Ehrlich HC, Rarey M: Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review. WIREs Comput Mol Sci 2011, 1:68-79.
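
Step 5 of the Methods (collecting atom and bond information into query features) can be illustrated with a small, platform-independent sketch. This is not the Pipeline Pilot™ component: the function names, the `min_fraction` parameter and the "any" fallback are hypothetical, the bond codes are assumed to mean 1 = single, 2 = double, 4 = aromatic, and the rule that an aromatic observation takes precedence is an assumption chosen only because it reproduces the example values for bond 14. With `min_fraction=0.0` a single occurrence is enough for inclusion, as in the current component; a higher value mimics the percentage threshold suggested under the limitations.

```python
from collections import Counter

# Assumed meaning of the bond-type codes in the poster's example.
BOND_NAMES = {1: "single", 2: "double", 4: "aromatic"}
AROMATIC = 4


def kept_values(observations, min_fraction=0.0):
    """Distinct values whose share of all observations reaches min_fraction."""
    counts = Counter(observations)
    total = len(observations)
    return sorted(v for v, n in counts.items() if n / total >= min_fraction)


def atom_query(elements, min_fraction=0.0):
    """Turn the elements seen at one mapped atom into an atom-list query."""
    kept = kept_values(elements, min_fraction)
    if not kept:
        return "any"  # hypothetical fallback when nothing passes the threshold
    if len(kept) == 1:
        return kept[0]
    return "[" + ",".join(kept) + "]"


def bond_query(types, min_fraction=0.0):
    """Turn the bond types seen at one mapped bond into a query bond label."""
    kept = kept_values(types, min_fraction)
    if not kept:
        return "any"  # hypothetical fallback when nothing passes the threshold
    if AROMATIC in kept:
        return "aromatic"  # assumption: aromatic overrides single/double
    return "/".join(BOND_NAMES[t] for t in kept)


if __name__ == "__main__":
    # Values taken from the poster's example (atoms 1, 7, 11; bonds 3, 9, 14).
    atoms = {1: ["O", "O", "S"], 7: ["N", "N", "O"], 11: ["C", "C", "C"]}
    bonds = {3: [1, 1, 1], 9: [1, 2, 1], 14: [2, 1, 4]}
    for number, elements in atoms.items():
        print("atom", number, atom_query(elements))
    for number, types in bonds.items():
        print("bond", number, bond_query(types))
```

Calling `atom_query(["O", "O", "O", "S"], min_fraction=0.5)` would drop the single sulfur observation and return a plain `O`, which is the kind of behaviour a percentage threshold is meant to give for larger clusters.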