Tim Cheeseright, Assessing the Similarities of Compound collections using molecular fields: Does it add value?
This presentation, originally given at the 2012 ACS National Meeting in San Diego, investigates alternative methods of defining chemical space using 3D Field based methodologies - the advantages and disadvantages of which are described.

  • Notes: The 2D drawing of a molecule gives limited information about its nature. In real life, molecules take on a 3D geometry that can’t be truly represented by a flat cartoon. Consider the electrostatic potential surrounding a molecule and map that potential out to a surface, as shown in the second figure. Field Points are placed at the extrema of the MEP, with the point size governed by the size of the electrostatic contribution. Spatial points are also included at the van der Waals radii extrema.
  • (1) Commercial databases: 9 million compounds filtered for heavy atom count > 11 and < 30, corresponding roughly to MWt > 140 and < 500 (4,655,051 cpds).
    (2) Further filtered for rotatable bond count < 5; reactive group filters applied (removes nasties like aldehydes, ketones, hydrazones, alkyl halides, isocyanates, nitrosyl etc… see below for full list); charge filters: < 3 formal charges, negative or positive (1,282,042 mols passed these filters).
    (3) For this list of compounds we intend to calculate logP, HBA, HBD, PMI and shadow indices and select 20K on shape diversity. I believe this is going to be a reasonable approximation of field similarity, since fields are also heavily dependent on 3D conformation.
    (4) From this data we also intend to pick 100 probe molecules and use these to calculate similarity against the 20K set. This gives a 20K set, each with a 100 bit field fingerprint. This is the equivalent of completing a 2M virtual screen.
    (5) This fingerprint can be subjected to a PCA analysis to reduce the data effectively to a 3 dimensional ‘field space’ from which a diverse 12K set can be chosen. From a practical point of view it will be difficult to expand this process to a bigger data set, although if 3D shape similarity correlates well with Field similarity then the PMI selection may be enough. We simply don’t know until we do the experiment.
    (6) We will provide the 12K SD file set for you to purchase, with 2000 cpd redundancy for those which are not available or too expensive etc.
    (a) Filtered on properties and nasty functionality to obtain a 1.2 million compound data set.
    (b) On this set we ran a PMI shape descriptor calculation on a single ‘lowest energy’ conformation for each molecule in the set.
    (c) From this we picked a 20K shape diverse set using the PMI defined shape space.
    (d) From the 20K set I picked a diverse 200 cpd set in the same way.
    (e) We applied to this 200 an all by all 2D similarity matrix (‘200 by 200’); we could then ensure 2D dissimilarity in the choice of a set of 100 probe molecules.
    (f) These 100 probe molecules were used as templates to measure Field similarity against each of the 20K cpds and thus produce a 100 bit number for each of the 20K cpds.
    (g) From the Field similarity matrix we collapsed the ‘~20000 X 100’ matrix to ~20000 X 3 dimensions using PCA to define the 3D fieldspace.
    (h) 12K Field diverse compounds were selected from this 3D fieldspace.
  • Theoretically, field based metrics should be a good way to assess the similarity/diversity of fragment collections?? Diversity of fragment databases?? In Fieldstere
  • Should probably have done a 200 X 200 field similarity at this stage to ensure picking field diverse probes? But 2D dissimilarity also ensured we were avoiding picking too similar chemotypes for the probes – probably doesn’t matter. Theoretically, field based metrics should be a good way to assess the similarity/diversity of fragment collections?? Diversity of fragment databases?? In Fieldstere Never tried using a smaller number of probes – could increase/decrease discrimination?
  • Picked a cluster set from the space 3D PCA – selected an arbitrary conformer then flexibly aligned (Falign) the rest – plot surface. Bottom 8 Fsim less than 6
  • Picked a cluster set from the space 3D PCA – selected an arbitrary conformer then flexibly aligned (Falign) the rest – plot surface. Bottom 8 Fsim less than 6
  • Again selected an arbitrary conformer (different one this time) then flexibly aligned (Falign) the rest – plot surface. Bottom 5 Fsim less than 6
  • Picked a second cluster and repeated with another arbitrary template – Fsims all > 6; discarded 4 which were below 6. Cluster still OK. Conclude: even in this space, clusters of close field similarity are still fairly diverse!!
  • Separation of chemically intuitive groupings – DHP-like esters/lactones … compact sulphonamides – clusters on periphery are truly Field dissimilar.
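
Steps (f) to (h) in the notes above (a probe-based field fingerprint, PCA down to three dimensions, then a diverse pick) can be sketched roughly as follows. This is only an illustration under stated assumptions: the fingerprint matrix is filled with random placeholder numbers rather than real field similarities, the sizes are scaled down from 20K x 100, and the greedy MaxMin picker is an assumed selection method, since the notes do not say how the diverse set was chosen. The function names `pca_project` and `maxmin_pick` are hypothetical.

```python
import numpy as np


def pca_project(fingerprints, n_components=3):
    """Project each row onto the top principal components (PCA via SVD)."""
    centered = fingerprints - fingerprints.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def maxmin_pick(coords, n_pick, start=0):
    """Greedy MaxMin: repeatedly add the point farthest from the picked set.

    Assumption: the slides do not name the selection algorithm; MaxMin is
    used here only as a common stand-in for diversity picking.
    """
    picked = [start]
    # Distance from every point to its nearest already-picked point.
    nearest = np.linalg.norm(coords - coords[start], axis=1)
    while len(picked) < n_pick:
        candidate = int(np.argmax(nearest))
        picked.append(candidate)
        dist = np.linalg.norm(coords - coords[candidate], axis=1)
        nearest = np.minimum(nearest, dist)
    return picked


# Placeholder data: random values stand in for real field similarity scores.
rng = np.random.default_rng(0)
n_compounds, n_probes = 2000, 100  # scaled down from 20K compounds x 100 probes
fingerprints = rng.random((n_compounds, n_probes))

field_space = pca_project(fingerprints, n_components=3)  # compounds x 3
diverse = maxmin_pick(field_space, n_pick=1200)          # scaled down from 12K
print(field_space.shape, len(diverse))
```

With real data, each row of `fingerprints` would hold one compound's field similarity to each of the 100 probes, so the expensive 3D calculation scales with the number of probes rather than with the size of the full collection.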

Tim Cheeseright, Assessing the Similarities of Compound collections using molecular fields: Does it add value? Presentation Transcript

  • Tim Cheeseright, Mark Mackey, Rob Scoffin, Martin Slater> Assessing the similarity of compound collections using molecular fields: Does it add value? 1
  • Conclusions> It works brilliantly> All synthetic steps gave yields of 100%> All enrichments were perfect> All new molecules were sub nM> All QSARs were totally predictive, q2 = 1.0> We expect the call from Sweden any day now 2
  • Conclusions> Work in progress> 3D similarity can add value to compound selection> Full matrix of similarities possibly unnecessary> Using probes looks like a possible solution> Not a panacea 3
  • Agenda & Background> Fields & similarity> Generating screening compounds using Fields> Selecting a 10K “diverse” library for screening from commercial compounds > Initial thoughts > Problems > More Initial thoughts > A solution but not a complete one> Conclusions 4
  • Field Points> Condensed representation of electrostatic, hydrophobic and shape properties (“protein’s view”) > Molecular Field Extrema (“Field Points”) [Figure: 2D structure, 3D Molecular Electrostatic Potential (MEP) and Field Points; legend: Positive, Negative, Shape, Hydrophobic] 5
  • Improved MM Electrostatics> Field patterns from XED force field reproduce experimental results > XED adds ‘p-orbitals’ to get better representation of atoms [Figure: interaction of Acetone and Any-OH from small molecule crystal structures; panels: Experimental, Using XEDs, Not using XEDs] 6
  • Non-Classical Comparisons 7
  • Molecular Alignment 0.82 0.66 0.98 Cheeseright et al, J. Chem Inf. Mod., 2006, 665 8
  • Using Fields> Bioisosteric groups> Virtual Screening> Pharmacophore hypothesis> Qualitative SAR interpretation> 3D QSAR> Library Design 9
  • Field based library design success 10
  • Libraries from Fields> Small, custom synthesised libraries (~100s - 1000s compds)> Low scaffold diversity> Highly targeted> Lots of manual design 11
  • An Opportunity & a Challenge> Provide a small diverse screening library 10K for a small biotech company > Diversity in potential biological targets to be hit > Minimum redundancy in the set > Maximum chance of success in finding a lead within available budget and screening resources 12
  • Initial thoughts> Customised design not an option - commercial compounds only> Using Fields to successfully select compounds for screening performed many times > Virtual screening > Always in a specific biological context> What about using Fields to choose a “diverse” set> Possible problem with numbers > 10,000 cmpd library small > 9,000,000 commercially available molecules v. large for 3D diversity 13
  • Initial thoughts> Compare 3D and 2D similarities for compound collections - are we wasting our time?> Take a small compound collection> Full NxN calculation> 3D method = Fields & Shape> 2D method = atom pairs> Compare and Contrast 14
  • Conformations> 3D Method requires conformations - which one(s) to use?> What is the similarity of 2 compounds in 3D ? > Context is important! > Highest across all conformations? > Average ? > Lowest ?> For 3D, similarity calculation is Nconfs x Nconfs 15
  • Compound Collection> BIONET Rule of Three (Ro3) Fragment Library: “7,907 Ro3-compliant fragments”> Conformation hunt on every fragment > Maximum of 5 conformations (!)> Full N x N similarity matrix, 3D & 2D (60 Million data points)> ~30 compounds failed conformation hunting 17
  • Problems> 400Mb of data> Tedious to use and examine> Pilot study just using the first 500 compounds > Some chemical families in this area > Still a large dataset to deal with (250,000 data points)> 2D similarities and fragments > Small changes cause disproportionately high changes > Atom pairs particularly bad > Switch to KNIME fingerprints > All 2D values lower than “normal” 18
  • Comparing 2D and 3D metrics Agreement 19
  • Example - Similar Scores 2D sim = 0.9 101 104 3D field sim = 0.87 22
  • Example - Higher 3D Sim 2D sim = 0.1 (other methods=0.3) 3D field sim = 0.82 23
  • Example - Higher 3D Sim 2D sim = 0.2 141 454 3D sim = 0.7 24
  • Example - Higher 3D Sim 2D sim = 0.3 (other methods 0.55) 437 440 3D field sim = 0.8 25
  • So…> Pilot study suggests some added value> Full matrix painful even if we could calculate it> What about a reduced matrix? > Use “Probe” compounds to tease out molecules that are different in Field space > How many probes? > Across how many molecules?> We were running out of time… 26
  • Compound selection by Field Diversity> Proposed workflow for generation of a field diverse library: 9M commercial compounds → Property filters → 1.2M sub-set → Calc. shape diversity by PMI → Pick 20K sub-set → Pick 200 compounds → Calc. 200 X 200 2D similarity matrix → Pick 100 diverse probes → Calc. 20K X 100 Field similarity matrix → 3D PCA on Field matrix → Pick 12K Field diverse set 27
  • Field Diverse library: Outcome> 12K “Field Diverse” library mapped by 3D PCA on the 100 x 20,000 “Field Similarity Fingerprint” > Distinct separation of charged species within this space… so what!! [Figure labels: Ammoniums; Piperidines; Benzoic and aliphatic acids] 30
  • Field Diverse library: Outcome> 12K “Field Diverse” library mapped by 3D PCA > Distinct separation of molecules by size within this space… so what!! [Figure label: Decreasing Size] 31
  • Deeper - Moderate “Field Similarity”> Alignment to “template1” 32
  • Deeper - Moderate “Field Similarity”> Random selection of mols > Alignment to “template1” 33
  • Deeper - Moderate “Field Similarity”> Alignment to “template” 35
  • Is the chemical space sensible? Small sulphonamides Large esters Two example clusters 36
  • Conclusions> Work in progress> Full similarity matrix shows potential of 3D sim to add value> Full matrix difficult to handle and possibly unnecessary> Using probes looks like a possible solution> Not a panacea - still need to play the numbers game 37
  • Acknowledgements> Cresset > Martin Slater > Rob Scoffin > Mark Mackey > James Melville> Mission Therapeutics > Keith Menear 38
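
The Conformations and Problems slides raise two practical points: a pair of molecules yields an Nconfs x Nconfs grid of 3D scores that must be collapsed into one number (highest, average or lowest), and a full N x N matrix grows quickly. The sketch below illustrates both. It is a minimal example under assumptions: the conformer scores are invented for illustration, and the helper names `aggregate_similarity` and `matrix_cost` are hypothetical, not part of any Cresset tool.

```python
from statistics import mean


def aggregate_similarity(conf_sims, mode="max"):
    """Collapse an Nconfs_a x Nconfs_b grid of scores into a single value."""
    flat = [score for row in conf_sims for score in row]
    if mode == "max":
        return max(flat)
    if mode == "mean":
        return mean(flat)
    if mode == "min":
        return min(flat)
    raise ValueError(f"unknown mode: {mode}")


# Invented scores: molecule A (2 conformers) against molecule B (3 conformers).
conf_sims = [
    [0.82, 0.40, 0.55],
    [0.61, 0.35, 0.70],
]
summary = {
    mode: aggregate_similarity(conf_sims, mode)
    for mode in ("max", "mean", "min")
}


def matrix_cost(n_compounds, max_confs=5):
    """Return (matrix entries, worst-case conformer-pair comparisons)."""
    entries = n_compounds * n_compounds
    return entries, entries * max_confs * max_confs


full_entries, full_comparisons = matrix_cost(7907)  # whole fragment library
pilot_entries, _ = matrix_cost(500)                 # pilot study subset

print(summary["max"], summary["min"])
print(full_entries, full_comparisons, pilot_entries)
```

For the 7,907-fragment library this gives about 62.5 million matrix entries, in line with the roughly 60 million data points quoted on the slide, and 250,000 entries for the 500-compound pilot. The choice of aggregation mode matters: the same pair scores 0.82 by the highest value but only 0.35 by the lowest.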