Molecular design: How to and how not to?
Peter W Kenny

I visited Astex and the main focus of the talk was correlation inflation.


  1. Molecular design: How to and how not to? Peter W Kenny
  2. Some things that are hurting Pharma • Having to exploit targets that are poorly linked to human disease • Inability to predict idiosyncratic toxicity • Inability to measure free (unbound) physiological concentrations of drug for remote targets (e.g. intracellular or on far side of blood-brain barrier) Dans la merde: http://fbdd-lit.blogspot.com/2011/09/dans-la-merde.html
  3. Molecular Design • Control of behavior of compounds by manipulation of molecular properties • Hypothesis-driven or prediction-driven • Sampling of chemical space – Does fragment-based screening allow better control of sampling resolution?
  4. Achtung! Spitfire! Prediction-driven design: Ju 87 Stuka. Stuka on Wikipedia
  5. “Why can’t we pray for something good, like a tighter bombing pattern, for example? Couldn’t we pray for a tighter bombing pattern?” Heller, Catch-22, 1961. Hypothesis-driven design: B52 Stratofortress. B52 on Wikipedia
  6. Illustrating hypothesis-driven design. DNA Base Isosteres: Acceptor & Donor Definitions. Do1 Do2 Ac1. Kenny (2009) JCIM 49:1234-1244 DOI
  7. Watson-Crick Donor & Acceptor Electrostatic Potentials for Adenine Isosteres. Vmin(Ac1) Va(Do1). Kenny (2009) JCIM 49:1234-1244 DOI
  8. I prefer my food cooked and my data raw…
  9. Correlation • Strong correlation implies good predictivity • Multivariate data analysis (e.g. PCA) usually involves transformation to orthogonal basis of lower dimensionality • Applying cutoffs (e.g. MW restriction) to data can distort correlations
  10. Quantifying strengths of relationships between continuous variables • Correlation measures – Pearson product-moment correlation coefficient (R) – Spearman's rank correlation coefficient (ρ) – Kendall rank correlation coefficient (τ) • Quality of fit measures – Coefficient of determination (R²) is the fraction of the variance in Y that is explained by model – Root mean square error (RMSE)
  11. Size of effect for categorical X • Start from the difference in mean values of Y for X = A and X = B • Scale by standard deviation: Cohen’s d (independent of sample size) • Scale by standard error: Student’s t (depends on sample size) • R² can be seen as analogous to Cohen’s d
  12. Preparation of synthetic data sets. Add Gaussian noise (SD=10) to Y. Kenny & Montanari (2013) JCAMD 27:1-13 DOI
  13. Correlation inflation by hiding variation. See Hopkins, Mason & Overington (2006) Curr Opin Struct Biol 16:127-136 DOI; Leeson & Springthorpe (2007) NRDD 6:881-890 DOI. Data is naturally binned (X is an integer) and mean value of Y is calculated for each value of X. In some studies, averaged data is only presented graphically and it is left to the reader to judge the strength of the correlation. R = 0.34; R = 0.30; R = 0.31; R = 0.67; R = 0.93; R = 0.996
  14. Correlation Inflation in Flatland. N = 1202: R = 0.247 (95% CI: 0.193 | 0.299); ρ = 0.215 (P < 0.0001); τ = 0.148 (P < 0.0001). N = 8: R = 0.972 (95% CI: 0.846 | 0.995); ρ = 0.970 (P < 0.0001); τ = 0.909 (P = 0.0018). See Lovering, Bikker & Humblet (2009) JMC 52:6752-6756 DOI
  15. Masking variation with standard error. See Gleeson (2008) JMC 51:817-834 DOI. Partition by value of X into 4 bins with equal numbers of data points and display 95% confidence interval for mean (green) and mean ± SD (blue) for each bin. R = 0.12; R = 0.29; R = 0.28
  16. The error of standard error: ANOVA for binned data sets. Rows give (N, bins, degrees of freedom, F, P): (40, 4, 3, 0.2596, 0.8540); (400, 4, 3, 12.855, < 0.0001); (4000, 4, 3, 115.35, < 0.0001); (4000, 2, 1, 270.91, < 0.0001); (4000, 8, 7, 50.075, < 0.0001). “In each plot provided, the width of the errors bars and the difference in the mean values of the different categories are indicative of the strength of the relationship between the parameters.” Gleeson (2008) JMC 51:817-834 DOI
  17. Know your data • Assays are typically run in replicate, making it possible to estimate assay variance • Every assay has a finite dynamic range and it may not always be obvious what this is for a particular assay • Dynamic range may have been sacrificed for throughput but this, by itself, does not make the assay bad • We need to be able to analyse in-range and out-of-range data within a single unified framework – See Lind (2010) QSAR analysis involving assay results which are only known to be greater than, or less than some cut-off limit. Mol Inf 29:845-852 DOI
  18. Depicting variation with percentile plots. This graphical representation of data makes it easy to visualize variation and can be used with mixed in-range and out-of-range data. See Colclough et al (2008) BMCL 16:6611-6616 DOI
  19. Binning continuous data restricts your options for analysis and places burden of proof on you to show that your conclusions are independent of the binning scheme. Think before you bin! Averaging the binned data was your idea so don’t try blaming me this time!
  20. Some stuff to think about • Model continuous data as continuous data – RMSE is most relevant to prediction but you still need R² – Fitted parameters may provide insight (e.g. solubility is more sensitive than potency to lipophilicity) • When selecting training data think in terms of Design of Experiments (e.g. evenly spaced values of X) • Try to achieve normally distributed Y (e.g. use pIC50 rather than IC50) • Never make statements about the strength of a relationship when you’ve hidden variation in the data (unless you want a starring role in Correlation Inflation 2) • To be meaningful a measure of the spread of a distribution must be independent of sample size • Reviewers/editors, mercilessly purge manuscripts of statements like, “A negative correlation was observed between X and Y” or “A and B are correlated/linked”
  21. Choosing octanol was the first mistake...
  22. An alternative view of the Rule of 5. Polarity N ClogP ≤ 5 Acc ≤ 10; Don ≤ 5
  23. Does octanol/water ‘see’ hydrogen bond donors? --0.06 -0.23 -0.24 --1.01 -0.66 --1.05 Sangster lab database of octanol/water partition coefficients: http://logkow.cisti.nrc.ca/logkow/index.jsp
  24. Octanol/water is not the only partitioning system. Octanol/Water Alkane/Water
  25. Differences in octanol/water and alkane/water logP values reflect hydrogen bonding between solute and octanol. logPoct = 2.1, logPalk = 1.9, ΔlogP = 0.2; logPoct = 1.5, logPalk = -0.8, ΔlogP = 2.3; logPoct = 2.5, logPalk = -1.8, ΔlogP = 4.3. Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
  26. Polar Surface Area is not predictive of hydrogen bond strength. ΔlogP = 0.5, PSA/Å² = 48; ΔlogP = 4.3, PSA/Å² = 22. Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
  27. Prediction of contribution of acceptors to ΔlogP. ΔlogP = ΔlogP0 × exp(-kVmin). Plots of ΔlogP (corrected) against Vmin/(Hartree/electron) for N or ether O and for carbonyl O. Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
  28. Basis for ClogPalk model. Plot of logPalk against MSA/Å². Kenny, Montanari & Prokopczyk (2013) JCAMD 27:389-402 DOI
  29. ClogPalk from perturbation of saturated hydrocarbon. $ClogP_{alk} = logP_{0} + s \times MSA - \sum_{i} \Delta logP_{FG,i} - \sum_{j} \Delta logP_{Int,j}$ • $logP_{0} + s \times MSA$: logPalk predicted for saturated hydrocarbon • $\sum_{i} \Delta logP_{FG,i}$: perturbation by functional groups • $\sum_{j} \Delta logP_{Int,j}$: perturbation by interactions between functional groups. Kenny, Montanari & Prokopczyk (2013) JCAMD 27:389-402 DOI
  30. Performance of ClogPalk model. Axes: (logPalk + ClogPalk)/2 and logPalk − ClogPalk. Labelled compounds: Hydrocortisone, Cortisone, Atropine, Propranolol, Papaverine. Kenny, Montanari & Prokopczyk (2013) JCAMD 27:389-402 DOI
  31. Another way to look at SAR?
  32. (Descriptor-based) QSAR/QSPR: Some questions • How valid is methodology (especially for validation) when distribution of compounds in training/test space is highly non-uniform? • Are models predicting activity or locating neighbours? • To what extent are ‘global’ models just ensembles of local models? • How well do the methods handle ‘activity cliffs’? • How should we account for sizes of descriptor pools when comparing model performance?
  33. Measures of Diversity & Coverage. 2-Dimensional representation of chemical space is used here to illustrate concepts of diversity and coverage. Stars indicate compounds selected to sample this region of chemical space. In this representation, similar compounds are close together
  34. Neighborhoods and library design
  35. Examples of relationships between structures • Tanimoto coefficient (foyfi) for structures is 0.90 • Ester is methylated acid • Amides are ‘reversed’
  36. Leatherface molecular editor: From chain saw to Matched Molecular Pairs. c-[A;!R] bnd 1 2 c-Br cul 2 hyd 1 1 [nX2]1c([OH])cccc1 hyd 1 1 hyd 3 -1 bnd 2 3 2. Kenny & Sadowski, Structure modification in chemical databases, Methods and Principles in Medicinal Chemistry (Chemoinformatics in Drug Discovery) 2005, 23, 271-285 DOI
  37. Effect of bioisosteric replacement on plasma protein binding. Rows give (date of analysis, N, ΔlogFu, SE, SD, %increase): (2003, 7, -0.64, 0.09, 0.23, 0); (2008, 12, -0.60, 0.06, 0.20, 0). Mining PPB database for carboxylate/tetrazole pairs suggested that bioisosteric replacement would lead to decrease in Fu so tetrazoles were not synthesised. Birch et al (2009) BMCL 19:850-853 DOI
  38. Bioisosterism: Carboxylate & tetrazole. -0.316 -0.315 -0.296 -0.295 -0.262 -0.261 -0.268 -0.268 Kenny (2009) JCIM 49:1234-1244 DOI
  39. Effect of amide N-methylation on aqueous solubility is dependent on substructural context. Rows give (amide, N, ΔlogS, SE, SD, %increase): (Acyclic (aliphatic amine), 109, 0.59, 0.07, 0.71, 76); (Cyclic, 9, 0.18, 0.15, 0.47, 44); (Benzanilides, 9, 1.49, 0.25, 0.76, 100). Birch et al (2009) BMCL 19:850-853 DOI
  40. Relationships between structures • Discover new bioisosteres & scaffolds • Prediction of activity & properties – Direct prediction (e.g. look up substituent effects) – Indirect prediction (e.g. apply correction to existing model) • Recognise extreme data – Bad measurement or interesting effect?
  41. MUDO Molecule Editor • SMIRKS-based re-write of Leatherface using OEChem • Can process 3D structures (e.g. form covalent bond between protein and ligand) • Identification of matched molecular pairs is much easier than with Leatherface • Published with source code in supplemental information. Kenny, Montanari, Prokopczyk, Sala, Rodrigues Sartori (2013) Automated molecule editing in molecular design. JCAMD 27 DOI
  42. More stuff to think about • There is life beyond octanol/water (and atom-centered charges) if we choose to look for it • Even molecules can have meaningful relationships
  43. Hydrogen bonding of esters. -0.054 -0.086 -0.091 -0.072 -0.104 -0.093 Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
  44. Glycogen Phosphorylase inhibitors: Series comparison. ΔpIC50 = 0.38 (0.06), ΔlogFu = -0.30 (0.06), ΔlogS = -0.29 (0.13); ΔpIC50 = 0.21 (0.06), ΔlogFu = 0.13 (0.04), ΔlogS = 0.20 (0.09); ΔpIC50 = 0.29 (0.07), ΔlogFu = -0.42 (0.08), ΔlogS = -0.62 (0.13). Standard errors in mean values in parentheses; see Birch et al (2009) BMCL 19:850-853 DOI
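The correlation inflation discussed on slides 12 and 13 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the procedure used in the cited papers: the relationship Y = X, the integer range 1 to 10, the 400 points per value of X, and the helper names `pearson_r`, `make_data` and `bin_means` are all chosen here for illustration. Only the Gaussian noise with SD = 10 comes from the slides.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for two sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def make_data(n_per_x, noise_sd, seed=0):
    """Synthetic data: integer X from 1 to 10, Y = X plus Gaussian noise.

    The underlying relationship (Y = X) is an assumption for this sketch.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for x in range(1, 11):
        for _ in range(n_per_x):
            xs.append(x)
            ys.append(x + rng.gauss(0.0, noise_sd))
    return xs, ys


def bin_means(xs, ys):
    """Mean of Y for each distinct value of X (X is naturally binned)."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    keys = sorted(groups)
    return keys, [mean(groups[k]) for k in keys]


xs, ys = make_data(n_per_x=400, noise_sd=10.0)
r_raw = pearson_r(xs, ys)        # correlation for the individual data points
bx, by = bin_means(xs, ys)
r_binned = pearson_r(bx, by)     # correlation for the bin means only

print(f"raw data:  N = {len(xs)}, R = {r_raw:.3f}")
print(f"bin means: N = {len(bx)}, R = {r_binned:.3f}")
```

Averaging within each value of X removes most of the variation in Y before the correlation is calculated, so the second coefficient describes the bin means and not the data from which they were derived.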

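The distinction drawn on slides 11, 15 and 16 between scaling by standard deviation and scaling by standard error can be sketched in the same way. The two-group setup, the shift of 0.3 SD units, the sample sizes and the function names below are assumptions made for illustration; the pooled-SD forms of Cohen's d and Student's t are the standard textbook ones for two independent samples.

```python
import random
from statistics import mean, stdev


def cohens_d_and_t(a, b):
    """Return (Cohen's d, Student's t) for two independent samples.

    Both statistics start from the same difference in means; d divides it
    by the pooled standard deviation, t by the standard error of the
    difference.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    pooled_sd = pooled_var ** 0.5
    diff = mean(b) - mean(a)
    d = diff / pooled_sd
    t = diff / (pooled_sd * (1.0 / na + 1.0 / nb) ** 0.5)
    return d, t


def simulate(n, shift=0.3, seed=1):
    """Two groups of n points with unit SD whose true means differ by `shift`."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(shift, 1.0) for _ in range(n)]
    return cohens_d_and_t(a, b)


results = {n: simulate(n) for n in (20, 200, 2000, 20000)}
for n, (d, t) in results.items():
    print(f"n per group = {n:6d}   Cohen's d = {d:6.3f}   Student's t = {t:7.3f}")
```

For equal group sizes n, t equals d multiplied by the square root of n/2, so t (and any error bar based on standard error) keeps changing as data are added even though the size of the effect does not.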