Molecular design: How to and how not to?

I visited Astex and the main focus of the talk was correlation inflation.

Presentation Transcript

  • Molecular design: How to and how not to? Peter W Kenny
  • Some things that are hurting Pharma • Having to exploit targets that are poorly linked to human disease • Inability to predict idiosyncratic toxicity • Inability to measure free (unbound) physiological concentrations of drug for remote targets (e.g. intracellular or on far side of blood-brain barrier) Dans la merde: http://fbdd-lit.blogspot.com/2011/09/dans-la-merde.html
  • Molecular Design • Control of behavior of compounds by manipulation of molecular properties • Hypothesis-driven or prediction-driven • Sampling of chemical space – Does fragment-based screening allow better control of sampling resolution?
  • Prediction-driven design: Ju 87 Stuka. “Achtung! Spitfire!” (Stuka on Wikipedia)
  • Hypothesis-driven design: B-52 Stratofortress. “Why can’t we pray for something good, like a tighter bombing pattern, for example? Couldn’t we pray for a tighter bombing pattern?” (Heller, Catch-22, 1961) (B-52 on Wikipedia)
  • Illustrating hypothesis-driven design. DNA Base Isosteres: Acceptor & Donor Definitions (Do1, Do2, Ac1). Kenny (2009) JCIM 49:1234-1244 DOI
  • Watson-Crick Donor & Acceptor Electrostatic Potentials for Adenine Isosteres: Vmin (Ac1) and Va (Do1). Kenny (2009) JCIM 49:1234-1244 DOI
  • I prefer my food cooked and my data raw…
  • Correlation • Strong correlation implies good predictivity • Multivariate data analysis (e.g. PCA) usually involves transformation to orthogonal basis of lower dimensionality • Applying cutoffs (e.g. MW restriction) to data can distort correlations
  • Quantifying strengths of relationships between continuous variables • Correlation measures – Pearson product-moment correlation coefficient (R) – Spearman's rank correlation coefficient (ρ) – Kendall rank correlation coefficient (τ) • Quality of fit measures – Coefficient of determination (R²) is the fraction of the variance in Y that is explained by the model – Root mean square error (RMSE)
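The measures listed on this slide can be computed directly from their definitions. The sketch below is illustrative only and is not taken from the talk: it uses the Python standard library, the helper names are hypothetical, and the Kendall function is the simple tau-a form, which assumes there are no ties.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient (R)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson R computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs (no ties assumed)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def r_squared_and_rmse(y_obs, y_pred):
    """Coefficient of determination and root mean square error of a model."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

# Made-up data, purely to exercise the functions
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y), spearman_rho(x, y), kendall_tau(x, y))
```

Note that R, ρ and τ describe the data alone, whereas R² and RMSE describe how well a particular model's predictions match the observations.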
  • Size of effect for categorical X • Difference in mean values of Y for X = A and X = B – Scale by standard deviation: Cohen’s d (independent of sample size) – Scale by standard error: Student’s t (depends on sample size) • R² can be seen as analogous to Cohen’s d
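A small numerical sketch may make the distinction on this slide concrete. It assumes two independent samples and a pooled standard deviation; the data are simulated with a hypothetical true effect of 0.5 SD, and none of the numbers come from the talk.

```python
import math
import random
import statistics

def cohens_d_and_t(a, b):
    """Return (Cohen's d, Student's t) for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    pooled_sd = math.sqrt(pooled_var)
    d = diff / pooled_sd                                  # scaled by standard deviation
    t = diff / (pooled_sd * math.sqrt(1 / na + 1 / nb))   # scaled by standard error
    return d, t

rng = random.Random(0)

def sample(mean, n):
    """Draw n values from a unit-variance Gaussian (simulated data, not real measurements)."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]

for n in (20, 200, 2000):
    d, t = cohens_d_and_t(sample(0.5, n), sample(0.0, n))
    print(f"n per group = {n:5d}  d = {d:5.2f}  t = {t:6.2f}")
```

As the groups grow, d should settle near the simulated effect size while t keeps increasing, which is why t (and the P value derived from it) says little about the strength of a trend.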
  • Preparation of synthetic data sets: add Gaussian noise (SD = 10) to Y. Kenny & Montanari (2013) JCAMD 27:1-13 DOI
  • Correlation inflation by hiding variation. Data is naturally binned (X is an integer) and mean value of Y is calculated for each value of X. In some studies, averaged data is only presented graphically and it is left to the reader to judge the strength of the correlation. Values shown on the plots: R = 0.34; R = 0.30; R = 0.31; R = 0.67; R = 0.93; R = 0.996. See Hopkins, Mason & Overington (2006) Curr Opin Struct Biol 16:127-136 DOI; Leeson & Springthorpe (2007) NRDD 6:881-890 DOI
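The effect described on this slide is easy to reproduce with simulated data. The sketch below is not the authors' data set: the integer range of X, the unit slope, the sample size and the seed are all assumptions chosen for illustration, and only the noise level (SD = 10) echoes the preceding slide.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
xs = [rng.randint(1, 8) for _ in range(2000)]        # assumed: integer X in 1..8
ys = [float(x) + rng.gauss(0.0, 10.0) for x in xs]   # assumed: Y = X + Gaussian noise (SD = 10)

# Correlation for the individual data points
r_raw = pearson_r(xs, ys)

# Correlation after replacing each bin (value of X) by its mean Y
bins = sorted(set(xs))
bin_means = [sum(y for x, y in zip(xs, ys) if x == b) / xs.count(b) for b in bins]
r_binned = pearson_r(bins, bin_means)

print(f"R for raw data:  {r_raw:.2f}")
print(f"R for bin means: {r_binned:.2f}")
```

Averaging removes the within-bin variation, so the correlation between bin means is much stronger than the correlation in the underlying data even though the relationship itself has not changed.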
  • Correlation Inflation in Flatland
    N = 1202: R = 0.247 (95% CI: 0.193 | 0.299); ρ = 0.215 (P < 0.0001); τ = 0.148 (P < 0.0001)
    N = 8: R = 0.972 (95% CI: 0.846 | 0.995); ρ = 0.970 (P < 0.0001); τ = 0.909 (P = 0.0018)
    See Lovering, Bikker & Humblet (2009) JMC 52:6752-6756 DOI
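The slide does not say how the confidence intervals for R were obtained. One common approach is the Fisher z-transformation, sketched below as an assumption rather than as the method actually used; the function name is hypothetical.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for Pearson R via the Fisher z-transformation."""
    z = math.atanh(r)                        # transform R to an approximately normal scale
    half_width = z_crit / math.sqrt(n - 3)   # standard error of z is 1 / sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

low, high = pearson_ci(0.247, 1202)
print(f"{low:.3f} | {high:.3f}")
```

For R = 0.247 and N = 1202 this gives an interval that agrees with the 0.193 | 0.299 quoted on the slide to three decimal places. The interval narrows as N grows, but a narrow interval around a small R still describes a weak correlation.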
  • Masking variation with standard error. Partition by value of X into 4 bins with equal numbers of data points and display 95% confidence interval for mean (green) and mean ± SD (blue) for each bin. Values shown on the plots: R = 0.12; R = 0.29; R = 0.28. See Gleeson (2008) JMC 51:817-834 DOI
  • The error of standard error: ANOVA for binned data sets
    N     Bins  Degrees of Freedom  F       P
    40    4     3                   0.2596  0.8540
    400   4     3                   12.855  < 0.0001
    4000  4     3                   115.35  < 0.0001
    4000  2     1                   270.91  < 0.0001
    4000  8     7                   50.075  < 0.0001
    “In each plot provided, the width of the errors bars and the difference in the mean values of the different categories are indicative of the strength of the relationship between the parameters.” Gleeson (2008) JMC 51:817-834 DOI
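The dependence of F on sample size shown in the table can be illustrated with a hand-rolled one-way ANOVA. This is a minimal sketch on made-up numbers, not a reproduction of the analysis on the slide, and it stops at the F statistic because converting F to a P value needs the F distribution, which is outside the Python standard library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1          # bins minus one, as in the table
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three made-up bins with slightly different means and the same spread
small = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
# The same values repeated ten times: identical bin means, ten times the data
large = [g * 10 for g in small]

print(one_way_anova_f(small))   # F = 3.0 with (2, 6) degrees of freedom
print(one_way_anova_f(large))   # F = 43.5 with (2, 87) degrees of freedom
```

The bin means and the scatter of the values are the same in both cases; only the amount of data differs. F, like the width of standard-error bars, therefore reflects sample size as well as the strength of the trend.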
  • Know your data • Assays are typically run in replicate, making it possible to estimate assay variance • Every assay has a finite dynamic range and it may not always be obvious what this is for a particular assay • Dynamic range may have been sacrificed for throughput but this, by itself, does not make the assay bad • We need to be able to analyse in-range and out-of-range data within a single unified framework – See Lind (2010) QSAR analysis involving assay results which are only known to be greater than, or less than some cut-off limit. Mol Inf 29:845-852 DOI
  • Depicting variation with percentile plots This graphical representation of data makes it easy to visualize variation and can be used with mixed in-range and out-of-range data. See Colclough et al (2008) BMCL 16:6611-6616 DOI
  • Binning continuous data restricts your options for analysis and places burden of proof on you to show that your conclusions are independent of the binning scheme. Think before you bin! Averaging the binned data was your idea so don’t try blaming me this time!
  • Some stuff to think about • Model continuous data as continuous data – RMSE is most relevant to prediction but you still need R² – Fitted parameters may provide insight (e.g. solubility is more sensitive than potency to lipophilicity) • When selecting training data, think in terms of Design of Experiments (e.g. evenly spaced values of X) • Try to achieve normally distributed Y (e.g. use pIC50 rather than IC50) • Never make statements about the strength of a relationship when you’ve hidden variation in the data (unless you want a starring role in Correlation Inflation 2) • To be meaningful, a measure of the spread of a distribution must be independent of sample size • Reviewers/editors, mercilessly purge manuscripts of statements like, “A negative correlation was observed between X and Y” or “A and B are correlated/linked”
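The advice to use pIC50 rather than IC50 amounts to working on a logarithmic scale. A minimal sketch, assuming IC50 values are supplied in nanomolar (the function name and the example values are made up):

```python
import math

def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 M, hence the constant 9."""
    return 9.0 - math.log10(ic50_nm)

# Hypothetical IC50 values in nM, spanning four orders of magnitude
ic50_values_nm = [1.0, 10.0, 100.0, 1000.0, 10000.0]
pic50_values = [pic50_from_ic50_nm(v) for v in ic50_values_nm]
print(pic50_values)   # evenly spaced on the pIC50 scale
```

Values that are spread over several orders of magnitude as IC50 become evenly spaced as pIC50, which is usually closer to the normally distributed Y that regression methods assume.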
  • Choosing octanol was the first mistake...
  • An alternative view of the Rule of 5 (figure labels: Polarity; N; ClogP ≤ 5; Acc ≤ 10; Don ≤ 5)
  • Does octanol/water ‘see’ hydrogen bond donors? (values shown on the structures: --0.06, -0.23, -0.24, --1.01, -0.66, --1.05) Sangster lab database of octanol/water partition coefficients: http://logkow.cisti.nrc.ca/logkow/index.jsp
  • Octanol/water is not the only partitioning system (figure panels: Octanol/Water; Alkane/Water)
  • Differences in octanol/water and alkane/water logP values reflect hydrogen bonding between solute and octanol
    logPoct = 2.1, logPalk = 1.9, ΔlogP = 0.2
    logPoct = 1.5, logPalk = -0.8, ΔlogP = 2.3
    logPoct = 2.5, logPalk = -1.8, ΔlogP = 4.3
    Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
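The three examples on this slide follow from defining ΔlogP as the octanol/water logP minus the alkane/water logP. A short check of the arithmetic (the function name is made up for illustration):

```python
def delta_logp(logp_oct, logp_alk):
    """Difference between octanol/water and alkane/water logP values."""
    return logp_oct - logp_alk

# (logPoct, logPalk) pairs quoted on the slide
slide_values = [(2.1, 1.9), (1.5, -0.8), (2.5, -1.8)]
deltas = [round(delta_logp(oct_value, alk_value), 1) for oct_value, alk_value in slide_values]
print(deltas)   # [0.2, 2.3, 4.3]
```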
  • Polar Surface Area is not predictive of hydrogen bond strength
    ΔlogP = 0.5, PSA/Å² = 48
    ΔlogP = 4.3, PSA/Å² = 22
    Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
  • Prediction of contribution of acceptors to ΔlogP: ΔlogP = ΔlogP0 × exp(-kVmin) (figure panels: N or ether O; Carbonyl O. Axes: ΔlogP (corrected); Vmin/(Hartree/electron)) Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
  • Basis for ClogPalk model (axes: logPalk; MSA/Å²) Kenny, Montanari & Prokopczyk et al (2013) JCAMD 27:389-402 DOI
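  • ClogPalk from perturbation of saturated hydrocarbon
    $\mathrm{ClogP_{alk}} = \log P_0 + s \times \mathrm{MSA} - \sum_i \Delta\log P_{\mathrm{FG},i} - \sum_j \Delta\log P_{\mathrm{Int},j}$
    – $\log P_0 + s \times \mathrm{MSA}$: logPalk predicted for saturated hydrocarbon
    – $\sum_i \Delta\log P_{\mathrm{FG},i}$: perturbation by functional groups
    – $\sum_j \Delta\log P_{\mathrm{Int},j}$: perturbation by interactions between functional groups
    Kenny, Montanari & Prokopczyk et al (2013) JCAMD 27:389-402 DOI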
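The structure of the ClogPalk expression can be written as a short function. This is a sketch of the additive form only: the parameter values below (logP0, s, MSA and the perturbation terms) are hypothetical placeholders, not the fitted values from the paper, and the function name is made up.

```python
def clogp_alk(msa, fg_perturbations, interaction_perturbations, logp0, s):
    """Additive ClogPalk form: hydrocarbon baseline minus perturbation terms.

    msa                        molecular surface area (in Å²)
    fg_perturbations           ΔlogP terms, one per functional group
    interaction_perturbations  ΔlogP terms, one per interaction between functional groups
    logp0, s                   intercept and slope of the saturated-hydrocarbon line
    """
    hydrocarbon_logp = logp0 + s * msa
    return hydrocarbon_logp - sum(fg_perturbations) - sum(interaction_perturbations)

# Hypothetical numbers, chosen only to show the bookkeeping
value = clogp_alk(
    msa=100.0,
    fg_perturbations=[4.0, 1.0],
    interaction_perturbations=[-0.5],
    logp0=0.0,
    s=0.03,
)
print(round(value, 2))   # 0.0 + 0.03 * 100 - (4.0 + 1.0) - (-0.5) = -1.5
```

A negative interaction term, as in this made-up example, raises the prediction: it represents two functional groups that perturb logPalk less together than the sum of their individual effects.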
  • Performance of ClogPalk model (axes: (logPalk + ClogPalk)/2; logPalk − ClogPalk. Labelled compounds: Hydrocortisone, Cortisone, Atropine, Propranolol, Papaverine) Kenny, Montanari & Prokopczyk et al (2013) JCAMD 27:389-402 DOI
  • Another way to look at SAR?
  • (Descriptor-based) QSAR/QSPR: Some questions • How valid is methodology (especially for validation) when distribution of compounds in training/test space is highly non-uniform? • Are models predicting activity or locating neighbours? • To what extent are ‘global’ models just ensembles of local models? • How well do the methods handle ‘activity cliffs’? • How should we account for sizes of descriptor pools when comparing model performance?
  • Measures of Diversity & Coverage. 2-Dimensional representation of chemical space is used here to illustrate concepts of diversity and coverage. Stars indicate compounds selected to sample this region of chemical space. In this representation, similar compounds are close together
  • Neighborhoods and library design
  • Examples of relationships between structures • Tanimoto coefficient (foyfi) for structures is 0.90 • Ester is methylated acid • Amides are ‘reversed’
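The Tanimoto coefficient quoted on this slide is the ratio of shared to total features in two fingerprints. A minimal sketch, with fingerprints represented as sets of "on" bit positions; the example bit sets are invented and are not foyfi fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: |A ∩ B| / |A ∪ B| for two sets of fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 1.0   # arbitrary convention for two empty fingerprints
    return len(a & b) / len(union)

# Invented fingerprints: nine shared bits, one bit present only in the first
fp_1 = set(range(10))
fp_2 = set(range(9))
print(tanimoto(fp_1, fp_2))   # 0.9
```

A single similarity number says how much two structures share but not how they differ, which is the motivation for describing specific relationships such as "ester is methylated acid".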
  • Leatherface molecular editor: From chain saw to Matched Molecular Pairs. c-[A;!R] bnd 1 2 c-Br cul 2 hyd 1 1 [nX2]1c([OH])cccc1 hyd 1 1 hyd 3 -1 bnd 2 3 2 Kenny & Sadowski Structure modification in chemical databases, Methods and Principles in Medicinal Chemistry (Chemoinformatics in Drug Discovery) 2005, 23, 271-285 DOI
  • Effect of bioisosteric replacement on plasma protein binding?
    Date of Analysis  N   ΔlogFu  SE    SD    %increase
    2003              7   -0.64   0.09  0.23  0
    2008              12  -0.60   0.06  0.20  0
    Mining PPB database for carboxylate/tetrazole pairs suggested that bioisosteric replacement would lead to decrease in Fu so tetrazoles were not synthesised. Birch et al (2009) BMCL 19:850-853 DOI
  • Bioisosterism: Carboxylate & tetrazole (values shown on the structures: -0.316, -0.315, -0.296, -0.295, -0.262, -0.261, -0.268, -0.268) Kenny (2009) JCIM 49:1234-1244 DOI
  • Effect of amide N-methylation on aqueous solubility is dependent on substructural context
    Amide                      N    ΔlogS  SE    SD    %Increase
    Acyclic (aliphatic amine)  109  0.59   0.07  0.71  76
    Cyclic                     9    0.18   0.15  0.47  44
    Benzanilides               9    1.49   0.25  0.76  100
    Birch et al (2009) BMCL 19:850-853 DOI
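The columns in the matched molecular pair tables can be derived from the list of per-pair differences. The sketch below assumes that SE is SD divided by the square root of N and that "%increase" is the percentage of pairs with a positive difference; neither definition is stated on the slides, and the input values are made up.

```python
import math
import statistics

def matched_pair_summary(deltas):
    """Summarise per-pair property differences for one structural transformation."""
    n = len(deltas)
    sd = statistics.stdev(deltas)                  # sample standard deviation
    return {
        "N": n,
        "mean": statistics.mean(deltas),
        "SE": sd / math.sqrt(n),                   # assumed definition: SD / sqrt(N)
        "SD": sd,
        "%increase": 100.0 * sum(1 for d in deltas if d > 0) / n,  # assumed definition
    }

# Made-up differences (e.g. ΔlogS) for four matched pairs
summary = matched_pair_summary([1.0, 2.0, 3.0, -2.0])
print(summary)
```

Under this assumed definition of SE, the acyclic amide row is reproduced: 0.71 / sqrt(109) rounds to 0.07. SD describes how much the effect varies between pairs, while SE describes how well the mean effect is known, so both are worth reporting.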
  • Relationships between structures • Discover new bioisosteres & scaffolds • Prediction of activity & properties – Direct prediction (e.g. look up substituent effects) – Indirect prediction (e.g. apply correction to existing model) • Recognise extreme data – Bad measurement or interesting effect?
  • MUDO Molecule Editor • SMIRKS-based re-write of Leatherface using OEChem • Can process 3D structures (e.g. form covalent bond between protein and ligand) • Identification of matched molecular pairs is much easier than with Leatherface • Published with source code in supplemental information Kenny, Montanari, Prokopczyk, Sala, Rodrigues Sartori (2013) Automated molecule editing in molecular design. JCAMD 27 DOI
  • More stuff to think about • There is life beyond octanol/water (and atom-centered charges) if we choose to look for it • Even molecules can have meaningful relationships
  • Hydrogen bonding of esters (values shown on the structures: -0.054, -0.086, -0.091, -0.072, -0.104, -0.093) Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
  • Glycogen Phosphorylase inhibitors: Series comparison
    ΔpIC50       ΔlogFu        ΔlogS
    0.38 (0.06)  -0.30 (0.06)  -0.29 (0.13)
    0.21 (0.06)  0.13 (0.04)   0.20 (0.09)
    0.29 (0.07)  -0.42 (0.08)  -0.62 (0.13)
    Standard errors in mean values in parentheses; see Birch et al (2009) BMCL 19:850-853 DOI