Automatic Compound Design by Matched Molecular Pairs


Matched Molecular Pairs (MMPs) are pairs of molecules that differ by a single structural transformation, which can be due to a chemical reaction but more often involves swapping one chemical group for another in a way that is not feasible in a single synthetic step. It is implicitly understood that in MMPs the static (common) part of the pair is significantly larger than the variable parts. MMPs are popular among medicinal chemists because the concept is closely related to how chemists think about a series of molecules: typically a series is defined as a static core with variable substitutions that each contribute to the overall properties of the molecule, such as potency, solubility and selectivity. MMPs have been used to mine large sets of biological screening results to answer questions like “how much potency is gained by adding a chlorine atom in the para position?”. This analysis can be done at multiple levels: for instance, all occurrences of the transformation, occurrences against a particular target, or occurrences against a target family. For each transformation the average change in potency is recorded, which can be used to make quantitative predictions. Suppose a pair of molecules where only the potency of one is known, but the other molecule is related to the first by a single transformation. The predicted potency of the second molecule is the potency of the first plus the average potency change associated with the transformation. Herein lies the power of MMPs compared to classic QSAR regression methods: not only can the potency of novel molecules be predicted, but the transformations can also be applied as an idea generator to come up with reasonable suggestions for what the novel molecule(s) should be. The presentation shows how the above can be done using the new MMP algorithm in Pipeline Pilot 8.5 with publicly available datasets from ChEMBL.
(Accelrys European Science Symposium, Brussels, June 2012)
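
The prediction rule in the abstract (potency of the known molecule plus the average potency change of the transformation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transformation labels, the pIC50 values and the function names are hypothetical toy data, and this is not the Pipeline Pilot implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data (not from the talk): one tuple per matched pair,
# (transformation label, pIC50 of molecule A, pIC50 of molecule B).
observed_pairs = [
    ("H>>Cl", 6.0, 6.5),
    ("H>>Cl", 5.0, 5.7),
    ("H>>Cl", 7.0, 7.3),
    ("Br>>F", 6.2, 6.0),
]


def mean_delta_by_transformation(pairs):
    """Average the observed pIC50 change (B minus A) for each transformation."""
    deltas = defaultdict(list)
    for transformation, pic50_a, pic50_b in pairs:
        deltas[transformation].append(pic50_b - pic50_a)
    return {t: mean(values) for t, values in deltas.items()}


def predict_pic50(known_pic50, transformation, mean_deltas):
    """Predicted potency = known potency + average change for the transformation."""
    return known_pic50 + mean_deltas[transformation]


mean_deltas = mean_delta_by_transformation(observed_pairs)
# A molecule with measured pIC50 6.8 and an unmeasured "H>>Cl" partner.
predicted = predict_pic50(6.8, "H>>Cl", mean_deltas)
print(f"predicted pIC50 of the partner: {predicted:.2f}")
```

The same lookup works at any level of aggregation mentioned above (all data, one target, one target family); only the set of pairs fed into the averaging step changes.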

Published in: Technology

  • Ceci n’est pas une paire de moléculaires appariés (“This is not a matched molecular pair”).
  • Note: duplicates! Same transformation (but with different common core). Same transformation can have different delta(pIC50).
  • For the eagle-eyed: note that the transformation on the left is not derived from the pair on the right
  • X-axis: test set reactions sorted by Bayesian score, i.e. the likelihood that they increase potency at least 10-fold. Y-axis: retrieval rate (%) of true reactions that increase potency at least 10-fold (i.e. true positives). Random model: screen a random 50%, find 50% of the true positives. This is the line of decency. Perfect model: a hypothetical model that first identifies all true positives, then the rest. In between random and perfect are the real models. This is a typical graph for a good Bayesian model.
  • X-axis: difference between predicted pIC50 bin and real pIC50 bin. The smaller the better
  • Could filter out low confidence transformations from the 24k set. Could also remove ones that add too much Mw, etc. Many of the design ideas are duplicates or invalid structures.
  • Need to investigate this… Also note larger deviations at low activity range.
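
The enrichment plot described in the notes (samples ranked by score on the x-axis, fraction of true positives retrieved on the y-axis) can be computed as sketched below. This is a minimal illustration under stated assumptions: the scores and labels are hypothetical toy values, and `enrichment_curve` is a name introduced here, not a Pipeline Pilot component.

```python
def enrichment_curve(scores, is_active):
    """Return (fraction screened, fraction of actives found) points.

    Samples are ranked from highest to lowest score, mimicking a ranking
    by Bayesian score.
    """
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("need at least one active sample")
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    found = 0
    for rank, (_, active) in enumerate(ranked, start=1):
        found += int(active)
        points.append((rank / len(ranked), found / total_actives))
    return points


# Hypothetical toy data: six transformations, two of which truly give
# at least a 10-fold potency increase.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
is_active = [True, False, True, False, False, False]

curve = enrichment_curve(scores, is_active)
for screened, captured in curve:
    print(f"{screened:.0%} screened, {captured:.0%} of actives captured")
```

A random model would follow the diagonal (screen 50%, find 50%), while a perfect model would reach 100% as soon as the number of screened samples equals the number of actives; a real model's curve lies between the two.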
  • Automatic Compound Design by Matched Molecular Pairs

    1. 1. Automatic Compound Design by Matched Molecular Pairs. Willem van Hoorn, Senior Solutions Consultant, Professional Services
    2. 2. Contents: • Matched Molecular Pairs (MMPs) • Implementation in PP • Reaction Fingerprints • Using MMPs as automatic learning machine
    3. 3. Ceci n’est pas une MMP (“This is not an MMP”). Sildenafil vs. Vardenafil: similarity = 0.55 / 0.98 (ECFP_4 / MDL public keys). MMP: - Single change - Typically: 1 or 2 bond cleavage; replace R-group or template
    4. 4. Recent AZ review
    5. 5. MMP as predictor of activity. Classic QSAR with full molecule descriptors. QSAR using MMP: ΔpIC50(m-Br to m-Cl-p-F) = -0.19
    6. 6. What have the MMPs done for us? Classic QSAR / regression: • More generic, can predict >1 change • Interpretability varies. MMPs: • Can only predict “one step away from known” • Very interpretable • Can answer “what to make next” challenge
    7. 7. “Learning Machine” using MMPs
    8. 8. Example of MMP learning machine: the 1 → 2 transformation applied to compound 3 should yield the more attractive compound 4
    9. 9. MMP in Pipeline Pilot: Components, Protocols (PP 8.5 CU1)
    10. 10. PP MMP algorithm based on GSK publication
    11. 11. Test set: EGFR from ChEMBL - ChEMBL version 11 - 4609 IC50 values - 3581 compounds. Ed Griffen et al., J Med Chem. 2011, 54, 7739-50
    12. 12. Generate MMPs and transformations: >90k MMPs in <1 minute. [Workflow figure labels: Slow!; MMP output; Full transformation; MMP transformation]
    13. 13. ΔpIC50 distribution of transformations: 90,343 MMPs yield 180,684 transformations (A→B / B→A). [Histogram labels: bioisosteres in the centre; activity cliffs on both sides at 10-fold, 100-fold, 1000-fold, etc.]
    14. 14. MMP transformations vs. full reactions. MMP transformations: not specific enough, seen >>1 in data set but large stddev(ΔpIC50). Full reactions: too specific, seen once in dataset, ΔpIC50 statistics n=1. Would like to have something that describes “reaction centre + nearby environment”. Would like to increase confidence by looking at similar MMP transformations (with similar ΔpIC50)
    15. 15. PP reaction fingerprints: RCFP • RCFP are similar to ECFP, atoms described by: - Charge - Hybridization - Whether the atom is Reactant or Product - Whether or not the atom is in the “Reaction Site” • Need mapped reactions (PP 8.5)
    16. 16. Reaction mapping is necessary. Mapped: all features. Unmapped: only features describing whether atom is in product or reactant, no reaction site information
    17. 17. Reaction direction matters. Reaction fingerprints are not identical: A → B ≠ B → A
    18. 18. MMP transformation as rules. Context of MMP transformation. “Rule” = MMP transformation. Effect = ΔpIC50
    19. 19. Tanimoto search of MMP transformations. A single observation… becomes more believable when looking at similars. [Figure values: ΔpIC50 = 1.3, 1.9, 1.5, 1.8]
    20. 20. Express significance as Bayesian probability. Bayesian model: “Good” = ΔpIC50 ≥ 1. Rank test set by likelihood that the transformation will yield a ≥10-fold increase in potency
    21. 21. Bayes can predict MMP 10-fold increase. Enrichment plots of test set: • RCFP_6 > RCFP_4 • RCFP_4 >> RCFP_2. [Plot: % Actives Captured vs. % of Samples; curves: Random Model, Perfect Model, dActivity_class_increase_RCFP_2 Model, dActivity_class_increase_RCFP_4 Model, dActivity_class_increase_RCFP_6 Model]
    22. 22. Confidence vs. pIC50. [Plot axis: pIC50] Bayesian score = confidence
    23. 23. Semi-quantitative Bayesian predictions • Multi-category Bayesian • Class = pIC50 bin • RCFP_6 Compare: • Normalised Probability (default) • #Enrichment • #EstPGood • Prediction
    24. 24. #EstPGood score: smallest prediction error. [Figure values: 22.5%, 22.5%, 30.0%, 19%]
    25. 25. MMP vs. full molecule transformations. [Figure values: 22.5% vs. 30.0%] Modelling with mapped reactions works better (it should)
    26. 26. MMP Idea Generator: Training • 80% training set – Generate MMP transformations – Learn classic regression model (PLS) – Learn Bayesian model from reaction fingerprints
    27. 27. MMP Idea Generator: Test. ~34k transformations applied to the test set give >6.5M design ideas • ~5.6 predictions per test set molecule • MMP pIC50 := mean(pIC50_reactant + ΔpIC50_transformation) • RCFP pIC50 := mean(pIC50_reactant + ΔpIC50_predicted by Bayes). Runtime ~30 min
    28. 28. QSAR by MMP
    29. 29. QSAR by Bayes / RCFP_6
    30. 30. SAR by MMP vs. SAR by PLS (ECFP_6 / phys property descriptors). [Plot panels: MMP; PLS] • MMP predictions nearly as good as PLS predictions • Not 100% like with like comparison: fewer predictions for MMP
    31. 31. Consensus MMP & PLS predictions. Found by MMP: 11 / 56. Consensus: 26 / 62. Found by PLS: 10 / 56. 12 / 1006. Red: top 5% by pIC50 (59). Solid: top 10% (118) by MMP or PLS. Total = 174
    32. 32. Conclusions • For one dataset it has been shown that – MMP transformations can form basis of an automatic “Learning Machine” – Can select “significant rules” – Consensus MMP/regression activity prediction works better than individual predictions
    33. 33. Spares
    34. 34. MMP vs. Bayes/RCFP predictions
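
The idea from the Tanimoto search slide, that a single observed transformation becomes more believable when similar transformations show a similar effect, can be sketched as a nearest-neighbour lookup. This is a toy illustration only: the fingerprints are hypothetical sets of integer feature ids rather than real RCFP fingerprints, and the names, ΔpIC50 values and 0.5 threshold are assumptions made for the example.

```python
from statistics import mean


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of feature ids."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_transformations(query_fp, library, threshold=0.5):
    """Return (similarity, name, delta_pic50) hits at or above the threshold,
    most similar first."""
    hits = []
    for name, fingerprint, delta_pic50 in library:
        similarity = tanimoto(query_fp, fingerprint)
        if similarity >= threshold:
            hits.append((similarity, name, delta_pic50))
    return sorted(hits, reverse=True)


# Hypothetical library: (name, fingerprint feature ids, observed delta pIC50).
library = [
    ("t1", {1, 2, 3, 5}, 1.5),
    ("t2", {1, 2, 3, 4, 6}, 1.9),
    ("t3", {7, 8, 9}, -0.4),
]
query_fp = {1, 2, 3, 4}  # fingerprint of the single observed transformation

hits = similar_transformations(query_fp, library)
neighbour_mean = mean(delta for _, _, delta in hits)
print(hits)
print(f"mean delta pIC50 of similar transformations: {neighbour_mean:.2f}")
```

Because the slides note that A → B and B → A give different reaction fingerprints, a real search would need direction-aware fingerprints; the plain sets used here do not capture that.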