Automatic Compound Design by Matched Molecular Pairs

Willem van Hoorn
Senior Solutions Consultant
Professional Services

Contents

• Matched Molecular Pairs (MMPs)
• Implementation in Pipeline Pilot (PP)
• Reaction Fingerprints
• Using MMPs as an automatic learning machine

Ceci n’est pas une MMP (this is not an MMP)
Similarity = 0.55 / 0.98 (ECFP_4 / MDL public keys)




[Figure: structures of sildenafil and vardenafil]

MMP:
- Single change
- Typically: 1 or 2 bond cleavages; replace R-group or template

Recent AZ review

http://pubs.acs.org/doi/abs/10.1021/jm200452d

MMP as predictor of activity

Classic QSAR with full molecule descriptors vs. QSAR using MMP

ΔpIC50(m-Br to m-Cl-p-F) = -0.19

What have the MMPs done for us?

Classic QSAR / regression
• More generic, can predict >1 change
• Interpretability varies

MMPs
• Can only predict “one step away from known”
• Very interpretable
• Can answer “what to make next” challenge

“Learning Machine” using MMPs

Example of MMP learning machine

1 → 2 transformation applied to compound 3 should yield more attractive compound 4

MMP in Pipeline Pilot

Components

Protocols

PP 8.5 CU1

PP MMP algorithm based on GSK publication
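
A minimal sketch of how MMPs can be found by indexing, assuming the GSK publication refers to the fragment-and-index approach of Hussain and Rea: every molecule is cut at a single bond into a constant part (context) and a variable part, and molecules that share a context form pairs. The fragmentation step needs a cheminformatics toolkit and is not shown. The `fragments` input and all function names are hypothetical, and this is not the Pipeline Pilot implementation.

```python
from collections import defaultdict
from itertools import combinations

def index_mmps(fragments):
    """Group (molecule, context, variable_part) records by context and pair them.

    `fragments` is a hypothetical, pre-computed list of single-cut
    fragmentations; a real workflow would generate it with a toolkit.
    """
    by_context = defaultdict(list)
    for mol_id, context, variable in fragments:
        by_context[context].append((mol_id, variable))

    pairs = []
    for context, members in by_context.items():
        for (mol_a, var_a), (mol_b, var_b) in combinations(members, 2):
            if var_a != var_b:  # identical variable parts are not a change
                pairs.append((mol_a, mol_b, context, var_a, var_b))
    return pairs

# Toy data: SMILES-like strings used as opaque labels only.
fragments = [
    ("mol1", "[*]c1ccccc1", "[*]Br"),
    ("mol2", "[*]c1ccccc1", "[*]Cl"),
    ("mol3", "[*]c1ccncc1", "[*]Cl"),
]
mmps = index_mmps(fragments)
```

Here only mol1 and mol2 share a context, so they form the single pair; mol3 has a different context and stays unpaired.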

Test set: EGFR from ChEMBL

- ChEMBL version 11
- 4609 IC50 values
- 3581 compounds

Ed Griffen et al., J. Med. Chem. 2011, 54, 7739-50

Generate MMPs and transformations

>90k MMPs in <1 minute
Slow!

MMP output
Full transformation

MMP transformation

ΔpIC50 distribution of transformations
90,343 MMPs yield 180,684 transformations (A→B / B→A)

Bioisosteres: transformations with little change in potency
Activity cliffs: large changes in potency, in either direction
Scale: 10-fold, 100-fold, 1000-fold, etc.
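
A short sketch of why each MMP gives two transformations and how a change in pIC50 (ΔpIC50) maps to a fold change in potency. The function names and the `A>>B` string notation are hypothetical choices for illustration.

```python
def transformations_from_pair(frag_a, frag_b, pic50_a, pic50_b):
    """Return the A>>B and B>>A transformations with signed change in pIC50."""
    delta = pic50_b - pic50_a
    return [
        (f"{frag_a}>>{frag_b}", delta),
        (f"{frag_b}>>{frag_a}", -delta),
    ]

def fold_change(delta_pic50):
    """Potency ratio implied by a change in pIC50 (1 log unit = 10-fold)."""
    return 10 ** delta_pic50

# Toy pair: pIC50 rises from 6.0 to 8.0 when Br is replaced by Cl.
forward, reverse = transformations_from_pair("[*]Br", "[*]Cl", 6.0, 8.0)
```

The reverse transformation carries the same magnitude with the opposite sign, which is why the distribution of transformations is symmetric around zero.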

MMP transformations vs. full reactions

MMP transformation: not specific enough; seen >>1 times in data set but large stddev(ΔpIC50)
Full reaction: too specific; seen once in data set, ΔpIC50 statistics n=1

Would like to have something that describes
“reaction centre + nearby environment”

Would like to increase confidence by looking at similar
MMP transformations (with similar ΔpIC50)
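
The trade-off above can be made concrete with per-transformation statistics on the change in pIC50. This is a minimal sketch with hypothetical names and made-up toy values: a generic transformation seen several times gets a spread, while a whole-molecule transformation seen once has no defined standard deviation.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarise(observations):
    """Map each transformation to (n, mean delta, sample stddev or None)."""
    grouped = defaultdict(list)
    for transformation, delta in observations:
        grouped[transformation].append(delta)
    summary = {}
    for transformation, deltas in grouped.items():
        spread = stdev(deltas) if len(deltas) > 1 else None  # undefined for n=1
        summary[transformation] = (len(deltas), mean(deltas), spread)
    return summary

# Toy observations (illustrative values only, not from the EGFR set).
observations = [
    ("[*]Br>>[*]Cl", 1.0),
    ("[*]Br>>[*]Cl", -1.0),
    ("[*]Br>>[*]Cl", 0.0),
    ("whole-molecule A>>B", 1.5),
]
stats = summarise(observations)
```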

PP reaction fingerprints: RCFP

• RCFP are similar to ECFP, atoms described by:
– Charge
– Hybridization
– Whether the atom is Reactant or Product
– Whether or not the atom is in the “Reaction Site”
• Need mapped reactions

PP 8.5

Reaction mapping is necessary
Mapped: only features describing reaction site
Unmapped: all features, no information whether atom is in product or reactant

Reaction direction matters

Reaction fingerprints are not identical: A → B ≠ B → A
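
A toy illustration of why direction matters once an atom's role (reactant or product) is part of its description. This is not the RCFP algorithm: the atom tuples, the feature layout and the function name are all hypothetical, and no neighbourhood expansion is performed.

```python
def toy_reaction_features(reactant_atoms, product_atoms):
    """Build a set of atom features that include the atom's role.

    Each atom is a hypothetical tuple (element, charge, hybridization, in_site).
    Illustration only; real reaction fingerprints also encode neighbourhoods.
    """
    features = set()
    for role, atoms in (("reactant", reactant_atoms), ("product", product_atoms)):
        for atom in atoms:
            features.add((role,) + atom)
    return features

# Toy reaction-site atoms for sides A and B of a transformation.
side_a = [("Br", 0, "sp3", True), ("C", 0, "sp2", True)]
side_b = [("Cl", 0, "sp3", True), ("C", 0, "sp2", True)]

forward = toy_reaction_features(side_a, side_b)  # A -> B
reverse = toy_reaction_features(side_b, side_a)  # B -> A
```

The two feature sets overlap only in the unchanged carbon atom, so the forward and reverse transformations are similar but not identical.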

MMP transformation as rules

Context of MMP transformation

“Rule” = MMP transformation
Effect = ΔpIC50

Tanimoto search of MMP transformations

A single observation…

ΔpIC50 = 1.3

ΔpIC50 = 1.9

ΔpIC50 = 1.5

… becomes more believable when looking at similar transformations
ΔpIC50 = 1.8
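
A minimal sketch of a Tanimoto search over transformation fingerprints, assuming each fingerprint is available as a set of integer feature IDs. The 0.5 threshold, the function names and the toy fingerprints are assumptions for illustration; the slides do not state the cut-off used.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_effects(query, library, threshold=0.5):
    """Return (similarity, delta) for library entries at or above the threshold."""
    hits = []
    for features, delta in library:
        sim = tanimoto(query, features)
        if sim >= threshold:
            hits.append((sim, delta))
    return sorted(hits, reverse=True)

# Toy fingerprints paired with an observed change in pIC50.
query = {1, 2, 3, 4}
library = [
    ({1, 2, 3, 5}, 1.9),     # 3 shared features out of 5 in the union
    ({1, 2, 3, 4, 6}, 1.5),  # 4 shared features out of 5 in the union
    ({7, 8, 9}, -0.4),       # nothing shared, filtered out
]
hits = similar_effects(query, library)
```

Neighbours that agree in sign and size with the query observation are what makes a single measured effect more believable.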

Express significance as Bayesian probability

Bayesian model
“Good” molecules:
ΔpIC50 ≥ 1

Rank test set by likelihood that a transformation will yield a
≥10-fold increase in potency
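
A minimal sketch of a two-class Bayesian model on fingerprint features, assuming a Laplacian-corrected naive Bayes estimator of the kind described in the literature for fingerprint-based models. The slides do not state which estimator was used, so the weight formula, the function names and the toy features are assumptions. "Good" is defined as a change in pIC50 of at least 1, as above.

```python
import math
from collections import Counter

def label(delta_pic50, cutoff=1.0):
    """A transformation is 'good' if it raises pIC50 by at least the cutoff."""
    return delta_pic50 >= cutoff

def train_weights(samples):
    """Laplacian-corrected feature weights from (feature_set, is_good) samples."""
    total = len(samples)
    good_total = sum(1 for _, is_good in samples if is_good)
    base_rate = good_total / total
    seen = Counter()
    seen_good = Counter()
    for features, is_good in samples:
        for f in features:
            seen[f] += 1
            if is_good:
                seen_good[f] += 1
    # log((good count + 1) / (count * base rate + 1)) per feature
    return {f: math.log((seen_good[f] + 1) / (seen[f] * base_rate + 1)) for f in seen}

def score(weights, features):
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)

# Toy training set: hypothetical feature names and made-up delta values.
training = [
    ({"f1", "f2"}, label(1.3)),
    ({"f1", "f3"}, label(1.9)),
    ({"f3", "f4"}, label(-0.2)),
    ({"f4"}, label(0.1)),
]
weights = train_weights(training)
```

Features seen mostly in good transformations get positive weights, features seen mostly in the rest get negative weights, and the summed score is what the test set is ranked by.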

Bayes can predict MMP 10-fold increase
Enrichment plots of test set
[Plot: % Actives Captured vs. % of Samples, both axes 0% to 100%. Curves: Random Model, Perfect Model, dActivity_class_increase_RCFP_2 Model, dActivity_class_increase_RCFP_4 Model, dActivity_class_increase_RCFP_6 Model]
• RCFP_6 > RCFP_4
• RCFP_4 >> RCFP_2
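
For reference, a sketch of how an enrichment curve of this kind can be computed: rank the samples by model score and track the cumulative share of actives captured. The function name and toy scores are hypothetical.

```python
def enrichment_curve(scored):
    """Return (% of samples, % of actives captured) points, best score first.

    `scored` is a list of (score, is_active) pairs.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    n_actives = sum(1 for _, active in ranked if active)
    if n_actives == 0:
        raise ValueError("need at least one active")
    points = [(0.0, 0.0)]
    captured = 0
    for i, (_, active) in enumerate(ranked, start=1):
        captured += int(active)
        points.append((100.0 * i / len(ranked), 100.0 * captured / n_actives))
    return points

# Toy ranking: two actives among four samples.
scored = [(0.9, True), (0.8, False), (0.7, True), (0.1, False)]
curve = enrichment_curve(scored)
```

A random model follows the diagonal, and a perfect model captures all actives before any inactive; a useful model lies between the two.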

Confidence vs. ΔpIC50

[Plot axis: ΔpIC50]

Bayesian score = confidence

Semi-quantitative Bayesian predictions
• Multi-category Bayesian
• Class = ΔpIC50 bin
• RCFP_6

Compare:
• Normalised Probability (default)
• #Enrichment
• #EstPGood
• Prediction

#EstPGood score gives the smallest prediction error
[Figure values, labels not recoverable: 22.5%, 22.5%, 30.0%, 19%]

MMP vs. Full molecule transformations

[Figure values: 22.5% vs. 30.0%]

Modelling with mapped reactions works better (it should)

MMP Idea Generator: Training

• 80% training set
– Generate MMP transformations
– Learn classic regression model (PLS)
– Learn Bayesian model from reaction fingerprints

MMP Idea Generator: Test

~34k transformations → >6.5M design ideas

Test set

• ~5.6 predictions per test set molecule
• MMP pIC50 := mean(pIC50_reactant + ΔpIC50_transformation)
• RCFP pIC50 := mean(pIC50_reactant + ΔpIC50_predicted by Bayes)
Runtime ~30 min
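
A sketch of the prediction step, reading the definition above as: each design idea predicts the reactant's pIC50 plus the change attached to the transformation, and the several predictions that land on the same product are averaged. The row layout, names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def predict_pic50(design_ideas):
    """Average (reactant pIC50 + transformation delta) per designed product.

    `design_ideas` holds hypothetical (product_id, reactant_pic50, delta) rows.
    """
    per_product = defaultdict(list)
    for product_id, reactant_pic50, delta in design_ideas:
        per_product[product_id].append(reactant_pic50 + delta)
    return {product: mean(values) for product, values in per_product.items()}

# Toy design ideas: two routes lead to cpd4, one to cpd9.
ideas = [
    ("cpd4", 6.0, 1.0),
    ("cpd4", 6.5, 1.5),
    ("cpd9", 7.0, -0.5),
]
predictions = predict_pic50(ideas)
```

For the RCFP variant the same aggregation would apply, with the delta taken from the Bayesian prediction instead of the observed transformation statistics.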

SAR by MMP vs. SAR by PLS
ECFP_6 / phys property descriptors

MMP PLS

• MMP predictions nearly as good as PLS predictions
• Not a 100% like-for-like comparison: fewer predictions for MMP

Consensus MMP & PLS predictions

Found by MMP: 11 / 56
Found by PLS: 10 / 56
Consensus: 26 / 62
Selected by neither: 12 / 1006

Red: top 5% by pIC50 (59)
Solid: top 10% (118) by MMP or PLS. Total = 174
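
A minimal sketch of the consensus selection, assuming each method ranks the test set by predicted pIC50 and the top fraction of each ranking is compared. The function names and toy predictions are hypothetical.

```python
def top_fraction(scores, fraction):
    """IDs of the highest-scoring `fraction` of entries (at least one)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def consensus_split(mmp_pred, pls_pred, fraction=0.10):
    """Split selections into MMP-only, PLS-only and consensus sets."""
    mmp_top = top_fraction(mmp_pred, fraction)
    pls_top = top_fraction(pls_pred, fraction)
    return {
        "mmp_only": mmp_top - pls_top,
        "pls_only": pls_top - mmp_top,
        "consensus": mmp_top & pls_top,
    }

# Toy predictions for ten molecules; top 20% means two picks per method.
ids = "abcdefghij"
mmp_pred = dict(zip(ids, [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5]))
pls_pred = dict(zip(ids, [6.0, 9.0, 8.5, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.2]))
split = consensus_split(mmp_pred, pls_pred, fraction=0.2)
```

On the counts reported above, the consensus set recovers 26 of 62 (about 42%) top-5% compounds, against 11 of 56 (about 20%) and 10 of 56 (about 18%) for the two single-method sets.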

Conclusions

• For one dataset it has been shown that
– MMP transformations can form the basis of an automatic “Learning Machine”
– “Significant rules” can be selected
– Consensus MMP/regression activity prediction works better than individual predictions

MMP vs. Bayes/RCFP predictions
