1. Might Template CoMFA Integrate Structure-Based and Ligand-Based
Design? Some Remarkable Predictions
Richard D. Cramer (cramer@tripos.com)
EuroQSAR 2014, St. Petersburg, Russian Federation
September 4, 2014
2. True Predictions: for a random half of ChEMBL’s factor Xa data
SDEP = 1.14
n = 1907
.. from a single CoMFA model
© Copyright 2014 Certara, L.P. All rights reserved.
3. Diversity of ChEMBL factor Xa structures
• 267 publications
• 1339 “reduced” Bemis-Murcko skeletons
• >100 assay protocols
• Figure: twenty random structures
4. Template CoMFA: automated general 3D-QSAR setup
• By using aligned 3D structures (from X-ray, pharmacophore, ??) as templates:
– 3D templates + 2D training set → Template CoMFA → aligned (3D) training set → CoMFA model
– 2D test set → predictions!
Cramer, R. D.; Wendt, B. J. Chem. Inf. Model. 2014, 54, 660-671.
5. The experimental data available for factor Xa
• From BindingDB: 12 templates (.pdb references); 270 training SAR (analogs of the templates)
• From ChEMBL: 3900+ usable structures
• Goal: predict ChEMBL
6. Factor Xa (training set = random half of ChEMBL)
• Templates: 12 .pdb structures from BindingDB
• Training set and test set: random halves of ChEMBL
• Model: q2 = 0.381, SDEP = 1.15
• Test set predictions: SDEP = 1.14, n = 1907
• Predictions’ SDEP == model’s SDEP!
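For readers less familiar with the two statistics quoted above, here is a minimal sketch of how SDEP and q2 are commonly computed. It assumes the usual definitions (SDEP as the root of PRESS divided by n, q2 as one minus PRESS over the sum of squares about the observed mean); the talk does not state which exact variants were used, and the function names and pIC50 values are hypothetical.

```python
import math

def sdep(observed, predicted):
    """Standard deviation of error of prediction, assumed here to be sqrt(PRESS / n)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(press / len(observed))

def q2(observed, predicted):
    """Cross-validated r2, assumed here to be 1 - PRESS / SS about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss

# Hypothetical pIC50 values for illustration only (not data from the talk).
obs = [6.0, 7.0, 8.0, 9.0]
pred = [6.5, 6.5, 8.5, 8.5]

print(sdep(obs, pred))  # every error is 0.5, so SDEP is 0.5
print(q2(obs, pred))    # PRESS = 1.0, SS = 5.0, so q2 is 0.8
```

For a model, `predicted` would hold the leave-out (cross-validated) predictions for the training set; for a test set it would hold the true predictions, which is what makes the two SDEP values comparable.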
7. What’s going on?
• Templates: 12 .pdb structures from BindingDB
• Training set and test set: random halves of ChEMBL
• Model: q2 = 0.381, SDEP = 1.15
• Test set predictions: SDEP = 1.14, n = 1907
• This is the surprise!
• Not surprising ..
• Representative of “all small mol space”?
8. More about this remarkable prediction result ..
• Other biological targets?
• The “why” and “how” of such results
• Drill-down on this factorXa result
• Toward its possible applications
• Other attributes of template CoMFA
9. Template CoMFA has succeeded on ~90% of 114 targets
• Includes all 74 targets in bindingdb.org referencing more than one .pdb
Cramer, R. D. J. Chem. Inf. Model. 2014, 54, 2147–2156.
10. Another “all-ChEMBL” target: Checkpoint kinase 1 predictions
SDEP = 1.14
But a third target, carbonic anhydrase II, did not work
(no model from ChEMBL)
11. Fundamentals of Template CoMFA
• Theory underlying 3D-QSAR: The primary cause of potency
differences among (non-covalently acting) ligands is steric
and electrostatic field differences.
• Empirical Observations: When the goal is an informative
comparison of ligand field differences, increasing ligand
shape similarity seems at least as productive as increasing
physicochemical precision (by, e.g., docking).
• Concept: Template CoMFA seeks ligand shape similarity by:
– “Copying” coordinates from any atom within any template
ligand that “best matches” a candidate’s (training or test
set) atom
– Using the topomer protocol to generate coordinates for the
still remaining “non-matching” atoms
12. Many templates – how is the one “best match” chosen?
• “Best match” == best “anchor bond” pairing
– The best anchor bond pairing maximizes the size of
anchor-bond-rooted branches having “similar” atoms
– To be considered, a possible anchor bond must be “interesting”
– The “best match” search is exhaustive
• Every interesting bond in the candidate
• Vs every template
• Vs every interesting bond in each template
– “Atom similarity” considers types, properties, topological locations
– The actual alignment (“coordinate copying”) then begins by overlaying
the chosen “candidate” anchor onto the chosen template anchor bond
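The exhaustive search described above can be sketched as three nested loops. This is only an illustration of the search structure: the scoring function (which in the talk measures the size of anchor-bond-rooted branches having “similar” atoms) and the “interesting” bond filter are not specified here, so `score`, the bond identifiers, and the template names below are hypothetical placeholders.

```python
def best_anchor_pairing(candidate_bonds, templates, score):
    """Exhaustively score every (candidate bond, template, template bond) triple.

    candidate_bonds: iterable of bond ids for the candidate (already filtered
                     to "interesting" bonds; the filter itself is not shown)
    templates:       dict mapping template name -> iterable of its bond ids
    score:           hypothetical callable(cand_bond, template_name, tmpl_bond)
                     returning a number; higher means a better match
    Returns (best_score, cand_bond, template_name, tmpl_bond), or None if
    there is nothing to compare. Ties keep the first pairing encountered.
    """
    best = None
    for cand_bond in candidate_bonds:                 # every interesting bond in the candidate
        for name, tmpl_bonds in templates.items():    # vs every template
            for tmpl_bond in tmpl_bonds:              # vs every interesting bond in each template
                s = score(cand_bond, name, tmpl_bond)
                if best is None or s > best[0]:
                    best = (s, cand_bond, name, tmpl_bond)
    return best

# Toy scores standing in for the real atom-similarity measure (made up).
toy_scores = {("c1", "T1", "a"): 3, ("c1", "T2", "x"): 7, ("c2", "T1", "a"): 5}

def toy_score(cand_bond, template_name, tmpl_bond):
    return toy_scores.get((cand_bond, template_name, tmpl_bond), 0)

result = best_anchor_pairing(["c1", "c2"], {"T1": ["a", "b"], "T2": ["x"]}, toy_score)
print(result)  # (7, 'c1', 'T2', 'x')
```

The cost grows with the product of candidate bonds, templates, and template bonds, which stays small for a dozen templates of drug-like size.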
13. Factor Xa inhibitors having .pdb references in bindingdb (2D)
14. The only 3D info: the 12 overlaid factor Xa templates
15. .pdb Template #10 and its ChEMBL “homologues” (2D)
16. .pdb Template #10 and its ChEMBL “homologues” (3D)
17. Template #10 and some of its ChEMBL Non-homologues (2D)
18. Template #10 and some of its ChEMBL Non-homologues (3D)
20. Integration of Structure- and Ligand-based Design (chk1)
Figure panels: Steric Contours; Electrostatic Contours
Receptor pocket surfaces, color coded by electrostatic potential
21. What might this capability be good for?
1. Are so many training set structures needed?
2. Is there any way to put confidence limits around an
individual prediction?
3. But crystal structures are not available for many important
biological targets?
22. 1. Much smaller (random) training sets do seem useful
                     Prediction   CoMFA stats (ntrng)
nTrng  nPred  ratio  SDEP         #Cmp  q2     SDEP  r2     s
1908   1907   1x     1.14         9     0.405  1.13  0.630  0.89
954    2861   3x     1.22         12    0.337  1.20  0.796  0.67
477    3338   7x     1.30         10    0.336  1.22  0.856  0.57
239    3565   15x    1.30         2     0.190  1.31  0.522  1.00
23. 2. Confidence limits can be tightened with similarity metrics
• Suppose the following project situation (using factor Xa)
– Actual pI50 > 8.65 (one SD > mean pI50) must be avoided
– Proposal: if predicted pI50 < 7.2 (mean pI50), no test needed
– What is the error rate (false negative)?
• Suppose a predicted pI50 is rejected if similarity to its CoMFA
template is too low
– Then what is the error rate?
                   Pred pI50   Obsvd pI50   False      % False
                   < 7.2       > 8.65       Negative   Negative
All structures     1907        629          185        29.4
& topdiff < 200    168         98           8          8.2
& mathv > 0.65     339         348          26         7.5
& asim > .999      694         306          56         18.3
& fgpt Tan > .75   98          93           4          4.3
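The percentages in this table can be reproduced with a short calculation. The sketch below assumes that “% False Negative” is the false-negative count divided by the number of structures with observed pI50 > 8.65; that reading is an inference from the numbers, not something the slide states, and the function name is hypothetical.

```python
def false_negative_pct(n_observed_high, n_false_negative):
    """Percent of observed-high structures that were predicted low (assumed definition)."""
    return 100.0 * n_false_negative / n_observed_high

# (filter, predicted pI50 < 7.2, observed pI50 > 8.65, false negatives), copied from the slide
rows = [
    ("All structures",   1907, 629, 185),
    ("& topdiff < 200",   168,  98,   8),
    ("& mathv > 0.65",    339, 348,  26),
    ("& asim > .999",     694, 306,  56),
    ("& fgpt Tan > .75",   98,  93,   4),
]

rates = {
    name: round(false_negative_pct(obs_high, fn), 1)
    for name, _pred_low, obs_high, fn in rows
}

for name, pct in rates.items():
    print(f"{name:18s} {pct:5.1f}")
```

Under that assumption every row matches the slide (29.4, 8.2, 7.5, 18.3, 4.3), and the trade-off is visible in the counts: the tighter similarity filters cut the error rate but apply to far fewer structures.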
24. 3. No X-ray structures?
• Ligand-based approaches (pharmacophoric &/or shape)
• The pharmacophoric “elephant in the room”
– Enormous configurational search space
– Criteria for a correct pharmacophore are mostly subjective
• A possible objective criterion for a correct pharmacophore?
– obtaining a satisfactory CoMFA model from a training set..
– aligned by template CoMFA alignment with that pharmacophoric
hypothesis as its templates
25. Template CoMFA Attributes (.. implicit in talk content !!)
• TC is a ligand alignment protocol for classical CoMFA that:
– As input, only uses 3D template(s) and 2D SAR table, thus providing:
• Fast and convenient throughput
• Objectively determined models
• Application of crystallographic and/or pharmacophoric constraints
• No limitations on structural applicability
– As output, enables, practically:
• Rapid, objective, structurally unlimited potency predictions that so far are
reasonably accurate
• More structurally informative contour maps
• 3D database searching with potency predictions
• Potency-prediction-constrained de novo design
– Its 3D-QSAR models can:
• Successfully combine multiple series into a single model
• Be generated completely automatically
26. Thanks to everyone who helped!
• Bernd Wendt for “bleeding edge” trials and feedback
• Supervisors who have kept paying me -- mostly to pursue
this topomer/self-similarity thing (for over twenty years now)
– John McAlister
– Jim Hopkins
– Dan Weiner
– Jim Mahan
27. Template CoMFA References
• Cramer, R. D. Template CoMFA applied to 114 Biological Targets. J. Chem. Inf. Model. 2014, 54, 2147–2156.
• Wendt, B.; Cramer, R. D. Challenging the Gold Standard for 3D-QSAR: Template CoMFA versus X-ray Alignment. J. Comput.-Aided Mol. Des. 2014, accepted.
• Cramer, R. D.; Wendt, B. Template CoMFA: The 3D-QSAR Grail? J. Chem. Inf. Model. 2014, 54, 660–671.
• Cramer, R. D. Rethinking 3D-QSAR. J. Comput.-Aided Mol. Des. 2011, 25, 197–201.
• Cramer, R. D.; Jilek, R. J.; Guessregen, S.; Clark, S. J.; Wendt, B.; Clark, R. D. “Lead-Hopping”. Validation of Topomer Similarity as a Superior Predictor of Similar Biological Activities. J. Med. Chem. 2004, 47, 6777–6791.
• Jilek, R. J.; Cramer, R. D. Topomers: A Validated Protocol for their Self-Consistent Generation. J. Chem. Inf. Comp. Sci. 2004, 44, 1221–1227.
• Cramer, R. D. Topomer CoMFA: A Design Methodology for Rapid Lead Optimization. J. Med. Chem. 2003, 46, 374–389.
• Cramer, R. D.; Clark, R. D.; Patterson, D. E.; Ferguson, A. M. Bioisosterism as a molecular diversity descriptor: steric fields of single topomeric conformers. J. Med. Chem. 1996, 39, 3060–3069.
• Patterson, D. E.; Cramer, R. D.; Ferguson, A. M.; Clark, R. D.; Weinberger, L. E. Neighborhood behavior: a useful concept for validation of molecular diversity descriptors. J. Med. Chem. 1996, 39, 3049–3059.
28. Factor Xa predictions (if training set = BindingDB)
• Training set from BindingDB: 11 .pdb templates, 270 2D SAR
• Model: q2 = 0.577, SDEP = 0.86
• Goal: predict ChEMBL (3900+ usable)
• Predictions: SDEP = 1.74
• This doesn’t work!
Cramer, R. D.; Wendt, B. J. Chem. Inf. Model. 2014, 54, 660-671.
29. A second target: Checkpoint kinase 1 predictions
• Training set from BindingDB SAR: SDEP = 1.46
• Training set = half ChEMBL: SDEP = 1.14
But a third target, carbonic anhydrase II, did not work (no model from ChEMBL)
30. Why is pure shape similarity so productive?
Substituent  pIC50
-H           7.2
-F           7.9
• Assay should be constant..
• The only possible cause of this pIC50 difference is the difference in the fields surrounding F => H
– any docking pose change from that field difference is only mechanistic and can be ignored for QSAR purposes
• WHILE CONVERSELY: Docking (small combi library) moves the core around, producing field variation that is noise, because an invariant core cannot have caused changes in biological activity
31. How can Topomer CoMFA be so Effective? (2)
Multiple Regression and PLS are different !!
Input data (one perfectly correlated descriptor):
Y    X
6.0  6.0
3.3  3.3
0.9  0.9
5.3  5.3
q2 for Y = f(X):
Input data                             Multiple Regression   PLS
One perfectly correlated descriptor    1.000                 1.000
+ Many columns of random x values      1.000                 0.000 !!
Clark, M.; Cramer, R.D. Quant. Struct.-Act. Relat. 1993, 12, 137-145
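One way to see why PLS reacts so differently to added random columns is to look at its first weight vector, which in the standard PLS1 algorithm is proportional to X^T y on centered data. The sketch below is not the calculation behind the q2 values on this slide; it only shows, under stated assumptions, how the share of weight on the single informative column shrinks once random columns are added. The number of random columns (500), their unit variance, and the seed are arbitrary choices made for illustration.

```python
import random

def centered(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def first_pls_weights(columns, y):
    """First PLS1 weight vector: proportional to X^T y on centered data, unit length."""
    yc = centered(y)
    raw = [sum(x * t for x, t in zip(centered(col), yc)) for col in columns]
    norm = sum(w * w for w in raw) ** 0.5
    return [w / norm for w in raw]

y = [6.0, 3.3, 0.9, 5.3]   # Y values from the slide
x_true = list(y)           # the one perfectly correlated descriptor

rng = random.Random(0)     # fixed seed, arbitrary
# Hypothetical random descriptor columns (count and distribution are assumptions).
noise = [[rng.gauss(0.0, 1.0) for _ in y] for _ in range(500)]

w_alone = first_pls_weights([x_true], y)
w_mixed = first_pls_weights([x_true] + noise, y)

share_alone = w_alone[0] ** 2   # fraction of squared weight on the true column
share_mixed = w_mixed[0] ** 2

print(f"share of weight on true column, alone:      {share_alone:.3f}")
print(f"share of weight on true column, with noise: {share_mixed:.3f}")
```

With the informative column alone its share is 1; with many random columns most of the weight lands on chance correlations, which is consistent with cross-validated predictions degrading even though the perfect descriptor is still present.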
32. Examples of Issues During “Atom Matching”
Figure labels: 3D template; topomer; template
33. “Topomer” positioning of unmapped atoms
• Topomer: a single “black-box constructed” 3D model of a monovalent fragment
• Topomer protocol:
– Only input is the “2D structure” of a single fragment (A)
– “Embedded” in 3D space by superposing the open valence (B)
– Valence geometries (bonds, angles, rings) from Concord (or
Corina) (B)
– Torsions, stereochemistry, ring flips from canonical rules (C)
– Resulting “strain energy” is ignored
Figure panels: A B C D
34. Another random sample of ten training set structures
.. suppose you only needed to align each of these ten
structures to one of those twelve templates ..