Richard Cramer 2014 euro QSAR presentation

Might Template CoMFA Integrate Structure-Based and Ligand-Based
Design? Some Remarkable Predictions
Richard D. Cramer (cramer@tripos.com)
EuroQSAR 2014, St. Petersburg, Russian Federation
September 4, 2014

True Predictions: for a random half of ChEMBL’s factor Xa data
SDEP = 1.14
n = 1907
.. from a single CoMFA model
© Copyright 2014 Certara, L.P. All rights reserved.

Diversity of ChEMBL factor Xa structures
• 267 publications
• 1339 “reduced” Bemis-Murcko skeletons
• >100 assay protocols
[Figure: twenty random structures]

Template CoMFA: automated general 3D-QSAR setup
By using aligned 3D structures (from X-ray, pharmacophore, ??) as templates
[Workflow diagram labels: 3D templates; 2D Training Set; Template CoMFA; Aligned (3D) Training Set; CoMFA model; 2D Test Set (Predictions!)]
Cramer, R. D.; Wendt, B. J. Chem. Inf. Model. 2014, 54, 660-671.

The experimental data available for factor Xa
• From bindingdb: 12 templates (.pdb references); 270 training SAR (analogs of the templates)
• From ChEMBL: 3900+ usable
• Goal: Predict ChEMBL

Factor Xa (training set = random half of ChEMBL)
• Templates: 12 .pdb (bindingDB)
• Training Set (ChEMBL): Model: q2 = .381, SDEP = 1.15
• Test Set (ChEMBL): SDEP = 1.14, n = 1907
Predictions’ SDEP == model’s SDEP!
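The SDEP and q2 figures quoted on these slides can be read with the following minimal sketch. It assumes the common convention SDEP = sqrt(PRESS / n) and q2 = 1 - PRESS / SS; the function names and the three pI50 values are illustrative only and are not taken from the talk.

```python
import math

def sdep(observed, predicted):
    """Root-mean-square prediction error: sqrt(PRESS / n)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(press / len(observed))

def q2(observed, predicted):
    """1 - PRESS / SS, with SS taken about the mean of the observed values."""
    mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - press / ss

# Illustrative pI50 values only (assumption), not data from the talk.
obs = [7.0, 8.0, 9.0]
pred = [7.0, 9.0, 8.0]
print(sdep(obs, pred), q2(obs, pred))
```

Under this convention, a test-set SDEP equal to the model's cross-validated SDEP means the external predictions are about as accurate as the internal leave-out predictions.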

What’s going on?
• Templates: 12 .pdb (bindingDB)
• Training Set (ChEMBL): Model: q2 = .381, SDEP = 1.15
• Test Set (ChEMBL): SDEP = 1.14, n = 1907
• This is the surprise!
• Not surprising ..
• Representative of “all small mol space”?

More about this remarkable prediction result ..
• Other biological targets?
• The “why” and “how” of such results
• Drill-down on this factor Xa result
• Toward its possible applications
• Other attributes of template CoMFA

Template CoMFA had succeeded on ~90% of 114 targets
Includes: all the 74 targets in bindingdb.org referencing more than one .pdb
Cramer, R. D. J. Chem. Inf. Model. 2014, 54, 2147–2156.

Another “all-ChEMBL” target: Checkpoint kinase 1 predictions
SDEP = 1.14
But a third target, carbonic anhydrase II, did not work (no model from ChEMBL)

Fundamentals of Template CoMFA
• Theory underlying 3D-QSAR: The primary cause of potency
differences among (non-covalently acting) ligands is steric
and electrostatic field differences.
• Empirical Observations: When the goal is an informative
comparison of ligand field differences, increasing ligand
shape similarity seems at least as productive as increasing
physicochemical precision (by, e.g., docking).
• Concept: Template CoMFA seeks ligand shape similarity by:
– “Copying” coordinates from any atom within any template
ligand that “best matches” a candidate’s (training or test
set) atom
– Using the topomer protocol to generate coordinates for the
still remaining “non-matching” atoms
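The "steric and electrostatic field differences" in the first bullet can be made concrete with a small sketch. It assumes a CoMFA-style setup: a +1 probe on grid points, a Lennard-Jones 12-6 steric term, a Coulomb electrostatic term, and truncation at 30 kcal/mol. The `eps` and `rmin` values, the function name, and the two toy "ligands" are placeholders, not parameters from the talk or from any force field.

```python
import numpy as np

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2), unit dielectric

def probe_fields(coords, charges, grid, eps=0.1, rmin=3.4, cutoff=30.0):
    """Steric and electrostatic energies felt by a +1 probe at each grid point.

    eps and rmin are placeholder values applied to every atom (assumption).
    """
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Distance from every grid point to every atom, shape (n_grid, n_atoms).
    d = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    d = np.maximum(d, 1e-6)  # avoid division by zero on top of an atom
    r6 = (rmin / d) ** 6
    steric = (eps * (r6 ** 2 - 2.0 * r6)).sum(axis=1)
    electro = (COULOMB * charges[None, :] / d).sum(axis=1)
    # Truncate large energies, as is usual for CoMFA-type fields.
    return np.minimum(steric, cutoff), np.clip(electro, -cutoff, cutoff)

# Two toy "ligands" sharing a core atom and differing in one substituent atom.
grid = np.array([[3.4, 0.0, 0.0], [10.0, 0.0, 0.0]])
core = [[0.0, 0.0, 0.0]]
lig_h = probe_fields(core + [[0.0, 1.1, 0.0]], [0.0, 0.1], grid)
lig_f = probe_fields(core + [[0.0, 1.4, 0.0]], [0.2, -0.2], grid)
steric_diff = lig_f[0] - lig_h[0]    # per-grid-point steric field difference
electro_diff = lig_f[1] - lig_h[1]   # per-grid-point electrostatic difference
```

Because the core atom sits at the same coordinates in both toy ligands, the difference vectors reflect only the changed substituent, which is the situation the shape-similarity argument aims for.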

Many templates – how is the one “best match” chosen?
• “Best match” == best “anchor bond” pairing
– The best anchor bond pairing maximizes the size of
anchor-bond-rooted branches having “similar” atoms
– To be considered, a possible anchor bond must be “interesting”
– The “best match” search is exhaustive
• Every interesting bond in the candidate
• Vs every template
• Vs every interesting bond in each template
– “Atom similarity” considers types, properties, topological locations
– The actual alignment (“coordinate copying”) then begins by overlaying
the chosen “candidate” anchor onto the chosen template anchor bond

Factor Xa inhibitors having .pdb references in bindingdb (2D)
© Tripos, L.P. All Rights

The only 3D info: the 12 overlaid factor Xa templates

.pdb Template #10 and its ChEMBL “homologues” (2D)

.pdb Template #10 and its ChEMBL “homologues” (3D)

Template #10 and some of its ChEMBL Non-homologues (2D)

Template #10 and some of its ChEMBL Non-homologues (3D)

Combined Template CoMFA Alignments and Contours

Integration of Structure- and Ligand-based Design (chk1)
[Figure panels: Steric Contours; Electrostatic Contours. Receptor pocket surfaces; color coded by electrostatic potential]

What might this capability be good for?
1. Are so many training set structures needed?
2. Is there any way to put confidence limits around an
individual prediction?
3. But crystal structures are not available for many important
biological targets?

1. Much smaller (random) training sets do seem useful
                                    CoMFA stats (nTrng)
nTrng  nPred  ratio  SDEP (pred)    #Cmp  q2     SDEP  r2     s
1908   1907   1x     1.14           9     0.405  1.13  0.630  0.89
954    2861   3x     1.22           12    0.337  1.20  0.796  0.67
477    3338   7x     1.30           10    0.336  1.22  0.856  0.57
239    3565   15x    1.30           2     0.190  1.31  0.522  1.00

2. Confidence limits can be tightened with similarity metrics
• Suppose the following project situation (using factor Xa)
– Actual pI50 > 8.65 (one SD > mean pI50) must be avoided
– Proposal: if predicted pI50 > 7.2 (mean pI50), no test needed
– What is the error rate (false negative)?
• Suppose a predicted pI50 is rejected if similarity to its CoMFA
template is too low
– Then what is the error rate?
                   Pred pI50 <7.2   Obsvd pI50 >8.65   False Negative   % False Negative
All structures     1907             629                185              29.4
& topdiff <200     168              98                 8                8.2
& mathv > 0.65     339              348                26               7.5
& asim > .999      694              306                56               18.3
& fgpt Tan >.75    98               93                 4                4.3
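A quick arithmetic check of the table: in every row the "% False Negative" value matches the third number divided by the second, rounded to one decimal. The sketch below only reproduces that division; the dictionary layout and names are illustrative, and which structure counts the first two columns represent is left as on the slide.

```python
# (second column, false negatives) for each row of the slide's table
rows = {
    "All structures": (629, 185),
    "& topdiff <200": (98, 8),
    "& mathv > 0.65": (348, 26),
    "& asim > .999": (306, 56),
    "& fgpt Tan >.75": (93, 4),
}

def false_negative_pct(denominator, false_negatives):
    """Percentage of false negatives, rounded to one decimal as on the slide."""
    return round(100.0 * false_negatives / denominator, 1)

rates = {name: false_negative_pct(n, fn) for name, (n, fn) in rows.items()}
for name, pct in rates.items():
    print(f"{name:18s} {pct:5.1f}")
```

Read this way, requiring fingerprint Tanimoto similarity above 0.75 cuts the error rate from 29.4% to 4.3%, at the cost of covering far fewer structures.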

3. No X-ray structures?
• Ligand-based approaches (pharmacophoric &/or shape)
• The pharmacophoric “elephant in the room”
– Enormous configurational search space
– Criteria for a correct pharmacophore are mostly subjective
• A possible objective criterion for a correct pharmacophore ?
– obtaining a satisfactory CoMFA model from a training set..
– aligned by template CoMFA alignment with that pharmacophoric
hypothesis as its templates

Template CoMFA Attributes (.. implicit in talk content !!)
• TC is a ligand alignment protocol for classical CoMFA that:
– As input, only uses 3D template(s) and 2D SAR table, thus providing:
• Fast and convenient throughput
• Objectively determined models
• Application of crystallographic and/or pharmacophoric constraints
• No limitations on structural applicability
– As output, enables, practically:
• Rapid, objective, structurally unlimited potency predictions that so far are
reasonably accurate
• More structurally informative contour maps
• 3D database searching with potency predictions
• Potency-prediction-constrained de novo design
– Its 3D-QSAR models can:
• Successfully combine multiple series into a single model
• Be generated completely automatically

Thanks to everyone who helped!
• Bernd Wendt for “bleeding edge” trials and feedback
• Supervisors who have kept paying me -- mostly to pursue
this topomer/self-similarity thing (for over twenty years now)
– John McAlister
– Jim Hopkins
– Dan Weiner
– Jim Mahan

Template CoMFA References
• Cramer, R. D. Template CoMFA applied to 114 Biological Targets. J. Chem. Inf. Model. 2014, 54, 2147-2156.
• Wendt, B.; Cramer, R. D. Challenging the Gold Standard for 3D-QSAR: Template CoMFA versus X-ray Alignment. J. Comput.-Aided Mol. Des. 2014, accepted.
• Cramer, R. D.; Wendt, B. Template CoMFA: The 3D-QSAR Grail? J. Chem. Inf. Model. 2014, 54, 660-671.
• Cramer, R. D. Rethinking 3D-QSAR. J. Comput.-Aided Mol. Des. 2011, 25, 197-201.
• Cramer, R. D.; Jilek, R. J.; Guessregen, S.; Clark, S. J.; Wendt, B.; Clark, R. D. “Lead-Hopping”. Validation of Topomer Similarity as a Superior Predictor of Similar Biological Activities. J. Med. Chem. 2004, 47, 6777-6791.
• Jilek, R. J.; Cramer, R. D. Topomers: A Validated Protocol for their Self-Consistent Generation. J. Chem. Inf. Comp. Sci. 2004, 44, 1221-1227.
• Cramer, R. D. Topomer CoMFA: A Design Methodology for Rapid Lead Optimization. J. Med. Chem. 2003, 46, 374-389.
• Cramer, R. D.; Clark, R. D.; Patterson, D. E.; Ferguson, A. M. Bioisosterism as a molecular diversity descriptor: steric fields of single topomeric conformers. J. Med. Chem. 1996, 39, 3060-3069.
• Patterson, D. E.; Cramer, R. D.; Ferguson, A. M.; Clark, R. D.; Weinberger, L. E. Neighborhood behavior: a useful concept for validation of molecular diversity descriptors. J. Med. Chem. 1996, 39, 3049-3059.

Factor Xa predictions (if training set = bindingdb)
• Training Set (bindingDB): 11 .pdb, 270 2D SAR (Model: q2 = .577, SDEP = .86)
• Goal: Predict ChEMBL (3900+ usable): SDEP = 1.74
This doesn’t work!
Cramer, R. D.; Wendt, B. J. Chem. Inf. Model. 2014, 54, 660-671.

A second target: Checkpoint kinase 1 predictions
• Training set from bindingdb SAR: SDEP = 1.46
• Training set = half ChEMBL: SDEP = 1.14
But a third target, carbonic anhydrase II, did not work (no model from ChEMBL)

Why is pure shape similarity so productive?
Assay should be constant..
Substituent   pIC50
-H            7.2
-F            7.9
• The only possible cause of this pIC50 difference is the difference in the fields surrounding F => H
– any docking pose change from that field difference is only mechanistic and can be ignored for QSAR purposes
WHILE CONVERSELY:
• Docking (small combi library) moves the core around, producing field variation that is noise, because ..
• ..an invariant core cannot have caused changes in biological activity

How can Topomer CoMFA be so Effective? (2)
Multiple Regression and PLS are different !!
Input data (Y and X identical: 6.0, 3.3, 0.9, 5.3), alone and then with many added columns of random x values
q2 for Y = f(X):
Input data                              Multiple Regression   PLS
One perfectly correlated descriptor     1.000                 1.000
+ Many columns of random x values       1.000                 0.000 !!
Clark, M.; Cramer, R.D. Quant. Struct.-Act. Relat. 1993, 12, 137-145
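The PLS half of this comparison can be explored with a small NumPy sketch. It assumes a one-component PLS1 model with mean-centring only, leave-one-out cross-validation, and 200 random columns drawn from a standard normal distribution; those choices and all function names are assumptions, not the settings used by Clark and Cramer. The multiple-regression column of the slide is not reproduced here.

```python
import numpy as np

def pls1_predict(X_train, y_train, X_test):
    """One-component PLS1 prediction (mean-centred, no scaling)."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc, yc = X_train - x_mean, y_train - y_mean
    w = Xc.T @ yc                      # weights: covariance of each column with y
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return np.full(len(X_test), y_mean)
    w = w / norm
    t = Xc @ w                         # latent scores
    b = (t @ yc) / (t @ t)             # regression of y on the scores
    return y_mean + ((X_test - x_mean) @ w) * b

def loo_q2(X, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = pls1_predict(X[keep], y[keep], X[i:i + 1])[0]
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

y = np.array([6.0, 3.3, 0.9, 5.3])     # Y values from the slide
X_clean = y.reshape(-1, 1)             # one perfectly correlated descriptor
rng = np.random.default_rng(0)
X_noisy = np.hstack([X_clean, rng.normal(size=(4, 200))])  # + random columns

q2_clean = loo_q2(X_clean, y)
q2_noisy = loo_q2(X_noisy, y)
print(q2_clean, q2_noisy)
```

With the single descriptor the cross-validated predictions are exact. Once the random columns are added, their chance covariances with Y enter the PLS weights and dilute the one informative column, so q2 falls, which is the behaviour the slide contrasts with multiple regression.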

Examples of Issues During “Atom Matching”
[Figure labels: 3D template; topomer; template]

“Topomer” positioning of unmapped atoms
• Topomer: a single “black-box constructed” 3D model of a monovalent fragment
• Topomer protocol:
– Only input is the “2D structure” of a single fragment (A)
– “Embedded” in 3D space by superposing the open valence (B)
– Valence geometries (bonds, angles, rings) from Concord (or Corina) (B)
– Torsions, stereochemistry, ring flips from canonical rules (C)
– Resulting “strain energy” is ignored
[Figure panels: A B C D; * marks the open valence]

Another random sample of ten training set structures
.. suppose you only needed to align each of these ten
structures to one of those twelve templates ..
