David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'
1. David Evans and George Papadatos
Lilly Research Centre, Erl Wood Manor, Windlesham,
UK
22nd September 2011
2. • Discover new chemotypes
• Multiobjective space
• Isosteres in activity
• Improvements in properties
• Want to use multiple tools in same
environment
• But understand what works when
3. • Open Source Workflow tool – main client is free
• But support is available and can integrate commercial vendors + in-
house code as nodes
• Have released many Erl Wood nodes to KNIME community site
• http://tech.knime.org/community/erlwood
4. FieldAlign
Xedmin
Xedex
Xedmin
•XED minimization FieldView
•2D -> 3D •Launches FieldView
•View field points +
Xedex energies + other data
• Conformational
analysis All nodes pass SDF
7. Don’t want
to load all
databases
+ secure onto all
intranet !
users’ PCs
SOAP Web
Command- Service
line search •Apache node
Tomcat
Platform-independent communication
8. Non-proprietary structure • Read in pre-built hypothesis
(MOE, Phase)
• Or sketch from template molecule
• Jmol based visualizer
• Can also annotate and filter hits,
aids manual inspection
9. How well do automated pharmacophore
methods do compared to 2D methods?
Maximum Unbiased Validation (MUV) dataset
• 17 targets, total 30 ligands and 15000 decoys per target,
source: PubChem bioactivity data.
• Wide-ranging targets: hormone receptors, kinases, proteases,
GPCRs plus others (e.g. HSP90, HIV RT).
• Unbiased for chemical analogues as MUV ligands pre-
clustered with 2D fingerprint
•1.16 compounds per scaffold class
MUV: J. Chem. Inf. Model., 2009, 49 (2), 169-184.
10.
11. • Have looked at whole molecule similarity
• Is there more data if we find fragments which maintain
activity?
• Matched Molecular pair analysis (MMP)
• Fragments compounds and finds pairs where only one fragment differs
12. The mining and statistical analysis of transformations and
their impact on properties of interest (e.g. solubility or
activity)
left molecule right molecule transformation ΔSolubility (mgml)
HF -0.8
Br OCH3 +1.2
+2.4
13. (*in an automated and unsupervised way)
It used to be a slow and computationally expensive
process...
• Pair-wise maximum common substructure extraction – O(N2)
Recently a much more efficient algorithm was published
1) Cleave all acyclic single
bonds, one by one:
2) Index all the fragments (cf. book index):
3) Enumerate the values for each
key: >> *
Mol A >> Mol B *
Hussain and Rea (2010). J. Chem. Inf. and Model., 50 (3), 339-348. Wagener and Lommerse (2006). J. Chem. Inf. and Model., 46 (2), 677-685.
14. In: MolRegnos (IDs), structures (in RDKit format) and
property values
Out: Matched pairs (left and right molecule, IDs,
transformation, property values, ΔP, context,
transformation atom count)
Available as an Erl Wood community contribution node
15. Find isosteres in chEMBL
chEMBL
– Database of published medicinal chemistry activity data
– Using chEMBL_10 , total >1,000,000 compounds
Use here just human protein kinase inhibitors
Quality assurance for chEMBL data (SQL statement)
• Med. chem. friendly compounds, parent structure, not downgraded,
confidence score = 9, exact IC50 or Ki values only (converted to
pIC50/pKi) ~14K data points
• Compare biological values coming from the same assay ID only
Aggregate transformations; calculate and bin ΔpIC50s in
3 bins
• Good – Bad – Neutral(depending on a cut-off c = 0.4 log units)
17. chEMBL workflow outputs isosteric fragments
How similar are
isosteres in 2D
fingerprint space?
In field space?
Could fields help us
find unexpected
isosteres?
18. • 1802 fragment pairs from chEMBL_10 kinase data set
• 481 with no rotatable bonds left or right
• Simplifies conformational analysis
• For each fragment pair
1. Swap attachment points for adamantyl
2. FieldAlign to get field similarity (Use adamantyl to
constrain overlay)
3. RDKit fingerprint similarity – topological Daylight-esque
4. Correct similarities for adamantyl
• Are there isosteric pairs with high field similarity but low RDKit
similarity?
19. Size by Neutral
Field Count %
Sim Larger more
isosteric
Pairs with high
field similarity
but low 2D
similarity
Pairs with high
field and 2D
similarity
RDKit Sim
20. Size by Neutral
Field Count %
Sim
Only those with
>60% isosteric
examples
Thiophene -> Phenol
RDKit Sim
21. Size by Neutral
Field Count %
Sim
Only those with
>60% isosteric
examples
Imidazole->
Morpholine?
RDKit Sim
22. Size by Neutral
Field Count %
Sim
Only those with
>60% isosteric
examples
Some small
fragments
RDKit Sim
27. • 6299 data points from thermodynamic solubility assay
• 423 single-point transformations
• 215 no-rotatable point transformations
• Aggregate transformations; calculate and bin ΔlogS in 3 bins
• Good – Bad – Neutral (c = 0.3 log units)
• Are there transformations which increase solubility with low
field similarity but high RDKit similarity?
28. Size by Good
Field Count %
Sim
Only those with
>60% boosting
examples
Ring contraction
+ twist ?
RDKit Sim
29. Size by Good
Field Count %
Sim
Only those with
>60% boosting
examples
Big boost from
morpholine
RDKit Sim
30. • Can mine chEMBL data for non-obvious isosteres
• Will other data sets find more?
• Would like to improve workflow to make isostere data set for
3D similarity comparison
• Improve fragmentation/conformer/ alignment handling?
• Need to include whole molecule?
• Need 3D binding site data as well to confirm isosterism?
• KNIME platform developing
• Virtual screening and evaluation environment
• Rapid experimentation with varied tools
• http://tech.knime.org/community/erlwood
31. George Papadatos
Juliette Pradon
Hina Patel
Nikolas Fechner
David Thorner
Michael Bodkin
KNIME, chEMBL + Cresset !
32. ROC curves for
retrieval of >66%
isosteric groups
Field similarity
performs better
than RDKit
But AUC = 0.68
Workflow not
optimized for
this purpose