David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'

David Evans and George Papadatos
Lilly Research Centre, Erl Wood Manor, Windlesham,
UK

22nd September 2011

• Discover new chemotypes
• Multiobjective space
• Isosteres in activity
• Improvements in properties

• Want to use multiple tools in the same environment
• But understand what works when

• Open Source Workflow tool – main client is free

• But support is available, and commercial vendors' tools + in-house code can be integrated as nodes

• Have released many Erl Wood nodes to KNIME community site
• http://tech.knime.org/community/erlwood

FieldAlign
Xedmin
Xedex

Xedmin
• XED minimization
• 2D -> 3D

Xedex
• Conformational analysis

FieldView
• Launches FieldView
• View field points + energies + other data

All nodes pass SDF

FieldAlign
• Flexible alignment of query molecules onto template

WHY ?
Process is more than just the database search

Company Confidential
Copyright © 2008 Eli Lilly and Company

• Don't want to load all databases onto all users' PCs
• + secure intranet!
• Command-line search
• SOAP Web Service node
• Apache Tomcat
• Platform-independent communication

Non-proprietary structure

• Read in pre-built hypothesis (MOE, Phase)

• Or sketch from template molecule

• Jmol based visualizer

• Can also annotate and filter hits, which aids manual inspection

How well do automated pharmacophore methods do compared to 2D methods?
Maximum Unbiased Validation (MUV) dataset

• 17 targets, each with 30 ligands and 15,000 decoys; source: PubChem bioactivity data.

• Wide-ranging targets: hormone receptors, kinases, proteases, GPCRs plus others (e.g. HSP90, HIV RT).

• Unbiased for chemical analogues, as MUV ligands are pre-clustered with a 2D fingerprint
• 1.16 compounds per scaffold class

MUV: J. Chem. Inf. Model., 2009, 49 (2), 169-184.

• Have looked at whole molecule similarity

• Is there more data if we find fragments which maintain activity?

• Matched Molecular pair analysis (MMP)
• Fragments compounds and finds pairs where only one fragment differs

The mining and statistical analysis of transformations and their impact on properties of interest (e.g. solubility or activity)

left molecule    right molecule    transformation     ΔSolubility (mg/ml)
(structure)      (structure)       H -> F             -0.8
(structure)      (structure)       Br -> OCH3         +1.2
(structure)      (structure)       (not recoverable)  +2.4

(*in an automated and unsupervised way)
It used to be a slow and computationally expensive process...
• Pair-wise maximum common substructure extraction – O(N²)

Recently a much more efficient algorithm was published
1) Cleave all acyclic single bonds, one by one
2) Index all the fragments (cf. book index)
3) Enumerate the values for each key to give the transformations (Mol A >> Mol B)
Hussain and Rea (2010). J. Chem. Inf. Model., 50 (3), 339-348.
Wagener and Lommerse (2006). J. Chem. Inf. Model., 46 (2), 677-685.
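The index-and-enumerate steps can be illustrated with a minimal Python sketch. It assumes the fragmentation (step 1) has already been done by a cheminformatics toolkit; the molecule IDs, fragment strings and the function name `find_matched_pairs` are hypothetical and only serve the illustration. It is not the Erl Wood node's implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, hand-written single-cut fragmentations:
# (molecule ID, constant part, variable part).
# A real workflow would generate these by cleaving each acyclic single bond.
FRAGMENTATIONS = [
    ("mol_A", "c1ccccc1[*]", "[*]Br"),
    ("mol_B", "c1ccccc1[*]", "[*]OC"),
    ("mol_C", "c1ccccc1[*]", "[*]F"),
    ("mol_D", "C1CCCCC1[*]", "[*]F"),
]


def find_matched_pairs(fragmentations):
    """Return (left ID, right ID, transformation, context) tuples."""
    # Step 2: index the variable parts by their constant part (the "key"),
    # in the same way a book index groups page numbers under a term.
    index = defaultdict(list)
    for mol_id, constant, variable in fragmentations:
        index[constant].append((mol_id, variable))

    # Step 3: every two entries under the same key differ only in the
    # variable fragment, so each such combination is a matched pair.
    pairs = []
    for constant, entries in index.items():
        for (id_a, var_a), (id_b, var_b) in combinations(entries, 2):
            if id_a != id_b and var_a != var_b:
                pairs.append((id_a, id_b, f"{var_a}>>{var_b}", constant))
    return pairs


pairs = find_matched_pairs(FRAGMENTATIONS)
for left, right, transformation, context in pairs:
    print(left, right, transformation, context)
```

Because pairs are only enumerated within a key, molecules that share no constant part (such as `mol_D` here) are never compared, which is where the saving over all-against-all substructure comparison comes from.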

In: MolRegnos (IDs), structures (in RDKit format) and property values

Out: Matched pairs (left and right molecule, IDs, transformation, property values, ΔP, context, transformation atom count)

Available as an Erl Wood community contribution node

Find isosteres in ChEMBL
ChEMBL
– Database of published medicinal chemistry activity data
– Using ChEMBL_10, total >1,000,000 compounds

Here we use only human protein kinase inhibitors
Quality assurance for ChEMBL data (SQL statement)
• Med. chem. friendly compounds, parent structure, not downgraded, confidence score = 9, exact IC50 or Ki values only (converted to pIC50/pKi); ~14K data points
• Compare biological values coming from the same assay ID only
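The conversion to pIC50/pKi mentioned above is the negative base-10 logarithm of the molar value. A minimal sketch follows, assuming the input is in nanomolar (the slides do not state the units); the function name `to_pactivity` is hypothetical.

```python
import math


def to_pactivity(value_nm):
    """Convert an IC50 or Ki given in nM to pIC50/pKi.

    pX = -log10(value in mol/L) = 9 - log10(value in nM).
    The nanomolar input unit is an assumption of this sketch.
    """
    if value_nm <= 0:
        raise ValueError("activity value must be positive")
    return 9.0 - math.log10(value_nm)


print(to_pactivity(1000.0))  # 1 µM
print(to_pactivity(10.0))    # 10 nM
```

Working on the log scale means a difference between two compounds (ΔpIC50) corresponds to a fold change in potency, which is what the later binning relies on.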

Aggregate transformations; calculate and bin ΔpIC50s in 3 bins
• Good – Bad – Neutral (depending on a cut-off c = 0.4 log units)
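A minimal sketch of the three-way binning, using the cut-off c = 0.4 log units from the slide. It assumes ΔpIC50 is taken as right minus left molecule, that "Good" means an increase larger than c, and that values exactly at the cut-off count as Neutral; none of these conventions is stated in the slides. The function name `classify_delta` is hypothetical.

```python
def classify_delta(delta, cutoff=0.4):
    """Bin a property change into Good, Bad or Neutral.

    Assumptions of this sketch: delta = right - left, an increase is
    desirable, and |delta| <= cutoff is treated as Neutral.
    """
    if delta > cutoff:
        return "Good"
    if delta < -cutoff:
        return "Bad"
    return "Neutral"


for delta in (1.2, -0.8, 0.1):
    print(delta, classify_delta(delta))
```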

• Each transformation has a neutral count
• Absolute value or percentage: NeutralCount%
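NeutralCount% can be read as the share of a transformation's matched pairs that fall in the Neutral bin. The sketch below assumes that definition and takes already-binned pairs as input; the function name and the example labels are hypothetical, not data from the talk.

```python
from collections import Counter, defaultdict


def neutral_count_percent(labelled_pairs):
    """Map each transformation to the percentage of its pairs binned Neutral.

    labelled_pairs: iterable of (transformation, bin label) tuples, where
    the label is one of "Good", "Bad" or "Neutral".
    """
    counts = defaultdict(Counter)
    for transformation, label in labelled_pairs:
        counts[transformation][label] += 1
    return {
        transformation: 100.0 * c["Neutral"] / sum(c.values())
        for transformation, c in counts.items()
    }


# Made-up labels, for illustration only.
example = [
    ("[*]Br>>[*]Cl", "Neutral"),
    ("[*]Br>>[*]Cl", "Neutral"),
    ("[*]Br>>[*]Cl", "Good"),
    ("[*]H>>[*]F", "Bad"),
]
percentages = neutral_count_percent(example)
print(percentages)
```

A percentage is easier to compare across transformations than the absolute count, but it hides how many examples support it, so keeping both is reasonable.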

ChEMBL workflow outputs isosteric fragments

How similar are isosteres in 2D fingerprint space?

In field space?

Could fields help us find unexpected isosteres?

• 1802 fragment pairs from ChEMBL_10 kinase data set

• 481 with no rotatable bonds left or right
• Simplifies conformational analysis

• For each fragment pair
1. Swap attachment points for adamantyl
2. FieldAlign to get field similarity (use adamantyl to constrain overlay)
3. RDKit fingerprint similarity – topological, Daylight-esque
4. Correct similarities for adamantyl
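For step 3, fingerprint similarity is commonly scored with the Tanimoto coefficient; the slides do not say which metric was used, so that choice is an assumption here. The sketch works on plain sets of "on" bit positions rather than real RDKit fingerprints, and the bit values are invented. The adamantyl correction in step 4 is not specified in the slides and is not attempted.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits.

    T = |A ∩ B| / |A ∪ B|. Two empty fingerprints are scored 0.0 here,
    which is a convention chosen for this sketch.
    """
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Invented bit positions, for illustration only.
left_fragment = {1, 2, 3, 4}
right_fragment = {3, 4, 5, 6}
print(tanimoto(left_fragment, right_fragment))
```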

• Are there isosteric pairs with high field similarity but low RDKit similarity?
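That question amounts to a filter over the per-pair results. The sketch below uses the >60% isosteric (NeutralCount%) criterion that appears on the plots; the field and 2D similarity thresholds, the record layout, the function name and all values are hypothetical.

```python
def unexpected_isosteres(pairs, min_field=0.7, max_2d=0.3, min_neutral_pct=60.0):
    """Select pairs that look alike in field space but not in 2D.

    min_field and max_2d are illustrative thresholds, not values from
    the talk; min_neutral_pct follows the >60% filter on the plots.
    """
    return [
        p for p in pairs
        if p["field_sim"] >= min_field
        and p["rdkit_sim"] <= max_2d
        and p["neutral_pct"] > min_neutral_pct
    ]


# Made-up records, for illustration only.
records = [
    {"name": "pair_1", "field_sim": 0.82, "rdkit_sim": 0.15, "neutral_pct": 75.0},
    {"name": "pair_2", "field_sim": 0.85, "rdkit_sim": 0.70, "neutral_pct": 80.0},
    {"name": "pair_3", "field_sim": 0.80, "rdkit_sim": 0.10, "neutral_pct": 40.0},
]
hits = unexpected_isosteres(records)
print([p["name"] for p in hits])
```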

[Scatter plot: Field Sim vs RDKit Sim; points sized by Neutral Count % (larger = more isosteric). Annotated regions: pairs with high field similarity but low 2D similarity; pairs with high field and 2D similarity.]
[Scatter plot: Field Sim vs RDKit Sim; points sized by Neutral Count %; only those with >60% isosteric examples. Highlighted: Thiophene -> Phenol]

[Scatter plot: Field Sim vs RDKit Sim; points sized by Neutral Count %; only those with >60% isosteric examples. Highlighted: Imidazole -> Morpholine?]

[Scatter plot: Field Sim vs RDKit Sim; points sized by Neutral Count %; only those with >60% isosteric examples. Highlighted: some small fragments]

[Figure: WEE1 kinase, PDB 2I06 (non-proprietary structure, from PDB); buried and solvent-exposed regions labelled]

[Scatter plot: Field Sim vs RDKit Sim; points sized by Neutral Count %; only those with >60% isosteric examples. Highlighted: Me-tetrazole -> oxadiazole]

[Scatter plot: Field Sim vs RDKit Sim; points sized by Neutral Count %; only those with >60% isosteric examples. Highlighted: Thiophene -> phenol]

[Figure: non-proprietary structure (from PDB)]

• 6299 data points from thermodynamic solubility assay

• 423 single-point transformations

• 215 transformations with no rotatable bonds

• Aggregate transformations; calculate and bin ΔlogS in 3 bins
• Good – Bad – Neutral (c = 0.3 log units)

• Are there transformations which increase solubility with low field similarity but high RDKit similarity?

[Scatter plot: Field Sim vs RDKit Sim; points sized by Good Count %; only those with >60% boosting examples. Highlighted: ring contraction + twist?]

[Scatter plot: Field Sim vs RDKit Sim; points sized by Good Count %; only those with >60% boosting examples. Highlighted: big boost from morpholine]

• Can mine ChEMBL data for non-obvious isosteres
• Will other data sets find more?
• Would like to improve workflow to make isostere data set for 3D similarity comparison
• Improve fragmentation/conformer/ alignment handling?
• Need to include whole molecule?
• Need 3D binding site data as well to confirm isosterism?

• KNIME platform developing
• Virtual screening and evaluation environment
• Rapid experimentation with varied tools
• http://tech.knime.org/community/erlwood

George Papadatos
Juliette Pradon
Hina Patel
Nikolas Fechner
David Thorner
Michael Bodkin

KNIME, ChEMBL + Cresset!

ROC curves for retrieval of >66% isosteric groups
• Field similarity performs better than RDKit
• But AUC = 0.68
• Workflow not optimized for this purpose
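For readers unfamiliar with the AUC figure, the area under a ROC curve equals the probability that a randomly chosen positive (here, an isosteric group) scores higher than a randomly chosen negative. A minimal sketch of that pairwise definition follows; the scores and labels are invented and do not reproduce the 0.68 reported above.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half. This is the quadratic-time textbook definition,
    fine for small sets; it is not how the talk's curves were produced.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented similarity scores; label 1 marks an isosteric group.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts a value such as 0.68 in context as a modest enrichment.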
