David Evans, Eli Lilly, 'Field-Aligned Matched Pairs'
David Evans and George Papadatos
Lilly Research Centre, Erl Wood Manor, Windlesham, UK
22nd September 2011
• Discover new chemotypes
• Multiobjective space
• Isosteres in activity
• Improvements in properties
• Want to use multiple tools in same environment
• But understand what works when
• Open Source Workflow tool – main client is free
• But support is available and can integrate commercial vendors + in-house code as nodes
• Have released many Erl Wood nodes to KNIME community site
• http://tech.knime.org/community/erlwood
Nodes: FieldAlign, Xedmin, Xedex, FieldView
• Xedmin: XED minimization
• Xedex: 2D -> 3D, conformational analysis
• FieldView: launches FieldView; view field points + energies + other data
All nodes pass SDF
• Don’t want to load all databases onto all users’ PCs
• Platform-independent communication over a secure intranet
[Diagram: node, SOAP Web Service (Apache Tomcat), command-line search]
Non-proprietary structure
• Read in pre-built hypothesis (MOE, Phase)
• Or sketch from template molecule
• Jmol-based visualizer
• Can also annotate and filter hits, aids manual inspection
How well do automated pharmacophore methods do compared to 2D methods?
Maximum Unbiased Validation (MUV) dataset
• 17 targets, 30 ligands and 15000 decoys per target, source: PubChem bioactivity data.
• Wide-ranging targets: hormone receptors, kinases, proteases, GPCRs plus others (e.g. HSP90, HIV RT).
• Unbiased for chemical analogues as MUV ligands pre-clustered with 2D fingerprint
• 1.16 compounds per scaffold class
MUV: J. Chem. Inf. Model., 2009, 49 (2), 169-184.
• Have looked at whole molecule similarity
• Is there more data if we find fragments which maintain activity?
• Matched Molecular Pair analysis (MMP)
• Fragments compounds and finds pairs where only one fragment differs
The mining and statistical analysis of transformations and their impact on properties of interest (e.g. solubility or activity)
[Example table; structures lost in extraction. Columns: left molecule, right molecule, transformation, ΔSolubility (mg/ml). Recovered entries: H, F, -0.8; Br, OCH3; +1.2; +2.4]
(*in an automated and unsupervised way)
It used to be a slow and computationally expensive process...
• Pair-wise maximum common substructure extraction – O(N²)
Recently a much more efficient algorithm was published
1) Cleave all acyclic single bonds, one by one
2) Index all the fragments (cf. book index)
3) Enumerate the values for each key
[Diagram: key >> values, with Mol A and Mol B]
Hussain and Rea (2010). J. Chem. Inf. Model., 50 (3), 339-348.
Wagener and Lommerse (2006). J. Chem. Inf. Model., 46 (2), 677-685.
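The index-and-enumerate steps (2 and 3) can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the bond cleavage of step 1 is assumed to have been done already by a cheminformatics toolkit, and the input dictionary `cuts`, the function names and the fragment strings are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def index_fragments(cuts):
    """Step 2: index fragments, key = unchanged context, value = (molecule ID, variable fragment)."""
    index = defaultdict(list)
    for mol_id, pairs in cuts.items():
        for context, fragment in pairs:
            index[context].append((mol_id, fragment))
    return index

def enumerate_pairs(index):
    """Step 3: for each key, every two entries with different fragments form a matched pair."""
    matched = []
    for context, entries in index.items():
        for (id_a, frag_a), (id_b, frag_b) in combinations(entries, 2):
            if id_a != id_b and frag_a != frag_b:
                matched.append((id_a, id_b, frag_a, frag_b, context))
    return matched

# Hypothetical single-cut output: each cut is stored both ways round,
# so either side of the cleaved bond can act as the key.
cuts = {
    "MolA": [("c1ccccc1[*]", "[*]F"), ("[*]F", "c1ccccc1[*]")],
    "MolB": [("c1ccccc1[*]", "[*]Cl"), ("[*]Cl", "c1ccccc1[*]")],
    "MolC": [("C1CCCCC1[*]", "[*]F"), ("[*]F", "C1CCCCC1[*]")],
}

pairs = enumerate_pairs(index_fragments(cuts))
for pair in pairs:
    print(pair)
```

Because molecules only meet inside a shared key, the all-against-all comparison of the MCS approach is avoided; the cost is dominated by the size of the largest key group.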
In: MolRegnos (IDs), structures (in RDKit format) and property values
Out: Matched pairs (left and right molecule, IDs, transformation, property values, ΔP, context, transformation atom count)
Available as an Erl Wood community contribution node
Find isosteres in chEMBL
chEMBL
– Database of published medicinal chemistry activity data
– Using chEMBL_10, total >1,000,000 compounds
Use here just human protein kinase inhibitors
Quality assurance for chEMBL data (SQL statement)
• Med. chem. friendly compounds, parent structure, not downgraded, confidence score = 9, exact IC50 or Ki values only (converted to pIC50/pKi)
• ~14K data points
• Compare biological values coming from the same assay ID only
Aggregate transformations; calculate and bin ΔpIC50s in 3 bins
• Good – Bad – Neutral (depending on a cut-off c = 0.4 log units)
• Each transformation has a neutral count
• Absolute value or percentage: NeutralCount%
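A minimal Python sketch of the binning and the neutral count follows. Two points are assumptions rather than statements from the slides: "Good" is taken to mean a ΔpIC50 above +c and "Bad" below -c, with values on the boundary counted as Neutral. The function names and the example observations are hypothetical.

```python
from collections import Counter, defaultdict

def classify(delta, cutoff=0.4):
    """Bin one property change (log units) as Good, Bad or Neutral."""
    if delta > cutoff:
        return "Good"
    if delta < -cutoff:
        return "Bad"
    return "Neutral"

def neutral_counts(observations, cutoff=0.4):
    """Aggregate per transformation; return {transformation: (NeutralCount, NeutralCount%)}."""
    bins = defaultdict(Counter)
    for transformation, delta in observations:
        bins[transformation][classify(delta, cutoff)] += 1
    stats = {}
    for transformation, counts in bins.items():
        total = sum(counts.values())
        stats[transformation] = (counts["Neutral"], 100.0 * counts["Neutral"] / total)
    return stats

# Hypothetical ΔpIC50 values for two made-up transformations T1 and T2
observations = [("T1", 0.1), ("T1", -0.2), ("T1", 0.9), ("T2", 0.5), ("T2", -0.6)]
stats = neutral_counts(observations)
print(stats)
```

The same functions cover other properties by changing `cutoff`.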
chEMBL workflow outputs isosteric fragments
How similar are isosteres in 2D fingerprint space?
In field space?
Could fields help us find unexpected isosteres?
• 1802 fragment pairs from chEMBL_10 kinase data set
• 481 with no rotatable bonds left or right
• Simplifies conformational analysis
• For each fragment pair
1. Swap attachment points for adamantyl
2. FieldAlign to get field similarity (use adamantyl to constrain overlay)
3. RDKit fingerprint similarity – topological, Daylight-esque
4. Correct similarities for adamantyl
• Are there isosteric pairs with high field similarity but low RDKit similarity?
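The closing question amounts to a filter over two similarity columns. The Python sketch below assumes the 2D similarity is a Tanimoto coefficient over fingerprint bits, which the slides do not state, and uses hypothetical thresholds (`field_min`, `fp_max`) and hypothetical example pairs. The adamantyl correction of step 4 is not reproduced because its form is not given.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def unexpected_isosteres(pairs, field_min=0.7, fp_max=0.3):
    """Names of pairs with high field similarity but low 2D fingerprint similarity."""
    return [
        p["name"]
        for p in pairs
        if p["field_sim"] >= field_min and p["fp_sim"] <= fp_max
    ]

# Hypothetical fragment pairs with made-up similarity values
pairs = [
    {"name": "pair_1", "field_sim": 0.85, "fp_sim": 0.20},
    {"name": "pair_2", "field_sim": 0.90, "fp_sim": 0.80},
    {"name": "pair_3", "field_sim": 0.40, "fp_sim": 0.10},
]

sim = tanimoto({1, 2, 3, 4}, {3, 4, 5, 6})  # 2 shared bits out of 6 distinct bits
hits = unexpected_isosteres(pairs)
print(sim, hits)
```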
[Plot: Field Sim vs RDKit Sim, points sized by Neutral Count % (larger = more isosteric)]
• Pairs with high field similarity but low 2D similarity
• Pairs with high field and 2D similarity
[Plot: Field Sim vs RDKit Sim, sized by Neutral Count %; only those with >60% isosteric examples]
• Thiophene -> Phenol
[Plot: Field Sim vs RDKit Sim, sized by Neutral Count %; only those with >60% isosteric examples]
• Imidazole -> Morpholine?
[Plot: Field Sim vs RDKit Sim, sized by Neutral Count %; only those with >60% isosteric examples]
• Some small fragments
• 6299 data points from thermodynamic solubility assay
• 423 single-point transformations
• 215 no-rotatable point transformations
• Aggregate transformations; calculate and bin ΔlogS in 3 bins
• Good – Bad – Neutral (c = 0.3 log units)
• Are there transformations which increase solubility with low field similarity but high RDKit similarity?
[Plot: Field Sim vs RDKit Sim, sized by Good Count %; only those with >60% boosting examples]
• Ring contraction + twist?
[Plot: Field Sim vs RDKit Sim, sized by Good Count %; only those with >60% boosting examples]
• Big boost from morpholine
• Can mine chEMBL data for non-obvious isosteres
• Will other data sets find more?
• Would like to improve workflow to make isostere data set for 3D similarity comparison
• Improve fragmentation/conformer/alignment handling?
• Need to include whole molecule?
• Need 3D binding site data as well to confirm isosterism?
• KNIME platform developing
• Virtual screening and evaluation environment
• Rapid experimentation with varied tools
• http://tech.knime.org/community/erlwood
George Papadatos, Juliette Pradon, Hina Patel, Nikolas Fechner, David Thorner, Michael Bodkin
KNIME, chEMBL + Cresset!
ROC curves for retrieval of >66% isosteric groups
• Field similarity performs better than RDKit
• But AUC = 0.68
• Workflow not optimized for this purpose
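For reference, an AUC such as the one quoted can be computed directly from similarity scores and isosteric / non-isosteric labels. The Python sketch below uses the rank interpretation of AUC (the probability that a randomly chosen positive scores above a randomly chosen negative, ties counting one half); the function name and the example scores are hypothetical and unrelated to the 0.68 reported above.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise positive/negative comparisons."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))

# Hypothetical similarity scores; label 1 = pair belongs to an isosteric group
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect retrieval.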