David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'


  1. David Evans and George Papadatos, Lilly Research Centre, Erl Wood Manor, Windlesham, UK. 22nd September 2011
  2. • Discover new chemotypes
     • Multiobjective space
       • Isosteres in activity
       • Improvements in properties
     • Want to use multiple tools in same environment
       • But understand what works when
  3. • Open Source Workflow tool – main client is free
     • But support is available and can integrate commercial vendors + in-house code as nodes
     • Have released many Erl Wood nodes to KNIME community site
       • http://tech.knime.org/community/erlwood
  4. Nodes: FieldAlign, Xedmin, Xedex, FieldView
     • Xedmin: XED minimization
     • Xedex: 2D -> 3D, conformational analysis
     • FieldView: launches FieldView; view field points + energies + other data
     • All nodes pass SDF
  5. FieldAlign
     • Flexible alignment of query molecules onto template
  6. Why? Process is more than just the database search. Company Confidential. Copyright © 2008 Eli Lilly and Company
  7. Don’t want to load all databases onto all users’ PCs
     [Diagram labels: node; SOAP Web Service (Apache Tomcat); command-line search; + secure intranet!]
     Platform-independent communication
  8. [Figure: non-proprietary structure]
     • Read in pre-built hypothesis (MOE, Phase)
     • Or sketch from template molecule
     • Jmol-based visualizer
     • Can also annotate and filter hits, aids manual inspection
  9. How well do automated pharmacophore methods do compared to 2D methods?
     Maximum Unbiased Validation (MUV) dataset
     • 17 targets, total 30 ligands and 15000 decoys per target, source: PubChem bioactivity data.
     • Wide-ranging targets: hormone receptors, kinases, proteases, GPCRs plus others (e.g. HSP90, HIV RT).
     • Unbiased for chemical analogues as MUV ligands pre-clustered with 2D fingerprint
       • 1.16 compounds per scaffold class
     MUV: J. Chem. Inf. Model., 2009, 49 (2), 169-184.
  10. • Have looked at whole molecule similarity
      • Is there more data if we find fragments which maintain activity?
      • Matched Molecular Pair analysis (MMP)
      • Fragments compounds and finds pairs where only one fragment differs
  11. The mining and statistical analysis of transformations and their impact on properties of interest (e.g. solubility or activity)
      [Table columns: left molecule, right molecule, transformation, ΔSolubility (mg/ml); molecule structures were images]
      • H -> F: -0.8
      • Br -> OCH3: +1.2
      • (transformation shown only as an image): +2.4
  12. (*in an automated and unsupervised way)
      It used to be a slow and computationally expensive process...
      • Pair-wise maximum common substructure extraction – O(N²)
      Recently a much more efficient algorithm was published
      1) Cleave all acyclic single bonds, one by one
      2) Index all the fragments (cf. book index)
      3) Enumerate the values for each key
      [Figure: Mol A >> Mol B]
      Hussain and Rea (2010). J. Chem. Inf. and Model., 50 (3), 339-348.
      Wagener and Lommerse (2006). J. Chem. Inf. and Model., 46 (2), 677-685.
  13. In: MolRegnos (IDs), structures (in RDKit format) and property values
      Out: Matched pairs (left and right molecule, IDs, transformation, property values, ΔP, context, transformation atom count)
      Available as an Erl Wood community contribution node
  14. Find isosteres in chEMBL
      chEMBL
      – Database of published medicinal chemistry activity data
      – Using chEMBL_10, total >1,000,000 compounds
      Use here just human protein kinase inhibitors
      Quality assurance for chEMBL data (SQL statement)
      • Med. chem. friendly compounds, parent structure, not downgraded, confidence score = 9, exact IC50 or Ki values only (converted to pIC50/pKi) -> ~14K data points
      • Compare biological values coming from the same assay ID only
      Aggregate transformations; calculate and bin ΔpIC50s in 3 bins
      • Good – Bad – Neutral (depending on a cut-off c = 0.4 log units)
  15. • Each transformation has a neutral count
      • Absolute value or percentage: NeutralCount%
  16. chEMBL workflow outputs isosteric fragments
      How similar are isosteres in 2D fingerprint space? In field space?
      Could fields help us find unexpected isosteres?
  17. • 1802 fragment pairs from chEMBL_10 kinase data set
      • 481 with no rotatable bonds left or right
        • Simplifies conformational analysis
      • For each fragment pair
        1. Swap attachment points for adamantyl
        2. FieldAlign to get field similarity (use adamantyl to constrain overlay)
        3. RDKit fingerprint similarity – topological, Daylight-esque
        4. Correct similarities for adamantyl
      • Are there isosteric pairs with high field similarity but low RDKit similarity?
  18. [Scatter plot: Field Sim vs. RDKit Sim, point size by NeutralCount% (larger = more isosteric). Annotated regions: pairs with high field similarity but low 2D similarity; pairs with high field and 2D similarity]
  19. [Scatter plot: Field Sim vs. RDKit Sim, point size by NeutralCount%. Only those with >60% isosteric examples. Highlighted: Thiophene -> Phenol]
  20. [Scatter plot: Field Sim vs. RDKit Sim, point size by NeutralCount%. Only those with >60% isosteric examples. Highlighted: Imidazole -> Morpholine?]
  21. [Scatter plot: Field Sim vs. RDKit Sim, point size by NeutralCount%. Only those with >60% isosteric examples. Highlighted: some small fragments]
  22. [Figure: non-proprietary structure (from PDB). WEE1 kinase, PDB 2I06. Labels: buried, solvent-exposed]
  23. [Scatter plot: Field Sim vs. RDKit Sim, point size by NeutralCount%. Only those with >60% isosteric examples. Highlighted: Me-tetrazole -> oxadiazole]
  24. [Scatter plot: Field Sim vs. RDKit Sim, point size by NeutralCount%. Only those with >60% isosteric examples. Highlighted: Thiophene -> phenol]
  25. [Figure: non-proprietary structure (from PDB)]
  26. • 6299 data points from thermodynamic solubility assay
      • 423 single-point transformations
      • 215 of these with no rotatable bonds
      • Aggregate transformations; calculate and bin ΔlogS in 3 bins
        • Good – Bad – Neutral (c = 0.3 log units)
      • Are there transformations which increase solubility with low field similarity but high RDKit similarity?
  27. [Scatter plot: Field Sim vs. RDKit Sim, point size by Good Count %. Only those with >60% boosting examples. Highlighted: ring contraction + twist?]
  28. [Scatter plot: Field Sim vs. RDKit Sim, point size by Good Count %. Only those with >60% boosting examples. Highlighted: big boost from morpholine]
  29. • Can mine chEMBL data for non-obvious isosteres
        • Will other data sets find more?
      • Would like to improve workflow to make isostere data set for 3D similarity comparison
        • Improve fragmentation/conformer/alignment handling?
        • Need to include whole molecule?
        • Need 3D binding site data as well to confirm isosterism?
      • KNIME platform developing
        • Virtual screening and evaluation environment
        • Rapid experimentation with varied tools
        • http://tech.knime.org/community/erlwood
  30. George Papadatos, Juliette Pradon, Hina Patel, Nikolas Fechner, David Thorner, Michael Bodkin. KNIME, chEMBL + Cresset!
  31. ROC curves for retrieval of >66% isosteric groups
      • Field similarity performs better than RDKit
      • But AUC = 0.68
      • Workflow not optimized for this purpose
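
The matched-pair steps on slides 12 to 15 (index fragments by context, enumerate the values under each key, then bin the property change and report a neutral percentage) can be illustrated with a small sketch. This is not the Erl Wood KNIME node or the published implementation. It assumes the bond-cleaving step has already been done by a cheminformatics toolkit, and every molecule ID, fragment string and pIC50 value below is invented. The function names, the canonical ordering of each pair, and the treatment of values exactly at the cut-off are also assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows, one per single-bond cut: (mol_id, context, fragment, pIC50).
# All identifiers and values are invented for illustration.
CUTS = [
    ("A", "c1ccccc1[*]", "[*]F", 6.0),
    ("B", "c1ccccc1[*]", "[*]Cl", 6.2),
    ("C", "c1ccccc1[*]", "[*]OC", 7.1),
    ("D", "C1CCNCC1[*]", "[*]F", 5.5),
    ("E", "C1CCNCC1[*]", "[*]Cl", 5.4),
]


def matched_pairs(cuts):
    """Index fragments by their context, then enumerate pairs under each key."""
    index = defaultdict(list)
    for mol_id, context, fragment, value in cuts:
        index[context].append((mol_id, fragment, value))

    pairs = []
    for context, entries in index.items():
        for a, b in combinations(entries, 2):
            if a[0] == b[0] or a[1] == b[1]:
                continue  # same molecule or same fragment: not a transformation
            # Assumed convention: order each pair by fragment string so that
            # X>>Y and Y>>X are aggregated under a single transformation key.
            left, right = sorted((a, b), key=lambda e: e[1])
            pairs.append({
                "left": left[0],
                "right": right[0],
                "context": context,
                "transformation": f"{left[1]}>>{right[1]}",
                "delta": right[2] - left[2],
            })
    return pairs


def bin_delta(delta, cutoff=0.4):
    """Bin a property change as good, bad or neutral around +/- cutoff."""
    if delta > cutoff:
        return "good"
    if delta < -cutoff:
        return "bad"
    return "neutral"


def summarize(pairs, cutoff=0.4):
    """Count bins per transformation and add the neutral percentage."""
    counts = defaultdict(lambda: {"good": 0, "bad": 0, "neutral": 0})
    for pair in pairs:
        counts[pair["transformation"]][bin_delta(pair["delta"], cutoff)] += 1
    summary = {}
    for transformation, c in counts.items():
        total = sum(c.values())
        summary[transformation] = dict(c, neutral_pct=100.0 * c["neutral"] / total)
    return summary


if __name__ == "__main__":
    for transformation, row in summarize(matched_pairs(CUTS)).items():
        print(transformation, row)
```

Because pairs are only enumerated within one index key, the cost depends on how many molecules share a context rather than on all N² molecule pairs, which is the point of the book-index analogy on slide 12.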
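
For readers unfamiliar with the AUC figure on slide 31, the sketch below computes a ROC AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. It is a generic illustration, not the workflow used in the talk. The scores and labels are invented, and labelling a pair as positive when its NeutralCount% is high is only one reading of "isosteric groups".

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Each positive/negative pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented similarity scores for six fragment pairs; 1 marks a pair treated
# as isosteric, 0 a pair treated as non-isosteric (hypothetical labels).
field_sim = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
is_isosteric = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(round(roc_auc(field_sim, is_isosteric), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the 0.68 quoted on the slide indicates modest enrichment.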
