
Slas talk 2016


Ensuring Chemical Structure, Biological Data and Computational Model Quality
A talk given at SLAS 2016 on Monday, Jan 25th in San Diego.
Covers published work and recent forays with BIA 10-2474.

Published in: Science

  1. 1. Sean Ekins
     1 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
     2 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
     3 Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
     4 Phoenix Nest, Inc., P.O. Box 150057, Brooklyn, NY 11215, USA.
     5 Hereditary Neuropathy Foundation, 401 Park Avenue South, 10th Floor, New York, NY 10016, USA.
     Email: ekinssean@yahoo.com Twitter: collabchem
  2. 2. • Database Quality
     • Molecule structure availability
     • Dispensing Error
     • Simulating Error
     • NIH Probe Quality
     • BIA 10-2474
     • "Well, here's another nice mess you've gotten me into!"
  3. 3. • What do we trust?
  4. 4. • 1 billion molecules, but how many are real?
  5. 5. Med. Chem. Commun., 2010, 1, 325-330
     • Used various filters (Pfizer, Glaxo, Abbott – implemented by University of New Mexico) with antimalarial datasets
     • Found large percentages of libraries were failing filters
     • Some filters more stringent than others (Alarm vs Glaxo)
     • Proposed wider use of such filters
     • PAINS also appeared in 2010
  6. 6. NPC Browser http://tripod.nih.gov/npc/
     • Database released, and within days hundreds of errors were found in structures
     • DDT, 16: 747-750 (2011)
     • Science Translational Medicine 2011
     • DDT, 17: 685-701 (2012)
  7. 7. DDT, 18: 58-70 (2013)
     • NCATS and MRC made molecule identifiers from several pharmas available without structures
     • Continues today
     • Limits computational repurposing efforts and transparency
  8. 8. It’s Not Just Structure Quality We Need to Worry About
     • This editorial led to collaboration: http://goo.gl/dIqhU
  9. 9. Plastic Leaching
     • McDonald et al., Science 2008, 322, 917.
     • Belaiche et al., Clin Chem 2009, 55, 1883-1884
     • Images courtesy of Bing, Tecan
  10. 10. Using Literature Data From Different Dispensing Methods to Generate Computational Models
      • Few molecule structures and corresponding datasets are public
      • Using data from 2 AstraZeneca patents: tyrosine kinase EphB4 pharmacophores (Accelrys Discovery Studio) were developed using data for 14 compounds
      • IC50 determined using different dispensing methods
      • Analyzed correlation with simple descriptors (SAS JMP)
      • Calculated LogP correlation with log IC50 data for acoustic dispensing (r2 = 0.34, p < 0.05, N = 14)
      • Barlaam, B. C.; Ducray, R., WO 2009/010794 A1, 2009
      • Barlaam, B. C.; Ducray, R.; Kettle, J. G., US 7,718,653 B2, 2010
  11. 11. 14 Compounds With Structures and IC50 Data

      | Compound # | IC50 Acoustic (µM) | IC50 Tips (µM) | Ratio IC50 Tip / IC50 ADE |
      |---|---|---|---|
      | 5 | 0.002 | 0.553 | 276.5 |
      | 4 | 0.003 | 0.146 | 48.7 |
      | 7 | 0.003 | 0.778 | 259.3 |
      | W7b | 0.004 | 0.152 | 42.5 |
      | 8 | 0.004 | 0.445 | 111.3 |
      | W5 | 0.006 | 0.087 | 13.7 |
      | 6 | 0.007 | 0.973 | 139.0 |
      | W3 | 0.012 | 0.049 | 4.2 |
      | W1 | 0.014 | 0.112 | 8.2 |
      | 9 | 0.052 | 0.170 | 3.3 |
      | 10 | 0.064 | 0.817 | 12.8 |
      | W12 | 0.158 | 0.250 | 1.6 |
      | W11 | 0.207 | 14.400 | 69.6 |
      | 11 | 0.486 | 3.030 | 6.2 |

      Barlaam, B. C.; Ducray, R., WO 2009/010794 A1, 2009
      Barlaam, B. C.; Ducray, R.; Kettle, J. G., US 7,718,653 B2, 2010
  12. 12. [Figure: scatter plot of log IC50-tips against log IC50-acoustic] log IC50 values for tip-based serial dilution and dispensing versus acoustic dispensing with direct dilution show a poor correlation (R2 = 0.246); the acoustic technique always gave more potent IC50 values. PLoS ONE 8(5): e62325 (2013)
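As a rough check on the correlation quoted for this plot, the sketch below recomputes R2 between the log IC50 values from the two dispensing methods, using the 14 acoustic and tip-based IC50 pairs as transcribed from the compound slide. It is a minimal illustration, not the authors' analysis (the slides mention SAS JMP for the descriptor correlations); the names `IC50_PAIRS` and `pearson_r_squared` are illustrative, and because the transcribed values are rounded the result may differ slightly from the reported 0.246.

```python
import math

# (acoustic IC50, tip-based IC50) in µM for the 14 compounds, as transcribed
# from the slide; the values are rounded, so any result is approximate.
IC50_PAIRS = {
    "5": (0.002, 0.553), "4": (0.003, 0.146), "7": (0.003, 0.778),
    "W7b": (0.004, 0.152), "8": (0.004, 0.445), "W5": (0.006, 0.087),
    "6": (0.007, 0.973), "W3": (0.012, 0.049), "W1": (0.014, 0.112),
    "9": (0.052, 0.170), "10": (0.064, 0.817), "W12": (0.158, 0.250),
    "W11": (0.207, 14.400), "11": (0.486, 3.030),
}


def pearson_r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


log_acoustic = [math.log10(a) for a, _ in IC50_PAIRS.values()]
log_tips = [math.log10(t) for _, t in IC50_PAIRS.values()]
r_squared = pearson_r_squared(log_acoustic, log_tips)

# True when acoustic dispensing gave the lower (more potent) IC50 every time.
acoustic_always_more_potent = all(a < t for a, t in IC50_PAIRS.values())

if __name__ == "__main__":
    print(f"R^2 (log tips vs log acoustic) = {r_squared:.3f}")
    print(f"acoustic always more potent: {acoustic_always_more_potent}")
```

Working in log10 units matches the axes of the plot; correlating the raw µM values instead would be dominated by the few weakest compounds.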
  13. 13. Tyrosine Kinase EphB4 Pharmacophores Generated with Discovery Studio (Accelrys)

      | Process | Hydrophobic features (HPF) | Hydrogen bond acceptor (HBA) | Hydrogen bond donor (HBD) | Observed vs. predicted IC50 r |
      |---|---|---|---|---|
      | Acoustic mediated process | 2 | 1 | 1 | 0.92 |
      | Tip-based process | 0 | 2 | 1 | 0.80 |

      [Figure: acoustic and tip-based pharmacophores] Cyan = hydrophobic, green = hydrogen bond acceptor, purple = hydrogen bond donor. Each model shows most potent molecule mapping. PLoS ONE 8(5): e62325 (2013)
  14. 14. Test Set Evaluation of Pharmacophores
      • An additional 12 compounds from AstraZeneca (Barlaam, B. C.; Ducray, R., WO 2008/132505 A1, 2008)
      • 10 of these compounds had data for tip-based dispensing and 2 for acoustic dispensing
      • Calculated LogP and logD showed low but statistically significant correlations with tip-based dispensing (r2 = 0.39, p < 0.05 and r2 = 0.24, p < 0.05; N = 36)
      • Used as a test set for pharmacophores
      • The two compounds analyzed with acoustic liquid handling were predicted in the top 3 using the ‘acoustic’ pharmacophore
      • The ‘tip-based’ pharmacophore failed to rank the retrieved compounds correctly
      • PLoS ONE 8(5): e62325 (2013)
  15. 15. Pharmacophores for the tyrosine kinase EphB4 generated from crystal structures in the Protein Data Bank (PDB) using the Discovery Studio version 3.5.5 Automated Receptor-Ligand Pharmacophore Generation method
      • Cyan = hydrophobic, green = hydrogen bond acceptor, purple = hydrogen bond donor, grey = excluded volumes
      • Each model shows most potent molecule mapping
      • Bioorg Med Chem Lett 2010, 20, 6242-6245.
      • Bioorg Med Chem Lett 2008, 18, 5717-5721.
      • Bioorg Med Chem Lett 2008, 18, 2776-2780.
      • Bioorg Med Chem Lett 2011, 21, 2207-2211.
      • PLoS ONE 8(5): e62325 (2013)
  16. 16. Dispensing Issues Summary
      • In the absence of structural data, pharmacophores and other computational and statistical models are used to guide medicinal chemistry in early drug discovery.
      • Our findings suggest acoustic dispensing methods could improve HTS results and avoid the development of misleading computational models and statistical relationships.
      • Automated pharmacophores are closer to the pharmacophore generated with acoustic data: all have hydrophobic features, which are missing from the tip-based pharmacophore model.
      • Importance of hydrophobicity is seen with the logP correlation and crystal structure interactions.
      • Public databases should annotate this meta-data alongside biological data points, to create larger datasets for comparing different computational methods.
      • PLoS ONE 8(5): e62325 (2013)
  17. 17. • Simple computational replica of experiment
      • Simulate experiments
      • Understand error
      • Just need assay protocol, data on imprecision and inaccuracy
      • Can be used before an assay is ever performed
      • IPython notebook available
      • Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
  18. 18. Simulate Error and bias in dispensing Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
  19. 19. Can account for some but not all error Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
  20. 20. The number of wells for dilution series can impact error
      • Try simulation for yourself: https://goo.gl/Rku8c5
      • Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
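The idea of a simple computational replica of a dispensing experiment can be sketched as a small Monte Carlo simulation of a serial dilution, in which every pipetting step carries some imprecision (random scatter) and inaccuracy (systematic bias). This is not the authors' notebook or their bootstrap procedure; the function names, the Gaussian noise model, and every volume, CV and bias value below are hypothetical choices made only for illustration.

```python
import random
import statistics


def simulate_dilution_series(c0, n_wells, transfer_ul, diluent_ul,
                             cv, transfer_bias, rng):
    """Simulate one serial dilution; return the concentration in each well."""
    concentrations = [c0]
    for _ in range(n_wells - 1):
        # Inaccuracy shifts the mean transferred volume; imprecision adds
        # random scatter (modelled here as Gaussian noise, an assumption).
        v_transfer = rng.gauss(transfer_ul * (1.0 + transfer_bias),
                               transfer_ul * cv)
        v_diluent = rng.gauss(diluent_ul, diluent_ul * cv)
        dilution = v_transfer / (v_transfer + v_diluent)
        concentrations.append(concentrations[-1] * dilution)
    return concentrations


def final_well_cv(n_wells, n_replicates=2000, seed=0):
    """Relative spread (CV) of the last well's concentration over replicates."""
    rng = random.Random(seed)
    finals = [
        # Hypothetical protocol: 10 µM stock, 50 µL into 50 µL, 5% CV, no bias.
        simulate_dilution_series(10.0, n_wells, 50.0, 50.0, 0.05, 0.0, rng)[-1]
        for _ in range(n_replicates)
    ]
    return statistics.stdev(finals) / statistics.mean(finals)


if __name__ == "__main__":
    for wells in (4, 8, 12):
        print(f"{wells} wells: CV of final concentration = {final_well_cv(wells):.3f}")
```

Because each step multiplies the previous concentration by a noisy dilution factor, the spread in the final well grows with the length of the series, which is one way the number of wells can affect error.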
  21. 21. • NIH spent a decade funding HTS efforts as part of the MLSCN and MLPCN
      • By 2010, $576.6M in funding
      • Various definitions of a probe
      • Potency, selectivity, solubility and availability
      • Little has been done to learn from this work
      • J Chem Inf Model. (2014) 10:2996-3004
  22. 22. • But do we really need a crowd?
      • Could 1 medicinal chemist be enough?
      • Chris Lipinski (> 40 years' experience) scored the original 64 compounds – he was close to the median
      • Found more probes since 2009
      • Now scored more than 300 NIH probes for desirability
      • Extensive due diligence: based on literature (public/private), chemical reactivity
      • J Chem Inf Model. (2014) 10:2996-3004
      • J Med Chem. (2015) 5:2068-76
  23. 23. • 79% of 322 probes are desirable
      • J Chem Inf Model. (2014) 10:2996-3004
  24. 24. • Properties from CDD
      • Properties from Discovery Studio
      • Higher molecular weight, rotatable bond count and heavy atom count are associated with desirable probes
      • J Chem Inf Model. (2014) 10:2996-3004
  25. 25. • Since the rule of 5 there has been a considerable focus on more rules – ALERTS, PAINS, QED, BadApple etc.
      • Desirable probes are less likely to be filtered by PAINS or BadApple as promiscuous than those scored as undesirable (Fisher's exact test, p < 0.0001 for PAINS and p = 0.04 for BadApple).
      • J Chem Inf Model. (2014) 10:2996-3004
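For readers unfamiliar with Fisher's exact test, the sketch below shows how a two-sided p-value can be computed for a 2x2 table of desirable versus undesirable probes against flagged versus not flagged by a filter. The counts in the usage section are invented purely for illustration and are not the study's data; the function name is also illustrative, and the two-sided convention used here (summing all tables no more probable than the observed one) is one common choice.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1.0 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts: rows = desirable / undesirable probes,
    # columns = flagged / not flagged by a substructure filter.
    p_value = fisher_exact_two_sided(10, 240, 15, 55)
    print(f"two-sided p = {p_value:.2e}")
```

An exact test is a natural fit when some cells of the table are small, as can happen when only a handful of probes trip a given filter.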
  26. 26. • FCFP_6 descriptors + 8 simple descriptors
      • Leave-out-50% x 100 validation of Bayesian models
      • 5-fold cross-validation for N = 307 models
      • External test sets
      • J Chem Inf Model. (2014) 10:2996-3004
  27. 27. • The colors on the heat map correspond to the value of the indicated metric for each probe, listed vertically.
      • The scale was normalized internally, with green corresponding to the optimal condition within each metric.
      • Data are in CDD Public and can be used; 3-fold cross-validation ROC = 0.69
      • J Chem Inf Model. (2014) 10:2996-3004
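The ROC value quoted for cross-validation is the area under the ROC curve, which can be read as the chance that a randomly chosen desirable probe is scored above a randomly chosen undesirable one. The sketch below computes it by direct pairwise comparison; it is a generic illustration with made-up labels and scores, not the Bayesian model or the CDD implementation, and `roc_auc` is an illustrative name.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are 1 (e.g. desirable) or 0 (undesirable); tied scores count half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical held-out fold: expert labels and model scores.
    fold_labels = [1, 0, 1, 0]
    fold_scores = [0.9, 0.8, 0.3, 0.1]
    print(f"ROC AUC = {roc_auc(fold_labels, fold_scores):.2f}")
```

In k-fold cross-validation this would be evaluated on each held-out fold (or on the pooled held-out predictions); a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.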
  28. 28. Making the data more accessible as we are drowning in molecules: http://goo.gl/PVkQeo
      • Ligand efficiency higher in undesirable compounds
      • Bayesian model preferable in classifying desirable compounds vs other molecule quality metrics
      • Model could improve probe selection and score libraries prior to more extensive due diligence
      • Probes could be scored by additional chemists depending on needs, e.g. bias to CNS, anticancer
      • J Chem Inf Model. (2014) 10:2996-3004
  29. 29. • Complexities in finding the NIH MLP probes in PubChem
      • Identifier and structure searches in CAS SciFinder™ reveal an extreme disclosure
      • The parallel worlds of commercial and public database disclosure do not completely intersect
      • Integration and intersections of databases and the need for bioassay ontology adoption
      • [Figure labels: Public, Commercial]
      • J Med Chem. (2015) 5:2068-76
  30. 30. • Nobody confirmed the molecule name / structure used in the trial in the first few days
      • Predictions with Polypharma, Bayesian models and SEA (Shoichet lab)
      • Suggested promiscuity, beyond the target of FAAH
  31. 31. Raises questions on:
      • Openness, transparency
      • Use of software for predictions
      • Quality and utility of predictive tools
      • But without information on structure it's impossible
  32. 32. www.collabchem.com http://cheminf20.org/ http://cdsouthan.blogspot.com/
  33. 33. • Need more collaboration or openness in terms of availability of chemistry and biology data. Role of publishers?
      • Increased communication between the various databases, both public and proprietary
      • Companies need to be more transparent: structure/ID deposition for Phase I clinical trial data globally
      • Could lead to more opportunity for discovery / repurposing
      • Chance to profile compounds with computational tools and flag possible issues
      • Role of ‘armchair science’ and the crowd in raising issues is valid
  34. 34. • Alex M. Clark
      • Antony J. Williams
      • Christopher Southan
      • John Chodera and Sonya Hanson
      • Nadia Litterman
      • Joe Olechno
      • Christopher A. Lipinski
      • Barry A. Bunin
      • Jeremy Yang for the link to BadApple
      • Biovia for providing Discovery Studio
      • NIH NCATS 9R44TR000942-02 “Biocomputation across distributed private datasets to enhance drug discovery”
  35. 35. • Modeling error in experimental assays using the bootstrap principle: understanding discrepancies between assays using different dispensing technologies. Hanson SM, Ekins S, Chodera JD. J Comput Aided Mol Des. 2015 Dec;29(12):1073-86.
      • Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. Clark AM, Ekins S. J Chem Inf Model. 2015 Jun 22;55(6):1246-60.
      • Parallel worlds of public and commercial bioactive chemistry data. Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. J Med Chem. 2015 Mar 12;58(5):2068-76.
      • Computational prediction and validation of an expert's evaluation of chemical probes. Litterman NK, Lipinski CA, Bunin BA, Ekins S. J Chem Inf Model. 2014 Oct 27;54(10):2996-3004.
      • Dispensing processes impact apparent biological activity as determined by computational and statistical analyses. Ekins S, Olechno J, Williams AJ. PLoS One. 2013 May 1;8(5):e62325.
  36. 36. • https://github.com/choderalab/cadd-grc-2013
      • https://github.com/choderalab/cadd-grc-2013/blob/master/slides/2013-07-21%20CADD%20GRC%20-%20Experimental%20Terror%20-%207%20interleaved.pdf
      • https://github.com/choderalab/dispensing-errors-manuscript/blob/master/notebooks/echo-vs-tips.ipynb
