Promiscuous patterns and perils in PubChem and the MLSCN


Published in: Education, Technology

    1. 1. Promiscuous patterns and perils in PubChem and the MLSCN Jeremy J Yang Cristian Bologa Tudor Oprea Division of Biocomputing Dept of Biochem. & Mol. Biology NM Mol. Libraries Screening Center University of New Mexico
    2. 2. Goals <ul><li>Improve HTS success rate by pre-filtering or retro-filtering </li></ul><ul><li>Detection of promiscuous compounds </li></ul><ul><li>Effectively data-mine PubChem, developing tools to ask related questions </li></ul><ul><li>Generalizable knowledge </li></ul>
    3. 3. Background <ul><li>HTS is a game of chance </li></ul><ul><li>Druglike, ADMET, Ro5, leadlike, probelike </li></ul><ul><li>Smaller still: fragment-based screening </li></ul><ul><li>Ligand efficiency </li></ul><ul><li>Compound fitness relative to HTS assay method </li></ul><ul><li>False positives </li></ul><ul><li>True but useless positives </li></ul><ul><li>Aggregators </li></ul><ul><li>Reactives </li></ul><ul><li>Studying the signal vs studying the noise </li></ul><ul><li>NIH Roadmap, MLI, MLSCN (MLPCN), MLSMR </li></ul>[Figure labels: 10^6, 10^0]
    4. 4. Promiscuity defined Bioactivity for multiple targets, i.e. “frequent-hitter”, non-selective binder. <ul><li>Known types </li></ul><ul><li>aggregators </li></ul><ul><li>reactives </li></ul><ul><li>true binders </li></ul>Scaffold promiscuity [working definition]: multi-target bioactivity for the scaffold involved. Scaffold may be a determinant or simply an informatic device.
    5. 5. Real vs phony promiscuity <ul><li>Apparent promiscuity may be due to: </li></ul><ul><ul><li>Actual promiscuous binding </li></ul></ul><ul><ul><li>Artifactual promiscuity </li></ul></ul><ul><ul><ul><li>Reactivity </li></ul></ul></ul><ul><ul><ul><li>Aggregation </li></ul></ul></ul><ul><ul><ul><li>Fluorescence </li></ul></ul></ul><ul><ul><li>Experimental errors </li></ul></ul><ul><li>For a given assay method, actual and artifactual promiscuity are equivalent for most HTS intents and purposes </li></ul>
    6. 6. PubChem: cathedral or bazaar? <ul><li>Mission and success of PubChem </li></ul><ul><ul><li>38 million compounds (March 2008) </li></ul></ul><ul><ul><li>High volume worldwide use by scientific community </li></ul></ul><ul><li>Mission and success of NIH MLI and MLSCN </li></ul><ul><ul><li>10 centers, 3 years </li></ul></ul><ul><ul><li>~100? probes, ~1000 assays, ~30M data </li></ul></ul>Figure: (a) National Cathedral, Washington, DC; (b) Santa Fe Flea Market, Santa Fe, NM. Ref: “The Cathedral and the Bazaar”, Eric Raymond, 1997.
    7. 7. PubChem and MLSCN* <ul><li>Publicly available bioactivity data on this scale is an unprecedented accomplishment and opportunity. </li></ul><ul><li>With rapid growth, data quality is an important concern. </li></ul><ul><li>Overall goals are broader than individual HTS campaigns </li></ul><ul><li>Big idea: MLSCN+PubChem, reaching critical mass? </li></ul>*MLSCN = Molecular Libraries Screening Center Network, to be MLPCN, Molecular Libraries Program Center Network
    8. 8. PubChem and MLSMR <ul><li>Molecular Libraries Small Molecule Repository </li></ul><ul><li>Managed by BioFocusDPI/Galapagos </li></ul><ul><li>~ 300k compounds (March 2008) and growing </li></ul><ul><li>Used for primary HTS by all MLI centers </li></ul>Plot c/o Victor Panchenco, BioFocusDPI
    9. 9. MLSMR and MLSCN actives <ul><li>MLSMR actives: 94,148 </li></ul><ul><li>MLSCN actives: 104,078 </li></ul><ul><li>MLSMR AND MLSCN: 93,616 </li></ul><ul><li>MLSMR OR MLSCN: 104,610 </li></ul><ul><li>MLSMR AND ^MLSCN: 532 </li></ul><ul><li>^MLSMR AND MLSCN: 10,462 </li></ul>(Some MLSCN compounds, especially in secondary assays, come from other sources such as commercial vendors.) *June 2008
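The six counts on this slide are related by simple set arithmetic, so three of them determine the rest. A minimal sketch of that consistency check follows; the variable names are illustrative only, and the numbers are the ones quoted on the slide.

```python
# Counts transcribed from the slide (June 2008).
mlsmr = 94_148   # MLSMR actives
mlscn = 104_078  # MLSCN actives
both = 93_616    # MLSMR AND MLSCN

# Inclusion-exclusion gives the union; subtraction gives the two differences.
union = mlsmr + mlscn - both   # MLSMR OR MLSCN
mlsmr_only = mlsmr - both      # MLSMR AND ^MLSCN
mlscn_only = mlscn - both      # ^MLSMR AND MLSCN

print(union, mlsmr_only, mlscn_only)  # 104610 532 10462
```

The derived values agree with the slide's 104,610, 532 and 10,462.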
    10. 10. Selected published pre-filtering expert knowledge* <ul><li>Rishton </li></ul><ul><ul><li>Reactive compounds and in vitro false positives in HTS, Drug Discov. Today, 2, 382-384. </li></ul></ul><ul><li>Hann </li></ul><ul><ul><li>Strategic pooling of compounds for high-throughput screening, Mike Hann et al., J. Chem. Inf. Comp. Sci., 1999, 39, 897-902. </li></ul></ul><ul><li>Rishton </li></ul><ul><ul><li>Nonleadlikeness and leadlikeness in biochemical screening, G. M. Rishton, Drug Disc. Today, 8, 2003, 86-96. </li></ul></ul><ul><li>Seidler </li></ul><ul><ul><li>Identification and Prediction of Promiscuous Aggregating Inhibitors among Known Drugs, J. Seidler et al., J. Med Chem, 2003, 46, 4477-4486. </li></ul></ul>*generalizable domain knowledge
    11. 11. Selected pre-filtering semi-public expertise <ul><li>Blake </li></ul><ul><ul><li>James Blake, Sybyl script lint_sln.spl, formerly bundled with Sybyl, 2001(?). </li></ul></ul><ul><li>Commercial vendors </li></ul><ul><ul><li>Property predictions (LogP, ADME-Tox, solubility) </li></ul></ul><ul><ul><li>Filtering tools and APIs </li></ul></ul><ul><ul><li>Some defined patterns </li></ul></ul><ul><li>“Development of a Virtual Screening Method for Identification of 'Frequent Hitters' in Compound Libraries”, Roche et al., J. Med. Chem., 45, 2002, 137-142. [neural network, 345 descriptors] </li></ul>
    12. 12. MLSMR filtering protocols <ul><li>MLSMR has a set of SMARTS-based filters, “excluded functionality filters”, used to reject compounds unfit for HTS </li></ul><ul><li>Filters developed in collaboration with MLSCN Chemistry working group </li></ul><ul><li>WG chose to be less restrictive than recommended by BioFocusDPI. </li></ul><ul><li>Optimum filtering is not a solved problem </li></ul><ul><li>One reason: fitness is assay-dependent </li></ul>
    13. 13. MLSMR re-filtered... <ul><li>Using a combined reactive filter </li></ul><ul><ul><li>49k of 286k rejected (17%) </li></ul></ul><ul><ul><li>Top reactive patterns: </li></ul></ul>*search failed via PC GUI
    14. 14. MLSMR re-filtered... Example rejects:
    15. 15. Pre-filtering at UNM Java servlet using ChemAxon/JChem
    16. 16. Activity multiplicity – all assays [Plot: compounds active in any assay; annotation: "peril?"] 103894 MLSCN compounds, 724 MLSCN assays
    17. 17. Activity multiplicity – screening assays [Plot: compounds active in any screening assay; annotation: "peril"] 89954 MLSCN compounds, 268 MLSCN assays
    18. 18. Top 100 hier-scaffolds*, active PubChem MLSMR Example scaffold #1 *Wilkens et al., J. Med. Chem. 2005
    19. 19. Top active hier-scaffolds: example #1 Scaffold: 33rd most common, 510 active compounds, 34 assays. c1c2c([nH]c(=O)cn2)ncn1 Top 12 of 510 compounds: CID, #assays active
    20. 20. Top active hier-scaffolds: example #1 #compounds vs #assays in which they are active
    21. 21. Top active hier-scaffolds: example #1 All from NCGC
    22. 22. Other promiscuous scaffolds COMPOUNDS:total/tested/active ASSAYS:tested/active SAMPLES:tested/active
    23. 23. Other promiscuous scaffolds COMPOUNDS:total/tested/active ASSAYS:tested/active SAMPLES:tested/active
    24. 24. Other promiscuous scaffolds COMPOUNDS:total/tested/active ASSAYS:tested/active SAMPLES:tested/active
    25. 25. Activity mining with PubChem GUI Lots of great functionality, but not everything...
    26. 26. Activity mining with command line Automation is good...
    27. 27. Top active hier-scaffolds: example #2 Scaffold: 62nd most common, 523 active compounds, 208 assays. c1ccc(cc1)NC(=O)c2ccco2 Top 12 of 523 compounds: CID, #assays active
    28. 28. Top active hier-scaffolds: example #2 #compounds vs #assays in which they are active
    29. 29. Top active hier-scaffolds: example #2 #compounds vs #assays in which they are active
    30. 30. Top active hier-scaffolds: example #3 Example scaffold #4: 1307th most common, 27 active compounds, 140 assays. c1nc-2c(=O)[nH]c(=O)nc2[nH]n1 toxoflavin
    31. 31. Top active hier-scaffolds: example #3 #compounds vs #assays in which they are active
    32. 32. Digression: PubChem bug... Substructure search for CID 66541: 143 hits. Substructure search for C2(C1=NC=NNC1=NC(N2)=O)=O: 143 hits. Substructure search for c1nc-2c(=O)[nH]c(=O)nc2[nH]n1: 0 hits. Ergo: aromatic structural queries not allowed?
    33. 33. Top active hier-scaffolds: example #3
    34. 34. More example scaffolds and histograms of #compounds vs. #assays in which compounds are active
    35. 35. Scaffold promiscuity vs. SEA* <ul><li>For a given scaffold, what are the active molecules and the bioassays in which they are active? </li></ul>SEA (Similarity Ensemble Approach): For given query molecule, are there bioactive similars and for what targets? *Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotech 25 (2), 197-206 (2007).
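The scaffold-centric question above (which molecules with a given scaffold are active, and in which bioassays) can be sketched as a simple grouping step. This is not the authors' script. The record layout `(scaffold, cid, aid, active)`, the function name and the toy rows are all hypothetical. The output mirrors the COMPOUNDS tested/active and ASSAYS tested/active columns shown on the "Other promiscuous scaffolds" slides.

```python
from collections import defaultdict

def scaffold_summary(records):
    """Count tested/active compounds and assays per scaffold.

    records: iterable of (scaffold, cid, aid, active) tuples; this layout
    is an assumption made for illustration, not a PubChem format.
    """
    summary = defaultdict(lambda: {
        "tested_cids": set(), "active_cids": set(),
        "tested_aids": set(), "active_aids": set(),
    })
    for scaffold, cid, aid, active in records:
        entry = summary[scaffold]
        entry["tested_cids"].add(cid)
        entry["tested_aids"].add(aid)
        if active:
            entry["active_cids"].add(cid)
            entry["active_aids"].add(aid)
    # Reduce each set to its size for reporting.
    return {scaf: {name: len(ids) for name, ids in entry.items()}
            for scaf, entry in summary.items()}

# Made-up rows; scaffold labels, CIDs and AIDs are placeholders.
toy = [
    ("scafA", 1, 101, True),
    ("scafA", 1, 102, True),
    ("scafA", 2, 101, False),
    ("scafB", 3, 103, True),
]
result = scaffold_summary(toy)
print(result["scafA"])
```

Using sets rather than counters means repeated samples of the same compound in the same assay are counted once.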
    36. 36. Possibilities <ul><li>Activity in multiple assays is expected when assays are related (screening/confirmatory, etc.) </li></ul><ul><li>If compounds are not promiscuous, one possibility is that assays are redundant (also interesting) </li></ul><ul><li>Need rule of thumb? Active in >5 HTS screens deserves a red flag? </li></ul><ul><li>Need assay classification/canonicalization – i.e. rigorous informatics? </li></ul><ul><li>Scaffold “popularity” generally desirable. </li></ul>
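The proposed rule of thumb (active in more than 5 HTS screens earns a red flag) is easy to express in code. The sketch below assumes a hypothetical list of `(cid, aid)` pairs for active outcomes. It does not group related assays (screening plus confirmatory), which the slide notes would be needed before trusting the counts.

```python
from collections import Counter

def flag_frequent_hitters(active_pairs, threshold=5):
    """Return {cid: n_assays} for compounds active in more than `threshold` assays.

    active_pairs: iterable of (cid, aid) pairs, one per active outcome
    (hypothetical input layout). Duplicate pairs are counted once.
    """
    counts = Counter(cid for cid, _aid in set(active_pairs))
    return {cid: n for cid, n in counts.items() if n > threshold}

# Made-up data: compound 1 is active in 6 assays, compound 2 in 5.
pairs = [(1, aid) for aid in range(1, 7)] + [(2, aid) for aid in range(1, 6)]
flagged = flag_frequent_hitters(pairs)
print(flagged)  # {1: 6}
```

The comparison is strict, so a compound active in exactly 5 assays is not flagged.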
    37. 37. Possibilities <ul><li>The global bioactivity landscape (nature) calls for a comprehensive view of bioactivity, chemical space, promiscuity, privileged structures and other patterns. In other words, chem+bio informatics.* </li></ul>*"Is There a General Model for Bioactivity?", T.I. Oprea, O. Ursu, C.G. Bologa, and L.A. Sklar, The 8th International Conference on Chemical Structures, June, 2008, Noordwijkerhout, The Netherlands (
    38. 38. Data mining methodology notes <ul><li>Get all active MLSCN/MLSMR compounds (Entrez) </li></ul><ul><li>HierScaf perception w/ all MLSCN compounds (OE) </li></ul><ul><li>Extract compounds for selected scaffolds (OE) </li></ul><ul><li>Download compounds (PUG) </li></ul><ul><li>Find all bioassays in which cpds are active, using local PubChem bioassay ftp-mirror </li></ul><ul><li>Get bioassay summaries (Entrez) </li></ul><ul><li>Try our scripts: </li></ul>PUG = NCBI PubChem Power User Gateway (http). Entrez = NCBI Entrez eUtils API. OE = OpenEye OEChem. All code Perl or Python. *June 2008
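As an illustration of the "Get bioassay summaries (Entrez)" step, the sketch below only builds an E-utilities ESummary request URL for a list of PubChem BioAssay IDs and makes no network call. It is not the authors' script. The base URL is the current public E-utilities address, which may differ from what was used in 2008, and the helper name and example AIDs are placeholders.

```python
from urllib.parse import urlencode

# Current public E-utilities base URL (assumption; the 2008 address may have differed).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def assay_summary_url(aids):
    """Build an ESummary URL for PubChem BioAssay records (Entrez db 'pcassay')."""
    params = {"db": "pcassay", "id": ",".join(str(aid) for aid in aids)}
    # Keep commas literal so the ID list stays readable in the URL.
    return f"{EUTILS}/esummary.fcgi?{urlencode(params, safe=',')}"

url = assay_summary_url([1, 2])  # placeholder AIDs
print(url)
```

Fetching and parsing the response is left out on purpose. A real script would also need to respect NCBI's usage guidelines on request rate.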
    39. 39. Now for some general comments... [Figure labels: 10^6, 10^0]
    40. 40. Probability – in HTS game etc. In other words: Quantity vs. Quality. $E_{\text{not striking out}} = 1 - E_{\text{striking out}}$; $E_{\text{success}} = 1 - (1 - E_{\text{hit}})^{N}$, where $E_{\text{hit}}$ = probability of hit per try and $N$ = number of tries.
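The success formula on this slide is easy to explore numerically. The sketch below evaluates it for a few hit rates and library sizes chosen purely for illustration (they are not data from the talk), and it assumes the tries are independent, as the formula does.

```python
def p_success(p_hit, n_tries):
    """Probability of at least one hit in n_tries independent tries."""
    return 1.0 - (1.0 - p_hit) ** n_tries

# Illustrative values only: per-try hit probability vs. number of tries.
for p_hit in (1e-3, 1e-4):
    for n_tries in (1_000, 100_000):
        print(f"p_hit={p_hit:g}  N={n_tries:>7}  P(success)={p_success(p_hit, n_tries):.3f}")
```

Raising either the per-try hit probability (quality) or the number of tries (quantity) increases the chance of at least one hit, which is the trade-off the slide title points at.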
    41. 41. Probability, more hard lessons De novo molecular design case study: Grok and Grope*, 1992, Weininger, Blaney, Dixon. GA -> virtual library, docking fitness, lots of cpu cycles. But you need to recognize a good hit, including all aspects of fitness (ADMET, synthesis, etc.). [<- approximation from memory] <ul><ul><li>conclusions: </li></ul></ul><ul><ul><li>medchem knowledge is important </li></ul></ul><ul><ul><li>chemical space is huge </li></ul></ul>*"Method and Apparatus for Designing Molecules with Desired Properties by Evolving Successive Populations," David Weininger, U.S. patent US5434796, 1995.
    42. 42. Probability, more quantity vs quality “Quantitative high-throughput screening: A titration-based approach that efficiently identifies biological activities in large chemical libraries”, Inglese et al., PNAS, 103, 2006, 11473-11478.
    43. 43. Probability and prejudice Lucky CEOs, coaches, and fund managers, Kahneman's 2002 Nobel prize for economics-psychology, good stories vs. Occam's razor, confirmation bias, pathological pattern-recognizers are we.
    44. 44. Statistics and significance “The Trouble with QSAR (or How I Learned To Stop Worrying and Embrace Fallacy)”, Stephen R. Johnson, J. Chem. Inf. Model., 48 (1), 25-26, 2008.
    45. 45. Conclusions <ul><li>Promiscuous compounds afflict HTS, but effects can be mitigated by pre-filtering and post-analysis, including recognition of patterns </li></ul><ul><li>Promiscuous patterns can be natural (chemistry based) or manufactured (e.g. analog series) </li></ul><ul><li>For scaffolds, bad promiscuity is continuous with good promiscuity, a.k.a. privileged structures, i.e. scaffolds which frequently enable selective bioactive compounds. </li></ul>
    46. 46. Conclusions <ul><li>PubChem/MLSCN is a rich, growing, unprecedented public source of bioactivity data offering new research avenues. </li></ul><ul><li>PubChem APIs provide good methods of compound and bioassay data mining... </li></ul><ul><li>...BUT – work remains to be done to fully utilize PubChem and integrate with other data sources and procedures. </li></ul>
    47. 47. Conclusions (from Chris Lipinski*) <ul><li>Designing screening libraries: </li></ul><ul><ul><li>know and use the pharma industry filters </li></ul></ul><ul><ul><li>use expert medicinal chemistry advice </li></ul></ul><ul><ul><li>get the best chemistry quality you can afford </li></ul></ul>*Chris Lipinski, Nanosyn Open House talk, Feb 16, 2008.
    48. 48. Acknowledgements, thanks <ul><li>Cristian Bologa, UNM </li></ul><ul><li>Tudor Oprea, UNM </li></ul><ul><li>Oleg Ursu, UNM </li></ul><ul><li>Steve Mathias, UNM </li></ul><ul><li>Chris Lipinski, Melior </li></ul><ul><li>PubChem team </li></ul><ul><li>NCBI team </li></ul>OpenEye Software c/o: <ul><ul><li>contact: </li></ul></ul>
    49. 49. Scaffolds and chemotypes <ul><li>Nature and chemical scaffolds </li></ul><ul><li>Scaffolds and shape </li></ul><ul><li>Scaffolds, aromatic hetero rings and bioactivity* </li></ul><ul><li>Human understanding and scaffolds (SAR) </li></ul><ul><li>Synthesis and scaffolds </li></ul><ul><li>Scaffolds and fragment-based screening </li></ul><ul><li>Commerce and scaffolds </li></ul><ul><li>Scaffold definitions </li></ul><ul><li>Wilkens et al. Hierarchical Scaffolds </li></ul>*Ertl, et al., Quest for the Rings
    50. 50. HierScaffolds: - scaffolds are one or several rings connected by linkers - compounds can be related by any of their scaffolds