Promiscuous patterns and perils in PubChem and the MLSCN
Presentation Transcript

  • Promiscuous patterns and perils in PubChem and the MLSCN. Jeremy J Yang, Cristian Bologa, Tudor Oprea. Division of Biocomputing, Dept of Biochem. & Mol. Biology; NM Mol. Libraries Screening Center; University of New Mexico
  • Goals
    • Improve HTS success rate by pre-filtering or retro-filtering
    • Detection of promiscuous compounds
    • Effectively data-mine PubChem, developing tools to ask related questions
    • Generalizable knowledge
  • Background
    • HTS is a game of chance
    • Druglike, ADMET, Ro5, leadlike, probelike
    • Smaller still: fragment-based screening
    • Ligand efficiency
    • Compound fitness relative to HTS assay method
    • False positives
    • True but useless positives
    • Aggregators
    • Reactives
    • Studying the signal vs studying the noise
    • NIH Roadmap, MLI, MLSCN (MLPCN), MLSMR
    [Figure: 10^6 to 10^0]
  • Promiscuity defined
    • Known types
      • aggregators
      • reactives
      • true binders
    • Promiscuity: bioactivity for multiple targets, i.e. “frequent-hitter”, non-selective binder
    • Scaffold promiscuity [working definition]: multi-target bioactivity for the involved scaffold
    • Scaffold may be a determinant or simply an informatic device.
  • Real vs phony promiscuity
    • Apparent promiscuity may be due to:
      • Actual promiscuous binding
      • Artifactual promiscuity
        • Reactivity
        • Aggregation
        • Fluorescence
      • Experimental errors
    • For a given assay method, actual and artifactual promiscuity are equivalent for most HTS intents and purposes
  • PubChem: cathedral or bazaar?
    • Mission and success of PubChem
      • 38 million compounds (March 2008)
      • High volume worldwide use by scientific community
    • Mission and success of NIH MLI and MLSCN
      • 10 centers, 3 years
      • ~100? probes, ~1000 assays, ~30M data points
    [Figure: (a) National Cathedral, Washington, DC; (b) Santa Fe Flea Market, Santa Fe, NM.] Ref: “The Cathedral and the Bazaar”, Eric Raymond, 1997.
  • PubChem and MLSCN*
    • Publicly available bioactivity data on this scale is an unprecedented accomplishment and opportunity.
    • With rapid growth, data quality is an important concern.
    • Overall goals are broader than individual HTS campaigns.
    • Big idea: MLSCN+PubChem, reaching critical mass?
    *MLSCN = Molecular Libraries Screening Center Network, to be MLPCN, Molecular Libraries Program Center Network
  • PubChem and MLSMR
    • Molecular Libraries Small Molecule Repository
    • Managed by BioFocusDPI/Galapagos
    • ~ 300k compounds (March 2008) and growing
    • Used for primary HTS by all MLI centers
    Plot c/o Victor Panchenco, BioFocusDPI
  • MLSMR and MLSCN actives
    • MLSMR actives: 94,148
    • MLSCN actives: 104,078
    • In both sets: 93,616
    • In either set: 104,610
    • MLSMR only: 532
    • MLSCN only: 10,462
    (Some MLSCN compounds, esp. in secondary assays, come from other sources such as commercial vendors.) *June 2008
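The six counts on this slide are consistent with a two-set overlap between MLSMR actives and MLSCN actives. The short check below is a sketch under that reading: treating 93,616 as the intersection is an interpretation of the slide, not something it states explicitly, and the variable names are illustrative.

```python
# Assumption: the slide's numbers describe a two-set Venn diagram.
mlsmr_actives = 94_148    # assumed: active compounds that are in the MLSMR
mlscn_actives = 104_078   # assumed: active compounds tested by the MLSCN
both = 93_616             # assumed: active compounds present in both sets

mlsmr_only = mlsmr_actives - both                 # in MLSMR but not MLSCN
mlscn_only = mlscn_actives - both                 # in MLSCN but not MLSMR
either = mlsmr_actives + mlscn_actives - both     # inclusion-exclusion union

print(mlsmr_only, mlscn_only, either)  # 532 10462 104610
```

Under this reading the derived values match the remaining three numbers on the slide (532, 10,462 and 104,610).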
  • Selected published pre-filtering expert knowledge*
    • Rishton
      • Reactive compounds and in vitro false positives in HTS, Drug Discov. Today, 2, 1997, 382-384.
    • Hann
      • Strategic pooling of compounds for high-throughput screening, Mike Hann et al., J. Chem. Inf. Comp. Sci., 1999, 39, 897-902.
    • Rishton
      • Nonleadlikeness and leadlikeness in biochemical screening, G. M. Rishton, Drug Disc. Today, 8, 2003, 86-96.
    • Seidler
      • Identification and Prediction of Promiscuous Aggregating Inhibitors among Known Drugs, J. Seidler et al., J. Med Chem, 2003, 46, 4477-4486.
    *generalizable domain knowledge
  • Selected pre-filtering semi-public expertise
    • Blake
      • James Blake, Sybyl script lint_sln.spl, formerly bundled with Sybyl, 2001(?).
    • Commercial vendors
      • Property predictions (LogP, ADME-Tox, solubility)
      • Filtering tools and APIs
      • Some defined patterns
    • “Development of a Virtual Screening Method for Identification of 'Frequent Hitters' in Compound Libraries”, Roche et al., J. Med. Chem., 45, 2002, 137-142. [neural network, 345 descriptors]
  • MLSMR filtering protocols
    • MLSMR has a set of SMARTS-based filters, “excluded functionality filters”, used to reject compounds unfit for HTS.
    • Filters developed in collaboration with the MLSCN Chemistry working group (WG).
    • WG chose to be less restrictive than recommended by BioFocusDPI.
    • Optimum filtering is not a solved problem.
    • One reason: fitness is assay-dependent.
  • MLSMR re-filtered...
    • Using a combined reactive filter
      • 49k of 286k rejected (17%)
      • Top reactive patterns:
    *search failed via PC GUI
  • MLSMR re-filtered... Example rejects:
  • Pre-filtering at UNM: Java servlet using ChemAxon/JChem
  • Activity multiplicity, all assays: compounds active in any assay. 103894 MLSCN compounds, 724 MLSCN assays. [Plot annotation: peril?]
  • Activity multiplicity, screening assays: compounds active in any screening assay. 89954 MLSCN compounds, 268 MLSCN assays. [Plot annotation: peril]
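Activity multiplicity, as plotted on these two slides, can be tabulated from a list of (compound, assay) activity records. The following is a minimal sketch, not the authors' script: the function names and the toy (CID, AID) pairs are hypothetical.

```python
from collections import Counter

def activity_multiplicity(active_pairs):
    """Map each compound ID to the number of distinct assays in which it is active."""
    unique_pairs = set(active_pairs)  # ignore duplicate (cid, aid) records
    return Counter(cid for cid, _aid in unique_pairs)

def multiplicity_histogram(per_compound):
    """Map 'number of assays active' to the number of compounds with that count."""
    return Counter(per_compound.values())

# Hypothetical toy data: (CID, AID) pairs where the compound was reported active.
pairs = [(1, 10), (1, 11), (1, 12), (2, 10), (3, 11), (3, 11)]
per_compound = activity_multiplicity(pairs)
hist = multiplicity_histogram(per_compound)

for n_assays in sorted(hist):
    print(n_assays, "assay(s):", hist[n_assays], "compound(s)")
```

Restricting `pairs` to screening assays only would give the second slide's view of the same data.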
  • Top 100 hier-scaffolds*, active PubChem MLSMR. [Figure callout: example scaffold #1] *Wilkens, J. Med Chem 2005
  • Top active hier-scaffolds: example #1. Scaffold: c1c2c([nH]c(=O)cn2)ncn1; 33rd most common; 510 active compounds; 34 assays. [Table: top 12 of 510 compounds; columns: CID, #assays active]
  • Top active hier-scaffolds: example #1. [Plot: #compounds vs #assays in which they are active]
  • Top active hier-scaffolds: example #1. All from NCGC
  • Other promiscuous scaffolds (three slides). [Tables; columns: COMPOUNDS: total/tested/active; ASSAYS: tested/active; SAMPLES: tested/active]
  • Activity mining with PubChem GUI: lots of great functionality, but not everything...
  • Activity mining with command line: automation is good...
  • Top active hier-scaffolds: example #2. Scaffold: c1ccc(cc1)NC(=O)c2ccco2; 62nd most common; 523 active compounds; 208 assays. [Table: top 12 of 523 compounds; columns: CID, #assays active]
  • Top active hier-scaffolds: example #2. [Plot: #compounds vs #assays in which they are active]
  • Top active hier-scaffolds: example #3. Example scaffold #4 [sic]: c1nc-2c(=O)[nH]c(=O)nc2[nH]n1 (toxoflavin); 1307th most common; 27 active compounds; 140 assays.
  • Top active hier-scaffolds: example #3. [Plot: #compounds vs #assays in which they are active]
  • Digression: PubChem bug...
    • substructure search for CID 66541: 143 hits
    • substructure search for C2(C1=NC=NNC1=NC(N2)=O)=O: 143 hits
    • substructure search for c1nc-2c(=O)[nH]c(=O)nc2[nH]n1: 0 hits
    • Ergo: aromatic structural queries not allowed?
  • Top active hier-scaffolds: example #3
  • More example scaffolds and histograms of #compounds vs. #assays in which compounds are active
  • Scaffold promiscuity vs. SEA*
    • For a given scaffold, what are the active molecules and the bioassays in which they are active?
    • SEA (Similarity Ensemble Approach): for a given query molecule, are there bioactive similar molecules, and for what targets?
    *Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotech 25 (2), 197-206 (2007). http://shoichetlab.compbio.ucsf.edu/~keiser/sea/
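The scaffold-centric question (for a given scaffold, which molecules are active and in which bioassays) amounts to a join between a scaffold-to-compound map and a compound-to-assay map. The sketch below assumes both maps are already available as Python dictionaries; the function name, the CIDs, and the AIDs are hypothetical, and only the scaffold SMILES is taken from the slides.

```python
def scaffold_profile(scaffold_to_cids, cid_to_active_aids):
    """For each scaffold, list its active compounds and the assays they are active in."""
    profile = {}
    for scaffold, cids in scaffold_to_cids.items():
        # Keep only compounds with at least one active assay.
        active = {cid: set(cid_to_active_aids[cid])
                  for cid in cids if cid_to_active_aids.get(cid)}
        assays = set().union(*active.values()) if active else set()
        profile[scaffold] = {
            "active_cids": sorted(active),
            "assays": sorted(assays),
        }
    return profile

# Hypothetical toy data; CIDs and AIDs are made up for illustration.
scaffold_to_cids = {"c1ccc(cc1)NC(=O)c2ccco2": [101, 102, 103]}
cid_to_active_aids = {101: {1, 2}, 102: {2, 3}, 103: set()}

result = scaffold_profile(scaffold_to_cids, cid_to_active_aids)
print(result["c1ccc(cc1)NC(=O)c2ccco2"])
```

A scaffold with many active compounds spread over many unrelated assays would be a candidate for the working definition of scaffold promiscuity given earlier.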
  • Possibilities
    • Activity in multiple assays is expected when assays are related (screening/confirmatory, etc.)
    • If compounds are not promiscuous, one possibility is that the assays are redundant (also interesting)
    • Need rule of thumb? Active in >5 HTS screens deserves a red flag?
    • Need assay classification/canonicalization – i.e. rigorous informatics?
    • Scaffold “popularity” generally desirable.
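The proposed rule of thumb (active in more than 5 HTS screens deserves a red flag) is simple to apply once per-compound counts exist. A minimal sketch, assuming a dictionary of compound ID to number of HTS screens with activity; the function name and the counts are hypothetical, and the threshold of 5 is the slide's tentative suggestion rather than a validated cutoff.

```python
def red_flags(hts_active_counts, threshold=5):
    """Return compound IDs active in more than `threshold` HTS screens, sorted."""
    return sorted(cid for cid, n in hts_active_counts.items() if n > threshold)

# Hypothetical counts: compound ID -> number of HTS screens in which it was active.
counts = {111: 1, 222: 5, 333: 6, 444: 12}
print(red_flags(counts))  # [333, 444]
```

Because related assays (screening plus confirmatory) inflate the count, such a filter would be more meaningful after the assay classification mentioned in the list above.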
  • Possibilities
    • The global bioactivity landscape (nature) calls for a comprehensive view of bioactivity, chemical space, promiscuity, privileged structures and other patterns. In other words, chem+bio informatics.*
    *"Is There a General Model for Bioactivity?", T.I. Oprea, O. Ursu, C.G. Bologa, and L.A. Sklar, The 8th International Conference on Chemical Structures, June, 2008, Noordwijkerhout, The Netherlands (http://www.int-conf-chem-structures.org/pdf/B-6.pdf).
  • Data mining methodology notes
    • Get all active MLSCN/MLSMR compounds (Entrez)
    • HierScaf perception w/ all MLSCN compounds (OE)
    • Extract compounds for selected scaffolds (OE)
    • Download compounds (PUG)
    • Find all bioassays in which cpds are active, using local PubChem bioassay ftp-mirror
    • Get bioassay summaries (Entrez)
    • Try our scripts: http://pangolin.health.unm.edu/kit/
    PUG = NCBI PubChem Power User Gateway (http)
    Entrez = NCBI Entrez eUtils API
    OE = OpenEye OEChem
    All code Perl or Python
    *June 2008
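The step "find all bioassays in which cpds are active, using local PubChem bioassay ftp-mirror" can be sketched as a scan over per-assay data tables. This is an illustration, not the authors' code: it assumes each assay is a CSV table with `PUBCHEM_CID` and `PUBCHEM_ACTIVITY_OUTCOME` columns and that an active outcome is recorded as either the text `Active` or the code `2`. The column names, outcome encodings, function names, and toy tables should all be checked against the actual mirror files.

```python
import csv
import io

def active_cids(csv_text, cid_col="PUBCHEM_CID",
                outcome_col="PUBCHEM_ACTIVITY_OUTCOME",
                active_values=("active", "2")):
    """Return the set of CIDs marked active in one assay data table (assumed layout)."""
    cids = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        outcome = (row.get(outcome_col) or "").strip().lower()
        cid = (row.get(cid_col) or "").strip()
        if cid and outcome in active_values:
            cids.add(int(cid))
    return cids

def assays_per_compound(assay_tables, wanted_cids):
    """Map each wanted CID to the list of assay IDs (ascending) in which it is active."""
    wanted = set(wanted_cids)
    hits = {cid: [] for cid in wanted}
    for aid, text in sorted(assay_tables.items()):
        for cid in active_cids(text) & wanted:
            hits[cid].append(aid)
    return hits

# Hypothetical toy tables keyed by assay ID; real files would be read from the mirror.
header = "PUBCHEM_SID,PUBCHEM_CID,PUBCHEM_ACTIVITY_OUTCOME\n"
tables = {
    1001: header + "1,66541,Active\n2,999,Inactive\n",
    1002: header + "3,66541,2\n4,999,1\n",
}
hits = assays_per_compound(tables, [66541, 999])
print(hits[66541], hits[999])
```

Reading from a local mirror rather than querying per compound keeps the scan independent of network limits, which matters when hundreds of assays are involved.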
  • Now for some general comments... [Figure: 10^6 to 10^0]
  • Probability: in the HTS game, etc. In other words: Quantity vs. Quality
    $E_{\text{not striking out}} = 1 - E_{\text{striking out}}$
    $E_{\text{success}} = 1 - (1 - E_{\text{hit}})^{N}$
    where $E_{\text{hit}}$ = probability of a hit per try, $N$ = number of tries
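The relation E_success = 1 - (1 - E_hit)^N can be explored numerically to see how quantity (N) trades against quality (E_hit). A minimal sketch, assuming independent tries with the same per-try hit probability; the function name and the example hit rate of 1e-4 are illustrative, not values from the talk.

```python
def p_success(p_hit, n_tries):
    """Probability of at least one hit in n independent tries: 1 - (1 - p_hit)**n."""
    if not 0.0 <= p_hit <= 1.0:
        raise ValueError("p_hit must be in [0, 1]")
    if n_tries < 0:
        raise ValueError("n_tries must be non-negative")
    return 1.0 - (1.0 - p_hit) ** n_tries

# Illustrative only: an assumed per-compound hit probability of 1e-4.
for n in (1, 100, 10_000, 1_000_000):
    print(n, round(p_success(1e-4, n), 4))
```

Raising the per-try hit probability (better libraries, better filtering) has the same effect on the result as screening proportionally more compounds, which is the quantity vs. quality point of the slide.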
  • Probability, more hard lessons
    • De novo molecular design
    • Case study: Grok and Grope*, 1992, Weininger, Blaney, Dixon
    • GA -> virtual library, docking fitness, lots of CPU cycles
    • But you need to recognize a good hit, including all aspects of fitness (ADMET, synthesis, etc.). <- approximation from memory
    • conclusions:
      • medchem knowledge is important
      • chemical space is huge
    *"Method and Apparatus for Designing Molecules with Desired Properties by Evolving Successive Populations," David Weininger, U.S. patent US5434796, 1995.
  • Probability, more quantity vs quality: “Quantitative high-throughput screening: A titration-based approach that efficiently identifies biological activities in large chemical libraries”, Inglese et al., PNAS, 103, 2006, 11473-11478.
  • Probability and prejudice: lucky CEOs, coaches, and fund managers; Kahneman's 2002 Nobel prize for economics-psychology; good stories vs. Occam's razor; confirmation bias; pathological pattern-recognizers are we.
  • Statistics and significance: “The Trouble with QSAR (or How I Learned To Stop Worrying and Embrace Fallacy)”, Stephen R. Johnson, J. Chem. Inf. Model., 48 (1), 25-26, 2008.
  • Conclusions
    • Promiscuous compounds afflict HTS, but effects can be mitigated by pre-filtering and post-analysis, including recognition of patterns.
    • Promiscuous patterns can be natural (chemistry based) or manufactured (e.g. analog series).
    • For scaffolds, bad promiscuity is continuous with good promiscuity, a.k.a. privileged structures, i.e. scaffolds which frequently enable selective bioactive compounds.
  • Conclusions
    • PubChem/MLSCN is a rich, growing, unprecedented public source of bioactivity data offering new research avenues.
    • PubChem APIs provide good methods of compound and bioassay data mining...
    • ...BUT – work remains to be done to fully utilize PubChem and integrate with other data sources and procedures.
  • Conclusions (from Chris Lipinski*)
    • Designing screening libraries:
      • know and use the pharma industry filters
      • use expert medicinal chemistry advice
      • get the best chemistry quality you can afford
    *Chris Lipinski, Nanosyn Open House talk, Feb 16, 2008.
  • Acknowledgements, thanks
    • Cristian Bologa, UNM
    • Tudor Oprea, UNM
    • Oleg Ursu, UNM
    • Steve Mathias, UNM
    • Chris Lipinski, Melior
    • PubChem team
    • NCBI team
    OpenEye Software c/o:
      • contact: jjyang@salud.unm.edu
  • Scaffolds and chemotypes
    • Nature and chemical scaffolds
    • Scaffolds and shape
    • Scaffolds, aromatic hetero rings and bioactivity*
    • Human understanding and scaffolds (SAR)
    • Synthesis and scaffolds
    • Scaffolds and fragment-based screening
    • Commerce and scaffolds
    • Scaffold definitions
    • Wilkens et al. Hierarchical Scaffolds
    *Ertl, et al., Quest for the Rings
  • HierScaffolds:
    • scaffolds are one or several rings connected by linkers
    • compounds can be related by any of their scaffolds