Promiscuous patterns and perils in PubChem and the MLSCN. Jeremy J Yang, Cristian Bologa, Tudor Oprea. Division of Biocomputing, Dept of Biochem. & Mol. Biology, NM Mol. Libraries Screening Center, University of New Mexico
Goals
Background (figure labels: 10^6, 10^0)
Promiscuity defined: Bioactivity for multiple targets, i.e. “frequent-hitter”, non-selective binder. Scaffold promiscuity [working definition]: Multi-target bioactivity for involved scaffold. Scaffold may be a determinant or simply an informatic device.
Real vs phony promiscuity
PubChem: cathedral or bazaar? (a) National Cathedral, Washington, DC (b) Santa Fe Flea Market, Santa Fe, NM. Ref: “The Cathedral and the Bazaar”, Eric Raymond, 1997.
PubChem and MLSCN* (*MLSCN = Molecular Libraries Screening Center Network, to be MLPCN, Molecular Libraries Program Center Network)
PubChem and MLSMR (Plot c/o Victor Panchenco, BioFocusDPI)
MLSMR and MLSCN actives* (*June 2008). Figure counts (labels not recoverable): 94,148; 104,078; 93,616; 104,610; 532; 10,462. These are arithmetically consistent with two overlapping sets of 94,148 and 104,078 compounds sharing 93,616, with 532 and 10,462 unique to each and 104,610 in total. (Some MLSCN compounds, esp. secondary assays, from other sources such as commercial vendors.)
Selected published pre-filtering expert knowledge* (*generalizable domain knowledge)
Selected pre-filtering semi-public expertise
MLSMR filtering protocols
MLSMR re-filtered... (*search failed via PC GUI)
MLSMR re-filtered... Example rejects:
Pre-filtering at UNM Java servlet using ChemAxon/JChem
Activity multiplicity – all assays: compounds active in any assay. Peril? 103894 MLSCN compounds, 724 MLSCN assays
Activity multiplicity – screening assays: compounds active in any screening assay. Peril. 89954 MLSCN compounds, 268 MLSCN assays
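The activity-multiplicity tally on these two slides (how many compounds are active in exactly k assays) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout `(cid, aid, is_active)`, the function name `activity_multiplicity`, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def activity_multiplicity(records):
    """Return {n_assays_active: n_compounds} from (cid, aid, is_active) records.

    The record layout is hypothetical; real PubChem exports differ.
    """
    # Collect the distinct assays in which each compound is active.
    active_assays = {}
    for cid, aid, is_active in records:
        if is_active:
            active_assays.setdefault(cid, set()).add(aid)
    # Histogram: number of compounds active in exactly k assays.
    return dict(Counter(len(aids) for aids in active_assays.values()))


# Hypothetical toy data: (CID, AID, active?)
records = [
    (1, 101, True), (1, 102, True), (1, 103, True),
    (2, 101, True), (2, 102, False),
    (3, 103, True),
    (4, 101, False),
]
histogram = activity_multiplicity(records)
print(histogram)
```

Compounds in the long tail of such a histogram (active in many assays) are the candidates for a closer look, whether the promiscuity turns out to be real or an artifact.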
Top 100 hier-scaffolds*, active PubChem MLSMR. Example scaffold #1. *Wilkens, J. Med. Chem. 2005
Top active hier-scaffolds: example #1. Scaffold: 33rd most common, 510 active compounds, 34 assays, SMILES c1c2c([nH]c(=O)cn2)ncn1. Top 12 of 510 compounds (CID, #assays active)
Top active hier-scaffolds: example #1 #compounds vs #assays in which they are active
Top active hier-scaffolds: example #1 All from NCGC
Other promiscuous scaffolds COMPOUNDS:total/tested/active  ASSAYS:tested/active  SAMPLES:tested/active
Other promiscuous scaffolds COMPOUNDS:total/tested/active  ASSAYS:tested/active  SAMPLES:tested/active
Other promiscuous scaffolds COMPOUNDS:total/tested/active  ASSAYS:tested/active  SAMPLES:tested/active
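The per-scaffold legend on these slides (COMPOUNDS: total/tested/active, ASSAYS: tested/active) could be computed roughly as follows. This is a sketch under stated assumptions rather than the authors' implementation: the inputs (a set of CIDs known to contain the scaffold, and `(cid, aid, is_active)` outcome tuples) and the name `scaffold_stats` are hypothetical, and the SAMPLES counts are omitted because they would need substance-level records.

```python
def scaffold_stats(scaffold_cids, results):
    """Summarize activity for one scaffold.

    scaffold_cids: set of CIDs containing the scaffold (hypothetical input).
    results: iterable of (cid, aid, is_active) assay outcomes (hypothetical).
    """
    tested_cids, active_cids = set(), set()
    tested_aids, active_aids = set(), set()
    for cid, aid, is_active in results:
        if cid not in scaffold_cids:
            continue  # outcome belongs to a compound without this scaffold
        tested_cids.add(cid)
        tested_aids.add(aid)
        if is_active:
            active_cids.add(cid)
            active_aids.add(aid)
    return {
        # total / tested / active
        "compounds": (len(scaffold_cids), len(tested_cids), len(active_cids)),
        # tested / active
        "assays": (len(tested_aids), len(active_aids)),
    }


# Hypothetical toy data.
scaffold_cids = {10, 11, 12}
results = [
    (10, 1, True), (10, 2, True), (10, 3, False),
    (11, 1, True), (11, 2, False),
    (99, 1, True),  # compound without the scaffold; ignored
]
stats = scaffold_stats(scaffold_cids, results)
print(stats)
```

Reporting tested counts alongside active counts matters here: a scaffold that looks promiscuous may simply have been tested far more often than others.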
Activity mining with PubChem GUI Lots of great functionality, but not everything...
Activity mining with command line Automation is good...
Top active hier-scaffolds: example #2. Example scaffold #2: 62nd most common, 523 active compounds, 208 assays, SMILES c1ccc(cc1)NC(=O)c2ccco2. Top 12 of 523 compounds (CID, #assays active)
Top active hier-scaffolds: example #2 #compounds vs #assays in which they are active
Top active hier-scaffolds: example #2 #compounds vs #assays in which they are active
Top active hier-scaffolds: example #3. Example scaffold #3: 1307th most common, 27 active compounds, 140 assays, SMILES c1nc-2c(=O)[nH]c(=O)nc2[nH]n1 (toxoflavin)
Top active hier-scaffolds: example #3 #compounds vs #assays in which they are active
Digression: PubChem bug... Substructure search for CID 66541: 143 hits. Substructure search for C2(C1=NC=NNC1=NC(N2)=O)=O: 143 hits. Substructure search for c1nc-2c(=O)[nH]c(=O)nc2[nH]n1: 0 hits. Ergo: aromatic structural queries not allowed?
Top active hier-scaffolds: example #3
More example scaffolds and histograms of #compounds vs. #assays in which compounds are active
Scaffold promiscuity vs. SEA*. SEA (Similarity Ensemble Approach): For given query molecule, are there bioactive similars and for what targets? *Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotech 25 (2), 197-206 (2007). http://shoichetlab.compbio.ucsf.edu/~keiser/sea/
Possibilities
Possibilities. *"Is There a General Model for Bioactivity?", T.I. Oprea, O. Ursu, C.G. Bologa, and L.A. Sklar, The 8th International Conference on Chemical Structures, June, 2008, Noordwijkerhout, The Netherlands (http://www.int-conf-chem-structures.org/pdf/B-6.pdf).
Data mining methodology notes* (*June 2008). PUG = NCBI PubChem Power User Gateway (http). Entrez = NCBI Entrez eUtils API. OE = OpenEye OEChem. All code Perl or Python.
Now for some general comments... (figure labels: 10^6, 10^0)
Probability – in HTS game etc. In other words: Quantity vs. Quality. E_not-striking-out = 1 – E_striking-out. E_success = 1 – (1 – E_hit)^N, where E_hit = probability of hit per try and N = number of tries.
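A short Python sketch of the success-probability formula on this slide, E_success = 1 – (1 – E_hit)^N. It assumes the tries are independent with the same per-try hit probability; the function name and the example values are illustrative assumptions, not figures from the talk.

```python
def success_probability(p_hit, n_tries):
    """Probability of at least one hit in n_tries independent tries."""
    # Probability of striking out every time is (1 - p_hit) ** n_tries.
    return 1.0 - (1.0 - p_hit) ** n_tries


# Illustrative values only: a small per-try hit rate over growing library sizes.
for n in (1, 100, 10_000):
    print(n, success_probability(0.001, n))
```

The quantity-vs-quality trade-off is visible in the two arguments: the same success probability can be reached by raising N or by raising the per-try hit probability.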
Probability, more hard lessons. De novo molecular design case study: Grok and Grope*, 1992, Weininger, Blaney, Dixon. GA -> virtual library, docking fitness, lots of cpu cycles. But you need to recognize a good hit, including all aspects of fitness (ADMET, synthesis, etc.). <- approximation from memory. *"Method and Apparatus for Designing Molecules with Desired Properties by Evolving Successive Populations," David Weininger, U.S. patent US5434796, 1995.
Probability, more quantity vs quality. “Quantitative high-throughput screening: A titration-based approach that efficiently identifies biological activities in large chemical libraries”, Inglese et al., PNAS, 103, 2006, 11473-11478.
Probability and prejudice: lucky CEOs, coaches, and fund managers; Kahneman's 2002 Nobel prize for economics-psychology; good stories vs. Occam's razor; confirmation bias; pathological pattern-recognizers are we.
Statistics and significance. “The Trouble with QSAR (or How I Learned To Stop Worrying and Embrace Fallacy)”, Stephen R. Johnson, J. Chem. Inf. Model., 48 (1), 25-26, 2008.
Conclusions
Conclusions
Conclusions (from Chris Lipinski*). *Chris Lipinski, Nanosyn Open House talk, Feb 16, 2008.
Acknowledgements, thanks. OpenEye Software c/o:
Scaffolds and chemotypes. *Ertl, et al., Quest for the Rings
HierScaffolds:
- scaffolds are one or several rings connected by linkers
- compounds can be related by any of their scaffolds

