The utility of PubChem
for academic drug discovery
and chemical biology
Christopher Southan
Drug Discovery Seminar, Stockholm, March 4th, 2019
Hosted by Per Arvidsson
1
Abstract
Since PubChem (https://pubchem.ncbi.nlm.nih.gov/) surfaced in 2004 it has
become the de facto global informatics hub, not just for chemistry but also
bioactivity. In addition, it is integrated within the very powerful Network
Entrez system that offers connectivity to many other entities in the
massive NCBI database resources including the literature in PubMed,
protein structures in PDB, genomic sequences, and MeSH terms. However,
the statistics of content (97.2 million compounds, 3.4 million of which have
240 million activity results against 12K protein targets) can seem daunting
to users. This presentation will give an overview of this content and the
basic concepts behind the major divisions of submitted substances (SIDs),
non-redundant compound entries (CIDs) and the mapping of activity results
to the latter (PubChem BioAssay). This will be followed by introducing
selected drug discovery-relevant resources inside PubChem including
ChEMBL and the IUPHAR/BPS Guide to Pharmacology. It will conclude with
a series of use cases, including searching against the 23 million structures
extracted from the IBM and SureChEMBL patent extraction sources in
PubChem.
2
Outline
• Basic context and content
• Drug records
• Patent chemistry < > documents
• Papers < > chemistry
• Chemical biology, getting to probes
• Submission
• Conclusions
• Further information
3
4
Basics
Comparisons: PubChem is well cited
5
PubChem CID growth 2005 - 2018
6
The Triumvirate:
Substance, Compound, BioAssay
7
CID stats overview (March 2019)
• SIDs can be biomolecules (e.g. large peptides and antibodies), no images
• CIDs merge SMILES strings < 1000 atoms, to unique InChIs
• Average SID:CID ~ 2.6, drugs ~50, aspirin (CID 2244) 307 plus 1563 mixture SIDs
• CIDs 4.7 % mixtures
8
Top sources by SID
9
https://cdsouthan.blogspot.com/2016/06/pubchem-source-of-month.html?q=pubchem+sources
10
Drugs
An example drug
11
Multiplexing: one drug > many forms
12
7 same isotopes = different stereo
13
52 same connectivity = 7 stereo + 45 isotopes
14
• Useful to understand and navigate these multiplexing chemistry rules
• Many drug molecules have complex connectivities
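The "same connectivity" grouping can be approximated outside PubChem by comparing the first block of the InChIKey, which is a hash of the connectivity information. This is a minimal sketch under that assumption, not PubChem's own multiplexing rules; the function name and the placeholder keys are hypothetical.

```python
from collections import defaultdict

def group_by_connectivity(inchikeys):
    """Group InChIKeys by their first block (the connectivity hash)."""
    groups = defaultdict(list)
    for key in inchikeys:
        first_block = key.split("-")[0]
        groups[first_block].append(key)
    return dict(groups)

# Made-up placeholder keys for illustration only, not real compounds
keys = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # same first block, different second block
    "CCCCCCCCCCCCCC-UHFFFAOYSA-N",
]
groups = group_by_connectivity(keys)
print({block: len(members) for block, members in groups.items()})
```

Keys that share a first block but differ in the second block are candidates for stereo or isotope variants of one connectivity.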
Will the real approved drugs please stand up?
(I) Selecting from ChEMBL
15
11629 drugs > 2716 Phase 4, 2194 with SMILES > ChEMBL IDs mapping to CIDs = 2193
(II) Selecting from DrugBank
16
10517 SIDs > 3144 approved drugs > mapping to 2086 CIDs
(III) Selecting from Guide to Pharmacology
17
9526 SIDs > 1509 approved drugs > mapping to 1321 CIDs
The Venn (by CID)
18
• Union total = 2965 so 3-way intersect of 958 is only 32%
• Ipso facto divergent approved drug structure capture
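The Venn figures reduce to plain set arithmetic on CID lists. A minimal sketch, assuming each source's approved-drug CIDs have been exported into a Python set; the function name and the toy CIDs are illustrative and not taken from the slides.

```python
def venn_summary(chembl, drugbank, gtopdb):
    """Return (union size, 3-way intersect size, intersect as % of union)."""
    union = chembl | drugbank | gtopdb
    core = chembl & drugbank & gtopdb
    pct = round(100 * len(core) / len(union)) if union else 0
    return len(union), len(core), pct

# Toy CID sets for illustration only (not the real source lists)
chembl = {1, 2, 3, 4}
drugbank = {2, 3, 4, 5}
gtopdb = {3, 4, 6}
print(venn_summary(chembl, drugbank, gtopdb))  # (6, 2, 33)

# Slide figures: 958 shared CIDs out of a union of 2965
print(round(100 * 958 / 2965))  # 32
```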
Or use search history Booleans
19
20
Patents
Cumulative patent-extracted CIDs
• Drug discovery SAR in patent documents ~ 2 to 5x more than papers and years earlier
• Majority of lead series now covered from automated extraction sources
• BindingDB curates patent SAR (http://www.bindingdb.org/bind/ByPatent.jsp) and feeds to PubChem and ChEMBL
• SureChEMBL is the only live source with ~ two-monthly updates
• There are quality issues and overheads with automated chemistry extraction
21
Patent analysis:
SureChEMBL < > PubChem
22
Query via the ChEMBL search interface
Patent document SAR > PubChem
23
1. > Patent (via SureChEMBL)
2. > SureChEMBL structures
3. > SAR for 52 examples in document
4. Example structure > PubChem
5. Check sources, patent number, similar compounds, ChEMBL and BioAssay intersects
6. Make SAR table
7. Check for missing compounds
Tanimoto similarity shell for SAR “walking”
24
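The idea of a similarity shell can be sketched with the Tanimoto coefficient on fingerprint bit sets. This is a minimal illustration, assuming fingerprints are held as sets of on-bit indices; PubChem computes its own 2D similarity over its own fingerprints, and the names and bit sets below are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def similarity_shell(query_fp, candidates, low, high=1.0):
    """Names of candidates whose similarity to the query lies in [low, high]."""
    return sorted(
        name for name, fp in candidates.items()
        if low <= tanimoto(query_fp, fp) <= high
    )

# Hypothetical on-bit sets, not real PubChem fingerprints
query = {1, 2, 3, 4}
candidates = {
    "analogue_a": {1, 2, 3, 5},  # 3 shared bits of 5 in total -> 0.6
    "analogue_b": {1, 9},        # 1 shared bit of 5 in total -> 0.2
    "identical": {1, 2, 3, 4},   # 1.0
}
print(similarity_shell(query, candidates, low=0.5, high=0.99))  # ['analogue_a']
```

Setting the upper bound below 1.0 excludes the exact match, so the shell returns only close analogues for SAR "walking".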
PubChem indexes chemistry against the document
25
SciFinder had 212 substances, with 112 categorised as biological
26
Papers
• At GtoPdb expert curators judge what a paper is "about" in terms of the key active compound-to-target mapping
• We focus on approved drugs, clinical candidates, immunopharmacology and most recently malaria
• We link the PubMed ID (PMID) as a reference to that database record, the chemical structure and quantitative bioactivity
• We then submit our entries as Substance Identifiers (SIDs) to PubChem, with comments and the references included in the files
• PubChem, in turn, links our SIDs to our PMIDs (i.e. structure-to-document, s2d)
• PubChem merges identical SIDs into CIDs, along with PMIDs from different sources that may index different structures
• For GtoPdb all the links above are reciprocal
• Other sources make analogous links
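The merging step in the list above can be pictured as grouping submitted records by structure and pooling their references. A minimal sketch, assuming each submission carries a precomputed structure key that stands in for PubChem's own standardisation; the record layout, identifiers and function name are all hypothetical.

```python
from collections import defaultdict

def merge_substances(substances):
    """Group SID records by structure key; pool sources and PMIDs per group."""
    merged = defaultdict(lambda: {"sids": set(), "sources": set(), "pmids": set()})
    for rec in substances:
        entry = merged[rec["structure_key"]]
        entry["sids"].add(rec["sid"])
        entry["sources"].add(rec["source"])
        entry["pmids"].update(rec["pmids"])
    return dict(merged)

# Hypothetical records: identifiers and keys are made up for illustration
substances = [
    {"sid": 101, "source": "GtoPdb", "structure_key": "KEY-A", "pmids": {111}},
    {"sid": 202, "source": "ChEMBL", "structure_key": "KEY-A", "pmids": {111, 222}},
    {"sid": 303, "source": "GtoPdb", "structure_key": "KEY-B", "pmids": {333}},
]
compounds = merge_substances(substances)
print(sorted(compounds["KEY-A"]["sids"]))   # [101, 202]
print(sorted(compounds["KEY-A"]["pmids"]))  # [111, 222]
```

Each merged group plays the role of one CID: it inherits every SID, source and PMID submitted for that structure.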
27
"We have spent millions putting chemistry into PDFs but
now we are spending more millions taking it back out" (Anon)
1. GtoPdb expert curators judge what a paper is "about" in terms of active compounds as key lead structures and molecular mechanism of action (mmoa)
2. We focus on approved drugs, clinical candidates, immunopharmacology and most recently malaria
3. We link the PubMed ID (PMID) as a reference to that database record, the chemical structure and quantitative bioactivity
4. We then submit our entries as Substance Identifiers (SIDs) to PubChem, with comments and the references included in the files
5. PubChem, in turn, links our SIDs to our PMIDs (i.e. structure-to-document, s2d)
6. PubChem merges identical SIDs into CIDs, along with PMIDs from different sources that may index different structures
7. For GtoPdb all the links above are reciprocal
28
Let’s try an author….
29
The Entrez system has linked
48 CIDs to the 58 papers
30
In this case there is a MeSH link to PubChem
31
But there is a quirk….
32
Probably the correct structure
33
34
Chemical Biology
Chemical Probes: not cleanly indexed so use an
external source for "mapping in" to PubChem
35
PubChem Identifier Exchange Service
36
Checking InChIKey (IK) mapping failures by Googling the InChIKey
37
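When an InChIKey fails to map, an exact-string web search is a quick cross-check. A minimal sketch, assuming the standard 27-character InChIKey layout and a generic Google query URL; the helper names are hypothetical and nothing here calls a PubChem service.

```python
import re
from urllib.parse import quote_plus

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

def is_inchikey(text):
    """True if the text has the standard InChIKey shape (format check only)."""
    return INCHIKEY_RE.fullmatch(text) is not None

def web_search_url(inchikey):
    """Build an exact-phrase search URL (assumed generic Google query form)."""
    if not is_inchikey(inchikey):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return "https://www.google.com/search?q=" + quote_plus(f'"{inchikey}"')

key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # InChIKey of aspirin
print(is_inchikey(key))
print(web_search_url(key))
```

The format check catches truncated or lower-cased keys before any search is attempted; it does not confirm that the key corresponds to a real structure.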
38
Submitting
Getting into PubChem is easier than you think
39
Conclusions
• PubChem has become an essential resource for drug discovery and chemical biology, academic as well as commercial
• Risk of proprietary structure query interception is realistically zero
• Many functionalities to explore and content sets to compare
• Can seem daunting but simple questions can be answered and engagement practice pays off
• Combinations of selects, Boolean search-history queries and NCBI Entrez connectivity are very powerful
• Like all such sources, it has quirks, caveats, submitter quality issues, gotchas and constitutive cheminformatic challenges
• No chemistry rules are perfect but PubChem's work well and are navigable
• Programmatic access by PUG REST; RDF triples: 138,312,069,777
• Has synergies with stand-alone sources such as ChEMBL, SureChEMBL, and Guide to Pharmacology
• Literature to database connectivity is improving but there is still a big shortfall in SAR extraction from papers into CIDs and BioAssay
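As a pointer for the programmatic access mentioned above, PUG REST requests are plain URLs. This is a minimal sketch that only builds request URLs and makes no network call; the helper names and the example CID are illustrative, and the URL patterns should be checked against the current PUG REST documentation before use.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def cid_property_url(cid, properties=("InChIKey", "MolecularFormula")):
    """URL for selected computed properties of one CID, returned as JSON."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{','.join(properties)}/JSON"

def cid_to_sids_url(cid):
    """URL listing the submitted substances (SIDs) merged into one CID."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/sids/JSON"

def inchikey_to_cids_url(inchikey):
    """URL resolving an InChIKey to matching CIDs."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey)}/cids/JSON"

print(cid_property_url(2244))
print(cid_to_sids_url(2244))
print(inchikey_to_cids_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Each URL can be pasted into a browser or fetched with any HTTP client; keeping URL construction separate from fetching makes the queries easy to inspect and log.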
40
Acknowledgment
41
References
42
Further PubChem tips and tricks
43
https://www.slideshare.net/cdsouthan/presentations https://cdsouthan.blogspot.com/

More Related Content

What's hot

Multiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChemMultiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChemChris Southan
 
PubChem for chemical information literacy training
PubChem for chemical information literacy trainingPubChem for chemical information literacy training
PubChem for chemical information literacy trainingSunghwan Kim
 
PubChem and Its Applications for Drug Discovery
PubChem and Its Applications for Drug DiscoveryPubChem and Its Applications for Drug Discovery
PubChem and Its Applications for Drug DiscoverySunghwan Kim
 
IUPHAR/MMV Guide to Malaria Pharmacology
IUPHAR/MMV Guide to Malaria Pharmacology IUPHAR/MMV Guide to Malaria Pharmacology
IUPHAR/MMV Guide to Malaria Pharmacology Guide to PHARMACOLOGY
 
Antimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureAntimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureChris Southan
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem Sunghwan Kim
 
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...Frederik van den Broek
 
Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...
Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...
Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...Frederik van den Broek
 
The IUPHAR/MMV Guide to Malaria Pharmacology
The  IUPHAR/MMV Guide to Malaria Pharmacology  The  IUPHAR/MMV Guide to Malaria Pharmacology
The IUPHAR/MMV Guide to Malaria Pharmacology Chris Southan
 
Semantic Technology: The Basics
Semantic Technology: The BasicsSemantic Technology: The Basics
Semantic Technology: The BasicsPeter Berger
 
IUPHAR/BPS Guide to Pharmacology in 2018
IUPHAR/BPS Guide to Pharmacology in 2018IUPHAR/BPS Guide to Pharmacology in 2018
IUPHAR/BPS Guide to Pharmacology in 2018Guide to PHARMACOLOGY
 
Advances in computer aided drug design
Advances in computer aided drug designAdvances in computer aided drug design
Advances in computer aided drug designVikas Soni
 
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...Chris Southan
 
PubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistryPubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistrySunghwan Kim
 

What's hot (20)

IUPHAR/BPS Guide to Pharmacology
IUPHAR/BPS Guide to PharmacologyIUPHAR/BPS Guide to Pharmacology
IUPHAR/BPS Guide to Pharmacology
 
Multiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChemMultiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChem
 
PubChem for chemical information literacy training
PubChem for chemical information literacy trainingPubChem for chemical information literacy training
PubChem for chemical information literacy training
 
AXP302
AXP302AXP302
AXP302
 
PubChem and Its Applications for Drug Discovery
PubChem and Its Applications for Drug DiscoveryPubChem and Its Applications for Drug Discovery
PubChem and Its Applications for Drug Discovery
 
When pharmaceutical companies publish large datasets an abundance of riches o...
When pharmaceutical companies publish large datasets an abundance of riches o...When pharmaceutical companies publish large datasets an abundance of riches o...
When pharmaceutical companies publish large datasets an abundance of riches o...
 
IUPHAR/MMV Guide to Malaria Pharmacology
IUPHAR/MMV Guide to Malaria Pharmacology IUPHAR/MMV Guide to Malaria Pharmacology
IUPHAR/MMV Guide to Malaria Pharmacology
 
Antimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureAntimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosure
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem
 
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
 
Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...
Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...
Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...
 
The IUPHAR/MMV Guide to Malaria Pharmacology
The  IUPHAR/MMV Guide to Malaria Pharmacology  The  IUPHAR/MMV Guide to Malaria Pharmacology
The IUPHAR/MMV Guide to Malaria Pharmacology
 
Semantic Technology: The Basics
Semantic Technology: The BasicsSemantic Technology: The Basics
Semantic Technology: The Basics
 
IUPHAR/BPS Guide to Pharmacology in 2018
IUPHAR/BPS Guide to Pharmacology in 2018IUPHAR/BPS Guide to Pharmacology in 2018
IUPHAR/BPS Guide to Pharmacology in 2018
 
Advances in computer aided drug design
Advances in computer aided drug designAdvances in computer aided drug design
Advances in computer aided drug design
 
Open Journal of Chemistry
Open Journal of ChemistryOpen Journal of Chemistry
Open Journal of Chemistry
 
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
 
PubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistryPubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistry
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
 

Similar to PubChem for drug discovery and chemical biology

PubChem and Big Data Chemistry
PubChem and Big Data ChemistryPubChem and Big Data Chemistry
PubChem and Big Data ChemistrySunghwan Kim
 
Sorting bioactive wheat from database chaff
Sorting bioactive wheat from database chaffSorting bioactive wheat from database chaff
Sorting bioactive wheat from database chaffChris Southan
 
Sorting bioactive wheat from database chaff: Challenges of discerning correct...
Sorting bioactive wheat from database chaff: Challenges of discerning correct...Sorting bioactive wheat from database chaff: Challenges of discerning correct...
Sorting bioactive wheat from database chaff: Challenges of discerning correct...Guide to PHARMACOLOGY
 
Revolution in the Connectivity Between Medicinal Chemistry and Biology
Revolution in the Connectivity Between Medicinal Chemistry and BiologyRevolution in the Connectivity Between Medicinal Chemistry and Biology
Revolution in the Connectivity Between Medicinal Chemistry and BiologyChris Southan
 
Evolving consensus-based curatorial strategies
Evolving consensus-based curatorial strategiesEvolving consensus-based curatorial strategies
Evolving consensus-based curatorial strategiesChris Southan
 
Pub Med to PubChem Connectivity
Pub Med to PubChem ConnectivityPub Med to PubChem Connectivity
Pub Med to PubChem ConnectivityChris Southan
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...Dr. Haxel Consult
 
Exploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsExploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsSunghwan Kim
 
Toxicological information in PubChem
Toxicological information in PubChemToxicological information in PubChem
Toxicological information in PubChemSunghwan Kim
 
Exploring Chemical and Biological Knowledge Spaces with PubChem
Exploring Chemical and Biological Knowledge Spaces with PubChemExploring Chemical and Biological Knowledge Spaces with PubChem
Exploring Chemical and Biological Knowledge Spaces with PubChemPaul Thiessen
 
Analysing targets and drugs to populate the GToP database
Analysing  targets and drugs to populate the GToP databaseAnalysing  targets and drugs to populate the GToP database
Analysing targets and drugs to populate the GToP databaseChris Southan
 
Open Acess Sources for Protein Interaction Inhibitors
Open Acess Sources for Protein Interaction InhibitorsOpen Acess Sources for Protein Interaction Inhibitors
Open Acess Sources for Protein Interaction InhibitorsChris Southan
 
Correct drug structures for pharmacology
Correct drug structures for pharmacologyCorrect drug structures for pharmacology
Correct drug structures for pharmacologyChris Southan
 
PubChem: A Public Chemical Information Resource for Big Data Chemistry
PubChem: A Public Chemical Information Resource for Big Data ChemistryPubChem: A Public Chemical Information Resource for Big Data Chemistry
PubChem: A Public Chemical Information Resource for Big Data ChemistrySunghwan Kim
 
PSI-MI standards and PSICQUIC
PSI-MI standards and PSICQUICPSI-MI standards and PSICQUIC
PSI-MI standards and PSICQUICRafael C. Jimenez
 

Similar to PubChem for drug discovery and chemical biology (20)

PubChem and Big Data Chemistry
PubChem and Big Data ChemistryPubChem and Big Data Chemistry
PubChem and Big Data Chemistry
 
Sorting bioactive wheat from database chaff
Sorting bioactive wheat from database chaffSorting bioactive wheat from database chaff
Sorting bioactive wheat from database chaff
 
Sorting bioactive wheat from database chaff: Challenges of discerning correct...
Sorting bioactive wheat from database chaff: Challenges of discerning correct...Sorting bioactive wheat from database chaff: Challenges of discerning correct...
Sorting bioactive wheat from database chaff: Challenges of discerning correct...
 
Revolution in the Connectivity Between Medicinal Chemistry and Biology
Revolution in the Connectivity Between Medicinal Chemistry and BiologyRevolution in the Connectivity Between Medicinal Chemistry and Biology
Revolution in the Connectivity Between Medicinal Chemistry and Biology
 
Evolving consensus-based curatorial strategies
Evolving consensus-based curatorial strategiesEvolving consensus-based curatorial strategies
Evolving consensus-based curatorial strategies
 
Pub Med to PubChem Connectivity
Pub Med to PubChem ConnectivityPub Med to PubChem Connectivity
Pub Med to PubChem Connectivity
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
 
Exploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsExploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural products
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Toxicological information in PubChem
Toxicological information in PubChemToxicological information in PubChem
Toxicological information in PubChem
 
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
 
Exploring Chemical and Biological Knowledge Spaces with PubChem
Exploring Chemical and Biological Knowledge Spaces with PubChemExploring Chemical and Biological Knowledge Spaces with PubChem
Exploring Chemical and Biological Knowledge Spaces with PubChem
 
Analysing targets and drugs to populate the GToP database
Analysing  targets and drugs to populate the GToP databaseAnalysing  targets and drugs to populate the GToP database
Analysing targets and drugs to populate the GToP database
 
Cheminformatics Support for MS Supporting Exposomics
Cheminformatics Support for MS Supporting ExposomicsCheminformatics Support for MS Supporting Exposomics
Cheminformatics Support for MS Supporting Exposomics
 
Open Acess Sources for Protein Interaction Inhibitors
Open Acess Sources for Protein Interaction InhibitorsOpen Acess Sources for Protein Interaction Inhibitors
Open Acess Sources for Protein Interaction Inhibitors
 
Correct drug structures for pharmacology
Correct drug structures for pharmacologyCorrect drug structures for pharmacology
Correct drug structures for pharmacology
 
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
 
PubChem: A Public Chemical Information Resource for Big Data Chemistry
PubChem: A Public Chemical Information Resource for Big Data ChemistryPubChem: A Public Chemical Information Resource for Big Data Chemistry
PubChem: A Public Chemical Information Resource for Big Data Chemistry
 
PSI-MI standards and PSICQUIC
PSI-MI standards and PSICQUICPSI-MI standards and PSICQUIC
PSI-MI standards and PSICQUIC
 
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
 

More from Chris Southan

FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCPChris Southan
 
Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityChris Southan
 
Peptide tribulations
Peptide tribulationsPeptide tribulations
Peptide tribulationsChris Southan
 
Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Chris Southan
 
Guide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updaeGuide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updaeChris Southan
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentChris Southan
 
Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?Chris Southan
 
Desperately seeking DARCP
Desperately seeking DARCPDesperately seeking DARCP
Desperately seeking DARCPChris Southan
 
Seeking glimmers of light in Pharos “Tdark” proteins
Seeking glimmers of light in  Pharos “Tdark” proteinsSeeking glimmers of light in  Pharos “Tdark” proteins
Seeking glimmers of light in Pharos “Tdark” proteinsChris Southan
 
5HT2A modulators update for SAFER
5HT2A modulators update for SAFER5HT2A modulators update for SAFER
5HT2A modulators update for SAFERChris Southan
 
Quality and noise in big chemistry databases
Quality and noise in big chemistry databasesQuality and noise in big chemistry databases
Quality and noise in big chemistry databasesChris Southan
 
Connecting chemistry-to-biology
Connecting chemistry-to-biology Connecting chemistry-to-biology
Connecting chemistry-to-biology Chris Southan
 
GtoPdb June 2019 poster
GtoPdb June 2019 posterGtoPdb June 2019 poster
GtoPdb June 2019 posterChris Southan
 
PubChem as a source of systems biology perturbagens
PubChem as a source of  systems biology perturbagensPubChem as a source of  systems biology perturbagens
PubChem as a source of systems biology perturbagensChris Southan
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand upChris Southan
 
Peptide Tribulations
Peptide TribulationsPeptide Tribulations
Peptide TribulationsChris Southan
 
Looking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRLooking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRChris Southan
 
Guide to Immunopharmacology update
Guide to Immunopharmacology updateGuide to Immunopharmacology update
Guide to Immunopharmacology updateChris Southan
 
Druggable Proteome sources in UniProt
Druggable Proteome sources in UniProtDruggable Proteome sources in UniProt
Druggable Proteome sources in UniProtChris Southan
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbChris Southan
 

More from Chris Southan (20)

FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCP
 
Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivity
 
Peptide tribulations
Peptide tribulationsPeptide tribulations
Peptide tribulations
 
Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2
 
Guide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updaeGuide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updae
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug Development
 
Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?
 
Desperately seeking DARCP
Desperately seeking DARCPDesperately seeking DARCP
Desperately seeking DARCP
 
Seeking glimmers of light in Pharos “Tdark” proteins
Seeking glimmers of light in  Pharos “Tdark” proteinsSeeking glimmers of light in  Pharos “Tdark” proteins
Seeking glimmers of light in Pharos “Tdark” proteins
 
5HT2A modulators update for SAFER
5HT2A modulators update for SAFER5HT2A modulators update for SAFER
5HT2A modulators update for SAFER
 
Quality and noise in big chemistry databases
Quality and noise in big chemistry databasesQuality and noise in big chemistry databases
Quality and noise in big chemistry databases
 
Connecting chemistry-to-biology
Connecting chemistry-to-biology Connecting chemistry-to-biology
Connecting chemistry-to-biology
 
GtoPdb June 2019 poster
GtoPdb June 2019 posterGtoPdb June 2019 poster
GtoPdb June 2019 poster
 
PubChem as a source of systems biology perturbagens
PubChem as a source of  systems biology perturbagensPubChem as a source of  systems biology perturbagens
PubChem as a source of systems biology perturbagens
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand up
 
Peptide Tribulations
Peptide TribulationsPeptide Tribulations
Peptide Tribulations
 
Looking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRLooking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIR
 
Guide to Immunopharmacology update
Guide to Immunopharmacology updateGuide to Immunopharmacology update
Guide to Immunopharmacology update
 
Druggable Proteome sources in UniProt
Druggable Proteome sources in UniProtDruggable Proteome sources in UniProt
Druggable Proteome sources in UniProt
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdb
 

Recently uploaded

Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
PubChem for drug discovery and chemical biology

  • 1. The utility of PubChem for academic drug discovery and chemical biology. Christopher Southan. Drug Discovery Seminar, Stockholm, March 4th, 2019. Hosted by Per Arvidsson.
  • 2. Abstract: Since PubChem (https://pubchem.ncbi.nlm.nih.gov/) surfaced in 2004 it has become the de facto global informatics hub, not just for chemistry but also for bioactivity. In addition, it is integrated within the very powerful Network Entrez system that offers connectivity to many other entities in the massive NCBI database resources, including the literature in PubMed, protein structures in PDB, genomic sequences, and MeSH terms. However, the statistics of content (97.2 million compounds, 3.4 million of which have 240 million activity results against 12K protein targets) can seem daunting to users. This presentation will give an overview of this content and the basic concepts behind the major divisions of submitted substances (SIDs), non-redundant compound entries (CIDs), and the mapping of activity results to the latter (PubChem BioAssay). This will be followed by introducing selected drug discovery-relevant resources inside PubChem, including ChEMBL and the IUPHAR/BPS Guide to Pharmacology. It will conclude with a series of use cases, including searching against the 23 million structures extracted from the IBM and SureChEMBL patent extraction sources in PubChem.
  • 3. Outline • Basic context and content • Drug records • Patent chemistry < > documents • Papers < > chemistry • Chemical biology, getting to probes • Submission • Conclusions • Further information
  • 5. Comparisons: PubChem is well cited
  • 6. PubChem CID growth 2005 - 2018
  • 7. The Triumvirate: Substance, Compound, BioAssay • SIDs can be biomolecules (e.g. large peptides and antibodies), no images • CIDs merge SMILES strings < 1000 atoms, to unique InChIs • Average SID:CID ~ 2.6, drugs ~50, aspirin (CID 2244) 307 plus 1563 mixture SIDs • CIDs 4.7% mixtures
  • 8. CID stats overview (March 2019)
  • 12. Multiplexing: one drug > many forms
  • 13. 7 same isotopes = different stereo
  • 14. 52 same connectivity = 7 stereo + 45 isotopes • Useful to understand and navigate these multiplexing chemistry rules • Many drug molecules have complex connectivities
  • 15. Will the real approved drugs please stand up? (I) Selecting from ChEMBL: 11629 drugs > 2716 Phase 4, 2194 with SMILES > ChEMBL IDs mapping to CIDs = 2193
  • 16. (II) Selecting from DrugBank: 10517 SIDs > 3144 approved drugs > mapping to 2086 CIDs
  • 17. (III) Selecting from Guide to Pharmacology: 9526 SIDs > 1509 approved drugs > mapping to 1321 CIDs
  • 18. The Venn (by CID) • Union total = 2965, so the 3-way intersect of 958 is only 32% • Ipso facto divergent approved drug structure capture
  • 19. Or use search history Booleans
  • 21. Cumulative patent-extracted CIDs • Drug discovery SAR in documents ~ 2 to 5x more than papers and years earlier • Majority of lead series now covered from automated extraction sources • BindingDB curates patent SAR (http://www.bindingdb.org/bind/ByPatent.jsp) and feeds to PubChem and ChEMBL • SureChEMBL is the only live source with ~ two-monthly updates • There are quality issues and overheads with automated chemistry extraction
  • 22. Patent analysis: SureChEMBL < > PubChem. Query via the ChEMBL search interface
  • 23. Patent document SAR > PubChem 1. > Patent (via SureChEMBL) 2. > SureChEMBL structures 3. > SAR for 52 examples in document 4. Example structure > PubChem 5. Check sources, patent number, similar compounds, ChEMBL and BioAssay intersects 6. Make SAR table 7. Check for missing compounds
  • 24. Tanimoto similarity shell for SAR "walking"
  • 25. PubChem indexes chemistry against the document. SciFinder had 212 Substances with 112 categorised as biological
  • 27. • At GtoPdb expert curators judge what a paper is "about" in terms of the key active compound to target-map • We focus on approved drugs, clinical candidates, immunopharmacology and most recently malaria • We link the PubMed ID (PMID) as a reference to that database record, the chemical structure and quantitative bioactivity • We then submit our entries as Substance Identifiers (SIDs) to PubChem, with comments and the references included in the files • PubChem, in turn, links our SIDs to our PMIDs (i.e. structure-to-document, s2d) • PubChem merges identical SIDs to CIDs and PMIDs from different sources that may index different structures • For GtoPdb all the links above are reciprocal • Other sources make analogous linking. "We have spent millions putting chemistry into PDFs but now we are spending more millions taking it back out" (Anon)
  • 28. 1. GtoPdb expert curators judge what a paper is "about" in terms of active compounds as key lead structures and molecular mechanism of action (mmoa) 2. We focus on approved drugs, clinical candidates, immunopharmacology and most recently malaria 3. We link the PubMed ID (PMID) as a reference to that database record, the chemical structure and quantitative bioactivity 4. We then submit our entries as Substance Identifiers (SIDs) to PubChem, with comments and the references included in the files 5. PubChem, in turn, links our SIDs to our PMIDs (i.e. structure-to-document, s2d) 6. PubChem merges identical SIDs to CIDs and PMIDs from different sources that may index different structures 7. For GtoPdb all the links above are reciprocal. "We have spent millions putting chemistry into PDFs but now we are spending more millions taking it back out" (Anon)
  • 29. Let’s try an author…
  • 30. The Entrez system has linked 48 CIDs to the 58 papers
  • 31. In this case there is a MeSH link to PubChem
  • 32. But there is a quirk…
  • 35. Chemical Probes: not cleanly indexed, so use an external source for "mapping in" to PubChem
  • 37. Checking InChIKey (IK) mapping fails by Googling the InChIKey
  • 39. Getting into PubChem is easier than you think
  • 40. Conclusions • PubChem has become an essential resource for drug discovery and chemical biology, academic as well as commercial • Risk of proprietary structure query interception is realistically zero • Many functionalities to explore and content sets to compare • Can seem daunting but simple questions can be answered and engagement practice pays off • The combination of selects, Boolean History combinations and NCBI Entrez connectivity is very powerful • Like all such sources, it has quirks, caveats, submitter quality issues, gotchas and constitutive cheminformatic challenges • No chemistry rules are perfect but PubChem’s work well and are navigable • Programmatic access by PUG REST, RDF triples: 138,312,069,777 • Has synergies with stand-alone sources such as ChEMBL, SureChEMBL, and Guide to Pharmacology • Literature to database connectivity is improving but there is still a big shortfall in SAR extraction from papers into CIDs and BioAssay
  • 43. Further PubChem tips and tricks: https://www.slideshare.net/cdsouthan/presentations https://cdsouthan.blogspot.com/
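
The slides mention programmatic access by PUG REST and compare approved-drug sets by CID (slide 18). As a minimal sketch of how those two ideas might be combined, the Python below builds PUG REST URLs for the compound domain and summarises a three-way overlap of CID sets. The helper names `pug_rest_url` and `three_way_overlap` are hypothetical and not from the talk, no network request is sent, and the example identifiers (aspirin as CID 2244 and its InChIKey) are illustrative choices.

```python
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(namespace, identifier, operation, output="JSON"):
    """Build a PUG REST URL for the compound domain (hypothetical helper).

    namespace  -- how the compound is specified, e.g. "cid" or "inchikey"
    identifier -- the CID or InChIKey itself
    operation  -- what to return, e.g. "sids" or "cids"
    No request is sent; the caller decides how to fetch the URL.
    """
    return (
        f"{PUG_REST_BASE}/compound/{namespace}/"
        f"{quote(str(identifier))}/{operation}/{output}"
    )


def three_way_overlap(a, b, c):
    """Return (union size, 3-way intersect size, intersect as % of union)."""
    a, b, c = set(a), set(b), set(c)
    union = a | b | c
    core = a & b & c
    pct = 100.0 * len(core) / len(union) if union else 0.0
    return len(union), len(core), pct


if __name__ == "__main__":
    # SIDs merged into one CID (aspirin, CID 2244).
    print(pug_rest_url("cid", 2244, "sids"))
    # CID lookup from an InChIKey (aspirin).
    print(pug_rest_url("inchikey", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "cids"))
    # Toy CID sets standing in for three sources; real lists would be far larger.
    print(three_way_overlap({1, 2, 3}, {2, 3, 4}, {3, 4, 5}))
```

With the figures quoted on slide 18, the same arithmetic gives 958 / 2965, which rounds to the 32% stated there. Keeping URL construction separate from fetching leaves the choice of HTTP client and request throttling to the caller.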