Assessing GtoPdb ligand content in PubChem

Christopher Southan, Elena Faccenda, Simon J. Harding, Joanna L. Sharman, Adam J. Pawson, and Jamie A. Davies
Centre for Integrative Physiology, The University of Edinburgh, EH8 9XD, UK
www.guidetopharmacology.org
http://www.slideshare.net/cdsouthan/assessing-gtopdb-ligand-content-in-pubchem
INTRODUCTION
The International Union of Basic and Clinical Pharmacology and British Pharmacological Society (IUPHAR/BPS) Guide to PHARMACOLOGY database (GtoPdb) and its precursor IUPHAR-DB have been capturing the structures of pharmacologically relevant ligands since 2005 [1]. Our ligands are classified into eight categories (shown as a snapshot on the original poster). As part of an active collaboration with the PubChem team, we have submitted our ligand records for every GtoPdb release since 2012. For release 2016.4 (October), the query ("IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName]) retrieves 8674 Substance Identifiers (SIDs) and 6565 Compound Identifiers (CIDs). The excess of 2109 SIDs is accounted for by antibodies, small proteins and larger peptides that cannot form CIDs. With just over 92 million CIDs from 473 sources, a range of property filters and full Boolean operations for combining query sets, PubChem provides an opportunity to “slice and dice” our ligand set in comparative and informative ways. A small set of example results is shown below.
RESULTS
The intersects of our CIDs with other PubChem sources and subsets are outlined below, in order of counts:
• CNER refers to “Chemical Named Entity Recognition”, the automated extraction of chemistry from patents by sources submitting to PubChem (of which SureChEMBL is the largest, at 16.3 million). This means that users can trace most of our ligands back to early patent filings, which often include more SAR than eventually appeared in the papers.
• Our low overlap with DrugBank indicates that the two sources are complementary in their selection of bioactive compounds (the OR union is 12605).
• The possibility of sourcing purchasable compounds is important for experimental pharmacologists. We have nearly 80% overlap with the 64 million vendor structures in PubChem, and similarity searches may pick up analogues where there is no exact match.
• The “BioAssay active” tag overlaps extensively with ChEMBL entries, but users can check for a range of activities for a ligand that may be additional to the values we have extracted from selected papers.
• The MeSH term “pharmacological action” is useful, but our impression is that NLM is falling behind in indexing this term in PubChem.
• PDB ligand structures are valued database cross-references for many reasons.
• We have introduced a new feature that allows users to retrieve just our 1291 approved-drug SID entries (query: approved[Comment] AND "IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName]). Selecting “PubChem Same Compound” then generates 1174 small-molecule CIDs. This facilitates different types of comparative analysis between drug lists.
• As expected, our overlap with ChEMBL structures is high, but we have captured 1147 structures not in this source, mainly due to differences in journal coverage and our shorter release cycles.
• The selection “unique to GtoPdb” indicates those CIDs for which we are the only source in the whole of PubChem. These are predominantly novel structures we have extracted from papers, but in some cases we have selected a different structure from the one chosen by other sources.
• There may be interest in which pharmacologically active peptides we have CIDs for. A simple molecular weight (Mw) cut-off isolates 178 entries.
Further details on the intersects above are given in this GtoPdb blog post: https://blog.guidetopharmacology.org/2016/10/31/gtopdb-ligands-in-pubchem/
This post about PubChem sources in general may also be of interest: https://cdsouthan.blogspot.se/2016/06/pubchem-source-of-month.html
Reference [1]: Southan et al. “The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands”. Nucleic Acids Research, 2016 Jan 4;44(D1) Database Issue:D1054-68. PMID: 2646443
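The SourceName queries quoted in this poster can also be run programmatically. The following is a minimal sketch that assumes NCBI's E-utilities esearch service is used against the PubChem Substance database (db=pcsubstance) and requests only the hit count. The function and constant names are hypothetical, and counts returned today will differ from the release 2016.4 figures given here.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities esearch endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez queries quoted in this poster (assumed to run against PubChem Substance)
ALL_GTOPDB = '"IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName]'
APPROVED_GTOPDB = ('approved[Comment] AND '
                   '"IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName]')


def esearch_url(term: str, db: str = "pcsubstance") -> str:
    """Build an esearch URL that returns only the hit count, as JSON."""
    params = {"db": db, "term": term, "retmax": 0, "retmode": "json"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(payload: str) -> int:
    """Read the hit count from an esearch JSON response body."""
    return int(json.loads(payload)["esearchresult"]["count"])


def fetch_count(term: str, db: str = "pcsubstance") -> int:
    """Run the query live (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(esearch_url(term, db), timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

For example, fetch_count(APPROVED_GTOPDB) should return the current number of approved-drug SIDs. Keeping URL building and parsing separate from the network call makes both parts easy to check offline.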
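The intersect counts above rest on simple set algebra over CID lists: AND gives overlaps, OR gives unions, and NOT gives selections such as “unique to GtoPdb”. The following minimal sketch uses hypothetical toy CID sets (not PubChem data) and hypothetical function names. It also checks two pieces of arithmetic from the text. The DrugBank figure assumes that the 12605 union was taken against the same 6565-CID GtoPdb set.

```python
from typing import Iterable


def intersect_report(ours: set[int], other: set[int]) -> dict[str, int]:
    """Sizes of the Boolean combinations of two CID sets."""
    return {
        "AND": len(ours & other),  # shared CIDs (overlap)
        "OR": len(ours | other),   # union of both sources
        "NOT": len(ours - other),  # ours AND NOT other
    }


def unique_to(ours: set[int], others: Iterable[set[int]]) -> set[int]:
    """CIDs present in `ours` and in none of the other sources."""
    seen_elsewhere: set[int] = set().union(*others)
    return ours - seen_elsewhere


# Hypothetical toy CID sets, for illustration only (not real PubChem data)
gtopdb = {1, 2, 3, 4, 5}
drugbank = {4, 5, 6, 7}
chembl = {2, 3, 4, 8}

report = intersect_report(gtopdb, drugbank)
# Inclusion-exclusion: |A OR B| = |A| + |B| - |A AND B|
assert report["OR"] == len(gtopdb) + len(drugbank) - report["AND"]

# The same identity applied to the poster's counts (assumption: the union was
# taken against the same 6565-CID GtoPdb set) gives DrugBank CIDs not in GtoPdb
drugbank_not_gtopdb = 12605 - 6565
# Excess of SIDs over CIDs, as stated in the text
sid_excess = 8674 - 6565
```

The same operations map directly onto PubChem's Boolean query combination, so a local check like this can help validate counts exported from the web interface.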