1. Challenges of curating approved medicines:
Will the real drugs please stand up?
Chris Southan, representing the Database Team
NC-IUPHAR/BPS/GTPdb Biannual Meeting, Paris, October 2014
2. What is the total for approved drug structures?
Take your pick …
4. Explanations
• Discordance: distinctly different drug molecular representations from
different sources that we would recognise canonically as the same
bioactive substance
• These resolve to multiple CIDs per drug (i.e. are “multiplexed”) under the
PubChem chemistry rules because of:
– Permutation of R/S stereo centers
– Salt forms
– Mixtures
– Unresolved E/Z bonds
– Tautomers
– Isotopic derivatives including deuteration
5. Causes of drug structure multiplexing
• Inherent challenges and complexities of chemical representation
• Utility of PubChem depends on advanced rules applied to a submission-based
system
• Drug companies never verify their own structures in public databases
• Legacy of structure image primacy in documents
• No clear accountability for correctness of public approved drug structures
(companies? FDA? WHO (INN)? AMA (USAN)? Wikipedia? CAS?)
• Structural variants enter databases from general source proliferation,
large-scale patent extractions, chemical vendor submissions and
repeated exemplifications in journals
• The net effect is an inexorable increase in multiplexing but not necessarily
erroneous structures per se
12. Scale of the issue for approved drugs in PubChem:
multiplexing expansion from 2005 to 2014
13. So how are we doing in our database?
• Sets were salt-stripped for this comparison
• GTPdb (Oct 2014) has 983 approved drug CIDs concordant with either
ChEMBL or DrugBank
• But only 723 are 4-way concordant
• We will inspect the 152, 192 and 180 sectors for consensus expansion
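A minimal sketch of how such a concordance comparison could be computed, assuming each source has already been reduced to a set of salt-stripped CIDs. The CIDs below are made-up toy values, and "SourceD" is a hypothetical stand-in for the fourth source, which is not named on this slide.

```python
def concordance_sectors(sources):
    """Map each combination of source names to the CIDs found in exactly those sources.

    `sources` is a dict of source name -> set of CIDs. Each key of the result
    corresponds to one sector of a Venn diagram over the sources.
    """
    sectors = {}
    all_cids = set().union(*sources.values())
    for cid in all_cids:
        members = frozenset(name for name, cids in sources.items() if cid in cids)
        sectors.setdefault(members, set()).add(cid)
    return sectors


# Toy data only: these CIDs are illustrative, not real approved-drug identifiers.
sources = {
    "GTPdb": {1, 2, 3, 4, 5},
    "ChEMBL": {1, 2, 3, 6},
    "DrugBank": {1, 2, 4, 7},
    "SourceD": {1, 3, 4, 8},  # hypothetical fourth source
}

sectors = concordance_sectors(sources)

# CIDs present in every source (the N-way consensus).
four_way = sectors.get(frozenset(sources), set())

# GTPdb CIDs corroborated by at least one of ChEMBL or DrugBank.
gtpdb_confirmed = sources["GTPdb"] & (sources["ChEMBL"] | sources["DrugBank"])

# Three-way sectors that include GTPdb: candidates for consensus expansion.
expansion_candidates = {
    members: cids
    for members, cids in sectors.items()
    if "GTPdb" in members and len(members) == len(sources) - 1
}
```

Keying the sectors by the exact combination of sources keeps the three-way sectors separate, so each can be inspected on its own.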
14. Consequences and possible solutions to the
drug multiplexing issue
• Our drugs annotation Committee cannot magic these issues away
but their support is crucial
• Our consensus approach is useful and statistically defensible
• In the GTPdb we add curator comments and cross-pointers for key
multiplexed examples
• Sources that make the effort to collate drug structure sets should
cross-corroborate more
• A canonical approach to merging drug structure-to-bioactivity
mappings could be considered
• The inner connectivity layer of the InChIKey goes some way towards
this
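As a rough illustration of the InChIKey point, the sketch below groups records by the first 14-character block of the InChIKey, which hashes the connectivity (skeleton) layer. The function names are hypothetical and the keys are placeholders in InChIKey format, not real drug identifiers.

```python
from collections import defaultdict


def connectivity_block(inchikey):
    """Return the first (14-character) block of an InChIKey.

    An InChIKey has the form XXXXXXXXXXXXXX-YYYYYYYYYY-Z; the first block
    is a hash of the connectivity layer of the InChI.
    """
    parts = inchikey.strip().upper().split("-")
    if len(parts) != 3 or len(parts[0]) != 14:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return parts[0]


def group_by_connectivity(records):
    """Group (cid, inchikey) pairs that share the same connectivity block."""
    groups = defaultdict(list)
    for cid, inchikey in records:
        groups[connectivity_block(inchikey)].append(cid)
    return dict(groups)


# Placeholder records: made-up CIDs and made-up keys in valid InChIKey format.
records = [
    (101, "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),  # e.g. stereo left undefined
    (102, "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),  # e.g. one defined stereoisomer
    (103, "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),  # e.g. an isotopic variant
    (201, "DDDDDDDDDDDDDD-UHFFFAOYSA-N"),  # different skeleton
]

groups = group_by_connectivity(records)
```

This goes only part of the way: stereo and isotopic variants share a first block, but salt forms and mixtures change the connectivity layer, so they still need salt-stripping or similar handling before grouping.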