Southan real drugs_paris_oct_11_2014


The challenges of curating approved medicines: will the real drugs please stand up?


  1. Challenges of curating approved medicines: Will the real drugs please stand up? Chris Southan, representing the Database Team. NC-IUPHAR/BPS/GTPdb Biannual Meeting, Paris, October 2014
  2. What is the total for approved drug structures? Take your pick…
  3. Discordance between sources inside PubChem
  4. Explanations
     • Discordance: distinctly different drug molecular representations from different sources that we would recognise canonically as the same bioactive substance
     • These are split across multiple CIDs per drug (i.e. “multiplexed”) by the PubChem chemistry rules because of:
       – Permutation of R/S stereo centers
       – Salt forms
       – Mixtures
       – Unresolved E/Z bonds
       – Tautomers
       – Isotopic derivatives, including deuteration
  5. Causes of drug structure multiplexing
     • Inherent challenges and complexities of chemical representation
     • Utility of PubChem depends on advanced rules applied to a submission-based system
     • Drug companies never verify their own structures in public databases
     • Legacy of structure image primacy in documents
     • No clear accountability for correctness of public approved drug structures (companies? FDA? WHO (INN)? AMA (USAN)? Wikipedia? CAS?)
     • Structural variants enter databases from general source proliferation, large-scale patent extractions, chemical vendor submissions and repeated exemplifications in journals
     • The net effect is an inexorable increase in multiplexing, though not necessarily in erroneous structures per se
  6. A case of the wrong name > structure
  7. Fixing errors: doing our bit
  8. Taxol: a challenging example
  9. Finding the links: multiplexed to 129 CIDs
  10. Reading the links for alternative taxols: different structures > 20 sets of assay results
  11. Virtual deuteration: compounding drug multiplexing
  12. Scale of the issue for approved drugs in PubChem: multiplexing expansion from 2005 to 2014
  13. So how are we doing in our database?
     • Sets were salt-stripped for this comparison
     • GTPdb (Oct 2014) has 983 approved drug CIDs concordant with either ChEMBL or DrugBank
     • But only 723 are 4-way concordant
     • We will inspect the 152, 192 and 180 sectors for consensus expansion
  14. Consequences and possible solutions to the drug multiplexing issue
     • Our drug annotation committee cannot magic these issues away, but its support is crucial
     • Our consensus approach is useful and statistically defensible
     • In the GTPdb we add curator comments and cross-pointers for key multiplexed examples
     • Sources that make the effort to collate drug structure sets should cross-corroborate more
     • A canonical approach to merging drug structure-to-bioactivity mappings could be considered
     • The inner connectivity layer of the InChIKey goes some way towards this
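
The last point on slide 14, together with the consensus counting on slide 13, can be illustrated with a minimal Python sketch. It assumes each record is a (CID, standard InChIKey) pair and uses the first 14-character block of the key, which hashes the connectivity (skeleton) layer, as the merge key, so that stereo and isotopic variants of one drug fall into a single group. Salts and mixtures have different connectivity blocks, so they would still need salt-stripping first, as was done for slide 13. The function names, CIDs, keys and source names are hypothetical placeholders in InChIKey shape, not data from the talk or from PubChem.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-character block of a standard InChIKey."""
    parts = inchikey.strip().upper().split("-")
    if len(parts) != 3 or len(parts[0]) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return parts[0]


def group_by_connectivity(records):
    """Map each connectivity block to the sorted CIDs that share it."""
    groups = defaultdict(set)
    for cid, key in records:
        groups[connectivity_block(key)].add(cid)
    return {block: sorted(cids) for block, cids in groups.items()}


def concordance(sources):
    """Count how many sources list each connectivity block."""
    counts = defaultdict(int)
    for blocks in sources.values():
        for block in set(blocks):
            counts[block] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical records: CIDs 101 and 102 differ only in the second
    # block (stereo/isotope layers), so they collapse into one group.
    records = [
        (101, "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),
        (102, "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
        (103, "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),
    ]
    print(group_by_connectivity(records))

    # Hypothetical source sets, already reduced to connectivity blocks.
    sources = {
        "source_a": {"AAAAAAAAAAAAAA", "CCCCCCCCCCCCCC"},
        "source_b": {"AAAAAAAAAAAAAA"},
        "source_c": {"AAAAAAAAAAAAAA", "CCCCCCCCCCCCCC"},
    }
    counts = concordance(sources)
    # Blocks present in every source are the fully concordant ones.
    print(sorted(b for b, n in counts.items() if n == len(sources)))
```

Grouping on the first block is deliberately coarse: it merges stereoisomers that may be pharmacologically distinct, which is one reason the slide says the connectivity layer only goes "some way" towards a canonical merge.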