Southan real drugs_paris_oct_11_2014



  1. Challenges of curating approved medicines: Will the real drugs please stand up? Chris Southan, representing the Database Team. NC-IUPHAR/BPS/GTPdb Biannual Meeting, Paris, October 2014
  2. What is the total for approved drug structures? Take your pick …
  3. Discordance between sources inside PubChem
  4. Explanations
     • Discordance: distinctly different drug molecular representations from different sources that we would recognise canonically as the same bioactive substance
     • These are merged into multiple CIDs per drug (i.e. “multiplexed”) via the PubChem chemistry rules due to:
       – Permutation of R/S stereo centers
       – Salt forms
       – Mixtures
       – Unresolved E/Z bonds
       – Tautomers
       – Isotopic derivatives including deuteration
  5. Causes of drug structure multiplexing
     • Inherent challenges and complexities of chemical representation
     • Utility of PubChem depends on advanced rules applied to a submission-based system
     • Drug companies never verify their own structures in public databases
     • Legacy of structure image primacy in documents
     • No clear accountability for correctness of public approved drug structures (companies? FDA? WHO (INN)? AMA (USAN)? Wikipedia? CAS?)
     • Structural variants enter databases from general source proliferation, large-scale patent extractions, chemical vendor submissions and repeated exemplifications in journals
     • The net effect is an inexorable increase in multiplexing but not necessarily erroneous structures per se
  6. A case of the wrong name > structure
  7. Fixing errors: doing our bit
  8. Taxol: a challenging example
  9. Finding the links: multiplexed to 129 CIDs
  10. Reading the links for alternative taxols: different structures > 20 sets of assay results
  11. Virtual deuteration: compounding drug multiplexing
  12. Scale of the issue for approved drugs in PubChem: multiplexing expansion from 2005 to 2014
  13. So how are we doing in our database?
      • Sets were salt-stripped for this comparison
      • GTPdb (Oct 2014) has 983 approved drug CIDs concordant with either ChEMBL or DrugBank
      • But only 723 are 4-way concordant
      • We will inspect the 152, 192 and 180 sectors for consensus expansion
  14. Consequences and possible solutions to the drug multiplexing issue
      • Our drugs annotation Committee cannot magic these issues away but their support is crucial
      • Our consensus approach is useful and statistically defensible
      • In the GTPdb we add curator comments and cross-pointers for key multiplexed examples
      • Sources that make the effort to collate drug structure sets should cross-corroborate more
      • A canonical approach to merging drug structure-to-bioactivity mappings could be considered
      • The inner connectivity layer of the InChIKey goes some way towards this
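
The InChIKey point on the last slide, and the consensus comparison on slide 13, can be sketched in a few lines of Python. This is a minimal illustration, not the GTPdb workflow: the function names, CIDs and InChIKeys below are made-up placeholders, and the only assumption about real data is that each record carries a standard InChIKey, whose first 14-character block encodes the connectivity layer.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-character block of a standard InChIKey."""
    block = inchikey.strip().upper().split("-")[0]
    if len(block) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return block


def group_by_connectivity(records):
    """Map each connectivity block to the set of CIDs that share it."""
    groups = defaultdict(set)
    for cid, inchikey in records:
        groups[connectivity_block(inchikey)].add(cid)
    return dict(groups)


def concordant_blocks(sources):
    """Return the connectivity blocks present in every source."""
    block_sets = [
        {connectivity_block(key) for key in keys} for keys in sources.values()
    ]
    return set.intersection(*block_sets) if block_sets else set()


# Hypothetical records: (CID, InChIKey). The keys are placeholders,
# not real compounds.
records = [
    (101, "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),  # flat structure
    (102, "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),  # same skeleton, stereo variant
    (103, "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),  # unrelated structure
]

groups = group_by_connectivity(records)
# Blocks with more than one CID are candidates for "multiplexed" drugs.
multiplexed = {block: cids for block, cids in groups.items() if len(cids) > 1}

# Hypothetical source sets, compared at the connectivity level.
sources = {
    "source_a": ["AAAAAAAAAAAAAA-UHFFFAOYSA-N", "CCCCCCCCCCCCCC-UHFFFAOYSA-N"],
    "source_b": ["AAAAAAAAAAAAAA-BBBBBBBBSA-N"],
}
consensus = concordant_blocks(sources)
```

Grouping on the first block collapses stereo and isotopic variants, which live in the second block of the key. Salt forms and mixtures have a different connectivity block, so they would still need the salt-stripping step mentioned on slide 13 before a comparison like this.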