Will the Correct Drugs Please Stand up?
Chris Southan, Elena Faccenda, Simon J. Harding,
Joanna L. Sharman, Adam J. Pawson, and Jamie A. Davies
IUPHAR/BPS Guide to Pharmacology (GtoPdb)
University of Edinburgh, Centre for Integrated Physiology, EH8 9XD, UK.
Presentation for the 12th GCC Fulda, November 2016
1
Declarations
2
• Since circa 2005, team members working on IUPHAR-DB, which became
GtoPdb in 2012, have been curating the structures of approved drugs for
human diseases
• Partly as a consequence of the work presented here, we claim neither a
definitive, nor an error-free, nor a complete approved set
• We have encountered most of the problems associated with this exercise
first hand, so we empathise with teams grappling with the same issues
• We are grateful to all the sources in PubChem used in this comparison study
• The highlighting of inter-source discordances and particular examples in this
presentation should not be misinterpreted as criticism of those sources
Surfaced totals: take your pick
3
Intersects between three sources: 2006-2009
4
[Figure panels: 2009 and 2006]
Context of the current work
• Since ~2013 the GtoPdb team noticed that the structure space around
approved drugs was becoming increasingly multiplexed and “fuzzy”
• Curatorial choices were consequently getting more difficult from a
pharmacological angle
• We thus needed a molecular perspective on causes and consequences of
this “fuzz” with a view to reappraising our drug curation strategy
• Updating PMID 20298516 was a logical approach but the methods used
then had been largely superseded
• We had increased our PubChem exploitation by:
– paying close attention to our regular substance submissions and refreshes
– using it for curatorial selection
– exploring “fuzz” via relationship navigation in PubChem
– finding more approved drug sources that we could compare directly inside PubChem
• It thus became feasible to explore approved drug comparisons entirely
within PubChem
5
Methods outline
• Identify submitters in PubChem that were expected to encompass FDA and
other approved drug structures represented by CIDs (i.e. excluding large
biologicals)
• In some cases coverage was SID-tagged (e.g. DrugBank), in others explicit (e.g.
INN/USAN), and in others implicit (e.g. FDA UNIIs)
• Select and/or convert each source to a PubChem CID list (or extrinsically for
ChEMBL approved)
• Compare these sources at the CID level
• Look at overlaps between four curated sets (using the Venny tool)
• Analyse intersects and diffs by PubChem relationships (next slide)
6
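The CID-level comparison outlined above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used for the talk (which ran inside PubChem and the Venny tool): the source names and CID values below are hypothetical toy data, and `venn_regions` is a helper name introduced only for this sketch.

```python
def venn_regions(sources):
    """Map each combination of source names to the CIDs found in exactly those sources."""
    regions = {}
    all_cids = set().union(*sources.values())
    for cid in all_cids:
        # Record which sources contain this CID
        members = frozenset(name for name, cids in sources.items() if cid in cids)
        regions.setdefault(members, set()).add(cid)
    return regions


# Hypothetical toy CID lists standing in for four curated sources
sources = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 4, 5},
    "C": {3, 4, 6},
    "D": {4, 7},
}

regions = venn_regions(sources)

# CIDs present in all four sources (the consensus region)
consensus = regions.get(frozenset(sources), set())

# CIDs present in one source only (the source-unique "orphans")
orphans = {name: regions.get(frozenset([name]), set()) for name in sources}
```

Each key of `regions` corresponds to one segment of the Venn diagram, so the consensus and the orphan sets fall out of the same pass over the data.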
Drug relationship interrogation
using the PubChem rules
7
Eight Sources for comparison
8
Sequential overlaps for approved drugs
9
• Taking the 1900 ChEMBL approved CIDs as a starting point and intersecting across these 8 sources
• Left only 183 CIDs in common
• Doing seven intersects without any approved sets gave 373
• Adding FDA/MDD (1216) gave 198
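A sequential overlap of this kind amounts to a running set intersection. The sketch below is illustrative only: the source names and CIDs are made-up toy values, the function name `sequential_overlap` is an assumption of this example, and the real counts on the slide came from PubChem queries rather than from this code.

```python
def sequential_overlap(ordered_sources):
    """Return (source name, CIDs remaining) after intersecting each source in turn."""
    remaining = None
    trail = []
    for name, cids in ordered_sources:
        # The first source seeds the set; each later source can only shrink it
        remaining = set(cids) if remaining is None else remaining & set(cids)
        trail.append((name, len(remaining)))
    return trail


# Hypothetical toy data: a starting set followed by three further sources
ordered_sources = [
    ("start", {1, 2, 3, 4, 5, 6}),
    ("source_2", {2, 3, 4, 5, 6, 7}),
    ("source_3", {3, 4, 5, 9}),
    ("source_4", {4, 5, 10}),
]

trail = sequential_overlap(ordered_sources)
for name, count in trail:
    print(f"{name}: {count} CIDs in common so far")
```

The final count does not depend on the order of the sources, but the intermediate counts do, which shows which source causes the largest drop.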
Four-source intersects at the CID level
10
Comparative parameters for the splits
11
Consensus drug submissions: popular but old
12
• Nothing more recent than 2011 (CID
54677470 meloxicam)
Niacin tops the pops
Source-unique
entries (orphans)
13
DrugCentral orphan (from 29)
14
Cross-checking Taltirelin
15
DrugBank approved orphan (from 6)
16
NPC orphan (from 14)
17
ChEMBL approved orphan (from 25)
18
Consensus drug multiplexing in PubChem
19
Thomson Pharma
SCRIPDB
SureChEMBL
But some mixtures can be “correct” drugs
20
Conclusions
• Discordances between sources and counts of approved drug structures
mapped into PubChem give cause for concern
• As a testimony to the challenge, this is certainly no one's “fault”
• Clean selects for approved structures from different sources should be
easier (e.g. direct InChIKey downloads)
• The PubChem selection functionality and relationship navigation
facilitate exploring the causes of discordance
• It's no surprise that confounding factors include chiral complexity and
mixtures
• It's not clear if different extrinsic comparison methods (e.g. InChIKey
and/or CACTVS toolkit) would give more optimistic results
• Issues around structural multiplexing extend to all bioactive database
entries, not just approved drugs
• Patent extractions and vendors in PubChem are valuable but contribute
to extensive multiplexing of drug structures (e.g. virtual deuteration)
21
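One way an extrinsic InChIKey comparison could expose chiral multiplexing is to group keys by their first block. In a standard InChIKey the first 14 characters hash the connectivity, while the second block encodes stereochemistry, isotopes and other layers, so keys that share a first block but differ afterwards are candidate stereo or isotopic variants of the same skeleton. The sketch below was not part of the study; the keys are placeholder strings rather than real InChIKeys, and the function names are assumptions of this example.

```python
from collections import defaultdict


def group_by_skeleton(inchikeys):
    """Group InChIKeys by their first (connectivity) block."""
    groups = defaultdict(set)
    for key in inchikeys:
        skeleton = key.split("-")[0]  # first 14-character block
        groups[skeleton].add(key)
    return dict(groups)


def multiplexed(groups):
    """Keep only the skeletons represented by more than one full InChIKey."""
    return {skel: keys for skel, keys in groups.items() if len(keys) > 1}


# Placeholder strings in InChIKey layout (NOT real compounds)
keys = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # one form of skeleton A
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",  # a second form of skeleton A
    "DDDDDDDDDDDDDD-EEEEEEEESA-N",  # skeleton D, single form
]

groups = group_by_skeleton(keys)
fuzzy = multiplexed(groups)
```

Comparing sources on the first block alone would merge such variants and so raise the apparent overlap, at the cost of no longer distinguishing which stereo form each source holds.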
Consequences
• There is neither a definitive set of approved drug structures nor any
consensus on totals, FDA or global
• Considering these are the “Crown Jewels” of many decades of global R&D for
human medicines, they could be better looked after
• There is no data to suggest commercial collations are significantly less
affected than public ones
• The broader ramifications of the problem are unclear but certainly present
pitfalls that have affected, or will affect, published work
• For QSAR, the edict “Trust but verify” remains particularly apposite
• Big Data and the mega-portals will simply transitively subsume and recycle
the discordances
• GtoPdb now takes a more pragmatic and parsimonious approach to
approved drug annotation, including adding more curators' notes
• Note this work only addresses struc-to-struc issues. These are compounded
by additional multiplexing problems of name-to-struc and struc-to-activity
22
Amelioration ideas
• The single biggest step forward would be for drug development
organisations globally (mainly pharma) to provenance and submit their
own “Gold Standard” structures that are under regulatory consideration,
directly from their internal registration systems into PubChem
• This avoids the mapping spaghetti in the “system” of IND, FDA, INN,
USAN, CAS, that mainly shuffle around PDF images and IUPAC names
• The countless US Government-affiliated systems and initiatives could
improve their cross-normalisation
• Will the Global Ingredient Archival System (GinAS) rule them all?
• We should explore inter-source collaborative cross-curation
• Crowdsourcing efforts, such as Wikidata, might improve the situation
• Given what look to be insurmountable challenges, should we consider
ensemble/cluster-based solutions? (e.g. OpenPhacts semantic lenses)
23
Thank you, questions welcome
now and/or later over a glass of something perhaps…
24
Poster 34
See me for a flyer
http://www.slideshare.net/cdsouthan/
http://www.slideshare.net/cdsouthan/will-the-correct-drugs-please-stand-up-68239021
You can pick up the 459 in this MyNCBI link
https://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/51348141/public/
Let me know if you need any of the other sets
