Peptide Tribulations

Presentation for BPS Pharmacology 2018, London

  1. 1. Tribulations of curating published key bioactive peptides for the Guide to PHARMACOLOGY. Christopher Southan, Joanna L. Sharman, Adam J. Pawson, Simon D. Harding, Elena Faccenda and Jamie A. Davies, IUPHAR/BPS Guide to Pharmacology, Discovery Brain Sciences, University of Edinburgh, UK. BPS 2018 Molecular and Cellular Pharmacology Oral Communications 1, Tuesday, December 18, 15:00
  2. 2. Abstract (will not be shown) Introduction: The crucial roles of bioactive peptides in pharmacology, drug discovery and chemical biology are well established. Consequently, the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) and its precursor IUPHAR-DB have been curating peptide entries for over a decade. While small-molecule chemical structures have curatorial challenges with which the GtoPdb team has to grapple, these are exacerbated for peptides. Because of their increasing importance both in endogenous pharmacology (e.g. GPCR ligands) and in the development of new exogenous modified peptide therapeutics, we undertook a review of our peptide statistics, curation strategies, indexing in PubChem and enhancement options. Methods: We assessed our internal peptide statistics for release 2018.3, including our submitted PubChem substance entries (SIDs), and undertook a retrospective assessment of their tribulations. We also looked at equivocality problems with searching peptides in PubChem and major sequence sources. To enhance our own curation, we piloted the Sugar and Splice (S&S) program from NextMove Software to convert more of our medium-sized peptides from sequence strings, including formally specified post-translational modifications (PTMs), to SMILES molecular representations that could then merge with PubChem compound entries (CIDs). Results: The current database includes 786 endogenous and 1310 exogenous peptide entries (n.b. the presentation will update these stats from the upcoming 2018.4 release). These are nested within our 9345 PubChem SIDs but many have not formed CIDs. Legacy problems were mostly due to equivocal structural specifications of PTMs and exact positions of radiolabel incorporations. However, our capturing of at least a primary sequence string is a compromise that users can match by BLAST search.
As an example, exploring “Endothelin-1” in PubChem and by NCBI sequence search exposed major name-to-structure mapping problems and multiple structures, including Swiss-Prot features for the precursor protein. We assessed the major problem of similarity ascertainment for peptides because they are too large for chemical clustering but too small for clean sequence searching. We successfully incorporated S&S into our peptide curation triage and converted many legacy sequences to SMILES strings. Conclusion: Despite their increasing importance, pharmacology database entries for bioactive peptides are associated with tribulations that GtoPdb, PubChem and other databases have so far confronted with only partial success. Many are associated with equivocal representations in papers that thus render many reported experiments irreproducible. We urge authors and journal editors to increase the specificity of peptide specifications. In collaboration with PubChem and NextMove we are improving our peptide curation, including for some of our legacy entries.
  3. 3. Outline • Introducing GtoPdb • GtoPdb peptide content and stats • Peptide tribulations • PubChem peptidic pros and cons • Getting more peptides > SMILES • Stats and examples • Exploiting PubChem SID tagging • Where we go from here • Further information
  4. 4. Introducing the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) • IUPHAR = International Union of Basic and Clinical Pharmacology, BPS = British Pharmacological Society • Formerly known as IUPHAR-DB for receptors and channels since 2003 • Since 2012 funded by the Wellcome Trust to cover all targets in the human genome • Since 2015 Wellcome Trust “fork” as Guide to IMMUNOPHARMACOLOGY • Molecular mechanism of action (mmoa) mapping primary & secondary targets • Release cycle time (with PubChem refreshes) ~ 2 months • Six NAR Annual Database issues, latest as PMID 29149325 (2018) • Distilled into the 2-yearly British Journal of Pharmacology “Concise Guide to PHARMACOLOGY” as a nine-paper series (see PMID 29055037) with outlinks • Presents users with selected quality compounds for pharmacology research in silico, in vitro, in cellulo, in vivo, in clinico • An ELIXIR UK Node resource since 2016
  5. 5. The GtoPdb hallmark: quantitative binding data • Document > assay > result > compound > location > protein target (D-A-R-C-L-P) • Where “C” is not a small molecule, we have ~ 2000 peptides included in the ~ 9000 substances we submit to PubChem
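The D-A-R-C-L-P chain on this slide can be pictured as a simple record type. The sketch below is only an illustration: the class name, field names and placeholder values are assumptions made here and do not reflect the actual GtoPdb schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BindingRecord:
    """Hypothetical record mirroring the slide's D-A-R-C-L-P chain."""
    document: str        # D: source publication
    assay: str           # A: assay described in the document
    result: str          # R: quantitative result from the assay
    compound: str        # C: ligand (small molecule or peptide)
    location: str        # L: "location", as named on the slide
    protein_target: str  # P: protein target

    def initials(self) -> str:
        # Join the first letter of each field name, in declaration order.
        return "-".join(f.name[0].upper() for f in fields(self))

    def chain(self) -> str:
        # Render the record in the slide's "a > b > c" style.
        return " > ".join(getattr(self, f.name) for f in fields(self))


# Placeholder values only; no real curated data is shown here.
example = BindingRecord(
    document="<document>",
    assay="<assay>",
    result="<result>",
    compound="<compound>",
    location="<location>",
    protein_target="<protein target>",
)
print(example.initials())
print(example.chain())
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch, so that a curated relationship cannot be altered by accident once it is built.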
  6. 6. Endogenous peptides (786)
  7. 7. Non-endogenous peptides (1310)
  8. 8. GtoPdb peptide stats • Peptide ligands/all ligands = 22% • Ligands with quantitative binding data/all ligands = 75% • Peptides with quantitative binding data/all peptides = 63% • Peptides with CIDs and quantitative binding data/all peptides with CIDs = 89% • These are from release 2018.3, so there are slight changes in the current 2018.4
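As a rough consistency check, the 22% peptide share can be reproduced from the counts quoted elsewhere in the deck (786 endogenous and 1310 exogenous peptides, 9345 PubChem SIDs). Using the SID count as the denominator for "all ligands" is an assumption made for this sketch; the slides do not state which denominator was used.

```python
# Counts quoted in the slides for release 2018.3.
ENDOGENOUS_PEPTIDES = 786
EXOGENOUS_PEPTIDES = 1310
# Assumption: the submitted PubChem SID count stands in for "all ligands".
PUBCHEM_SIDS = 9345


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


total_peptides = ENDOGENOUS_PEPTIDES + EXOGENOUS_PEPTIDES
peptide_share = percent(total_peptides, PUBCHEM_SIDS)
print(total_peptides, peptide_share)  # 2096 22
```

The other three ratios cannot be checked this way because their underlying counts are not given in the slides.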
  9. 9. Tribulations with peptides • Author specifications often insufficient for complete molecular definition • Consequent structural equivocalities slip through the editor/referee net • Correct IUPAC peptide nomenclature, especially for modified residues, is rare (ad hoc more common) • Poor resolution of peptide name-to-structure (n2s) • Exact location of radiolabels often not specified • Absence of purity verification and/or in vivo stability • Different graphic rendering styles • SMILES only < ~ 70 residues in PubChem (grey zone of small peptides) • Literature and patent extraction for database feeds are proportionally lower than for small molecules • Searching patents for peptide prior art or analogues is much more difficult than for small molecules • Species “zoo” for venom peptides and names • Conjugates (peptides + linkers + proteins etc.) provide even more tribulations
  10. 10. The classic peptidic triple-whammy • Endothelin-1, CID 91928636, 1470 “Similar Compounds” and top-100 BLAST hits • Too big to search or cluster by SMILES • Too small to BLAST cleanly (and sans PTMs) • Too many species splits for precursors
  11. 11. Endothelin-1 in GtoPdb
  12. 12. Swiss-Prot precursor annotation: useful but text-only PTMs
  13. 13. PubChem bad news: will the real Endothelin-1 please stand up? • "endothelin 1"[CompleteSynonym] > 6 CIDs > 36 SIDs (10 SID-only) • “MW 2491.9140 NOT endothelin 1” > 16 CIDs > 23 SIDs (some unnamed) • Problematic BioAssay splitting (including for SID-only) • No fix on the immediate horizon :(
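For readers who want to repeat this kind of name lookup programmatically, the sketch below builds two query URLs: one for the NCBI E-utilities `esearch` endpoint against the `pccompound` database using the slide's `[CompleteSynonym]` query, and one for the PubChem PUG REST name-to-CID route. It only constructs the URLs; fetching and parsing are left out, the helper function names are invented here, and the hit counts will differ from those on the slide as PubChem changes.

```python
from urllib.parse import quote, urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def synonym_search_url(name: str) -> str:
    """Entrez search of PubChem Compound restricted to exact synonym matches."""
    term = f'"{name}"[CompleteSynonym]'
    params = {"db": "pccompound", "term": term, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def name_to_cids_url(name: str) -> str:
    """PUG REST request listing the CIDs that a name resolves to."""
    return f"{PUG_REST}/compound/name/{quote(name)}/cids/JSON"


print(synonym_search_url("endothelin 1"))
print(name_to_cids_url("endothelin 1"))
```

Comparing the CID lists returned by the two routes is one way to see the multiple-structure problem the slide describes.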
  14. 14. PubChem biologicals annotation
  15. 15. Hierarchical Editing Language for Macromolecules (HELM)
  16. 16. Our current GtoPdb push: Peptide > S&S > SMILES > SIDs > CIDs
  17. 17. GtoPdb plans • Continue back-fill of peptides > CIDs using Sugar & Splice • Resolve our sequences against Swiss-Prot x-refs, ChEMBL and GPCRdb • Consider adding “peptide” as a new SID tag • For IUPHAR Guide to Immunopharmacology – Sub-committee feedback on peptides, antibodies, targets and indications – Continue curation of peptides relevant to immunity and inflammation • Anticipate curation of new “binder” therapeutics including minibodies, polyvalents and hybrids • Belt-and-braces of linking SMILES with compromise (i.e. sans modifications) FASTA approximations to facilitate BLAST indexing and clustering of peptide ligands • Introduce local HELM rendering • Revise legacy data model (e.g. introduce a protein ligand classification)
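The "compromise FASTA approximation" in the plans above amounts to dropping modification annotations and keeping the plain residue string. A minimal sketch follows, assuming a hypothetical input notation in which each modification is written in parentheses straight after the residue it modifies; this notation, the function names and the toy sequence are illustrative only and are not the GtoPdb or Sugar & Splice format.

```python
import re

# The 20 standard amino acid one-letter codes.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def strip_modifications(annotated: str) -> str:
    """Drop parenthesised modification tags and return the bare sequence."""
    plain = re.sub(r"\([^()]*\)", "", annotated).upper()
    unknown = set(plain) - STANDARD_RESIDUES
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return plain


def to_fasta(identifier: str, annotated: str) -> str:
    """Format the stripped sequence as a single FASTA record."""
    return f">{identifier}\n{strip_modifications(annotated)}"


# Toy example in the hypothetical notation, not a curated peptide.
print(to_fasta("toy_peptide", "AC(Me)GS(phospho)K"))
```

The stripped record loses exactly the information the SMILES retains, which is why the slide proposes linking the two rather than choosing one.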
  18. 18. Acknowledgments, info and COI • Conflict of interest (minor): has consulted in the peptide area • Thanks to the NextMove Software team for S&S support • Lin Yikai, for her M.Sc. project: “Developing bio/cheminformatics methods for converting bioactive peptide structures into machine-readable formats” • Paul Thiessen from PubChem for support and FASTA sequences