
Peptide tribulations



Presented to David Gloriam's Group, Copenhagen, Feb 2020
**********************************
The theme will be presented from two perspectives: past involvement in peptide curation in the Guide to Pharmacology (GtoPdb), and current searching for bioactive peptides in the wider ecosystem that includes ChEMBL and PubChem. The core problem is that peptides hang in limbo land between bioinformatics (BLAST) and cheminformatics (Tanimoto), neither of which provides optimal searching. Curating peptides in GtoPdb presents many challenges, including mapping endogenous peptides to Swiss-Prot cleavage annotations. For synthetic peptides, equivocal specification of modifications and of the exact positions of radiolabels is also problematic. However, target-mapped, citation-supported quantitative binding parameters are curated where possible. For peptides falling below the PubChem CID SMILES limit of approximately 70 residues, GtoPdb has been using Sugar and Splice from NextMove Software to convert them into CIDs. Specific problems associated with finding bioactive peptides in databases will be outlined.
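The "limbo land" between BLAST and Tanimoto can be made concrete with a toy comparison. This is a minimal sketch under stated assumptions. It is not how GtoPdb, ChEMBL or PubChem actually score similarity.

- The set of overlapping residue pairs (2-mers) stands in for a real structural fingerprint.
- Ungapped identity stands in for a BLAST alignment.
- Endothelin-1 is compared against a hypothetical analogue carrying two substitutions (L6W, M7L).
- The helper names are illustrative only.

```python
# Toy contrast between a sequence-identity view (bioinformatics) and a
# Tanimoto view (cheminformatics) of two short peptides.
# Assumptions: residue 2-mer sets stand in for real structural fingerprints
# (e.g. ECFP), and ungapped identity stands in for a BLAST alignment.


def kmer_set(seq: str, k: int = 2) -> set[str]:
    """Set of overlapping k-mers in a one-letter peptide sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def ungapped_identity(s1: str, s2: str) -> float:
    """Fraction of identical positions for two equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("ungapped identity needs equal-length sequences")
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)


ET1 = "CSCSSLMDKECVYFCHLDIIW"       # endothelin-1; its disulfides are not represented
ANALOGUE = "CSCSSWLDKECVYFCHLDIIW"  # hypothetical L6W/M7L analogue

if __name__ == "__main__":
    print(f"identity {ungapped_identity(ET1, ANALOGUE):.3f}")                  # identity 0.905
    print(f"toy Tanimoto {tanimoto(kmer_set(ET1), kmer_set(ANALOGUE)):.3f}")  # toy Tanimoto 0.762
```

The two views weight the same change differently. The two-residue change gives an ungapped identity of 19/21 but a toy Tanimoto of 16/21. Neither score sees the two disulfide bridges that actually define the molecule.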

Published in: Science


  1. Trials and tribulations of curating and searching bioactive peptides in databases. Christopher Southan, University of Copenhagen, Feb 2020. Host: David Gloriam
  2. Abstract
  3. Outline
     • Peptide tribulations
     • Introducing GtoPdb
     • GtoPdb peptide content and stats
     • PubChem peptidic pros and cons
     • Getting more peptides > SMILES
  4. Bad news: neither GtoPdb nor ChEMBL nor PubChem search-indexes its peptides
  5. Tribulations with peptides
     • Difficult to define structurally
     • Endogenous peptide activities can be complex many-to-many systems
     • Author specifications are often insufficient for complete molecular definition
     • Structural equivocalities slip through the editor/referee net
     • Correct use of IUPAC peptide nomenclature for modifications is rare
     • Exact location of radiolabels often not specified
     • No verification of purity and/or of in vivo stability against proteolytic clipping
     • Noisy peptide name-to-structure (n2s) mappings
     • SMILES only adequate for ~70 residues
     • Image rendering not standardised
     • Searching patents for peptide prior art is more difficult than for small molecules
     • Literature extraction into databases proportionally lower than for small molecules
     • Author database submissions for bioactive peptides non-existent
     • Species "zoo" for venom peptides and their names
     • Conjugates (e.g. peptide + linker + protein) even more difficult
     • The PIR RESID Database of Protein Modifications is no longer maintained
  6. GtoPdb > NCBI Entrez (PubMed <> PubChem)
  7. Introducing the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb)
     • IUPHAR = International Union of Basic and Clinical Pharmacology; BPS = British Pharmacological Society
     • Molecular mechanism of action (mmoa) mapping of primary and secondary targets
     • Release cycle time (with PubChem refreshes) ~2 months
     • Seven papers in the NAR annual Database issue, the latest being PMID: 31691834 (2020)
     • Every 2 years distilled into the British Journal of Pharmacology "Concise Guide to PHARMACOLOGY" as a nine-paper series (see PMID 29055037) with outlinks
     • Curates selected quality compounds for pharmacology research in silico, in vitro, in cellulo, in vivo and in clinico
     • An ELIXIR UK Node resource since 2016
  8. Expert-curated, citation-provenanced, quantitative binding data
     Document > assay > result > compound > location > protein target (D-A-R-C-L-P)
     Where "C" is not a small molecule: GtoPdb has ~2000 peptides included in the ~9000 substances we submit to PubChem
  9. Endogenous peptides (786): http://www.guidetopharmacology.org/GRAC/LigandListForward?type=Endogenous-peptide&database=all
  10. Non-endogenous peptides (1310): http://www.guidetopharmacology.org/GRAC/LigandListForward?type=Peptide&database=all
  11. GtoPdb peptide stats (release 2019.4)
     • Peptide ligands / all ligands = 22%
     • Ligands with quantitative binding data / all ligands = 75%
     • Peptides with quantitative binding data / all peptides = 63%
     • Peptides with CIDs and quantitative binding data / all peptides with CIDs = 89%
  12. Endothelin-1 in GtoPdb (before the SMILES back-fill)
  13. GtoPdb Entrez linkage (after the 2019 back-fill)
  14. The peptidic triple-whammy
     Endothelin-1, CID 91928636: 1470 "Similar Compounds" and top-100 BLAST hits
     1. Too big to search or cluster by SMILES
     2. Too small to BLAST cleanly (and sans PTMs)
     3. Too many species splits for precursors
  15. Swiss-Prot precursor annotation
     • Evidence for endogenous processing curated from the primary literature
     • PTMs are indicated, but as text only
     • Very little mass-spec verification of in vivo existence
     • No standardised accession identifiers
     • Difficult to query across (mixed feature keys)
     • No secondary bioactivity annotation (e.g. from most of PubMed)
     • No cross-pointers (e.g. to PubChem or RefSeq)
  16. Will the real Endothelin please stand up?
     • Submissions mixed between SMILES (CIDs) and sequence strings (SIDs)
     • "endothelin 1"[CompleteSynonym] > 6 CIDs > 36 SIDs (10 SID-only)
     • "MW 2491.9140 NOT endothelin 1" > 16 CIDs > 23 SIDs (some unnamed)
     • BioAssay splitting is problematic
  17. Hierarchical Editing Language for Macromolecules (HELM)
  18. GtoPdb push: Peptides > Sugar & Splice (S&S) > SMILES > SIDs > CIDs: http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=3854
  19. The NextMove move (Noel O'Boyle): https://www.nextmovesoftware.com/talks/OBoyle_PubChemBiologics_ACS_201708.pdf
  20. NextMove Biologics: 8699 SIDs > 4969 CIDs
     Low bioactivity annotation (e.g. 259 in ChEMBL from 1.9 million CIDs, 36 in GtoPdb from 7674)
  21. Acknowledgments and info
     • Past and present GtoPdb curators working on peptide entries
     • The NextMove team for Sugar & Splice support and their peptide processing in PubChem
     • Lin Yikai, M.Sc. project: "Developing bio/cheminformatics methods for converting bioactive peptide structures into machine-readable formats"
     • Anna Gaulton for ChEMBL FASTA sequences
     • Paul Thiessen (PubChem) for peptide CIDs
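Slides 16 and 17 touch on two ways round peptide name ambiguity. One is querying PubChem by molecular weight. The other is writing the structure in HELM, so that modifications such as disulfides are explicit. The Python sketch here connects the two for the simplest case only. Its helper names are hypothetical, and it is not a GtoPdb, PubChem or NextMove API. It makes these assumptions:

- the HELM string holds a single simple polymer of natural one-letter monomers;
- the termini are free;
- every connection is a Cys R3-R3 disulfide, each removing two hydrogens;
- the conventional average atomic weights are C 12.011, H 1.008, N 14.007, O 15.999 and S 32.06.

For endothelin-1 (Cys1-Cys15 and Cys3-Cys11), these assumptions reproduce the MW 2491.9140 used in the slide 16 query.

```python
"""Sketch: molecular formula and average MW for a simple peptide written in HELM.
Hypothetical helpers; handles one PEPTIDE1 polymer with Cys-Cys disulfides only."""
import re
from collections import Counter

# Residue formulas (free amino acid minus H2O) for the 20 standard amino acids.
RESIDUE_FORMULAS = {
    "A": "C3H5NO", "R": "C6H12N4O", "N": "C4H6N2O2", "D": "C4H5NO3",
    "C": "C3H5NOS", "E": "C5H7NO3", "Q": "C5H8N2O2", "G": "C2H3NO",
    "H": "C6H7N3O", "I": "C6H11NO", "L": "C6H11NO", "K": "C6H12N2O",
    "M": "C5H9NOS", "F": "C9H9NO", "P": "C5H7NO", "S": "C3H5NO2",
    "T": "C4H7NO2", "W": "C11H10N2O", "Y": "C9H9NO2", "V": "C5H9NO",
}
# Assumed conventional average atomic weights.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(text: str) -> Counter:
    """Turn a formula string such as 'C3H5NOS' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", text):
        counts[element] += int(number) if number else 1
    return counts


def parse_simple_helm(helm: str) -> tuple[str, int]:
    """Return (one-letter sequence, number of Cys-Cys R3-R3 connections)."""
    sections = helm.split("$")
    polymer = re.fullmatch(r"PEPTIDE1\{([A-Z](?:\.[A-Z])*)\}", sections[0])
    if polymer is None:
        raise ValueError("only a single PEPTIDE1 of one-letter monomers is handled")
    sequence = polymer.group(1).replace(".", "")
    links = [c for c in sections[1].split("|") if c] if len(sections) > 1 else []
    for link in links:
        m = re.fullmatch(r"PEPTIDE1,PEPTIDE1,(\d+):R3-(\d+):R3", link)
        positions = [int(p) for p in m.groups()] if m else []
        if not positions or any(
            not 1 <= p <= len(sequence) or sequence[p - 1] != "C" for p in positions
        ):
            raise ValueError(f"not a Cys-Cys disulfide: {link}")
    return sequence, len(links)


def peptide_formula(sequence: str, disulfides: int = 0) -> Counter:
    """Sum residue formulas, add H2O for the free termini, drop 2 H per S-S bond."""
    formula = parse_formula("H2O")
    for residue in sequence:
        formula += parse_formula(RESIDUE_FORMULAS[residue])
    formula["H"] -= 2 * disulfides
    return formula


def average_mw(formula: Counter) -> float:
    """Average molecular weight from element counts."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in formula.items())


def hill_formula(formula: Counter) -> str:
    """Hill order: C, then H, then the remaining elements alphabetically."""
    order = ["C", "H"] + sorted(el for el in formula if el not in ("C", "H"))
    return "".join(el + (str(formula[el]) if formula[el] > 1 else "")
                   for el in order if formula[el] > 0)


ET1_HELM = ("PEPTIDE1{C.S.C.S.S.L.M.D.K.E.C.V.Y.F.C.H.L.D.I.I.W}"
            "$PEPTIDE1,PEPTIDE1,1:R3-15:R3|PEPTIDE1,PEPTIDE1,3:R3-11:R3$$$V2.0")

if __name__ == "__main__":
    seq, ss_bonds = parse_simple_helm(ET1_HELM)
    formula = peptide_formula(seq, ss_bonds)
    print(seq, ss_bonds, hill_formula(formula), f"{average_mw(formula):.4f}")
    # CSCSSLMDKECVYFCHLDIIW 2 C109H159N25O32S5 2491.9140
```

The sketch deliberately stops at the "simple" end of HELM. Modified monomers in square brackets, radiolabels, cyclisations other than Cys-Cys, and conjugates (the harder cases listed in slide 5) are rejected rather than guessed at.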
