
Peptide Tribulations in GtoPdb



ACS Boston, Aug 2018 (abstract on slide 2)

Published in: Science


  1. 1. Trials and tribulations of curating peptide and antibody ligands for the IUPHAR/BPS Guide to Pharmacology
     Christopher Southan, Joanna L. Sharman, Adam J. Pawson, Simon D. Harding, Elena Faccenda and Jamie A. Davies
     IUPHAR/BPS Guide to Pharmacology, Discovery Brain Sciences, University of Edinburgh, UK
     ACS Boston 2018, Biologics & Registration Session, Mon Aug 20, 15:50 - 16:15, Harbor Ballroom II
  2. 2. Abstract (will not be shown)
     As an expert-curated database of approved, clinical or research pharmacological targets mapped to defined ligands, the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) and its precursor, IUPHAR-DB, have been extracting and annotating bioactive peptides from papers for well over a decade. The current total has reached 2089 peptides, split between exogenous and endogenous, within the 9144 ligand entries submitted to PubChem in our 2018.2 database release. More recently we have curated 235 antibodies and a small number of therapeutic nucleotides as approved drugs or clinical candidates. Indexing these entity types in GtoPdb presents challenges similar to those encountered when registering biologicals as explicitly defined structures. In addition, we target-map the citation-supported quantitative binding parameters where possible.
     This presentation will outline these curatorial challenges and our efforts to at least partially ameliorate the problems. For peptides below the PubChem CID SMILES limit of approximately 70 residues, we have been using Sugar and Splice from NextMove Software to convert more of our peptide SIDs to join the 6969 CIDs we already have. However, we are often confounded by authors' equivocal structural specifications with respect to post-translational modifications and the exact positions of radiolabel incorporation. As an interim compromise, we capture at least a primary sequence string that users can hit by BLAST. Among reported receptor-binding endogenous peptides, we find some that do not match the Swiss-Prot features for the precursor protein. PubChem has been encouraging and supporting us in converting more activity-mapped peptides to CIDs and InChIKeys, which should enhance inter-source connectivity. Otherwise, biological SID data can only be joined by equivocal name matching.
     Antibodies and other large biological SIDs may also currently remain structurally orphaned and present their own challenges. Notwithstanding, GtoPdb has successfully curated at least primary sequences for the molecular specification of clinical Mabs. For this we use IMGT/mAb-DB as a first-stop shop for approved monoclonals, since it extracts sequences from INN documents. For these and for clinical candidates with code names, we also use the patent sequence databases to source a UniParc accession number, and can sometimes get binding data that have not appeared in papers.
  3. 3. Outline
     • Introducing GtoPdb
     • GtoPdb peptide content and stats
     • Peptide tribulations
     • PubChem peptidic pros and cons
     • Getting more peptides > SMILES
     • GtoPdb antibody content
     • Antibody tribulations
     • Stats and examples
     • Exploiting PubChem SID tagging
     • Where we go from here
     • Further information
  4. 4. Introducing the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb)
     • IUPHAR = International Union of Basic and Clinical Pharmacology, BPS = British Pharmacological Society
     • Formerly known as IUPHAR-DB for receptors and channels since 2003
     • Since 2012, funded by the Wellcome Trust to cover all targets in the human genome
     • Since 2015, a Wellcome Trust “fork” as the Guide to IMMUNOPHARMACOLOGY
     • Molecular mechanism of action (mmoa) mapping of primary & secondary targets
     • Release cycle time (with PubChem refreshes) ~2 months
     • Six well-cited NAR Annual Database Issue papers, latest as PMID 29149325 (2018)
     • Distilled into the 2-yearly British Journal of Pharmacology “Concise Guide to PHARMACOLOGY” as a nine-paper series (see PMID 29055037) with outlinks
     • Presents users with selected quality compounds for pharmacology research in silico, in vitro, in cellulo, in vivo, in clinico
     • An ELIXIR UK Node resource since 2016
  5. 5. Expert-curated, citation-provenanced, quantitative binding data
     Document > assay > result > compound > location > protein target (D-A-R-C-L-P)
     Where “C” is not a small molecule, we have ~2000 peptides and ~250 antibodies included in the ~9000 substances we submit to PubChem
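The D-A-R-C-L-P chain can be pictured as a simple record type. What follows is a minimal sketch, not GtoPdb's actual schema: the class name `DarclpRecord`, all field names, and the example values are hypothetical placeholders, and "location" is assumed here to mean where in the source document the result is reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DarclpRecord:
    """One curated binding record: Document > Assay > Result > Compound > Location > Protein.

    All field names are illustrative assumptions, not GtoPdb's schema.
    """
    document: str            # citation, e.g. a PubMed ID
    assay: str               # short description of the assay
    result: str              # quantitative parameter as reported
    compound: str            # ligand name or identifier
    location: Optional[str]  # assumed: where in the document the result appears
    protein: str             # target name or accession

    def chain(self) -> str:
        """Render the record in D > A > R > C > L > P order."""
        parts = [self.document, self.assay, self.result,
                 self.compound, self.location or "unspecified", self.protein]
        return " > ".join(parts)


# Placeholder values only; none of these are real curated data.
record = DarclpRecord(
    document="PMID:0000000",
    assay="radioligand binding",
    result="pKi 9.0",
    compound="example peptide",
    location=None,
    protein="example receptor",
)
print(record.chain())
```

Keeping the citation and the target on the same record as the quantitative result is what makes each value traceable, whatever the compound type in the "C" slot.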
  6. 6. Peptides
  7. 7. Endogenous peptides (786)
  8. 8. Non-endogenous peptides (1310)
  9. 9. Peptide stats
     • Peptide ligands / all ligands = 22%
     • Ligands with quantitative binding data / all ligands = 75%
     • Peptides with quantitative binding data / all peptides = 63%
     • Peptides with CIDs and quantitative binding data / all peptides with CIDs = 89%
  10. 10. Tribulations with peptides
     • Author specifications may be insufficient for complete molecular definition
     • Consequent structural equivocalities slip through the editor/referee net
     • Correct IUPAC peptide nomenclature is rare (ad hoc is more common)
     • Exact location of radiolabels often not specified
     • Absence of purity verification and/or in vivo stability
     • Need to surface user-intuitive renderings (but HELM rules OK)
     • Poor resolution of peptide name-to-structure (n2s)
     • SMILES only copes with up to ~70 residues
     • Searching patents for corroborative peptide prior art is much more difficult than for small molecules
     • Literature extraction or author database submissions for bioactive peptides are proportionally lower than for small molecules
     • Species “zoo” for venom peptides and their names
     • Conjugates (peptides + linkers + proteins etc.) even more difficult
     • The PIR RESID Database of Protein Modifications is no longer maintained
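The ~70-residue SMILES ceiling, combined with the fallback to a plain sequence string, suggests a simple triage rule. The sketch below is illustrative only: `SMILES_RESIDUE_LIMIT` and `choose_representation` are hypothetical names, the limit is the approximate figure quoted in the slides rather than a hard PubChem constant, and the actual conversion (done with Sugar & Splice) is not invoked here.

```python
# Approximate figure from the slides; an assumption, not a hard PubChem constant.
SMILES_RESIDUE_LIMIT = 70

# The 20 standard amino acids in one-letter code.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def choose_representation(sequence: str) -> str:
    """Return 'smiles' if the peptide is short enough for a full SMILES, else 'sequence-only'."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - STANDARD_RESIDUES
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return "smiles" if len(seq) <= SMILES_RESIDUE_LIMIT else "sequence-only"


# The commonly cited 21-residue human endothelin-1 sequence, used for illustration.
print(choose_representation("CSCSSLMDKECVYFCHLDIIW"))
```

Rejecting non-standard codes, rather than silently passing them through, mirrors the point made in the bullets: modified residues need explicit handling and cannot simply be guessed.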
  11. 11. The classic peptidic triple-whammy
     Endothelin-1, CID 91928636: 1470 “Similar Compounds” and top-100 BLAST hits
     • Too big to search or cluster by SMILES
     • Too small to BLAST cleanly (and sans PTMs)
     • Too many species splits for precursors
  12. 12. Endothelin-1 in GtoPdb
     • But this now needs a SMILES backfill
  13. 13. Swiss-Prot precursor annotation: useful but text-only PTMs
  14. 14. PubChem bad news: will the real Endothelin-1 please stand up?
     • "endothelin 1"[CompleteSynonym] > 6 CIDs > 36 SIDs (10 SID-only)
     • “MW 2491.9140 NOT endothelin 1” > 16 CIDs > 23 SIDs (some unnamed)
     • BioAssay splitting (including for SID-only) is problematic
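The bracketed query on this slide is Entrez search syntax. As a hedged sketch, assuming it was run against the PubChem Compound Entrez database (`pccompound`), the same exact-synonym search can be expressed as an NCBI E-utilities ESearch URL. The helper name `synonym_search_url` is hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def synonym_search_url(name: str, retmax: int = 100) -> str:
    """Build an ESearch URL for an exact-synonym query against PubChem Compound."""
    term = f'"{name}"[CompleteSynonym]'
    params = {"db": "pccompound", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = synonym_search_url("endothelin 1")
print(url)
```

Counts returned today will differ from the 2018 figures on the slide, since PubChem content changes between releases.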
  15. 15. PubChem good news: GtoPdb > SID SMILES > CID > biologicals annotation
  16. 16. PubChem: more good news
  17. 17. Our current push: Peptides > S&S > SMILES > SIDs > CIDs
  18. 18. Antibodies
  19. 19. Tribulations with antibody curation
     • Getting at least a primary Mab sequence as a molecular definition
     • Not all clinical Mab sequences > patents > INN > IMGT-DB
     • May get persistent UniParc ID sequence (on a good day)
     • Papers often omit in vitro binding data
     • Challenging to track press releases back to primary data
     • Papers usually don't cite the patents
     • But we sometimes get binding data from patents
     • The biosimilars are piling in
     • No open specification of glycan chains linked to primary sequences
     • Some journals publish Mab characterisation with blinded code names
     • Considering research reagents with vendor IDs if well provenanced
  20. 20. GtoPdb antibodies (245)
  21. 21. Example: adalimumab
  22. 22. Exploiting PubChem SID-tagging for user selections
  23. 23. GtoP plans
     • Continue back-fill of peptides > CIDs using S&S
     • Resolve our sequences against Swiss-Prot x-refs, ChEMBL and GPCRdb
     • Continue adding antibody biosimilar cross-pointers
     • Consider adding “peptide” as a new SID tag
     • For IUPHAR Guide to Immunopharmacology
       – Sub-committee feedback on peptides, antibodies, targets and indications
       – Continue curation of peptides relevant to immunity and inflammation
     • Anticipate curation of new “binder” therapeutics including minibodies, polyvalents and hybrids
     • Keep a watching brief on large-molecule InChIKeys
     • Belt-and-braces of linking SMILES with compromise (i.e. sans modifications) FASTA approximations for BLAST indexing and clustering of peptide ligands
     • Introduce local HELM rendering
     • Revise legacy data model (e.g. introduce a protein ligand classification)
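The "compromise FASTA" idea, meaning a modification-free sequence kept alongside the full structure for BLAST indexing, can be sketched as follows. This is an illustration under stated assumptions, not the GtoPdb pipeline: the input format (three-letter residue tokens with optional parenthesised modifications and an optional `D-`/`L-` prefix) and the function name `compromise_fasta` are hypothetical, and anything outside the 20 standard residues is written as `X`.

```python
import re
from typing import List

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def compromise_fasta(identifier: str, residues: List[str]) -> str:
    """Return a FASTA record with modifications and stereo prefixes stripped.

    The conversion is deliberately lossy: it keeps only what BLAST can index.
    """
    letters = []
    for token in residues:
        bare = re.sub(r"\(.*?\)", "", token).strip()  # drop "(Acm)", "(Ac)", ...
        bare = re.sub(r"^[DL]-", "", bare)            # drop a "D-" or "L-" prefix
        letters.append(THREE_TO_ONE.get(bare, "X"))   # unknown residues become X
    return f">{identifier}\n{''.join(letters)}"


# Hypothetical modified peptide, for illustration only.
print(compromise_fasta("demo", ["Cys(Acm)", "D-Phe", "Tyr", "Orn", "Lys(Ac)"]))
```

Because the stripped sequence discards stereochemistry and modifications, it is a search handle rather than a structure, which is why the plan pairs it with SMILES instead of replacing one with the other.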
  24. 24. Acknowledgments, info, COI
     • Conflict of interest (minor): has consulted in the peptide area
     • Thanks to the NextMove team for S&S support
     • Lin Yikai, for her M.Sc. project, “Developing bio/cheminformatics methods for converting bioactive peptide structures into machine-readable formats”
     • Anna Gaulton for ChEMBL FASTA sequences
     • Paul Thiessen for PubChem FASTA sequences