The IUPHAR/BPS Guide to Pharmacology database contains over 2000 curated peptide ligands and 235 antibody ligands. Summarizing peptide and antibody data presents challenges due to incomplete structural specifications in publications and a lack of standard nomenclature. The database developers are working to assign peptide sequences InChIKeys and convert them to PubChem CIDs using structure conversion tools to improve searchability. For antibodies, they aim to capture sequence data and map products to clinical records and patents. Future plans include continued peptide and antibody curation efforts and developing text and structure-based search methods.
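The InChIKey-to-CID step mentioned above can be sketched as a lookup against PubChem's PUG REST service. This is a minimal illustration, not the GtoPdb pipeline: the helper name `inchikey_to_cid_url` is hypothetical, and the example key (aspirin) is chosen only because it is a well-known small molecule, not a peptide from the database.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def inchikey_to_cid_url(inchikey: str) -> str:
    """Build the PUG REST URL that lists the CIDs matching an InChIKey.

    Hypothetical helper for illustration; it only builds the URL and
    performs no network request.
    """
    key = inchikey.strip().upper()
    # A standard InChIKey is 27 characters: 14 + 10 + 1, joined by hyphens.
    if len(key) != 27 or key.count("-") != 2:
        raise ValueError(f"not a standard 27-character InChIKey: {inchikey!r}")
    return f"{PUG_REST}/compound/inchikey/{quote(key)}/cids/JSON"


if __name__ == "__main__":
    # Aspirin, used purely as a familiar example key.
    print(inchikey_to_cid_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Fetching the returned URL (for example with `urllib.request.urlopen`) would give a JSON list of matching CIDs; a key that PubChem does not know returns an error response instead.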
Presented to David Gloriam's Group, Copenhagen, Feb 2020
**********************************
The theme will be presented from the perspective of both past involvement in peptide curation in the Guide to Pharmacology (GtoPdb) and current searching for bioactive peptides in the wider ecosystem that includes ChEMBL and PubChem. The core problem is that peptides hang in limbo land between bioinformatics (BLAST) and cheminformatics (Tanimoto), neither of which provides optimal searching. Curating peptides in GtoPdb presents many challenges, including mapping endogenous peptides to Swiss-Prot cleavage annotations. For synthetic peptides, equivocal specification of modifications and of the exact positions of radiolabels is also problematic. However, target-mapped, citation-supported quantitative binding parameters are curated where possible. For those peptides falling below the PubChem CID SMILES limit of approximately 70 residues, GtoPdb has been using Sugar and Splice from NextMove Software to convert them into CIDs. Specific problems associated with finding bioactive peptides in databases will be outlined.
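To make the "limbo land" point concrete, the sketch below applies a Tanimoto coefficient to peptide sequences by treating each sequence as a set of overlapping k-mers. The k-mer featurization is an assumption made purely for illustration: real cheminformatics searches compute Tanimoto on structural fingerprints, and nothing here is taken from GtoPdb, ChEMBL or PubChem.

```python
def kmers(sequence: str, k: int = 3) -> set:
    """Return the set of overlapping k-mers in a one-letter peptide sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared features over all features."""
    if not a and not b:
        return 1.0  # two empty feature sets are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Two hypothetical heptapeptides differing only at the C-terminal residue.
    first, second = "ACDEFGH", "ACDEFGK"
    print(round(tanimoto(kmers(first), kmers(second)), 3))
```

A single substitution already drops the score to 4 shared k-mers out of 6. A set-based score of this kind also ignores residue order beyond the k-mer window and knows nothing about modifications, which is part of why neither the sequence view nor the structure view searches peptides well on its own.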
Background of the project and simple use cases of using the Open PHACTS API and KNIME to extract compound, target and indication entities from millions of patent documents and infer meaningful links among them. Open PHACTS Linked Data meeting in Vienna.
I presented two fascinating stories where Molecular Dynamics simulations contributed to enhancing our understanding of immunodeficiencies. In one of the projects, the treatment of patients could be improved. These slides were presented at the Basler Modeller Stammtisch, 26.02.2021
Overview of the SureChEMBL system and web interface.
https://www.surechembl.org/search/
SureChEMBL is a freely available web resource for chemistry patent searching. It is based on a fully automatic and dynamic text and image mining pipeline.
PROTACs and targeted protein degradation DoriaFang
Targeted protein degradation induced by PROTACs (proteolysis-targeting chimeras) has emerged as a novel therapeutic strategy in drug development and has attracted the interest of academic institutions, large pharmaceutical enterprises, and biotechnology companies. PROTACs have opened a new chapter in drug development.
CINF 55: SureChEMBL: An open patent chemistry resource George Papadatos
SureChEMBL (https://www.surechembl.org) is a new resource provided by the European Bioinformatics Institute (EMBL-EBI) that annotates, extracts and indexes chemistry from full text patent documents by means of continuous, automated text and image mining. SureChEMBL is perhaps the only open, freely available, live patent chemistry resource available, in a field that has been traditionally commercial.
Since its launch last September, the SureChEMBL interface provides sophisticated keyword and chemistry-based querying and exporting functionality against a corpus of more than 16 million compounds extracted from 13 million patent documents. Both the interface and the underlying data pipeline leverage a number of technologies for name to structure conversion, as well as compound standardisation, registration and searching.
In addition to providing an overview of the system, recent developments and improvements will be described. These include the introduction of various data interchange and exporting options, such as flat files and a data feed client. Our future plans for the SureChEMBL system will also be outlined. To date, such plans include complementing the chemical annotations with biological ones, covering genes, proteins, diseases and indications. We also plan to enrich the chemical annotations with a relevance score indicating their importance in the patent document.
ChEMBL and KNIME provide an ideal match of open data with open tools. This is a quick overview of how to access ChEMBL data resources and web services (ChEMBL, UniChem, Beaker, myChEMBL, SureChEMBL) via the KNIME platform.
With the unprecedented growth of chemical databases incorporating up to several hundred billions of synthetically feasible chemicals, modelers are not in shortage of chemicals to process. Importantly, such "Big Chemical Data" offers humongous opportunities for discovering novel bioactive molecules. However, the current generation of cheminformatics software tools is not capable of handling, characterizing, and processing such extremely large chemical libraries. In this presentation, we will discuss the rationale and the main challenges (theoretical and technical) for screening very large repositories of compounds in the current context of drug discovery. We will present several proof-of-concept studies regarding the screening of extremely large libraries (1+ billion compounds) using our novel GPU-accelerated cheminformatics platform to identify molecules with defined bioactivity. Overall, we will show that GPU computing represents an effective and inexpensive architecture to develop, employ, and validate a new generation of cheminformatics methods and tools ready to process billions of compounds.
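The scale problem described above is partly one of memory: a library of billions of compounds cannot be held and sorted in one pass. The sketch below shows one generic way to keep only the best k hits while streaming through a library. It is a CPU-only illustration under stated assumptions, not the GPU platform from the talk: fingerprints are modelled as plain Python integers used as bit vectors, and the names `tanimoto_bits` and `top_k_similar` are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def popcount(x: int) -> int:
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bit vectors."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def top_k_similar(
    query: int, library: Iterable[Tuple[str, int]], k: int
) -> List[Tuple[float, str]]:
    """Stream (id, fingerprint) pairs and keep only the k best matches.

    A min-heap of size k holds the current best hits, so memory use is
    independent of library size.
    """
    heap: List[Tuple[float, str]] = []
    for compound_id, fingerprint in library:
        score = tanimoto_bits(query, fingerprint)
        if len(heap) < k:
            heapq.heappush(heap, (score, compound_id))
        elif score > heap[0][0]:
            # Replace the weakest retained hit with the better one.
            heapq.heapreplace(heap, (score, compound_id))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    toy_library = [("a", 0b1111), ("b", 0b0011), ("c", 0b10000), ("d", 0b0111)]
    print(top_k_similar(0b1111, toy_library, k=2))
```

Because `library` can be any iterable, the same loop works over a generator that reads fingerprints from disk in chunks. The per-compound scoring is independent across compounds, which is the property that makes this kind of screen a natural fit for parallel hardware.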
GtoPdb: A resource for cell-based perturbogens Chris Southan
Poster for ELRIG, Mölndal, 11/12 May 2017.
This poster will also be presented at BioITWorld, Boston, May 23-25
A resource for the selection and interpretation of cell-based perturbogens: the IUPHAR/BPS Guide to PHARMACOLOGY
Christopher Southan, Elena Faccenda, Joanna L. Sharman, Adam J. Pawson, Simon D. Harding, Jamie A Davies
Translational research requires the integration of the in vitro molecular mechanisms of action (mmoa) of small molecules, cell-based screening studies, animal models and eventual clinical trials. The International Union of Pharmacology (IUPHAR)/British Pharmacological Society (BPS) database, GtoPdb (http://www.guidetopharmacology.org/), provides expert-annotated molecular interactions between endogenous receptor ligands, probes, lead compounds, clinical drugs and their protein targets. It thus provides a core set of quantitative pharmacological relationships that can be interrogated for many purposes, including by those running cell-based screens, not only during result interpretation but also to identify key compounds for scoping and consolidation experiments. As described in [1], GtoPdb is populated by records extracted from pharmacology and medicinal chemistry journals and is released quarterly. Quality is ensured by curatorial stringency and our unique model of content selection based on recommendations from IUPHAR target class subcommittees of international experts collaborating with the in-house curators. The database now has over 14 000 binding values (mainly IC50, Ki or Kd) between 8000 ligands and 15000 human proteins (mainly primary but also secondary off-target interactions) representing a 7% druggable proteome. Our coverage is complementary to other sources. For example, of the 6565 structures we recently submitted to PubChem as CIDs, 5206 were not in DrugBank and 1535 were not in ChEMBL. This includes recommended tool compounds with relatively defined mmoa (including 110 from the Structural Genomics Consortium Probe Portal). We also have 75% overlap with vendors for procurement and 80% with patent extractions, which in many cases allow mapping to SAR data sets from first filings (some of which we point to). In a cell screening context, 1254 of our targets intersect with proteins in the Reactome pathway database. 
This is one way to select chemical perturbation points that could be detected by assay readouts. Since Nov 2015 we have been funded by the Wellcome Trust to extend into immunopharmacology (within the existing database schema), which is now driving overall GtoPdb content expansion. Parties engaged in cell-based assays who use, or could use, compounds we hold are encouraged to use GtoPdb, contact us with queries or possible analogue expansions, and/or alert us to prospective new content. [1] Southan C et al. (2016) Nucleic Acids Res. 44(D1):D1054-68, PMID: 26464438
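The complementarity figures quoted in the abstract can be restated as overlaps. The short calculation below uses only the three counts given there (6565 submitted, 5206 absent from DrugBank, 1535 absent from ChEMBL); the variable names and the `pct` helper are illustrative.

```python
# Counts quoted in the poster abstract.
submitted = 6565          # structures submitted to PubChem as CIDs
not_in_drugbank = 5206    # of those, absent from DrugBank
not_in_chembl = 1535      # of those, absent from ChEMBL

# Overlap is simply what remains after removing the absent structures.
in_drugbank = submitted - not_in_drugbank
in_chembl = submitted - not_in_chembl


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(f"In DrugBank: {in_drugbank} ({pct(in_drugbank, submitted)}%)")
    print(f"In ChEMBL:   {in_chembl} ({pct(in_chembl, submitted)}%)")
```

On these numbers roughly a fifth of the submitted structures are shared with DrugBank and about three quarters with ChEMBL, so the set unique with respect to ChEMBL is the smaller of the two.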
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r... Dr. Haxel Consult
Christopher Southan (The IUPHAR/BPS Guide to PHARMACOLOGY, UK)
While the raison d'être of patents is Intellectual Property (IP), there is a growing awareness of the scientific value of their data content. This is particularly so in medicinal chemistry and associated bioactivity domains, where disclosed compounds and associated data not only exceed those published in papers by several-fold and surface years earlier, but are also, paradoxically, completely open (i.e. no paywalls). Scientists have traditionally extracted their own relationships or used commercial sources, but the last few years have seen a “big bang” in patent extractions submitted to open databases, including nearly 20 million structures now in PubChem.
This tutorial will:
Outline the statistics of patent chemistry in various open sources
Introduce a spectrum of open resources and tools
Enable an understanding of target identification, bioactivity and SAR extraction from patents and connecting these relationships to papers
Cover aspects of medicinal chemistry patent mining
Include hands-on exercises using open source antimalarial research as examples
The focus will be on public databases and patent office portals, since these can be transparently demonstrated. However, the essential complementarity with commercial resources will be touched on. Those engaged in Competitive Intelligence will also find the material relevant.
Learn how large-scale normalized data empowers the critical early phases of drug discovery.
To address the core concerns about data quality, comprehensiveness and comparability, the Reaxys product team has developed a completely new repository for bioactivity information. Reaxys Medicinal Chemistry stands as a unique source for normalized data in vitro efficacy, in vivo animal models, compound metabolism, pharmacokinetics and toxicity. This presentation takes a look at how this approach to data supports critical early discovery methods such as in silico screening and target profiling.
Vicissitudes of target validation for BACE1 and BACE2 Chris Southan
Introduction/Background & Aims
The beta-amyloid (APP) cleaving enzyme (BACE1) was implicated as a drug target for Alzheimer's Disease (AD) back in 1999. In 2011, the paralogue, BACE2, became a new proposed target for type II diabetes (T2DM) having been reported to be the TMEM27 secretase regulating pancreatic beta-cell function [1]. By 2019 the accumulated evidence, including a swathe of failed clinical trials for BACE1 inhibitors, has produced a de facto de-validation of both targets in both diseases. As a learning exercise, the series of events leading up to this is reviewed here.
Method/Summary of work
Basic information about these two targets and the lead compounds against them were sourced via the IUPHAR/BPS Guide to Pharmacology (GtoPdb) as Target ids: 2330 and 2331, for BACE1 and 2, respectively. This was consolidated by a literature and patent review as well as following them in other databases. The most recent information on clinical trials was sourced from press releases.
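The two target records named above could also be addressed programmatically. The sketch below only builds URLs and makes no request. It assumes that the GtoPdb REST web services expose target records under `/services/targets/{id}` with an `interactions` sub-resource; that path layout, the base URL constant and the helper name `target_url` are assumptions to check against the current GtoPdb web services documentation. Only the two target ids come from the abstract.

```python
# Assumed base URL of the GtoPdb REST web services (verify before use).
GTOPDB_SERVICES = "https://www.guidetopharmacology.org/services"

# Target ids as given in the abstract.
BACE_TARGET_IDS = {"BACE1": 2330, "BACE2": 2331}


def target_url(target_id: int, resource: str = "") -> str:
    """Build a URL for a GtoPdb target record or one of its sub-resources.

    Hypothetical helper: the '/targets/{id}/{resource}' layout is an
    assumption, not confirmed by the source.
    """
    if target_id <= 0:
        raise ValueError("target_id must be a positive integer")
    url = f"{GTOPDB_SERVICES}/targets/{target_id}"
    return f"{url}/{resource}" if resource else url


if __name__ == "__main__":
    for name, target_id in BACE_TARGET_IDS.items():
        print(name, target_url(target_id, "interactions"))
```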
Results/Discussion
GtoPdb annotates 24 lead compounds against BACE1 and 12 against BACE2. The corresponding counts mapped to these targets in ChEMBL are 8741 and 1377, making BACE1 one of the most actively pursued enzyme targets ever. Notwithstanding this massive global effort, during 2018 the BACE1 inhibitors verubecestat (Merck) and atabecestat (J&J) not only failed their Phase III endpoints but even appeared to worsen cognition in prodromal patients. In 2019 Amgen/Novartis stopped Phase II/III trials of umibecestat, which also showed more cognitive decline in the treatment group than in controls. BACE2 presented an anomalous situation in several ways. By 2016 both Novartis and Amgen had declared their inability to reproduce the TMEM27 secretase turnover reported in 2011. Notwithstanding, Novartis and other companies have published patents on BACE2-specific inhibitors over several years and, paradoxically, verubecestat is more potent against BACE2 than BACE1 but was never tested for glucose-lowering. Equally puzzling is that one academic group is still publishing BACE2 inhibitors for T2D even post de-validation. One thing both targets have in common is the complete absence of genetic support from genome-wide disease association studies, but this warning sign went unheeded.
Conclusions
The massive waste of resources on the pursuit of BACE1 as an AD target over the last two decades is catastrophic. This tale of de-validation is compounded for this paralogous pair of enzymes by the fact that the original evidence for BACE2 as a T2D target was eventually refuted. The story of these targets highlights a range of crucial pharmacological pitfalls that must be avoided in the future.
Reference(s)
[1] Southan C, Hancock J.M. (2013) A tale of two drug targets: the evolutionary history of BACE1 and BACE2. Front Genet. 4:293.
In silico 360 Analysis for Drug Development Chris Southan
Introduction:
Consequent to a memorandum of understanding between the Karolinska Institutet and the International Union of Basic and Clinical Pharmacology (IUPHAR) in 2018, a report on academic drug development, including guidelines (ADEV), has been drafted [1]. As part of this exercise, we conceived a triage for comprehensive informatics profiling around the compound, target, disease axis. We have termed this “in silico 360” (INS360). Its aim is to support ADEV teams, since they may lack either the internal expertise or the external support to do this on their own. Indeed, some past SciLifeLab Drug Discovery and Development Platform projects had been halted because of overlooked competitive impingements or insufficient target validation evidence.
Methods
We assessed the current database landscape, mostly public but including commercial, for potential utility for INS360. We were guided primarily by content coverage, usability, and reputation. We also explored some open property prediction resources for assay interference and toxicological inferences.
Results:
As a first-stop-shop, we selected the IUPHAR/BPS Guide to PHARMACOLOGY with ~900 ligand-target relationships captured via expert curation of journal papers. Moving up in scale, we evaluated ChEMBL at 1.8 million compounds with 1.1 million assay descriptions and 7,000 targets. With yet another jump we could search the patent corpus with 18 million extracted compounds in SureChEMBL. We explored PubChem, which integrates these three with over 500 other sources linked to 96 million compounds, BioAssay results and connectivity into the NCBI Entrez system. The final jump in scale for document-to-chemistry navigation was represented by SciFinder with 155 million structures. On the target side, 360-exploration needs to encompass literature, structure, genetic variation, splicing, interactions, and disease pathways. From their UniProt links, both GtoPdb and ChEMBL provide these entry points. Navigating genetic association data in support of target validation was enabled by the OpenTargets portal and the GWAS Catalog. We also found servers that could produce prediction scores from chemical structures for a range of features important for de-risking development.
Conclusion:
This work scoped out initial resource choices for the INS360. We propose that not only ADEV operations but essentially any pharmacology research team has much to gain from this approach and many potential pitfalls can consequently be avoided when approaching key checkpoints, such as preparing a publication. However, support may be needed for both institutions and teams to get the best out of these complex and feature-rich databases.
[1] Southan C, (2019) Towards Academic Drug Development Guidelines, ChemRxiv pre-print no. 8869574
Will the correct BACE ORFs please stand up? Chris Southan
BACE1 and BACE2 are protease targets for Alzheimer's and diabetes, respectively, but their validation is now questioned
Phylogenetic analysis can add functional insights
This came up against two key problems
A surprising prevalence of incorrect protein sequences predicted from genomes
Many BACE1 and BACE2 orthologues had truncation and/or indel errors.
Key phylogenetic representative genomes are languishing in an unfinished state
Some options for amelioration of these problems will be described
An update on the evolution of these enzymes will be shown
Look for new and potentially useful human 5HT2A-directed small-molecule chemistry surfaced since the last meeting. Check for compounds with 5HT2A as the primary target but also for combined inhibitors. Poll round the key databases, literature and patents. Searching challenges arise from synonym soup, complex cross-reactivities (see PMID 29679900), in vitro data gaps and in vivo polypharmacology.
Quality and noise in big chemistry databases Chris Southan
Presented at Aug 2019 ACS by Antony Williams. Abstract: The internet has changed the way we access chemistry data, as well as providing access to data that can quickly proliferate and become referenceable. Web access to chemical structures and their integration with biological data has become massively enabling, with numbers for UniChem, PubChem and ChemSpider reaching 157, 97 and 71 million respectively (at the time of writing). A range of specialist databases small enough to be curated have stand-alone utility and synergies when integrated into the larger collections. These include DrugBank, BindingDB, ChEBI, and many others. Databases of any size have inherent quality challenges, but at large scale various forms of “noise” accumulate to problematic levels. The unfortunate consequence is that “bigger gets worse”. This is particularly associated with large uncurated submissions from vendors and automated document extractions (even though these are high-value). Virtual enumerations and circularity between overlapping sources add to the problem. As a result of some of the noise in the larger databases, their value becomes highly dependent on the specific application. An example is using the databases to support non-targeted analysis. This presentation covers examples of these noise and quality issues and suggests at least some options to ameliorate the problem.
Progress in drug discovery and chemical biology is hugely enabled by curated document-assay-result-compound-target relationships (D-A-R-C-P) in open databases from resources such as the Guide to Pharmacology and ChEMBL. These are synergistically integrated into PubChem which pre-computes chemical similarity and connectivity between over 95 million structures and 5.6 million BioAssay results. It also links chemistry to documents via various additional routes including MeSH and large scale submissions from publishers. However, these efforts are patchy and very few journals facilitate such connectivity. There thus remains a massive shortfall in public D-A-R-C-P capture from decades of papers and patents. This presentation will cover these aspects and discuss their partial amelioration by options such as author-driven depositions and open lab-book approaches as used by Open Source Malaria
Looking at chemistry - protein - papers connectivity in ELIXIR Chris Southan
This is a poster for the UK ELIXIR meeting in Birmingham UK, Nov 2018. It is the summary of a blog post, https://cdsouthan.blogspot.com/2018/08/an-initial-look-at-elixir-chemistry.html, that assesses chemistry <> protein <> papers connectivity (C-P-P) for five ELIXIR resources
Poster for World Congress of Pharmacology 2018, Kyoto
Introduction: The pharmacological literature and patents connect compound structures to their bioactivity. However, entombing these relationships for millions of compounds among millions of PDFs is acknowledged as massively problematic. The situation is ameliorated by resources that extract the entity and data relationships the authors and inventors put “in” to their PDFs back “out” into structured database records. The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) has been doing this by stringent curation of ligands and their quantitative activity against protein targets [1]. Our citations are submitted to PubChem (PC), who then link to PubMed (PM) [2]. This study presents an overview of this connectivity.
Methods: For GtoPdb entries in PC Substance we used the PC interface to count our submitted PM links. This gives the PC > PM mapping counts from which we analysed the PM links. We then performed reciprocal analyses (i.e. PM > PC) by selecting PM sets. We then compared two journals by counting structure links by year and source.
Results: From 8988 GtoPdb-submitted ligand substances in PC (release 2017.5), 7309 are linked to 8980 PM entries. Of the 7309, there are 5632 links to chemical structures in PC, the rest being antibodies and larger peptides. From the 8980 PMIDs, the Journal of Medicinal Chemistry (JMC) accounted for 1003 as our most frequently cited primary source of structure-to-activity mappings. For the British Journal of Pharmacology (BJP), most of the 345 cross-references were development compounds. Further analysis showed that from 2014 to 2017 the BJP to PC links of ~30 structures per year are mostly from GtoPdb and the Comparative Toxicology Database. However, going back to 2010-12, this increased to 500-800 connections, mainly derived from the IBM automated chemical extraction from abstracts. A similar pattern was observed for JMC.
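The linkage counts in the results can be turned into proportions with a few lines of arithmetic. The sketch uses only the figures quoted above; the variable names and the `share` helper are illustrative.

```python
# Counts quoted in the poster results.
substances = 8988        # GtoPdb-submitted ligand substances in PubChem
linked = 7309            # substances linked to PubMed entries
with_structure = 5632    # linked substances that map to chemical structures
pmids = 8980             # PubMed entries linked to those substances
jmc_pmids = 1003         # of those, from the Journal of Medicinal Chemistry
bjp_pmids = 345          # cross-references for the British Journal of Pharmacology

# Linked substances without a structure: antibodies and larger peptides.
without_structure = linked - with_structure


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(f"Linked to PubMed:        {share(linked, substances)}%")
    print(f"Linked with a structure: {share(with_structure, linked)}%")
    print(f"Without a structure:     {without_structure}")
    print(f"JMC share of PMIDs:      {share(jmc_pmids, pmids)}%")
    print(f"BJP share of PMIDs:      {share(bjp_pmids, pmids)}%")
```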
Conclusion: Navigation between documents and databases is an essential competence for pharmacologists and drug discovery but the NCBI Entrez system is daunting. GtoPdb is a major contributor of high-quality links and provides a first-stop to guide users into the PC/PM systems. However, our results indicated potentially serious specificity issues with automated chemistry-to-journal linking from non-GtoPdb sources.
References: [1] Harding et al. (2018). Nucl. Acids Res. 46 (Database Issue), doi: 10.1093/nar/gkx1121.
1. Trials and tribulations of curating peptide and
antibody ligands for the IUPHAR/BPS Guide to
Pharmacology
Christopher Southan, Joanna L. Sharman, Adam J. Pawson, Simon D.
Harding, Elena Faccenda and Jamie A. Davies, IUPHAR/BPS Guide to Pharmacology,
Discovery Brain Sciences, University of Edinburgh, UK.
ACS Boston 2018, Biologics & Registration Session, Mon Aug 20,
15:50 - 16:15, Harbor Ballroom II
https://www.slideshare.net/cdsouthan
2. Abstract (will not be shown)
As an expert-curated database of approved, clinical or research pharmacological targets mapped to
defined ligands, the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) and its precursor IUPHAR-DB,
have been extracting and annotating bioactive peptides from papers for well over a decade. The current
total has reached 2089 peptides, split between exogenous and endogenous, within the 9144 ligand
entries submitted to PubChem in our 2018.2 database release. More recently, as approved drugs or
clinical candidates we have curated 235 antibodies and a small number of therapeutic nucleotides.
Indexing these entity types in GtoPdb presents challenges similar to those being encountered for the
registration of biologicals as explicitly defined structures. In addition, we target-map the citation-
supported quantitative binding parameters where possible. This presentation will outline these
curatorial challenges and our efforts to at least partially ameliorate the problems. For peptides below
the PubChem CID SMILES limit of approximately 70 residues we have been using Sugar and Splice from
NextMove Software to convert more of our peptide SIDs to join the 6969 CIDs we already have.
However, we are often confounded by the equivocal structural specifications of authors with respect to
post-translational modifications and exact positions of radiolabel incorporations. Nevertheless, we do capture
at least a primary sequence string as an interim compromise that users can hit by BLAST. For reported
receptor-binding endogenous peptides we find some that do not match the Swiss-Prot features for the
precursor protein. PubChem has been encouraging and supporting us in converting more activity-
mapped peptides to CIDs and InChIKeys which should enhance inter-source connectivity. Otherwise,
biological SID data can only be joined by equivocal name matching. Antibodies and other large-
biological SIDs may also currently remain structurally orphaned and present their own challenges.
Notwithstanding, GtoPdb has successfully curated at least primary sequences for the molecular
specification of clinical Mabs. For this we use the IMGT/mAb-DB for approved monoclonals as a first
stop shop since they extract sequences from INN documents. For these and clinical candidates with
code names we also use the patent sequence databases to source a UniParc accession number and can
sometimes get binding data that has not appeared in papers.
3. Outline
• Introducing GtoPdb
• GtoPdb peptide content and stats
• Peptide tribulations
• PubChem peptidic pros and cons
• Getting more peptides > SMILES
• GtoPdb antibody content
• Antibody tribulations
• Stats and examples
• Exploiting PubChem SID tagging
• Where we go from here
• Further information
4. Introducing the IUPHAR/BPS Guide to
PHARMACOLOGY (GtoPdb)
• IUPHAR = International Union of Basic and Clinical Pharmacology, BPS = British
Pharmacological Society
• Formerly known as IUPHAR-DB for receptors and channels since 2003
• Since 2012 funded by the Wellcome Trust to cover all targets in the human genome
• Since 2015 Wellcome Trust “fork” as Guide to IMMUNOPHARMACOLOGY
• Molecular mechanism of action (mmoa) mapping primary & secondary targets
• Release cycle time (with PubChem refreshes) ~ 2 months
• Six well-cited NAR Annual Database issues, latest as PMID 29149325 (2018)
• Distilled into the 2-yearly British Journal of Pharmacology “Concise Guide to
PHARMACOLOGY” as a nine-paper series (see PMID 29055037) with outlinks
• Presents users with selected quality compounds for pharmacology research in
silico, in vitro, in cellulo, in vivo, in clinico
• An ELIXIR UK Node resource since 2016 http://www.guidetopharmacology.org
5. Expert-curated, citation provenanced,
quantitative binding data
Document > assay > result > compound > location > protein target
D-A-R-C-L-P
Where “C” is not a small molecule, we have ~ 2000 peptides and ~ 250
antibodies included in the ~ 9000 substances we submit to PubChem
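The D-A-R-C-L-P chain can be pictured as one record per curated result. The following is only a minimal sketch: the class name, field names and the `ligand_type` values are assumptions for illustration and do not reflect the actual GtoPdb schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BindingRecord:
    """One curated D-A-R-C-L-P chain (hypothetical field names)."""
    document: str            # D: supporting citation, e.g. a PubMed ID
    assay: str               # A: assay description
    result: float            # R: quantitative value, e.g. a pKi
    compound: str            # C: ligand name
    ligand_type: str         # assumed labels: "small molecule", "peptide", "antibody"
    location: Optional[str]  # L: location, as named on the slide
    protein: str             # P: protein target


def non_small_molecules(records: List[BindingRecord]) -> List[BindingRecord]:
    """Return the records where "C" is a peptide or an antibody."""
    return [r for r in records if r.ligand_type != "small molecule"]
```

Filtering on the ligand type is what separates the ~ 2000 peptides and ~ 250 antibodies from the rest of the submitted substances in this simplified picture.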
10. Tribulations with peptides
• Author specifications may be insufficient for complete molecular definition
• Consequent structural equivocalities slip through the editor/referee net
• Correct IUPAC peptide nomenclature is rare (ad-hoc more common)
• Exact location of radiolabels often not specified
• Absence of purity verification and/or in vivo stability
• Need to surface user-intuitive renderings (but HELM rules OK)
• Poor resolution of peptide name-to-structure (n2s)
• SMILES only copes with ~ 70 residues
• Searching patents for corroborative peptide prior-art is much more difficult than
for small molecules
• Literature extraction or author database submissions for bioactive peptides
proportionally lower than small molecules
• Species “zoo” for venom peptides and their names
• Conjugates (peptides + linkers + proteins etc.) even more difficult
• The PIR RESID Database of Protein Modifications is no longer maintained
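The ~ 70 residue ceiling suggests a simple triage step before attempting any sequence-to-SMILES conversion. This is a hedged sketch only: the function name, the return labels and the use of 70 as a fixed cut-off are assumptions for illustration, since the slide gives the limit as approximate.

```python
SMILES_RESIDUE_LIMIT = 70  # approximate figure from the slide, not a hard PubChem constant


def triage(sequence: str, limit: int = SMILES_RESIDUE_LIMIT) -> str:
    """Label a one-letter peptide sequence by whether SMILES conversion is worth trying."""
    residues = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not residues:
        raise ValueError("empty sequence")
    return "smiles_candidate" if len(residues) <= limit else "sequence_only"


# Endothelin-1 (21 residues) falls well under the limit.
label = triage("CSCSSLMDKECVYFCHLDIIW")
```

Anything labelled `sequence_only` would stay as a primary sequence string for BLAST rather than going on to structure conversion.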
11. The classic peptidic triple-whammy
Endothelin-1, CID 91928636, 1470 “Similar Compounds” and top-100 BLAST hits
• Too big to search or cluster by SMILES
• Too small to BLAST cleanly (and sans PTMs)
• Too many species splits for precursors
19. Tribulations with antibody curation
• Getting at least a primary Mab sequence as a molecular definition
• Not all clinical Mab sequences > patents > INN > IMGT-DB
• May get persistent UniParc ID sequence (on a good day)
• Papers often omit in vitro binding data
• Challenging to track press releases back to primary data
• Papers usually don't cite the patents
• But we sometimes get binding data from patents
• The biosimilars are piling in
• No open specification of glycan chains linked to primary sequences
• Some journals publish Mab characterisation with blinded code names
• Considering research reagents with vendor IDs if well provenanced
23. GtoP plans
• Continue back-fill of peptides > CIDs using Sugar and Splice (S&S)
• Resolve our sequences against Swiss-Prot x-refs, ChEMBL and GPCRdb
• Continue adding antibody biosimilar cross-pointers
• Consider adding “peptide” as a new SID tag
• For IUPHAR Guide to Immunopharmacology
– Sub-committee feedback on peptides, antibodies, targets and indications
– Continue curation of peptides relevant to immunity and inflammation
• Anticipate curation of new “binder” therapeutics including minibodies,
polyvalents and hybrids
• Keep watching brief on large-molecule InChIKeys
• Belt-and-braces of linking SMILES with compromise (i.e. sans modifications)
FASTA approximations for BLAST indexing and clustering of peptide ligands
• Introduce local HELM rendering
• Revise legacy data model (e.g. introduce a protein ligand classification)
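The “compromise” FASTA idea can be illustrated by stripping modification annotations from a peptide string so that only a plain sequence remains for BLAST indexing. This is a rough sketch, not the GtoPdb procedure: the square-bracket convention for modifications, the function name and the use of X for non-standard residues are all assumptions made for the example.

```python
import re

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def compromise_fasta(name: str, annotated: str) -> str:
    """Build a FASTA record sans modifications from an annotated peptide string.

    Assumes modifications are written as [tags], e.g. "[Ac]-CSC-[NH2]".
    """
    bare = re.sub(r"\[[^\]]*\]", "", annotated)   # drop bracketed modification tags
    bare = re.sub(r"[\s-]", "", bare).upper()      # drop separators, normalise case
    residues = "".join(c if c in STANDARD_RESIDUES else "X" for c in bare)
    return f">{name}\n{residues}"
```

The plain sequence loses the modification detail, which is why the slide pairs it with the SMILES record rather than using it as a replacement.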
24. Acknowledgments, info, COI
https://sites.google.com/view/tw2informatics/home
Conflict of interest (minor): has consulted in the peptide area
Thanks to the NextMove team for S&S support
Lin Yikai, for her M.Sc. project: “Developing bio/cheminformatics methods for converting bioactive peptide structures into machine-readable formats”
Anna Gaulton for ChEMBL FASTA sequences
Paul Thiessen for PubChem FASTA sequences