Chemistry counting across databases
Chemistry totals counting in UniChem
1 Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, UK. 2 (currently) TW2Informatics Ltd, Göteborg, Sweden, cdsouthan@gmail.com
Assessing chemistry <> proteins <> papers
connectivity between ELIXIR resources
Introduction
As we know, the utility of ELIXIR is largely determined by connectivity and
interoperability. This can be expressed in different ways including the ability to
computationally query across the same entities between resources and the simple
provision of cross-pointers as live URLs for users to manually navigate between entity
records from different databases.
So how is ELIXIR doing in this respect? This has been addressed in a blog post
https://cdsouthan.blogspot.com/2018/08/an-initial-look-at-elixir-chemistry.html that
assesses chemistry <> protein <> papers connectivity (C-P-P). The post should be consulted
for details since only an outline can be presented in this poster. The starting point was
our own UK ELIXIR resource of the IUPHAR/BPS Guide to PHARMACOLOGY
(GtoPdb) that includes C-P-P capture (see poster by Harding et al. and
http://www.guidetopharmacology.org/). We offer users outlinks and intersects of our
proteins via UniProt cross-references and update our chemistry in PubChem and
UniChem. However, entity overlaps with other ELIXIR resources offer crucial
complementarity for users. Those compared for curated C-P-P are GtoPdb, ChEMBL,
ChEBI, PDBe and, most recently, BRENDA (except ChEBI, which auto-maps C-P).
From the pre-computed chemistry intersects that UniChem generates at each release
one can plot informative comparative overlaps. The blog-post shows all five of these
but the example for GtoPdb is shown above. The pattern of overlaps has been
described in our NAR paper (PMID: 29149325). Note that the overlap is highest for PubChem
because we are a submitting source, although there are minor chemistry rule differences.
Protein intersects
The easiest way to intersect proteins is via the UniProt cross-references, although
these are not available for ChEBI. The Venn diagram above shows selections of
Human Swiss-Prot x-refs for the other four sources. Some of the divergence is
explicable (e.g. the three sources do not curate PDB proteins that have no reported
chemical interactions). Note also the mappings are not all for small-molecules (e.g.
the ChEMBL and GtoPdb x-refs include antibody and large peptide interactions).
Unique or 2-way overlaps can be cross-curation opportunities to increase coverage.
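The Venn regions described above can be sketched with plain set arithmetic. This is only an illustration of the idea, not the procedure used for the poster: the function name `exclusive_regions` is hypothetical and the accessions are made-up placeholders rather than real Swiss-Prot x-refs.

```python
from itertools import combinations


def exclusive_regions(sources):
    """Map each combination of source names to the accessions found in
    exactly those sources and in no others (one Venn region each)."""
    names = sorted(sources)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Accessions shared by every source in this combination
            inside = set.intersection(*(sources[n] for n in combo))
            # Accessions present in any source outside the combination
            outside = set().union(*(sources[n] for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions


# Placeholder accessions for illustration only (not real x-ref data)
toy = {
    "GtoPdb": {"P1", "P2", "P3"},
    "ChEMBL": {"P2", "P3", "P4"},
    "PDBe": {"P3", "P5"},
}
regions = exclusive_regions(toy)
for combo, accessions in regions.items():
    print(" & ".join(combo), len(accessions))
```

Regions with a single source name, or exactly two, correspond to the unique and 2-way overlaps flagged above as possible cross-curation opportunities.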
Christopher Southan1,2
Publication intersects in European PubMed Central (EPMC)
For curated C-P-P resources it is useful to compare which papers have been selected
for chemistry extraction (even though it is more difficult to discern “why”). In EPMC the
Data Links and Data Citations queries (HAS_CHEMBL:y) and (HAS_PDB:y) worked
cleanly. However, there was some ambiguity for (HAS_CHEBI:y). It turns out,
unfortunately, these are papers where there is a term match to ChEBI entries but not
papers that they curated to extract their chemical entries from. Neither GtoPdb nor
BRENDA are current data links (GtoPdb intend to address this but in the interim lists of
papers they have curated chemistry from can be obtained via PubMed > PubMed). The
curation selectivity underlying the capture divergence is worthy of further investigation.
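For readers who want to repeat such counts programmatically, the sketch below builds search URLs for the query strings quoted above. It makes no network call. The endpoint path, the `query`, `format` and `pageSize` parameter names, and the `hitCount` field mentioned in the comment reflect my understanding of the Europe PMC REST service and should be checked against its documentation; `epmc_count_url` and `combine` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumed Europe PMC REST search endpoint (verify against the EPMC docs)
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def combine(*flags):
    """AND together data-link flags such as HAS_CHEMBL:y and HAS_PDB:y."""
    return " AND ".join(f"({flag})" for flag in flags)


def epmc_count_url(query):
    """Build a search URL that requests a single JSON record; the total
    number of matching papers is assumed to be reported as 'hitCount'."""
    params = {"query": query, "format": "json", "pageSize": 1}
    return EPMC_SEARCH + "?" + urlencode(params)


both = combine("HAS_CHEMBL:y", "HAS_PDB:y")
print(epmc_count_url(both))
```

Fetching the counts for each single flag and for each pair would give the numbers needed for a publication Venn diagram, subject to the (HAS_CHEBI:y) caveat noted above.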
Chemistry intersects in PubChem
PubChem offers powerful “slice ‘n dice” options to compare 600+ sources. Of our five,
BRENDA and PDBe are not submitters but we can use the NCBI Structure (ligands
extracted from PDB) to substitute for the latter (n.b. 4-way Venn intersects are difficult
from the interface so only a 3-way is shown). Reasons for the wide divergence of
ELIXIR chemistry seen above can be partially but not entirely explained (see blog-post).
Conclusions
• This intra-ELIXIR comparative analysis was more difficult than it should have been
• One reason is that these databases have independently diverged over decades into
their utility niches with little (pre-ELIXIR) consideration of interoperability
• The exercise turned out to be peculiarly “gapped” in that it was not possible to do
standardized C-P-P x-mappings between all five; there was always at least one
odd-man-out
• Some of this could be easily addressed, for example by having GtoPdb, ChEBI and
BRENDA get the PMIDs of the papers they curated/extracted C-P from indexed in EPMC
• Another enhancement would be to harmonise chemistry submissions to both
UniChem and PubChem (e.g. for PDBe ligands and BRENDA compounds)
• The 37% unique chemistry in BRENDA may represent valuable capture but this
needs to be checked
• More technical dialogue between ELIXIR resources with entities-in-common would
be valuable (e.g. to cogitate on causes of divergent capture, pragmatic
interoperability assessments, collaborative curation and future RDF cross-testing)
• The C-P-P is extendable (e.g. for the new ELIXIR 3D-BioInfo initiative)
• While ELIXIR Training is progressing and resources have good Help and FAQ these
results indicate an unmet need for “comparative exploitation guides” even for just
C-P-P. For example users need to know not only “what's in one but not t’other and
why?” but also “which permutations of these five, and/or others, should I use for
what?” (for chemistry see PMID: 29451740)
The EBI UniChem database provides
chemical structure cross-indexing between
39 sources that include the five compared
here. For comparison PubChem,
SureChEMBL (patents) and Human
Metabolites (HMDB) are shown on the right.
Counts refer to InChIKeys. The % unique
are for that source from the 128 million in the
11 Nov release that includes PubChem
(some are slightly different from the August
blog-post). This unique content is significant
for BRENDA, HMDB and PDBe.
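As a rough illustration of how a "% unique" figure of this kind can be derived from InChIKey sets, the sketch below counts the keys in one source that appear in no other source. It assumes that "unique" means absent from every other source in the comparison; the helper name `percent_unique` and the toy keys are hypothetical and this is not UniChem's own calculation.

```python
def percent_unique(source, all_sources):
    """Percentage of a source's InChIKeys that occur in no other source."""
    own = all_sources[source]
    if not own:
        return 0.0
    # Pool the keys of every other source in the comparison
    elsewhere = set().union(
        *(keys for name, keys in all_sources.items() if name != source)
    )
    return 100.0 * len(own - elsewhere) / len(own)


# Toy placeholder keys standing in for InChIKeys (not real data)
toy_sources = {
    "A": {"k1", "k2", "k3", "k4"},
    "B": {"k3", "k4", "k5"},
}
for name in toy_sources:
    print(name, round(percent_unique(name, toy_sources), 1))
```

Note that the result depends on which sources are pooled: adding a large source such as PubChem to the comparison can only lower the percentage for the others.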

Comparative structure of adrenal gland in vertebrates
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 

Looking at chemistry - protein - papers connectivity in ELIXIR

Assessing chemistry <> proteins <> papers connectivity between ELIXIR resources

Christopher Southan (1,2)
1 Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, UK. 2 (currently) TW2Informatics Ltd, Göteborg, Sweden, cdsouthan@gmail.com

[Figure panels: "Chemistry counting across databases" and "Chemistry totals counting in UniChem"]

Introduction
The utility of ELIXIR is largely determined by connectivity and interoperability. This can be expressed in different ways, including the ability to computationally query across the same entities between resources and the simple provision of cross-pointers as live URLs, so that users can manually navigate between entity records from different databases.

So how is ELIXIR doing in this respect? This has been addressed in a blog post (https://cdsouthan.blogspot.com/2018/08/an-initial-look-at-elixir-chemistry.html) that assesses chemistry <> protein <> papers connectivity (C-P-P). The post should be consulted for details, since only an outline can be presented in this poster. The starting point was our own UK ELIXIR resource, the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb), which includes C-P-P capture (see poster by Harding et al. and http://www.guidetopharmacology.org/). We offer users outlinks and intersects of our proteins via UniProt cross-references and keep our chemistry updated in PubChem and UniChem. However, entity overlaps with other ELIXIR resources offer crucial complementarity for users. The resources compared for curated C-P-P are GtoPdb, ChEMBL, ChEBI, PDBe and, most recently, BRENDA (with the exception that ChEBI auto-maps C-P).

From the pre-computed chemistry intersects that UniChem generates at each release, one can plot informative comparative overlaps. The blog post shows all five of these, but only the example for GtoPdb is shown above. The pattern of overlaps has been described in our NAR paper (PMID: 29149325). Note that the overlap is highest for PubChem because we are a submitting source, although there are minor chemistry rule differences.

Protein intersects
The easiest way to intersect proteins is via the UniProt cross-references, although these are not available for ChEBI. The Venn diagram above shows selections of Human Swiss-Prot x-refs for the other four sources. Some of the divergence is explicable (e.g. the three sources do not curate PDB proteins that have no reported chemical interactions). Note also that the mappings are not all for small molecules (e.g. the ChEMBL and GtoPdb x-refs include antibody and large peptide interactions). Unique or 2-way overlaps can be cross-curation opportunities to increase coverage.

Publication intersects in European PubMed Central (EPMC)
For curated C-P-P resources it is useful to compare which papers have been selected for chemistry extraction (even though it is more difficult to discern "why"). In EPMC the Data Links and Data Citations queries (HAS_CHEMBL:y) and (HAS_PDB:y) worked cleanly. However, there was some ambiguity for (HAS_CHEBI:y). Unfortunately, it turns out that these are papers with a term match to ChEBI entries, not papers that ChEBI curated to extract chemical entries from. Neither GtoPdb nor BRENDA is a current data link (GtoPdb intend to address this, but in the interim lists of papers they have curated chemistry from can be obtained via PubMed > PubMed). The curation selectivity underlying the capture divergence is worthy of further investigation.

Chemistry intersects in PubChem
PubChem offers powerful "slice 'n dice" options to compare 600+ sources. Of our five, BRENDA and PDBe are not submitters, but we can use the NCBI Structure source (ligands extracted from PDB) as a substitute for the latter (n.b. 4-way Venn intersects are difficult from the interface, so only a 3-way is shown). Reasons for the wide divergence of ELIXIR chemistry seen above can be partially, but not entirely, explained (see blog post).
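The Venn-style intersects described above (for UniProt x-refs or for chemical structures) can be reproduced offline once each source's identifier list has been downloaded. The following is a minimal sketch under stated assumptions, not the procedure used for the poster: the function name `venn_regions` and the toy identifiers are hypothetical, and real input would be sets of UniProt accessions or InChIKeys exported from each resource.

```python
def venn_regions(sources):
    """Count identifiers in each exclusive Venn region.

    sources: dict mapping a source name to a set of identifiers.
    Returns a dict mapping frozenset(source names) to the number of
    identifiers found in exactly that combination of sources.
    """
    universe = set().union(*sources.values())
    regions = {}
    for ident in universe:
        # The region is defined by exactly which sources contain the identifier.
        members = frozenset(name for name, ids in sources.items() if ident in ids)
        regions[members] = regions.get(members, 0) + 1
    return regions


# Hypothetical toy identifiers standing in for InChIKeys or UniProt accessions.
sources = {
    "GtoPdb": {"K1", "K2", "K3", "K4"},
    "ChEMBL": {"K2", "K3", "K5"},
    "ChEBI": {"K3", "K4", "K6", "K7"},
}

regions = venn_regions(sources)
for members, count in sorted(regions.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(members)), count)
```

Because every identifier is assigned to exactly one region, the region counts sum to the size of the union, which gives a quick sanity check. The approach also extends to 4-way or 5-way comparisons that are awkward to draw from a web interface.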
Conclusions
• This intra-ELIXIR comparative analysis was more difficult than it should have been
• One reason is that these databases have independently diverged over decades into their utility niches with little (pre-ELIXIR) consideration of interoperability
• The exercise turned out to be peculiarly "gapped" in that it was not possible to do standardized C-P-P x-mappings between all five; there was always at least one odd-man-out
• Some of this could be easily addressed, for example by GtoPdb, ChEBI and BRENDA getting PMIDs indexed in EPMC for the papers from which they curated/extracted C-P
• Another enhancement would be to harmonise chemistry submissions to both UniChem and PubChem (e.g. for PDBe ligands and BRENDA compounds)
• The 37% unique chemistry in BRENDA may represent valuable capture, but this needs to be checked
• More technical dialogue between ELIXIR resources with entities-in-common would be valuable (e.g. to cogitate on causes of divergent capture, pragmatic interoperability assessments, collaborative curation and future RDF cross-testing)
• The C-P-P is extendable (e.g. for the new ELIXIR 3D-BioInfo initiative)
• While ELIXIR Training is progressing and resources have good Help and FAQ pages, these results indicate an unmet need for "comparative exploitation guides", even for just C-P-P. For example, users need to know not only "what's in one but not t'other and why?" but also "which permutations of these five, and/or others, should I use for what?" (for chemistry see PMID: 29451740)

[Figure caption: Chemistry totals counting in UniChem] The EBI UniChem database provides chemical structure cross-indexing between 39 sources that include the five compared here. For comparison, PubChem, SureChEMBL (patents) and the Human Metabolome Database (HMDB) are shown on the right. Counts refer to InChIKeys. The % unique values are for each source, taken from the 128 million InChIKeys in the 11 Nov release that includes PubChem (some are slightly different from the August blog post). This unique content is significant for BRENDA, HMDB and PDBe.
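The "% unique" figures quoted above can be approximated from per-source InChIKey lists. The sketch below is illustrative only and rests on an assumption: it treats "% unique" as the share of a source's InChIKeys that occur in no other compared source. The function name `percent_unique` and the toy keys are hypothetical and are not UniChem output.

```python
def percent_unique(sources):
    """Return, per source, the percentage of its keys found in no other source.

    sources: dict mapping a source name to a set of InChIKeys.
    Assumes "% unique" means unique keys divided by that source's own total.
    """
    result = {}
    for name, keys in sources.items():
        # Pool the keys of every other source being compared.
        others = set().union(*(k for n, k in sources.items() if n != name))
        unique = keys - others
        result[name] = 100.0 * len(unique) / len(keys) if keys else 0.0
    return result


# Hypothetical toy keys; real input would be full InChIKey sets per source.
toy = {
    "BRENDA": {"A", "B", "C", "D"},
    "ChEMBL": {"B", "C", "E"},
    "PDBe": {"C", "D", "F", "G"},
}

unique_pct = percent_unique(toy)
for name, pct in unique_pct.items():
    print(f"{name}: {pct:.1f}% unique")
```

The result depends on which sources are included in the comparison, so a percentage computed against five sources will generally be higher than one computed against all sources in a full release.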