www.guidetopharmacology.org
Resolving cryptic needles to molecular
structures: The GtoPdb experience
Christopher Southan, Adam J. Pawson, Joanna L. Sharman, Helen E. Benson and Elena Faccenda
IUPHAR/BPS Guide to PHARMACOLOGY, Centre for Integrative
Physiology, University of Edinburgh
ACS CINF session: Find the Needle in a Haystack: Mining Data from Large Chemical Spaces
1
http://www.slideshare.net/cdsouthan/southan-needles-acs
Abstract
2
The IUPHAR/BPS Guide to PHARMACOLOGY database (GtoPdb) team has data-mined bioactive
chemistry since 2009 (PMID 24234439). Consequently, during the curation of 7586 needles (as
ligand entries) we have grappled extensively with the haystack. This work outlines the challenges of
mapping company code numbers and lead compounds to structures (name-to-structure, n2s) within a haystack of
anywhere between one and five million bioactive structures. By the time these compounds are assigned non-proprietary
names, data linkages can usually be found. However, other valuable needles are lead compounds approaching clinical
development that can also be incisive pharmacological tools.
The use of company codes to designate these is often obfuscatory, with some journals even
allowing blinding where clinical reports have no n2s or links to primary data. The efforts to resolve
the NCATS and MRC repurposing candidates exemplified the problem (PMID 23159359).
Notwithstanding this, we have now curated n2s for 50 AZD codes, including Open Innovation structures. Codes
also present back-mapping problems where we need to synonym-chain a) first filings and early
papers, b) consecutive different codes via mergers, c) INN or USAN names and d) an eventual trade
name. The mining challenges are compounded by ad hoc permutations: hyphen, space or no space,
comma inclusions, dropped leading zeros, appended suffixes or even ghost codes. In
some cases we curate plausible patent structures pending disclosure. For others we found
vendor-only n2s corroborated via patent match. Recent reports of potent lead structures are
particularly difficult to name-link and synonym-map. To ameliorate the problem we have recently
introduced binding synonyms such as “compound 17d [PMID 23099093]” or “example 98
(WO2011020806)”. This means users can not only immediately locate the exact structures inside
documents, including via our PubChem submissions, but often find expanded SAR series.
Broader issues of n2s obfuscation will be discussed, including the inherent contradiction with the
trend towards greater clinical trial transparency.
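The code-permutation problem described in the abstract can be sketched as a normalisation key that collapses spelling variants of one code before matching. This is a minimal Python illustration under stated assumptions, not the GtoPdb curation procedure: the separator set, the leading-zero rule and the choice to ignore trailing letter suffixes are all assumptions, and `normalize_code` and `group_variants` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Hypothetical normalisation rules (assumptions for illustration only):
#   * hyphens, spaces and commas anywhere in the code are ignored;
#   * letters are compared case-insensitively;
#   * leading zeros in the numeric part are dropped;
#   * a trailing letter suffix after the number is ignored.
CODE_RE = re.compile(r"([A-Z]+)0*(\d+)[A-Z]*")


def normalize_code(code: str) -> str:
    """Collapse formatting variants of a company code to one comparison key."""
    compact = re.sub(r"[\s,\-]", "", code).upper()
    match = CODE_RE.fullmatch(compact)
    if match is None:
        raise ValueError(f"unrecognised code format: {code!r}")
    prefix, number = match.groups()
    return prefix + number


def group_variants(codes):
    """Group raw code strings by their normalised key."""
    groups = defaultdict(list)
    for code in codes:
        groups[normalize_code(code)].append(code)
    return dict(groups)
```

A key like this can only merge formatting variants. A digit-transposed string such as MK-8391 still normalises to a different key from MK-8931, and suffix-stripping could wrongly merge a salt form or a distinct compound, so any match still needs structural corroboration.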
Subtitle: A tale of three needles
3
http://cdsouthan.blogspot.se/2015/08/merck.html
Starting point: curating BACE1 clinical inhibitors for GtoPdb
4
MK-8931 blinded since 2011
One PubMed name match but no name-to-structure (n2s)
Synonym-chaining to SCH 900931
First public surfacings of an MK-8931 n2s
5
n.b. ChemIDplus drops the hyphen
Substance submissions (SIDs) for CID 23627211:
n2s for MK-8391 and SC-1359113, but not SCH 900931
RN 1613380-81-6 is a “ghost” entry (i.e. no n2s)
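A CAS Registry Number carries a check digit, so its syntax can be verified offline. The Python sketch below implements the standard check-digit rule; `cas_rn_is_valid` is a hypothetical helper name. A passing check only shows that the RN is well formed, not that the registry attaches a structure to it, which is exactly the "ghost" problem noted for RN 1613380-81-6.

```python
import re

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_RN_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_rn_is_valid(rn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number.

    The check digit equals the sum of every other digit multiplied by its
    position counted from the right (starting at 1), modulo 10.
    """
    match = CAS_RN_RE.fullmatch(rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    total = sum(int(d) * pos for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))
```

For RN 1613380-81-6 the weighted sum is 146, and 146 mod 10 = 6 matches the final digit, so the number is syntactically valid even though no n2s sits behind it.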
6
ChEMBL maps CID 23627211 <> PMID: 23412139
(but MK-8931 = compound 13, not the lead, which is cpd 16)
7
MK-8931 is not mentioned in the Merck paper or the ChEMBL SID
In 2015 verubecestat surfaces via Merck
8
n2s provenanced by the INN and USAN entries, but no code back-mapping or clinical trial link
n2s: corroborative decoding of the IUPAC names from the USAN
9
InChIKey search squares the circle, because there is no CID pointer in the USAN
(sigh...)
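The InChIKey comparison that "squares the circle" can be sketched in Python. This is a minimal illustration, and `compare_inchikeys` is a hypothetical helper name. It relies on the standard InChIKey layout: the first 14-character block hashes the connectivity layer, so two keys that share only that block describe the same skeleton differing in stereo, isotope or protonation layers. Because keys are hashes, a match is strong but not absolute evidence of identity.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
# Block 1 hashes the main (connectivity) layer; block 2 hashes the remaining
# layers such as stereochemistry and isotopes, plus flag and version letters;
# the final letter encodes protonation.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def compare_inchikeys(key_a: str, key_b: str) -> str:
    """Classify how closely two InChIKeys agree."""
    for key in (key_a, key_b):
        if INCHIKEY_RE.fullmatch(key) is None:
            raise ValueError(f"not a well-formed InChIKey: {key!r}")
    if key_a == key_b:
        return "identical"
    if key_a[:14] == key_b[:14]:
        # Same skeleton; differs in stereo, isotope or protonation layers.
        return "same skeleton"
    return "different"
```

A "same skeleton" result is the case worth flagging for curation, since stereoisomers in a series can carry different codes or compound numbers.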
So verubecestat = MK-8931, or not?
10
Interesting surprise: Merck record verubecestat as more potent against BACE2 than BACE1
11
n.b. So would it lower blood sugar?
cf. BACE2 as a new diabetes target: a patent review (2010 - 2012)
http://www.ncbi.nlm.nih.gov/pubmed/23506624
Secondary sources have synonym-chained:
but how did they provenance?
12
We are left with a lot of needle questions
• So what is the molecular resolution between MK-8391, SC 1359113,
SCH 900931, verubecestat, compound 13, example 25, etc.?
• Have ChemIDplus introduced an unprovenanced incorrect n2s into
the CID 23627211 synonyms?
• What is the source of the cryptic n2s surfacing in the vendor jungle?
• Why did the USAN not include the code (they usually do)?
• Will Merck directly clarify any of this (e.g. publish a PubMed paper on
verubecestat including an n2s)?
• Would ChemIDplus then correct their entry?
• If the primary n2s changes, will secondary sources a) notice and b)
correct and update?
• Could verubecestat ameliorate diabetes as well as AD? (not a joke)
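The questions above about how secondary sources synonym-chained these names can be framed as a graph problem. In the Python sketch below, each asserted name link is an edge that keeps its source, and a breadth-first walk returns every reachable synonym with the provenance of each hop. This makes visible when a whole chain rests on one unprovenanced link. The helper names, the names in the data and the source labels are all hypothetical placeholders, not real assertions about any compound.

```python
from collections import defaultdict, deque


def build_synonym_graph(assertions):
    """Build an undirected graph from (name_a, name_b, source) assertions.

    Each edge keeps the source that asserted the link, so provenance is
    not lost when names are chained together.
    """
    graph = defaultdict(list)
    for name_a, name_b, source in assertions:
        graph[name_a].append((name_b, source))
        graph[name_b].append((name_a, source))
    return graph


def synonym_chain(graph, start):
    """Breadth-first walk from `start`.

    Returns {name: [sources along the shortest chain from start]}.
    """
    chains = {start: []}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        for neighbour, source in graph.get(name, []):
            if neighbour not in chains:
                chains[neighbour] = chains[name] + [source]
                queue.append(neighbour)
    return chains


# Hypothetical assertions: names and sources are placeholders, not real data.
assertions = [
    ("ABC-123", "DEF 456", "source_A"),
    ("DEF 456", "examplestat", "source_B"),
    ("GHI-789", "otherstat", "source_C"),
]
graph = build_synonym_graph(assertions)
for name, sources in synonym_chain(graph, "ABC-123").items():
    print(name, "via", sources or "(start)")
```

If a primary source later corrects one link, only the edges carrying that source need revisiting, which is one way secondary sources could notice and update.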
13
In the meantime: GtoPdb does the best we can
14
cpd 16 [PMID: 23412139] GtoPdb 8698; verubecestat GtoPdb 8699; MK-8931 GtoPdb 8931
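Binding synonyms such as "cpd 16 [PMID: 23412139]" tie a compound label to its source document, and that makes them machine-parseable. The Python sketch below splits such a synonym into its label, document type and document identifier. The accepted shapes are an assumption generalised from the examples in this deck ("compound 17d [PMID 23099093]", "example 98 (WO2011020806)"), not a GtoPdb specification, and `parse_binding_synonym` is a hypothetical helper name.

```python
import re

# Assumed shapes, taken from the examples in the text:
#   "compound 17d [PMID 23099093]", "cpd 16 [PMID: 23412139]",
#   "example 98 (WO2011020806)"
BINDING_SYNONYM_RE = re.compile(
    r"(?P<label>(?:compound|cpd|example)\s+\w+)\s*"
    r"(?:\[PMID:?\s*(?P<pmid>\d+)\]"             # journal article via PubMed ID
    r"|\((?P<patent>[A-Z]{2}\d+[A-Z0-9]*)\))",   # patent publication number
    re.IGNORECASE,
)


def parse_binding_synonym(text):
    """Split a binding synonym into (label, document type, document id).

    Returns None if the text does not look like a binding synonym.
    """
    match = BINDING_SYNONYM_RE.fullmatch(text.strip())
    if match is None:
        return None
    if match.group("pmid"):
        return match.group("label"), "PMID", match.group("pmid")
    return match.group("label"), "patent", match.group("patent")
```

A parsed PMID or patent number can then be used as a lookup key in the literature or patent source, which is how a user would go from the synonym to the exact structure inside the document.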
Which includes document-linking the best needles
15
http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId=2330
BACE1 (upcoming release 2015.2)
Conclusion: time to rethink n2s obfuscation?
(a.k.a. desist from hiding the needles)
• The GtoPdb team can testify that curating useful needles (names <>
structures <> data) from the haystack of ~60-100 million structures with
~5 million bioactives is tough
• Code-blinding and synonym spaghetti make it even tougher
• They seriously confound big-data mining (but GtoPdb makes small data
minable)
• Is there any evidence that n2s blinding gives competitive advantage?
• Do code numbers and synonym chains have to remain an ad hoc mess?
• So why not adopt a universally useful form (e.g. ABCD123456)?
• Why do pharma companies obstinately decline to provenance their own
n2s in public databases?
• Pressure for clinical trial transparency and translational data mining is
building but there is still anomalous neglect of linking explicit structures.
• Some signs of n2s cross-corroboration across USAN, INN, FDA/SPL,
ChemIDplus and PubChem, but it could be better
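To show why a single fixed code form would help mining, here is a Python sketch that reads the slide's example "ABCD123456" as exactly four capital letters followed by exactly six digits. That reading is an assumption, and the helper names are hypothetical. With one fixed pattern, validation and free-text extraction each need only one regular expression, with no permutation handling.

```python
import re

# Assumption: the slide's example "ABCD123456" read as exactly four capital
# letters followed by exactly six digits, with no separators or suffixes.
FIXED_CODE_RE = re.compile(r"\b[A-Z]{4}\d{6}\b")


def is_fixed_format_code(token: str) -> bool:
    """True if `token` is, in its entirety, a code in the assumed fixed format."""
    return FIXED_CODE_RE.fullmatch(token) is not None


def find_fixed_format_codes(text: str) -> list[str]:
    """Pull every fixed-format code out of free text with a single pattern."""
    return FIXED_CODE_RE.findall(text)
```

The word boundaries stop the pattern from matching inside longer alphanumeric runs, so a seven-digit string is rejected rather than silently truncated.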
16
References, acknowledgments and questions
17
http://www.ncbi.nlm.nih.gov/pubmed/24234439
http://www.ncbi.nlm.nih.gov/pubmed/23159359
Grade 7 - Lesson 1 - Microscope and Its Functions
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 

Resolving cryptic needles to molecular structures: The GtoPdb experience

  • 1. www.guidetopharmacology.org. Resolving cryptic needles to molecular structures: The GtoPdb experience. Christopher Southan, Adam J. Pawson, Joanna L. Sharman, Helen E. Benson and Elena Faccenda, IUPHAR/BPS Guide to PHARMACOLOGY, Centre for Integrative Physiology, University of Edinburgh. ACS CINF session: Find the Needle in a Haystack: Mining Data from Large Chemical Spaces. http://www.slideshare.net/cdsouthan/southan-needles-acs
  • 2. Abstract: The IUPHAR/BPS Guide to PHARMACOLOGY database (GtoPdb) team has data-mined bioactive chemistry since 2009 (PMID 24234439). Consequently, during the curation of 7586 needles (as ligand entries), we have grappled extensively with the haystack. This work outlines the challenges of mapping company code numbers to structures (n2s) and lead compounds from a haystack of anywhere between one and five million bioactive structures. By the time these are assigned non-proprietary names, data linkages can usually be found. However, other valuable needles are lead compounds approaching clinical development that can also be incisive pharmacological tools. The use of company codes to designate these is often obfuscatory, with some journals even allowing blinding where clinical reports have no n2s or links to primary data. The efforts to resolve the NCATS and MRC repurposing candidates exemplified the problem (PMID 23159359). Notwithstanding, we have now curated n2s for 50 AZD codes, including Open Innovation structures. Codes also present back-mapping problems where we need to synonym-chain a) first filings and early papers, b) consecutive different codes via mergers, c) the INN or USAN and d) an eventual trade name. The mining challenges are compounded by ad hoc permutations of hyphen, space or no space, comma inclusions, dropped leading zeros, appended suffixes or even ghost codes. In some cases, we curate plausible patent structures pending disclosure. For others, we found vendor-only n2s corroborated via a patent match. Recent reports of potent lead structures are particularly difficult to name-link and synonym-map. To ameliorate the problem, we have recently introduced binding synonyms such as “compound 17d [PMID 23099093]” or “example 98 (WO2011020806)”. This means users can not only immediately locate the exact structures inside documents, including via our PubChem submissions, but often find expanded SAR series. Broader issues of n2s obfuscation will be discussed, including the inherent contradiction with the trend towards greater clinical trials transparency.
  • 3. Subtitle: A tale of three needles. http://cdsouthan.blogspot.se/2015/08/merck.html
  • 4. Starting point: curating BACE1 clinical inhibitors for GtoPdb. MK-8931: blinded since 2011; one PubMed name match but no name-to-structure (n2s); synonym-chaining to SCH 900931.
  • 5. First public surfacings of an MK-8931 n2s (n.b. ChemIDplus drops the hyphen).
  • 6. Substance submissions (SIDs) for CID 23627211: n2s for MK-8391 and SC-1359113, but not for SCH 900931. RN 1613380-81-6 is a “ghost” entry (i.e. no n2s).
  • 7. ChEMBL maps CID 23627211 <> PMID: 23412139 (but MK-8931 = compound 13, not the lead reported as cpd 16). MK-8931 is not mentioned in the Merck paper or in the ChEMBL SID.
  • 8. In 2015, verubecestat surfaces via Merck. The n2s is provenanced by the INN and USAN entries, but there is no code back-mapping or clinical trial link.
  • 9. N2s: corroborative decoding of the IUPAC names from the USAN. An InChIKey search squares the circle, because there is no CID pointer in the USAN (sigh...).
  • 10. So verubecestat = MK-8931, or not?
  • 11. Interesting surprise: Merck records verubecestat as more potent against BACE2 than BACE1. n.b. So would it lower blood sugar? cf. BACE2 as a new diabetes target: a patent review (2010 - 2012) http://www.ncbi.nlm.nih.gov/pubmed/23506624
  • 12. Secondary sources have synonym-chained these names: but how did they provenance the links?
  • 13. We are left with a lot of needle questions:
    • So what is the molecular resolution between MK-8391, SC 1359113, SCH 900931, verubecestat, compound 13, example 25 etc.?
    • Have ChemIDplus introduced an unprovenanced, incorrect n2s into the CID 23627211 synonyms?
    • What is the source of the cryptic n2s surfacing in the vendor jungle?
    • Why did the USAN not include the code (they usually do)?
    • Will Merck directly clarify any of this (e.g. publish a PubMed paper on verubecestat including an n2s)?
    • Would ChemIDplus then correct their entry?
    • If the primary n2s changes, will secondary sources a) notice and b) correct and update?
    • Could verubecestat ameliorate diabetes as well as AD? (not a joke)
  • 14. In the meantime, GtoPdb does the best it can: cpd 16 [PMID: 23412139], GtoPdb 8698; verubecestat, GtoPdb 8699; MK-8931, GtoPdb 8931.
  • 15. Which includes document-linking the best needles: http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId=2330 (BACE1, upcoming release 2015.2)
  • 16. Conclusion: time to rethink n2s obfuscation? (a.k.a. desist from hiding the needles)
    • The GtoPdb team can testify that curating useful needles (names <> structures <> data) from the haystack of ~60-100 million structures, with ~5 million bioactives, is tough.
    • Code-blinding and synonym spaghetti make it even tougher.
    • They seriously confound big-data mining (but GtoPdb makes small data minable).
    • Is there any evidence that n2s blinding gives competitive advantage?
    • Do code numbers and synonym chains have to remain an ad hoc mess?
    • So why not adopt a universally useful form (e.g. ABCD123456)?
    • Why do pharma companies obstinately decline to provenance their own n2s in public databases?
    • Pressure for clinical trial transparency and translational data mining is building, but there is still anomalous neglect of linking explicit structures.
    • There are some signs of n2s cross-corroboration across USAN, INN, FDA/SPL, ChemIDplus and PubChem, but this could be better.
  • 17. References, acknowledgments and questions: http://www.ncbi.nlm.nih.gov/pubmed/24234439, http://www.ncbi.nlm.nih.gov/pubmed/23159359
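The abstract (slide 2) and slide 5 describe company codes that surface as ad hoc spellings: hyphen, space or no space, a dropped leading zero, an appended suffix. A minimal Python sketch of collapsing such variants into a single match key, and of listing the spellings worth searching for, might look like the following. The regular expression, the function names (`parse_code`, `normalise_code`, `search_variants`) and the choice to drop suffixes and leading zeros are illustrative assumptions, not GtoPdb's curation procedure; ghost codes are not handled at all.

```python
import re
from typing import Optional

# Hypothetical rule set (not GtoPdb's actual procedure): an alphabetic prefix,
# optional separators (hyphen, space, comma), optional leading zeros,
# the digits, then an optional alphabetic suffix.
_CODE = re.compile(r"^([A-Za-z]+)[\s,\-]*0*(\d+)\s*([A-Za-z]*)$")


def parse_code(raw: str) -> Optional[tuple[str, str, str]]:
    """Split a company code into (PREFIX, digits, SUFFIX), or return None."""
    m = _CODE.match(raw.strip())
    if m is None:
        return None
    prefix, number, suffix = m.groups()
    return prefix.upper(), number, suffix.upper()


def normalise_code(raw: str, keep_suffix: bool = False) -> Optional[str]:
    """Collapse separator, case and leading-zero variants into one match key."""
    parsed = parse_code(raw)
    if parsed is None:
        return None  # not a prefix+number code, e.g. an INN such as verubecestat
    prefix, number, suffix = parsed
    return prefix + number + (suffix if keep_suffix else "")


def search_variants(raw: str) -> set[str]:
    """Spellings worth querying for when mining text for a given code."""
    parsed = parse_code(raw)
    if parsed is None:
        return set()
    prefix, number, _ = parsed
    return {prefix + sep + number for sep in ("", "-", " ")}


for spelling in ["MK-8931", "MK 8931", "MK8931", "mk-08931", "MK-8931A"]:
    print(spelling, "->", normalise_code(spelling))
```

Because dropping zeros and suffixes can merge codes that denote different compounds or salt forms, a shared key is best treated as a candidate for curator review rather than as proof of identity.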
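Slides 4, 12 and 13 turn on synonym-chaining: joining first-filing codes, post-merger codes, the INN or USAN and a trade name into one chain. One way to model this, assumed here rather than taken from the slides, is a union-find (disjoint-set) structure that also records the source asserted for every link, so that a chain can be audited when a single unprovenanced link (the ChemIDplus question on slide 13) turns out to be wrong. The class name `SynonymChains` and the source labels in the usage lines are hypothetical.

```python
class SynonymChains:
    """Union-find over name strings; every asserted link keeps its source."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.links: list[tuple[str, str, str]] = []  # (name_a, name_b, source)

    def find(self, name: str) -> str:
        """Return the representative name of the chain containing `name`."""
        self.parent.setdefault(name, name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        while name != root:  # path compression
            nxt = self.parent[name]
            self.parent[name] = root
            name = nxt
        return root

    def link(self, a: str, b: str, source: str) -> None:
        """Assert that `a` and `b` name the same needle, citing `source`."""
        self.links.append((a, b, source))
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def chain(self, name: str) -> set[str]:
        """All names currently chained to `name` (transitively)."""
        root = self.find(name)
        return {n for n in list(self.parent) if self.find(n) == root}

    def provenance(self, name: str) -> list[tuple[str, str, str]]:
        """Every asserted link inside the chain containing `name`."""
        root = self.find(name)
        return [lk for lk in self.links if self.find(lk[0]) == root]


chains = SynonymChains()
chains.link("MK-8931", "SCH 900931", "illustrative source 1")
chains.link("verubecestat", "MK-8931", "illustrative source 2")
chains.link("SC-1359113", "CID 23627211", "illustrative source 3")
print(sorted(chains.chain("verubecestat")))
```

Union-find merges transitively, which is exactly why provenance matters: one bad link silently joins two otherwise separate chains, and `provenance()` is the hook for finding it.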
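Slide 6 notes that RN 1613380-81-6 is a “ghost” entry. A CAS Registry Number carries a check digit: the digits before it, read right to left, are weighted 1, 2, 3, ..., and the weighted sum modulo 10 must equal the final digit. The sketch below (the function name is hypothetical) confirms only that an RN is well-formed; it cannot tell whether any structure is attached. 1613380-81-6 passes the check, which is exactly why a ghost RN is easy to mistake for a resolved needle.

```python
import re

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
_CAS = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(rn: str) -> bool:
    """True if `rn` is a syntactically valid CAS RN with a correct check digit."""
    m = _CAS.match(rn.strip())
    if m is None:
        return False
    body = m.group(1) + m.group(2)
    # Rightmost body digit has weight 1, the next weight 2, and so on.
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))


print(cas_checksum_ok("1613380-81-6"))  # well-formed, yet says nothing about an n2s
```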
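Slide 9 resolves verubecestat by decoding the IUPAC names in the USAN and then searching by InChIKey. Assuming the keys have already been generated by an InChI toolkit, the sketch below only compares two keys in the standard layout. Identical keys indicate the same InChI (barring a rare hash collision), while a shared first 14-letter block indicates the same connectivity with differences in the remaining layers (stereo, isotopes, protonation), which flags, for example, stereoisomers that a name-based match might conflate. The function name is hypothetical, and the second pair of keys in the usage lines is synthetic, chosen only to be well-formed.

```python
import re

# Standard InChIKey layout: 14-letter connectivity block, hyphen, 8-letter block
# for the remaining layers, a standard flag ('S' or 'N'), a version letter,
# hyphen, and a single protonation letter.
_INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def compare_inchikeys(a: str, b: str) -> str:
    """Classify two InChIKeys as 'identical', 'same skeleton' or 'different'."""
    ma, mb = _INCHIKEY.match(a), _INCHIKEY.match(b)
    if ma is None or mb is None:
        raise ValueError("both arguments must be InChIKeys")
    if a == b:
        return "identical"      # same InChI, barring a rare hash collision
    if ma.group(1) == mb.group(1):
        return "same skeleton"  # connectivity agrees; later layers differ
    return "different"


water = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
print(compare_inchikeys(water, water))
# Synthetic, format-valid keys that share only the connectivity block:
print(compare_inchikeys("AAAAAAAAAAAAAA-BBBBBBBBSA-N", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"))
```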
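The abstract and slide 14 describe binding synonyms that tie a structure to its exact position in a document, such as “compound 17d [PMID 23099093]”, “cpd 16 [PMID: 23412139]” or “example 98 (WO2011020806)”. Assuming only those three surface patterns, a minimal parser that recovers the in-document label and the document identifier could look like this; the regular expressions, the `BindingSynonym` type and the accepted patent-number shape are illustrative assumptions rather than a GtoPdb specification.

```python
import re
from typing import NamedTuple, Optional


class BindingSynonym(NamedTuple):
    label: str     # in-document label, e.g. '17d' or '98'
    doc_type: str  # 'PMID' or 'patent'
    doc_id: str    # e.g. '23099093' or 'WO2011020806'


# Assumed patterns: "compound|cpd <label> [PMID(:) <digits>]" and
# "example <label> (<2-letter office code><digits>...)".
_PMID = re.compile(r"^(?:compound|cpd)\s+(\S+)\s*\[PMID:?\s*(\d+)\]$", re.IGNORECASE)
_PATENT = re.compile(r"^example\s+(\S+)\s*\(([A-Z]{2}\d+[A-Z0-9]*)\)$", re.IGNORECASE)


def parse_binding_synonym(text: str) -> Optional[BindingSynonym]:
    """Split a binding synonym into label and document, or return None."""
    text = text.strip()
    m = _PMID.match(text)
    if m:
        return BindingSynonym(m.group(1), "PMID", m.group(2))
    m = _PATENT.match(text)
    if m:
        return BindingSynonym(m.group(1), "patent", m.group(2).upper())
    return None


for s in ["compound 17d [PMID 23099093]", "example 98 (WO2011020806)", "verubecestat"]:
    print(s, "->", parse_binding_synonym(s))
```

A structured form like this is what lets a user, or a script, jump from the synonym straight to the compound in the paper or patent.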