Chemical structure representation in PubChem
Roger Sayle
NextMove Software, Cambridge, UK
252nd ACS National Meeting, Philadelphia, PA, Tuesday 23rd August 2016
Selected PubChem publications
• Sunghwan Kim, Paul A. Thiessen, Evan E. Bolton, Jie Chen, Gang Fu, Asta
Gindulyte, Lianyi Han, Jane He, Siqian He, Benjamin A. Shoemaker, Jiyao
Wang, Bo Yu, Jian Zhang and Stephen H. Bryant, “PubChem Substance and
Compound Databases”, Nucleic Acids Research, 2015.
• Volker D. Hähnke, Evan E. Bolton and Stephen H. Bryant, “PubChem atom
environments”, Journal of Cheminformatics, 7:41, 2015.
• Evan E. Bolton, Yanli Wang, Paul A. Thiessen, Stephen H. Bryant,
“PubChem: Integrated Platform of Small Molecules and Biological
Activities”, Annual Reports in Computational Chemistry, Volume 4,
Chapter 12, pp. 217-241, 2008.
Substance and compound
• A unique and invaluable feature of PubChem’s
architecture is the distinction between the deposited
structures (substances) and the normalized
structures (compounds), and the retention of both.
• PubChem Substance contains ~209.6M structures.
• PubChem Compound contains ~91.7M structures.
Molecular identity
• When are two chemical structures the same?
– Alternate chemical representations.
– Aromaticity and conjugation.
– Protonation states and tautomerism.
– Errors and typographical mistakes.
PubChem standardization service
https://pubchem.ncbi.nlm.nih.gov/standardize/standardize.cgi
Example 1: ethanol
• PubChem CID 702 has been deposited 1569 times
with six different explicit atom counts.
– 1311 have 9 atoms and 8 bonds.
– 249 have 3 atoms and 2 bonds.
– 4 have 0 atoms and 0 bonds.
– 2 have 4 atoms and 3 bonds.
– 2 have 5 atoms and 4 bonds.
– 1 has 7 atoms and 6 bonds.
• All have the same SMILES (“CCO”) and InChI.
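The atom and bond counts above are consistent with depositors writing out different numbers of explicit hydrogens. A minimal sketch of that arithmetic follows, assuming every non-empty record is an acyclic ethanol graph with three heavy atoms (C, C, O); the names `TALLY` and `explicit_hydrogens` are illustrative only.

```python
from typing import Optional

# Deposition tally for ethanol (CID 702) as quoted on the slide:
# (explicit atoms, explicit bonds) -> number of depositions.
TALLY = {
    (9, 8): 1311,
    (3, 2): 249,
    (0, 0): 4,
    (4, 3): 2,
    (5, 4): 2,
    (7, 6): 1,
}

HEAVY_ATOMS = 3       # C, C, O
TOTAL_HYDROGENS = 6   # molecular formula C2H6O


def explicit_hydrogens(atoms: int) -> Optional[int]:
    """Explicit H count implied by an atom count; None for an empty record."""
    if atoms == 0:
        return None
    return atoms - HEAVY_ATOMS


def summarize(tally):
    """Return (atoms, bonds, explicit H, depositions) rows, most common first."""
    rows = []
    for (atoms, bonds), count in sorted(tally.items(), key=lambda kv: -kv[1]):
        rows.append((atoms, bonds, explicit_hydrogens(atoms), count))
    return rows


if __name__ == "__main__":
    for row in summarize(TALLY):
        print(row)
    print("total depositions:", sum(TALLY.values()))
```

Under this reading, the 9-atom records carry all six hydrogens explicitly, the 3-atom records carry none, and the 4-, 5- and 7-atom records carry one, two and four explicit hydrogens respectively.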
Explicit vs. implicit hydrogens
Example 2: nitrobenzene
• PubChem CID 7416 has 164 distinct substance
depositions (2 without structures).
MDL molfile-ageddon
• Biovia 2017 changed the interpretation of CT files.
• This affects 342,689 SIDs and 213,097 CIDs.
Hydrogens: easy come/easy go?
• PubChem is inconsistent on protonation/hydrogens.
• Common organic element radicals are hydrogenated:
– [C] → C, [Cl] → Cl, [P] → P, [S] → S, [H] → [HH]
– [Li], [Be], [B], [Si], [As], [Se], [At], etc. remain unchanged.
• Some groups get deprotonated
– c1ccccc1[N+](=O)O → c1ccccc1[N+](=O)[O-]
• But generally protonation state is preserved
– CC(=O)O, CC(=O)[O-], [NH4+], [NH3+]CC(=O)[O-]
– C[N+](C)(C)O
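The single-atom cases above can be captured as a small lookup table. This is only a record of the behaviour listed on the slide, not PubChem's implementation; the `IMPLIED_FORMULA` table relies on the standard SMILES rule that an unbracketed organic-subset atom receives implicit hydrogens up to its lowest normal valence, and all names are illustrative.

```python
from typing import Optional

# Bare-atom inputs that the slide reports as being hydrogenated.
HYDROGENATED = {"[C]": "C", "[Cl]": "Cl", "[P]": "P", "[S]": "S", "[H]": "[HH]"}

# Bare-atom inputs that the slide reports as left unchanged.
UNCHANGED = {"[Li]", "[Be]", "[B]", "[Si]", "[As]", "[Se]", "[At]"}

# What the unbracketed outputs mean under SMILES implicit-hydrogen rules.
IMPLIED_FORMULA = {"C": "CH4", "Cl": "HCl", "P": "PH3", "S": "H2S", "[HH]": "H2"}


def observed_output(smiles: str) -> Optional[str]:
    """Return the output reported on the slide, or None if the input is not listed."""
    if smiles in HYDROGENATED:
        return HYDROGENATED[smiles]
    if smiles in UNCHANGED:
        return smiles
    return None


if __name__ == "__main__":
    for atom in ["[C]", "[S]", "[Si]", "[H]", "[Fe]"]:
        out = observed_output(atom)
        print(atom, "->", out, IMPLIED_FORMULA.get(out, ""))
```

The bracketed form `[C]` denotes a bare carbon atom with no hydrogens, whereas `C` denotes methane, which is why the first mapping amounts to adding four hydrogens.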
Example 3: o-xylene
• A major challenge in chemical databases is
aromaticity: recognizing that two structures that differ
only in their Kekulé form are the same molecule.
CID 7237
PubChem canonical Kekulé SMILES
• A significant innovation in cheminformatics
was Evan Bolton’s development of a “canonical”
Kekulé SMILES form of a molecule.
• Different chemistry toolkits (and chemists!) differ in
opinion on which ring systems are aromatic and
which are not, hence PubChem’s wish to remain
“neutral” by only providing non-aromatic SMILES.
Bolton’s algorithm
• Steps of Bolton’s Canonical Kekulé Form Algorithm:
Tricky case: 10b,10c-dihydropyrene
• An important aspect is to aromatize all conjugated
cycles, not just those associated with SSSR.
• Unfortunately, this computationally demanding
requirement is a source of pain at the NCBI.
Conjugated ring systems
• Does it make sense to distinguish 4n+2 Hückel
aromaticity from conjugated ring systems?
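For reference, the 4n+2 electron count in the question above reduces to a one-line test. The sketch below checks only the electron count; it says nothing about planarity or whether the ring is fully conjugated, and the function name is illustrative.

```python
def satisfies_huckel_count(pi_electrons: int) -> bool:
    """True if the count equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


if __name__ == "__main__":
    for count in (4, 6, 8, 10, 12, 14):
        print(count, satisfies_huckel_count(count))
```

A 6-electron ring such as benzene passes, as does a 14-electron periphery of the kind found in 10b,10c-dihydropyrene, while 4- and 8-electron conjugated rings fail the count even though they are conjugated ring systems.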
Resonance forms
• CCN(=O)=O → CC[N+](=O)[O-]
• CCN=N#N → CCN=[N+]=[N-]
• CC[O+]=C=[N-] → CCOC#N
• C[P+](C)(C)[O-] → CP(=O)(C)C
• CC(=[NH2+])[O-] → CC(=O)N
• CS(=[OH+])(=O)[O-]
• C[S+2]([O-])([O-])C
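One property shared by all the resonance rewrites above is that the net formal charge is unchanged. A rough check, assuming charges appear only at the end of bracket atoms (no atom classes) and using illustrative names:

```python
import re

BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
TRAILING_CHARGE = re.compile(r"([+-]+)(\d*)$")

# Input -> output pairs copied from the slide.
RESONANCE_PAIRS = [
    ("CCN(=O)=O", "CC[N+](=O)[O-]"),
    ("CCN=N#N", "CCN=[N+]=[N-]"),
    ("CC[O+]=C=[N-]", "CCOC#N"),
    ("C[P+](C)(C)[O-]", "CP(=O)(C)C"),
    ("CC(=[NH2+])[O-]", "CC(=O)N"),
]


def net_charge(smiles: str) -> int:
    """Sum the formal charges written inside bracket atoms."""
    total = 0
    for body in BRACKET_ATOM.findall(smiles):
        match = TRAILING_CHARGE.search(body)
        if match is None:
            continue  # bracket atom without a charge, e.g. [nH]
        signs, digits = match.groups()
        magnitude = int(digits) if digits else len(signs)
        total += magnitude if signs[0] == "+" else -magnitude
    return total


if __name__ == "__main__":
    for before, after in RESONANCE_PAIRS:
        print(before, net_charge(before), "->", after, net_charge(after))
```

This is a plain regular-expression scan rather than a SMILES parser, so it verifies charge bookkeeping only and cannot confirm that the two forms are actually resonance structures of each other.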
Tautomers are normalized
• CC(=N)O → CC(=O)N
• CC(=[NH2+])[O-] → CC(=O)N
• n1ccccc1O → [nH]1ccccc1=O
• n1ccc(O)cc1 → [nH]1ccc(=O)cc1
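Tautomer normalization moves hydrogens and bond orders but leaves the heavy-atom composition alone. The sketch below checks that invariant for the pairs above using a simple tokenizer for organic-subset and bracket atoms; it is an assumption-laden approximation (no two-letter aromatic symbols, no hydrogen counting), and the names are illustrative.

```python
import re
from collections import Counter

# Bracket atoms (optional isotope, then element), or unbracketed organic-subset atoms.
ATOM = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z])[^\]]*\]"    # group 1: element inside brackets
    r"|(Cl|Br|[BCNOPSFI]|[bcnops])"        # group 2: organic-subset atom
)

# Input -> output pairs copied from the slide.
TAUTOMER_PAIRS = [
    ("CC(=N)O", "CC(=O)N"),
    ("CC(=[NH2+])[O-]", "CC(=O)N"),
    ("n1ccccc1O", "[nH]1ccccc1=O"),
    ("n1ccc(O)cc1", "[nH]1ccc(=O)cc1"),
]


def heavy_atoms(smiles: str) -> Counter:
    """Count non-hydrogen atoms by element, folding aromatic symbols to upper case."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        element = (match.group(1) or match.group(2)).capitalize()
        if element != "H":
            counts[element] += 1
    return counts


if __name__ == "__main__":
    for before, after in TAUTOMER_PAIRS:
        print(before, dict(heavy_atoms(before)), "->", after, dict(heavy_atoms(after)))
```

A matching composition is necessary but not sufficient for two SMILES to be tautomers; it is shown here only as a quick sanity check on the transformations listed.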
Classic tautomerism: Laar 1886
InChI=1S/C16H12N2O/c19-16-11-10-15(13-8-4-5-9-14(13)16)18-17-12-6-2-1-3-7-12/h1-11,19H
InChI=1S/C16H12N2O/c19-16-11-10-15(13-8-4-5-9-14(13)16)18-17-12-6-2-1-3-7-12/h1-11,17H
CID 5355205 (CAS 3651-02-3)
5 SIDs | 13 SIDs
But things could be improved...
Bonds to metals
• PubChem follows InChI breaking bonds to metals.
– Table salt
• [Na]Cl → [Na+].[Cl-]
• [Na].[Cl] → [Na].Cl
– Zirconium(IV) ethoxide
• CCO[Zr](OCC)(OCC)OCC → [Zr].CCO.CCO.CCO.CCO
• [Zr+4].CC[O-].CC[O-].CC[O-].CC[O-]
– Grignard reagents
• c1ccccc1[Mg]Br → c1cccc[c-]1.[Mg+2].[Br-]
• c1ccccc1[Mg+].[Br-] → c1cccc[c-]1.[Mg+].[Br-]
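Once bonds to metals are broken, a structure becomes a dot-separated set of components, each carrying its own formal charge. A small sketch for inspecting the outputs above, assuming no ring-closure digits span a dot and that charges sit at the end of bracket atoms (names are illustrative):

```python
import re

# A bracket atom that ends in a charge, e.g. [Na+], [Zr+4], [c-].
CHARGED_ATOM = re.compile(r"\[[^\]]*?([+-]+)(\d*)\]")


def component_charges(smiles: str):
    """Split on '.' and return a (component, formal charge) pair for each part."""
    charges = []
    for component in smiles.split("."):
        total = 0
        for signs, digits in CHARGED_ATOM.findall(component):
            magnitude = int(digits) if digits else len(signs)
            total += magnitude if signs[0] == "+" else -magnitude
        charges.append((component, total))
    return charges


def net_charge(smiles: str) -> int:
    """Total formal charge over all components."""
    return sum(charge for _, charge in component_charges(smiles))


if __name__ == "__main__":
    for smi in [
        "[Na+].[Cl-]",
        "[Zr+4].CC[O-].CC[O-].CC[O-].CC[O-]",
        "c1cccc[c-]1.[Mg+2].[Br-]",
        "c1cccc[c-]1.[Mg+].[Br-]",
    ]:
        print(smi, component_charges(smi), net_charge(smi))
```

By this count the first three outputs are charge-balanced, while the last Grignard output, as written on the slide, sums to −1.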
Periodic table (circa 1997-2003)
• PubChem currently handles 109 of the 118 elements
in the periodic table [to be ratified in 2016].
• Hence “Mt” is the heaviest element at the moment.
• “Ds”, “Rg”, “Cn”, “Fl”, “Lv” already ratified.
• “Nh”, “Mc”, “Ts” and “Og” expected soon.
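The element counts above can be cross-checked against atomic numbers. The table below uses the standard atomic numbers for the symbols named on the slide; the cut-off at 109 is taken from the slide, and the function name is illustrative.

```python
# Heaviest atomic number handled, per the slide ("Mt" is element 109).
HEAVIEST_SUPPORTED_Z = 109

# Atomic numbers of the symbols mentioned on the slide.
ATOMIC_NUMBER = {
    "Mt": 109, "Ds": 110, "Rg": 111, "Cn": 112, "Nh": 113,
    "Fl": 114, "Mc": 115, "Lv": 116, "Ts": 117, "Og": 118,
}


def is_supported(symbol: str) -> bool:
    """True if the element falls within the range described on the slide."""
    return ATOMIC_NUMBER[symbol] <= HEAVIEST_SUPPORTED_Z


if __name__ == "__main__":
    missing = [s for s in ATOMIC_NUMBER if not is_supported(s)]
    print(len(missing), "elements not handled:", ", ".join(missing))
```

The nine symbols beyond Mt account exactly for the gap between 109 and 118.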
PubChem isotopes
• PubChem registration confirms that any specified
isotope has been observed experimentally.
• Hence [7CH4] is rejected, but [8CH4] is allowed.
• Interestingly, the [8CH4] of CID 11635947 has a half-life
of only two zeptoseconds (2×10⁻²¹ seconds).
• Another quirk is that PubChem doesn’t normalize
mononuclidic isotopes. Hence [19F]C (CID 58338844)
is chemically the same as FC (CID 11638) but has its own CID.
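The isotope check described above can be sketched as a lookup against a table of observed nuclides. The table here is a deliberately tiny stand-in holding only the lightest observed mass number for carbon, which is enough to reproduce the [7CH4] and [8CH4] cases; a real check would need a full nuclide table with upper bounds as well, and all names are illustrative.

```python
import re

# Hypothetical, hand-entered table: lightest experimentally observed mass number.
LIGHTEST_OBSERVED = {"C": 8}

# An isotope label at the start of a bracket atom, e.g. [8CH4] or [19F].
ISOTOPE_ATOM = re.compile(r"\[(\d+)([A-Z][a-z]?)")


def isotopes_plausible(smiles: str) -> bool:
    """False if any labelled isotope is lighter than the lightest one in the table.

    Elements missing from the table are not checked.
    """
    for mass, element in ISOTOPE_ATOM.findall(smiles):
        lightest = LIGHTEST_OBSERVED.get(element)
        if lightest is not None and int(mass) < lightest:
            return False
    return True


if __name__ == "__main__":
    for smi in ("[7CH4]", "[8CH4]", "[13CH4]", "C"):
        print(smi, isotopes_plausible(smi))
```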
Disavowed by the government
• There are a number of species PubChem rejects:
– Chlorine dioxide O=[Cl]=O
– Carbide anions: [C-]#[C-] and [C-4]
• But there is hope…
– Disulfur dioxide: O=[S][S]=O → O=S=S=O
Related compounds/substances
• CID → CID
– Same Connectivity, Same Stereochemistry, Same Isotopes
– Same Parent Connectivity, Same Exact Parent
– Mixtures, Components and Neutralized Forms
– Unique Components
– Similar Compounds (90% Tanimoto), Similar Conformers
• CID → SID
– All, Same Structure, Mixture
• SID → SID
– Same Connectivity, Same Exact
• SID → CID
– PubChem SID
PubChem bond encoding
• PubChem allows depositors to specify advanced
representations of molecular structures such as
inorganics and organometallics via SD tags.
• PUBCHEM_NONSTANDARDBOND
– 4 = Quadruple bond, 5 = Dative bond, 6 = Complex bond,
7 = Ionic bond.
• PUBCHEM_BONDANNOTATIONS
– 2 = Hydrogen bond, 9 = Resonance bond, 10 = Bold bond,
11 = Fischer bond, 12 = Close contact.
• Relatively few depositors make use of these.
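The codes listed above map naturally onto two enumerations. The sketch covers only the values shown on the slide; the line syntax of the SD tags themselves is not given here, so nothing below attempts to parse an SD file, and the class and function names are illustrative.

```python
from enum import IntEnum


class NonStandardBond(IntEnum):
    """Codes listed for the PUBCHEM_NONSTANDARDBOND tag."""
    QUADRUPLE = 4
    DATIVE = 5
    COMPLEX = 6
    IONIC = 7


class BondAnnotation(IntEnum):
    """Codes listed for the PUBCHEM_BONDANNOTATIONS tag."""
    HYDROGEN_BOND = 2
    RESONANCE = 9
    BOLD = 10
    FISCHER = 11
    CLOSE_CONTACT = 12


TAG_CODES = {
    "PUBCHEM_NONSTANDARDBOND": NonStandardBond,
    "PUBCHEM_BONDANNOTATIONS": BondAnnotation,
}


def describe(tag: str, code: int) -> str:
    """Return the name for a code under a tag; raises ValueError if not listed."""
    return TAG_CODES[tag](code).name


if __name__ == "__main__":
    print(describe("PUBCHEM_NONSTANDARDBOND", 5))
    print(describe("PUBCHEM_BONDANNOTATIONS", 9))
```

Codes missing from the slide (for example annotation values 3 to 8) raise `ValueError` rather than being guessed.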
Final thoughts: abstract
For all of the grief that I give Evan, often over corner cases of chemical semantics that
only one or two people care about, it is fair to say that PubChem represents the
current state-of-the-art in chemical structure representation. Nobody does it better.
Under the surface, unseen to most users, are a large number of technical and scientific
innovations that have enabled PubChem to scale over the past decade and a half to
now contain approaching 100 million compounds. From simple design decisions such
as the substance vs. compound distinction [that allows PubChem to avoid the early
mistakes of CAS] to breakthroughs such as canonical Kekulé SMILES [to avoid the early
mistakes of Daylight Chemical Information Systems], the architecture of PubChem
contains a treasure trove of cheminformatics innovations, covering normalization,
tautomers, mixtures, 2D fingerprints and similarity, substructure search, biopolymers,
text mining and much more. During this presentation I hope to share some of the cool
insights that the remarkable staff at the NCBI often forget to mention or are too
modest to point out.
Congratulations Evan and Steve.
Acknowledgements
• Evan Bolton, Steve Bryant, Paul Thiessen, Volker
Hähnke, David Lipman and the PubChem team at the
NCBI.
• John May, at NextMove Software, for the analysis of
PubChem atom types affected by Biovia changes.
• The rest of the team at NextMove Software.
• George Vacek and the team at OpenEye Scientific
Software.

More Related Content

What's hot

Conformational analysis
Conformational analysisConformational analysis
Conformational analysisPinky Vincent
 
Electrophilic addition reaction
Electrophilic addition reactionElectrophilic addition reaction
Electrophilic addition reactionkumar Bodapati
 
Benzene Ring and Aromaticity
Benzene Ring and AromaticityBenzene Ring and Aromaticity
Benzene Ring and AromaticityGaurab Roy
 
Distribution law
Distribution lawDistribution law
Distribution lawRaguM6
 
Classification of organic compounds
Classification of organic compoundsClassification of organic compounds
Classification of organic compoundsLahari Kumar
 
In Silico methods for ADMET prediction of new molecules
 In Silico methods for ADMET prediction of new molecules In Silico methods for ADMET prediction of new molecules
In Silico methods for ADMET prediction of new moleculesMadhuraDatar
 
Presentation on Buffer
Presentation on BufferPresentation on Buffer
Presentation on BufferRubinaRoy1
 
Chemical reaction and application of benzene
Chemical reaction and application of benzeneChemical reaction and application of benzene
Chemical reaction and application of benzeneVogeloh Cin Ceat
 
Slides for optical isomerism
Slides for optical isomerismSlides for optical isomerism
Slides for optical isomerismSabbir_Akand
 
Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Theabhi.in
 
BIOISOSTERSM
BIOISOSTERSMBIOISOSTERSM
BIOISOSTERSMTuba Khan
 
pka and acid dissociation constant
pka and acid dissociation constantpka and acid dissociation constant
pka and acid dissociation constantKAUSHAL SAHU
 
Medicinal chemistry Basics
Medicinal chemistry BasicsMedicinal chemistry Basics
Medicinal chemistry BasicsRahul Patil PhD
 
Drug SAR ( Structure Activity Relationship)
Drug SAR ( Structure Activity Relationship)Drug SAR ( Structure Activity Relationship)
Drug SAR ( Structure Activity Relationship)Rishabh Deshwal
 
Structure aromaticity and Huckels rule
Structure aromaticity and Huckels ruleStructure aromaticity and Huckels rule
Structure aromaticity and Huckels rulezaryabhaider7
 

What's hot (20)

Benzene and its derivatives.ppt
Benzene and its derivatives.pptBenzene and its derivatives.ppt
Benzene and its derivatives.ppt
 
Conformational analysis
Conformational analysisConformational analysis
Conformational analysis
 
Electrophilic addition reaction
Electrophilic addition reactionElectrophilic addition reaction
Electrophilic addition reaction
 
Benzene Ring and Aromaticity
Benzene Ring and AromaticityBenzene Ring and Aromaticity
Benzene Ring and Aromaticity
 
Distribution law
Distribution lawDistribution law
Distribution law
 
Classification of organic compounds
Classification of organic compoundsClassification of organic compounds
Classification of organic compounds
 
In Silico methods for ADMET prediction of new molecules
 In Silico methods for ADMET prediction of new molecules In Silico methods for ADMET prediction of new molecules
In Silico methods for ADMET prediction of new molecules
 
Presentation on Buffer
Presentation on BufferPresentation on Buffer
Presentation on Buffer
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Kekule structure
Kekule structureKekule structure
Kekule structure
 
Pharmacophore
PharmacophorePharmacophore
Pharmacophore
 
Chemical reaction and application of benzene
Chemical reaction and application of benzeneChemical reaction and application of benzene
Chemical reaction and application of benzene
 
Slides for optical isomerism
Slides for optical isomerismSlides for optical isomerism
Slides for optical isomerism
 
Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)
 
BIOISOSTERSM
BIOISOSTERSMBIOISOSTERSM
BIOISOSTERSM
 
pka and acid dissociation constant
pka and acid dissociation constantpka and acid dissociation constant
pka and acid dissociation constant
 
Medicinal chemistry Basics
Medicinal chemistry BasicsMedicinal chemistry Basics
Medicinal chemistry Basics
 
Drug SAR ( Structure Activity Relationship)
Drug SAR ( Structure Activity Relationship)Drug SAR ( Structure Activity Relationship)
Drug SAR ( Structure Activity Relationship)
 
Qsar by hansch analysis
Qsar by hansch analysisQsar by hansch analysis
Qsar by hansch analysis
 
Structure aromaticity and Huckels rule
Structure aromaticity and Huckels ruleStructure aromaticity and Huckels rule
Structure aromaticity and Huckels rule
 

Viewers also liked

RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]NextMove Software
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulNextMove Software
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsNextMove Software
 
Caleb Laieski - Environment and Civil Rights Activist
Caleb Laieski - Environment and Civil Rights ActivistCaleb Laieski - Environment and Civil Rights Activist
Caleb Laieski - Environment and Civil Rights ActivistCaleb Laieski
 
Measuring the ROI of SharePoint in your organization
Measuring the ROI of SharePoint in your organizationMeasuring the ROI of SharePoint in your organization
Measuring the ROI of SharePoint in your organizationEdgewater
 
SharePoint ROI Analysis Case Study
SharePoint ROI Analysis Case StudySharePoint ROI Analysis Case Study
SharePoint ROI Analysis Case StudyPriority SharePoint
 
Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?NextMove Software
 
17 WAYS TO DO A SIDE PLANK
17 WAYS TO DO A SIDE PLANK17 WAYS TO DO A SIDE PLANK
17 WAYS TO DO A SIDE PLANKJuan Lugo
 
¿De quién hablamos (s. xvii)
¿De quién hablamos (s. xvii) ¿De quién hablamos (s. xvii)
¿De quién hablamos (s. xvii) Luis Gil Gil
 
Figuras retóricas en publicidad II
Figuras retóricas en publicidad IIFiguras retóricas en publicidad II
Figuras retóricas en publicidad IIlourdes.domenech
 
Meetings of company
Meetings of companyMeetings of company
Meetings of companyukmaggy
 
Guided Reading: Making the Most of It
Guided Reading: Making the Most of ItGuided Reading: Making the Most of It
Guided Reading: Making the Most of ItJennifer Jones
 

Viewers also liked (14)

RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be useful
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical Depictions
 
Caleb Laieski - Environment and Civil Rights Activist
Caleb Laieski - Environment and Civil Rights ActivistCaleb Laieski - Environment and Civil Rights Activist
Caleb Laieski - Environment and Civil Rights Activist
 
Measuring the ROI of SharePoint in your organization
Measuring the ROI of SharePoint in your organizationMeasuring the ROI of SharePoint in your organization
Measuring the ROI of SharePoint in your organization
 
Is 20TB really Big Data?
Is 20TB really Big Data?Is 20TB really Big Data?
Is 20TB really Big Data?
 
SharePoint ROI Analysis Case Study
SharePoint ROI Analysis Case StudySharePoint ROI Analysis Case Study
SharePoint ROI Analysis Case Study
 
Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?
 
17 WAYS TO DO A SIDE PLANK
17 WAYS TO DO A SIDE PLANK17 WAYS TO DO A SIDE PLANK
17 WAYS TO DO A SIDE PLANK
 
¿De quién hablamos (s. xvii)
¿De quién hablamos (s. xvii) ¿De quién hablamos (s. xvii)
¿De quién hablamos (s. xvii)
 
Figuras retóricas en publicidad II
Figuras retóricas en publicidad IIFiguras retóricas en publicidad II
Figuras retóricas en publicidad II
 
Meetings of company
Meetings of companyMeetings of company
Meetings of company
 
Customer Journey Analytics and Big Data
Customer Journey Analytics and Big DataCustomer Journey Analytics and Big Data
Customer Journey Analytics and Big Data
 
Guided Reading: Making the Most of It
Guided Reading: Making the Most of ItGuided Reading: Making the Most of It
Guided Reading: Making the Most of It
 

Similar to Chemical structure representation in PubChem

Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...NextMove Software
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)NextMove Software
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...NextMove Software
 
CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningCINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningNextMove Software
 
PubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingPubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingSunghwan Kim
 
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...NextMove Software
 
(Green chemistry and sustainable technology) an hui lu, sheng dai (eds.)-poro...
(Green chemistry and sustainable technology) an hui lu, sheng dai (eds.)-poro...(Green chemistry and sustainable technology) an hui lu, sheng dai (eds.)-poro...
(Green chemistry and sustainable technology) an hui lu, sheng dai (eds.)-poro...Daniel Delgado MSc
 
Matthias Beller ChemAI 231116.pptx
Matthias Beller ChemAI 231116.pptxMatthias Beller ChemAI 231116.pptx
Matthias Beller ChemAI 231116.pptxMarco Tibaldi
 
RMG at the Flame Chemistry Workshop 2014
RMG at the Flame Chemistry Workshop 2014RMG at the Flame Chemistry Workshop 2014
RMG at the Flame Chemistry Workshop 2014Richard West
 
ICIC 2016: New Product Introduction CAS
ICIC 2016: New Product Introduction CASICIC 2016: New Product Introduction CAS
ICIC 2016: New Product Introduction CASDr. Haxel Consult
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...NextMove Software
 
Shale & Hydrocarbon Resources: Opportunities and Challenges for the Chemical ...
Shale & Hydrocarbon Resources: Opportunities and Challenges for the Chemical ...Shale & Hydrocarbon Resources: Opportunities and Challenges for the Chemical ...
Shale & Hydrocarbon Resources: Opportunities and Challenges for the Chemical ...Rob Hart
 
introduction_to_modern_liquid_chromatography.pdf
introduction_to_modern_liquid_chromatography.pdfintroduction_to_modern_liquid_chromatography.pdf
introduction_to_modern_liquid_chromatography.pdfmeriemmokhtar1
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...Ken Karapetyan
 
2.1. Molecules to Metabolism
 2.1. Molecules to Metabolism 2.1. Molecules to Metabolism
2.1. Molecules to MetabolismMiltiadis Kitsos
 

Similar to Chemical structure representation in PubChem (20)

Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
 
CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningCINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
 
PubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingPubChem as a resource for chemical information training
PubChem as a resource for chemical information training
 
4
44
4
 
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
 
(Green chemistry and sustainable technology) an hui lu, sheng dai (eds.)-poro...
(Green chemistry and sustainable technology) an hui lu, sheng dai (eds.)-poro...(Green chemistry and sustainable technology) an hui lu, sheng dai (eds.)-poro...
(Green chemistry and sustainable technology) an hui lu, sheng dai (eds.)-poro...
 
Matthias Beller ChemAI 231116.pptx
Matthias Beller ChemAI 231116.pptxMatthias Beller ChemAI 231116.pptx
Matthias Beller ChemAI 231116.pptx
 
RMG at the Flame Chemistry Workshop 2014
RMG at the Flame Chemistry Workshop 2014RMG at the Flame Chemistry Workshop 2014
RMG at the Flame Chemistry Workshop 2014
 
ICIC 2016: New Product Introduction CAS
ICIC 2016: New Product Introduction CASICIC 2016: New Product Introduction CAS
ICIC 2016: New Product Introduction CAS
 
111.docx
111.docx111.docx
111.docx
 
Ontology work at the Royal Society of Chemistry
Ontology work at the Royal Society of ChemistryOntology work at the Royal Society of Chemistry
Ontology work at the Royal Society of Chemistry
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...
 
Shale & Hydrocarbon Resources: Opportunities and Challenges for the Chemical ...
Shale & Hydrocarbon Resources: Opportunities and Challenges for the Chemical ...Shale & Hydrocarbon Resources: Opportunities and Challenges for the Chemical ...
Shale & Hydrocarbon Resources: Opportunities and Challenges for the Chemical ...
 
ACSCAS_2017Calendar (4)
ACSCAS_2017Calendar (4)ACSCAS_2017Calendar (4)
ACSCAS_2017Calendar (4)
 
introduction_to_modern_liquid_chromatography.pdf
introduction_to_modern_liquid_chromatography.pdfintroduction_to_modern_liquid_chromatography.pdf
introduction_to_modern_liquid_chromatography.pdf
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
2.1. Molecules to Metabolism
 2.1. Molecules to Metabolism 2.1. Molecules to Metabolism
2.1. Molecules to Metabolism
 

More from NextMove Software

Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...NextMove Software
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedNextMove Software
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESNextMove Software
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionNextMove Software
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...NextMove Software
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsNextMove Software
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...NextMove Software
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...NextMove Software
 
Recent improvements to the RDKit
Chemical structure representation in PubChem

  • 1. Chemical structure representation in PubChem. Roger Sayle, NextMove Software, Cambridge, UK. 252nd ACS National Meeting, Philadelphia, PA, Tuesday 23rd August 2016.
  • 2. Selected PubChem publications
      • Sunghwan Kim, Paul A. Thiessen, Evan E. Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi Han, Jane He, Siqian He, Benjamin A. Shoemaker, Jiyao Wang, Bo Yu, Jian Zhang and Stephen H. Bryant, “PubChem Substance and Compound Databases”, Nucleic Acids Research, 2015.
      • Volker D. Hähnke, Evan E. Bolton and Stephen H. Bryant, “PubChem atom environments”, Journal of Cheminformatics, 7:41, 2015.
      • Evan E. Bolton, Yanli Wang, Paul A. Thiessen, Stephen H. Bryant, “PubChem: Integrated Platform of Small Molecules and Biological Activities”, Annual Reports in Computational Chemistry, Volume 4, Chapter 12, pp. 217-241, 2008.
  • 3. Substance and compound
      • A unique and invaluable feature of PubChem’s architecture is the distinction between the deposited structures (substances) and the normalized structures (compounds), and the retention of both.
      • PubChem Substance contains ~209.6M structures.
      • PubChem Compound contains ~91.7M structures.
  • 4. Molecular identity
      • When are two chemical structures the same?
          – Alternate chemical representations.
          – Aromaticity and conjugation.
          – Protonation states and tautomerism.
          – Errors and typographical mistakes.
  • 5. PubChem standardization service
      https://pubchem.ncbi.nlm.nih.gov/standardize/standardize.cgi
  • 6. Example 1: ethanol
      • PubChem CID 702 has been deposited 1569 times with six different explicit atom counts.
          – 1311 have 9 atoms and 8 bonds.
          – 249 have 3 atoms and 2 bonds.
          – 4 have 0 atoms and 0 bonds.
          – 2 have 4 atoms and 3 bonds.
          – 2 have 5 atoms and 4 bonds.
          – 1 has 7 atoms and 6 bonds.
      • All have the same SMILES (“CCO”) and InChI.
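The counts on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the table restates the slide's numbers, and the names (`ETHANOL_DEPOSITIONS`, `explicit_hydrogens`, `is_tree_or_empty`) are made up for this example. Ethanol (C2H6O) has 3 heavy atoms and 6 hydrogens, so a 9-atom record spells out every hydrogen and a 3-atom record leaves them all implicit.

```python
# Explicit atom count -> (bond count, number of depositions), as listed on the slide.
ETHANOL_DEPOSITIONS = {
    9: (8, 1311),
    3: (2, 249),
    0: (0, 4),
    4: (3, 2),
    5: (4, 2),
    7: (6, 1),
}

HEAVY_ATOMS = 3  # C, C, O in ethanol


def total_depositions(table):
    """Sum the deposition counts over all explicit atom counts."""
    return sum(count for _bonds, count in table.values())


def explicit_hydrogens(atom_count):
    """Explicit hydrogens in a record, or None for an empty (0-atom) record."""
    if atom_count == 0:
        return None
    return atom_count - HEAVY_ATOMS


def is_tree_or_empty(atom_count, bond_count):
    """An acyclic, connected structure with n atoms has n - 1 bonds."""
    if atom_count == 0:
        return bond_count == 0
    return bond_count == atom_count - 1
```

Every non-empty record on the slide satisfies the atoms-minus-one rule, which is what one would expect for an acyclic molecule regardless of how many hydrogens are drawn.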
  • 7. Explicit vs. implicit hydrogens
  • 8. Example 2: nitrobenzene
      • PubChem CID 7416 has 164 distinct substance depositions (2 without structures).
  • 9. MDL molfile-ageddon
      • BIOVIA 2017 changed the interpretation of CT files.
      • This affects 342,689 SIDs and 213,097 CIDs.
  • 10. Hydrogens: easy come/easy go?
      • PubChem is inconsistent on protonation/hydrogens.
      • Common organic element radicals are hydrogenated:
          – [C] → C, [Cl] → Cl, [P] → P, [S] → S, [H] → [HH]
          – [Li], [Be], [B], [Si], [As], [Se], [At], etc. remain unchanged.
      • Some groups get deprotonated:
          – c1ccccc1[N+](=O)O → c1ccccc1[N+](=O)[O-]
      • But generally the protonation state is preserved:
          – CC(=O)O, CC(=O)[O-], [NH4+], [NH3+]CC(=O)[O-]
          – C[N+](C)(C)O
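The radical rule on this slide can be written as a small lookup. This is a sketch of the behaviour as the slide describes it, not PubChem's implementation; the function name and both sets are hypothetical, and they cover only the elements the slide names (the slide's "etc." is deliberately not expanded).

```python
# Bare bracket atoms the slide reports as being hydrogenated.
HYDROGENATED = {"C", "Cl", "P", "S"}
# Bare bracket atoms the slide reports as left unchanged.
UNCHANGED = {"Li", "Be", "B", "Si", "As", "Se", "At"}


def normalize_bare_atom(smiles):
    """Return the form the slide reports for a bare bracket atom such as "[C]".

    Returns None when the slide gives no information about the element.
    """
    if not (smiles.startswith("[") and smiles.endswith("]")):
        raise ValueError("expected a single bracket atom, e.g. '[C]'")
    symbol = smiles[1:-1]
    if symbol == "H":
        return "[HH]"  # a lone hydrogen atom becomes molecular hydrogen
    if symbol in HYDROGENATED:
        return symbol  # organic-subset atom: hydrogens become implicit
    if symbol in UNCHANGED:
        return smiles
    return None  # not covered by the slide
```

Writing `[C]` as `C` means "carbon with its normal valence filled by implicit hydrogens", so the rewrite turns a bare atom into methane rather than merely dropping the brackets.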
  • 11. Example 3: o-xylene (CID 7237)
      • A major challenge in chemical databases is aromaticity: recognizing that two structures that differ only in their Kekulé forms are the same molecule.
  • 12. PubChem canonical Kekulé SMILES
      • A significant innovation in cheminformatics was Evan Bolton’s development of a “canonical” Kekulé SMILES form of a molecule.
      • Different chemistry toolkits (and chemists!) differ in opinion on which ring systems are aromatic and which are not, hence PubChem’s wish to remain “neutral” by only providing non-aromatic SMILES.
  • 13. Bolton’s algorithm
      • Steps of Bolton’s Canonical Kekulé Form Algorithm:
  • 14. Tricky case: 10b,10c-dihydropyrene
      • An important aspect is to aromatize all conjugated cycles, not just those in the SSSR (smallest set of smallest rings).
      • Unfortunately, this computationally demanding requirement is a source of pain at the NCBI.
  • 15. Conjugated ring systems
      • Does it make sense to distinguish 4n+2 Hückel aromaticity from conjugated ring systems?
  • 16. Resonance forms
      • CCN(=O)=O → CC[N+](=O)[O-]
      • CCN=N#N → CCN=[N+]=[N-]
      • CC[O+]=C=[N-] → CCOC#N
      • C[P+](C)(C)[O-] → CP(=O)(C)C
      • CC(=[NH2+])[O-] → CC(=O)N
      • CS(=[OH+])(=O)[O-]
      • C[S+2]([O-])([O-])C
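One invariant shared by the rewrites on this slide is that they move formal charges around without changing the net charge. A minimal sketch of that check follows. It uses a regular expression rather than a real SMILES parser, which is an assumption that holds for the simple strings on this slide but not for SMILES in general (atom classes such as `[C-:1]` are not handled); `net_charge` is a hypothetical helper name.

```python
import re

BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# A trailing charge inside a bracket atom: "+", "-", "++", "+2", "-4", ...
CHARGE = re.compile(r"(\++|-+)(\d*)$")


def net_charge(smiles):
    """Sum the formal charges written in the bracket atoms of a SMILES string."""
    total = 0
    for body in BRACKET_ATOM.findall(smiles):
        match = CHARGE.search(body)
        if match is None:
            continue  # neutral bracket atom such as [nH]
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


# The five rewrites shown on the slide, as (input, normalized) pairs.
RESONANCE_REWRITES = [
    ("CCN(=O)=O", "CC[N+](=O)[O-]"),
    ("CCN=N#N", "CCN=[N+]=[N-]"),
    ("CC[O+]=C=[N-]", "CCOC#N"),
    ("C[P+](C)(C)[O-]", "CP(=O)(C)C"),
    ("CC(=[NH2+])[O-]", "CC(=O)N"),
]
```

A check like this is cheap to run over a whole registration batch and catches rewrites that accidentally create or destroy charge.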
  • 17. Tautomers are normalized
      • CC(=N)O → CC(=O)N
      • CC(=[NH2+])[O-] → CC(=O)N
      • n1ccccc1O → [nH]1ccccc1=O
      • n1ccc(O)cc1 → [nH]1ccc(=O)cc1
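Tautomers differ only in where hydrogens sit and how bond orders are drawn, so the heavy-atom composition must be identical on both sides of each arrow. The sketch below checks that for the four pairs on the slide. It tokenizes atoms with a regular expression instead of a full SMILES parser, which is an assumption suited to these short strings only; `heavy_atoms` and `TAUTOMER_REWRITES` are hypothetical names.

```python
import re
from collections import Counter

# Either a bracket atom (capturing its element symbol) or an organic-subset atom.
ATOM = re.compile(
    r"\[\d*(?P<bracket>[A-Z][a-z]?|[a-z])[^\]]*\]"
    r"|(?P<plain>Cl|Br|[BCNOPSFI]|[bcnops])"
)


def heavy_atoms(smiles):
    """Count non-hydrogen atoms by element, treating aromatic 'c' as 'C' etc."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        symbol = (match.group("bracket") or match.group("plain")).capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


# The four rewrites shown on the slide, as (input, normalized) pairs.
TAUTOMER_REWRITES = [
    ("CC(=N)O", "CC(=O)N"),
    ("CC(=[NH2+])[O-]", "CC(=O)N"),
    ("n1ccccc1O", "[nH]1ccccc1=O"),
    ("n1ccc(O)cc1", "[nH]1ccc(=O)cc1"),
]
```

Conserved heavy atoms are a necessary condition for two strings to be tautomers, not a sufficient one; the connectivity between those atoms also has to match.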
  • 18. Classic tautomerism: Laar 1886
      InChI=1S/C16H12N2O/c19-16-11-10-15(13-8-4-5-9-14(13)16)18-17-12-6-2-1-3-7-12/h1-11,19H
      InChI=1S/C16H12N2O/c19-16-11-10-15(13-8-4-5-9-14(13)16)18-17-12-6-2-1-3-7-12/h1-11,17H
      CID 5355205 (CAS 3651-02-3)
      5 SIDs; 13 SIDs
  • 19. But things could be improved...
  • 20. Bonds to metals
      • PubChem follows InChI in breaking bonds to metals.
          – Table salt
              • [Na]Cl → [Na+].[Cl-]
              • [Na].[Cl] → [Na].Cl
          – Zirconium(IV) ethoxide
              • CCO[Zr](OCC)(OCC)OCC → [Zr].CCO.CCO.CCO.CCO
              • [Zr+4].CC[O-].CC[O-].CC[O-].CC[O-]
          – Grignard reagents
              • c1ccccc1[Mg]Br → c1cccc[c-]1.[Mg+2].[Br-]
              • c1ccccc1[Mg+].[Br-] → c1cccc[c-]1.[Mg+].[Br-]
  • 21. Periodic table (circa 1997-2003)
      • PubChem currently handles 109 of the 118 elements in the periodic table [to be ratified in 2016].
      • Hence “Mt” is the heaviest element at the moment.
      • “Ds”, “Rg”, “Cn”, “Fl”, “Lv” already ratified.
      • “Nh”, “Mc”, “Ts” and “Og” expected soon.
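The element coverage described on this slide amounts to a cutoff at atomic number 109 (meitnerium). The sketch below encodes that cutoff as of the 2016 talk; it says nothing about PubChem's behaviour today, and the names `MAX_SUPPORTED_Z`, `is_supported` and `unsupported_symbols` are hypothetical.

```python
# Symbols for atomic numbers 110-118.
SUPERHEAVY = {
    110: "Ds", 111: "Rg", 112: "Cn", 113: "Nh", 114: "Fl",
    115: "Mc", 116: "Lv", 117: "Ts", 118: "Og",
}

TOTAL_ELEMENTS = 118
MAX_SUPPORTED_Z = 109  # "Mt" is the heaviest handled element per the slide


def is_supported(atomic_number):
    """True if the slide's description implies the element is handled."""
    if not 1 <= atomic_number <= TOTAL_ELEMENTS:
        raise ValueError("atomic number out of range")
    return atomic_number <= MAX_SUPPORTED_Z


def unsupported_symbols():
    """Symbols of the elements beyond the cutoff, in atomic-number order."""
    return [SUPERHEAVY[z] for z in sorted(SUPERHEAVY) if not is_supported(z)]
```

The nine symbols returned are exactly the five "already ratified" and four "expected soon" elements listed above, which accounts for the gap between 109 and 118.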
  • 22. PubChem isotopes
      • PubChem registration confirms that any specified isotope has been observed experimentally.
      • Hence [7CH4] is rejected, but [8CH4] is allowed.
      • Interestingly, the [8CH4] of CID 11635947 has a half-life of only two zeptoseconds (2×10⁻²¹ seconds).
      • Another quirk is that PubChem doesn’t normalize mononuclidic isotopes. Hence [19F]C (CID 58338844) is the same as FC (CID 11638) but has a separate CID.
  • 23. Disavowed by the government
      • There are a number of species PubChem rejects:
          – Chlorine dioxide: O=[Cl]=O
          – Carbide anions: [C-]#[C-] and [C-4]
      • But there is hope…
          – Disulfur dioxide: O=[S][S]=O → O=S=S=O
  • 24. Related compounds/substances
      • CID → CID
          – Same Connectivity, Same Stereochemistry, Same Isotopes
          – Same Parent Connectivity, Same Exact Parent
          – Mixtures, Components and Neutralized Forms
          – Unique Components
          – Similar Compounds (90% Tanimoto), Similar Conformers
      • CID → SID
          – All, Same Structure, Mixture
      • SID → SID
          – Same Connectivity, Same Exact
      • SID → CID
          – PubChem SID
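The "Similar Compounds (90% Tanimoto)" relationship relies on the Tanimoto coefficient, the size of the intersection of two fingerprints divided by the size of their union. A minimal sketch follows, under stated assumptions: fingerprints are represented as plain sets of bit indices rather than PubChem's actual 2D fingerprint, the threshold is treated as inclusive, and the function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as iterables of set bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def is_similar(bits_a, bits_b, threshold=0.9):
    """True if the pair meets the similarity threshold (assumed inclusive)."""
    return tanimoto(bits_a, bits_b) >= threshold
```

For example, two fingerprints sharing 9 of 10 distinct bits score 0.9 and just meet the threshold, while `{1, 2, 3}` and `{2, 3, 4}` share 2 of 4 distinct bits and score 0.5.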
  • 25. PubChem bond encoding
      • PubChem allows depositors to specify advanced representations of molecular structures such as inorganics and organometallics via SD tags.
      • PUBCHEM_NONSTANDARDBOND
          – 4 = Quadruple bond, 5 = Dative bond, 6 = Complex bond, 7 = Ionic bond.
      • PUBCHEM_BONDANNOTATIONS
          – 2 = Hydrogen bond, 9 = Resonance bond, 10 = Bold bond, 11 = Fischer bond, 12 = Close contact.
      • Relatively few depositors make use of these.
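The numeric codes on this slide map naturally onto lookup tables. The sketch below includes only the codes the slide lists; other codes may exist but are not shown in the talk. How the values are laid out inside the SD tag is also not shown, so that is not modelled here, and `describe_bond_code` is a hypothetical helper name.

```python
# Codes listed on the slide for each SD tag.
NONSTANDARD_BOND = {
    4: "quadruple bond",
    5: "dative bond",
    6: "complex bond",
    7: "ionic bond",
}
BOND_ANNOTATION = {
    2: "hydrogen bond",
    9: "resonance bond",
    10: "bold bond",
    11: "Fischer bond",
    12: "close contact",
}
TABLES = {
    "PUBCHEM_NONSTANDARDBOND": NONSTANDARD_BOND,
    "PUBCHEM_BONDANNOTATIONS": BOND_ANNOTATION,
}


def describe_bond_code(tag, code):
    """Return the slide's description of a code, or None if it is not listed."""
    return TABLES.get(tag, {}).get(code)
```

Keeping the two tables separate matters because the same integer could mean different things under the two tags.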
  • 26. Final thoughts: abstract
      For all of the grief that I give Evan, often over corner cases of chemical semantics that only one or two people care about, it is fair to say that PubChem represents the current state-of-the-art in chemical structure representation. Nobody does it better. Under the surface, unseen to most users, are a large number of technical and scientific innovations that have enabled PubChem to scale over the past decade and a half to now contain approaching 100 million compounds. From simple design decisions such as the substance vs. compound distinction [that allows PubChem to avoid the early mistakes of CAS] to breakthroughs such as canonical Kekulé SMILES [to avoid the early mistakes of Daylight Chemical Information Systems], the architecture of PubChem contains a treasure trove of cheminformatics innovations, covering normalization, tautomers, mixtures, 2D fingerprints and similarity, substructure search, biopolymers, text mining and much more. During this presentation I hope to share some of the cool insights that the remarkable staff at the NCBI often forget to mention or are too modest to point out. Congratulations Evan and Steve.
  • 27. Acknowledgements
      • Evan Bolton, Steve Bryant, Paul Thiessen, Volker Hähnke, David Lipman and the PubChem team at the NCBI.
      • John May, at NextMove Software, for the analysis of PubChem atom types affected by BIOVIA changes.
      • The rest of the team at NextMove Software.
      • George Vacek and the team at OpenEye Scientific Software.