We need to talk about…
Kekulization, Aromaticity and SMILES
Noel M. O’Boyle, John W. Mayfield
Open Babel/CDK development team and NextMove Software, Cambridge, UK
254th ACS National Meeting, Washington, Aug 2017
Aromatic form: Cc1ccccc1C
Kekulé forms: CC1=CC=CC=C1C or CC1=C(C)C=CC=C1
Kekulization (bond localization): aromatic form → Kekulé form
Aromaticity assignment (bond delocalization): Kekulé form → aromatic form
“…we were able to extract some 13,000 SMILES codes for the Wikipedia
entries. Over 600 of these codes could not be processed by the SMILES
parser.
A clear majority of the problems (over 350 cases) was caused by not
respecting the SMILES syntax rules for unsubstituted pyrrole-type
nitrogen. This nitrogen was encoded as n and not as [nH] as required by
the SMILES grammar (so for example benzimidazole was incorrectly
encoded as n2c1ccccc1nc2).”
Why do we need to talk?
• It’s been 29 years since Dave Weininger’s SMILES paper,
but still, sometimes…
– Toolkits generate aromatic SMILES which other toolkits cannot read
– Toolkits fail to roundtrip their own structures through aromatic forms
– Chemical information is lost or confused, aromaticity
appears/disappears, hydrogens appear/disappear
• Why is this?
– There is some confusion about which bonds are marked as aromatic
– There is some confusion about the purpose of kekulization, and how
to do it
– There is a lack of information on the Daylight aromaticity model
– There is some confusion about where the implicit hydrogens are in an
aromatic SMILES
• Goal is to describe how Daylight handles aromatic SMILES
– As deduced by JWM (John W. Mayfield)
AROMATIC SMILES
What is an aromatic SMILES?
• Has some atoms and some bonds marked as aromatic
• An atom is marked as aromatic if written as lowercase
• A bond is marked as aromatic
– Either… if the aromatic bond symbol (colon) is used
– Or… no bond symbol is used and it joins two aromatic atoms
• But not if the two atoms are in a ring, and the bond is not in a ring
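Bond marked as aromatic:
– c1ccccc1 (every ring bond)
– cc
– C3c1ccccc1c2ccccc2C3 (the bond joining the two benzene rings is itself in a ring)
Bond not marked as aromatic:
– c1ccccc1c2ccccc2 (the bond joining the two rings is not in a ring)
– C3c1ccccc1-c2ccccc2C3 (explicit single bond)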
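The reading rules above can be sketched as a single decision function. This is only an illustration: the function name and its boolean parameters are hypothetical, and ring membership is assumed to have been worked out already by the reader.

```python
def bond_marked_aromatic(symbol, a_aromatic, b_aromatic,
                         a_in_ring, b_in_ring, bond_in_ring):
    """Decide whether a SMILES reader marks a bond as aromatic.

    symbol is the bond symbol as written ("" when none was written).
    """
    if symbol == ":":
        return True            # explicit aromatic bond symbol
    if symbol != "":
        return False           # explicit -, =, # and so on
    if not (a_aromatic and b_aromatic):
        return False           # implicit bond needs two lowercase atoms
    if a_in_ring and b_in_ring and not bond_in_ring:
        return False           # e.g. the inter-ring bond of biphenyl
    return True


# Ring bond in c1ccccc1
print(bond_marked_aromatic("", True, True, True, True, True))
# Inter-ring bond in c1ccccc1c2ccccc2
print(bond_marked_aromatic("", True, True, True, True, False))
```

Note that "cc" (two aromatic atoms, no ring) still yields an aromatic bond, because the exception only applies when both atoms are ring atoms.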
KEKULIZATION
Kekulization
• Given a molecule where some atoms and bonds have
been marked as aromatic
– Assign bond orders of either one or two to the aromatic bonds
such that the valencies of all of the aromatic atoms are satisfied
(i.e. are consistent with sp2)
• Note: Kekulization is not ‘dearomatization’
– No need to search for aromatic rings or even check for ring
membership
– In particular, H atoms should not be added/removed to make
rings aromatic
Example: c1ccccc1 kekulizes to either of its two Kekulé forms
Further examples: cc and cc1c(c)c(c)c1c
How many hydrogens to add to aromatic atoms?
• If within square brackets (e.g. [nH] or [n])
– The hydrogen count is explicit (as usual for brackets)
• If outside square brackets (e.g. c1ccncc1)
– Calculate the bond order sum, treating aromatic bonds as single bonds
– Apply normal SMILES implicit valence rules using this sum, but subtract
one from the number of implicit hydrogens (if there are any)
– E.g. in pyridine, c1cnccc1, the normal rules would give each carbon two
hydrogens and the nitrogen one; after the subtraction they have one and zero respectively.
• Some toolkits instead add hydrogens to satisfy aromaticity
rules
– This is not what Daylight did. In their world, the number of implicit
hydrogens is known directly from the SMILES string.
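The bracket-free rule can be written down in a few lines. This is a minimal sketch, not Daylight's code: `implicit_h_count` and `AROMATIC` are illustrative names, and the valence table is the usual SMILES organic-subset one, restricted here to the elements that can be written in lowercase.

```python
AROMATIC = "aromatic"  # sentinel for a bond marked as aromatic

# Normal SMILES valences for the elements that may appear as aromatic atoms
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5),
    "O": (2,), "P": (3, 5), "S": (2, 4, 6),
}


def implicit_h_count(element, bonds):
    """Implicit H count for an aromatic atom written outside brackets.

    bonds holds one entry per bond: an integer order, or AROMATIC.
    """
    # Aromatic bonds are treated as single bonds for the sum
    bond_sum = sum(1 if b == AROMATIC else b for b in bonds)
    for valence in NORMAL_VALENCES[element]:
        if valence >= bond_sum:
            hydrogens = valence - bond_sum
            # Subtract one, but only if there are any hydrogens
            return hydrogens - 1 if hydrogens > 0 else 0
    return 0  # bond sum exceeds every normal valence


# Pyridine, c1cnccc1: each carbon gets one H, the nitrogen none
print(implicit_h_count("C", [AROMATIC, AROMATIC]))
print(implicit_h_count("N", [AROMATIC, AROMATIC]))
```

No aromaticity perception is involved: the count depends only on the element and the bonds written in the string.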
Kekulization = “Perfect matching”
• If we consider just the subset of atoms that are aromatic
and require a double bond
– A valid Kekulé structure is exactly equivalent to the graph theory
concept, a “perfect matching”
Greedy algorithm
Backtracking algorithm
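To make the "perfect matching" view concrete, here is a small backtracking sketch. It assumes the set of aromatic atoms that require a double bond is already known; the function name and the atom-index/bond-pair representation are hypothetical, and no attempt is made at efficiency.

```python
def find_perfect_matching(needs_double, bonds):
    """Backtracking search for a perfect matching.

    needs_double: set of atom indices that require a double bond.
    bonds: iterable of (a, b) index pairs for the aromatic bonds.
    Returns a set of frozenset({a, b}) double bonds, or None.
    """
    adjacency = {a: [] for a in needs_double}
    for a, b in bonds:
        if a in adjacency and b in adjacency:
            adjacency[a].append(b)
            adjacency[b].append(a)

    order = sorted(needs_double)
    partner = {}

    def solve():
        # Pick the first atom that still lacks a double bond
        atom = next((a for a in order if a not in partner), None)
        if atom is None:
            return True
        for nbr in adjacency[atom]:
            if nbr not in partner:
                partner[atom], partner[nbr] = nbr, atom
                if solve():
                    return True
                del partner[atom], partner[nbr]  # undo and try the next one
        return False

    if not solve():
        return None
    return {frozenset(pair) for pair in partner.items()}


ring5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
# c1cc[nH]c1: the four carbons need a double bond, the [nH] does not
print(find_perfect_matching({0, 1, 2, 4}, ring5))
# c1ccnc1: all five atoms need one, so no matching exists
print(find_perfect_matching({0, 1, 2, 3, 4}, ring5))
```

Atoms that do not need a double bond are simply left out of the graph, which is why ring membership never has to be checked.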
Kekulization failure
• If the algorithm described above fails to find a valid
Kekulé form, then the input was incorrect
• It might be missing some hydrogens (incorrect SMILES
writer), or it might be a radical (should not have been
aromatized)
– E.g. c1ccnc1 cannot be kekulized but the writer might have
intended pyrrole (c1cc[nH]c1) or pyrrole radical (C1=C[N]C=C1)
• A reader may reject the SMILES as invalid or warn and
return a radical
• Optionally, a means might be provided to ‘fix’ (i.e. guess)
the intended structure
– This should probably not be the default behaviour as it causes
proliferation of incorrect SMILES and may not recover the
intended structure
AROMATICITY
What is the purpose of aromaticity in cheminf?
Normalize Kekulé forms
Is it a stereogenic center?
Is it aromatic in real life? Yes!
Is it aromatic in cheminf? No! *
* According to Daylight aromaticity model
• To normalize to the same representation different Kekulé
forms of a structure
– NOT to indicate whether an atom/bond displays physical
properties associated with aromaticity
• Useful to:
– generate a canonical representation
– identify stereogenic centers
– generate fingerprints
– match an aromatic query
• Note:
– If the resulting aromatic structure cannot be kekulized then it
should not be aromatized
Aromaticity models
• Based on Hückel’s rule:
– A ring is aromatic if it can be planar, the sum of π electrons
is 4n+2, and every atom can participate
• An aromaticity model can be described by two sets of
parameters:
1. how many π electrons each atom contributes
2. what cycles in the graph are tested for 4n+2
• Note that planarity is not explicitly tested
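The two parameter sets can be illustrated with a short sketch. The per-atom contributions below are placeholder values chosen for the example (one electron per carbon, two for a pyrrole-type nitrogen, `None` for an atom that cannot participate); they are not the Daylight table, and the function names are hypothetical.

```python
def is_huckel(pi_electrons):
    """True if the electron count has the form 4n+2 (n >= 0)."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def cycle_is_aromatic(cycle, contribution):
    """Test one candidate cycle against the 4n+2 rule.

    cycle: atom indices around the cycle.
    contribution: atom index -> pi electrons contributed, or None if the
    atom cannot participate.
    """
    total = 0
    for atom in cycle:
        electrons = contribution.get(atom)
        if electrons is None:
            return False  # every atom in the cycle must participate
        total += electrons
    return is_huckel(total)


benzene = {i: 1 for i in range(6)}
pyrrole = {0: 1, 1: 1, 2: 1, 3: 2, 4: 1}  # atom 3 is the [nH]
print(cycle_is_aromatic(range(6), benzene))   # 6 electrons
print(cycle_is_aromatic(range(5), pyrrole))   # 6 electrons
print(is_huckel(4))                           # 4 electrons
```

Which cycles get passed to `cycle_is_aromatic` is the second parameter set of the model, and is a separate design decision.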
The Daylight aromaticity model
• When writing an aromatic SMILES string, it is probably a
good idea to apply the Daylight aromaticity model
• JWM has recently described the electron contributions
– https://figshare.com/articles/Daylight_Aromatic_Atoms/3370945
What rings to check?
• Best approach is to check all rings that could be aromatic
– Alternative is to use SSSR (not recommended)
– Note that outer ring systems may be aromatic while inner ones
are not
• Need to do this efficiently
– Eliminate atoms that are in rings that cannot be aromatic
– Try small rings first, as may be able to terminate early if no atoms
left to check
– Programs can terminate searches for rings above a certain size or
after backtracking N times
Azulene, c1cc2-c(ccccc2)c1: the five-membered ring has 5e- and the
seven-membered ring has 7e-, but the outer ring has 10e-
Alternative aromaticity models for SMILES
• Preserve the aromaticity of the input atoms
– Speeds things up – no perception required
– Only sensible if reading aromatic SMILES
– Useful if you have written the SMILES yourself
• Regard all conjugated double bonds as forming a
‘delocalized’ system (JWM)
– Fast, doesn’t require ring-finding
– Not quite an “aromaticity model”, as it doesn’t apply Hückel’s rule
SUMMARY
Take-home
• There is some confusion about which bonds are marked
as aromatic, and about the count of implicit hydrogens on
aromatic atoms
• There are simple rules governing these
• There is some confusion about the purpose of
kekulization, and how to do it
– Kekulization is not dearomatization, but just assignment of bond
orders to aromatic bonds to satisfy valencies
– Equivalent to finding a perfect matching
• There is a lack of information on the Daylight aromaticity
model
– JWM has published details of the atom contributions
Agree/disagree/confused?
Email:
noel@nextmovesoftware.com
john@nextmovesoftware.com
Next step:
A validation suite
Acknowledgements:
Greg Landrum
Image: Tintin44 (Flickr)
APPENDIX
A kekulization algorithm
• Identify aromatic atoms that need a double bond (set A)
– Assign each a degree, i.e. a count of its neighbors in set A
• Apply a greedy algorithm to assign double bonds
favoring low degree atoms over higher
• Does all of set A have a double bond?
• If not, try a backtracking algorithm or Blossom algorithm
to find a path of alternating bonds between two atoms
that need a double bond
– Once found, invert the bond orders along the path
• Does all of set A have a double bond?
– Handle failure
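The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the Beam or CDK implementation: set A is taken as given, the names are hypothetical, and the alternating-path search is a plain backtracking search (exponential in the worst case) rather than the Blossom algorithm.

```python
def kekulize_matching(needs_double, bonds):
    """Greedy pass, then alternating-path repair.

    Returns a set of frozenset({a, b}) double bonds, or None on failure.
    """
    adjacency = {a: set() for a in needs_double}
    for a, b in bonds:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)

    def rank(atom):
        return (len(adjacency[atom]), atom)  # low degree first

    partner = {}
    # Greedy pass: give each atom a double bond to its lowest-degree free neighbor
    for atom in sorted(adjacency, key=rank):
        if atom in partner:
            continue
        free = [n for n in adjacency[atom] if n not in partner]
        if free:
            nbr = min(free, key=rank)
            partner[atom], partner[nbr] = nbr, atom

    def alternating_path(atom, path):
        # Leave atom by a single bond, then follow the neighbor's double bond
        for nbr in sorted(adjacency[atom]):
            if nbr in path:
                continue
            if nbr not in partner:
                return path + [nbr]  # reached another atom lacking a double bond
            mate = partner[nbr]
            if mate in path:
                continue
            found = alternating_path(mate, path + [nbr, mate])
            if found:
                return found
        return None

    for atom in sorted(adjacency):
        if atom in partner:
            continue
        path = alternating_path(atom, [atom])
        if path is None:
            return None  # no valid Kekule form: handle failure
        # Invert the bond orders along the path
        for i in range(0, len(path), 2):
            a, b = path[i], path[i + 1]
            partner[a], partner[b] = b, a

    return {frozenset(pair) for pair in partner.items()}


# A six-membered ring numbered so that the greedy pass alone gets stuck
ring = [(0, 1), (1, 4), (4, 2), (2, 3), (3, 5), (5, 0)]
print(kekulize_matching(set(range(6)), ring))
```

In the example the greedy pass pairs 0 with 1 and 2 with 3, leaving atoms 4 and 5 stranded; the alternating path 4, 1, 0, 5 is then inverted to give a valid assignment.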
Aromatic atoms that do not require a double bond
• An important aspect of the kekulization algorithm is the initial
determination of which aromatic atoms do/not need a double
bond, e.g.
– Pyrrole-type nitrogens do not need one
– The hypervalent N of pyridine-N oxides *do* need one
• For a list, see page 158 of John May’s thesis [1], and also
the associated implementation in Beam [2], or the CDK [3]
[1] Cheminformatics for genome-scale metabolic reconstructions. EMBL-EBI/University of Cambridge, 2014.
(https://www.repository.cam.ac.uk/handle/1810/246652)
[2] https://github.com/johnmay/beam/blob/master/core/src/main/java/uk/ac/ebi/beam/Localise.java
[3] https://github.com/cdk/cdk/blob/master/base/standard/src/main/java/org/openscience/cdk/aromaticity/Kekulization.java
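As a rough illustration of this determination, the sketch below covers only the three nitrogen cases named on this slide and refuses everything else. The function name and parameters are hypothetical; the full list in [1] covers many more elements, charges and valences.

```python
def aromatic_n_needs_double_bond(degree, h_count=0, charge=0,
                                 exocyclic_double_bonds=0):
    """Does a neutral aromatic nitrogen need a double bond?

    degree: number of explicit neighbors.
    h_count: hydrogens on the atom (e.g. 1 for [nH]).
    exocyclic_double_bonds: explicit double bonds written in the SMILES,
    such as the N=O of a pyridine N-oxide written as n(=O).
    """
    if charge != 0:
        raise ValueError("charged nitrogens are outside this sketch")
    sigma = degree + h_count
    if sigma == 2 and exocyclic_double_bonds == 0:
        return True    # pyridine-type
    if sigma == 3 and exocyclic_double_bonds == 0:
        return False   # pyrrole-type
    if sigma == 3 and exocyclic_double_bonds == 1:
        return True    # hypervalent N of a pyridine N-oxide
    raise ValueError("case not covered by this sketch")


print(aromatic_n_needs_double_bond(degree=2))                  # n in c1ccncc1
print(aromatic_n_needs_double_bond(degree=2, h_count=1))       # [nH] in pyrrole
print(aromatic_n_needs_double_bond(degree=3, exocyclic_double_bonds=1))
```

Raising on unknown cases is a deliberate choice here: guessing would silently change the set of atoms passed to the matching step.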
Writing aromatic SMILES
• When reading aromatic SMILES, bonds without bond symbols are
marked as aromatic if they connect two aromatic atoms
– But not if the two aromatic atoms are in a ring, but the bond is not in a
ring (not important whether it’s the same ring)
• Therefore, when writing aromatic SMILES, use a bond symbol
where a ring bond is between aromatic atoms but is not itself
aromatic
– Speeds up kekulization, avoids misinterpretation (see below)
– In fact, if you apply this rule to non-ring bonds also, you can avoid ring
perception when reading aromatic SMILES
Example: c12ccccc1c3ccccc23 is better written as c12-c3c(-c2cccc1)cccc3
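On the writer side, the rule amounts to a small choice of bond symbol. The sketch below applies it to ring and non-ring bonds alike, as suggested in the last point; the function name and parameters are hypothetical.

```python
def bond_symbol_to_write(order, bond_aromatic, a_aromatic, b_aromatic):
    """Bond symbol for a SMILES writer emitting aromatic SMILES.

    order: 1, 2 or 3 for non-aromatic bonds (ignored for aromatic bonds).
    """
    if bond_aromatic:
        return ""      # implied between two lowercase atoms
    if order == 1:
        # A single bond between two aromatic atoms must be written explicitly,
        # otherwise a reader may take it to be aromatic
        return "-" if (a_aromatic and b_aromatic) else ""
    return {2: "=", 3: "#"}[order]


print(repr(bond_symbol_to_write(1, False, True, True)))    # between two aromatic atoms
print(repr(bond_symbol_to_write(1, False, True, False)))   # aromatic to aliphatic
print(repr(bond_symbol_to_write(1, True, True, True)))     # aromatic bond
```

With this convention a reader can decide every bond's aromaticity from the string alone, at the cost of a few extra "-" characters.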
Canonical Kekulé SMILES
• Kekulé SMILES are sometimes recommended over
aromatic SMILES, to avoid problems a toolkit may have
with kekulization
– Care should be taken to avoid defining cis/trans stereochemistry
that is not present
• Step 1: canonically label the atoms
– Note: some canonicalisation algorithms may use aromaticity
• Step 2: re-kekulize the structure taking into account the
canonical labels
– Can consider aromatic atoms, or alternatively avoid
aromaticity perception by considering all atoms adjacent to a single
double bond (JWM)
Why c1ccnc1 is not valid for pyrrole radical
• There are two types of aromatic Ns in the Daylight world
– Pyrrole-type (three-valent, three single bonds)
– Pyridine-type (two-valent, a double and a single bond)
– It is possible to distinguish between these based on valency
• The nitrogen in pyrrole radical is a third type:
– Two-valent, two single bonds
• This cannot be distinguished from pyridine-type when
reading
– Therefore it should not be written, as the nitrogen is assumed to
require a double bond
• Pyrrole radical SMILES that can be read unambiguously:
– The Kekulé form, C1=C[N]C=C1
– Partial aromatic form, c1c[N]cc1
– Maybe c1cc-n-c1, but most toolkits would not handle this right now
Kekulization implementation options
• Could kekulize each disconnected system of
aromatic atoms/bonds separately, or do all at
the same time
– Might speed up backtracking (though might slow
down general case)
• Could fail fast if going to reject, rather than
warn
– e.g. if odd number of atoms need double bonds
• Could favor six-member rings
– Shuffle atoms in smaller (?) rings to end of list
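The fail-fast idea combines naturally with treating each system separately: a connected system with an odd number of atoms needing a double bond can never be kekulized. A minimal sketch, with hypothetical names and the same atom-index/bond-pair inputs assumed throughout:

```python
def odd_systems(needs_double, bonds):
    """Return the connected systems that have an odd number of atoms
    needing a double bond (each as a sorted list of atom indices)."""
    adjacency = {a: set() for a in needs_double}
    for a, b in bonds:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    odd = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk over one connected system
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        if len(component) % 2 == 1:
            odd.append(sorted(component))
    return odd


ring5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(odd_systems({0, 1, 2, 3, 4}, ring5))   # c1ccnc1: reject immediately
print(odd_systems({0, 1, 2, 4}, ring5))      # c1cc[nH]c1: nothing to reject
```

An empty result does not guarantee success, since an even-sized system can still lack a perfect matching; the check only rules out the cheap cases before any search starts.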

Sérgio Sacani
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfMending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Selcen Ozturkcan
 

Recently uploaded (20)

LEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptxLEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
 
Physiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptxPhysiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptx
 
Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE  AND ITS BENIFITS.pptxIMPORTANCE OF ALGAE  AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
 
Sustainable Land Management - Climate Smart Agriculture
Sustainable Land Management - Climate Smart AgricultureSustainable Land Management - Climate Smart Agriculture
Sustainable Land Management - Climate Smart Agriculture
 
BIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROIDBIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROID
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 
2001_Book_HumanChromosomes - Genéticapdf
2001_Book_HumanChromosomes - Genéticapdf2001_Book_HumanChromosomes - Genéticapdf
2001_Book_HumanChromosomes - Genéticapdf
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfMending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
 

We need to talk about Kekulization, Aromaticity and SMILES

  • 1. We need to talk about… Kekulization, Aromaticity and SMILES. Noel M. O’Boyle, John W. Mayfield. Open Babel/CDK development team and NextMove Software, Cambridge, UK. 254th ACS National Meeting, Washington, Aug 2017
  • 3. “…we were able to extract some 13,000 SMILES codes for the Wikipedia entries. Over 600 of these codes could not be processed by the SMILES parser. A clear majority of the problems (over 350 cases) was caused by not respecting the SMILES syntax rules for unsubstituted pyrrole-type nitrogen. This nitrogen was encoded as n and not as [nH] as required by the SMILES grammar (so for example benzimidazole was incorrectly encoded as n2c1ccccc1nc2).”
  • 4. Why do we need to talk? • It’s been 29 years since Dave Weininger’s SMILES paper, but still, sometimes… – Toolkits generate aromatic SMILES which other toolkits cannot read – Toolkits fail to roundtrip their own structures through aromatic forms – Chemical information is lost or confused, aromaticity appears/disappears, hydrogens appear/disappear
  • 5. Why do we need to talk? • Why is this? – There is some confusion about which bonds are marked as aromatic – There is some confusion about the purpose of kekulization, and how to do it – There is a lack of information on the Daylight aromaticity model – There is some confusion about where the implicit hydrogens are in an aromatic SMILES • Goal is to describe how Daylight handles aromatic SMILES – As deduced by JWM
  • 7. What is an aromatic SMILES? • Has some atoms and some bonds marked as aromatic • An atom is marked as aromatic if written as lowercase • A bond is marked as aromatic – Either… if the aromatic bond symbol (colon) is used – Or… no bond symbol is used and it joins two aromatic atoms • But not if the two atoms are in a ring, and the bond is not in a ring [Figure examples: c1ccccc1c2ccccc2, C3c1ccccc1-c2ccccc2C3, C3c1ccccc1c2ccccc2C3, c1ccccc1, cc; legend: bond marked as aromatic / bond not marked as aromatic]
  • 9. Kekulization • Given a molecule where some atoms and bonds have been marked as aromatic – Assign bond orders of either one or two to the aromatic bonds such that the valencies of all of the aromatic atoms are satisfied (i.e. are consistent with sp2) • Note: Kekulization is not ‘dearomatization’ – No need to search for aromatic rings or even check for ring membership – In particular, H atoms should not be added/removed to make rings aromatic [Figure: c1ccccc1 shown as one Kekulé form or the other]
  • 10. Kekulization [Figure examples: c1ccccc1 (one Kekulé form or the other), cc, cc1c(c)c(c)c1c]
  • 11. How many hydrogens to add to aromatic atoms? • If within square brackets (e.g. [nH] or [n]) – The hydrogen count is explicit (as usual for brackets) • If outside square brackets (e.g. c1ccncc1) – Calculate the bond order sum, treating aromatic bonds as single bonds – Apply normal SMILES implicit valence rules using this sum, but subtract one from the number of implicit hydrogens (if there are any) – E.g. in pyridine, c1cnccc1, using the normal rules each carbon would have two hydrogens and the nitrogen one; after subtracting one, each carbon has one hydrogen and the nitrogen has none • Some toolkits instead add hydrogens to satisfy aromaticity rules – This is not what Daylight did. In their world, the number of implicit hydrogens is known directly from the SMILES string.
  • 12. Kekulization = “Perfect matching” • If we consider just the subset of atoms that are aromatic and require a double bond – A valid Kekulé structure is exactly equivalent to the graph theory concept, a “perfect matching”
  • 15. Kekulization failure • If the algorithm described above fails to find a valid Kekulé form, then the input was incorrect • It might be missing some hydrogens (incorrect SMILES writer), or it might be a radical (should not have been aromatized) – E.g. c1ccnc1 cannot be kekulized but the writer might have intended pyrrole (c1cc[nH]c1) or pyrrole radical (C1=C[N]C=C1) • A reader may reject the SMILES as invalid or warn and return a radical • Optionally, a means might be provided to ‘fix’ (i.e. guess) the intended structure – This should probably not be the default behaviour as it causes proliferation of incorrect SMILES and may not recover the intended structure
  • 17. What is the purpose of aromaticity in cheminf? [Figure panels: Normalize Kekulé forms (one form or the other); Is it a stereogenic center?; Is it aromatic in real life? Yes! Is it aromatic in cheminf? No!* (*According to the Daylight aromaticity model)]
  • 18. What is the purpose of aromaticity in cheminf? • To normalize to the same representation different Kekulé forms of a structure – NOT to indicate whether an atom/bond displays physical properties associated with aromaticity • Useful to: – generate a canonical representation – identify stereogenic centers – generate fingerprints – match an aromatic query • Note: – If the resulting aromatic structure cannot be kekulized then it should not be aromatized
  • 19. Aromaticity models • Based on Hückel’s rule: – A ring is aromatic if it can be planar, the sum of π electrons is 4n+2, and every atom can participate • An aromaticity model can be described by two sets of parameters: 1. how many π electrons each atom contributes 2. what cycles in the graph are tested for 4n+2 • Note that planarity is not explicitly tested
  • 20. The Daylight aromaticity model • When writing an aromatic SMILES string, it is probably a good idea to apply the Daylight aromaticity model • JWM has recently described the electron contributions – https://figshare.com/articles/Daylight_Aromatic_Atoms/3370945
  • 21. What rings to check? • Best approach is to check all rings that could be aromatic – Alternative is to use SSSR (not recommended) – Note that outer ring systems may be aromatic while inner ones are not • Need to do this efficiently – Eliminate atoms that are in rings that cannot be aromatic – Try small rings first, as it may be possible to terminate early if no atoms are left to check – Programs can terminate searches for rings above a certain size or after backtracking N times [Figure: azulene, c1cc2-c(ccccc2)c1; the five-membered ring has 5e-, the seven-membered ring has 7e-, the outer ring has 10e-]
  • 22. Alternative aromaticity models for SMILES • Preserve the aromaticity of the input atoms – Speeds things up – no perception required – Only sensible if reading aromatic SMILES – Useful if you have written the SMILES yourself • Regard all conjugated double bonds as forming a ‘delocalized’ system (JWM) – Fast, doesn’t require ring-finding – Not quite an “aromaticity model”, as it doesn’t apply Hückel’s rule
  • 24. Take-home • There is some confusion about which bonds are marked as aromatic, and about the count of implicit hydrogens on aromatic atoms • There are simple rules governing these • There is some confusion about the purpose of kekulization, and how to do it – Kekulization is not dearomatization, but just assignment of bond orders to aromatic bonds to satisfy valencies – Equivalent to finding a perfect matching • There is a lack of information on the Daylight aromaticity model – JWM has published details of the atom contributions
  • 27. A kekulization algorithm • Identify aromatic atoms that need a double bond (set A) – Assign each a degree, i.e. a count of its neighbors in set A • Apply a greedy algorithm to assign double bonds, favoring low-degree atoms over higher-degree ones • Does all of set A have a double bond? • If not, try a backtracking algorithm or Blossom algorithm to find a path of alternating bonds between two atoms that need a double bond – Once found, invert the bond orders along the path • Does all of set A have a double bond? – Handle failure
  • 28. Aromatic atoms that do not require a double bond • An important aspect of the kekulization algorithm is the initial determination of which aromatic atoms do/not need a double bond, e.g. – Pyrrole-type nitrogens do not need one – The hypervalent N of pyridine-N oxides *do* need one • For a list, see page 158 of John May’s thesis [1], and also the associated implementation in Beam [2], or the CDK [3] [1] Cheminformatics for genome-scale metabolic reconstructions. EMBL-EBI/University of Cambridge, 2014. (https://www.repository.cam.ac.uk/handle/1810/246652) [2] https://github.com/johnmay/beam/blob/master/core/src/main/java/uk/ac/ebi/beam/Localise.java [3] https://github.com/cdk/cdk/blob/master/base/standard/src/main/java/org/openscience/cdk/aromaticity/Kekulization.java
  • 29. Writing aromatic SMILES • When reading aromatic SMILES, bonds without bond symbols are marked as aromatic if they connect two aromatic atoms – But not if the two aromatic atoms are in a ring, but the bond is not in a ring (not important whether it’s the same ring) • Therefore, when writing aromatic SMILES, use a bond symbol where a ring bond is between aromatic atoms but is not itself aromatic – Speeds up kekulization, avoids misinterpretation (see the examples) – In fact, if you apply this rule to non-ring bonds also, you can avoid ring perception when reading aromatic SMILES [Figure examples: c12ccccc1c3ccccc23, c12-c3c(-c2cccc1)cccc3]
  • 30. Canonical Kekulé SMILES • Kekulé SMILES are sometimes recommended over aromatic SMILES, to avoid problems a toolkit may have with kekulization – Care should be taken to avoid defining cis/trans stereochemistry that is not present • Step 1: canonically label the atoms – Note: some canonicalisation algorithms may use aromaticity • Step 2: re-kekulize the structure taking into account the canonical labels – Can consider aromatic atoms, or alternatively can avoid aromaticity perception by considering all atoms adjacent to a single double bond (JWM)
  • 31. Why c1ccnc1 is not valid for pyrrole radical • There are two types of aromatic Ns in the Daylight world – Pyrrole-type (three-valent, three single bonds) – Pyridine-type (two-valent, a double and a single bond) – It is possible to distinguish between these based on valency • The nitrogen in pyrrole radical is a third type: – Two-valent, two single bonds • This cannot be distinguished from pyridine-type when reading – Therefore it should not be written, as the nitrogen is assumed to require a double bond • Pyrrole radical SMILES that can be read unambiguously – The Kekulé form, C1=C[N]C=C1 – Partial aromatic form, c1c[N]cc1 – Maybe c1cc-n-c1, but most toolkits would not handle this right now
  • 32. Kekulization implementation options • Could kekulize each disconnected system of aromatic atoms/bonds separately, or do all at the same time – Might speed up backtracking (though might slow down general case) • Could fail fast if going to reject, rather than warn – e.g. if odd number of atoms need double bonds • Could favor six-member rings – Shuffle atoms in smaller (?) rings to end of list
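
The implicit-hydrogen rule from slide 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `implicit_h_aromatic` and the valence table are hypothetical, not taken from Daylight or any toolkit, and the table covers only the aromatic organic-subset symbols b, c, n, o, p and s with the usual SMILES normal valences.

```python
# Normal SMILES valences for the aromatic organic-subset symbols
# (an assumption of this sketch; real toolkits may support more elements).
NORMAL_VALENCES = {
    "b": (3,),
    "c": (4,),
    "n": (3, 5),
    "o": (2,),
    "p": (3, 5),
    "s": (2, 4, 6),
}


def implicit_h_aromatic(symbol, bond_order_sum):
    """Implicit H count for an aromatic atom written outside brackets.

    bond_order_sum is the sum of bond orders on the atom, counting
    every aromatic bond as a single bond (slide 11).
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            hydrogens = valence - bond_order_sum
            # Subtract one from the normal count, but only if there are any.
            return hydrogens - 1 if hydrogens > 0 else 0
    # Bond order sum exceeds every normal valence: no implicit hydrogens.
    return 0


# Pyridine, c1cnccc1: every ring atom has two aromatic bonds (sum = 2).
print(implicit_h_aromatic("c", 2))  # carbon: 2 by the normal rules, minus 1
print(implicit_h_aromatic("n", 2))  # nitrogen: 1 by the normal rules, minus 1
```

Because the count depends only on the symbol and the bond order sum, a reader written this way never needs aromaticity perception to place hydrogens, which is the point the slide makes about Daylight's behaviour.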

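Slides 12, 15 and 27 describe kekulization as finding a perfect matching over the aromatic atoms that need a double bond (set A). The sketch below is one hedged way to show that idea. It uses a plain exhaustive backtracking search that tries low-degree atoms first, not the greedy-plus-augmenting-path or Blossom approach named on slide 27, so it is only suitable for small examples. The function name `kekulize`, the integer atom indices and the input format are assumptions of this sketch, and deciding which atoms belong in set A (slide 28) is left to the caller.

```python
def kekulize(needs_double, bonds):
    """Pick the bonds to make double, or return None on failure.

    needs_double: iterable of atom indices that need a double bond (set A).
    bonds: iterable of (a, b) index pairs for the aromatic bonds.
    Returns a frozenset of frozenset({a, b}) pairs forming a perfect
    matching on set A, or None if no valid Kekule form exists.
    """
    atoms = frozenset(needs_double)
    # Adjacency restricted to set A; other atoms cannot take a double bond.
    adj = {a: set() for a in atoms}
    for a, b in bonds:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)

    # Fail fast: an odd number of atoms can never be perfectly matched.
    if len(atoms) % 2:
        return None

    def solve(unmatched, chosen):
        if not unmatched:
            return chosen
        # Favor the atom with the fewest remaining partners.
        atom = min(unmatched, key=lambda a: len(adj[a] & unmatched))
        for nbr in sorted(adj[atom] & unmatched):
            result = solve(unmatched - {atom, nbr},
                           chosen | {frozenset((atom, nbr))})
            if result is not None:
                return result
        return None  # dead end: backtrack

    return solve(atoms, frozenset())


ring5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
# Pyrrole, c1cc[nH]c1: the N (atom 3) does not need a double bond.
print(kekulize({0, 1, 2, 4}, ring5))
# c1ccnc1: the N is read as pyridine-type, so all five atoms need one.
print(kekulize({0, 1, 2, 3, 4}, ring5))  # None: kekulization failure
```

The second call returns None, matching slide 15: c1ccnc1 has no valid Kekulé form, and a reader would then reject the input or warn.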
Editor's Notes

  1. 2.7%
  2. Pyrrole radical
  3. Pyrrole radical