John Mayfield, Daniel Lowe and Roger Sayle
NextMove Software Ltd, Cambridge, UK.
NextMove Software Limited
Innovation Centre (Unit 23)
Cambridge Science Park
Milton Road, Cambridge
UK CB4 0EY
www.nextmovesoftware.com
The need for Open CIP
Introduction
The Cahn-Ingold-Prelog (CIP) priority rules rank the atoms around a stereogenic unit in order to
assign a stereodescriptor that is invariant to atom order and layout, for example R (right) or
S (left) for tetrahedral atoms.
A directed acyclic graph (digraph) is constructed for each stereogenic unit, and the out
edges from the root node are compared and ranked according to eight sequence rules[1]. Each
rule is applied exhaustively to the entire digraph before the next rule is applied[2].
Acknowledgements
We thank Robert Hanson, Andrey Yerin, Mikko Vainio, and Sophia Gillian Musacchio for initiating and
participating in the “Fix CIP” collaboration and the many in-depth technical discussions that
have led to improvements in the tools, Karl Nedwed for providing KnowItAll results, Philip
Skinner for providing ChemDraw licenses, and Noel O’Boyle for feedback and suggestions.
Results
Bibliography
1. P-92.1.3, Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013.
2. Paulina Mata. The CIP System Again: Respecting Hierarchies Is Always a Must. J. Chem. Inf. Comput. Sci., 1999, 39 (6).
Conclusion
The CIP sequence rules provide a standard way for chemists to describe the
configurations of most stereogenic units. However, beyond simple cases the complexity of
the rules means that software is needed as an aid to naming configurations. The results
demonstrate that, even then, software implementations do not all agree on the configuration.
Through the results presented here and the ongoing effort of the Fix CIP collaboration,
software should aim to converge upon consistent stereochemistry naming. An Open CIP
software tool could provide “blessed” stereochemistry configuration names and a
standard algorithm implementation for other vendors to integrate or adapt.
Comparing Cahn-Ingold-Prelog Rule Implementations
Rule 1
a. Higher atomic number precedes lower
b. An atom node duplicated closer to the root ranks higher than one duplicated further
Rule 2 Higher atomic mass number precedes lower
Rule 3 Z precedes E and this precedes nonstereogenic (nst) double bonds
Rule 4
a. Chiral stereogenic units precede pseudoasymmetric stereogenic units and these
precede nonstereogenic units (R = S > r = s > nst)
b. When two ligands have different descriptor pairs, the one with the first chosen like
descriptor pair has priority over the one with the corresponding unlike descriptor
pair
c. r precedes s
Rule 5 An atom or group with descriptor R has priority over its enantiomorph S
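The strict ordering of the rules matters in practice: a lower rule may only be consulted once every higher rule has been applied to the whole digraph without finding a difference. The following is a minimal Python sketch of that idea, not the algorithm of any tool compared here. It assumes a simplified representation in which each ligand is a list of spheres of (atomic number, mass number) pairs. The two ligands (13CH3 and CH2F) and all function names are hypothetical illustrations.

```python
# Each ligand is a list of spheres; each sphere lists (atomic number, mass number).
# Both ligands are hypothetical examples chosen to separate the two strategies.
LIGAND_13CH3 = [[(6, 13)], [(1, 1), (1, 1), (1, 1)]]
LIGAND_CH2F = [[(6, 12)], [(9, 19), (1, 1), (1, 1)]]


def rule_1a_key(ligand):
    """Atomic numbers per sphere, highest first (simplified Rule 1a)."""
    return [sorted((z for z, _ in sphere), reverse=True) for sphere in ligand]


def rule_2_key(ligand):
    """Mass numbers per sphere, highest first (simplified Rule 2)."""
    return [sorted((m for _, m in sphere), reverse=True) for sphere in ligand]


def hierarchical_key(ligand):
    # Tuples compare element by element, so Rule 2 is only consulted when
    # Rule 1a has compared every sphere and found no difference.
    return (rule_1a_key(ligand), rule_2_key(ligand))


def interleaved_key(ligand):
    # Incorrect strategy, shown for contrast: both rules are tested on
    # sphere 1 before anything is tested on sphere 2.
    return [
        (sorted((z for z, _ in sphere), reverse=True),
         sorted((m for _, m in sphere), reverse=True))
        for sphere in ligand
    ]


ligands = [LIGAND_13CH3, LIGAND_CH2F]
best_hierarchical = max(ligands, key=hierarchical_key)
best_interleaved = max(ligands, key=interleaved_key)
print("hierarchical winner is CH2F:", best_hierarchical is LIGAND_CH2F)
print("interleaved winner is 13CH3:", best_interleaved is LIGAND_13CH3)
```

With the hierarchical key, fluorine in the second sphere decides the comparison under Rule 1a, so the isotope in the first sphere is never consulted. The interleaved key lets the isotope decide too early and gives the opposite ranking.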
Stereochemistry in Databases
[Figure: distribution of the number of stereogenic units per molecule (count 0 to 9) as a percentage of each dataset. Datasets: eMolecules (June 2017), 14 million records; PubChem Substance, 234 million records; PubChem Compound (Aug 2017), 93 million records; ChEMBL 23, 1.7 million records; ChEBI 154, 95 thousand records.]
The number of defined stereogenic units per molecule varies between databases.
Applying Rule 1a to the digraph for 2-butanol ranks the out edges connected to
the root, giving the label S (4 > 2 > 5 is anticlockwise when looking towards 6).
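The ranking step for an acyclic molecule such as 2-butanol can be sketched in a few lines of Python. This is a simplified illustration of Rule 1a only: it compares the atomic numbers in each sphere, highest first, and it assumes an acyclic structure, so it would not terminate on rings. The atom indices are arbitrary and do not correspond to the numbering in the poster's figure. Assigning R or S also needs the spatial arrangement, which is not modelled here.

```python
ATOMIC_NUM = {"H": 1, "C": 6, "O": 8}

# 2-butanol, CH3-CH(OH)-CH2-CH3, with explicit hydrogens.
# Indices are arbitrary (hypothetical); atom 1 is the stereocentre.
atoms = ["C", "C", "O", "C", "C"] + ["H"] * 10
bonds = [
    (0, 1), (1, 2), (1, 3), (3, 4),   # heavy-atom skeleton
    (0, 5), (0, 6), (0, 7),           # hydrogens of the methyl on the centre
    (1, 8),                           # hydrogen on the stereocentre
    (2, 9),                           # hydroxyl hydrogen
    (3, 10), (3, 11),                 # CH2 hydrogens
    (4, 12), (4, 13), (4, 14),        # terminal CH3 hydrogens
]

adj = {i: [] for i in range(len(atoms))}
for a, b in bonds:
    adj[a].append(b)
    adj[b].append(a)


def spheres(root, ligand):
    """Atomic numbers of one branch, as one descending list per sphere."""
    result = []
    frontier = [(ligand, root)]  # (atom, parent) pairs
    while frontier:
        result.append(
            sorted((ATOMIC_NUM[atoms[a]] for a, _ in frontier), reverse=True)
        )
        frontier = [(n, a) for a, p in frontier for n in adj[a] if n != p]
    return result


def rank_ligands(root):
    """Order the neighbours of root from highest to lowest priority."""
    return sorted(adj[root], key=lambda lig: spheres(root, lig), reverse=True)


ranked = rank_ligands(1)
print([f"{atoms[i]}{i}" for i in ranked])  # ['O2', 'C3', 'C0', 'H8']
```

The oxygen wins in the first sphere, the two carbons tie there and are separated in the second sphere (C, H, H against H, H, H), and the hydrogen ranks last.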
ChEBI ChEMBL eMolecules PubChem Compound¹ PubChem Substance
Rule 1a 281K 99.6% 1.8M 98.6% 2.4M 97.0% 53.5M 100.0% 93.1M 98.7%
Rule 1b 4 1 164 255
Rule 2 14 3,565 6,789
Rule 3 29 3 441 36 45
Rule 4a 122 126 273 4 12,770
Rule 4b 563 0.2% 4,037 0.2% 3,188 0.1% 125K 0.1%
Rule 4c 19 558
Rule 5 285 0.1% 23.4K 1.2% 69K 2.8% 15 1.1M 1.2%
Total 282K 1.9M 2.4M 53.5M 94.3M
The majority of stereogenic units are constitutionally asymmetric and can be ranked using
Rule 1a. However, in some datasets the number of stereogenic units requiring Rules 4b
and 5 can be significant.
Software                      I    II   III  IV   V    VI   VII  VIII IX   X    XIa  XIb  XII  XIII
Centres 2.0                   R    R    R    R    R    R    R    R    R    r    R    R    r    R
JMol 14.20.3                  R    R    R    R    R    R    R    R    R    r    R    R    r    R
ACD/ChemSketch 14.05beta      R    R    R    R    R    R    R    R    R    r    R    R    r    R
Balloon 1.6.5beta             R    R    R    R    R    R    R    R    R    r    R    R    r    R
KnowItAll ChemWindow 2018     R    R    R    R    R    R    R    R    R    r    R    R    r    R⁵
ChemDraw 16.0                 R    R    R    R    S    R    R    R    R    r    R    R    r    R
BIOVIA Draw 2017              R    R    R    -    R    R    R    R    R    -¹   R    R    -¹   R
MarvinSketch 17.17            R    -    -    -    S    R    -    R    -    r    R    R    r    -
Indigo 1.3.0Beta.r16          -²   R    -    -    R    -    R    R    R    r    S    R    -    -
RDKit 2017.03.03              S    R    S    R    S    R    R    S    R    R    R    R    -    -
DataWarrior 4.6.0             R    R    R    -    S    R    R    S    R    R    R³   R    -    -
CACTVS (NCI Resolver Aug 17)  R    R    S    -    S⁴   R    R    S    R    R    S    R    -    -
OPSIN 2.3.1                   R    R    R    R    R    -    -    -    -    -    S    R    -    -
LexiChem (OEChem) 20170613    R    R    -    -    R    -    -    -    -    -    S    R    -    -
ChemDoodle 7.0.2              R    R    -    -    S    -    -    s    -    r    S    R    -    -
CDK 2.0                       -    R    R⁵   -    S    -    -    -    -    -    S    R    -    -
JUMBO 6                       R    -    S    -    -    -    -    -    -    -    S    S    -    -

Structure categories: Constitutional (Rules 1a, 1b, 2); Geometrical + Topographical (Rules 3, 4a, 4b, 4c, 5); Special (mancude, auxiliary descriptors).

¹ Pseudoasymmetric r/s labels not displayed but must be calculated due to answers given for IX and XIII
² Runtime error occurs
³ Impossible to test as different Kekulé forms are normalised
⁴ R in CACTVS since Feb 2015, NCI resolver is old version
⁵ Other descriptor is assigned differently
A set of fourteen structures was collected to identify differences between software
implementations. The structures were selected to cover all the sequence rules and their
applications to special cases.
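A comparison of this kind can be tallied with a short script. The Python sketch below copies four rows from the results table and counts, for each tool, the structures where its label differs from a reference row and the structures where no label was given. Two assumptions are made for illustration only: "-" is read as "no descriptor reported", and the Centres row is used as the reference because the first four tools in the table agree on every structure. The function name `compare` is hypothetical.

```python
STRUCTURES = "I II III IV V VI VII VIII IX X XIa XIb XII XIII".split()

# Rows copied from the results table (a subset, for illustration).
RESULTS = {
    name: labels.split()
    for name, labels in {
        "Centres 2.0": "R R R R R R R R R r R R r R",
        "ChemDraw 16.0": "R R R R S R R R R r R R r R",
        "RDKit 2017.03.03": "S R S R S R R S R R R R - -",
        "OPSIN 2.3.1": "R R R R R - - - - - S R - -",
    }.items()
}


def compare(reference, other):
    """Return (structures labelled differently, structures with no label)."""
    differ, missing = [], []
    for name, ref, got in zip(STRUCTURES, RESULTS[reference], RESULTS[other]):
        if got == "-":          # assumption: "-" means no descriptor reported
            missing.append(name)
        elif got != ref:        # case-sensitive: r and R are different labels
            differ.append(name)
    return differ, missing


for tool in RESULTS:
    if tool != "Centres 2.0":
        differ, missing = compare("Centres 2.0", tool)
        print(f"{tool}: differs on {differ}, no label for {missing}")
```

The comparison is case-sensitive on purpose, since the pseudoasymmetric labels r and s are distinct from R and S.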
Eight sequence rules (in essence)
Fix CIP Collaboration
Since this work was submitted for presentation, the developers of Centres, JMol,
ACD/ChemSketch, and Balloon have begun a collaboration. We are in the process of
submitting for publication an extended, in-depth validation set and proposing sequence rule
refinements and additions where they are required.
¹ As part of PubChem Compound’s processing, non-constitutional stereochemistry is
removed: for example, the nine stereoisomers of inositol are all represented by CID 892.
Atoms connected by double and triple bonds as well as ring closures result in
duplicated nodes in the digraph. In the structure below atoms 5 and 6 appear twice and
atom 1 (the root) appears three times.
Due to this duplication, complex ring systems can generate exponentially large digraphs
that are not computationally tractable. Further complexity in digraphs is caused by the use
of fractional atomic numbers in mancude ring-systems and assignment of auxiliary
descriptors for applying Rules 3-5.
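The way duplicated nodes arise, and how quickly they multiply, can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the molecule is a dictionary mapping each atom to its neighbours and bond orders, hydrogens are left out, and fractional atomic numbers and auxiliary descriptors are not handled. The names `build_digraph`, `count_atoms`, and `ladder` are hypothetical, and the ladder of fused four-membered rings is an artificial test case, not a structure from the poster.

```python
from collections import Counter


def build_digraph(adj, root):
    """Expand a molecular graph into a hierarchical digraph.

    adj maps atom -> {neighbour: bond order}. Each node is a tuple
    (atom, is_duplicate, children); duplicate nodes have no children.
    """
    def expand(atom, parent, path):
        children = []
        for nb, order in adj[atom].items():
            if nb == parent:
                # Multiple bond back to the parent: duplicate the parent.
                children += [(nb, True, [])] * (order - 1)
            elif nb in path:
                # Ring closure: nb is already on the path from the root.
                children += [(nb, True, [])] * order
            else:
                children.append(expand(nb, atom, path | {nb}))
                # Multiple bond forward: duplicate the neighbour as well.
                children += [(nb, True, [])] * (order - 1)
        return (atom, False, children)

    return expand(root, None, frozenset([root]))


def count_atoms(node, counts=None):
    """Count how many times each atom appears in the digraph."""
    counts = Counter() if counts is None else counts
    atom, _, children = node
    counts[atom] += 1
    for child in children:
        count_atoms(child, counts)
    return counts


def ladder(n):
    """n fused four-membered rings: 2 * (n + 1) atoms, single bonds only."""
    adj = {i: {} for i in range(2 * (n + 1))}

    def bond(a, b):
        adj[a][b] = adj[b][a] = 1

    for i in range(n + 1):
        bond(2 * i, 2 * i + 1)          # rung
        if i < n:
            bond(2 * i, 2 * i + 2)      # top rail
            bond(2 * i + 1, 2 * i + 3)  # bottom rail
    return adj


# A three-membered ring: the root appears three times, the others twice.
triangle = {1: {2: 1, 3: 1}, 2: {1: 1, 3: 1}, 3: {1: 1, 2: 1}}
triangle_counts = count_atoms(build_digraph(triangle, 1))
print(dict(triangle_counts))

# Digraph size for ladders of 1 to 7 fused rings.
sizes = [sum(count_atoms(build_digraph(ladder(n), 0)).values()) for n in range(1, 8)]
print(sizes)
```

Every distinct ring-avoiding path from the root becomes its own node, which is why the node count grows much faster than the number of atoms as rings are fused together.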
[Figure: structure diagrams and their digraphs, with numbered atoms.]

More Related Content

Similar to Comparing Cahn-Ingold-Prelog Rule Implementations

Bits protein structure
Bits protein structureBits protein structure
Bits protein structureBITS
 
modelling assignment
modelling assignmentmodelling assignment
modelling assignmentShwetA Kumari
 
Areejit Samal Emergence Alaska 2013
Areejit Samal Emergence Alaska 2013Areejit Samal Emergence Alaska 2013
Areejit Samal Emergence Alaska 2013Areejit Samal
 
Rna secondary structure prediction, a cuckoo search approach
Rna secondary structure prediction, a cuckoo search approachRna secondary structure prediction, a cuckoo search approach
Rna secondary structure prediction, a cuckoo search approacheSAT Journals
 
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...Kamel Mansouri
 
Multiobjective Optimization Tool for a Free Structure Analog Circuits Design ...
Multiobjective Optimization Tool for a Free Structure Analog Circuits Design ...Multiobjective Optimization Tool for a Free Structure Analog Circuits Design ...
Multiobjective Optimization Tool for a Free Structure Analog Circuits Design ...Yaser Kalifa
 
Electron Density Derived Descriptors in Drug Discovery and Protein Modeling
Electron Density Derived Descriptors in Drug Discovery and Protein ModelingElectron Density Derived Descriptors in Drug Discovery and Protein Modeling
Electron Density Derived Descriptors in Drug Discovery and Protein ModelingN. Sukumar
 
NANO281 Lecture 01 - Introduction to Data Science in Materials Science
NANO281 Lecture 01 - Introduction to Data Science in Materials ScienceNANO281 Lecture 01 - Introduction to Data Science in Materials Science
NANO281 Lecture 01 - Introduction to Data Science in Materials ScienceUniversity of California, San Diego
 
Gordon2003
Gordon2003Gordon2003
Gordon2003toluene
 
Fault detection in power transformers using random neural networks
Fault detection in power transformers using random neural networksFault detection in power transformers using random neural networks
Fault detection in power transformers using random neural networksIJECEIAES
 
GPU-accelerated Virtual Screening
GPU-accelerated Virtual ScreeningGPU-accelerated Virtual Screening
GPU-accelerated Virtual ScreeningOlexandr Isayev
 
A comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsA comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsAndrew McEachran
 
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...Kamel Mansouri
 
Making effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsMaking effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsOregon State University
 

Similar to Comparing Cahn-Ingold-Prelog Rule Implementations (20)

Bits protein structure
Bits protein structureBits protein structure
Bits protein structure
 
foglar book.pdf
foglar book.pdffoglar book.pdf
foglar book.pdf
 
modelling assignment
modelling assignmentmodelling assignment
modelling assignment
 
Areejit Samal Emergence Alaska 2013
Areejit Samal Emergence Alaska 2013Areejit Samal Emergence Alaska 2013
Areejit Samal Emergence Alaska 2013
 
Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comp...
Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comp...Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comp...
Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comp...
 
Rna secondary structure prediction, a cuckoo search approach
Rna secondary structure prediction, a cuckoo search approachRna secondary structure prediction, a cuckoo search approach
Rna secondary structure prediction, a cuckoo search approach
 
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
 
Multiobjective Optimization Tool for a Free Structure Analog Circuits Design ...
Multiobjective Optimization Tool for a Free Structure Analog Circuits Design ...Multiobjective Optimization Tool for a Free Structure Analog Circuits Design ...
Multiobjective Optimization Tool for a Free Structure Analog Circuits Design ...
 
Electron Density Derived Descriptors in Drug Discovery and Protein Modeling
Electron Density Derived Descriptors in Drug Discovery and Protein ModelingElectron Density Derived Descriptors in Drug Discovery and Protein Modeling
Electron Density Derived Descriptors in Drug Discovery and Protein Modeling
 
Event 32
Event 32Event 32
Event 32
 
NANO281 Lecture 01 - Introduction to Data Science in Materials Science
NANO281 Lecture 01 - Introduction to Data Science in Materials ScienceNANO281 Lecture 01 - Introduction to Data Science in Materials Science
NANO281 Lecture 01 - Introduction to Data Science in Materials Science
 
Gordon2003
Gordon2003Gordon2003
Gordon2003
 
Fault detection in power transformers using random neural networks
Fault detection in power transformers using random neural networksFault detection in power transformers using random neural networks
Fault detection in power transformers using random neural networks
 
GPU-accelerated Virtual Screening
GPU-accelerated Virtual ScreeningGPU-accelerated Virtual Screening
GPU-accelerated Virtual Screening
 
The Performance Validation of Neural Network Based 13C NMR Prediction Using a...
The Performance Validation of Neural Network Based 13C NMR Prediction Using a...The Performance Validation of Neural Network Based 13C NMR Prediction Using a...
The Performance Validation of Neural Network Based 13C NMR Prediction Using a...
 
Protein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on RosettaProtein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on Rosetta
 
A comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsA comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction models
 
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
 
Computational Chemistry Robots
Computational Chemistry RobotsComputational Chemistry Robots
Computational Chemistry Robots
 
Making effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsMaking effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computations
 

More from NextMove Software

CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...NextMove Software
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...NextMove Software
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedNextMove Software
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESNextMove Software
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionNextMove Software
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...NextMove Software
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...NextMove Software
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...NextMove Software
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKitNextMove Software
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...NextMove Software
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical RepresentationsNextMove Software
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsNextMove Software
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics DatabaseNextMove Software
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesNextMove Software
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesNextMove Software
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...NextMove Software
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)NextMove Software
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeNextMove Software
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsNextMove Software
 

More from NextMove Software (20)

DeepSMILES
DeepSMILESDeepSMILES
DeepSMILES
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speed
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKit
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical Representations
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptions
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information Exchange
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patents
 

Recently uploaded

A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 

Recently uploaded (20)

A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 

Comparing Cahn-Ingold-Prelog Rule Implementations

the need for open-cip

John Mayfield, Daniel Lowe and Roger Sayle
NextMove Software Ltd, Cambridge, UK.
NextMove Software Limited, Innovation Centre (Unit 23), Cambridge Science Park, Milton Road, Cambridge, UK CB4 0EY
www.nextmovesoftware.com

Introduction
The Cahn-Ingold-Prelog (CIP) priority rules rank atoms around a stereogenic unit to assign a stereo-descriptor that is invariant to atom order and layout, for example R (right) or S (left) for tetrahedral atoms.
A directed acyclic graph (digraph) is constructed for each stereogenic unit, and the out edges from the root node are compared and ranked according to eight sequence rules [1]. Each rule is applied exhaustively and tested on the entire digraph before the next rule is applied [2].

Acknowledgements
Robert Hanson, Andrey Yerin, Mikko Vainio, and Sophia Gillian Musacchio for initiating and participating in the “Fix CIP” collaboration and the many in-depth technical discussions that have led to improvements in the tools. Karl Nedwed for providing KnowItAll results. Philip Skinner for providing ChemDraw licenses. Noel O’Boyle for feedback and suggestions.

Bibliography
1. P-92.1.3, Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013.
2. Paulina Mata. The CIP System Again: Respecting Hierarchies Is Always a Must. J. Chem. Inf. Comput. Sci., 1999, 39 (6).

Conclusion
The CIP sequence rules provide a standard way for chemists to describe the configurations of most stereogenic units. However, beyond simple cases the complexity of the rules necessitates the use of software as an aid to naming configurations. The results demonstrate that, even then, software implementations do not all agree on the configuration. Through the results presented here and the ongoing effort of the Fix CIP collaboration, software should aim to converge upon consistent stereochemistry naming.
An Open CIP software tool could provide “blessed” stereochemistry configuration names and provide a standard algorithm implementation for other vendors to integrate or adapt.

Stereochemistry in Databases
[Figure: % of Dataset Count against Number of Stereogenic Units (0 to 9+), one panel per dataset.]
Datasets: eMolecules (June 2017), 14 million records; PubChem Substance, 234 million records; PubChem Compound (Aug 2017), 93 million records; ChEMBL 23, 1.7 million records; ChEBI 154, 95 thousand records.
The number of defined stereogenic units per molecule varies between databases.

Eight sequence rules (in essence)
Rule 1
  a. Higher atomic number precedes lower
  b. An atom node duplicated closer to the root ranks higher than one duplicated further
Rule 2
  Higher atomic mass number precedes lower
Rule 3
  Z precedes E and this precedes nonstereogenic (nst) double bonds
Rule 4
  a. Chiral stereogenic units precede pseudoasymmetric stereogenic units and these precede nonstereogenic units (R = S > r = s > nst)
  b. When two ligands have different descriptor pairs, the one with the first chosen like descriptor pair has priority over the one with a corresponding unlike descriptor pair
  c. r precedes s
Rule 5
  An atom or group with descriptor R has priority over its enantiomorph S

Applying Rule 1a to the digraph for 2-butanol ranks the out edges connected to the root, giving the label S (4 > 2 > 5 are anticlockwise looking towards 6).
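As a rough illustration of the Rule 1a step described above, the following Python sketch ranks the four ligands of the stereocentre in 2-butanol by comparing atomic numbers sphere by sphere, moving outwards from the root. It is a simplification under stated assumptions, not the full CIP procedure: it only handles acyclic, singly bonded structures (so no duplicate nodes are needed), it flattens each sphere into one sorted set rather than respecting the branch hierarchy, and the atom indices and helper names (`build_2_butanol`, `branch_key`, `rank_ligands`) are hypothetical and do not correspond to the atom numbers used in the poster figure.

```python
# Simplified Rule 1a ranking for an acyclic molecule (assumption: no rings,
# no multiple bonds, so the digraph is just the molecular tree).
ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8}


def build_2_butanol():
    """Return (symbols, adjacency) for CH3-CH(OH)-CH2-CH3 with explicit H.

    Heavy-atom indices (hypothetical): 0 = CH3, 1 = stereocentre,
    2 = O, 3 = CH2, 4 = CH3.
    """
    symbols = ["C", "C", "O", "C", "C"]
    bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
    hydrogen_counts = [3, 1, 1, 2, 3]
    adj = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    for atom, n_h in enumerate(hydrogen_counts):
        for _ in range(n_h):
            h = len(symbols)
            symbols.append("H")
            adj[h] = [atom]
            adj[atom].append(h)
    return symbols, adj


def branch_key(symbols, adj, root, start):
    """Atomic numbers of one branch, sphere by sphere, highest first."""
    key = []
    frontier = [(start, root)]  # (atom, parent) pairs in the current sphere
    while frontier:
        sphere = sorted(
            (ATOMIC_NUMBER[symbols[a]] for a, _ in frontier), reverse=True
        )
        key.append(tuple(sphere))
        # Step one bond further from the root, never walking back to the parent.
        frontier = [(n, a) for a, p in frontier for n in adj[a] if n != p]
    return tuple(key)


def rank_ligands(symbols, adj, centre):
    """Neighbours of `centre`, highest priority first (simplified Rule 1a)."""
    return sorted(
        adj[centre],
        key=lambda n: branch_key(symbols, adj, centre, n),
        reverse=True,
    )


symbols, adj = build_2_butanol()
ranked = rank_ligands(symbols, adj, centre=1)
print([symbols[i] for i in ranked])  # ['O', 'C', 'C', 'H']
```

With this indexing the ligands come out in the order OH > ethyl > methyl > H. Turning that ranking into R or S additionally requires the spatial arrangement of the ligands, which this sketch does not model.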
Rule     ChEBI          ChEMBL          eMolecules      PubChem Compound¹    PubChem Substance
1a       281K  99.6%    1.8M   98.6%    2.4M   97.0%    53.5M  100.0%        93.1M  98.7%
3        29             3               441             36                   45
4a       122            126             273             4                    12,770
4b       563   0.2%     4,037  0.2%     3,188  0.1%                          125K   0.1%
5        285   0.1%     23.4K  1.2%     69K    2.8%     15                   1.1M   1.2%
Total    282K           1.9M            2.4M            53.5M                94.3M

Rows with fewer entries than columns (values in source order; column positions not recoverable):
Rule 1b: 4, 1, 164, 255
Rule 2: 14, 3,565, 6,789
Rule 4c: 19, 558

The majority of stereogenic units are constitutionally asymmetric and can be ranked using Rule 1a. However, in some datasets the number of stereogenic units requiring Rules 4b and 5 can be significant.

Results

Software                      I    II   III  IV   V    VI   VII  VIII IX   X    XIa  XIb  XII  XIII
Centres 2.0                   R    R    R    R    R    R    R    R    R    r    R    R    r    R
JMol 14.20.3                  R    R    R    R    R    R    R    R    R    r    R    R    r    R
ACD/ChemSketch 14.05beta      R    R    R    R    R    R    R    R    R    r    R    R    r    R
Balloon 1.6.5beta             R    R    R    R    R    R    R    R    R    r    R    R    r    R
KnowItAll ChemWindow 2018     R    R    R    R    R    R    R    R    R    r    R    R    r    R⁵
ChemDraw 16.0                 R    R    R    R    S    R    R    R    R    r    R    R    r    R
BIOVIA Draw 2017              R    R    R    -    R    R    R    R    R    -¹   R    R    -¹   R
MarvinSketch 17.17            R    -    -    -    S    R    -    R    -    r    R    R    r    -
Indigo 1.3.0Beta.r16          -²   R    -    -    R    -    R    R    R    r    S    R    -    -
RDKit 2017.03.03              S    R    S    R    S    R    R    S    R    R    R    R    -    -
DataWarrior 4.6.0             R    R    R    -    S    R    R    S    R    R    R³   R    -    -
CACTVS (NCI Resolver Aug 17)  R    R    S    -    S⁴   R    R    S    R    R    S    R    -    -
OPSIN 2.3.1                   R    R    R    R    R    -    -    -    -    -    S    R    -    -
LexiChem (OEChem) 20170613    R    R    -    -    R    -    -    -    -    -    S    R    -    -
ChemDoodle 7.0.2              R    R    -    -    S    -    -    s    -    r    S    R    -    -
CDK 2.0                       -    R    R⁵   -    S    -    -    -    -    -    S    R    -    -
JUMBO 6                       R    -    S    -    -    -    -    -    -    -    S    S    -    -

Structure categories: Constitutional (Rules 1a, 1b, 2); Geometrical + Topographical (Rules 3, 4a, 4b, 4c, 5); Special (Mancude, Aux Descriptors).

Notes to the comparison table:
1. Pseudoasymmetric r/s labels not displayed but must be calculated due to answers given for IX and XIII
2. Runtime error occurs
3. Impossible to test as different Kekulé forms are normalised
4. R in CACTVS since Feb 2015, NCI resolver is old version
5. Other descriptor is assigned differently

A set of fourteen structures was collected to identify differences between software implementations.
The structures were selected to cover all the sequence rules and their applications to special cases.

Fix CIP Collaboration
Since this work was submitted for presentation, the developers of Centres, JMol, ACD/ChemSketch, and Balloon have begun a collaboration. We are in the process of submitting for publication an extended in-depth validation set and proposing sequence rule refinements and additions where they are required.

¹ As part of PubChem Compound’s processing, non-constitutional stereochemistry is removed: for example, the nine stereoisomers of inositol are all represented by CID 892.

Atoms connected by double and triple bonds, as well as ring closures, result in duplicated nodes in the digraph. In the structure below atoms 5 and 6 appear twice and atom 1 (the root) appears three times. Due to this duplication, complex ring systems can generate exponentially large digraphs that are not computationally tractable. Further complexity in digraphs is caused by the use of fractional atomic numbers in mancude ring systems and the assignment of auxiliary descriptors for applying Rules 3-5.

[Figure: example structure with numbered atoms and its digraph.]
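The duplication described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: hydrogens are omitted, the molecule is given as a hypothetical adjacency map of `{atom: {neighbour: bond_order}}`, and the names `Node`, `build_digraph` and `occurrences` are invented for the example. The three-membered ring used below, with atoms labelled 1, 5 and 6, is chosen only to echo the counts mentioned in the text and does not reproduce the poster's structure.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    atom: int
    duplicate: bool = False
    children: list = field(default_factory=list)


def build_digraph(adj, root):
    """Expand a molecular graph into a hierarchical digraph rooted at `root`.

    adj maps each atom to {neighbour: bond_order}. Duplicate nodes are
    terminal: they are added for ring closures and for each extra bond
    of a double or triple bond.
    """

    def expand(atom, parent, path):
        node = Node(atom)
        for nbr, order in adj[atom].items():
            # Multiple bonds: one duplicate of the neighbour per extra bond.
            for _ in range(order - 1):
                node.children.append(Node(nbr, duplicate=True))
            if nbr == parent:
                continue  # never walk back along the edge we arrived by
            if nbr in path:
                # Ring closure: the neighbour is already an ancestor.
                node.children.append(Node(nbr, duplicate=True))
            else:
                node.children.append(expand(nbr, atom, path | {nbr}))
        return node

    return expand(root, None, frozenset({root}))


def occurrences(node, counts=None):
    """Count how many times each atom appears in the digraph."""
    counts = Counter() if counts is None else counts
    counts[node.atom] += 1
    for child in node.children:
        occurrences(child, counts)
    return counts


# Hypothetical three-membered ring, heavy atoms only, all single bonds.
ring = {1: {5: 1, 6: 1}, 5: {1: 1, 6: 1}, 6: {1: 1, 5: 1}}
print(occurrences(build_digraph(ring, root=1)))
```

For this ring the root appears three times and each of the other two ring atoms appears twice, because the ring is traversed in both directions and each traversal ends in a duplicate of the root. Every additional fused ring multiplies the number of such paths, which is the source of the exponential growth noted above.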