How registration works
Structures are accepted in some machine-readable format and boiled down to some position-independent canonical form.
Drop exact coordinates and retain only relative coordinates, disregarding bond length.
Canonicalization is based on the depiction of bonds (wedges or hashes) rather than on 3D positions around atoms.
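As an illustration, one way to make 2D coordinates independent of where a drawing sits and how long its bonds are is to translate them to the atom centroid and rescale by the mean bond length. This is a minimal sketch under that assumption. `normalize_depiction` is a hypothetical name, and rotation is deliberately left untouched.

```python
import math

Point = tuple[float, float]


def normalize_depiction(coords: list[Point], bonds: list[tuple[int, int]]) -> list[Point]:
    """Return coordinates that no longer depend on drawing position or bond length.

    Assumption: "relative coordinates" is read as translating to the atom
    centroid and rescaling so that the mean bond length becomes 1.
    """
    if not coords or not bonds:
        raise ValueError("need at least one atom and one bond")
    n = len(coords)
    cx = sum(x for x, _ in coords) / n
    cy = sum(y for _, y in coords) / n
    # Mean 2D bond length; dividing by it removes the drawing's scale.
    mean_len = sum(math.dist(coords[i], coords[j]) for i, j in bonds) / len(bonds)
    if mean_len == 0:
        raise ValueError("all bonds have zero length")
    return [((x - cx) / mean_len, (y - cy) / mean_len) for x, y in coords]
```

Two drawings that differ only by a shift and a uniform scale map to the same coordinates.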
Algorithm for hexagons
• Identify the perspective conformation (boat, chair, regular hexagon, and so on)
• Determine the perspective stereo
• Assign a wedge or hash to each bond accordingly
• (Tricky) Reconstruct the sugar ring so as to minimize disruption of the rest of the molecule
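The first step, identifying the perspective conformation, could be approximated from the 2D outline of the ring alone. The sketch below is a heuristic of our own, not the method used here: a convex, equilateral ring whose atoms lie on one circle is treated as a regular hexagon, any other convex ring as a flattened (Haworth-like) drawing, a non-convex point-symmetric outline as a chair, and a non-convex outline without that symmetry as a boat. `classify_hexagon` and the tolerance `tol` are hypothetical.

```python
import math

Point = tuple[float, float]


def _turn(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); its sign gives the turning direction at a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def classify_hexagon(ring: list[Point], tol: float = 0.05) -> str:
    """Heuristic guess at the drawn perspective of a six-membered ring.

    ring: 2D positions of the six ring atoms, in ring order.
    """
    if len(ring) != 6:
        raise ValueError("expected six ring atoms")
    turns = [_turn(ring[i], ring[(i + 1) % 6], ring[(i + 2) % 6]) for i in range(6)]
    convex = all(t > 0 for t in turns) or all(t < 0 for t in turns)
    sides = [math.dist(ring[i], ring[(i + 1) % 6]) for i in range(6)]
    scale = sum(sides) / 6
    if convex:
        cx = sum(x for x, _ in ring) / 6
        cy = sum(y for _, y in ring) / 6
        radii = [math.dist((cx, cy), p) for p in ring]
        # Equal sides plus all atoms on one circle means a regular hexagon.
        regular = (max(sides) - min(sides) <= tol * scale
                   and max(radii) - min(radii) <= tol * scale)
        return "regular hexagon" if regular else "flattened (Haworth-like)"
    # Non-convex outline: a chair drawing is point-symmetric (each atom and the
    # atom opposite it share a midpoint); a boat drawing is not.
    mids = [((ring[i][0] + ring[i + 3][0]) / 2, (ring[i][1] + ring[i + 3][1]) / 2)
            for i in range(3)]
    centro = all(math.dist(m, mids[0]) <= tol * scale for m in mids)
    return "chair" if centro else "boat"
```

A slightly distorted regular hexagon falls into the Haworth-like bucket with this rule, so `tol` would need tuning against real depictions.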
Assigning chair stereochemistry
Take the x-axis as the line through either the top two ring atoms or the bottom two ring atoms.
Substituents with positive Δy are up; substituents with negative Δy are down.
Then remap the chair to a regular hexagon (tricky).
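The Δy rule can be sketched by rotating each substituent bond into a frame whose x-axis is the line through the two reference atoms, since that line need not be exactly horizontal. Assumptions: y grows upward in the drawing (flip the sign for screen coordinates), the two reference atoms are already known, and `delta_y` and `chair_up_down` are hypothetical names. Remapping to a regular hexagon is not shown.

```python
import math

Point = tuple[float, float]


def delta_y(ring_atom: Point, substituent: Point, axis_angle: float) -> float:
    """Δy of a substituent bond in a frame whose x-axis has angle axis_angle."""
    dx = substituent[0] - ring_atom[0]
    dy = substituent[1] - ring_atom[1]
    # Rotate the bond vector by -axis_angle so the reference line becomes horizontal.
    return -math.sin(axis_angle) * dx + math.cos(axis_angle) * dy


def chair_up_down(axis_atoms: list[Point],
                  substituents: dict[str, tuple[Point, Point]]) -> dict[str, str]:
    """Label each substituent 'up' or 'down' relative to the chair's reference line.

    axis_atoms: positions of the top two (or bottom two) ring atoms.
    substituents: name -> (ring atom position, substituent atom position).
    """
    p, q = sorted(axis_atoms)  # left to right, so that +Δy means "up" on the page
    theta = math.atan2(q[1] - p[1], q[0] - p[0])
    labels = {}
    for name, (ring_atom, sub) in substituents.items():
        d = delta_y(ring_atom, sub, theta)
        labels[name] = "up" if d > 0 else "down" if d < 0 else "undetermined"
    return labels
```

Ordering the axis atoms left to right matters: reversing the axis direction would flip the sign of every Δy.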
Assigning Haworth stereochemistry
This works for both hexagons and pentagons.
Remove any hashes or wedges within the ring.
Take the x-axis as a line through one of the ring C–O bonds.
Substituents with positive Δy are up; substituents with negative Δy are down.
The Haworth LLLLLL/RRRRRR hexagon is unappealing, but it can be tidied to a regular hexagon grid without too much disruption.
The same goes for the Haworth pentagon.
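The Haworth rule can be sketched the same way, here also turning up/down into wedge/hash as the third step of the hexagon algorithm calls for. Assumptions: atoms are keyed by hypothetical names mapping to (element, position), y grows upward, the first ring C–O bond found is used as the axis, and the flattened ring is viewed from its top face, so "up" becomes a wedge and "down" a hash. `haworth_to_wedges` is a hypothetical name, and tidying the ring to a regular hexagon is not shown.

```python
import math

Point = tuple[float, float]
Atom = tuple[str, Point]     # (element, 2D position)
Bond = tuple[str, str, str]  # (atom a, atom b, stereo: "none" | "wedge" | "hash")


def _axis_angle(atoms: dict[str, Atom], bonds: list[Bond], ring: set[str]) -> float:
    """Angle of the first ring C-O bond found, oriented left to right."""
    for a, b, _ in bonds:
        if a in ring and b in ring and {atoms[a][0], atoms[b][0]} == {"C", "O"}:
            p, q = sorted([atoms[a][1], atoms[b][1]])
            return math.atan2(q[1] - p[1], q[0] - p[0])
    raise ValueError("ring has no C-O bond to use as the x-axis")


def haworth_to_wedges(atoms: dict[str, Atom], bonds: list[Bond], ring: set[str]) -> list[Bond]:
    theta = _axis_angle(atoms, bonds, ring)
    out: list[Bond] = []
    for a, b, stereo in bonds:
        if a in ring and b in ring:
            out.append((a, b, "none"))  # drop hashes/wedges inside the ring
        elif a in ring or b in ring:
            # Put the ring atom first so the wedge starts at the stereocentre.
            centre, sub = (a, b) if a in ring else (b, a)
            (x0, y0), (x1, y1) = atoms[centre][1], atoms[sub][1]
            dy = -math.sin(theta) * (x1 - x0) + math.cos(theta) * (y1 - y0)
            out.append((centre, sub, "wedge" if dy > 0 else "hash" if dy < 0 else "none"))
        else:
            out.append((a, b, stereo))  # bonds away from the ring are untouched
    return out
```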
Future work: integrate with CVSP
Structure validation
• Warn on query atoms, pseudo atoms, polymers, etc.
• Nonsensical stereo
Allow users to put together their own standardization workflow from the modules provided:
• Apply default CVSP or user-defined SMIRKS rules
• Layout
• Neutralize
• Get the canonical tautomer using ChemAxon's algorithms
• Get the biggest organic fragment
http://cv.beta.rsc-us.org/
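A user-assembled workflow could be modelled as an ordered list of interchangeable modules that share one signature. Everything in this sketch is hypothetical: the toy molecule (a list of fragments, each a list of element symbols), the marker set standing in for query and pseudo atoms, and the module names. "Organic" is simplified to "contains carbon". It does not reproduce CVSP or ChemAxon functionality.

```python
from typing import Callable

# Hypothetical toy representation: fragments as lists of element symbols
# (no bonds, charges or stereo).
Fragment = list[str]
Molecule = list[Fragment]
Step = Callable[[Molecule, list[str]], Molecule]

# Hypothetical marker symbols standing in for query and pseudo atoms.
QUERY_OR_PSEUDO = {"*", "A", "Q", "R"}


def warn_query_atoms(mol: Molecule, warnings: list[str]) -> Molecule:
    """Validation module: record a warning and leave the structure unchanged."""
    for i, frag in enumerate(mol):
        found = sorted(set(frag) & QUERY_OR_PSEUDO)
        if found:
            warnings.append(f"fragment {i}: query/pseudo atoms {found}")
    return mol


def biggest_organic_fragment(mol: Molecule, warnings: list[str]) -> Molecule:
    """Standardization module: keep the largest carbon-containing fragment."""
    organic = [frag for frag in mol if "C" in frag]
    if not organic:
        warnings.append("no organic fragment found")
        return mol
    return [max(organic, key=len)]


def run_workflow(mol: Molecule, steps: list[Step]) -> tuple[Molecule, list[str]]:
    """Apply the user-chosen modules in order, collecting warnings along the way."""
    warnings: list[str] = []
    for step in steps:
        mol = step(mol, warnings)
    return mol, warnings
```

Because every module has the same signature, a workflow is just a list, so modules can be reordered, dropped or replaced by user-written ones.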
More future work
• Improve chair tidying
• Do not disrupt/flip/invert or move around the aglycone
• Fused rings
• Run over all of ChemSpider