Carbohydrate structure representation and public chemistry databases
Colin Batchelor, Ken Karapetyan, David Sharpe, Valery Tkachenko and Antony Williams
batchelorc@rsc.org
ACS New Orleans April 2013
Overview
Public chemistry databases and registration
Why sugar rings are difficult
Consequences
Algorithms
Future directions
Some public chemistry databases
How registration works
Structures are accepted in a machine-readable format and reduced to a
position-independent canonical form.

Drop exact coordinates and retain only relative
coordinates, disregarding bond length.

Canonicalization is based on the depiction of bonds (wedges or
hashes) rather than on 3D positions around atoms.
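
One way to picture "relative coordinates, disregarding bond length" is the circular order of neighbours around an atom. The sketch below is only an illustration under that assumption, not the registration code of any particular database; the function name `ccw_neighbour_order` and the dictionary layout are hypothetical.

```python
import math

def ccw_neighbour_order(centre, neighbours):
    """Return neighbour labels in counter-clockwise order around `centre`.

    `neighbours` maps a label to its 2D (x, y) position. The result is
    rotated to start at the smallest label, so it depends only on the
    relative arrangement of the bonds, not on bond lengths or on where
    the drawing sits on the page.
    """
    cx, cy = centre
    ordered = sorted(
        neighbours,
        key=lambda label: math.atan2(neighbours[label][1] - cy,
                                     neighbours[label][0] - cx),
    )
    start = ordered.index(min(ordered))
    return ordered[start:] + ordered[:start]

# The same atom drawn twice: once compactly, once shifted with uneven bond lengths.
compact = ccw_neighbour_order((0, 0), {"a": (1, 0), "b": (0, 1), "c": (-1, -1)})
stretched = ccw_neighbour_order((5, 5), {"a": (8, 5), "b": (5, 5.5), "c": (3, 3)})
# A mirror image of the first drawing.
mirrored = ccw_neighbour_order((0, 0), {"a": (1, 0), "b": (0, -1), "c": (-1, 1)})
```

Here `compact` and `stretched` agree while `mirrored` differs, which is the behaviour a position-independent form needs: moving or stretching the drawing changes nothing, but reflecting it does.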
Why sugars are difficult
Why sugar rings are difficult
Consequences
Algorithm for hexagons
• Identify the perspective conformation (boat,
  chair, regular hexagon, and so on)
• Determine perspective stereo
• Assign wedge or hash to the bonds
  accordingly
• (tricky) Reconstruct the sugar ring so as to
  minimize disruption of the rest of the
  molecule
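
As a rough illustration of the first step, the sketch below only separates a regular hexagon from everything else. Telling chairs, boats and Haworth drawings apart would need further tests that the slides do not spell out. The function names and the 5% tolerance are assumptions made for this example.

```python
import math

def is_regular_hexagon(ring, rel_tol=0.05):
    """Check whether six 2D points, given in ring order, form a regular hexagon.

    In a regular hexagon every vertex is the same distance from the centre
    and every edge is as long as that distance. `rel_tol` is an assumed
    tolerance, not a value taken from the talk.
    """
    if len(ring) != 6:
        return False
    cx = sum(x for x, _ in ring) / 6
    cy = sum(y for _, y in ring) / 6
    radii = [math.hypot(x - cx, y - cy) for x, y in ring]
    edges = [
        math.hypot(ring[i][0] - ring[(i + 1) % 6][0],
                   ring[i][1] - ring[(i + 1) % 6][1])
        for i in range(6)
    ]
    radius = sum(radii) / 6
    if radius == 0:
        return False
    return all(abs(length - radius) <= rel_tol * radius
               for length in radii + edges)

def classify_ring(ring):
    """Very coarse stand-in for 'identify the perspective conformation'."""
    return "regular hexagon" if is_regular_hexagon(ring) else "perspective drawing"

regular = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
           for k in range(6)]
flattened = [(0, 0), (1, 0.5), (2.5, 0.5), (3.5, 0), (2.5, -0.5), (1, -0.5)]
```

A ring classified as a perspective drawing would then go on to the stereo-assignment steps, while a regular hexagon already carries its stereo in wedges and hashes.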
Hexagons in the plane
Assigning chair stereochemistry


Take the x-axis as either the line through the top
two ring atoms or bottom two ring atoms.
Substituents with Δy positive are up, Δy negative
are down.
Then remap chair to a regular hexagon (tricky).
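
The Δy test can be sketched as a 2D cross product against the chosen axis. This is a minimal sketch under stated assumptions: the axis is oriented left to right, a vertical axis is rejected, and the function names are hypothetical rather than taken from the authors' code.

```python
import math

def signed_offset(axis_start, axis_end, ring_atom, substituent):
    """Return Δy of a substituent relative to its ring atom.

    The x-axis is the line through `axis_start` and `axis_end` (for a chair,
    the top two or bottom two ring atoms). Δy is the component of the
    ring-atom-to-substituent vector perpendicular to that axis.
    """
    dx = axis_end[0] - axis_start[0]
    dy = axis_end[1] - axis_start[1]
    if dx == 0:
        raise ValueError("axis is vertical; 'up' is undefined in this sketch")
    if dx < 0:
        # Orient the axis left to right so that 'up' keeps its usual meaning.
        dx, dy = -dx, -dy
    vx = substituent[0] - ring_atom[0]
    vy = substituent[1] - ring_atom[1]
    # Cross product of the axis direction with v, divided by the axis length.
    return (dx * vy - dy * vx) / math.hypot(dx, dy)

def up_or_down(axis_start, axis_end, ring_atom, substituent):
    """Label a substituent 'up' (Δy > 0), 'down' (Δy < 0) or 'in-plane'."""
    offset = signed_offset(axis_start, axis_end, ring_atom, substituent)
    if offset > 0:
        return "up"
    if offset < 0:
        return "down"
    return "in-plane"
```

For example, with a horizontal axis from (0, 0) to (2, 0), a substituent drawn at (1, 1) on a ring atom at (1, 0) has Δy = 1 and is labelled up. The order in which the two axis atoms are given does not matter, because the axis is reoriented before the test.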
Assigning Haworth stereochemistry
This works for both hexagons and pentagons.
Remove any hashes or wedges within the ring.
Take the x-axis as a line through one of the ring C–O bonds.
Substituents with Δy positive are up, Δy negative are down.
The Haworth LLLLLL/RRRRRR hexagon is unappealing, but can be tidied
to a regular hexagon grid without too much disruption.
The same goes for the Haworth pentagon.
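
The bookkeeping side of these steps might look like the sketch below: clear any stereo marks on ring bonds, then mark each exocyclic bond from its up/down label. The `Bond` class, the function names, and the mapping of up to wedge and down to hash (the ring viewed from above) are assumptions made for illustration; the up/down labels themselves would come from the Δy test.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Bond:
    begin: str             # atom at the narrow end of a wedge (assumed convention)
    end: str
    stereo: str = "none"   # "none", "wedge" or "hash"

def ring_bond_keys(ring):
    """Unordered atom pairs for consecutive atoms of an ordered ring."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}

def apply_haworth_stereo(bonds, ring, orientation):
    """Return new bonds with ring stereo cleared and substituent stereo set.

    `orientation` maps a substituent atom label to "up" or "down".
    """
    in_ring = ring_bond_keys(ring)
    ring_atoms = set(ring)
    result = []
    for bond in bonds:
        if frozenset((bond.begin, bond.end)) in in_ring:
            # Hashes or wedges inside the ring are removed.
            result.append(replace(bond, stereo="none"))
        elif bond.begin in ring_atoms and bond.end in orientation:
            stereo = "wedge" if orientation[bond.end] == "up" else "hash"
            result.append(replace(bond, stereo=stereo))
        else:
            result.append(bond)
    return result

ring = ["C1", "C2", "C3", "C4", "C5", "O5"]
bonds = [Bond("C1", "C2", "wedge"), Bond("C1", "O1"),
         Bond("C2", "O2"), Bond("C5", "C6")]
tidied = apply_haworth_stereo(bonds, ring, {"O1": "up", "O2": "down"})
```

In this toy input the bold C1-C2 ring bond loses its wedge, O1 gains a wedge, O2 gains a hash, and the unlabelled C5-C6 bond is left alone.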
Future work: integrate with CVSP
Structure validation
• Warn on query atoms, pseudo atoms, polymers, etc.
• Nonsensical stereo
CVSP allows users to put together their own standardization workflow using
the modules provided:
• Apply default CVSP or user-defined SMIRKS rules
• Layout
• Neutralize
• Get canonical tautomer using ChemAxon’s algorithms
• Get biggest organic fragment

http://cv.beta.rsc-us.org/
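
The idea of a user-assembled workflow can be sketched as a list of small steps applied in order. This is not the CVSP API: every name below is hypothetical, the steps work on plain SMILES strings, and the fragment step uses a crude letter count as a stand-in for fragment size.

```python
from typing import Callable, List, Tuple

# A record is a SMILES string plus the warnings collected so far.
Record = Tuple[str, List[str]]
Step = Callable[[Record], Record]

def warn_on_wildcards(record: Record) -> Record:
    """Validation step: warn if the SMILES contains the wildcard atom '*'."""
    smiles, warnings = record
    if "*" in smiles:
        warnings = warnings + ["wildcard atom present"]
    return smiles, warnings

def keep_largest_fragment(record: Record) -> Record:
    """Standardization step: keep the dot-separated fragment with most letters.

    Counting letters is only a rough proxy for fragment size; a real module
    would parse the structure and count atoms.
    """
    smiles, warnings = record
    fragments = smiles.split(".")
    largest = max(fragments, key=lambda frag: sum(ch.isalpha() for ch in frag))
    return largest, warnings

def run_workflow(smiles: str, steps: List[Step]) -> Record:
    """Apply each step in turn, threading the record through."""
    record: Record = (smiles, [])
    for step in steps:
        record = step(record)
    return record

salt_result = run_workflow("[Na+].CC(=O)[O-]",
                           [warn_on_wildcards, keep_largest_fragment])
query_result = run_workflow("*CO", [warn_on_wildcards])
```

Because each step has the same signature, a user can reorder, drop or add steps without touching the others, which is the property the modular design relies on.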
More future work
Improve chair tidying
Do not disrupt/flip/invert or move around the
aglycone
Fused rings
Run over all of ChemSpider
Questions?

E-mail: batchelorc@rsc.org
