Molfiles
- Molfiles are the primary exchange format between structure drawing packages
- Can be different between different drawing packages
- Most commonly carry X,Y coordinates for layout
- Can support polymers, organometallics, etc.
- Can carry 3D coordinates
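To make the "carries coordinates" point concrete, here is a minimal sketch of reading the fixed-column V2000 flavour of a Molfile. The embedded ethanol record, the `hypothetical-editor` program line, and the names `parse_molfile_v2000` and `has_3d_coordinates` are illustrative assumptions, not part of any drawing package. Real files vary between packages, so a production reader needs far more tolerance than this.

```python
from typing import NamedTuple

# Illustrative V2000 record (assumed example): 3 header lines, counts line,
# atom block, bond block. Columns are fixed-width, so spacing matters.
MOLFILE = """\
ethanol
  hypothetical-editor

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
M  END
"""


class Atom(NamedTuple):
    symbol: str
    x: float
    y: float
    z: float


class Bond(NamedTuple):
    begin: int  # 1-based atom index, as stored in the file
    end: int
    order: int


def parse_molfile_v2000(text):
    """Return (atoms, bonds) from a V2000 Molfile string."""
    lines = text.splitlines()
    counts = lines[3]               # fourth line holds the counts
    n_atoms = int(counts[0:3])      # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])      # columns 4-6: number of bonds
    atoms = [
        Atom(line[31:34].strip(),   # element symbol
             float(line[0:10]),     # x
             float(line[10:20]),    # y
             float(line[20:30]))    # z
        for line in lines[4:4 + n_atoms]
    ]
    first_bond = 4 + n_atoms
    bonds = [
        Bond(int(line[0:3]), int(line[3:6]), int(line[6:9]))
        for line in lines[first_bond:first_bond + n_bonds]
    ]
    return atoms, bonds


def has_3d_coordinates(atoms):
    """A flat (2D layout) record has z == 0 for every atom."""
    return any(atom.z != 0.0 for atom in atoms)


atoms, bonds = parse_molfile_v2000(MOLFILE)
for atom in atoms:
    print(atom.symbol, atom.x, atom.y, atom.z)
print("3D coordinates present:", has_3d_coordinates(atoms))
```

The all-zero Z column is what marks this record as a 2D layout rather than a 3D structure.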
SMILES (http://en.wikipedia.org/wiki/SMILES)
- SMILES is a common format
- Can support polymers, organometallics, etc.
- Does NOT carry X,Y or Z coordinates for layout so requires layout algorithms – can be problematic!
- Generally different between drawing packages
InChI
- SINGLE code base managed by IUPAC – integrated into drawing packages. No variability as with SMILES
- InChI Strings can be reversed to structures – same problem as with SMILES – no layout
- Well adopted by the community (databases, publishers, blogs, Wikipedia) – good for searching the internet
InChI
- No support for polymers, organometallics
- Many option settings can lead to variability and make integration across databases difficult – FixedH option especially problematic
- “Slight” chance of collisions of InChIKeys
- VERY USEFUL FOR INTEGRATING THE WEB
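As a rough illustration of why option settings matter when integrating databases, the sketch below splits an InChI string into its layers and an InChIKey into its blocks using only string handling. The function names are hypothetical and the layer handling is simplified: it assumes the first layer after the version is the formula, and it keeps only the first occurrence of each layer prefix. The ethanol identifiers are used as sample input.

```python
import re

ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

# 14-letter connectivity hash, 8-letter hash of the remaining layers,
# a flag letter ("S" for standard InChI), a version letter, and a
# protonation letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, detail, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,
        "detail": detail,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


def inchi_layers(inchi):
    """Return a dict of layers keyed by prefix letter (simplified)."""
    if not inchi.startswith("InChI="):
        raise ValueError(f"not an InChI string: {inchi!r}")
    version, *rest = inchi[len("InChI="):].split("/")
    layers = {"version": version}
    if rest:
        layers["formula"] = rest[0]   # assumption: first layer is the formula
    for part in rest[1:]:
        if part:
            layers.setdefault(part[0], part[1:])  # keep first occurrence only
    return layers


def comparable_across_databases(inchi):
    """Heuristic: standard InChI ('1S') with no fixed-H ('f') layer."""
    layers = inchi_layers(inchi)
    return layers["version"].endswith("S") and "f" not in layers


print(split_inchikey(ETHANOL_KEY))
print(inchi_layers(ETHANOL_INCHI))
print(comparable_across_databases(ETHANOL_INCHI))
```

Two databases whose keys agree only in the first block may still hold records that differ in stereochemistry or isotopes, so the blocks are worth comparing separately.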
Where is chemistry online?
- Encyclopedic articles (Wikipedia)
- Chemical vendor databases
- Metabolic pathway databases
- Property databases
- Patents with chemical structures
- Drug Discovery data
- Scientific publications
- Compound aggregators
- Blogs/Wikis and Open Notebook Science
How do we build it?
- We deal in Molfiles or SDF files – with coordinates
- Valence checking, charge imbalance
- We have our own “business logic” to standardize InChI to “aggregate tautomers” to one record
- We link out to external sites using their IDs
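The valence and charge checks mentioned above might look something like the following sketch. This is not the actual deposition logic, which the slides do not describe. The valence table, the charge adjustment rule, and the name `check_structure` are deliberately crude assumptions, and only integer bond orders 1 to 3 are handled (no aromatic bond types, no implicit hydrogen counting).

```python
# Assumed, simplified maximum valences for neutral atoms.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2,
               "F": 1, "Cl": 1, "Br": 1, "I": 1}


def allowed_valence(symbol, charge):
    """Return the assumed valence limit, or None for elements not in the table."""
    base = MAX_VALENCE.get(symbol)
    if base is None:
        return None
    if symbol in ("N", "O"):
        return base + charge        # e.g. N+ may carry 4 bonds, O- only 1
    return base - abs(charge)       # crude rule for everything else


def check_structure(atoms, bonds):
    """atoms: list of (symbol, formal_charge); bonds: list of
    (begin, end, order) with 1-based atom indices, as in a Molfile.
    Returns a list of human-readable problems (empty if none found)."""
    used = [0] * len(atoms)
    for begin, end, order in bonds:
        used[begin - 1] += order
        used[end - 1] += order

    problems = []
    for index, ((symbol, charge), total) in enumerate(zip(atoms, used), start=1):
        limit = allowed_valence(symbol, charge)
        if limit is not None and total > limit:
            problems.append(
                f"atom {index} ({symbol}): valence {total} exceeds {limit}")

    net_charge = sum(charge for _, charge in atoms)
    if net_charge != 0:
        problems.append(
            f"net charge {net_charge:+d}: possible missing counter-ion")
    return problems


# Ethanol heavy atoms pass; a carbon with five single bonds is flagged.
ethanol = ([("C", 0), ("C", 0), ("O", 0)], [(1, 2, 1), (2, 3, 1)])
bad_carbon = ([("C", 0)] * 6, [(1, n, 1) for n in range(2, 7)])
print(check_structure(*ethanol))
print(check_structure(*bad_carbon))
```

Flagging rather than rejecting keeps the decision with a curator, since a net charge can be legitimate for a record that really is an ion.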
Searches: The INTERNET
- All ChemSpider and Internet searches are “simply algorithms” but synonym searching is based on an assertion
Validating structures
- Check for “full stereo” and use stereo descriptors especially for checking!
- Check for quality of associated data sources
- Check against reference literature when available – but it can be wrong
- Question EVERYTHING!
Contributing to The Quality of Data
What is the Structure of Vitamin K?
A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria & synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).