ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics Presentation

This is a short presentation given to chemistry students at Drexel University as a remote presentation. This was for the class of Jean-Claude Bradley.

Transcript of "ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics Presentation"

  1. ChemSpider and Traveling the Internet via Chemical Structures. Antony Williams, Drexel University, November 2012
  2. Compounds and Identifiers
  3. Chemistry on the Internet: Where do you source chemistry information? What can you trust online? How can you recognize potential issues? Cross-referencing and curating data.
  4. Molfiles (http://en.wikipedia.org/wiki/Chemical_table_file)
  5. 5. Molfiles 10 9 0 0 1 0 0 0 0 0 1 V2000 31.2937 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 26.6526 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 31.2937 -7.7066 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 30.1161 -9.6877 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 25.5096 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 28.9731 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 27.8163 -9.7016 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 26.6664 -7.7066 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 32.4367 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 30.1161 -11.0177 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 3 1 2 0 0 0 0 4 1 1 0 0 0 0 9 1 1 0 0 0 0 7 2 1 0 0 0 0 5 2 2 0 0 0 0 8 2 1 0 0 0 0 6 4 1 0 0 0 0 4 10 1 6 0 0 0 7 6 1 0 0 0 0 M END
  6. Molfiles: Molfiles are the primary exchange format between structure drawing packages. Can differ between drawing packages. Most commonly carry X,Y coordinates for layout. Can support polymers, organometallics, etc. Can carry 3D coordinates.
  7. SMILES (http://en.wikipedia.org/wiki/SMILES): SMILES is a common format. Can support polymers, organometallics, etc. Does NOT carry X, Y or Z coordinates for layout, so it requires layout algorithms, which can be problematic! Generally different between drawing packages.
  8. Stereo
  9. Tautomers
  10. SMILES
      ACD/Labs: CC(C)CCC[C@@H](C)CCC[C@@H](C)CCCC(C)=CCC2=C(C)C(=O)c1ccccc1C2=O
      OpenEye: CC1=C(C(=O)c2ccccc2C1=O)C/C=C(C)/CCC[C@H](C)CCC[C@H](C)CCCC(C)C
      ChEMBL: CC(C)CCC[C@@H](C)CCC[C@@H](C)CCCC(=CCC1=C(C)C(=O)c2ccccc2C1=O)C
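
The three strings on slide 10 look nothing alike, which is the slide's point. As a rough illustration, the Python sketch below counts heavy atoms in each string with a small regular expression and checks which strings carry double-bond geometry markers (`/` or `\`). This is only a sanity check, not canonicalization: equal atom counts are necessary but not sufficient for two SMILES to describe the same compound. The tokenizer is a simplification that covers the organic subset and simple bracket atoms, and the helper names are hypothetical.

```python
import re
from collections import Counter

# SMILES strings from slide 10, with the slide's line-break spaces removed.
SMILES = {
    "ACD/Labs": "CC(C)CCC[C@@H](C)CCC[C@@H](C)CCCC(C)=CCC2=C(C)C(=O)c1ccccc1C2=O",
    "OpenEye": "CC1=C(C(=O)c2ccccc2C1=O)C/C=C(C)/CCC[C@H](C)CCC[C@H](C)CCCC(C)C",
    "ChEMBL": "CC(C)CCC[C@@H](C)CCC[C@@H](C)CCCC(=CCC1=C(C)C(=O)c2ccccc2C1=O)C",
}

# Group 1: element inside a bracket atom such as [C@@H]; group 2: a bare organic-subset atom.
ATOM = re.compile(
    r"\[\d*([A-Z][a-z]?|se|as|[bcnops])[^\]]*\]"
    r"|(Cl|Br|[BCNOPSFI]|[bcnops])"
)


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element; aromatic lowercase symbols are capitalized."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        symbol = (match.group(1) or match.group(2)).capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


def has_double_bond_geometry(smiles):
    """True when the string uses '/' or '\\' directional bonds."""
    return "/" in smiles or "\\" in smiles


for source, smiles in SMILES.items():
    print(source, dict(heavy_atom_counts(smiles)), has_double_bond_geometry(smiles))
```

All three strings give the same heavy-atom counts, yet only the OpenEye string specifies the geometry of the side-chain double bond, so the three sources are not describing the structure at the same level of detail.
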
  11. The InChI Identifier
  12. InChI: SINGLE code base managed by IUPAC, integrated into drawing packages. No variability as with SMILES. InChI Strings can be reversed to structures, with the same problem as SMILES: no layout. Well adopted by the community (databases, publishers, blogs, Wikipedia); good for searching the internet.
  13. The InChI Standard
  14. Tautomers – “Mobile H Perception”
  15. Double Bond Orientation
  16. Stereo
  17. Checking for Stereochemistry
  18. Checking for Stereochemistry: Use your drawing package!
  19. Checking for Stereochemistry
  20. Checking for Stereochemistry
  21. Checking for Stereochemistry
  22. InChIKeys: Search the Web by Structure
  23. InChIs
  24. Databases and Standardization
  25. Databases and Standardization
  26. InChI: No support for polymers, organometallics. Many option settings can lead to variability and make integration across databases difficult; the FixedH option is especially problematic. “Slight” chance of collisions of InChIKeys. VERY USEFUL FOR INTEGRATING THE WEB.
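
The “slight” chance of InChIKey collisions on slide 26 can be given a rough scale with a birthday-bound estimate. The Python sketch below assumes that the first block of an InChIKey encodes a 65-bit hash of the connectivity and that hash values are uniformly distributed; both are assumptions of this sketch rather than statements from the slides, and `HASH_BITS` is an adjustable parameter for that reason.

```python
import math

# Assumption: the connectivity block of an InChIKey carries a 65-bit hash.
HASH_BITS = 65


def collision_probability(n_structures, hash_bits=HASH_BITS):
    """Birthday approximation: p = 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** hash_bits
    exponent = n_structures * (n_structures - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) and stays accurate when x is tiny.
    return -math.expm1(-exponent)


for n in (10**6, 10**8, 10**9):
    print(f"{n:>13,d} structures: p = {collision_probability(n):.3e}")
```

Under these assumptions the probability of at least one collision grows roughly with the square of the database size, which is why a collision is negligible for a small collection but worth keeping in mind when integrating very large ones.
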
  27. Vancomycin
  28. Vancomycin: Search Molecular SKELETON / Search Full Molecule
  29. Full Skeleton Search: 104 Hits
  30. Full Molecule Search: 4 Hits
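
One plausible reading of slides 28 to 30 is that a skeleton search matches only the first block of the InChIKey (the connectivity), while a full molecule search matches the entire key. The slides do not spell this out, so treat it as an assumption. The Python sketch below illustrates the idea with made-up placeholder keys that have the right shape but do not correspond to real compounds; `split_inchikey` and `search` are hypothetical helpers.

```python
import re

# Shape of a standard InChIKey: 14 letters, 8 letters + standard flag + version letter, 1 letter.
INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Return (connectivity_block, remainder) or raise ValueError for a malformed key."""
    match = INCHIKEY.match(key)
    if match is None:
        raise ValueError(f"not an InChIKey: {key!r}")
    connectivity = match.group(1)
    return connectivity, key[len(connectivity) + 1 :]


def search(query, keys, skeleton=False):
    """Full search compares whole keys; skeleton search compares only the first block."""
    if not skeleton:
        return [key for key in keys if key == query]
    wanted, _ = split_inchikey(query)
    return [key for key in keys if split_inchikey(key)[0] == wanted]


# Placeholder keys for illustration only (not real compounds).
DATABASE = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # same skeleton, one stereo form
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",  # same skeleton, another stereo form
    "AAAAAAAAAAAAAA-DDDDDDDDSA-N",  # same skeleton, a third variant
    "ZZZZZZZZZZZZZZ-BBBBBBBBSA-N",  # different skeleton
]

query = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
print(len(search(query, DATABASE, skeleton=True)), "skeleton hits")
print(len(search(query, DATABASE)), "full-molecule hits")
```

The skeleton search returns more hits than the full search for the same reason the vancomycin example does: records that share connectivity but differ in stereochemistry or other layers collapse onto one first block.
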
  31. Where is chemistry online? Encyclopedic articles (Wikipedia); chemical vendor databases; metabolic pathway databases; property databases; patents with chemical structures; drug discovery data; scientific publications; compound aggregators; blogs/wikis and Open Notebook Science.
  32. www.chemspider.com
  33. How do we build it? We deal in Molfiles or SDF files, with coordinates. Valence checking, charge imbalance. We have our own “business logic” to standardize InChI to “aggregate tautomers” to one record. We link out to external sites using their IDs.
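
Slide 33 mentions valence checking and charge imbalance as part of building the database. The Python sketch below shows what such checks could look like in their simplest form, using the atoms and bonds of the slide 5 molfile as test data. It is not ChemSpider's actual logic: `check_structure` is a hypothetical helper, and the valence table is a deliberate simplification that covers only neutral C, N and O.

```python
# Simplified maximum valences for neutral atoms (an assumption of this sketch).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}

# Atoms and bonds (atom, atom, order) taken from the molfile on slide 5.
SYMBOLS = ["C", "C", "O", "C", "O", "C", "C", "N", "O", "N"]
BONDS = [(3, 1, 2), (4, 1, 1), (9, 1, 1), (7, 2, 1), (5, 2, 2),
         (8, 2, 1), (6, 4, 1), (4, 10, 1), (7, 6, 1)]


def check_structure(symbols, bonds, charges=None):
    """Return a list of problems: over-valent atoms and a non-zero net charge."""
    charges = charges if charges is not None else [0] * len(symbols)
    used = [0] * len(symbols)
    for a, b, order in bonds:  # atom numbers are 1-based, as in a molfile
        used[a - 1] += order
        used[b - 1] += order

    problems = []
    for number, (symbol, valence) in enumerate(zip(symbols, used), start=1):
        limit = MAX_VALENCE[symbol]
        if valence > limit:
            problems.append(f"atom {number} ({symbol}): valence {valence} exceeds {limit}")
    net_charge = sum(charges)
    if net_charge != 0:
        problems.append(f"net charge {net_charge:+d} is not balanced")
    return problems


# The structure as drawn on the slide passes both checks.
print(check_structure(SYMBOLS, BONDS))

# A drawing error: the bond between atoms 4 and 1 entered as a double bond.
broken = [(4, 1, 2) if bond == (4, 1, 1) else bond for bond in BONDS]
print(check_structure(SYMBOLS, broken))

# A record carrying a +1 charge on atom 10 with no counter-ion.
print(check_structure(SYMBOLS, BONDS, charges=[0] * 9 + [1]))
```

A production check would need charge-aware valences (a protonated nitrogen can carry four bonds, for example) and a full periodic table, but even this small version catches the kind of drawing mistake that produces a five-valent carbon.
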
  34. Searches: The INTERNET. All ChemSpider and Internet searches are “simply algorithms”, but synonym searching is based on an assertion.
  35. Validated Names for Searching…
  36. Validating structures: Check for “full stereo” and use stereo descriptors, especially for checking! Check for quality of associated data sources. Check against reference literature when available, but it can be wrong. Question EVERYTHING!
  37. Contributing to The Quality of Data: What is the Structure of Vitamin K?
  38. Contributing to The Quality of Data: What is the Structure of Vitamin K? A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria & synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).
  39. What is the Structure of Vitamin K1?
  40. CAS’s Common Chemistry
  41. Wikipedia
  42. Wolfram Alpha
  43. DailyMed
  44. ALL Different, ALL “Domoic Acids”
  45. Thank you. Email: williamsa@rsc.org; Twitter: ChemConnector; Blog: www.chemspider.com/blog; Personal Blog: www.chemconnector.com; SLIDES: www.slideshare.net/AntonyWilliams