ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics Presentation

A short remote presentation given to chemistry students in Jean-Claude Bradley's class at Drexel University.

Transcript

  • 1. ChemSpider and Traveling the Internet via Chemical Structures Antony Williams Drexel University, November 2012
  • 2. Compounds and Identifiers
  • 3. Chemistry on the Internet: Where do you source chemistry information? What can you trust online? How can you recognize potential issues? Cross-referencing and curating data.
  • 4. Molfiles (http://en.wikipedia.org/wiki/Chemical_table_file)
  • 5. Molfiles 10 9 0 0 1 0 0 0 0 0 1 V2000 31.2937 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 26.6526 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 31.2937 -7.7066 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 30.1161 -9.6877 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 25.5096 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 28.9731 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 27.8163 -9.7016 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 26.6664 -7.7066 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 32.4367 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 30.1161 -11.0177 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 3 1 2 0 0 0 0 4 1 1 0 0 0 0 9 1 1 0 0 0 0 7 2 1 0 0 0 0 5 2 2 0 0 0 0 8 2 1 0 0 0 0 6 4 1 0 0 0 0 4 10 1 6 0 0 0 7 6 1 0 0 0 0 M END
  • 6. Molfiles Molfiles are the primary exchange format between structure drawing packages Can be different between different drawing packages Most commonly carry X,Y coordinates for layout Can support polymers, organometallics, etc. Can carry 3D coordinates
  • 7. SMILES (http://en.wikipedia.org/wiki/SMILES) SMILES is a common format Can support polymers, organometallics, etc. Does NOT carry X,Y or Z coordinates for layout so requires layout algorithms – can be problematic! Generally different between drawing packages
  • 8. Stereo
  • 9. Tautomers
  • 10. SMILES ACD/Labs CC(C)CCC[C@@H](C)CCC[C@@H] (C)CCCC(C)=CCC2=C(C)C(=O)c1ccccc1C2=O OpenEye CC1=C(C(=O)c2ccccc2C1=O)C/C=C(C)/CCC[C @H](C)CCC[C@H](C)CCCC(C)C ChEMBL CC(C)CCC[C@@H](C)CCC[C@@H] (C)CCCC(=CCC1=C(C)C(=O)c2ccccc2C1=O)C
  • 11. The InChI Identifier
  • 12. InChI SINGLE code base managed by IUPAC – integrated into drawing packages. No variability as with SMILES InChI Strings can be reversed to structures – same problem as with SMILES – no layout Well adopted by the community (databases, publishers, blogs, Wikipedia) – good for searching the internet
  • 13. The InChI Standard
  • 14. Tautomers – “Mobile H Perception”
  • 15. Double Bond Orientation
  • 16. Stereo
  • 17. Checking for Stereochemistry
  • 18. Checking for Stereochemistry: Use your drawing package!
  • 19. Checking for Stereochemistry
  • 20. Checking for Stereochemistry
  • 21. Checking for Stereochemistry
  • 22. InChIKeys: Search the Web by Structure
  • 23. InChIs
  • 24. Databases and Standardization
  • 25. Databases and Standardization
  • 26. InChI No support for polymers, organometallics Many option settings can lead to variability and make integration across databases difficult – FixedH option especially problematic “Slight” chance of collisions of InChIKeys VERY USEFUL FOR INTEGRATING THE WEB
  • 27. Vancomycin
  • 28. Vancomycin Search: Molecular Search, Full Molecule vs. SKELETON
  • 29. Full Skeleton Search: 104 Hits
  • 30. Full Molecule Search: 4 Hits
  • 31. Where is chemistry online? Encyclopedic articles (Wikipedia); Chemical vendor databases; Metabolic pathway databases; Property databases; Patents with chemical structures; Drug Discovery data; Scientific publications; Compound aggregators; Blogs/Wikis and Open Notebook Science.
  • 32. www.chemspider.com
  • 33. How do we build it? We deal in Molfiles or SDF files – with coordinates. Valence checking, charge imbalance. We have our own “business logic” to standardize InChI to “aggregate tautomers” to one record. We link out to external sites using their IDs.
  • 34. Searches: The INTERNET. All ChemSpider and Internet searches are “simply algorithms” but synonym searching is based on an assertion.
  • 35. Validated Names for Searching…
  • 36. Validating structures: Check for “full stereo” and use stereo descriptors, especially for checking! Check for quality of associated data sources. Check against reference literature when available – but it can be wrong. Question EVERYTHING!
  • 37. Contributing to The Quality of Data: What is the Structure of Vitamin K?
  • 38. Contributing to The Quality of Data: What is the Structure of Vitamin K? A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria & synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).
  • 39. What is the Structure of Vitamin K1?
  • 40. CAS’s Common Chemistry
  • 41. Wikipedia
  • 42. Wolfram Alpha
  • 43. DailyMed
  • 44. ALL Different, ALL “Domoic Acids”
  • 45. Thank you. Email: williamsa@rsc.org; Twitter: ChemConnector; Blog: www.chemspider.com/blog; Personal Blog: www.chemconnector.com; SLIDES: www.slideshare.net/AntonyWilliams
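
Slides 5 and 6 describe the molfile as a counts line followed by an atom block (coordinates plus element) and a bond block. As a rough illustration only, the sketch below reads the block shown on slide 5 using the Python standard library. It assumes whitespace-separated fields, which is a simplification: real V2000 files use fixed-width columns, and the three header lines of a full molfile are not present in the transcript. The names `MOLBLOCK` and `parse_v2000` are hypothetical and are not part of ChemSpider or any drawing package.

```python
from collections import Counter

# Counts line, atom block and bond block as shown on slide 5.
# The usual three header lines are absent from the transcript, so they are omitted here.
MOLBLOCK = """\
 10  9  0  0  1  0  0  0  0  0  1 V2000
   31.2937   -9.0366    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   26.6526   -9.0366    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   31.2937   -7.7066    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   30.1161   -9.6877    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   25.5096   -9.6877    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   28.9731   -9.0366    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   27.8163   -9.7016    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   26.6664   -7.7066    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   32.4367   -9.6877    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   30.1161  -11.0177    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  3  1  2  0  0  0  0
  4  1  1  0  0  0  0
  9  1  1  0  0  0  0
  7  2  1  0  0  0  0
  5  2  2  0  0  0  0
  8  2  1  0  0  0  0
  6  4  1  0  0  0  0
  4 10  1  6  0  0  0
  7  6  1  0  0  0  0
M  END
"""


def parse_v2000(block):
    """Return (atoms, bonds) from a header-less V2000 block.

    Assumption: fields are separated by whitespace. This breaks for
    molecules with 100 or more atoms, where fixed-width fields run together.
    """
    lines = [ln for ln in block.splitlines() if ln.strip()]
    counts = lines[0].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    atoms = []
    for ln in lines[1:1 + n_atoms]:
        x, y, z, symbol = ln.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    for ln in lines[1 + n_atoms:1 + n_atoms + n_bonds]:
        first, second, order, stereo = (int(v) for v in ln.split()[:4])
        bonds.append((first, second, order, stereo))
    return atoms, bonds


atoms, bonds = parse_v2000(MOLBLOCK)
elements = Counter(symbol for symbol, *_ in atoms)
# Bonds whose stereo field is non-zero carry wedge/hash information.
stereo_bonds = [(a, b) for a, b, _order, stereo in bonds if stereo != 0]

print(len(atoms), len(bonds))
print(dict(elements))
print(stereo_bonds)
```

The block contains only heavy atoms (hydrogens are implicit), and a single bond carries a stereo flag, which is the kind of detail slide 36 suggests checking when validating a structure.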

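Slides 22 and 26 present the InChIKey as a way to search the web by structure, and slides 28 to 30 contrast a "full molecule" search with a "skeleton" search. The sketch below is one way to picture that distinction, under stated assumptions: a standard InChIKey has the shape 14 letters, hyphen, 10 letters, hyphen, 1 letter, and its first block is a hash of the connectivity, so two keys sharing that block are treated here as a skeleton match. Whether ChemSpider's searches are implemented this way is not stated in the slides. The function names and the keys are hypothetical placeholders, not real compounds.

```python
import re

# Assumed shape of a standard InChIKey: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def split_inchikey(key):
    """Validate the shape of a key and return its three hyphen-separated blocks."""
    key = key.strip().upper()
    if INCHIKEY_RE.fullmatch(key) is None:
        raise ValueError(f"not InChIKey-shaped: {key!r}")
    return tuple(key.split("-"))


def match_level(query, candidate):
    """Return 'full' for identical keys, 'skeleton' for a shared first block, else None."""
    q_blocks = split_inchikey(query)
    c_blocks = split_inchikey(candidate)
    if q_blocks == c_blocks:
        return "full"
    if q_blocks[0] == c_blocks[0]:
        return "skeleton"
    return None


# Placeholder keys invented for illustration; they do not identify real structures.
QUERY = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
RECORDS = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # same key
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",  # same first block, different second block
    "DDDDDDDDDDDDDD-BBBBBBBBSA-N",  # different first block
]

levels = [match_level(QUERY, record) for record in RECORDS]
print(levels)
```

Because both blocks are hashes, a match is strong evidence of identity rather than proof, which is the "slight" chance of collisions mentioned on slide 26.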