ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics Presentation

This is a short presentation given to chemistry students at Drexel University as a remote presentation. This was for the class of Jean-Claude Bradley.

Usage Rights

CC Attribution-NonCommercial License

Presentation Transcript

  • ChemSpider and Traveling the Internet via Chemical Structures
    Antony Williams
    Drexel University, November 2012
  • Compounds and Identifiers
  • Chemistry on the Internet
    Where do you source chemistry information?
    What can you trust online?
    How can you recognize potential issues?
    Cross-referencing and curating data
  • Molfiles (http://en.wikipedia.org/wiki/Chemical_table_file)
  • Molfiles 10 9 0 0 1 0 0 0 0 0 1 V2000 31.2937 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 26.6526 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 31.2937 -7.7066 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 30.1161 -9.6877 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 25.5096 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 28.9731 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 27.8163 -9.7016 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 26.6664 -7.7066 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 32.4367 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 30.1161 -11.0177 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 3 1 2 0 0 0 0 4 1 1 0 0 0 0 9 1 1 0 0 0 0 7 2 1 0 0 0 0 5 2 2 0 0 0 0 8 2 1 0 0 0 0 6 4 1 0 0 0 0 4 10 1 6 0 0 0 7 6 1 0 0 0 0 M END
  • Molfiles Molfiles are the primary exchange format between structure drawing packages Can be different between different drawing packages Most commonly carry X,Y coordinates for layout Can support polymers, organometallics, etc. Can carry 3D coordinates
  • SMILES (http://en.wikipedia.org/wiki/SMILES)
    SMILES is a common format
    Can support polymers, organometallics, etc.
    Does NOT carry X, Y or Z coordinates for layout, so requires layout algorithms: can be problematic!
    Generally different between drawing packages
  • Stereo
  • Tautomers
  • SMILES ACD/Labs CC(C)CCC[C@@H](C)CCC[C@@H] (C)CCCC(C)=CCC2=C(C)C(=O)c1ccccc1C2=O OpenEye CC1=C(C(=O)c2ccccc2C1=O)C/C=C(C)/CCC[C @H](C)CCC[C@H](C)CCCC(C)C ChEMBL CC(C)CCC[C@@H](C)CCC[C@@H] (C)CCCC(=CCC1=C(C)C(=O)c2ccccc2C1=O)C
  • The InChI Identifier
  • InChI SINGLE code base managed by IUPAC – integrated into drawing packages. No variability as with SMILES InChI Strings can be reversed to structures – same problem as with SMILES – no layout Well adopted by the community (databases, publishers, blogs, Wikipedia) – good for searching the internet
  • The InChI Standard
  • Tautomers – “Mobile H Perception”
  • Double Bond Orientation
  • Stereo
  • Checking for Stereochemistry
  • Checking for Stereochemistry: Use your drawing package!
  • Checking for Stereochemistry
  • Checking for Stereochemistry
  • Checking for Stereochemistry
  • InChIKeys: Search the Web by Structure
  • InChIs
  • Databases and Standardization
  • Databases and Standardization
  • InChI No support for polymers, organometallics Many option settings can lead to variability and make integration across databases difficult – FixedH option especially problematic “Slight” chance of collisions of InChIKeys VERY USEFUL FOR INTEGRATING THE WEB
  • Vancomycin
  • Vancomycin Search: Molecular Search, Full Molecule, SKELETON
  • Full Skeleton Search: 104 Hits
  • Full Molecule Search: 4 Hits
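The gap between the two hit counts can be illustrated with the structure of an InChIKey. The Python sketch below assumes that a "skeleton" search matches only the first 14-character block of the key (the connectivity hash), while a "full molecule" search matches the whole key. That mapping is an assumption here, since the transcript does not spell it out. The helper names are made up for this illustration, and alanine and ethanol stand in for vancomycin because their keys are short and well known.

```python
import re
from collections import defaultdict

# 14-letter connectivity block, 8-letter second hash, S/N standard flag,
# version letter, and a final protonation letter.
INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def skeleton(inchikey):
    """Return the first (connectivity) block of an InChIKey."""
    match = INCHIKEY.match(inchikey)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return match.group(1)


def group_by_skeleton(inchikeys):
    """Group full InChIKeys that share the same connectivity block."""
    groups = defaultdict(list)
    for key in inchikeys:
        groups[skeleton(key)].append(key)
    return dict(groups)


KEYS = [
    "QNAYBMKLOCPYGJ-REOHZZQSSA-N",  # L-alanine
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",  # D-alanine
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",  # alanine, stereo unspecified
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",  # ethanol
]

groups = group_by_skeleton(KEYS)
full_hits = [k for k in KEYS if k == "QNAYBMKLOCPYGJ-REOHZZQSSA-N"]
skeleton_hits = groups["QNAYBMKLOCPYGJ"]
print(len(full_hits), len(skeleton_hits))
```

Under this assumption a full-key search returns one record, while the skeleton search also returns records that differ only in stereochemistry or other second-block detail.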
  • Where is chemistry online?
    Encyclopedic articles (Wikipedia)
    Chemical vendor databases
    Metabolic pathway databases
    Property databases
    Patents with chemical structures
    Drug Discovery data
    Scientific publications
    Compound aggregators
    Blogs/Wikis and Open Notebook Science
  • www.chemspider.com
  • How do we build it?
    We deal in Molfiles or SDF files, with coordinates
    Valence checking, charge imbalance
    We have our own “business logic” to standardize
    InChI to “aggregate tautomers” to one record
    We link out to external sites using their IDs
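The slides do not describe ChemSpider's actual validation rules, so the following Python sketch only illustrates what "valence checking" and "charge imbalance" can mean at their simplest. It assumes a small table of maximum valences for neutral atoms and flags any record whose formal charges do not sum to zero. The names `MAX_VALENCE` and `check_record`, and the input layout, are made up for this illustration.

```python
# Maximum number of bonds for a few neutral atoms (deliberately simplified).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def check_record(atoms, bonds):
    """Return a list of problems found in one structure record.

    atoms: list of (symbol, formal_charge)
    bonds: list of (first_atom, second_atom, bond_order), atoms numbered from 1
    """
    problems = []

    # Sum bond orders per atom.
    used = [0] * len(atoms)
    for first, second, order in bonds:
        used[first - 1] += order
        used[second - 1] += order

    # Valence check, applied to neutral atoms with a known limit only.
    for index, (symbol, charge) in enumerate(atoms):
        limit = MAX_VALENCE.get(symbol)
        if limit is not None and charge == 0 and used[index] > limit:
            problems.append(
                f"atom {index + 1} ({symbol}) has valence {used[index]} > {limit}"
            )

    # Charge-imbalance check: formal charges should cancel out.
    net_charge = sum(charge for _, charge in atoms)
    if net_charge != 0:
        problems.append(f"net charge {net_charge:+d}")
    return problems


# Heavy atoms of acetic acid: passes both checks.
acetic_acid = check_record(
    [("C", 0), ("C", 0), ("O", 0), ("O", 0)],
    [(1, 2, 1), (2, 3, 2), (2, 4, 1)],
)
# A carbon drawn with five bonds: fails the valence check.
bad_carbon = check_record(
    [("C", 0), ("O", 0), ("O", 0), ("O", 0)],
    [(1, 2, 2), (1, 3, 2), (1, 4, 1)],
)
# Acetate with no counterion: fails the charge-balance check.
acetate = check_record(
    [("C", 0), ("C", 0), ("O", 0), ("O", -1)],
    [(1, 2, 1), (2, 3, 2), (2, 4, 1)],
)
print(acetic_acid, bad_carbon, acetate)
```

A production pipeline would also account for implicit hydrogens, charged atoms, and elements with several allowed valences. This sketch skips all of those.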
  • Searches: The INTERNET
    All ChemSpider and Internet searches are “simply algorithms”, but synonym searching is based on an assertion
  • Validated Names for Searching…
  • Validating structures
    Check for “full stereo” and use stereo descriptors especially for checking!
    Check for quality of associated data sources
    Check against reference literature when available, but it can be wrong
    Question EVERYTHING!
  • Contributing to The Quality of Data: What is the Structure of Vitamin K?
  • Contributing to The Quality of Data: What is the Structure of Vitamin K?
    A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).
  • What is the Structure of Vitamin K1?
  • CAS’s Common Chemistry
  • Wikipedia
  • Wolfram Alpha
  • DailyMed
  • ALL Different, ALL “Domoic Acids”
  • Thank you
    Email: williamsa@rsc.org
    Twitter: ChemConnector
    Blog: www.chemspider.com/blog
    Personal Blog: www.chemconnector.com
    SLIDES: www.slideshare.net/AntonyWilliams