Connecting Chemistry Across the Internet Using ChemSpider

Transcript

  • 1. Connecting Chemistry Across the Internet Using ChemSpider Antony J Williams and Valery Tkachenko SERMACS, November 15th 2012
  • 2. Chemistry Data and the Weeds
  • 3. Tell me about Roundup
  • 4. So what is Roundup?
  • 5. The World’s Encyclopedia
  • 6. Roundup
  • 7. Where do we Round Up data? Where can I find the molfile for Roundup? Papers/patents about Roundup? What are the side effects of Roundup? Where can I order Roundup? What are its physicochemical properties? Metabolic pathways? Different synonyms for Roundup? Synthesis of Roundup? Etc.
  • 8. Where do I Round Up Data?
  • 9. In an ever-expanding Linked Data map…
  • 10. But what if I want to aggregate data? So…
  • 11. ChemSpider Takes on the role of a structure centric hub:  Connecting, validating, qualifying data  Enhancing data with connections to services  Provides access to data and services for others to use (Thermo, Agilent, Bruker, Waters, ACD/Labs, Accelrys, etc.)  Uses available services to integrate, connect and enhance the offering
  • 12. Roundup on ChemSpider
  • 13. What will ChemSpider give us?
  • 14. What will ChemSpider give us?
  • 15. What will ChemSpider give us?
  • 16. What will ChemSpider give us?
  • 17. What will ChemSpider give us?
  • 18. What will ChemSpider give us?
  • 19. ChemSpider is Collapsing Data???
  • 20. What will ChemSpider give us?
  • 21. For Glyphosate itself
  • 22. How did we build it? We deal in molfiles or SDF files, with coordinates. Deposit anything that has an InChI: we support what InChI can handle, good and bad. Standardization is based on “InChI standardization”. InChIs aggregate (certain) tautomers. How much of ChemSpider is “on ChemSpider”?
  • 23. Connecting Chemistry across the web So much of what is seen on ChemSpider is retrieved in real time using services
  • 24. Connecting Chemistry across the web
  • 25. Online Predictions
  • 26. A Comment on Quality. Across more than 28 million chemical compounds there are some errors: “incorrect” structure representations; mismatched name-structure relationships; experimental properties (the values, the units); real vs. virtual compounds (text-mining and conversion). We have deprecated a LOT of data…
  • 27. Downsides of InChI. Good for small molecules, but no polymers, and issues with inorganics, organometallics and imperfect stereochemistry. ChemSpider is “small molecules”. InChI is used as the “deduplicator”: the FIRST version of a compound into the database becomes THE structure to deduplicate against…
  • 28. Side Effects of InChI Usage
  • 29. SMILES by comparison…
  • 30. Side Effects of InChI Usage
  • 31. Standardization Issues: Depiction based on molfile
  • 32. Downsides of Overall Approach. Meshing data together based on InChIs worked for simple molecules. 2D layout errors were inherited, or limited by the algorithm. Complex molecules that were meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same and meant to be the same, are not the same.
  • 33. So much data online is “erroneous”
  • 34. The confusion of name-structures
  • 35. Collapsing Data – Standardization
  • 36. What needs to happen? If we could validate, we could catch errors in databases (and clean them), proactively catch errors in publications and patents, and reduce junk in the ether: improve QUALITY! If we collectively standardized, interlinking between databases should improve. CVSP is the subject of a separate presentation… stick around.
  • 37. Crowdsourcing ChemSpider. ChemSpider is crowdsourced: community deposition, annotation and curation. Anyone can “Leave Feedback”; registered users can add data.
  • 38. ChemSpider and the Global Chemistry Hub. Internet data, commercial software, pre-competitive data, Open Science, Open Data, publishers, educators, open databases and chemical vendors; small organic molecules, undefined materials, organometallics, nanomaterials, polymers, minerals, particle-bound materials and links to biologicals.
  • 39. Delivering a Prediction Platform. Experimental data will be used as the basis for model generation: a predictive platform…
  • 40. The Future of ChemSpider. Continued focus on quality over quantity, but more data is good too! ChemSpider Reactions: a work in progress that includes more than 300,000 reactions. Plugging in a validation and standardization platform. Delivering personal and institutional repository capabilities.
  • 41. Thank you. Email: williamsa@rsc.org; Twitter: ChemConnector; Personal Blog: www.chemconnector.com; Slides: www.slideshare.net/AntonyWilliams
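Slide 11 describes ChemSpider as a structure-centric hub that connects data from many providers. A minimal sketch of that idea, assuming every deposit arrives as an (InChIKey, source, reference) tuple; the HubRecord class, the build_hub function and the sample keys and source names are hypothetical illustrations, not ChemSpider's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class HubRecord:
    """One structure-centric record: a single structure key plus links from many sources."""

    inchikey: str
    # source name -> external identifiers supplied by that source (hypothetical layout)
    links: dict[str, list[str]] = field(default_factory=dict)

    def add_link(self, source: str, ref: str) -> None:
        refs = self.links.setdefault(source, [])
        if ref not in refs:  # ignore a repeated deposit of the same link
            refs.append(ref)


def build_hub(deposits: list[tuple[str, str, str]]) -> dict[str, HubRecord]:
    """Group (inchikey, source, ref) tuples into one HubRecord per structure."""
    hub: dict[str, HubRecord] = {}
    for inchikey, source, ref in deposits:
        record = hub.get(inchikey)
        if record is None:
            record = hub[inchikey] = HubRecord(inchikey)
        record.add_link(source, ref)
    return hub


# Hypothetical deposits: two sources describing KEY-1, one describing KEY-2.
hub = build_hub([
    ("KEY-1", "vendor_a", "cat-001"),
    ("KEY-1", "patent_db", "P-123"),
    ("KEY-1", "vendor_a", "cat-001"),
    ("KEY-2", "vendor_a", "cat-002"),
])
print(hub["KEY-1"].links)  # {'vendor_a': ['cat-001'], 'patent_db': ['P-123']}
```

Keying on a structure identifier rather than on a name is what lets links from unrelated providers collect on one record.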
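Slide 23 notes that much of what is seen on ChemSpider is retrieved in real time using services. One way to sketch that pattern, assuming each service is a hypothetical Python callable that takes a structure key and returns a dict (no real ChemSpider or vendor API is shown here): query all services concurrently, wait against a single overall deadline, and keep only the answers that arrived without error.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

# Hypothetical service: takes a structure key, returns whatever data it holds.
Service = Callable[[str], dict]


def fetch_all(key: str, services: dict[str, Service], timeout: float = 2.0) -> dict[str, dict]:
    """Call every service concurrently and keep the answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(services)))
    try:
        futures = {pool.submit(fn, key): name for name, fn in services.items()}
        # One overall deadline for the whole page, not one per service.
        done, _not_done = wait(futures, timeout=timeout)
        results: dict[str, dict] = {}
        for fut in done:
            if fut.exception() is None:  # a failing service is skipped, not fatal
                results[futures[fut]] = fut.result()
        return results
    finally:
        # Return without waiting for stragglers; their late answers are discarded.
        pool.shutdown(wait=False)


def properties_service(key: str) -> dict:
    return {"key": key, "source": "predicted properties"}


def broken_service(key: str) -> dict:
    raise RuntimeError("service unavailable")


print(fetch_all("KEY-1", {"props": properties_service, "broken": broken_service}))
```

The single deadline bounds how long a page waits in total, and one unavailable service simply leaves its section empty instead of breaking the whole page.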
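Slide 26 lists experimental properties (the values, the units) among the recurring error types. A hypothetical validation check for one property, melting point, might normalize the reported unit to °C and flag values that cannot be right. The function names and the 1000 °C plausibility ceiling are illustrative assumptions, not ChemSpider rules.

```python
ABSOLUTE_ZERO_C = -273.15


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature in C, K or F (with or without a degree sign) to °C."""
    u = unit.strip().replace("°", "").upper()
    if u == "C":
        return value
    if u == "K":
        return value + ABSOLUTE_ZERO_C
    if u == "F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unrecognized temperature unit: {unit!r}")


def check_melting_point(value: float, unit: str, ceiling_c: float = 1000.0):
    """Return (value in °C or None, list of problems found).

    ceiling_c is a hypothetical plausibility limit, not a physical law.
    """
    try:
        celsius = to_celsius(value, unit)
    except ValueError as err:
        return None, [str(err)]
    problems = []
    if celsius < ABSOLUTE_ZERO_C:
        problems.append("below absolute zero: the value or the unit is wrong")
    elif celsius > ceiling_c:
        problems.append(f"above {ceiling_c} °C: check the value and the unit")
    return celsius, problems


print(check_melting_point(150, "°C"))  # (150, [])
print(check_melting_point(-300, "C"))  # flagged: below absolute zero
```

A value recorded in kelvin but labeled °C (or the reverse) is exactly the kind of error such a check can surface for a curator.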
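Slide 27 explains that InChI is used as the deduplicator and that the FIRST version of a compound into the database becomes THE structure to deduplicate against. A minimal sketch of that first-in-wins behavior, assuming a plain dict keyed on the exact InChI string; the deposit function and the "structure"/"sources" fields are hypothetical.

```python
def deposit(registry: dict[str, dict], inchi: str, structure: str, source: str) -> bool:
    """Register one deposit keyed on its exact InChI string.

    Returns True if the deposit created a new compound, False if it was
    merged into the compound that was deposited first.
    """
    existing = registry.get(inchi)
    if existing is None:
        # First in wins: this drawing becomes THE structure for the InChI.
        registry[inchi] = {"structure": structure, "sources": [source]}
        return True
    # Later deposits only add their source; their own drawing is not kept.
    existing["sources"].append(source)
    return False


ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
registry: dict[str, dict] = {}
print(deposit(registry, ETHANOL, "molfile from source A", "A"))  # True
print(deposit(registry, ETHANOL, "molfile from source B", "B"))  # False
print(registry[ETHANOL])  # {'structure': 'molfile from source A', 'sources': ['A', 'B']}
```

Because the key is the exact InChI string, any difference in any layer, including stereochemistry, produces a separate entry, and a poor-quality first deposit remains the reference structure unless it is curated.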
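Slide 32 observes that compounds differing by one stereocenter, meant to be the same, are not deduplicated. The reason is visible in the InChI string itself: stereochemistry lives in separate layers (/b for double bonds; /t, /m, /s for tetrahedral centers) that follow the formula, connectivity (/c), hydrogen (/h) and charge/proton (/q, /p) layers, so two exact-string keys differ even when everything else matches. The sketch below, an illustrative simplification rather than a full InChI parser, truncates an InChI at its first stereo or isotopic (/i) layer; the two strings in the usage example are intended to be the standard InChIs of L- and D-alanine.

```python
# Layers treated as stereochemistry: double bond (/b) and tetrahedral (/t, /m, /s).
STEREO_LAYERS = {"b", "t", "m", "s"}


def main_layers(inchi: str) -> str:
    """Keep the formula and the layers before the first stereo or isotopic layer.

    A simplification for illustration, not a full InChI parser.
    """
    prefix, _, body = inchi.partition("/")  # "InChI=1S" and the layered remainder
    layers = body.split("/")
    kept = [layers[0]]  # the formula layer carries no prefix letter
    for layer in layers[1:]:
        if layer[:1] in STEREO_LAYERS or layer[:1] == "i":
            break
        kept.append(layer)
    return prefix + "/" + "/".join(kept)


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(L_ALANINE == D_ALANINE)                            # False: distinct keys
print(main_layers(L_ALANINE) == main_layers(D_ALANINE))  # True: same skeleton
```

Truncating like this would also merge genuine stereoisomers, such as these two enantiomers, so comparing on main layers suits flagging candidates for human review rather than merging records automatically.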
