ChemSpider as a chemical term resolver


In recent years, in parallel with the broad proliferation of information, many tens of public chemical databases have been created and made available using internet technologies. In many cases data have flowed freely between these databases as they source information from one another. While this has the advantage of linking together multiple data sources, it has also propagated errors across the various databases. The lack of a public authority to resolve such errors significantly affects the quality of freely accessible chemical information. While ChemSpider has previously allowed a crowdsourcing approach to curation, efforts have now migrated to addressing this problem using a "federated resolver" approach. This presentation will report on our work in this area.

Published in: Technology

  1. ChemSpider as a Chemical Term Resolver. Antony Williams and Valery Tkachenko, ACS San Diego, March 2012
  2. The Web of Chemistry – VERY BIG!
  3. Online Databases are “Linking”
  4. It is so difficult to navigate… What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Working on now? Connections to disease? Expressed in right cell type? Competitors? IP?
  5. Open PHACTS Project: Develop a set of robust standards…; implement the standards in a semantic integration hub; deliver services to support drug discovery programs in pharma and public domain. 22 partners, 8 pharmaceutical companies, 3 biotechs; 36-month project. Guiding principle is open access, open usage, open source (key to standards adoption).
  6. What is the Structure of Vitamin K?
  7. MeSH: A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).
  8. What is the Structure of Vitamin K1?
  9. Create an Online “Resolver” as a path to chemistry. Search all forms of structure IDs: systematic name(s); trivial name(s); SMILES; InChI strings; InChIKeys; database IDs; registry number
  10. ChemSpider
  11. Available Information… Linked to vendors, safety data, toxicity, metabolism
  12. Available Information….
  13. Vitamin K1 Names
  14. Vitamin K1 on ChemSpider: CORRECT
  15. Resolving Names for QUALITY: Searching chemical identifiers should resolve to the correct chemical as much as possible
  16. Validated Name-Structure Dictionaries. Chemical name dictionaries are used for: text-mining (publications, patents); used to index PubMed and link to Google Patents; linking to other databases – think Biology!; when structures are not available drug names link; searching the web; names link to structures link to InChIs
  17. I want to know about “Vincristine”
  18. Vincristine: Identifiers
  19. Vincristine: Patents Linked by Name
  20. Many Names, One Structure
  21. Top 200 Drugs on Wikipedia
  22. The Project Challenge PART ONE: Agree on the set of chemical names to work with; independently create an SDF file in each “lab”; compare differences and agree on final structures; issue “Gold Standard” SDF file to team
  23. RSC Process
  24. Relative accuracy of groups against final master list
  25. The Project Challenge PART TWO: Use Gold Standard SDF file to investigate data quality on these compounds in Internet databases. Two checks: (1) search chemical name: does it return the correct compound? If not correct, how is it different? (2) search “structure”: SMILES, Molfile, InChI string or InChIKey
  26. “The First 10”
  27. Performance on 150 Drug Names
  28. NPC Browser Set
  29. Standardize: Use the SRS as a guidance document for standardization; adjust as necessary to our needs
  30. Nitro groups
  31. Salt and Ionic Bonds
  32. One dictionary look up is never enough… ChemSpider does not contain all chemistry; we are not the only ones curating data; new chemistry expands daily and goes online
  33. One dictionary look up is never enough… Federation is key…. Check ChemSpider first, if not found then: check PubChem; check NCI resolver; check ChEBI; check ….the “network” of open interfaces. Each resolver will have its own “quantitative confidence”.
  34. Chemical Identifier Resolver (CIR): Converts a given structure identifier into another representation or structure identifier. Resolve names, identifiers etc
  35. What can become a resolver?
  36. We are building…. A central federated resolver utilizing available services; dictionary lookups, systematic name conversions (multiple tools – ACD/Labs, Lexichem, OPSIN); “consensus” decisions and guidance. BUT chemicals have timelines!!!
  38. Thank you. Email: williamsa@rsc.org; Twitter: ChemConnector; Personal Blog: www.chemconnector.com; SLIDES:
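
The federated lookup described in the slides (check ChemSpider first, then PubChem, the NCI resolver and ChEBI, each with its own "quantitative confidence", plus "consensus" decisions) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Resolver`, `Hit`, `dict_resolver`, `resolve_first` and `resolve_consensus` names, the confidence weights, and the `STRUCT-*` placeholder identifiers are all assumptions, and the in-memory dictionaries stand in for real web service calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Resolver:
    name: str                                # label of the data source
    lookup: Callable[[str], Optional[str]]   # term -> structure ID, or None
    confidence: float                        # hypothetical weight, not a published figure


@dataclass
class Hit:
    source: str
    structure: str
    confidence: float


def dict_resolver(name: str, table: Dict[str, str], confidence: float) -> Resolver:
    """Wrap a toy in-memory dictionary as a resolver (stand-in for a web service)."""
    normalized = {key.lower(): value for key, value in table.items()}
    return Resolver(name, lambda term: normalized.get(term.strip().lower()), confidence)


def resolve_first(term: str, resolvers: List[Resolver]) -> Optional[Hit]:
    """Federated fallback: ask each resolver in order and return the first hit."""
    for resolver in resolvers:
        structure = resolver.lookup(term)
        if structure is not None:
            return Hit(resolver.name, structure, resolver.confidence)
    return None


def resolve_consensus(term: str, resolvers: List[Resolver]) -> Optional[str]:
    """Consensus: sum the confidence of every resolver that agrees on a structure."""
    votes: Dict[str, float] = {}
    for resolver in resolvers:
        structure = resolver.lookup(term)
        if structure is not None:
            votes[structure] = votes.get(structure, 0.0) + resolver.confidence
    if not votes:
        return None
    return max(votes, key=lambda structure: votes[structure])


# Toy data: placeholder structure IDs, not real InChIKeys or database records.
CHAIN = [
    dict_resolver("ChemSpider", {"aspirin": "STRUCT-A"}, 0.9),
    dict_resolver("PubChem", {"aspirin": "STRUCT-A", "caffeine": "STRUCT-B"}, 0.7),
    dict_resolver("NCI resolver", {"caffeine": "STRUCT-C"}, 0.6),
    dict_resolver("ChEBI", {"caffeine": "STRUCT-B"}, 0.8),
]

if __name__ == "__main__":
    print(resolve_first("Aspirin", CHAIN))
    print(resolve_first("caffeine", CHAIN))
    print(resolve_consensus("caffeine", CHAIN))
```

In this toy chain the first-hit strategy stops at the earliest source that knows the term, while the consensus strategy lets two agreeing sources outweigh a single disagreeing one. A real federation would also need identifier standardization before comparing structures, which the sketch leaves out.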