ChemSpider as a Chemical Term Resolver

In recent years, in parallel with the general trend of information proliferation, many tens of public chemical databases have been created and made available using internet technologies. In many cases, data have been exchanged freely between these databases as they source information from one another. While this has the advantage of linking together multiple data sources, it has also propagated errors across the various databases. The lack of a public authority to resolve such errors significantly affects the quality of freely accessible chemical information. While ChemSpider has previously supported a crowdsourcing approach to curation, our efforts have now migrated to addressing this problem using a "federated resolver" approach. This presentation will report on our work in this area.

Transcript of "ChemSpider as a Chemical Term Resolver"

  1. ChemSpider as a Chemical Term Resolver. Antony Williams, Valery Tkachenko, Sean Ekins and Andy Fant. ACS San Diego, March 2012
  2. The Web of Chemistry – VERY BIG!
  3. Online Databases are “Linking”
  4. It is so difficult to navigate… IP? What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Competitors? Working on now? Connections to disease? Expressed in right cell type?
  5. Open PHACTS Project: develop a set of robust standards; implement the standards in a semantic integration hub; deliver services to support drug discovery programs in pharma and the public domain. 22 partners, 8 pharmaceutical companies, 3 biotechs; 36-month project. Guiding principle is open access, open usage, open source (key to standards adoption).
  6. What is the Structure of Vitamin K?
  7. MeSH: A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).
  8. What is the Structure of Vitamin K1?
  9. Create an Online “Resolver” as a path to chemistry. Search all forms of structure IDs: systematic name(s), trivial name(s), SMILES, InChI strings, InChIKeys, database IDs, Registry Numbers
  10. ChemSpider
  11. Available Information… Linked to vendors, safety data, toxicity, metabolism
  12. Available Information…
  13. Vitamin K1 Names
  14. Vitamin K1 on ChemSpider: CORRECT
  15. Resolving Names for QUALITY: searching chemical identifiers should resolve to the correct chemical as much as possible
  16. Validated Name-Structure Dictionaries. Chemical name dictionaries are used for: text-mining (publications, patents), e.g. to index PubMed and link to Google Patents; linking to other databases (think Biology!), where drug names provide the link when structures are not available; searching the web, where names link to structures, which link to InChIs.
  17. I want to know about “Vincristine”
  18. Vincristine: Identifiers
  19. Vincristine: Patents Linked by Name
  20. Many Names, One Structure
  21. Top 200 Drugs on Wikipedia: http://en.wikipedia.org/wiki/List_of_bestselling_drugs
  22. The Project Challenge, PART ONE: agree on the set of chemical names to work with; independently create an SDF file in each “lab”; compare differences and agree on final structures; issue a “Gold Standard” SDF file to the team.
  23. RSC Process
  24. Relative accuracy of groups against final master list
  25. The Project Challenge, PART TWO: use the Gold Standard SDF file to investigate data quality for these compounds in Internet databases. Two checks: (1) search by chemical name: does it return the correct compound, and if not, how is it different? (2) search by “structure”: SMILES, Molfile, InChI string or InChIKey.
  26. “The First 10”
  27. Performance on 150 Drug Names
  28. NPC Browser Set
  29. Standardize: use the SRS as a guidance document for standardization, adjusting as necessary to our needs
  30. Nitro groups
  31. Salt and Ionic Bonds
  32. One dictionary lookup is never enough… ChemSpider does not contain all chemistry; we are not the only ones curating data; new chemistry appears daily and goes online.
  33. One dictionary lookup is never enough… Federation is key: check ChemSpider first; if not found, check PubChem, then the NCI resolver, then ChEBI, then… the “network” of open interfaces. Each resolver will have its own “quantitative confidence”.
  34. Chemical Identifier Resolver (CIR): converts a given structure identifier into another representation or structure identifier. Resolves names, identifiers, etc. http://cactus.nci.nih.gov/chemical/structure
  35. What can become a resolver?
  36. We are building… a central federated resolver utilizing available services: dictionary lookups, systematic name conversions (multiple tools: ACD/Labs, Lexichem, OPSIN), and “consensus” decisions and guidance. BUT chemicals have timelines!!!
  37. ORIGINAL / FINAL
  38. Thank you. Email: williamsa@rsc.org; Twitter: ChemConnector; Personal Blog: www.chemconnector.com; SLIDES: www.slideshare.net/AntonyWilliams
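Slide 9 lists the kinds of identifier a resolver has to accept. One plausible first step is to guess which kind a query string is before dispatching it. The sketch below does this, assuming simplified patterns: standard InChIKeys have a 14-10-1 uppercase layout, InChI strings start with `InChI=`, and CAS Registry Numbers can be checked with their check digit. The function names are hypothetical and are not ChemSpider's implementation.

```python
import re

# Simplified patterns (assumptions); a production resolver would check more carefully.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(cas: str) -> bool:
    """Validate a CAS Registry Number using its check digit.

    The check digit is the sum of the other digits, each multiplied by its
    position counted from the right (starting at 1), modulo 10.
    """
    m = CAS_RE.match(cas)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def classify_identifier(query: str) -> str:
    """Guess which kind of structure identifier a query string is."""
    q = query.strip()
    if q.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(q):
        return "inchikey"
    if cas_checksum_ok(q):
        return "cas_rn"
    # Names and SMILES overlap syntactically (e.g. "CCO" vs "ethanol"),
    # so a resolver would have to try both routes.
    return "name_or_smiles"
```

For example, `classify_identifier("7732-18-5")` (water) is classified as a registry number. A mistyped `"7732-18-4"` fails the checksum and falls through to the name/SMILES route.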
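Slides 16 and 20 describe validated name-structure dictionaries ("many names, one structure") used for text-mining. The sketch below shows the basic idea, assuming a small hypothetical dictionary that maps synonyms to InChIKeys. It uses exact, case-insensitive matching that refuses partial-word hits. Real text-mining pipelines are considerably more involved; this only shows how several synonyms collapse onto one structure key.

```python
import re
from collections import defaultdict

# Hypothetical validated dictionary: several names map to one structure (InChIKey).
NAME_DICTIONARY = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "acetylsalicylic acid": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "caffeine": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
    "1,3,7-trimethylxanthine": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
}


def tag_text(text: str, dictionary: dict[str, str]) -> dict[str, list[str]]:
    """Return {InChIKey: [dictionary names found in text]}.

    Matching is exact and case-insensitive; the lookarounds stop a name from
    matching inside a longer word or hyphenated term.
    """
    hits = defaultdict(list)
    for name, key in dictionary.items():
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits[key].append(name)
    return dict(hits)
```

Because every synonym points at the same key, a document mentioning "Aspirin" and one mentioning "acetylsalicylic acid" end up linked to the same structure record.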
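Slide 25 asks two questions about each database result: is the returned compound correct, and if not, how does it differ? The sketch below answers the second question by comparing the returned compound with the Gold Standard through their standard InChIKeys, which is an assumption about tooling. The comparison uses the key's block structure:
- the first block hashes connectivity;
- the second hashes the remaining layers, such as stereochemistry and isotopes, plus flag and version characters;
- the last character encodes protonation.

The function names are hypothetical. Note that a hash comparison can only say which layer differs, not what the difference is.

```python
from collections import Counter
from typing import Optional


def compare_to_gold(gold_key: str, found_key: Optional[str]) -> str:
    """Classify how a database's answer differs from the gold-standard structure."""
    if found_key is None:
        return "not found"
    if found_key == gold_key:
        return "correct"
    gold_skel, gold_rest, _gold_prot = gold_key.split("-")
    found_skel, found_rest, _found_prot = found_key.split("-")
    if gold_skel != found_skel:
        return "different connectivity"
    if gold_rest != found_rest:
        # Also triggered by flag/version differences, e.g. a non-standard key.
        return "same connectivity, different stereo/isotope layers"
    return "same structure, different protonation"


def summarize(gold: dict[str, str], found: dict[str, Optional[str]]) -> Counter:
    """Tally outcomes over all gold-standard names (name -> InChIKey)."""
    return Counter(compare_to_gold(gold[name], found.get(name)) for name in gold)
```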
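Slides 29 to 31 point to standardization choices for nitro groups and salts. The sketch below applies two such rules to SMILES strings using crude string heuristics, purely for illustration:
- It keeps the longest dot-separated fragment, using string length as a rough proxy for the parent compound.
- It rewrites nitro groups written in pentavalent form, `N(=O)=O`, into charge-separated form, `[N+](=O)[O-]`.

Both rules are assumptions. In particular, choosing the charge-separated nitro form is not a claim about what the SRS prescribes. A real pipeline would parse structures with a cheminformatics toolkit (for example, RDKit's MolStandardize module) rather than edit strings.

```python
NITRO_PENTAVALENT = "N(=O)=O"
NITRO_CHARGE_SEPARATED = "[N+](=O)[O-]"


def largest_fragment(smiles: str) -> str:
    """Drop counter-ions by keeping the longest dot-separated SMILES fragment.

    String length is only a rough stand-in for fragment size.
    """
    return max(smiles.split("."), key=len)


def normalize_nitro(smiles: str) -> str:
    """Rewrite nitro groups spelled N(=O)=O into charge-separated form.

    Other spellings (e.g. O=N(=O)c1ccccc1) are not handled by this heuristic.
    """
    return smiles.replace(NITRO_PENTAVALENT, NITRO_CHARGE_SEPARATED)


def standardize(smiles: str) -> str:
    """Apply the two illustrative rules: strip salts, then normalize nitro groups."""
    return normalize_nitro(largest_fragment(smiles))
```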
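Slide 33 describes a federated cascade: try ChemSpider first, fall back to PubChem, the NCI resolver and ChEBI, and attach a per-resolver "quantitative confidence" to the answer. The sketch below models that control flow. The resolver names follow the slide, but the lookups are stand-in dictionaries rather than calls to the real services. The confidence values are invented placeholders, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resolver:
    name: str
    confidence: float  # hypothetical per-source weight in [0, 1]
    lookup: Callable[[str], Optional[str]]  # query -> InChIKey, or None if unknown


@dataclass
class Resolution:
    inchikey: str
    source: str
    confidence: float


def federated_resolve(query: str, resolvers: list[Resolver]) -> Optional[Resolution]:
    """Try each resolver in priority order and return the first hit."""
    for resolver in resolvers:
        try:
            key = resolver.lookup(query)
        except OSError:
            # An unreachable service should not stop the cascade.
            continue
        if key:
            return Resolution(key, resolver.name, resolver.confidence)
    return None


# Stand-in lookups for illustration only; no real service is contacted.
CHAIN = [
    Resolver("ChemSpider", 0.9, {"aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"}.get),
    Resolver("PubChem", 0.8, {"caffeine": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"}.get),
]
```

Returning the source and its confidence along with the key keeps the provenance visible to the caller. That matters once several services with different curation quality are chained together.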
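Slide 34 cites the NCI Chemical Identifier Resolver. As far as I know, CIR is queried with URLs of the form `<base>/<identifier>/<representation>`, with representations such as `smiles` or `stdinchikey`. The sketch below builds such URLs from the base address given on the slide. The helper names are hypothetical. Percent-encoding the identifier is needed for names with spaces and for SMILES characters such as `#` or `/`. I have not verified that the service accepts every encoded form.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address as given on slide 34.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a CIR request URL of the form <base>/<identifier>/<representation>."""
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{quote(representation, safe='')}"


def cir_resolve(identifier: str, representation: str = "stdinchikey",
                timeout: float = 10.0) -> str:
    """Fetch the requested representation from CIR (requires network access)."""
    with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

A call such as `cir_resolve("vitamin K1", "smiles")` would ask CIR for a SMILES string for that name. In a federated setup, such as the one described for slide 33, this would be one lookup in the chain.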
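Slide 36 mentions "consensus" decisions across several name-to-structure conversion tools (ACD/Labs, Lexichem, OPSIN). The sketch below shows one simple form a consensus could take: majority voting over the InChIKeys each tool produced, with ties or weak agreement flagged for human review. The tool outputs are hypothetical inputs, not results from those products. The `min_agreement` threshold is an arbitrary assumption, and majority voting is a swapped-in technique, not necessarily the authors' method.

```python
from collections import Counter
from typing import Optional


def consensus(conversions: dict[str, Optional[str]],
              min_agreement: int = 2) -> tuple[Optional[str], int]:
    """Pick the structure most tools agree on.

    conversions maps tool name -> InChIKey (None if the tool could not parse
    the name). Returns (InChIKey, votes), or (None, votes) when agreement is
    too weak or tied and the name should go to a curator.
    """
    votes = Counter(key for key in conversions.values() if key)
    if not votes:
        return None, 0
    ranked = votes.most_common(2)
    key, count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == count
    if count < min_agreement or tied:
        return None, count
    return key, count
```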