Oops and Downs of Resolving InChIs For the Chemistry Community

The InChI resolver was rolled out to the community in March 2009 to provide a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the technologies underlying the InChI resolver, and of how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss current limitations in applying the resolver to provide access to databases and chemistry information distributed across the internet, and define our vision for enhancing interconnectivity across Open databases, using the InChI resolver as the glue.

  1. Oops and downs of resolving InChIs for the chemistry community
  2. The InChI Has Arrived
     - My opinions:
     - The InChI is a crucial part of the future of structure-based relationships on the web
     - The semantic web of chemistry will sit on the shoulders of InChI until there is something better
     - InChIs and publishers are already in a relationship; publishers who have not yet adopted InChI will follow
  3. PPP – Perfection vs Productive vs Prolific
     - The InChI is not perfect
     - There are limitations, but they are acknowledged and under discussion
     - The InChI is very “productive”
     - InChIs are showing up in databases, manuscripts, spreadsheets, publications and software
  4. A Lot of Variability in InChIs
     - Source: Unofficial InChI FAQ page
  5. InChI Strings Hash to InChIKeys
  6. HVYWMOMLDIMFJA-DPAQBDIFSA-N
  7. The InChI Resolver
  8. inchis.chemspider.com
  9. Resolve an InChI or InChIKey
  10. Resolved
  11. Connection-Only Resolving
  12. InChIs and Big Databases
     - There appears to be a “bigger is better” mentality with online databases
     - InChI has shown a lot of “overlap” in the ChemSpider database
     - Distinction: a unique chemical entity versus what it’s meant to be
     - Some simple examples…
  13. Spot The Difference
  14. Standard InChIKeys
  15. Spot The Difference
  16. 55 Hits in 0.08 Seconds
  17. Large Databases Contain Junk
     - InChI resolvers will get us back to results, but it is only a lookup
     - There is an enormous need for curation and for linking resolved structures to “correct” structures; this is a manual task
  18. Generate-It
  19. Draw and generate
  20. Generate
  21. All Flavors
  22. Historical and Future InChIs
     - The Standard InChI removed variability
     - There will be new variants in the future
     - There are already millions of historical InChIs “out there”
     - Resolvers should accommodate historical and future InChIs
  23. In Our Resolver…
  24. On to ChemSpider…
  25. NEW: Patents and PubMed on ChemSpider
  26. InChIs to Patents and PubMed Articles
  27. But there will be multiple resolvers…
     - Each publisher, database or scientist can choose not to deposit their structures in a centralized database
     - There are many large online databases. There is no need to merge or mirror them; each can be a resolver
     - They need to be federated
  28. Many ways to address resolving
     - Our approach is simple – lookup. We look up the structure. SIMPLE.
  29. NCI/CADD resolver: 69 million structures
  30. Differences
     - The NCI and ChemSpider Resolvers are “different”
       - Different databases behind the resolver. Feedback from NCI: “Preliminary results indicate that inchis.chemspider.com can resolve approx. 28% of our structures.”
       - Our approaches for resolving differ
       - Some features are different
  31. The InChI Resolver Protocol
     - There will not be only one InChI Resolver – there will be many
       - Publishers
       - Commercial Databases
       - Free services and resources: PubChem, ChemSpider, NCI Database, ChEBI
     - Resolvers will not be mirrors of each other
       - There is no need to mirror when a protocol is in place
  32. InChI Resolver Protocol
     - InChI resolving needs to be federated
     - A common protocol can connect resolvers so that a user gets a complete results set
     - Individual resolvers can have different capabilities but an agreed common protocol for resolving InChIs
  33. Discuss with us on Google Groups
     - Draft protocol for ACS Spring 2010 from:
       - RSC ChemSpider
       - NCI/CADD
       - PubChem
       - Symyx
     - Proof of concept hopefully by the end of this year for initial feedback (NCI and ChemSpider)
     - Join us at http://tinyurl.com/r7q9zc (http://groups.google.com/group/inchiresolverprotocol)
  34. InChI Trust
     - The founder members of the Trust: Elsevier, Thomson Reuters, Wiley, Nature Publishing Group, Royal Society of Chemistry, Symyx, FIZ-Chemie, Taylor & Francis and OpenEye
  35. In InChIs We Trust
     - It was said…
       - “There is a finite, but very small probability of finding two structures with the same InChIKey.”
       - The first collision was announced on Sunday by Jonathan Goodman
  36. Spongistatin
  37. Probabilities are what they are…
     - “The molecule for which a collision has been reported … gives rise to 2^26 = 67,108,864 possible stereoisomers”
     - The probability of a clash is low but finite… and it happened.
     - Or… there may be a bug; work is underway
  38. The Future
     - InChI is here
     - InChIKeys are proliferating
     - The need for lookup is inevitable – the need for federated resolvers is obvious
     - Intention to provide draft resolver protocol by end of year
     - ACS Spring – unveil proof of concept
  39. Acknowledgments
     - The InChI “Team” – leadership team, developers, advisors, funders and the community providing feedback
     - Royal Society of Chemistry
  40. Thank you
     - [email_address]
     - Twitter: ChemSpiderman
     - www.chemspider.com/blog
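Slides 5, 6 and 14 turn on how an InChIKey is laid out. As a minimal sketch, assuming the published version 1 layout (a 14-letter block hashed from the main connectivity layer, an 8-letter block hashed from the remaining layers followed by an `S`/`N` standard flag and a version letter, and a final protonation letter), the key on slide 6 can be split into its parts. The function name `parse_inchikey` is illustrative only, and it checks the shape of a key, not whether any structure actually produces it.

```python
import re

# Assumed layout of a version 1 InChIKey:
#   14 letters (connectivity hash) - 8 letters (hash of the other layers)
#   + 'S'/'N' (standard/non-standard) + version letter - protonation letter
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


def parse_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks; raise ValueError if the shape is wrong."""
    match = INCHIKEY_RE.fullmatch(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, other_layers, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,          # first block: connectivity hash
        "other_layers": other_layers,  # stereo/isotope (and other) layers hash
        "standard": flag == "S",       # 'S' marks a Standard InChIKey
        "version": version,            # 'A' corresponds to InChI version 1
        "protonation": protonation,    # 'N' means no protons added or removed
    }


if __name__ == "__main__":
    print(parse_inchikey("HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
```

For the slide 6 key this yields the skeleton block `HVYWMOMLDIMFJA`, the second block `DPAQBDIF`, the standard flag set, version `A` and protonation `N`. Keys that share the first block share connectivity, which is the property a connection-only search relies on.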
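Slides 9 to 11 and slide 28 describe resolving as a plain lookup, including a connection-only mode. The sketch below assumes an in-memory index and uses the first 14-letter block of the InChIKey as the connection-only key, since that block is derived from connectivity alone. The class name, method names and record identifiers are hypothetical and do not describe how the ChemSpider resolver is built.

```python
from collections import defaultdict


class LookupResolver:
    """Toy lookup resolver keyed on InChIKeys (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._by_key: dict[str, list[str]] = defaultdict(list)       # full key -> record ids
        self._by_skeleton: dict[str, list[str]] = defaultdict(list)  # first block -> full keys

    def add(self, inchikey: str, record_id: str) -> None:
        if inchikey not in self._by_key:
            # First time this full key is seen: register it under its skeleton block.
            self._by_skeleton[inchikey[:14]].append(inchikey)
        self._by_key[inchikey].append(record_id)

    def resolve(self, inchikey: str) -> list[str]:
        """Exact resolving: records whose full InChIKey matches."""
        return list(self._by_key.get(inchikey, []))

    def resolve_connection_only(self, inchikey: str) -> dict[str, list[str]]:
        """Connection-only resolving: every stored key sharing the 14-letter first block."""
        keys = self._by_skeleton.get(inchikey[:14], [])
        return {key: list(self._by_key[key]) for key in keys}


if __name__ == "__main__":
    resolver = LookupResolver()
    resolver.add("HVYWMOMLDIMFJA-DPAQBDIFSA-N", "record-1")  # made-up record ids
    resolver.add("HVYWMOMLDIMFJA-UHFFFAOYSA-N", "record-2")  # illustrative key, same first block
    print(resolver.resolve("HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
    print(resolver.resolve_connection_only("HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
```

A dictionary lookup like this is fast, but it only ever returns what was deposited; it cannot tell which of several records is the “correct” structure, which is why slide 17 calls for manual curation.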
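Slides 12 to 17 argue that large databases hold many records for what is meant to be one compound, and that sorting this out needs manual curation. One way to build a worklist for a curator, sketched here under the assumption that every record carries a Standard InChIKey, is to group records by full key (exact duplicates) and by the first block (same connectivity, different stereo or isotope layers). The function name and sample data are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable


def curation_report(records: Iterable[tuple[str, str]]) -> tuple[dict, dict]:
    """Group (record_id, inchikey) pairs into candidate overlaps for manual review.

    Returns:
      exact    - full InChIKey -> sorted record ids, for keys held by more than one record
      variants - first block -> sorted distinct full keys, where more than one key shares it
    """
    by_key: dict[str, set[str]] = defaultdict(set)
    by_skeleton: dict[str, set[str]] = defaultdict(set)
    for record_id, key in records:
        by_key[key].add(record_id)
        by_skeleton[key[:14]].add(key)
    exact = {key: sorted(ids) for key, ids in by_key.items() if len(ids) > 1}
    variants = {sk: sorted(keys) for sk, keys in by_skeleton.items() if len(keys) > 1}
    return exact, variants


if __name__ == "__main__":
    exact, variants = curation_report([
        ("rec-1", "HVYWMOMLDIMFJA-DPAQBDIFSA-N"),  # made-up records
        ("rec-2", "HVYWMOMLDIMFJA-DPAQBDIFSA-N"),
        ("rec-3", "HVYWMOMLDIMFJA-UHFFFAOYSA-N"),
    ])
    print(exact)
    print(variants)
```

Neither group says which record is right; the report only narrows down where a curator should look.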
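Slides 21 and 22 ask resolvers to accept historical and future InChIs, not just Standard ones, so a resolver first has to tell them apart. In InChI version 1, Standard InChIs begin with `InChI=1S/` and non-standard ones with `InChI=1/`. The sketch below classifies a string on that prefix and, as an assumption about handling future variants, reports an unknown version rather than rejecting it. The function name and labels are hypothetical. It does not convert between flavors: a non-standard InChI can carry layers (for example a fixed-hydrogen layer) that a Standard InChI omits, so rewriting the prefix alone would not be a valid conversion.

```python
def inchi_flavor(inchi: str) -> str:
    """Classify an InChI string by its version prefix (hypothetical helper)."""
    text = inchi.strip()
    if not text.startswith("InChI="):
        raise ValueError(f"not an InChI string: {inchi!r}")
    version = text[len("InChI="):].split("/", 1)[0]  # e.g. "1S" or "1"
    if version == "1S":
        return "standard"
    if version == "1":
        return "non-standard (version 1)"
    # Assumption: keep unknown (e.g. future) versions so they can be routed, not dropped.
    return f"unrecognised version {version!r}"


if __name__ == "__main__":
    for s in ("InChI=1S/CH4/h1H4", "InChI=1/CH4/h1H4"):
        print(s, "->", inchi_flavor(s))
```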
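Slides 27, 31 and 32 call for many independent resolvers joined by a common protocol rather than one merged database. The protocol itself was still a draft at the time, so the interface below is purely an assumption: each resolver exposes a `name` and a `resolve(inchikey)` method, and a federating client asks all of them and keeps each answer under the resolver that gave it. `DictResolver` is a hypothetical stand-in for a remote service; a real client would make a network request there.

```python
from typing import Protocol


class Resolver(Protocol):
    """Assumed common interface that every federated resolver implements."""

    name: str

    def resolve(self, inchikey: str) -> list[str]: ...


class DictResolver:
    """Stand-in for one remote resolver, backed by a local dict."""

    def __init__(self, name: str, index: dict[str, list[str]]) -> None:
        self.name = name
        self._index = index

    def resolve(self, inchikey: str) -> list[str]:
        return list(self._index.get(inchikey, []))


def federated_resolve(inchikey: str, resolvers: list[Resolver]) -> dict[str, list[str]]:
    """Query every resolver; keep non-empty answers and skip unreachable resolvers."""
    results: dict[str, list[str]] = {}
    for resolver in resolvers:
        try:
            hits = resolver.resolve(inchikey)
        except OSError:
            # A network failure at one resolver should not sink the whole query.
            continue
        if hits:
            results[resolver.name] = hits
    return results
```

Keeping results keyed by resolver, instead of merging them into one list, preserves where each answer came from; that matters because, as slide 30 notes, different resolvers sit on different databases.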
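Slide 37 quotes 2^26 possible stereoisomers for the molecule involved in the reported collision. A back-of-envelope birthday estimate shows why that number matters, under the idealized assumption that the hashed letters behave like uniformly random values. Stereoisomers share connectivity, so their keys share the first 14-letter block, and the flag, version and protonation letters also match; two of them collide whenever their 8-letter second blocks coincide. That block can take at most 26^8 values, so if all n stereoisomers were enumerated and keyed, the expected number of colliding pairs is roughly:

```latex
$$ n = 2^{26} = 67\,108\,864, \qquad N \le 26^{8} \approx 2.09 \times 10^{11} $$

$$ \mathbb{E}[\text{colliding pairs}] \approx \frac{n(n-1)}{2N} \gtrsim \frac{2^{52}}{2 \times 26^{8}} \approx 1.08 \times 10^{4} $$
```

So while the chance that any one pair of structures shares a key is tiny, a family of this size would be expected to contain thousands of colliding pairs. This does not settle the slide's alternative explanation (a bug); it only suggests that a genuine collision at this scale is unsurprising.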
