Oops and downs of resolving InChIs for the chemistry community

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (IUPAC International Chemical Identifiers). This presentation will provide an overview of the development of the technologies underlying the InChI resolver, and of how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss the current limitations of the resolver in providing access to databases and chemistry information distributed across the internet, and define our vision for using the InChI resolver as the glue that enhances interconnectivity across Open databases.
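
Slides 5, 6, 11 and 28 describe the resolver as, at heart, a lookup keyed on InChIKeys, with an option to resolve on connectivity alone. The following minimal Python sketch illustrates that idea. It assumes the standard InChIKey layout: a 14-letter first block (hash of the connectivity layer), a 10-letter second block (an 8-letter hash of the remaining layers plus a standard/non-standard flag and a version letter) and a one-letter protonation indicator. `parse_inchikey`, `LookupResolver` and its methods are hypothetical names chosen for illustration. They are not the ChemSpider resolver's code or API, and the in-memory dictionary stands in for a real database.

```python
import re
from dataclasses import dataclass, field

# Assumed standard InChIKey layout (27 characters), matched permissively:
#   14 letters  - hash of the connectivity (skeleton) layer
#   8 letters   - hash of the remaining layers (stereo, isotopes, ...)
#   'S' or 'N'  - standard / non-standard flag
#   1 letter    - InChI version ('A' = version 1)
#   1 letter    - protonation indicator ('N' = none)
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


@dataclass(frozen=True)
class InChIKeyParts:
    connectivity: str
    remainder: str
    standard: bool
    version: str
    protonation: str


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    conn, rest, flag, version, proton = match.groups()
    return InChIKeyParts(conn, rest, flag == "S", version, proton)


@dataclass
class LookupResolver:
    """Hypothetical in-memory resolver mapping full InChIKeys to record IDs."""
    records: dict[str, list[str]] = field(default_factory=dict)

    def add(self, key: str, record_id: str) -> None:
        parse_inchikey(key)  # reject malformed keys when loading
        self.records.setdefault(key.strip(), []).append(record_id)

    def resolve(self, key: str) -> list[str]:
        """Exact lookup on the full InChIKey."""
        return list(self.records.get(key.strip(), []))

    def resolve_connection_only(self, key: str) -> list[str]:
        """Lookup on the first (connectivity) block only; other layers are ignored."""
        skeleton = parse_inchikey(key).connectivity
        hits: list[str] = []
        for stored_key, ids in self.records.items():
            if stored_key.startswith(skeleton + "-"):
                hits.extend(ids)
        return hits
```

For example, after `r.add("HVYWMOMLDIMFJA-DPAQBDIFSA-N", "rec-1")` (the key shown on slide 6, with a made-up record ID), `r.resolve(...)` with the same key returns `["rec-1"]`. Connection-only resolving here is a linear scan over stored keys. A real service would more likely index the first block separately so that the skeleton query is also a single lookup.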

  1. Oops and downs of resolving InChIs for the chemistry community
  2. The InChI Has Arrived
      - My opinions:
      - The InChI is a crucial part of the future of structure-based relationships on the web
      - The semantic web of chemistry will sit on the shoulders of InChI until there is something better
      - InChIs and publishers are already in a relationship; publishers who have not yet adopted InChI will follow
  3. PPP – Perfection vs Productive vs Prolific
      - The InChI is not perfect
      - There are limitations, but they are acknowledged and under discussion
      - The InChI is very “productive”
      - InChIs are showing up in databases, manuscripts, spreadsheets, publications and software
  4. A Lot of Variability in InChIs
      - Source: Unofficial InChI FAQ page
  5. InChI Strings Hash to InChIKeys
  6. HVYWMOMLDIMFJA-DPAQBDIFSA-N
  7. The InChI Resolver
  8. inchis.chemspider.com
  9. Resolve an InChI or InChIKey
  10. Resolved
  11. Connection-Only Resolving
  12. InChIs and Big Databases
      - There appears to be a “bigger is better” mentality with online databases
      - InChI has shown a lot of “overlap” in the ChemSpider database
      - Distinction: a unique chemical entity versus what it is meant to be
      - Some simple examples…
  13. Spot The Difference
  14. Standard InChIKeys
  15. Spot The Difference
  16. 55 Hits in 0.08 Seconds
  17. Large Databases Contain Junk
      - InChI resolvers will get us back to results, but it is only a lookup
      - There is an enormous need for curation: linking resolved structures to the “correct” structures is a manual task
  18. Generate-It
  19. Draw and generate
  20. Generate
  21. All Flavors
  22. Historical and Future InChIs
      - The Standard InChI removed variability
      - There will be new variants in the future
      - There are already millions of historical InChIs “out there”
      - Resolvers should accommodate historical and future InChIs
  23. In Our Resolver…
  24. On to ChemSpider…
  25. NEW Patents and PubMed on ChemSpider
  26. InChIs to Patents and PubMed Articles
  27. But there will be multiple resolvers…
      - Each publisher, database provider or scientist can choose not to publish their structures into a centralized database
      - There are many large online databases. There is no need to merge or mirror them; each can be a resolver
      - They need to be federated
  28. Many ways to address resolving
      - Our approach is simple: lookup. We look up the structure. SIMPLE.
  29. NCI/CADD resolver: 69 million structures
  30. Differences
      - The NCI and ChemSpider resolvers are “different”
          - Different databases behind the resolvers. Feedback from NCI: “Preliminary results indicate that inchis.chemspider.com can resolve approx. 28% of our structures.”
          - Our approaches to resolving differ
          - Some features are different
  31. The InChI Resolver Protocol
      - There will not be only one InChI resolver; there will be many
          - Publishers
          - Commercial databases
          - Free services and resources: PubChem, ChemSpider, NCI Database, ChEBI
      - Resolvers will not be mirrors of each other
          - There is no need to mirror when a protocol is in place
  32. InChI Resolver Protocol
      - InChI resolving needs to be federated
      - A common protocol can connect resolvers so that a user gets a complete result set
      - Individual resolvers can have different capabilities but share an agreed common protocol for resolving InChIs
  33. Discuss with us on Google Groups
      - Draft protocol for ACS Spring 2010 from:
          - RSC ChemSpider
          - NCI/CADD
          - PubChem
          - Symyx
      - Proof of concept hopefully by the end of this year for initial feedback (NCI and ChemSpider)
      - Join us at http://tinyurl.com/r7q9zc (http://groups.google.com/group/inchiresolverprotocol)
  34. InChI Trust
      - The founder members of the Trust: Elsevier, Thomson Reuters, Wiley, Nature Publishing Group, Royal Society of Chemistry, Symyx, FIZ-Chemie, Taylor & Francis and OpenEye
  35. In InChIs We Trust
      - It was said…
          - “There is a finite, but very small probability of finding two structures with the same InChIKey.”
          - The first collision was announced on Sunday by Jonathan Goodman
  36. Spongistatin
  37. Probabilities are what they are…
      - “The molecule for which a collision has been reported … gives rise to 2^26 = 67,108,864 possible stereoisomers”
      - The probability of a clash is low but finite… and it happened.
      - Or… there may be a bug; work is underway
  38. The Future
      - InChI is here
      - InChIKeys are proliferating
      - The need for lookup is inevitable; the need for federated resolvers is obvious
      - Intention to provide a draft resolver protocol by the end of the year
      - ACS Spring: unveil proof of concept
  39. Acknowledgments
      - The InChI “Team”: leadership team, developers, advisors, funders and the community providing feedback
      - Royal Society of Chemistry
  40. Thank you
      - [email_address]
      - Twitter: ChemSpiderman
      - www.chemspider.com/blog
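
Slides 27 and 31 to 33 argue for federated resolvers joined by a common protocol rather than mirrored databases. The draft protocol is not specified in these slides, so the following Python sketch assumes only that every resolver can answer the same question (InChIKey in, list of hits out). `Resolver`, `Hit` and `federated_resolve` are hypothetical names and are not part of any published InChI Resolver Protocol. Each resolver is queried in parallel, and each result is tagged with its source before merging. A resolver that raises an error is reported back instead of aborting the whole query.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    source: str     # name of the resolver that returned the record
    record_id: str  # identifier of the record inside that source


class Resolver(Protocol):
    """Hypothetical common interface every federated resolver agrees on."""
    name: str

    def resolve(self, inchikey: str) -> list[Hit]: ...


def federated_resolve(resolvers: list[Resolver], inchikey: str) -> tuple[list[Hit], list[str]]:
    """Ask every resolver in parallel; return merged hits and the names that failed."""
    hits: list[Hit] = []
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(resolvers))) as pool:
        futures = [(r.name, pool.submit(r.resolve, inchikey)) for r in resolvers]
        for name, future in futures:  # results collected in the order resolvers were given
            try:
                hits.extend(future.result())
            except Exception:  # one unavailable resolver should not sink the whole query
                failed.append(name)
    # Drop exact duplicates while keeping first-seen order.
    merged = list(dict.fromkeys(hits))
    return merged, failed
```

Every hit keeps its source. This reflects the point that resolvers "will not be mirrors of each other": the merged set can hold different records from different databases for the same key, and the caller can see which service contributed each one.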
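
Slide 37 quotes 2^26 = 67,108,864 possible stereoisomers for the molecule involved in the reported collision. Stereoisomers share the same connectivity, so they share the first InChIKey block, and only the second block can tell them apart. A back-of-envelope birthday-problem estimate shows why that count matters. The sketch below assumes the hash behaves like a uniform random function. It also takes the first and second blocks to carry about 65 and 37 bits respectively; these figures follow the InChI technical documentation but are used here as assumptions. The function names are illustrative.

```python
import math


def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Birthday estimate of how many pairs of n_items share a hash value.

    Assumes each item is mapped uniformly at random onto 2**hash_bits values.
    """
    n_pairs = n_items * (n_items - 1) // 2  # exact integer count of pairs
    return n_pairs / 2 ** hash_bits


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate P(at least one shared hash) as 1 - exp(-expected pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_items, hash_bits))


if __name__ == "__main__":
    stereoisomers = 2 ** 26  # count quoted on slide 37
    print(expected_colliding_pairs(stereoisomers, 37))  # second block, assumed 37 bits
    print(collision_probability(stereoisomers, 37))
    print(collision_probability(10 ** 8, 65))  # hypothetical 10^8 skeletons, first block, assumed 65 bits
```

Under these assumptions, 2^26 stereoisomers in a 37-bit space give about 2^14 ≈ 16,384 expected colliding pairs, so a clash among them is close to certain. A hypothetical database of 100 million distinct skeletons in the 65-bit first block stays far below one expected collision. The estimate cannot say whether a particular reported collision came from the hash or from a bug, which is the open question on the slide.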