Oops and Downs of Resolving InChIs For the Chemistry Community

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the underlying technologies associated with the InChI resolver, and how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss present limitations to application of the resolver for providing access to databases and chemistry information distributed across the internet and define our vision for enhancing interconnectivity across Open databases using the InChI resolver as the glue.

    Presentation Transcript

    • Oops and downs of resolving InChIs for the chemistry community
    • The InChI Has Arrived
      • My opinions :
      • The InChI is a crucial part of the future of structure-based relationships on the web
      • The semantic web of chemistry will sit on the shoulders of InChI until there is something better
      • InChIs and publishers already have a relationship – publishers who have not adopted InChI will follow
    • PPP – Perfection vs Productive vs Prolific
      • The InChI is not perfect
      • There are limitations but they are acknowledged and in discussion
      • The InChI is very “productive”
      • InChIs are showing up in databases, manuscripts, spreadsheets, publications and software
    • A Lot of Variability in InChIs
      • Source: Unofficial InChI FAQ page
    • InChI Strings Hash to InChIKeys
    • HVYWMOMLDIMFJA-DPAQBDIFSA-N
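
The key on this slide can be taken apart to show what the hash encodes. The following is a minimal sketch, assuming the 27-character InChIKey layout (14 letters, then 8 letters plus a flag letter and a version letter, then 1 protonation letter). The function name `split_inchikey` and the field names are hypothetical, chosen for illustration, and are not part of any InChI software.

```python
import re
from typing import NamedTuple

# Assumed layout of a 27-character InChIKey:
# 14 letters - 8 letters + flag letter + version letter - 1 letter
_INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    connectivity: str   # hash of the connectivity (skeleton) layers
    other_layers: str   # hash of the remaining layers (stereo, isotopes, ...)
    flag: str           # "S" = standard InChI, "N" = non-standard
    version: str        # "A" = InChI version 1
    protonation: str    # "N" = neutral


def split_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = _INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed 27-character InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


parts = split_inchikey("HVYWMOMLDIMFJA-DPAQBDIFSA-N")
print(parts.connectivity, parts.other_layers, parts.flag)
```

Because the key is a hash, it cannot be turned back into a structure; it can only be looked up, which is why a resolver is needed at all.
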
    • The InChI Resolver
    • inchis.chemspider.com
    • Resolve an InChI or InChIKey
    • Resolved
    • Connection Only Resolving
    • InChIs and Big Databases
      • There appears to be a bigger is better mentality with online databases
      • InChI has shown a lot of “overlap” in the ChemSpider database
      • Distinction : a unique chemical entity versus what it’s meant to be
      • Some simple examples …
    • Spot The Difference
    • Standard InChIKeys
    • Spot The Difference
    • 55 Hits in 0.08 Seconds
    • Large Databases Contain Junk
      • InChI resolvers will get us back to results, but resolving is only a lookup
      • There is an enormous need for curation and linking resolved structures to “correct” structures – a manual task
    • Generate-It
    • Draw and generate
    • Generate
    • All Flavors
    • Historical and Future InChIs
      • The Standard InChI removed variability
      • There will be new variants in the future
      • There are already millions of historical InChIs “out there”
      • Resolvers should accommodate historical and future InChIs
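
One way a resolver could tell historical InChIs from Standard InChIs is to inspect the prefix: Standard InChI strings begin with `InChI=1S/`, while non-standard version 1 strings begin with `InChI=1/`. The sketch below assumes only that prefix convention; the function name `describe_inchi` is hypothetical.

```python
def describe_inchi(inchi: str) -> dict:
    """Report the version and standard/non-standard status from the prefix."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    # The header is everything between "InChI=" and the first "/",
    # for example "1S" (standard) or "1" (non-standard).
    header = inchi[len(prefix):].split("/", 1)[0]
    is_standard = header.endswith("S")
    version = header[:-1] if is_standard else header
    return {"version": version, "standard": is_standard}


print(describe_inchi("InChI=1S/CH4/h1H4"))
print(describe_inchi("InChI=1/CH4/h1H4"))
```

A resolver that records this flag alongside each stored identifier can accept both kinds of input without confusing them.
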
    • In Our Resolver…
    • On to ChemSpider…
    • NEW Patents and Pubmed on ChemSpider
    • InChIs to Patents and Pubmed Articles
    • But there will be multiple resolvers…
      • Each publisher, database, scientist can choose not to publish their structures into a centralized database
      • There are many large online databases. There is no need to merge/mirror them – each can be a resolver
      • They need to be federated
    • Many ways to address resolving
      • Our approach is simple – lookup. We look up the structure. SIMPLE.
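
The lookup idea, together with the connectivity-only matching mentioned earlier, can be sketched as a pair of in-memory indexes. This is an illustration under stated assumptions, not the ChemSpider implementation: the class `LookupResolver`, its methods, and the record identifiers are all hypothetical, and the second key is only an example of a key sharing the same first block.

```python
from collections import defaultdict


class LookupResolver:
    """Toy in-memory resolver mapping InChIKeys to record identifiers."""

    def __init__(self):
        self._by_key = defaultdict(set)        # full InChIKey -> record ids
        self._by_skeleton = defaultdict(set)   # first block only -> record ids

    def add(self, inchikey, record_id):
        self._by_key[inchikey].add(record_id)
        self._by_skeleton[inchikey.split("-")[0]].add(record_id)

    def resolve(self, inchikey, connectivity_only=False):
        """Return sorted record ids; optionally match on the first block only."""
        if connectivity_only:
            skeleton = inchikey.split("-")[0]
            return sorted(self._by_skeleton.get(skeleton, ()))
        return sorted(self._by_key.get(inchikey, ()))


resolver = LookupResolver()
resolver.add("HVYWMOMLDIMFJA-DPAQBDIFSA-N", "record-1")   # hypothetical ids
resolver.add("HVYWMOMLDIMFJA-UHFFFAOYSA-N", "record-2")
print(resolver.resolve("HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
print(resolver.resolve("HVYWMOMLDIMFJA-DPAQBDIFSA-N", connectivity_only=True))
```

The exact lookup returns only the record stored under the full key, while the connectivity-only lookup also returns records that share the skeleton but differ in other layers.
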
    • NCI/CADD resolver: 69 million structures
    • Differences
      • The NCI and ChemSpider Resolvers are “different”
        • Different databases behind the resolver – Feedback from NCI: “Preliminary results indicate that inchis.chemspider.com can resolve approx. 28% of our structures.”
        • Our approaches for resolving differ
        • Some features are different
    • The InChI Resolver Protocol
      • There will not be only one InChI Resolver – there will be many
        • Publishers
        • Commercial Databases
        • Free services and resources : PubChem, ChemSpider, NCI Database, ChEBI
      • Resolvers will not be mirrors of each other
        • There is no need to mirror when a protocol is in place
    • InChI Resolver Protocol
      • InChI resolving needs to be federated
      • A common protocol can connect resolvers so that a user gets a complete results set
      • Individual resolvers can have different capabilities but an agreed common protocol for resolving InChIs
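
The federation idea can be sketched as one query fanned out to several independent resolvers, with the answers merged. The slides do not specify the protocol, so everything here is an assumption made for illustration: each resolver is modelled as a plain callable that takes an InChIKey and returns hits, and the name `federated_resolve` is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: a resolver is any callable taking an InChIKey and returning hits.
Resolver = Callable[[str], Iterable[str]]


def federated_resolve(
    inchikey: str, resolvers: Dict[str, Resolver]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Query every resolver and return (hits per resolver, unavailable resolvers)."""
    hits: Dict[str, List[str]] = {}
    unavailable: List[str] = []
    for name, resolver in resolvers.items():
        try:
            found = list(resolver(inchikey))
        except Exception:
            # One failing resolver must not hide the answers of the others,
            # but the caller should know the result set may be incomplete.
            unavailable.append(name)
            continue
        if found:
            hits[name] = found
    return hits, unavailable
```

Returning the list of unavailable resolvers alongside the hits lets a client say whether the merged result set is complete, which is the point of agreeing on a common protocol rather than mirroring databases.
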
    • Discuss with us on Google Groups
      • Draft protocol for ACS Spring 2010 from
        • RSC ChemSpider
        • NCI/CADD
        • PubChem
        • Symyx
      • Proof of concept hopefully by the end of this year for initial feedback (NCI and ChemSpider)
      • Join us at http://tinyurl.com/r7q9zc or http://groups.google.com/group/inchiresolverprotocol
    • InChI trust
      • The founder members of the Trust: Elsevier, Thomson Reuters, Wiley, Nature Publishing Group, Royal Society of Chemistry, Symyx, FIZ-Chemie, Taylor &amp; Francis and OpenEye
    • In InChIs We Trust
      • It was said….
        • “There is a finite, but very small probability of finding two structures with the same InChIKey.”
        • The first collision was announced on Sunday by Jonathan Goodman
    • Spongistatin
    • Probabilities are what they are…
      • “The molecule for which a collision has been reported … gives rise to 2^26 = 67,108,864 possible stereoisomers”
      • The probability of a clash is low but finite…and it happened.
      • OR…there may be a bug…work underway
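
The "low but finite" claim can be made concrete with the standard birthday-bound approximation. The sketch below is an illustration only. It assumes the hash behaves uniformly at random, and the 37-bit size used for the stereo-sensitive second block of the key is an assumption that does not come from the slides. The function names are hypothetical.

```python
import math


def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Expected number of colliding pairs: n(n-1) / (2 * 2**bits)."""
    return n_items * (n_items - 1) / (2 * 2 ** hash_bits)


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    return -math.expm1(-expected_colliding_pairs(n_items, hash_bits))


stereoisomers = 2 ** 26        # figure quoted on the slide
assumed_bits = 37              # ASSUMPTION: size of the second hash block
print(expected_colliding_pairs(stereoisomers, assumed_bits))
print(collision_probability(stereoisomers, assumed_bits))
```

Under these assumptions, a collision somewhere among all stereoisomers of a single skeleton would be expected rather than surprising. This does not settle whether the reported clash is a genuine hash collision or a bug.
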
    • The Future
      • InChI is here
      • InChIKeys are proliferating
      • The need for lookup is inevitable – the need for federated resolvers is obvious
      • Intention to provide draft resolver protocol by end of year
      • ACS Spring – unveil proof of concept
    • Acknowledgments
      • The InChI “Team” – leadership team, developers, advisors, funders and the community providing feedback
      • Royal Society of Chemistry
    • Thank you [email_address] Twitter: ChemSpiderman www.chemspider.com/blog