Going a mile InChI by InChI : Enabling online chemistry at ChemSpider

The task of finding chemical information online can be daunting, since even the most rudimentary query on Google can return tens to hundreds of thousands of links to peruse. While the number of online chemical structure databases has increased, there has not been a central online resource allowing integrated chemical structure searching of chemistry databases, chemistry articles, patents and web pages such as blogs and wikis, until now. ChemSpider provides a significant knowledge base and resource for chemists working in different domains. From the perspective of the InChI identifiers, this project can be considered a success story, since ChemSpider has used them both for the development of the database and for the provision of fast searching routines. ChemSpider has provided web services for both InChI generation and searching, leading to a proliferation of InChI in the web-based domain of chemistry. This talk will provide an update on ChemSpider's functionality.

Transcript

  • 1. Going a mile InChI by InChI: Enabling online chemistry at ChemSpider. Antony Williams
  • 2. Languages of Chemistry
  • 3. ChemSpider 2009
    • “Building a Structure Centric Community for Chemists”
    • Hosting structures, spectra, images, documents, outlinks
    • Many web services for retrieval of data, conversion of files and generation of properties
    • Now a platform for:
      • data deposition, curation and annotation – remove the junk
      • Supporting Open Notebook Science efforts
      • chemistry document mark-up with ChemMantis
      • the online ChemSpider Journal of Chemistry
  • 4. Statistics and Connections
    • >6000 unique users per day on average
    • >40,000 transactions per day
    • >21.4 million compounds and growing daily
    • Advocate of InChIs for searching and integration
  • 5. Search Cholesterol
  • 6. Search Cholesterol
  • 7. Search Cholesterol
  • 8. Search Cholesterol
  • 9. Search Cholesterol
  • 10. Search Cholesterol
  • 11. Searching
    • Structure searching based on
      • SMILES
      • InChIString
      • InChIKey
      • StdInChI
      • StdInChIKey
      • molfile uploads
      • structures drawn in applet
    • Search across Google (up to the query string length limit for InChIStrings)
      • Skeleton search
      • Full structure search
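The text-based query types on this slide can be told apart by their shape alone. The following is a minimal sketch of that idea, not ChemSpider's own parsing logic: the function name `classify_query` and the regular expressions are assumptions based on the published layouts of the identifiers (an `InChI=1S/` prefix for standard InChI, a 27-character three-block standard InChIKey, and the older two-block 25-character key seen later in this deck).

```python
import re

# Assumed patterns, derived from the identifier layouts rather than
# from any ChemSpider source code.
STD_INCHI = re.compile(r"^InChI=1S/")
ANY_INCHI = re.compile(r"^InChI=1S?/")
# Standard InChIKey: 14 letters, then 8 letters + "S" flag + version letter,
# then a single trailing letter.
STD_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{8}S[A-Z]-[A-Z]$")
# Any InChIKey-shaped string, including the older two-block form.
ANY_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}(-[A-Z])?$")


def classify_query(text):
    """Guess which kind of structure query a text string represents."""
    query = text.strip()
    if STD_INCHI.match(query):
        return "StdInChI"
    if ANY_INCHI.match(query):
        return "InChIString"
    if STD_INCHIKEY.match(query):
        return "StdInChIKey"
    if ANY_INCHIKEY.match(query):
        return "InChIKey"
    if "\n" in query and "M  END" in query:
        return "molfile"
    # Fallback only: this sketch does not validate SMILES syntax.
    return "SMILES"


if __name__ == "__main__":
    for example in ("HVYWMOMLDIMFJA-DPAQBDIFSA-N",
                    "RCINICONZNJXQF-CLDWUXIMDD",
                    "InChI=1/CH4/h1H4",
                    "CCO"):
        print(example, "->", classify_query(example))
```

A real search front end would validate each candidate with a chemistry toolkit before dispatching the query; the pattern check here only decides which validator to try first.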
  • 12. InChIKey Searches Work
  • 13. Depositions
    • Depositions from users – single structures and SDFs
    • Depositions from databases/vendors – SDF files
    • And then came InChIs…
      • InChIs and InChIKeys are available on Blogs for harvesting
      • Publishers are making their structures available as InChIs for harvesting
      • InChIs are NOT ideal for building a database…some lessons
      • We want to link to publications especially…
  • 14. Chemistry Papers
    • Cultivation of a rare Verrucosispora strain (sediment, Sea of Japan) gave three polyketides, atrop-abyssomicin C 35, abyssomicin G 36 and abyssomicin H 37. Atrop-abyssomicin C 35 has previously been reported as a synthetic compound, but ready conversion to abyssomicin D suggests that it was probably naturally produced. Atrop-abyssomicin C was an inhibitor of S. aureus N315 (MRSA) and 4-amino-4-deoxychorismate (ADC) synthase. The tenacibactins A–D 38–41, hydroxamate siderophores isolated from culture of the filamentous bacterium Tenacibaculum sp. (Chondrus ocellatus, Awajishima Island, Japan), all possessed iron-chelating activity, with tenacibactins C 40 and D 41 being considerably more effective than tenacibactins A 38 and B 39.
  • 15. Structures in Chemistry Papers
  • 16. Aesthetics vs Machine Readable
    • Beautiful chemical structures submitted by authors can be beasts for machines
  • 17. InChI Representation
  • 18. InChI’fication of Articles
    • InChIs from publishers – a lot of work for a publisher to provide exact structures for articles. Applause to RSC for Project Prospect and now Nature Chemistry
    • An enormous editorial task with a massive benefit to the community
    • If the structures were correct…imagine a centralized DOI:InChI database
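To make the idea of a centralized DOI:InChI database concrete, here is a minimal in-memory sketch of a two-way index between article DOIs and compound identifiers. Everything in it is illustrative: the class name `DoiInchiIndex`, its methods and the placeholder DOIs are hypothetical, and keying on InChIKeys rather than full InChI strings is a design assumption, not something the talk specifies.

```python
from collections import defaultdict


class DoiInchiIndex:
    """Hypothetical two-way lookup between DOIs and InChIKeys."""

    def __init__(self):
        self._dois_by_key = defaultdict(set)
        self._keys_by_doi = defaultdict(set)

    def add(self, doi, inchikey):
        """Record that the article with this DOI mentions this compound."""
        self._dois_by_key[inchikey].add(doi)
        self._keys_by_doi[doi].add(inchikey)

    def dois_for(self, inchikey):
        """All articles known to mention a compound, sorted for stable output."""
        return sorted(self._dois_by_key.get(inchikey, ()))

    def compounds_in(self, doi):
        """All compounds recorded for an article, sorted for stable output."""
        return sorted(self._keys_by_doi.get(doi, ()))


if __name__ == "__main__":
    index = DoiInchiIndex()
    # Placeholder DOIs, invented for illustration only.
    index.add("10.9999/example.0001", "HVYWMOMLDIMFJA-DPAQBDIFSA-N")
    index.add("10.9999/example.0002", "HVYWMOMLDIMFJA-DPAQBDIFSA-N")
    index.add("10.9999/example.0002", "RCINICONZNJXQF-MZXODVADBJ")
    print(index.dois_for("HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
    print(index.compounds_in("10.9999/example.0002"))
```

The index is only as useful as the deposited structures are correct, which is the caveat the slide itself raises.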
  • 19. Cleaning Structures
  • 20. Converting InChIs to Structures Bacitracin A
    • InChI=1/C66H103N17O16S/......./t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1
    • InChI=1/C66H103N17O16S/.......)/t35?,36?,37?,40-,41+,42+,43-,44+,45-,46-,47+,48?,52-,53-,54-/m0/s1
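The two Bacitracin A strings differ only in how four stereocenters are flagged in the `/t` layer (`u` versus `?`). The sketch below pulls the parities out of that layer so the unresolved centers can be listed. It is a simplified illustration: the function names are hypothetical, it handles a single-component `/t` layer only, and it is run on the stereo fragment shown on the slide because the full strings are elided there.

```python
import re


def stereo_parities(inchi_fragment):
    """Return {atom number: parity mark} from the /t layer of an InChI string.

    Simplified: assumes a single-component layer (no ';' separators or
    multipliers) and tolerates whitespace between number and mark.
    """
    layer = re.search(r"/t([^/]*)", inchi_fragment)
    if layer is None:
        return {}
    parities = {}
    for entry in layer.group(1).split(","):
        parsed = re.fullmatch(r"(\d+)\s*([-+u?])", entry.strip())
        if parsed:
            parities[int(parsed.group(1))] = parsed.group(2)
    return parities


def unresolved_centers(inchi_fragment):
    """Atom numbers whose parity is marked 'u' or '?' rather than '+' or '-'."""
    parities = stereo_parities(inchi_fragment)
    return sorted(atom for atom, mark in parities.items() if mark in ("u", "?"))


if __name__ == "__main__":
    fragment = ("/t35u,36u,37u,40-,41+,42+,43-,44+,45-,"
                "46-,47+,48u,52-,53-,54-/m0/s1")
    print(unresolved_centers(fragment))
```

Listing the flagged atoms this way makes it easy to see which centers a curator would need to check against the drawn structure after a round trip through InChI.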
  • 21.  
  • 22. Converting InChIs to Structures
    • What we want is a good layout, retention of stereochemistry labels and tautomers as drawn
  • 23. AuxInfo – Who Uses It? Who Converts It?
    • AuxInfo=1/1/N:52,24,90,40,41,50,100,60,51,23,61,68,66,67,17,16,83,65,64,15,81,30,28,84,92,37,62,99,74,71,13,45,11,39,49,22,59,63,9,88,79,31,35,95,98,75,70,42,76,25,5,47,19,56,77,86,78,26,34,93,7,6,53,18,55,44,85,2,48,12,91,10,80,33,14,38,96,73,69,94,43,58,21,1,27,32,3,4,89,87,82,29,36,97,8,72,54,20,57,46/E:(4,5)(13,14)(18,19)(85,86)(87,88)/it:im/rA:100cONOOCCCOCNCNCNCCCCCONCCCCCOCOCCONCCOCNCCCCNCCSCNCCCCCOCCONCCCCCCCCCCNCCONCCCCCCNCOCCNCOCOCNCCNCNOCCC/rB:;;;s3d4;;;d7;;s9;s10;d11;d9s12;;;;s15s16;s14;s18;d18;N19;s19;s22;s23;;s21;d25;s25;d26;s28;s26s30;s25;N31;s33;s34;d34;s35;N35;s37;s39;s39;;s42;d43;s42;s44s45;s44;s47;s47;s49;s49;s51;s38s42;d53;;s55;d55;P56;s56;s59;s59;;s62;d63;s63;d65;s64;s66d67;s7;s6p69;s5s70;d6;s6;;n73s74;d1s2s74;s75;s58;s78;N79;s79;d78;s81;s83;s84;s80;d86;p14s15s86;d77;s61;s77;s16s91;;s55;s62s93p94;s93;d93;s7n96;s9s98;s22;/rC:12.1656,-8.504,0;12.1656,-6.2884,0;16.5968,-3.1336,0;15.3445,-5.2769,0;16.5968,-4.5785,0;16.7654,-7.5166,0;20.0406,-7.5166,0;20.0406,-6.2402,0;22.208,-6.2161,0;23.2436,-5.4937,0;22.8582,-4.2895,0;21.6059,-4.2895,0;21.1725,-5.4937,0;19.294,-19.028,0;16.067,-19.6542,0;11.2264,-17.9202,0;13.1048,-19.6542,0;20.45,-18.426,0;21.5337,-19.028,0;20.45,-17.1255,0;21.5578,-21.1954,0;22.6174,-18.4019,0;22.6174,-17.1255,0;23.7252,-16.4512,0;19.2459,-23.9168,0;22.7137,-21.8698,0;19.2699,-25.2654,0;20.4018,-23.2425,0;23.8697,-21.1714,0;21.5819,-23.8927,0;22.7378,-23.1943,0;18.0899,-23.2665,0;23.9179,-23.8686,0;25.0497,-23.1702,0;26.2298,-23.8686,0;25.0497,-21.8457,0;27.3617,-23.1702,0;26.2298,-25.2172,0;27.3617,-21.8457,0;28.5417,-21.1714,0;26.2298,-21.1714,0;28.5417,-25.2172,0;29.8181,-25.6025,0;30.6128,-24.5188,0;28.5417,-23.8686,0;29.8181,-23.411,0;31.9614,-24.5188,0;32.6357,-25.6989,0;32.6357,-23.3629,0;31.9614,-22.2069,0;33.9603,-23.3629,0;34.6346,-24.5188,0;27.3617,-25.8675,0;27.3617,-27.2161,0;20.0165,-11.3457,0;18.9328,-11.9959,0;20.0165,-10.0452,0;18.9087,-13.3686,0;17.825,-11.3457,0;16.7172,-11.9959,0;17.825,-10.0693,0;23.58
07,-11.7551,0;23.5807,-13.0315,0;24.7367,-13.6817,0;22.5211,-13.6817,0;22.5211,-14.9822,0;24.7367,-14.934,0;23.5807,-15.5842,0;18.9328,-8.1427,0;17.8009,-7.5166,0;17.8009,-5.2769,0;16.091,-6.3365,0;16.1633,-8.6003,0;14.0681,-7.3962,0;14.7906,-8.6003,0;12.8158,-7.3962,0;14.0681,-9.8044,0;17.5119,-13.3686,0;16.8135,-14.5728,0;17.5119,-15.7528,0;15.4408,-14.5728,0;17.0699,-12.6613,0;14.7665,-15.7528,0;13.3697,-15.7528,0;12.6713,-16.9569,0;16.8135,-16.9569,0;15.4408,-16.9569,0;17.5119,-18.137,0;14.7906,-10.9845,0;19.0291,-9.395,0;12.6954,-9.8044,0;11.2264,-11.1049,0;22.2321,-10.0452,0;21.1243,-11.9478,0;22.2321,-11.3457,0;21.1243,-9.4191,0;23.3399,-9.4191,0;21.1243,-8.1186,0;22.2321,-7.5166,0;23.7252,-19.028,0;
  • 24. Who Has Responsibility?
    • Who will take responsibility for drawing/enumerating the structures?
    • Where can software contribute?
    • What quality is “good enough”?
    • We MUST reduce rework!!!
  • 25. A Lot of Variability in InChIs
    • Source: Unofficial InChI FAQ page
  • 26. InChIs for Taxol
  • 27. Taxol
    • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
    • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
    • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
    • Which one is correct???
  • 28. InChIKeys for Taxol
    • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
    • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
    • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
    • ChEBI and Wikipedia are the SAME structure
    • DrugBank is a DIFFERENT structure at ONE stereocenter
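The three Taxol keys share their first block and differ in the second. Since the first block of an InChIKey is derived from the connectivity part of the InChI and the second from the remaining layers, a block-wise comparison gives a quick first look. This is a minimal sketch with hypothetical function names. A differing second block shows only that something beyond connectivity differs in the source InChI strings; as the slide points out, deciding which entries are really the same structure takes inspection of the strings themselves.

```python
def split_key(inchikey):
    """Return the first (connectivity) block and the second block of a key."""
    blocks = inchikey.strip().split("-")
    if len(blocks) < 2:
        raise ValueError("not an InChIKey-shaped string: %r" % inchikey)
    return blocks[0], blocks[1]


def compare_keys(key_a, key_b):
    """Coarse comparison of two InChIKeys, block by block."""
    skeleton_a, rest_a = split_key(key_a)
    skeleton_b, rest_b = split_key(key_b)
    if skeleton_a != skeleton_b:
        return "different skeleton"
    if rest_a != rest_b:
        return "same skeleton, different second block"
    return "identical"


if __name__ == "__main__":
    taxol_keys = {
        "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
        "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
        "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
    }
    sources = sorted(taxol_keys)
    for i, first in enumerate(sources):
        for second in sources[i + 1:]:
            result = compare_keys(taxol_keys[first], taxol_keys[second])
            print(first, "vs", second, ":", result)
```

Matching on the first block alone is the idea behind the skeleton search mentioned earlier in the deck, while a full structure search requires the whole key to match.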
  • 29. Does one stereocenter matter?
    • Thalidomide, sold as Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn and Softenon
  • 30. InChIStrings Hash to InChIKeys
  • 31. The InChI Resolver
  • 32. HVYWMOMLDIMFJA-DPAQBDIFSA-N
  • 33.  
  • 34. Resolve-It
  • 35. Resolve-It
  • 36. Pretty-It
  • 37. JMol-It, Download-It and Zoom-It
  • 38. Kind-of-Resolve-It
  • 39. Generate-It
  • 40. Draw-It : Thanks Symyx (Beta release)
  • 41. Generate-It
  • 42. All Flavors
  • 43. Serve Up Services
  • 44. And Once It’s Resolved…
  • 45. Out to ChemSpider…and its resources
  • 46. COMING: InChI Resolver to DOIs
  • 47. Full Text-Based Literature Searching to DOIs Including Citations Now
  • 48. When Structures are “Connected”
  • 49. When Structures are “Connected”
  • 50. ChemSpider Everywhere
    • Linked from Wikipedia
    • Linked from Open Notebook Science sites using EMBED
    • Linked from Blogs using Structure/Spectra EMBED
    • Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
    • Integrated into software offerings from Thermo, Waters, Agilent, Bruker
  • 51. ChemSpider Everywhere Embed Functionality (like YouTube)
  • 52. ChemSpider Everywhere www.spectralgame.com
  • 53. ChemSpider Everywhere Crowdsourced Curation of Spectra
  • 54. ChemSpider Everywhere RSC Compounds
  • 55. ChemSpider Everywhere Nature Chemistry
    • Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Users can choose to view the article with all of the compounds highlighted, and find out more about those compounds by linking out to other information resources including PubChem and ChemSpider.
  • 56. ChemSpider Everywhere ChemMobi
  • 57. Structure RSS Feeds with InChIs
  • 58. InChIs are Incomplete
    • What is NOT supported, yet:
      • polymers
      • organometallics
      • Markush structures
      • 3-D structures
      • excited states
      • interlocking structures (e.g. rotaxanes)
      • host-guest complexes
  • 59. Progressing InChI
    • Highest priority for the InChI Team is communication with structure drawing package vendors – THE interfaces to the users
    • For the InChI Resolver : Delivery of services to allow publishers to deposit their structure collections with associated DOIs to ChemSpider
    • Not every structure is important…Discussions with Publishers to discern primary compounds
  • 60. Conclusions
    • InChIs and Internet Chemistry
    • http://inchis.chemspider.com
  • 61. Acknowledgments
    • Richard Kidd, Royal Society of Chemistry
    • Keith Taylor, Symyx
    • Chris Singleton, Steven Bachrach and Alan McNaught for feedback
    • “The InChI team”