Going a mile InChI by InChI: Enabling online chemistry at ChemSpider


    Going a mile InChI by InChI: Enabling online chemistry at ChemSpider - Presentation Transcript

    1. Going a mile InChI by InChI: Enabling online chemistry at ChemSpider (Antony Williams)
    2. Languages of Chemistry
    3. ChemSpider 2009
      • “Building a Structure Centric Community for Chemists”
      • Hosting structures, spectra, images, documents, outlinks
      • Many web services for retrieval of data, conversion of files, generation of properties...
      • Now a platform for:
        • data deposition, curation and annotation – remove the junk
        • Supporting Open Notebook Science efforts
        • chemistry document mark-up with ChemMantis
        • the online ChemSpider Journal of Chemistry
    4. Statistics and Connections
      • >6000 unique users per day on average
      • >40,000 transactions per day
      • >21.4 million compounds and growing daily
      • Advocate of InChIs for searching and integration
    5. Search Cholesterol
    6. Search Cholesterol
    7. Search Cholesterol
    8. Search Cholesterol
    9. Search Cholesterol
    10. Search Cholesterol
    11. Searching
      • Structure searching based on
        • SMILES
        • InChIString
        • InChIKey
        • StdInChI
        • StdInChIKey
        • molfile uploads
        • structures drawn in applet
      • Search across Google (up to the query string length limit for InChIStrings)
        • Skeleton search
        • Full structure search
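A hedged sketch of how the two text-search modes could be built from identifiers. It relies on two properties of InChIKeys: they have a fixed length, so they stay under a search engine's query limit when a long InChIString does not, and their first hyphen-separated block hashes the connectivity only, which is what a skeleton search needs. The function names and the `max_query_len` default are illustrative assumptions; the slide does not state the actual limit or how ChemSpider builds its queries.

```python
def skeleton_block(inchikey: str) -> str:
    """Return the first block of an InChIKey (the connectivity hash)."""
    return inchikey.split("-")[0]


def search_terms(inchi: str, inchikey: str, max_query_len: int = 128) -> dict:
    """Build quoted search terms for a full-structure and a skeleton search.

    max_query_len is a hypothetical limit chosen for illustration only.
    """
    # Use the InChI string when it fits, otherwise fall back to the full key.
    full = inchi if len(inchi) <= max_query_len else inchikey
    return {
        "full_structure": f'"{full}"',
        "skeleton": f'"{skeleton_block(inchikey)}"',
    }


if __name__ == "__main__":
    key = "HVYWMOMLDIMFJA-DPAQBDIFSA-N"  # key shown on slide 32
    too_long = "InChI=1S/" + "X" * 300    # placeholder, not a real InChI
    print(search_terms(too_long, key))
```

Quoting the terms asks the search engine for an exact match, which matters because a partial match on a hash is meaningless.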
    12. InChIKey Searches Work
    13. Depositions
      • Depositions from users – single structures and SDFs
      • Depositions from databases/vendors – SDF files
      • And then came InChIs…
        • InChIs and InChIKeys are available on Blogs for harvesting
        • Publishers are making their structures available as InChIs for harvesting
        • InChIs are NOT ideal for building a database…some lessons
        • We want to link to publications especially…
    14. Chemistry Papers
      • Cultivation of a rare Verrucosispora strain (sediment, Sea of Japan) gave three polyketides, atrop-abyssomicin C 35, abyssomicin G 36 and abyssomicin H 37. Atrop-abyssomicin C 35 has previously been reported as a synthetic compound, but ready conversion to abyssomicin D suggests that it was probably naturally produced. Atrop-abyssomicin C was an inhibitor of S. aureus N315 (MRSA) and 4-amino-4-deoxychorismate (ADC) synthase. The tenacibactins A–D 38–41, hydroxamate siderophores isolated from culture of the filamentous bacterium Tenacibaculum sp. (Chondrus ocellatus, Awajishima Island, Japan), all possessed iron-chelating activity with tenacibactins C 40 and D 41 being considerably more effective than tenacibactins A 38 and B 39.
    15. Structures in Chemistry Papers
    16. Aesthetics vs Machine Readable
      • Beautiful chemical structures submitted by authors can be beasts for machines
    17. InChI Representation
    18. InChI’fication of Articles
      • InChIs from publishers – a lot of work for a publisher to provide exact structures for articles. Applause to RSC for Project Prospect and now Nature Chemistry
      • An enormous editorial task with a massive benefit to the community
      • If the structures were correct…imagine a centralized DOI:InChI database
    19. Cleaning Structures
    20. Converting InChIs to Structures Bacitracin A
      • InChI=1/C66H103N17O16S/......./t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1
      • InChI=1/C66H103N17O16S/.......)/t35?,36?,37?,40-,41+,42+,43-,44+,45-,46-,47+,48?,52-,53-,54-/m0/s1
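The two Bacitracin A strings differ only in how four stereocentres are marked in the `/t` layer, where each entry is an atom number followed by a parity: `+` or `-` for a defined centre, `u` for one marked unknown, `?` for one left undefined. A minimal sketch of listing the unresolved centres follows. The sample string keeps only the formula and stereo layers from the slide (the elided layers are left out, not reconstructed), and the function names are illustrative.

```python
import re


def stereo_centers(inchi: str) -> dict:
    """Map atom number -> parity mark ('+', '-', 'u' or '?') from the first /t layer."""
    match = re.search(r"/t([^/]*)", inchi)
    if match is None:
        return {}
    # Allow optional whitespace between the number and its mark.
    pairs = re.findall(r"(\d+)\s*([+\-u?])", match.group(1))
    return {int(number): mark for number, mark in pairs}


def unresolved_centers(inchi: str) -> list:
    """Return atom numbers whose parity is unknown ('u') or undefined ('?')."""
    return sorted(n for n, mark in stereo_centers(inchi).items() if mark in "u?")


# Abbreviated for illustration: connectivity layers omitted.
FRAGMENT = (
    "InChI=1/C66H103N17O16S/t35u,36u,37u,40-,41+,42+,43-,44+,"
    "45-,46-,47+,48u,52-,53-,54-/m0/s1"
)

if __name__ == "__main__":
    print(unresolved_centers(FRAGMENT))
```

This sketch reads only the first `/t` layer and ignores multi-component details, so it is a diagnostic aid rather than a full InChI parser.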
    21.  
    22. Converting InChIs to Structures
      • What we want is a good layout, retention of stereochemistry labels and tautomers as drawn
    23. AuxInfo – Who Uses It? Who Converts It?
      • AuxInfo=1/1/N:52,24,90,40,41,50,100,60,51,23,61,68,66,67,17,16,83,65,64,15,81,30,28,84,92,37,62,99,74,71,13,45,11,39,49,22,59,63,9,88,79,31,35,95,98,75,70,42,76,25,5,47,19,56,77,86,78,26,34,93,7,6,53,18,55,44,85,2,48,12,91,10,80,33,14,38,96,73,69,94,43,58,21,1,27,32,3,4,89,87,82,29,36,97,8,72,54,20,57,46/E:(4,5)(13,14)(18,19)(85,86)(87,88)/it:im/rA:100cONOOCCCOCNCNCNCCCCCONCCCCCOCOCCONCCOCNCCCCNCCSCNCCCCCOCCONCCCCCCCCCCNCCONCCCCCCNCOCCNCOCOCNCCNCNOCCC/rB:;;;s3d4;;;d7;;s9;s10;d11;d9s12;;;;s15s16;s14;s18;d18;N19;s19;s22;s23;;s21;d25;s25;d26;s28;s26s30;s25;N31;s33;s34;d34;s35;N35;s37;s39;s39;;s42;d43;s42;s44s45;s44;s47;s47;s49;s49;s51;s38s42;d53;;s55;d55;P56;s56;s59;s59;;s62;d63;s63;d65;s64;s66d67;s7;s6p69;s5s70;d6;s6;;n73s74;d1s2s74;s75;s58;s78;N79;s79;d78;s81;s83;s84;s80;d86;p14s15s86;d77;s61;s77;s16s91;;s55;s62s93p94;s93;d93;s7n96;s9s98;s22;/rC:12.1656,-8.504,0;12.1656,-6.2884,0;16.5968,-3.1336,0;15.3445,-5.2769,0;16.5968,-4.5785,0;16.7654,-7.5166,0;20.0406,-7.5166,0;20.0406,-6.2402,0;22.208,-6.2161,0;23.2436,-5.4937,0;22.8582,-4.2895,0;21.6059,-4.2895,0;21.1725,-5.4937,0;19.294,-19.028,0;16.067,-19.6542,0;11.2264,-17.9202,0;13.1048,-19.6542,0;20.45,-18.426,0;21.5337,-19.028,0;20.45,-17.1255,0;21.5578,-21.1954,0;22.6174,-18.4019,0;22.6174,-17.1255,0;23.7252,-16.4512,0;19.2459,-23.9168,0;22.7137,-21.8698,0;19.2699,-25.2654,0;20.4018,-23.2425,0;23.8697,-21.1714,0;21.5819,-23.8927,0;22.7378,-23.1943,0;18.0899,-23.2665,0;23.9179,-23.8686,0;25.0497,-23.1702,0;26.2298,-23.8686,0;25.0497,-21.8457,0;27.3617,-23.1702,0;26.2298,-25.2172,0;27.3617,-21.8457,0;28.5417,-21.1714,0;26.2298,-21.1714,0;28.5417,-25.2172,0;29.8181,-25.6025,0;30.6128,-24.5188,0;28.5417,-23.8686,0;29.8181,-23.411,0;31.9614,-24.5188,0;32.6357,-25.6989,0;32.6357,-23.3629,0;31.9614,-22.2069,0;33.9603,-23.3629,0;34.6346,-24.5188,0;27.3617,-25.8675,0;27.3617,-27.2161,0;20.0165,-11.3457,0;18.9328,-11.9959,0;20.0165,-10.0452,0;18.9087,-13.3686,0;17.825,-11.3457,0;16.7172,-11.9959,0;17.825,-10.0693,0;23.5807,-11.7551,0;23.5807,-13.0315,0;24.7367,-13.6817,0;22.5211,-13.6817,0;22.5211,-14.9822,0;24.7367,-14.934,0;23.5807,-15.5842,0;18.9328,-8.1427,0;17.8009,-7.5166,0;17.8009,-5.2769,0;16.091,-6.3365,0;16.1633,-8.6003,0;14.0681,-7.3962,0;14.7906,-8.6003,0;12.8158,-7.3962,0;14.0681,-9.8044,0;17.5119,-13.3686,0;16.8135,-14.5728,0;17.5119,-15.7528,0;15.4408,-14.5728,0;17.0699,-12.6613,0;14.7665,-15.7528,0;13.3697,-15.7528,0;12.6713,-16.9569,0;16.8135,-16.9569,0;15.4408,-16.9569,0;17.5119,-18.137,0;14.7906,-10.9845,0;19.0291,-9.395,0;12.6954,-9.8044,0;11.2264,-11.1049,0;22.2321,-10.0452,0;21.1243,-11.9478,0;22.2321,-11.3457,0;21.1243,-9.4191,0;23.3399,-9.4191,0;21.1243,-8.1186,0;22.2321,-7.5166,0;23.7252,-19.028,0;
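The AuxInfo string is what would let a converter restore the drawing as the author laid it out: as far as the InChI documentation describes it, `N:` carries the original atom numbering, `E:` the atom equivalence classes, and the `rA:`, `rB:` and `rC:` layers the atoms, bonds and coordinates needed to reverse the conversion. A minimal sketch of pulling the coordinates out is shown here. The sample is a truncated string built from the first three coordinate triples on the slide, and the function names are illustrative.

```python
def auxinfo_layers(auxinfo: str) -> dict:
    """Split an AuxInfo string into {layer_prefix: payload}."""
    layers = {}
    # Skip the leading "AuxInfo=1" segment; keep only "prefix:payload" parts.
    for part in auxinfo.split("/")[1:]:
        prefix, sep, payload = part.partition(":")
        if sep:
            layers[prefix] = payload
    return layers


def coordinates(auxinfo: str) -> list:
    """Return the (x, y, z) tuples stored in the rC: layer, in atom order."""
    payload = auxinfo_layers(auxinfo).get("rC", "")
    coords = []
    for triple in payload.split(";"):
        if triple:  # the layer ends with ';', which leaves an empty piece
            x, y, z = (float(value) for value in triple.split(","))
            coords.append((x, y, z))
    return coords


# Truncated for illustration: only three of the 100 atoms are kept.
SAMPLE = "AuxInfo=1/1/it:im/rC:12.1656,-8.504,0;12.1656,-6.2884,0;16.5968,-3.1336,0;"

if __name__ == "__main__":
    for xyz in coordinates(SAMPLE):
        print(xyz)
```

All z values on the slide are 0, which is consistent with a flat 2-D drawing rather than a 3-D model.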
    24. Who Has Responsibility?
      • Who will take responsibility for drawing/enumerating the structures?
      • Where can software contribute?
      • What quality is “good enough”?
      • We MUST reduce rework!!!
    25. A Lot of Variability in InChIs
      • Source: Unofficial InChI FAQ page
    26. InChIs for Taxol
    27. Taxol
      • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      • Which one is correct???
    28. InChIKeys for Taxol
      • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      • ChEBI and Wikipedia are the SAME structure
      • DrugBank is a DIFFERENT structure at ONE stereocenter
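A small sketch of what the three keys can and cannot tell a machine. The first block of an InChIKey hashes the connectivity, so identical first blocks mean the same skeleton; the second block covers the remaining layers, including stereochemistry. Differing second blocks therefore flag a discrepancy but do not say which records agree: deciding that ChEBI and Wikipedia hold the same structure required inspecting the structures, and one possible reason for their differing keys is that the InChIs were generated with different options. The function names are illustrative.

```python
from collections import defaultdict

TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def group_by_skeleton(keys: dict) -> dict:
    """Group sources by first key block; values map source -> rest of key."""
    groups = defaultdict(dict)
    for source, key in keys.items():
        first, _, rest = key.partition("-")
        groups[first][source] = rest
    return {block: dict(members) for block, members in groups.items()}


def needs_review(keys: dict) -> bool:
    """True when sources share a skeleton but disagree beyond it."""
    for members in group_by_skeleton(keys).values():
        if len(set(members.values())) > 1:
            return True
    return False


if __name__ == "__main__":
    print(group_by_skeleton(TAXOL_KEYS))
    print("needs manual review:", needs_review(TAXOL_KEYS))
```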
    29. Does one stereocenter matter?
      • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
    30. InChIStrings Hash to InChIKeys
    31. The InChI Resolver
    32. HVYWMOMLDIMFJA-DPAQBDIFSA-N
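The key on this slide has 27 characters, while the Taxol keys on the earlier slides have 25. A hedged sketch of telling the formats apart: the 27-character form is 14 letters, a hyphen, 8 letters plus a flag (`S` for standard, `N` for non-standard) and a version letter, then a hyphen and a protonation letter; the older 25-character form ends after the second block. The patterns below check shape only, not whether a key corresponds to any real compound, and the function name and labels are illustrative.

```python
import re

# 14 letters, 8 letters + flag + version letter, protonation letter.
KEY_27 = re.compile(r"^[A-Z]{14}-[A-Z]{8}([SN])[A-Z]-[A-Z]$")
# Older form: 14 letters and a 10-letter second block, no third block.
KEY_25 = re.compile(r"^[A-Z]{14}-[A-Z]{10}$")


def classify_inchikey(key: str) -> str:
    """Label a string by InChIKey shape."""
    match = KEY_27.match(key)
    if match:
        return "standard" if match.group(1) == "S" else "non-standard"
    if KEY_25.match(key):
        return "legacy 25-character"
    return "not an InChIKey"


if __name__ == "__main__":
    for candidate in (
        "HVYWMOMLDIMFJA-DPAQBDIFSA-N",  # slide 32
        "RCINICONZNJXQF-CLDWUXIMDD",    # DrugBank key from the Taxol slides
    ):
        print(candidate, "->", classify_inchikey(candidate))
```

A check like this is useful before a lookup, because keys of different formats for the same compound will not match each other as strings.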
    33.  
    34. Resolve-It
    35. Resolve-It
    36. Pretty-It
    37. JMol-It, Download-It and Zoom-It
    38. Kind-of-Resolve-It
    39. Generate-It
    40. Draw-It : Thanks Symyx (Beta release)
    41. Generate-It
    42. All Flavors
    43. Serve Up Services
    44. And Once It’s Resolved…
    45. Out to ChemSpider…and its resources
    46. COMING: InChI Resolver to DOIs
    47. Full Text-Based Literature Searching to DOIs Including Citations Now
    48. When Structures are “Connected”
    49. When Structures are “Connected”
    50. ChemSpider Everywhere
      • Linked from Wikipedia
      • Linked from Open Notebook Science sites using EMBED
      • Linked from Blogs using Structure/Spectra EMBED
      • Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
      • Integrated to software offerings from Thermo, Waters, Agilent, Bruker
    51. ChemSpider Everywhere Embed Functionality (like YouTube)
    52. ChemSpider Everywhere www.spectralgame.com
    53. ChemSpider Everywhere Crowdsourced Curation of Spectra
    54. ChemSpider Everywhere RSC Compounds
    55. ChemSpider Everywhere Nature Chemistry
      • Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Users can choose to view the article with all of the compounds highlighted, and find out more about those compounds by linking out to other information resources including PubChem and ChemSpider.
    56. ChemSpider Everywhere ChemMobi
    57. Structure RSS Feeds with InChIs
    58. InChIs are Incomplete
      • What is NOT supported, yet:
        • polymers
        • organometallics
        • Markush structures
        • 3-D structures
        • excited states
        • interlocking structures (e.g. rotaxanes)
        • host-guest complexes
    59. Progressing InChI
      • Highest priority for the InChI Team is communication with structure drawing package vendors – THE interfaces to the users
      • For the InChI Resolver : Delivery of services to allow publishers to deposit their structure collections with associated DOIs to ChemSpider
      • Not every structure is important…Discussions with Publishers to discern primary compounds
    60. Conclusions
      • InChIs and Internet Chemistry
      • http://inchis.chemspider.com
    61. Acknowledgments
      • Richard Kidd, Royal Society of Chemistry
      • Keith Taylor, Symyx
      • Chris Singleton, Steven Bachrach and Alan McNaught for feedback
      • “The InChI team”

    Antony Williams, ChemSpiderman, 7 months ago
