Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry

    Presentation Transcript

    1. Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry. Antony Williams, Bio-IT World 2009
    2. Linked Data Cloud
    3. Chemistry on the Internet
      • Much of the information online is User Beware!
      • The Quality of information is “diverse”
      • Technologies can “link and connect” information, but validation and curation are key to providing quality
      • The LinkedData web is of less value when the data linked are “wrong”
    4. Quality Costs
      • Chemical Abstracts Service (CAS), a division of the ACS, is the “Gold Standard” in chemistry-related information
        • 101 years of content, $260 million revenue (2006), >40 million substances and 60 million sequences
        • But online…
    5. What is “wrong”?
    6. Languages and Links in Chemistry
      • A platform for:
        • Data deposition, curation and annotation
        • Supporting Open Notebook Science efforts
        • Chemistry document mark-up with ChemMantis
        • The Open Access ChemSpider Journal of Chemistry
    7. Search Cholesterol
    8. Search Cholesterol
    9. Search Cholesterol
    10. Search Cholesterol
    11. Search Cholesterol
    12. Search Cholesterol
    13. Complex Data and Information
    14. Online Data
      • Many websites host structure-based information
      • Question quality!!!
    15.  
    16. Wikipedia, C&E News, PubChem
      • C&E News (from ACS)
    17. Does one stereocenter matter?
    18. Vancomycin
      • Who will curate?
      • PubChem is not resourced to clean these errors 
      • How would you clean such a large dataset?
    19. Vancomycin ChemSpider: 1 compound – 3 days
    20. Question Everything www.dhmo.org
    21. DailyMed
      • “DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”
    22. The FDA’s DailyMed
    23. Structures on DailyMed: Poor Representations
    24. Structures on DailyMed: Lack of Stereochemistry
    25. Incorrect Structures: Scanning (?) Issues
    26. Incorrect Structures
    27. Does it Matter?
      • Does it matter to the consumer that the structures are wrong? No. What matters is that what is in the bottle is the right medication!
      • To make DailyMed structure-searchable it DOES matter
      • To data mine DailyMed it matters
      • To mark up DailyMed it matters
    28. Collaborative Knowledge Management for Chemists
    29. Wikipedia Links to DrugBank
    30. Taxol on PubChem
    31. Taxol on DailyMed
    32. The InChI Identifier
    33. Multiple Layers
      • Source: Unofficial InChI FAQ page
    34. InChI Strings Hash to InChIKeys
    35. InChIs for Taxol
    36. Back to Taxol
      • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      • Which one is correct???
    37. InChIKeys for Taxol
      • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      • ChEBI and Wikipedia are the SAME structure
      • DrugBank is a DIFFERENT structure: it differs at ONE stereocenter
    38. The InChI Resolver
    39.  
    40. Coming Soon…Linked Articles
    41. How bad can it get??? And who is right????
    42. ChemMantis
      • Chemical Markup And Nomenclature Transformation Integrated System (ChemMantis)
      • A platform for entity extraction for chemistry documents, markup and integration with online information sources – Wikipedia, ChemSpider, Entrez…
      • Web-based submission, markup and publishing platform now hosting the ChemSpider Journal of Chemistry
    43. ChemMantis Markup
    44. Enable Electronic Articles…
      • Structures are the language of chemistry
      • Show structures to chemists and search/link from there…
    45. Species Markup
    46. Dictionaries are Easily Enhanced
      • Copy-Paste into appropriate Entity Dictionary
      • Impacts all future markups
      • Expanding knowledgebases of information
      • Linked out to rich sources of information
    47. Build Dictionaries; Ontologies Next
    48. Outlinks…
    49. Publishers and Document Mark-Up
    50. ChemSpider Everywhere
      • Linked from Wikipedia
      • Linked from Open Notebook Science sites using EMBED
      • Linked from Blogs using Structure/Spectra EMBED
      • Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
      • Integrated into software offerings from Thermo, Waters, Agilent, Bruker
    51. ChemSpider Everywhere: Embed Functionality (like YouTube)
    52. ChemSpider Everywhere: www.spectralgame.com
    53. ChemSpider Everywhere: Crowdsourced Curation of Spectra
    54. ChemSpider Everywhere: RSC Compounds
    55. ChemSpider Everywhere: Nature Chemistry
      • Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text.
      • Those compounds are linked out to other information resources including PubChem and ChemSpider.
    56. ChemSpider Everywhere: ChemMobi
    57. Structure RSS Feeds with InChIs
    58.  
    59. Acknowledgments
      • Richard Kidd, Royal Society of Chemistry
      • Jason Wilde, Nature Publishing Group
      • Martin Walker and the Wikipedia Chemistry team
      • Microsoft – Rudy Potenzone
      • Symyx – Keith Taylor and James Jack
      • SureChem – Nicko Goncharoff
      • Spectral Game – Andrew Lang and Jean-Claude Bradley
      • The InChI team and Advisory Group
    60. Conclusions
      • www.chemspider.com
      • www.chemspider.com/journal
      • InChIs and Internet Chemistry
      • http://inchis.chemspider.com
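
The Taxol comparison on slides 36 and 37 can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the talk: it assumes the usual reading of an InChIKey, in which the block before the first hyphen is a hash of the connectivity (the molecular skeleton) and the remainder hashes the other layers, such as stereochemistry. The function names `split_key` and `compare` are hypothetical helpers introduced here; the keys are copied as they appear on the slides.

```python
from itertools import combinations

# InChIKeys exactly as quoted on slides 36 and 37 (source -> key).
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key):
    """Split a key into (first block, remainder) at the first hyphen."""
    first, _, rest = key.partition("-")
    return first, rest


def compare(keys):
    """Compare every pair of sources by first block and by full key."""
    results = {}
    for (name_a, key_a), (name_b, key_b) in combinations(keys.items(), 2):
        first_a, _ = split_key(key_a)
        first_b, _ = split_key(key_b)
        results[(name_a, name_b)] = {
            # Assumption: equal first blocks mean the same connectivity.
            "same_skeleton": first_a == first_b,
            "same_full_key": key_a == key_b,
        }
    return results


report = compare(TAXOL_KEYS)
for pair, flags in report.items():
    print(pair, flags)
```

With the keys as transcribed, every pair of sources shares the first block and differs in the remainder, which is the pattern to expect when records agree on the skeleton but disagree in a later layer such as stereochemistry. Deciding which record is correct still requires curation of the structures themselves.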
