Cleaning up chemistry for the pharma industry: delivering a flexible platform for interrogating the FDA DailyMed website


    1. Cleaning up chemistry for the pharma industry: Delivering a flexible platform for interrogating the FDA DailyMed website. Antony Williams
    2. Vision
      • Use the DailyMed FDA website data as a data source
      • Use Microsoft SharePoint Server as a platform to demonstrate integrated ChemSpider technology
      • Deliver some “Chemistry” on the BioIT Alliance website
      • Get funding to support ChemSpider
    3. Reality
    4. Chemistry on the Internet
      • The Internet can clearly benefit chemists searching for information
      • Much of the information is based on assertions: User Beware!
      • The quality of the available information varies widely. How does the user know what is and is not “correct”?
    5. www.chemspider.com
      • 21.5 million structures, 150 data sources and growing
      • Flexible searching
      • Deposition of structures, spectra, crowdsourced curation and annotation
    6. Complex Data and Information
    7. 21.5 Million Structures, Varied Sources
      • There are “bad structures” on the database
      • There are bad structure-name pairs
      • Users have associated “incorrect information”
    8. Data Curation
    9. Caution! Question Everything!
    10. Question Everything: www.dhmo.org
    11. Vancomycin
      • Who will curate?
      • PubChem is not resourced to clean these errors 
      • How would you clean such a large dataset?
    12. Vancomycin ChemSpider: 1 compound – 3 days
    13. DailyMed
        • “DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”
    14. The FDA’s DailyMed
    15. The Intention
      • Make DailyMed structure searchable via ChemSpider
      • In the process curate data on ChemSpider and validate data on DailyMed
      • Improve the curation platform on ChemSpider
      • Perform markup of DailyMed articles to enhance the reading experience
    16. Structures on DailyMed: Poor Representations
    17. Structures on DailyMed: Lack of Stereochemistry
    18. Incorrect Structures: Simply Wrong
    19. Incorrect Structures: Scanning (?) Issues
    20. Incorrect Structures: “HOO-BOY!!!!!”
    21. Does it Matter?
      • Does it matter to the consumer that the structures are wrong? No…what matters is that what is in the bottle is the right medication!
      • To make DailyMed structure searchable it DOES matter
      • To data mine DailyMed it matters
      • To mark up DailyMed it matters
    22. The Process
      • Import all XML files from DailyMed
      • Use “Home built” entity extraction based on our dictionary of chemical names
      • Articles online here:
        • http://www.chemspider.com/DailyMed.aspx
        • Example Article: http://www.chemspider.com/DailyMedArticle.aspx?id=2
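The two steps on this slide (import the label XML, then run dictionary-based entity extraction) can be sketched roughly as follows. This is a minimal illustration, not the "home built" extractor itself: the tiny `CHEMICAL_NAMES` set, the function names, and the simplified `SAMPLE_LABEL` document are all assumptions made for the example, and real DailyMed label files use a much richer XML schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-dictionary; a real dictionary of chemical names is far larger.
CHEMICAL_NAMES = {"tolazamide", "vancomycin", "sodium chloride"}

# Simplified stand-in for a label file (assumed layout, not the real schema).
SAMPLE_LABEL = (
    "<document><title>Tolinase</title>"
    "<text>Each tablet contains tolazamide and sodium chloride.</text></document>"
)


def build_pattern(names):
    """Compile one case-insensitive pattern that matches any dictionary name."""
    # Longest names first, so multi-word names win over shorter overlapping ones.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def extract_chemicals(xml_text, names=CHEMICAL_NAMES):
    """Return the sorted, lower-cased dictionary names found in the XML text."""
    root = ET.fromstring(xml_text)
    text = " ".join(root.itertext())  # flatten all element text into one string
    pattern = build_pattern(names)
    return sorted({match.group(1).lower() for match in pattern.finditer(text)})


print(extract_chemicals(SAMPLE_LABEL))
```

A plain dictionary lookup like this only finds names that are already in the dictionary, which is why the quality of the name-structure dictionary matters so much in the slides that follow.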
    23. State of the Data
    24. Tolinase: DailyMed on ChemSpider
    25. OTHER Mentioned Chemicals
    26. One Name – Multiple Structures: No Stereo, Full Stereo, Partial Stereo, Partial Stereo
    27. Editing a Record
      • Do NOT deprecate the record…remove the association between the name and the chemical structure
    28.  
    29. Partial Stereochemistry
    30. Loop of Assertions
      • Reduce to ONE structure – with full explicit stereo
    31. How bad can it get??? And who is right????
    32. Name-Structure Pairs
      • Cleaning up the associations of names and structures is torturous and time-consuming
      • Decisions get made and can be challenged
      • Names are not “removed” …they are still on the database
      • Such a curated “dictionary” is very valuable
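One way to picture the curation rules above (one name can point at several structures, an edit removes the association rather than the record, and names stay in the database) is the small data model below. It is only a sketch under assumed names: `NameStructureIndex`, its methods, and the numeric structure IDs are hypothetical and do not describe ChemSpider's actual schema.

```python
from collections import defaultdict


class NameStructureIndex:
    """Toy store of name-structure associations (hypothetical model)."""

    def __init__(self):
        self.records = {}     # structure_id -> short description of the structure
        self.links = set()    # active (name, structure_id) associations
        self.removed = set()  # retired associations, kept so decisions can be challenged

    def add(self, name, structure_id, description):
        self.records[structure_id] = description
        self.links.add((name.lower(), structure_id))

    def ambiguous_names(self):
        """Return names that are currently linked to more than one structure."""
        by_name = defaultdict(set)
        for name, structure_id in self.links:
            by_name[name].add(structure_id)
        return {n: sorted(ids) for n, ids in by_name.items() if len(ids) > 1}

    def remove_association(self, name, structure_id):
        """Unlink a name from a structure without deleting either of them."""
        pair = (name.lower(), structure_id)
        if pair in self.links:
            self.links.remove(pair)
            self.removed.add(pair)


# Hypothetical IDs: three records that all carry the same name.
index = NameStructureIndex()
index.add("Vancomycin", 101, "no stereo")
index.add("Vancomycin", 102, "partial stereo")
index.add("Vancomycin", 103, "full stereo")
print(index.ambiguous_names())

# Curate: keep only the fully defined structure under this name.
index.remove_association("Vancomycin", 101)
index.remove_association("Vancomycin", 102)
print(index.ambiguous_names(), len(index.records))
```

Keeping the retired pairs in `removed` mirrors the point that curation decisions can be challenged later: nothing is lost, only unlinked.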
    33. ChemMantis
      • Chemical Markup And Nomenclature Transformation Integrated System – ChemMantis
      • A platform for entity extraction for chemistry documents, markup and integration to online information sources – Wikipedia, ChemSpider, Entrez…
      • Web-based submission, markup and publishing platform now hosting the ChemSpider Journal of Chemistry
    34. Back to DailyMed
    35. Quality of Structures!!!
    36. ChemMantis Markup
    37. Species Markup
    38. Dictionaries are Easily Enhanced
      • Copy-Paste into appropriate Entity Dictionary
      • Impacts all future markups
      • Expanding knowledgebases of information
      • Linked out to rich sources of information
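The idea that a new dictionary entry "impacts all future markups" can be illustrated with a few lines of code. This is a hedged sketch, not ChemMantis itself: the `mark_up` function, the dictionary layout (lower-cased term mapped to an outlink), and the `example.org` URLs are assumptions chosen for the example.

```python
import re


def mark_up(text, dictionary):
    """Wrap every dictionary term found in text in an HTML link to its outlink."""
    if not dictionary:
        return text
    # Longest terms first so multi-word entries are matched as a whole.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )

    def link(match):
        url = dictionary[match.group(1).lower()]
        return f'<a href="{url}">{match.group(1)}</a>'

    return pattern.sub(link, text)


# Hypothetical entity dictionary: lower-cased term -> outlink URL.
entities = {"aspirin": "https://example.org/chemical/aspirin"}
sentence = "Aspirin was tested in Escherichia coli."

print(mark_up(sentence, entities))

# "Copy-paste" a new term into the dictionary; later markups pick it up.
entities["escherichia coli"] = "https://example.org/species/escherichia-coli"
print(mark_up(sentence, entities))
```

Because the markup is regenerated from the dictionary each time, adding one entry changes every document marked up afterwards, with no change to the code.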
    39.  
    40. Outlinks…
    41. Where To From Here?
      • The platform is built…it’s all eyeballs for curation now
      • As structure-identifier pairs are curated DailyMed will improve
      • The project is now on hold – no resources to continue
    42. If We Had Our Way…
      • Convert every DailyMed Label to a ChemMantis marked up document
      • Use the XML segregation of the Tablet Labels to tag where chemicals are in the label
      • Allow data mining based on “where” in a label the chemicals are: drug-drug interactions, etc.
      • Markup and mine property data out of the labels using new dictionaries related to properties such as IC50 and toxicity
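Tagging "where" in a label a chemical appears could look something like the sketch below. It assumes a simplified label layout with `<section>`, `<title>` and `<text>` elements, and it uses invented compound names (`examplamide`, `samplarin`) so that no real drug interaction is implied; the real label XML and dictionaries would differ.

```python
import re
import xml.etree.ElementTree as ET

# Invented names for illustration only; not real drugs.
NAMES = ["examplamide", "samplarin"]

# Simplified, assumed label layout with titled sections.
LABEL = """<label>
  <section><title>Description</title>
    <text>Examplamide is supplied as tablets.</text></section>
  <section><title>Drug Interactions</title>
    <text>This section mentions examplamide together with samplarin.</text></section>
</label>"""


def chemicals_by_section(xml_text, names):
    """Map each section title to the sorted dictionary names found in that section."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )
    result = {}
    for section in ET.fromstring(xml_text).iter("section"):
        title = section.findtext("title", default="").strip()
        body = section.findtext("text", default="")
        result[title] = sorted({m.group(0).lower() for m in pattern.finditer(body)})
    return result


tagged = chemicals_by_section(LABEL, NAMES)
# Chemicals co-mentioned in the interactions section are candidates for mining.
print(tagged["Drug Interactions"])
```

Indexing hits by section title is what would let a query ask only for chemicals named in the interactions part of a label, rather than anywhere in the document.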
    43. Conclusions
      • The internet enables chemistry – and at a reduced cost
      • Question Quality! All online information is suspect
      • Crowdsourcing for expansion, curation and integration can both improve the quality of existing information and add new content
      • If the FDA doesn’t have responsibility for what is on Tablet Labels…who does? The answer is simply an assertion!
    44. Interesting Sites
      • ChemSpider
        • http://www.chemspider.com
      • ChemSpider Journal of Chemistry
        • http://www.chemmantis.com
      • The InChI resolver
        • http://inchis.chemspider.com (goes live at ACS Spring)
      • The ChemSpider blog
        • http://www.chemspider.com/blog
      • Contact
        • [email_address]

    Antony Williams, ChemSpiderman, 7 months ago
