Cleaning up chemistry for the pharma industry: delivering a flexible platform for interrogating the FDA DailyMed website

The original abstract is below. Ultimately, this work was not funded by Microsoft and we did not deliver it on SharePoint Server. Nevertheless, we do depend heavily on Microsoft technology to do what we do, specifically .NET and SQL Server.

DailyMed is a website hosted by the FDA that provides access to information about marketed drugs. This information includes FDA-approved labels (package inserts) and provides a standard, comprehensive, up-to-date look-up and download resource of medication content and labeling as found in medication package inserts. While working to enhance the dataset by making it searchable by chemical structure and substructure, we determined that the data contained numerous chemistry errors. We have therefore used a combination of text mining and automated and manual curation to improve the quality of the dataset. In so doing, we have also made querying of the data more flexible. Specifically, we have used Microsoft SharePoint technology to create a portal allowing both text-based and structure-based querying. We will report on the advantages such an approach delivers in terms of flexible interrogation of DailyMed.

Transcript

  • 1. Cleaning up chemistry for the pharma industry Delivering a flexible platform for interrogating the FDA DailyMed website Antony Williams
  • 2. Vision
    • Use the DailyMed FDA website data as a data source
    • Use Microsoft SharePoint Server as a platform to demonstrate integrated ChemSpider technology
    • Deliver some “Chemistry” on the BioIT Alliance website
    • Get funding to support ChemSpider
  • 3. Reality
  • 4. Chemistry on the Internet
    • The Internet can clearly benefit chemists searching for information
    • Much of the information is based on assertions: user beware!
    • The quality of the available information varies widely; how does the user know what is and is not “correct”?
  • 5. www.chemspider.com
    • 21.5 million structures, 150 data sources and growing
    • Flexible searching
    • Deposition of structures, spectra, crowdsourced curation and annotation
  • 6. Complex Data and Information
  • 7. 21.5 Million Structures, Varied Sources
    • There are “bad structures” on the database
    • There are bad structure-name pairs
    • Users have associated “incorrect information”
  • 8. Data Curation
  • 9. Caution! Question Everything!
  • 10. Question Everything www.dhmo.org
  • 11. Vancomycin
    • Who will curate?
    • PubChem is not resourced to clean these errors 
    • How would you clean such a large dataset?
  • 12. Vancomycin ChemSpider: 1 compound – 3 days
  • 13. DailyMed
      • “DailyMed provides high quality information about marketed drugs.
      • This information includes FDA approved labels (package inserts).”
  • 14. The FDA’s DailyMed
  • 15. The Intention
    • Make DailyMed structure searchable via ChemSpider
    • In the process curate data on ChemSpider and validate data on DailyMed
    • Improve the curation platform on ChemSpider
    • Perform markup of DailyMed articles to enhance the reading experience
  • 16. Structures on DailyMed Poor Representations
  • 17. Structures on DailyMed Lack of Stereochemistry
  • 18. Incorrect Structures Simply Wrong
  • 19. Incorrect Structures Scanning (?) Issues
  • 20. Incorrect Structures “HOO-BOY!!!!!”
  • 21. Does it Matter?
    • Does it matter to the consumer that the structures are wrong? No… what matters is that what is in the bottle is the right medication!
    • To make DailyMed structure searchable it DOES matter
    • To data mine DailyMed it matters
    • To mark up DailyMed it matters
  • 22. The Process
    • Import all XML files from DailyMed
    • Use “Home built” entity extraction based on our dictionary of chemical names
    • Articles online here:
      • http://www.chemspider.com/DailyMed.aspx
      • Example Article: http://www.chemspider.com/DailyMedArticle.aspx?id=2
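
The process on slide 22 can be illustrated with a minimal sketch of dictionary-based entity extraction. This is not the ChemSpider implementation: the dictionary contents, the structure identifiers, the sample XML, and the function names `label_text` and `extract_entities` are all hypothetical, and real DailyMed label files follow a much richer schema than the toy document used here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dictionary: chemical name -> structure identifier.
# The identifiers are placeholders, not real ChemSpider IDs.
CHEMICAL_DICTIONARY = {
    "tolazamide": "structure-001",
    "vancomycin": "structure-002",
    "sodium chloride": "structure-003",
}

# Toy stand-in for one imported label file (assumed layout).
SAMPLE_LABEL = (
    "<document><title>Tolinase</title><section><text>"
    "Each tablet contains Tolazamide. "
    "A second, unrelated mention: sodium chloride."
    "</text></section></document>"
)


def label_text(xml_string):
    """Flatten all text nodes of an XML document into one normalized string."""
    root = ET.fromstring(xml_string)
    return " ".join(" ".join(root.itertext()).split())


def extract_entities(text, dictionary):
    """Return (matched text, structure id, offset) for each dictionary name found."""
    # Longest names first, so multi-word names win over any shorter overlap.
    ordered = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0), dictionary[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]


hits = extract_entities(label_text(SAMPLE_LABEL), CHEMICAL_DICTIONARY)
for matched, structure_id, offset in hits:
    print(f"{matched!r} at offset {offset} -> {structure_id}")
```

Matching is only as good as the dictionary: a name that maps to the wrong structure is propagated to every article in which it appears, which is why the curation steps on the later slides matter.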
  • 23. State of the Data
  • 24. Tolinase: DailyMed on ChemSpider
  • 25. OTHER Mentioned Chemicals
  • 26. One Name – Multiple Structures: No Stereo, Full Stereo, Partial Stereo, Partial Stereo
  • 27. Editing a Record
    • Do NOT deprecate the record… remove the association between name and chemical structure
  • 28.  
  • 29. Partial Stereochemistry
  • 30. Loop of Assertions
    • Reduce to ONE structure – with full explicit stereo
  • 31. How bad can it get??? And who is right????
  • 32. Name-Structure Pairs
    • Cleaning up the associations of names and structures is torturous and time-consuming
    • Decisions get made and can be challenged
    • Names are not “removed” …they are still on the database
    • Such a curated “dictionary” is very valuable
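
The curation rules on slides 27 to 32 (remove the association rather than the record, keep names on the database, and leave decisions open to challenge) can be sketched as a small data structure. This is an illustrative assumption, not ChemSpider's schema: the class names, status values, and structure identifiers below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Association:
    """One asserted link between a name and a structure record."""
    name: str
    structure_id: str
    status: str = "unreviewed"  # "unreviewed", "confirmed", or "rejected"
    history: list = field(default_factory=list)  # (curator, old, new) tuples


class NameStructureIndex:
    def __init__(self):
        self._associations = []

    def add(self, name, structure_id):
        self._associations.append(Association(name, structure_id))

    def set_status(self, name, structure_id, status, curator):
        """Record a curation decision; nothing is ever deleted."""
        for assoc in self._associations:
            if assoc.name == name and assoc.structure_id == structure_id:
                assoc.history.append((curator, assoc.status, status))
                assoc.status = status
                return assoc
        raise KeyError((name, structure_id))

    def all_associations(self, name):
        return [a for a in self._associations if a.name == name]

    def active_structures(self, name):
        return [a.structure_id for a in self._associations
                if a.name == name and a.status != "rejected"]

    def ambiguous_names(self):
        """Names still linked to more than one non-rejected structure."""
        by_name = defaultdict(set)
        for assoc in self._associations:
            if assoc.status != "rejected":
                by_name[assoc.name].add(assoc.structure_id)
        return sorted(n for n, ids in by_name.items() if len(ids) > 1)


index = NameStructureIndex()
for sid in ("no-stereo", "partial-stereo", "full-stereo"):
    index.add("vancomycin", sid)

before = index.ambiguous_names()
index.set_status("vancomycin", "no-stereo", "rejected", curator="curator-1")
index.set_status("vancomycin", "partial-stereo", "rejected", curator="curator-1")
index.set_status("vancomycin", "full-stereo", "confirmed", curator="curator-1")
after = index.ambiguous_names()
```

Because a rejected association keeps its history, a later curator can see who made the decision and reverse it by calling `set_status` again, which mirrors the point that decisions can be challenged.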
  • 33. ChemMantis
    • Chemical Markup And Nomenclature Transformation Integrated System (ChemMantis)
    • A platform for entity extraction for chemistry documents, markup and integration to online information sources – Wikipedia, ChemSpider, Entrez…
    • Web-based submission, markup and publishing platform now hosting the ChemSpider Journal of Chemistry
  • 34. Back to DailyMed
  • 35. Quality of Structures!!!
  • 36. ChemMantis Markup
  • 37. Species Markup
  • 38. Dictionaries are Easily Enhanced
    • Copy-Paste into appropriate Entity Dictionary
    • Impacts all future markups
    • Expanding knowledgebases of information
    • Linked out to rich sources of information
  • 39.  
  • 40. Outlinks…
  • 41. Where To From Here?
    • The platform is built…it’s all eyeballs for curation now
    • As structure-identifier pairs are curated DailyMed will improve
    • The project is now on hold – no resources to continue
  • 42. If We Had Our Way…
    • Convert every DailyMed Label to a ChemMantis marked up document
    • Use the XML segregation of the Tablet Labels to tag where chemicals are in the label
    • Allow data mining based on “where” in a label the chemicals are, e.g. drug-drug interactions
    • Markup and mine property data out of the labels using new dictionaries related to properties such as IC50 and toxicity
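
The idea on slide 42 of mining by “where” in a label a chemical appears can be sketched as follows. This is a hypothetical illustration of work the slides describe as not yet done: the `<section>`/`<title>` layout, the placeholder names “compound A” and “compound B”, and the function `chemicals_by_section` are assumptions, and the sketch handles only flat (non-nested) sections.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder names standing in for a curated chemical dictionary.
NAMES = ["compound A", "compound B"]

# Toy label with an assumed, simplified section layout.
SAMPLE_LABEL = """
<label>
  <section><title>Description</title>
    <text>Contains compound A.</text></section>
  <section><title>Drug Interactions</title>
    <text>Use caution with compound B and compound A.</text></section>
  <section><title>Storage</title>
    <text>Store at room temperature.</text></section>
</label>
"""


def chemicals_by_section(xml_string, names):
    """Map each section title to the sorted dictionary names found in it."""
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    root = ET.fromstring(xml_string)
    result = {}
    for section in root.iter("section"):
        title = section.findtext("title", default="(untitled)").strip()
        body = " ".join(section.itertext())
        found = sorted({m.group(0).lower() for m in pattern.finditer(body)})
        if found:  # sections without any dictionary hit are omitted
            result[title] = found
    return result


sections = chemicals_by_section(SAMPLE_LABEL, NAMES)
for title, chemicals in sections.items():
    print(f"{title}: {', '.join(chemicals)}")
```

A query such as “which labels mention a given chemical in their interactions section” then reduces to a lookup on the section title, provided the name-structure dictionary behind it has been curated.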
  • 43. Conclusions
    • The internet enables chemistry – and at a reduced cost
    • Question Quality! All online information is suspect
    • Crowdsourcing for expansion, curation and integration can both improve the quality of existing information and add new content
    • If the FDA doesn’t have responsibility for what is on Tablet Labels…who does? The answer is simply an assertion!
  • 44. Interesting Sites
    • ChemSpider
      • http://www.chemspider.com
    • ChemSpider Journal of Chemistry
      • http://www.chemmantis.com
    • The InChI resolver
      • http://inchis.chemspider.com (goes live at ACS Spring)
    • The ChemSpider blog
      • http://www.chemspider.com/blog
    • Contact
      • [email_address]