Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs
Internet-based public domain databases containing chemical compounds have grown in number, capability and content in recent years. There are now many databases containing millions of chemical compounds. These compounds are associated with data such as chemical names, properties and analytical data, and are mapped to proteins, assay data, clinical information and more. These disparate data sources share one common issue: data quality. This presentation provides an overview of our efforts to source appropriate structural representations for 200 top-selling drugs from public domain sources. This intra- and inter-laboratory comparison of approaches, processes and the agreements needed between participants exposed the challenges of aggregating structure-based data. The project also provided data on the distribution of quality issues across many of the community’s popular databases.

Usage Rights: CC Attribution-NonCommercial License

Presentation Transcript

  • 1. Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs. Antony Williams, ACS Denver, September 2011
  • 2. Upfront Acknowledgment - All Authors…
    • Royal Society of Chemistry – Antony Williams, David Sharpe
    • University of North Carolina, Chapel Hill – Alex Tropsha, Denis Fourches, Eugene Muratov, Andrew Fant
    • Chemotargets SL – Ricard Garcia-Serna
    • IMIM-Hospital del Mar Research Institute and Universitat Pompeu Fabra – Jordi Mestres
    • AstraZeneca – Sorel Muresan, Christopher Southan
    • ACD/Labs – Andrey Erin
  • 3. Internet-Based Chemistry
    • Internet-based chemistry resources are:
      • Diverse in quality
      • Confusing
      • Uncoordinated
      • Fixable – with a lot of effort
  • 4.  
  • 5.
    • Open PHACTS: a partnership between the European Community and EFPIA
    • Freely accessible for knowledge discovery and verification.
      • Data on small molecules
      • Pharmacological profiles
      • Pharmacokinetics
      • ADMET data
      • Biological targets and pathways
      • Proprietary and public data sources.
  • 6. Stop Whining – Fix it
  • 7. What needs to happen?
    • Standards
      • Standardization of structures
        • ChEBI/PubChem sharing
        • InChI adoption
    • Collaboration
      • Stop reinventing the wheel
      • Share data, share efforts and speed the process
    • Vision is not good enough – Execute!
  • 8. Standards: Structure Standardization
  • 9. Standards: Structure Standardization
  • 10. Standards: Structure Standardization
  • 11. Collaboration
  • 12. Then this won’t happen…
  • 13.  
  • 14. Top 200 Drugs on Wikipedia http://en.wikipedia.org/wiki/List_of_bestselling_drugs
  • 15. The Project Challenge PART ONE
    • Agree on the set of chemical names to work with
    • Independently create an SDF file in each “lab”
    • Compare differences and agree on final structures
    • Issue “Gold Standard” SDF file to team
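Part One comes down to comparing independently built SDF files keyed by drug name. Below is a minimal Python sketch of that comparison step. It assumes, hypothetically, that each lab's file carries the generic name on the title line of every record and an InChIKey in an SD data field named `InChIKey`. The field name, the `make_record` helper, the drug names and the keys are illustrative placeholders, not the project's actual files. Any disagreement it reports still needs the human "agree on final structures" step.

```python
from __future__ import annotations


def parse_sdf_inchikeys(sdf_text: str, field: str = "InChIKey") -> dict[str, str]:
    """Map each record's title line (assumed to hold the drug name) to an SD data field value."""
    result: dict[str, str] = {}
    for record in sdf_text.split("$$$$"):  # "$$$$" terminates each SDF record
        lines = record.strip("\r\n").splitlines()
        if not lines or not lines[0].strip():
            continue  # skip the empty tail after the last record (and untitled records)
        name = lines[0].strip()
        for i, line in enumerate(lines):
            # SD data headers look like ">  <FieldName>"; the value is on the next line.
            if line.startswith(">") and f"<{field}>" in line and i + 1 < len(lines):
                result[name] = lines[i + 1].strip()
                break
    return result


def find_disagreements(labs: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return {name: {lab: key}} for every name where labs disagree or a lab has no entry."""
    all_names = set().union(*(keys.keys() for keys in labs.values()))
    disagreements: dict[str, dict[str, str]] = {}
    for name in sorted(all_names):
        per_lab = {lab: keys.get(name, "<missing>") for lab, keys in labs.items()}
        if len(set(per_lab.values())) > 1:
            disagreements[name] = per_lab
    return disagreements


def make_record(name: str, inchikey: str) -> str:
    """Hypothetical helper: an empty V2000 molfile plus one InChIKey data field."""
    return (f"{name}\n  sketch\n\n"
            "  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
            f">  <InChIKey>\n{inchikey}\n\n$$$$\n")


if __name__ == "__main__":
    # Placeholder names and keys, for illustration only.
    lab_a = parse_sdf_inchikeys(make_record("drug-x", "AAAAAAAAAAAAAA-BBBBBBBBSA-N")
                                + make_record("drug-y", "CCCCCCCCCCCCCC-DDDDDDDDSA-N"))
    lab_b = parse_sdf_inchikeys(make_record("drug-x", "AAAAAAAAAAAAAA-BBBBBBBBSA-N")
                                + make_record("drug-y", "CCCCCCCCCCCCCC-UHFFFAOYSA-N"))
    print(find_disagreements({"lab_a": lab_a, "lab_b": lab_b}))
```

A name that is absent from one lab's file is reported as `<missing>`. This keeps coverage gaps visible alongside structure conflicts.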
  • 16. The Project Challenge PART TWO
    • Use the Gold Standard SDF file to investigate data quality for these compounds in Internet databases
    • Two checks
      • Search by chemical name: does it return the correct compound? If not, how does it differ?
      • Search by structure: SMILES, Molfile, InChI string or InChIKey
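The two checks in Part Two can be sketched against an in-memory stand-in for a database. `DbRecord`, `check_name` and `check_structure` are hypothetical names. Real resources are queried through their own interfaces, which are not modelled here. The structure search is also simplified to an exact InChIKey match, rather than a SMILES, Molfile or InChI string query.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class DbRecord:
    """Hypothetical stand-in for one compound record in a public database."""
    record_id: str
    inchikey: str
    names: set[str] = field(default_factory=set)

    def has_name(self, name: str) -> bool:
        # Case-insensitive synonym match; real databases differ in how they match names.
        return name.casefold() in {n.casefold() for n in self.names}


def check_name(db: list[DbRecord], name: str, gold_key: str) -> str:
    """Check 1: search by chemical name and compare the hits with the Gold Standard structure."""
    hits = [r for r in db if r.has_name(name)]
    if not hits:
        return "name not found"
    matches = [r.inchikey == gold_key for r in hits]
    if all(matches):
        return "correct"
    if any(matches):
        return "mixed: correct and incorrect structures"
    return "incorrect structure"


def check_structure(db: list[DbRecord], name: str, gold_key: str) -> str:
    """Check 2: search by structure (here: exact InChIKey) and see whether the name is attached."""
    hits = [r for r in db if r.inchikey == gold_key]
    if not hits:
        return "structure not found"
    if any(r.has_name(name) for r in hits):
        return "correct"
    return "structure found, name missing"


if __name__ == "__main__":
    # Placeholder records, for illustration only.
    demo_db = [
        DbRecord("1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N", {"drug-x"}),
        DbRecord("2", "CCCCCCCCCCCCCC-DDDDDDDDSA-N", {"drug-y"}),
        DbRecord("3", "CCCCCCCCCCCCCC-UHFFFAOYSA-N", {"drug-y"}),
    ]
    print(check_name(demo_db, "Drug-X", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"))
    print(check_name(demo_db, "drug-y", "CCCCCCCCCCCCCC-DDDDDDDDSA-N"))
    print(check_structure(demo_db, "drug-y", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"))
```

The "mixed" outcome matters in practice. A name that maps to several records with different structures is only partly right, even when one of the hits matches.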
  • 17. 200 Top-Selling Drugs (2006)
    • Biologicals removed immediately
    • Single compounds versus mixtures identified
    • Decision NOT to exclude racemates
    • List of 152 drugs to analyze
    • Generic names used
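The triage on slide 17 can be sketched as a simple filter. It drops biologicals, separates single compounds from multi-component entries, and deliberately keeps racemates. `Entry`, its `is_biological` flag and `triage` are hypothetical. The mixture test is only a heuristic: it relies on the SMILES convention that "." separates disconnected components, and that convention also flags salts, so a curator would still review every multi-component hit. The aspirin SMILES is real; the other entries are placeholders.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Entry:
    generic_name: str
    is_biological: bool   # hypothetical curator-assigned flag
    smiles: str | None    # None when no small-molecule structure has been assigned yet


def triage(entries: list[Entry]) -> dict[str, list[str]]:
    """Split a drug list into single compounds, multi-component entries and unresolved ones."""
    groups: dict[str, list[str]] = {"single": [], "multi_component": [], "unresolved": []}
    for entry in entries:
        if entry.is_biological:
            continue  # biologicals are removed immediately
        if entry.smiles is None:
            groups["unresolved"].append(entry.generic_name)
        elif "." in entry.smiles:
            # '.' separates disconnected components in SMILES: a mixture or combination
            # product, but also a salt, so these still need a curator's review.
            groups["multi_component"].append(entry.generic_name)
        else:
            # Racemates are NOT excluded: undefined or racemic stereo stays in this group.
            groups["single"].append(entry.generic_name)
    return groups


if __name__ == "__main__":
    demo = [
        Entry("aspirin", False, "CC(=O)Oc1ccccc1C(=O)O"),
        Entry("hypothetical-combination", False, "CCO.CC(=O)O"),  # illustrative only
        Entry("hypothetical-antibody", True, None),
    ]
    print(triage(demo))
```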
  • 18. Different Approaches
    • ACD/Labs – Curated commercial dictionary
    • RSC|ChemSpider and UNC Chapel Hill – manual curation
    • ChemoTargets/IMIM – lookup against database
    • AstraZeneca – lookup against database
  • 19. Different Approaches
  • 20. Different Approaches
  • 21. Different Approaches
  • 22. Different Approaches
  • 23. Choose a Starting Point
  • 24. Comparisons
  • 25. Observations
    • Manual curation – slow and imperfect process.
      • A loop of assertions
      • Software tool issues
    • Lookup – fast and imperfect
      • Totally dependent on initial investment in time
    • InChIs
      • Very useful for comparison
      • Imperfect
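Slide 25 notes that InChIs are very useful for comparison but imperfect. Part of their usefulness is that a standard InChIKey is split into three blocks:

- a 14-character hash of the main (connectivity) layer;
- a 10-character block holding a hash of the remaining layers such as stereo and isotopes, plus a standard/non-standard flag and a version letter;
- a single protonation character.

The minimal sketch below classifies a mismatch from that block structure alone. `compare_inchikeys` is a hypothetical helper. It compares keys rather than full InChI strings, so it inherits every limitation of InChI itself.

```python
import re

# Standard InChIKey layout: 14-char connectivity hash, 10-char block (8-char hash of the
# remaining layers + standard/non-standard flag + version), 1-char protonation flag.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def compare_inchikeys(gold: str, found: str) -> str:
    """Classify how a database InChIKey differs from the Gold Standard InChIKey."""
    for key in (gold, found):
        if not INCHIKEY_PATTERN.fullmatch(key):
            raise ValueError(f"not an InChIKey: {key!r}")
    if gold == found:
        return "identical"
    gold_skeleton, gold_rest, _ = gold.split("-")
    found_skeleton, found_rest, _ = found.split("-")
    if gold_skeleton != found_skeleton:
        # Different connectivity: a different compound, or a wrongly drawn one.
        return "different skeleton"
    if gold_rest != found_rest:
        # Same skeleton; stereo, isotope or other non-skeleton layers (or the flag) differ.
        return "same skeleton; stereo or other layers differ"
    # Only the last character (protonation state) can still differ.
    return "same skeleton; protonation differs"


if __name__ == "__main__":
    print(compare_inchikeys("QNAYBMKLOCPYGJ-REOHCLBHSA-N", "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"))
```

Consider the standard InChIKeys of L-alanine (QNAYBMKLOCPYGJ-REOHCLBHSA-N) and of alanine drawn without stereochemistry (QNAYBMKLOCPYGJ-UHFFFAOYSA-N). They share the first block, so the sketch reports a stereo-level difference rather than a different compound. Because the blocks are hashes, two different structures could in principle share a key, though this is rare. That is one more sense in which key comparison is imperfect.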
  • 26. Structure Representations
  • 27. Representing Racemates
  • 28. Representing Racemates - Formoterol
  • 29. Racemic Mixtures
  • 30. Racemic Mixtures X
  • 31. “The First 10”
  • 32. Collaboration on Curation
    • If we could collaborate on curation… and share it through standards and open interfaces
  • 33. Proof of Concept Data Curation Sharing
  • 34. SciDBs.com (Coming soon)
  • 35. Conclusions
    • It is DIFFICULT to aggregate high quality structure datasets of even common drugs!
    • InChI is very enabling, but enhanced stereo support is necessary
    • Is there a need to be “right”?
    • Publication will provide:
      • Recommendations for structure standardization
      • Rank ordering of resources
      • Suggestions for InChI enhancement
      • SDF file
      • Curation feed of structures and synonyms
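The call for enhanced stereo, and the racemate slides (27 to 30), come down to what the stereo layers of an InChI can say. In InChI, `/t` carries tetrahedral stereocentres, `/m` the inversion flag, and `/s` the stereo type (1 = absolute, 2 = relative, 3 = racemic). Standard InChI, as we understand it, encodes only absolute stereo. Relative or racemic types need non-standard options.

The minimal sketch below reads those layers from an InChI string. `stereo_layers` is a hypothetical helper. It reads only the first occurrence of each layer, which is a simplification, since isotopic and fixed-H sections can repeat them. It also ignores `/b` double-bond stereo.

```python
from __future__ import annotations

# Values of the InChI "/s" layer: 1 = absolute, 2 = relative, 3 = racemic.
STEREO_TYPES = {"1": "absolute", "2": "relative", "3": "racemic"}


def stereo_layers(inchi: str) -> dict[str, str | None]:
    """Return the first /t, /m and /s layers of an InChI (main stereo section only)."""
    found: dict[str, str | None] = {"t": None, "m": None, "s": None}
    for layer in inchi.split("/")[2:]:  # skip the "InChI=1S" prefix and the formula
        prefix, value = layer[:1], layer[1:]
        if prefix in found and found[prefix] is None:
            found[prefix] = value
    return {
        "t_layer": found["t"],
        "m_layer": found["m"],
        "stereo_type": STEREO_TYPES.get(found["s"]) if found["s"] else None,
    }


if __name__ == "__main__":
    # Standard InChI of L-alanine, and of alanine drawn with no stereo defined.
    print(stereo_layers("InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"))
    print(stereo_layers("InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"))
```

A racemate drawn without stereo and a structure whose stereo is simply unknown both produce a standard InChI with no `/t` layer. Standard InChI therefore cannot tell "racemic" apart from "not specified". This is the gap that enhanced stereo is meant to close.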
  • 36. Thank you Email: williamsa@rsc.org Twitter: ChemConnector Blog: www.chemspider.com/blog Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams