Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs


Internet-based public domain databases containing chemical compounds have grown in number, capability and content in recent years. There are now many databases containing millions of chemical compounds associated with different types of data including chemical names, properties, analytical data, and with associated mapping to proteins, assay data, clinical information and so on. These disparate data sources suffer from one common issue – quality of data. This presentation will provide an overview of our efforts to source the appropriate structural representations for 200 top-selling drugs from public domain sources. This intra- and inter-laboratory comparison of approaches, processes and necessary agreements exposed the challenges associated with aggregating structure-based data. The project also provided data regarding the distribution of quality issues associated with many of the community’s popular databases.

Published in: Technology, Education

  1. 1. Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs. Antony Williams, ACS Denver, September 2011
  2. 2. Upfront Acknowledgment - All Authors…
       • Royal Society of Chemistry – Antony Williams, David Sharpe
       • University of North Carolina, Chapel Hill – Alex Tropsha, Denis Fourches, Eugene Muratov, Andrew Fant
       • Chemotargets SL – Ricard Garcia-Serna
       • IMIM-Hospital del Mar Research Institute and Universitat Pompeu Fabra – Jordi Mestres
       • AstraZeneca – Sorel Muresan, Christopher Southan
       • ACD/Labs – Andrey Erin
  3. 3. Internet-Based Chemistry
       • Internet-based chemistry resources are:
           • Diverse in quality
           • Confusing
           • Uncoordinated
           • Fixable – with a lot of effort
  4. 5.
       • Open PHACTS: partnership between European Community and EFPIA
       • Freely accessible for knowledge discovery and verification.
           • Data on small molecules
           • Pharmacological profiles
           • Pharmacokinetics
           • ADMET data
           • Biological targets and pathways
           • Proprietary and public data sources.
  5. 6. Stop Whining – Fix it
  6. 7. What needs to happen?
       • Standards
           • Standardization of structures
               • ChEBI/PubChem sharing
               • InChI adoption
       • Collaboration
           • Stop reinventing the wheel
           • Share data, share efforts and speed the process
       • Vision is not good enough – Execute!
  7. 8. Standards: Structure Standardization
  8. 9. Standards: Structure Standardization
  9. 10. Standards: Structure Standardization
  10. 11. Collaboration
  11. 12. Then this won’t happen…
  12. 14. Top 200 Drugs on Wikipedia
  13. 15. The Project Challenge PART ONE
        • Agree on the set of chemical names to work with
        • Independently create an SDF file in each “lab”
        • Compare differences and agree on final structures
        • Issue “Gold Standard” SDF file to team
  14. 16. The Project Challenge PART TWO
        • Use Gold Standard SDF File to investigate data quality on these compounds in Internet Databases
        • Two checks
            • Search chemical name – does it return the correct compound? If not, how is it different?
            • Search “structure” – SMILES, Molfile, InChI string or InChIKey
  15. 17. 200 Top-Selling Drugs (2006)
        • Biologicals removed immediately
        • Single compounds versus mixtures identified
        • Decision to NOT exclude racemates
        • List of 152 drugs to analyze
        • Generic names used
  16. 18. Different Approaches
        • ACD/Labs – Curated commercial dictionary
        • RSC|ChemSpider and UNC Chapel Hill – manual curation
        • Chemotargets/IMIM – lookup against database
        • AstraZeneca – lookup against database
  17. 19. Different Approaches
  18. 20. Different Approaches
  19. 21. Different Approaches
  20. 22. Different Approaches
  21. 23. Choose a Starting Point
  22. 24. Comparisons
  23. 25. Observations
        • Manual curation – slow and imperfect process.
            • A loop of assertions
            • Software tool issues
        • Lookup – fast and imperfect
            • Totally dependent on initial investment in time
        • InChIs
            • Very useful for comparison
            • Imperfect
  24. 26. Structure Representations
  25. 27. Representing Racemates
  26. 28. Representing Racemates - Formoterol
  27. 29. Racemic Mixtures
  28. 30. Racemic Mixtures X
  29. 31. “The First 10”
  30. 32. Collaboration on Curation
        • If we could collaborate on curation…share through standards and open interfaces
  31. 33. Proof of Concept Data Curation Sharing
  32. 34. (Coming soon)
  33. 35. Conclusions
        • It is DIFFICULT to aggregate high quality structure datasets of even common drugs!
        • InChI is very enabling but enhanced stereo necessary
        • Is there a need to be “right”?
        • Publication will provide:
            • Recommendations for structure standardization
            • Rank ordering of resources
            • Suggestions for InChI enhancement
            • SDF file
            • Curation feed of structures and synonyms
  34. 36. Thank you Email: Twitter: ChemConnector Blog: Personal Blog: SLIDES:
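
The two "Project Challenge" slides describe comparing independently built structure files and then checking database records against a gold standard. The slides do not show how the team did this, so the following is only a minimal sketch of one way such a comparison could be automated with InChIKeys. It relies on the standard InChIKey layout: a 14-character connectivity block, a 10-character block covering the remaining layers (such as stereochemistry), and a 1-character protonation flag. The function names, lab names, drug names and keys are all hypothetical placeholders, not data from the project.

```python
from collections import defaultdict


def split_inchikey(key):
    """Split an InChIKey into its (connectivity, detail, protonation) blocks."""
    parts = key.strip().upper().split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return tuple(parts)


def compare_inchikeys(gold, candidate):
    """Say how a candidate key differs from the gold-standard key."""
    g_conn, g_detail, g_prot = split_inchikey(gold)
    c_conn, c_detail, c_prot = split_inchikey(candidate)
    if g_conn != c_conn:
        return "different connectivity"
    if g_detail != c_detail:
        # The second block hashes the remaining layers, e.g. stereo or isotopes.
        return "same connectivity, different stereo/isotope/other layers"
    if g_prot != c_prot:
        return "same structure, different protonation"
    return "match"


def lab_disagreements(lab_keys):
    """Return the drug names for which the labs did not all submit the same key.

    lab_keys maps lab name -> {drug name: InChIKey}.
    The result maps drug name -> {InChIKey: [labs that submitted it]}.
    """
    by_name = defaultdict(lambda: defaultdict(list))
    for lab, records in lab_keys.items():
        for name, key in records.items():
            by_name[name.lower()][key.upper()].append(lab)
    return {name: dict(keys) for name, keys in by_name.items() if len(keys) > 1}


if __name__ == "__main__":
    # Made-up placeholder keys: they have the right shape but encode no real compound.
    gold = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
    print(compare_inchikeys(gold, "AAAAAAAAAAAAAA-CCCCCCCCSA-N"))

    labs = {
        "lab_1": {"drug_a": gold, "drug_b": "DDDDDDDDDDDDDD-EEEEEEEESA-N"},
        "lab_2": {"drug_a": "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
                  "drug_b": "DDDDDDDDDDDDDD-EEEEEEEESA-N"},
    }
    print(lab_disagreements(labs))
```

A key-level comparison like this only flags that two records differ and roughly where. Deciding which structure is correct still needs the manual review the slides describe, and the Conclusions slide notes that InChI itself needs enhanced stereo support.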