
Mining public domain data as a basis for drug repurposing



Online databases containing high-throughput screening and other property data continue to proliferate. Many pharmaceutical chemists will have used databases such as PubChem, ChemSpider, DrugBank, BindingDB and many others. This work will report on the potential value of these databases as sources of data for repurposing drugs using cheminformatics-based approaches (e.g. docking and ligand-based machine learning methods). It will also discuss the related applications of the Open PHACTS project, a European Union Innovative Medicines Initiative project that is using semantic-web-based approaches to integrate large-scale chemical and biological data in new ways. We will report on how compound and data quality should be taken into account when using data from online databases, and how careful curation can provide high-quality data to underpin molecular models that can in turn identify new uses for old drugs.



  1. Mining public domain data as a basis for drug repurposing. Antony J Williams, Sean Ekins and Valery Tkachenko, ACS Philadelphia, August 2012
  2. Drug Repurposing • Drug repurposing commonly also means re-examining data! • Lots of data mining occurs • Then more screening, which creates more data • LOTS of public databases are used to examine repurposing
  3. A LOT of data coming online
  4. Interlinked on the semantic web
  5. Where do you get your data? • Databases? • Patents? • Papers? • Your own lab? • Collaborators? • All of the above? • What is likely common to all sources? Data quality issues. There is no perfect database.
  6. Public Domain Databases • Our databases are a mess… • Non-curated databases are proliferating errors • We source and deposit data between databases • The original sources of errors are hard to determine • Curation is time-consuming and challenging
  7. Availability of libraries of FDA drugs • Johns Hopkins Clinical Compound Library: made compounds available at cost
  8. The FDA Drug Database
  9. The DailyMed Database
  10. Government Databases Should Come With a Health Warning • Williams and Ekins, Drug Disc Today 16: 747-750 (2011)
  11. What is Neomycin?
  12. Not this…
  13. Data Errors in the NPC Browser: Analysis of Steroids
      Substructure  | # of hits | # of correct hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry
      Gonane        | 34        | 5                 | 8                  | 21                         | 0
      Gon-4-ene     | 55        | 12                | 3                  | 33                         | 7
      Gon-1,4-diene | 60        | 17                | 10                 | 23                         | 10
      Williams, Ekins and Tkachenko, Drug Disc Today 17: 685-701 (2012)
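Because each row's categories should add up to its hit count, the steroid table can be checked and summarized mechanically. The minimal Python sketch below transcribes the three rows (the dictionary and function names are our own, hypothetical choices), verifies each row's sum, and reports the share of hits with correct stereochemistry.

```python
# Counts transcribed from the NPC Browser steroid table (slide 13).
# Tuple order: hits, correct, no stereo, incomplete stereo, complete but incorrect stereo.
NPC_STEROID_COUNTS = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def percent_correct(counts):
    """Return {substructure: % of hits with correct stereochemistry}.

    Raises ValueError if a row's categories do not sum to its hit count,
    a cheap consistency check when transcribing published tables.
    """
    result = {}
    for name, (hits, correct, missing, incomplete, wrong) in counts.items():
        if correct + missing + incomplete + wrong != hits:
            raise ValueError(f"{name}: categories do not sum to {hits}")
        result[name] = round(100.0 * correct / hits, 1)
    return result


if __name__ == "__main__":
    for name, pct in percent_correct(NPC_STEROID_COUNTS).items():
        print(f"{name}: {pct}% of hits correct")
```

For these three substructures the share of correct hits is 14.7%, 21.8% and 28.3%, so fewer than a third of the hits are fully correct in each case.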
  14. Drug Disambiguation Project
  15. NCATS: Discovering “New Therapeutic Uses for Existing Molecules” • 58 molecule names and identifiers. Where are the “structures”?
  16. NCATS dataset • Several groups tried to collate the molecules • Chris Lipinski provided approximately 30 unique molecules • Simple molecular descriptors show no difference between compounds classified as discontinued (n = 15) and those in clinical trials (n = 14) • Where is the definitive set of publicly accessible molecules for computational repurposing and analysis?
  17. Drug structure quality is important • Many groups ARE doing in silico repositioning, integrating or using sets of FDA drugs, and if the structures are incorrect the predictions will be wrong • Where is the definitive set of FDA-approved drugs with correct structures? • Ideally we need linkage between in vitro data and clinical data
  18. We have a problem… • Lots of data available, but quality is suspect • Errors proliferate from database to database • Data continues to flow in unabated • When errors are identified, they are hard to get fixed! • Data licensing is confusing: “Open Data” • We are mostly “takers”, not “givers” • Standards are lacking: data licensing; data processing (structure standardization)
  19. So what needs to happen to improve? • Let’s agree that collaboration and crowdsourcing can help • Provide SIMPLE ways to give feedback • Contribute when possible; databases should provide feedback mechanisms • Adopt standards for structure handling and representation • Adopt standards for data interchange • Allow machine handling of data: use the power of the semantic web
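The slide does not say which test was used to compare the discontinued and clinical-trial groups. As one hypothetical option, a two-sided permutation test on the difference in mean descriptor values (for example molecular weight) makes no distributional assumptions, which suits small groups such as n = 15 and n = 14. The function name and defaults below are our own assumptions, not the authors' method.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the fraction of random relabelings whose
    absolute mean difference is at least as large as the observed one,
    with a +1 correction so the estimate is never exactly zero.
    """
    if len(group_a) < 1 or len(group_b) < 1:
        raise ValueError("both groups need at least one value")
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign compounds to the two groups
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)
```

A call would look like `permutation_test(mw_discontinued, mw_in_trials)`, where both argument names are hypothetical lists of one descriptor per compound; a large p-value is consistent with "no difference" but does not prove the groups are equivalent.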
  20. Williams, Ekins and Tkachenko, Drug Disc Today 17: 685-701 (2012)
  21. Collaboration on Curation • Collaborate on curation… share through standards and open interfaces
  22. All DBs should take comments!
  23. Standardize Use the SRS as guidance for standardization
  24. “Appify” curation and collaboration• The data network is complex• “Appify” collaboration and curation networks• Increasing crowdsourcing role for data analysis Ekins & Williams, Pharm Res, 27: 393-395, 2010.
  25. Mobile Apps for Drug Discovery
  26. Open Drug Discovery Teams Free iOS app used to expose repurposing data All of this data has been tweeted, Clark and Williams, Mol Informatics, in Press 2012
  27. Open Drug Discovery Teams
  28. Simple Rules for licensing “open” data • Gather stakeholders. Decide if goals are primarily scientific, commercial or mixed. • Explore the benefits of open licensing and the drawbacks of enclosure. Hold closely to open definitions and standards. Do not write your own IP licenses! • Provide simple explanations for terms of use. Use metadata to indicate licensing terms explicitly: the Creative Commons Rights Expression Language is a good tool. • Do not lock up metadata. If you can’t make the data public domain, make the metadata public domain. Williams, Wilbanks and Ekins, PLoS Comput. Biol., in press, Sept. 2012
  29. Open PHACTS Project • Develop a set of robust standards… • Implement the standards in a semantic integration hub • Deliver services to support drug discovery programs in pharma and the public domain • 22 partners: 8 pharmaceutical companies, 3 biotechs • 36-month project • Guiding principle is open access, open usage, open source: key to standards adoption
  30. To facilitate THIS process! • IP? • What’s the structure? • Are they in our file? • What’s similar? • What’s the target? • Pharmacology data? • Known pathways? • Competitors? • Working on now? • Connections to disease? • Expressed in right cell type?
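The licensing rules recommend stating terms explicitly in machine-readable metadata. A minimal sketch, assuming a dataset identified by a hypothetical URI and released under the CC0 1.0 public domain dedication, emits Turtle using the ccREL properties `cc:license` and `cc:attributionName` alongside Dublin Core `dct:title`. In practice an RDF library would be used to serialize and validate such metadata; the string building here only shows its shape.

```python
CC_NS = "http://creativecommons.org/ns#"  # ccREL namespace
DCT_NS = "http://purl.org/dc/terms/"  # Dublin Core terms


def _literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes.

    Newlines are not handled in this sketch.
    """
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def license_turtle(dataset_uri, title, license_uri, attribution_name):
    """Return Turtle stating a dataset's title, attribution and license."""
    return (
        f"@prefix cc: <{CC_NS}> .\n"
        f"@prefix dct: <{DCT_NS}> .\n"
        "\n"
        f"<{dataset_uri}>\n"
        f"    dct:title {_literal(title)} ;\n"
        f"    cc:attributionName {_literal(attribution_name)} ;\n"
        f"    cc:license <{license_uri}> .\n"
    )


if __name__ == "__main__":
    # Hypothetical dataset URI and names; the license URI is the CC0 1.0 dedication.
    print(license_turtle(
        "http://example.org/datasets/repurposing-screen",
        "Example repurposing screen",
        "http://creativecommons.org/publicdomain/zero/1.0/",
        "Example Lab",
    ))
```

Publishing even this small block alongside a dataset follows the last rule above: the metadata stays open and machine-readable whatever terms apply to the data itself.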
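Semantic integration of the kind Open PHACTS targets can be pictured as pooling subject-predicate-object triples from several sources and following equivalence links between them. The toy, in-memory sketch below is not the Open PHACTS API; the identifiers, the predicates `sameAs` and `hasTarget`, and the function names are all hypothetical.

```python
# Hypothetical triples; "dbA:" and "dbB:" stand in for two source databases.
TRIPLES = [
    ("dbA:cmpd42", "name", "example drug"),
    ("dbA:cmpd42", "sameAs", "dbB:entry7"),
    ("dbB:entry7", "hasTarget", "target:T1"),
    ("dbB:entry7", "hasTarget", "target:T2"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def targets_via_links(triples, subject):
    """Collect hasTarget objects for a subject and everything sameAs-linked to it."""
    seen, stack = set(), [subject]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Follow sameAs links in both directions.
        stack.extend(obj for _, _, obj in match(triples, s=node, p="sameAs"))
        stack.extend(subj for subj, _, _ in match(triples, p="sameAs", o=node))
    return sorted({obj for n in seen for _, _, obj in match(triples, s=n, p="hasTarget")})


if __name__ == "__main__":
    print(targets_via_links(TRIPLES, "dbA:cmpd42"))
```

The same mechanism also spreads errors: a single incorrect `sameAs` link would attach another compound's targets, which is why curating the links matters as much as curating the records.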
  31. It’s not JUST structures of course…
  32. Taxol: Paclitaxel Bioassay Data • Most bioassay data are associated with a structure that has one ambiguous stereocenter
  33. Measuring data: dispensing dependencies • Data from 2 AstraZeneca patents • Ephrin pharmacophores developed using data for 14 compounds with IC50 values • Different dispensing methods give different results. This impacts hypotheses and could impact drug discovery.
      Dispensing process              | Hydrophobic features (HPF) | Hydrogen bond acceptor (HBA) | Hydrogen bond donor (HBD) | Observed vs. predicted IC50 (r)
      Acoustic mediated process       | 2                          | 1                            | 1                         | 0.92
      Disposable tip mediated process | 0                          | 2                            | 1                         | 0.80
      Ekins, Olechno and Williams, submitted 2012
  34. Measuring data: dispensing dependencies • Acoustically derived IC50 values were 1.5 to 276.5-fold lower than those from tip-based dispensing • Pharmacophores and other computational models are used to guide medicinal chemistry • Non-tip-based methods may improve HTS results and avoid misleading computational and statistical models • No analysis of the influence of dispensing processes on data • Public databases should annotate metadata to create larger datasets for comparing different computational methods • How much of the data is reproducible, accurate and valid? The challenge of high-throughput science.
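Ambiguous or missing stereocenters, as in the Taxol and steroid examples, can be flagged by comparing standard InChIKeys recorded for the same drug in different databases. In a standard InChIKey the first 14-character block hashes the connectivity and the second block hashes further layers such as stereochemistry and isotopes, so keys that share only the first block point to the same skeleton recorded with different stereochemistry (or another layer difference). The sketch assumes the keys were already generated by an InChI toolkit (not shown); the function name is hypothetical and the example keys are placeholders, not real compounds.

```python
def compare_inchikeys(key_a, key_b):
    """Classify how two standard InChIKeys recorded for the 'same' drug differ.

    Block 1 (14 chars) hashes the connectivity layer; block 2 (10 chars)
    hashes the remaining layers, such as stereochemistry and isotopes, plus
    a standard/version flag; block 3 (1 char) encodes protonation.
    """
    parts_a, parts_b = key_a.split("-"), key_b.split("-")
    for parts in (parts_a, parts_b):
        if [len(p) for p in parts] != [14, 10, 1]:
            raise ValueError("expected hyphen-separated blocks of 14, 10 and 1 characters")
    if parts_a == parts_b:
        return "identical"
    if parts_a[0] != parts_b[0]:
        return "different connectivity"
    if parts_a[1] != parts_b[1]:
        return "same connectivity, different stereo/isotope layers"
    return "same connectivity, different protonation"


if __name__ == "__main__":
    # Placeholder keys built to the standard 14-10-1 shape; not real compounds.
    key = "A" * 14 + "-" + "B" * 8 + "SA-N"
    variant = "A" * 14 + "-" + "C" * 8 + "SA-N"
    print(compare_inchikeys(key, variant))
```

Running such a comparison over every database entry for one drug name would surface exactly the "right skeleton, wrong or missing stereochemistry" cases tabulated for the steroids.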
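The r values in the table are correlation coefficients between observed and predicted IC50. A minimal Pearson r in plain Python follows; the slide does not state whether raw or log-transformed IC50 values were correlated, so log-transforming first (a common choice for potency data) is left to the caller as an assumption. On Python 3.10 and later, `statistics.correlation` gives the same result.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    from math import log10

    # Illustrative values only, not data from the patents or the study.
    observed = [12.0, 85.0, 430.0, 2100.0]
    predicted = [20.0, 60.0, 500.0, 1500.0]
    print(pearson_r([log10(v) for v in observed], [log10(v) for v in predicted]))
```

Note that r measures only how well predictions track the observed ranking and spread; two pharmacophores with similar r, as in the table, can still encode quite different feature hypotheses.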
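Annotating each measurement with its dispensing method, as the slide recommends, is what makes comparisons like the 1.5 to 276.5-fold range computable directly from a database. The sketch below assumes a hypothetical record type with a `dispensing` field holding "acoustic" or "tip" and reports the range of tip/acoustic IC50 ratios for compounds measured both ways; the field names are illustrative, not a schema from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IC50Record:
    """One IC50 measurement with its dispensing method (hypothetical schema)."""
    compound_id: str
    ic50_nm: float
    dispensing: str  # "acoustic" or "tip" in this sketch


def fold_range(records):
    """Return (min, max) tip/acoustic IC50 ratio over compounds measured both ways.

    A ratio above 1 means the acoustic IC50 is lower. If a compound has several
    values for one method, the last one seen is kept.
    """
    by_compound = {}
    for rec in records:
        by_compound.setdefault(rec.compound_id, {})[rec.dispensing] = rec.ic50_nm
    folds = [
        m["tip"] / m["acoustic"]
        for m in by_compound.values()
        if "tip" in m and "acoustic" in m
    ]
    if not folds:
        raise ValueError("no compound has both tip and acoustic measurements")
    return min(folds), max(folds)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    data = [
        IC50Record("cmpd-1", 10.0, "tip"),
        IC50Record("cmpd-1", 4.0, "acoustic"),
        IC50Record("cmpd-2", 50.0, "tip"),
        IC50Record("cmpd-2", 2.0, "acoustic"),
        IC50Record("cmpd-3", 7.0, "tip"),  # no acoustic value, so skipped
    ]
    print(fold_range(data))
```

Without the `dispensing` field the two measurements for a compound would be indistinguishable duplicates, and the discrepancy would look like noise rather than a systematic effect.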
  35. Conclusions
  36. Acknowledgments Sean Ekins Christopher Lipinski Joe Olechno John Wilbanks Drug Disambiguation project team RSC Cheminformatics Team
  37. Thank youEmail: williamsa@rsc.orgTwitter: @chemconnectorBlog: www.chemconnector.comSLIDES: ekinssean@yahoo.comTwitter: collabchemBlog: