Bradley Opal 2011

Jean-Claude Bradley presents "Pre-competitive Collaboration: Sharing Data to Increase Predictability" at the Opal Events 3rd Annual Drug Discovery Partnership: Filling the Pipeline.

Published in: Education, Technology

Transcript of "Bradley Opal 2011"

  1. Pre-competitive Collaboration: Sharing Data to Increase Predictability
     3rd Annual Drug Discovery Partnership: Filling the Pipeline
     Jean-Claude Bradley
     Associate Professor of Chemistry
     Drexel University
     October 17, 2011
  2. Opportunities for Competitive Collaboration
  3. Industry is Sharing More
  4. Solubility and Melting Points are critical properties in the drug discovery process
  5. Data quality is essential for both measurements and predictions based on measurements
  6. Openness is proving to be a powerful tool for assessing the reliability of data
  7. Solubility prediction for Taxol using Abraham descriptors
     (labels: Pred, Exp)
  8. Predicted temperature-dependent solubility of Taxol in water based on melting point (M)
  9. The Trusted Source Model
     Before online databases (early 90s), searching for properties like melting points using ONE “trusted source” was practical and acceptable as part of the chemistry culture.
     - CRC Handbook
     - Merck Index
     - Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
     - Peer-Reviewed Journals
     Single values don’t tend to be contradicted
  13. Question Assumptions
      Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance
  14. The Chemical Information Validation Sheet
      567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course
  15. Discovering outliers for melting points (stdev/average)
  16. Investigating the m.p. inconsistencies of EGCG
  17. Investigating the m.p. inconsistencies of cyclohexanone
  18. Most popular data sources
  19. Alfa Aesar donates melting points to the public
  20. Open Melting Point Explorer (Andrew Lang)
  21. Outliers
      MDPI dataset
      PhysProp (EPA also donated all data to the public)
  22. Outliers for ethanol: Alfa Aesar and Oxford MSDS
  23. Inconsistencies and SMILES problems within MDPI dataset
  24. MDPI Dataset labeled with High Trust Level
  25. Open Melting Point Datasets
      Currently 27,000 mps for 20,000 compounds
  26. What is the melting point of 4-benzyltoluene?
      American Petroleum Institute: 5 °C
      PHYSPROP: -30 °C
      PHYSPROP: 125 °C
      peer reviewed journal (2008): 97.5 °C
      government database: -30 °C
      government database: 4.58 °C
  27. The quest to resolve the melting point of 4-benzyltoluene: liquid at room temp and can be frozen below -30 °C (Evan Curtin)
  28. Open Lab Notebook page measuring the melting point of 4-benzyltoluene
  29. Motivation: Faster Science, Better Science
  30. Ruling out all melting points above -15 °C?
  31. Oops – 4-benzyltoluene freezes after 16 days at -15 °C!
  32. Measuring the melting point by slowly heating from -15 °C gives 5 °C
  33. There are NO FACTS, only measurements embedded within assumptions
      Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
  34. TRUST
      PROOF
  35. Common errors in datasets
      - multiple melting points for the same compound in the same database
      - stereochemistry issues
      - sign inversion
      - conversion errors (Kelvin/Celsius, Fahrenheit/Celsius)
      - bad SMILES (non-rendering)
      - salts associated with SMILES for free base
      - using boiling point for melting point
  36. Open Random Forest modeling of Open Melting Point data using CDK descriptors (Andrew Lang)
      R² = 0.78, TPSA and nHdon most important
  37. Melting point prediction service
  38. Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)
  39. Publication of double+ validated melting point dataset to Nature Precedings and Lulu
  40. (no text)
  41. (no text)
  42. Crowdsourcing Solubility Data
  43. ONS Challenge Judges
  44. ONS Challenge Award Winners
  45. Web services for summary data (Andrew Lang)
  46. Reaction Attempts Book
  47. Reaction Attempts Book: Reactants listed Alphabetically
  48. (no text)
  49. Interactive NMR spectra using JSpecView or ChemDoodle and the Open JCAMP-DX format
  50. Predicting Best Solvent for Imine Formation using solubility and melting point data (Evan Curtin)
  51. Predicting Yield of Imine Formation in Ethanol (Evan Curtin)
  52. Google Apps Scripts web services
  53. Google Apps Scripts for conveniently exploring melting point data
  54. Comparison of model with triple validated measurements
      Straight chain carboxylic acids from 1 to 10 carbons
      Straight chain alcohols from 1 to 10 carbons
  55. Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)
  56. Google Apps Scripts for planning reactions and creating schemes
  57. Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
  58. All ONS web services
  59. Some Initiatives Promoting More Openness in Drug Discovery
  60. Open Primary Research in Drug Design using Web2.0 tools (malaria) (blogs, wikis, Second Life, mailing lists)
      Docking: Rajarshi Guha (Indiana U), Tsu-Soo Tan (Nanyang Inst.)
      Synthesis: JC Bradley (Drexel U)
      Testing: Phil Rosenthal (UCSF, malaria), Dan Zaharevitz (NCI, tumors)
  61. Outcome of Guha-Bradley-Rosenthal collaboration
  62. Conclusions
      - For science to progress quickly, there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance
      - Open Notebook Science can be a useful tool in this context
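
Slide 15 names a stdev/average ratio for discovering melting point outliers, and slide 35 lists sign inversion and unit conversion mistakes among the common dataset errors. The slides do not show how either check was implemented, so the following Python is only a minimal sketch under stated assumptions: the function names, the 0.05 threshold, the 1-degree tolerance, and the use of the absolute value of the mean are all hypothetical choices made for illustration. The sample values are the six reported melting points for 4-benzyltoluene from slide 26.

```python
from statistics import mean, stdev

# Reported melting points (°C) for 4-benzyltoluene, copied from slide 26.
REPORTED_MP_C = [5.0, -30.0, 125.0, 97.5, -30.0, 4.58]


def relative_spread(values):
    """Return stdev / |average| for two or more measurements.

    Assumption: the absolute value of the average is used so the ratio
    stays non-negative for sub-zero Celsius data.
    """
    avg = mean(values)
    if avg == 0:
        return float("inf")
    return stdev(values) / abs(avg)


def needs_review(values, threshold=0.05):
    """Flag a compound whose sources disagree by more than the threshold.

    The threshold is a hypothetical default, not a value from the slides.
    """
    return relative_spread(values) > threshold


def possible_explanations(a, b, tol=1.0):
    """List simple transformations of value a that land within tol of b.

    Covers three of the error types on slide 35: sign inversion and the
    Kelvin/Celsius and Fahrenheit/Celsius mix-ups.
    """
    candidates = {
        "sign inversion": -a,
        "Kelvin entered as Celsius": a + 273.15,
        "Fahrenheit entered as Celsius": a * 9 / 5 + 32,
    }
    return [name for name, value in candidates.items() if abs(value - b) <= tol]


if __name__ == "__main__":
    print("relative spread:", round(relative_spread(REPORTED_MP_C), 2))
    print("needs review:", needs_review(REPORTED_MP_C))
    print("25 vs 298.15:", possible_explanations(25.0, 298.15))
```

A ratio computed on the Celsius scale becomes unstable when the average is close to 0 °C, so converting to Kelvin before taking the ratio may be a more robust variant; the slides do not say which scale was used.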