CINF 2012 talk Recrystallization App


Jean-Claude Bradley presents on a recrystallization app based on Open Data feeds and models.

Published in: Education

  1. 1. The deployment of an app from Open Data feeds and algorithms: Recommending recrystallization solvents. ACS-CINF Symposium. Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University. December 13, 2012
  2. 2. The importance of recrystallization
       • Generally preferred if there is a known solvent that gives a good yield
       • Scales much more easily and cheaply than chromatography
       • However, for new compounds much trial and error may be needed
  3. 3. The Recrystallization App (Andrew Lang)
  4. 4. What are good solvents to recrystallize benzoic acid? (Andrew Lang)
  5. 5. Click on the solvent to see temp curve (Andrew Lang)
  6. 6. Deliver melting point data via App (Andrew Lang)
  7. 7. How does it work?
       1. Look up the solvent boiling point
       2. Look up the room temperature solubility or predict it via Abraham descriptors predicted from a model using the CDK
       3. Look up the solute melting point or predict it via a model using the CDK
       4. Use the melting point and the solubility at room temperature to predict the solubility at boiling
       5. Calculate the predicted recrystallization yield
  8. 8. Openness in Chemistry. The Recrystallization App produces and uses Open Data:
       • Open Solubility Collection and Models
       • Open Melting Point Collection and Models
       • Modeling depends mainly on CDK (Open Source Software with Open Descriptors)
       • Open Notebook Science
       WHY?
  9. 9. Open Data Collections are essential for this strategy. Diagram labels: Open Data; open, transparent data transformation; Open Data; transparent chain of provenance
  10. 10. Open Melting Point Datasets: currently 20,000 compounds with Open MPs
  11. 11. What is the melting point of 4-benzyltoluene?
        • American Petroleum Institute: 5 C
        • PHYSPROP: -30 C
        • PHYSPROP: 125 C
        • peer reviewed journal (2008): 97.5 C
        • government database: -30 C
        • government database: 4.58 C
  12. 12. Motivation: Faster Science, Better Science
  13. 13. The quest to resolve the melting point of 4-benzyltoluene: liquid at room temp and can be frozen <-30 C
  14. 14. Open Lab Notebook page measuring the melting point of 4-benzyltoluene
  15. 15. Ruling out all melting points above -15C?
  16. 16. Oops – 4-benzyltoluene freezes after 16 days at -15C!
  17. 17. Measuring the melting point by slowly heating from -15 C gives 5 C
  18. 18. There are NO FACTS, only measurements embedded within assumptions. Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
  19. 19. Open Random Forest modeling of Open Melting Point data using CDK descriptors (Andrew Lang). R² = 0.78; TPSA and nHdon most important
  20. 20. Melting point prediction service
  21. 21. Web services for summary data (Andrew Lang)
  22. 22. Using a Google Spreadsheet as a “dashboard interface” for reaction planning and analysis
  23. 23. Calling Google App Scripts
  24. 24. Calling Google App Scripts (Andrew Lang and Rich Apodaca)
  25. 25. Never having to leave the Google Spreadsheet dashboard for access to key info (Andrew Lang and Rich Apodaca)
  26. 26. A click away from an interactive NMR display (using JCAMP-DX format and ChemDoodle) (Andrew Lang)
  27. 27. Google Apps Scripts for conveniently exploring melting point data
  28. 28. Comparison of model with triple validated measurements: straight chain carboxylic acids from 1 to 10 carbons; straight chain alcohols from 1 to 10 carbons
  29. 29. Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)
  30. 30. Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
  31. 31. Dibenzalacetone derivatives docking against tubulin (paclitaxel site) (Andrew Lang)
  32. 32. “Simple” aldol condensation synthesis. Top hit (no reports of synthesis); in top ten (a few reports of synthesis) (Andrew Lang)
  33. 33. Information from the literature on the target synthesis
  34. 34. Information from the literature on the target synthesis
  35. 35. Searching for aldol condensations of acetone in the Reaction Attempts database (about 90% of reactions in Open Notebooks are “not successful”) (Andrew Lang)
  36. 36. An example of a “failed experiment” in an Open Notebook with useful information
  37. 37. A failed experiment reveals the importance of aldehyde solubility
  38. 38. An example of a successful experiment in an Open Notebook
  39. 39. A successful synthesis by avoiding water, dramatically increasing NaOH, and using a long reaction time
  40. 40. Chemical Information Retrieval 2012 property assignment
  41. 41. Melting Point Outlier List
  42. 42. Melting Point Outlier example
  43. 43. Solubility Outlier List
  44. 44. Solubility of benzoic acid in 1-octanol discrepancies
  45. 45. Using ChemSpider to ensure all stereocenters are defined before searching for properties
  46. 46. Using the InChIKey to find single isomers
  47. 47. Chemical Information Validation Sheet 2012
  48. 48. Each entry validated with an image
  49. 49. Avoiding redundant property data points with a single click within the validation sheet
  50. 50. Open Chemical Property Matrix (OCPM): boiling point, vapor pressure, flash point, Abraham descriptors, melting point, logP, aqueous solubility, octanol solubility
  51. 51. Open Chemical Property Matrix (OCPM)
  52. 52. OCPM relationships
  53. 53. OCPM melting point sheet
  54. 54. Dibenzalacetone libraries are promising for connecting the OCPM with useful applications
  55. 55. Conclusions
        • More openness in chemistry can make science more efficient
        • Provide interfaces that make sense to the end users:
          • Open Data, Open Models and Open Source Software to modelers
          • Apps (smartphones, Google App Scripts, etc.) for chemists at the bench
        Acknowledgements
        • Andrew Lang (code, modeling)
        • Bill Acree (modeling, solubility data contribution)
        • Antony Williams (ChemSpider services, mp data curation)
        • Matthew McBride and Rida Atif (recrystallization and synthesis)
        • Kayla Gogarty (OCPM)
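
The five steps on slide 7 can be sketched in a few lines of Python. The talk does not state which temperature model the app uses, so this sketch makes its own assumption: ln(mole-fraction solubility) is linear in 1/T, fixed at the room-temperature solubility and at full miscibility (x = 1) at the solute melting point. The function names, the 25 C room temperature, and the example solubility value are all hypothetical.

```python
import math

KELVIN = 273.15

def solubility_at(temp_c, x_room, melting_point_c, room_temp_c=25.0):
    """Estimate mole-fraction solubility at temp_c.

    Assumption (not from the talk): ln(x) is linear in 1/T, pinned at
    x_room at room temperature and at x = 1 at the melting point.
    """
    if not 0.0 < x_room < 1.0:
        raise ValueError("x_room must be a mole fraction in (0, 1)")
    if melting_point_c <= room_temp_c:
        raise ValueError("solute must be solid at room temperature")
    t, t_room, t_melt = (c + KELVIN for c in (temp_c, room_temp_c, melting_point_c))
    if t >= t_melt:
        return 1.0  # at or above the melting point, treat as miscible
    fraction = (1 / t - 1 / t_melt) / (1 / t_room - 1 / t_melt)
    return math.exp(math.log(x_room) * fraction)

def predicted_yield(x_room, melting_point_c, boiling_point_c, room_temp_c=25.0):
    """Fraction of solute recovered on cooling a solution saturated at the
    solvent boiling point back down to room temperature."""
    x_hot = solubility_at(boiling_point_c, x_room, melting_point_c, room_temp_c)
    if x_hot >= 1.0:
        return 1.0  # limiting case: no upper bound on solute dissolved hot
    hot = x_hot / (1 - x_hot)      # moles of solute per mole of solvent, hot
    cold = x_room / (1 - x_room)   # same ratio at room temperature
    return max(0.0, 1 - cold / hot)

# Illustrative inputs only: x_room = 0.05 is a made-up solubility value.
print(predicted_yield(x_room=0.05, melting_point_c=122.0, boiling_point_c=78.0))
```

In the app, steps 1 to 3 would supply the boiling point, room-temperature solubility, and melting point from lookups or CDK-based models; here they are passed in as plain numbers.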
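
The conflicting values on slide 11 show why an outlier list (slide 41) is useful. As a minimal sketch, the six reported melting points can be compared against their median and flagged when they differ by more than a tolerance. The 5 C tolerance and the function name are assumptions for illustration, not the criteria used for the actual outlier list.

```python
from statistics import median

# Values as reported on slide 11 (degrees C).
reported_mp_c = [
    ("American Petroleum Institute", 5.0),
    ("PHYSPROP", -30.0),
    ("PHYSPROP", 125.0),
    ("peer reviewed journal (2008)", 97.5),
    ("government database", -30.0),
    ("government database", 4.58),
]

def flag_disagreements(values, tolerance_c=5.0):
    """Return the median and the entries further than tolerance_c from it.

    tolerance_c is a hypothetical threshold chosen for this example.
    """
    center = median(v for _, v in values)
    flagged = [(src, v) for src, v in values if abs(v - center) > tolerance_c]
    return center, flagged

center, flagged = flag_disagreements(reported_mp_c)
for source, value in flagged:
    print(f"{source}: {value} C differs from median {center:.2f} C")
```

Agreement between sources is only a screening signal. Slides 13 to 17 show that a direct measurement, recorded in an Open Notebook, was still needed to settle the value.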
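
Slides 45 and 46 rely on the layout of the standard InChIKey: a 14-letter block that hashes the connectivity, then a 10-letter block covering the remaining layers (including stereochemistry), then a single protonation character. The sketch below groups keys by the first block to find candidate isomers and checks for the `UHFFFAOY` hash, which standard InChIKeys carry when the stereo and isotopic layers are empty. The keys in the example are placeholders, not real compounds, and the function names are hypothetical.

```python
import re
from collections import defaultdict

INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

def split_inchikey(key):
    """Split an InChIKey into (connectivity block, second block, protonation)."""
    if not INCHIKEY_PATTERN.match(key):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, second, protonation = key.split("-")
    return connectivity, second, protonation

def group_by_connectivity(keys):
    """Group keys that share a skeleton but may differ in stereochemistry."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)

def lacks_stereo_layer(key):
    """True when the second block is the hash of empty stereo/isotope layers."""
    return split_inchikey(key)[1].startswith("UHFFFAOY")

# Placeholder keys for illustration only; they do not identify real compounds.
keys = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
]
for skeleton, members in group_by_connectivity(keys).items():
    undefined = [k for k in members if lacks_stereo_layer(k)]
    print(skeleton, len(members), "keys,", len(undefined), "without stereo layer")
```

Matching on the full key targets a single isomer, while matching on the first block alone returns every record that shares the skeleton. A second block other than `UHFFFAOY` can also come from isotopic labeling, so this check is a screen, not proof that stereocenters are defined.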