CINF 2012 talk Recrystallization App


Jean-Claude Bradley presents on a recrystallization app based on Open Data feeds and models.

Published in: Education

  • 1. The deployment of an app from Open Data feeds and algorithms: Recommending recrystallization solvents. ACS-CINF Symposium. Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University. December 13, 2012
  • 2. The importance of recrystallization
      • Generally preferred if there is a known solvent that gives a good yield
      • Scales much more easily and cheaply than chromatography
      • However, for new compounds much trial and error may be needed
  • 3. The Recrystallization App (Andrew Lang)
  • 4. What are good solvents to recrystallize benzoic acid? (Andrew Lang)
  • 5. Click on the solvent to see temp curve (Andrew Lang)
  • 6. Deliver melting point data via App (Andrew Lang)
  • 7. How does it work?
      1. Look up the solvent boiling point
      2. Look up the room temperature solubility or predict it via Abraham descriptors predicted from a model using the CDK
      3. Look up the solute melting point or predict it via a model using the CDK
      4. Use the melting point and the solubility at room temperature to predict the solubility at boiling
      5. Calculate the predicted recrystallization yield
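
Steps 4 and 5 of slide 7 can be illustrated with a minimal sketch. The slides do not state which equation the app uses, so the interpolation here is an assumption: ln(mole-fraction solubility) is taken to be linear in 1/T, anchored at the room-temperature solubility and at full miscibility (x = 1) at the solute melting point. The function names and all input values are hypothetical and chosen only for illustration.

```python
import math


def solubility_at(temp_k, x_room, t_room_k, t_melt_k):
    """Estimate mole-fraction solubility at temp_k.

    ASSUMPTION (not from the talk): ln(x) is linear in 1/T, anchored at the
    room-temperature solubility and at x = 1 (miscible) at the melting point.
    """
    if not 0.0 < x_room < 1.0:
        raise ValueError("x_room must be a mole fraction strictly between 0 and 1")
    if t_melt_k <= t_room_k:
        raise ValueError("solute must be a solid at room temperature")
    if temp_k >= t_melt_k:
        return 1.0  # at or above the melting point the solute is treated as miscible
    slope = math.log(x_room) / (1.0 / t_room_k - 1.0 / t_melt_k)
    return math.exp(slope * (1.0 / temp_k - 1.0 / t_melt_k))


def recrystallization_yield(x_room, x_boil):
    """Fraction of solute recovered when a solution saturated at the boiling
    point is cooled to room temperature.

    Moles of solute per mole of solvent at saturation are x / (1 - x).
    """
    if x_boil >= 1.0:
        return 1.0  # limiting case: solute is miscible with the hot solvent
    hot = x_boil / (1.0 - x_boil)
    cold = x_room / (1.0 - x_room)
    return max(0.0, 1.0 - cold / hot)


# Hypothetical inputs for illustration only; not values from the app or its data.
T_ROOM_K = 298.15
t_boil_k = 351.0   # step 1: solvent boiling point
x_room = 0.05      # step 2: room-temperature solubility (mole fraction)
t_melt_k = 395.0   # step 3: solute melting point

x_boil = solubility_at(t_boil_k, x_room, T_ROOM_K, t_melt_k)   # step 4
predicted_yield = recrystallization_yield(x_room, x_boil)      # step 5
print(f"x_boil = {x_boil:.3f}, predicted yield = {predicted_yield:.1%}")
```

Under this assumption a solvent scores well when the solute is sparingly soluble at room temperature and much more soluble near the solvent's boiling point. If the solvent boils above the solute's melting point, the sketch simply returns the limiting yield of 1.0; a real tool would likely need to treat that case separately.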
  • 8. Openness in Chemistry. The Recrystallization App produces and uses Open Data:
      • Open Solubility Collection and Models
      • Open Melting Point Collection and Models
      • Modeling depends mainly on CDK (Open Source Software with Open Descriptors)
      • Open Notebook Science
    WHY?
  • 9. Open Data Collections are essential for this strategy (diagram labels: Open Data; open, transparent data transformation; Open Data; transparent chain of provenance)
  • 10. Open Melting Point Datasets: currently 20,000 compounds with Open MPs
  • 11. What is the melting point of 4-benzyltoluene?
      • American Petroleum Institute: 5 C
      • PHYSPROP: -30 C
      • PHYSPROP: 125 C
      • Peer-reviewed journal (2008): 97.5 C
      • Government database: -30 C
      • Government database: 4.58 C
  • 12. Motivation: Faster Science, Better Science
  • 13. The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen at < -30 C
  • 14. Open Lab Notebook page measuring the melting point of 4-benzyltoluene
  • 15. Ruling out all melting points above -15 C?
  • 16. Oops – 4-benzyltoluene freezes after 16 days at -15 C!
  • 17. Measuring the melting point by slowly heating from -15 C gives 5 C
  • 18. There are NO FACTS, only measurements embedded within assumptions. Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
  • 19. Open Random Forest modeling of Open Melting Point data using CDK descriptors (Andrew Lang). R² = 0.78; TPSA and nHdon are the most important descriptors
  • 20. Melting point prediction service
  • 21. Web services for summary data (Andrew Lang)
  • 22. Using a Google Spreadsheet as a “dashboard interface” for reaction planning and analysis
  • 23. Calling Google App Scripts
  • 24. Calling Google App Scripts (Andrew Lang and Rich Apodaca)
  • 25. Never having to leave the Google Spreadsheet dashboard for access to key info (Andrew Lang and Rich Apodaca)
  • 26. A click away from an interactive NMR display (using JCAMP-DX format and ChemDoodle) (Andrew Lang)
  • 27. Google Apps Scripts for conveniently exploring melting point data
  • 28. Comparison of model with triple-validated measurements: straight-chain carboxylic acids from 1 to 10 carbons; straight-chain alcohols from 1 to 10 carbons
  • 29. Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)
  • 30. Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
  • 31. Dibenzalacetone derivatives docking against tubulin (paclitaxel site) (Andrew Lang)
  • 32. “Simple” aldol condensation synthesis. Top hit: no reports of synthesis. In top ten: a few reports of synthesis. (Andrew Lang)
  • 33. Information from the literature on the target synthesis
  • 34. Information from the literature on the target synthesis
  • 35. Searching for aldol condensations of acetone in the Reaction Attempts database (about 90% of reactions in Open Notebooks are “not successful”) (Andrew Lang)
  • 36. An example of a “failed experiment” in an Open Notebook with useful information
  • 37. A failed experiment reveals the importance of aldehyde solubility
  • 38. An example of a successful experiment in an Open Notebook
  • 39. A successful synthesis achieved by avoiding water, dramatically increasing the amount of NaOH, and using a long reaction time
  • 40. Chemical Information Retrieval 2012 property assignment
  • 41. Melting Point Outlier List
  • 42. Melting Point Outlier example
  • 43. Solubility Outlier List
  • 44. Solubility of benzoic acid in 1-octanol discrepancies
  • 45. Using ChemSpider to ensure all stereocenters are defined before searching for properties
  • 46. Using the InChIKey to find single isomers
  • 47. Chemical Information Validation Sheet 2012
  • 48. Each entry validated with an image
  • 49. Avoiding redundant property data points with a single click within the validation sheet
  • 50. Open Chemical Property Matrix (OCPM): boiling point, vapor pressure, flash point, Abraham descriptors, melting point, logP, aqueous solubility, octanol solubility
  • 51. Open Chemical Property Matrix (OCPM)
  • 52. OCPM relationships
  • 53. OCPM melting point sheet
  • 54. Dibenzalacetone libraries are promising for connecting the OCPM with useful applications
  • 55. Conclusions
      • More openness in chemistry can make science more efficient
      • Provide interfaces that make sense to the end users:
          • Open Data, Open Models and Open Source Software to modelers
          • Apps (smartphones, Google App Scripts, etc.) for chemists at the bench
    Acknowledgements
      • Andrew Lang (code, modeling)
      • Bill Acree (modeling, solubility data contribution)
      • Antony Williams (ChemSpider services, mp data curation)
      • Matthew McBride and Rida Atif (recrystallization and synthesis)
      • Kayla Gogarty (OCPM)