Chemical Information Retrieval Class 1



Jean-Claude Bradley presents the first lecture of Chemical Information Retrieval in the Fall of 2012 at Drexel University.


  1. Chemical Information Retrieval 2012, First Class, CHEM367/767, Drexel University. Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University, September 28, 2012
  2. Finding reliable chemical information can be really hard
  3. After this class, you should feel that you can never blindly trust chemical data sources again
  4. But… you will learn how to do the best you can with imperfect information
  5. The Chemical Information Validation Sheet: 567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course
  6. Discovering outliers for melting points (stdev/average)
  7. Investigating the m.p. inconsistencies of EGCG
  8. Investigating the m.p. inconsistencies of cyclohexanone
  9. Most popular data sources
  10. Alfa Aesar donates melting points to the public
  11. Open Melting Point Explorer (Andrew Lang)
  12. Outliers: MDPI, EPI (all dataset data also donated to the public)
  13. Outliers for ethanol: Alfa Aesar and Oxford MSDS
  14. Inconsistencies and SMILES problems within the MDPI dataset
  15. MDPI dataset labeled with High Trust Level
  16. Open Melting Point Datasets: currently 20,000 compounds with open MPs
  17. What is the melting point of 4-benzyltoluene? American Petroleum Institute: 5 °C; PHYSPROP: -30 °C; PHYSPROP: 125 °C; peer-reviewed journal (2008): 97.5 °C; government database: -30 °C; government database: 4.58 °C
  18. The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 °C
  19. Open Lab Notebook page measuring the melting point of 4-benzyltoluene
  20. Motivation: Faster Science, Better Science
  21. Ruling out all melting points above -15 °C?
  22. Oops: 4-benzyltoluene freezes after 16 days at -15 °C!
  23. Measuring the melting point by slowly heating from -15 °C gives 5 °C
  24. There are NO FACTS, only measurements embedded within assumptions. Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
  25. “Simple” aldol condensation synthesis: top hit (no reports of synthesis); in top ten (a few reports of synthesis) (Andrew Lang)
  26. Information from the literature on the target synthesis
  27. Information from the literature on the target synthesis
  28. An example of a “failed experiment” in an Open Notebook with useful information
  29. A successful synthesis achieved by avoiding water, dramatically increasing the amount of NaOH, and using a long reaction time
  30. Open Random Forest modeling of Open Melting Point data using CDK descriptors (Andrew Lang): R² = 0.78; TPSA and nHdon are the most important descriptors
  31. Melting point prediction service
  32. Web services for summary data (Andrew Lang)
  33. Using a Google Spreadsheet as a “dashboard interface” for reaction planning and analysis
  34. Calling Google Apps Scripts
  35. Calling Google Apps Scripts (Andrew Lang and Rich Apodaca)
  36. Google Apps Scripts for conveniently exploring melting point data
  37. Comparison of the model with triple-validated measurements: straight-chain carboxylic acids from 1 to 10 carbons; straight-chain alcohols from 1 to 10 carbons
  38. Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation: only a single source available)
  39. Open melting points in the Supplementary Data pages of Wikipedia (Martin Walker)
  40. Google Apps Scripts web services
  41. Integration of multiple web services to recommend solvents for reactions (Andrew Lang)
  42. What are good solvents to recrystallize benzoic acid? (Andrew Lang)
  43. Click on the solvent to see the temperature curve (Andrew Lang)
  44. Deliver melting point data via an app (Andrew Lang)
  45. Web services built from data collected in this class will be added here
  46. In this class you will learn how to search Science 1.0 resources: peer-reviewed journals, commercial databases, patents, conference proceedings
  47. In this class you will learn how to participate in Science 2.0: wikis (Wikipedia, class wiki), blogs, interactive databases (ChemSpider), social software (Twitter, FriendFeed)
  48. In this class you will learn how to leverage Science 3.0: machine-readable web services (via collaboration with Andrew Lang)
  49. Now let's take a look at the class wiki
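Slide 6 screens compounds for melting point outliers using the ratio of the standard deviation to the average of their reported values. The following is a minimal Python sketch of that screen, not the course's actual spreadsheet or code. It applies the ratio to the six 4-benzyltoluene values from slide 17 and compares each source with the 5 °C obtained by slow heating (slide 23). Assumptions, none of which come from the lecture: the sample standard deviation is used; values are converted to kelvin before dividing, because the slide gives no scale and a Celsius average can sit near zero or below it; and the 0.1 threshold and all helper names are hypothetical.

```python
from statistics import mean, stdev

# Reported melting points (°C) for 4-benzyltoluene, taken from slide 17.
# The "(entry N)" labels are added here only to keep dictionary keys unique.
REPORTED_MP_C = {
    "American Petroleum Institute": 5.0,
    "PHYSPROP (entry 1)": -30.0,
    "PHYSPROP (entry 2)": 125.0,
    "peer-reviewed journal (2008)": 97.5,
    "government database (entry 1)": -30.0,
    "government database (entry 2)": 4.58,
}

# Value obtained by slowly heating a frozen sample from -15 °C (slide 23).
MEASURED_MP_C = 5.0


def spread_ratio(values_c: list[float]) -> float:
    """Sample standard deviation divided by the mean, computed in kelvin.

    Assumption: kelvin keeps the mean positive; the slide only says
    "stdev/average" and does not state which temperature scale was used.
    """
    kelvin = [v + 273.15 for v in values_c]
    return stdev(kelvin) / mean(kelvin)


def is_outlier_compound(values_c: list[float], threshold: float = 0.1) -> bool:
    """Flag a compound whose reported values disagree (hypothetical threshold)."""
    # stdev() needs at least two values, so single-source compounds are never flagged.
    return len(values_c) >= 2 and spread_ratio(values_c) > threshold


def deviations_from_measurement(reported: dict[str, float], measured: float) -> dict[str, float]:
    """Absolute difference (°C) between each source's value and a measurement."""
    return {source: abs(value - measured) for source, value in reported.items()}


if __name__ == "__main__":
    values = list(REPORTED_MP_C.values())
    print(f"stdev/average = {spread_ratio(values):.2f}")  # about 0.22
    print("flagged:", is_outlier_compound(values))
    deviations = deviations_from_measurement(REPORTED_MP_C, MEASURED_MP_C)
    # List sources from furthest to closest to the slow-heating measurement.
    for source, dev in sorted(deviations.items(), key=lambda item: item[1], reverse=True):
        print(f"{source}: off by {dev:g} °C")
```

A compound with a single reported value cannot be scored by this ratio at all. This mirrors slide 38, where cyclobutylamine is flagged for separate validation because only one source is available.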