
ChemInfo 2011 class1



Jean-Claude Bradley presents the introductory lecture for Chemical Information Retrieval at Drexel University for Fall 2011 on September 23, 2011. Examples demonstrate how difficult it can be to find and assess chemical information such as melting points. An overview of the class wiki follows.

Published in: Education, Technology


  1. Chemical Information Retrieval 2011: First Class (CHEM367/767, Drexel University). Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University. September 23, 2011
  2. Finding reliable chemical information can be really hard
  3. After this class, you should feel that you can never blindly trust chemical data sources again
  4. But… you will learn how to do the best you can with imperfect information
  5. The Chemical Information Validation Sheet: 567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course
  6. Discovering outliers for melting points (stdev/average)
  7. Investigating the m.p. inconsistencies of EGCG
  8. Investigating the m.p. inconsistencies of cyclohexanone
  9. Most popular data sources
  10. Alfa Aesar donates melting points to the public
  11. Open Melting Point Explorer (Andrew Lang)
  12. Outliers: MDPI dataset, EPI (donated all data to public also)
  13. Outliers for ethanol: Alfa Aesar and Oxford MSDS
  14. Inconsistencies and SMILES problems within MDPI dataset
  15. MDPI dataset labeled with High Trust Level
  16. Open Melting Point Datasets: currently 20,000 compounds with Open MPs
  17. What is the melting point of 4-benzyltoluene? American Petroleum Institute: 5 °C; PHYSPROP: -30 °C; PHYSPROP: 125 °C; peer-reviewed journal (2008): 97.5 °C; government database: -30 °C; government database: 4.58 °C
  18. The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 °C
  19. Open Lab Notebook page measuring the melting point of 4-benzyltoluene
  20. Motivation: Faster Science, Better Science
  21. Ruling out all melting points above -15 °C?
  22. Oops – 4-benzyltoluene freezes after 16 days at -15 °C!
  23. Measuring the melting point by slowly heating from -15 °C gives 5 °C
  24. There are NO FACTS, only measurements embedded within assumptions. Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
  25. Open Random Forest modeling of Open Melting Point data using CDK descriptors (Andrew Lang): R² = 0.78, TPSA and nHdon most important
  26. Melting point prediction service
  27. Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)
  28. Using melting point for temperature-dependent solubility prediction
  29. Web services for summary data (Andrew Lang)
  30. Web service calls from within a Google Spreadsheet for solubility measurement and prediction (Andrew Lang)
  31. Integration of multiple web services to recommend solvents for reactions (Andrew Lang)
  32. Publication of double+ validated melting point dataset to Nature Precedings and Lulu
  35. Reaction Attempts Book
  36. Reaction Attempts Book: reactants listed alphabetically
  38. All ONS web services
  39. Google Apps Scripts web services
  40. Google Apps Scripts for conveniently exploring melting point data
  41. Straight chain carboxylic acids from 1 to 10 carbons; straight chain alcohols from 1 to 10 carbons: comparison of model with triple validated measurements
  42. Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)
  43. Google Apps Scripts for planning reactions and creating schemes
  44. Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
  45. Web services from data collected in this class will be added here
  46. In this class you will learn how to search Science 1.0 resources
     • Peer-reviewed journals
     • Commercial databases
     • Patents
     • Conference proceedings
  47. In this class you will learn how to participate in Science 2.0
     • Wikis (Wikipedia, class wiki)
     • Blogs
     • Interactive databases (ChemSpider)
     • Social software (Twitter, FriendFeed)
  48. In this class you will learn how to leverage Science 3.0 (via collaboration with Andrew Lang)
     • Machine-readable web services
  49. Now let's take a look at the class wiki
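
The stdev/average screen for melting point outliers mentioned in the transcript can be sketched in a few lines. This is only an illustration, not the spreadsheet formula used in the class: the function names, the 0.05 threshold, the use of the absolute value of the average, and the choice to work in degrees Celsius as reported are all assumptions. The input values are the six reported melting points of 4-benzyltoluene listed in the transcript.

```python
from statistics import mean, stdev

# Reported melting points (deg C) for 4-benzyltoluene, as listed in the transcript.
reported_mp_c = [
    ("American Petroleum Institute", 5.0),
    ("PHYSPROP", -30.0),
    ("PHYSPROP", 125.0),
    ("peer-reviewed journal (2008)", 97.5),
    ("government database", -30.0),
    ("government database", 4.58),
]


def spread_ratio(values):
    """Sample standard deviation divided by |average| (hypothetical helper).

    Returns infinity when the average is exactly zero, since the ratio
    is undefined there.
    """
    avg = mean(values)
    if avg == 0:
        return float("inf")
    return stdev(values) / abs(avg)


def flag_inconsistent(measurements, threshold=0.05):
    """Flag a compound whose reported values disagree by more than `threshold`.

    `threshold` is an assumed cutoff chosen for illustration only.
    A compound with a single source cannot be screened this way.
    """
    values = [value for _source, value in measurements]
    if len(values) < 2:
        return False
    return spread_ratio(values) > threshold


if __name__ == "__main__":
    values = [value for _source, value in reported_mp_c]
    print(f"stdev/average = {spread_ratio(values):.2f}")
    print("flag for investigation:", flag_inconsistent(reported_mp_c))
```

A ratio computed on Celsius values depends on where zero sits on the scale, so compounds melting near 0 °C will look inconsistent even when sources agree closely; converting to kelvin first is one way around that, at the cost of making the ratio much less sensitive.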