Leveraging Transparency and Crowdsourcing in Chemistry Using Open Notebook Science


Jean-Claude Bradley presents on October 9, 2009 at the Northeastern Regional Meeting of the American Chemical Society in Hartford. This talk, entitled "Leveraging Transparency and Crowdsourcing in Chemistry Using Open Notebook Science", was part of a symposium on Publishing and Promoting Chemistry in the Internet Age. It consists of an overview of Open Notebook Science, with some new content on solubility prediction algorithms written by Andrew Lang and a few examples of students in a Chemical Information Retrieval class at Drexel University using research logs on a wiki to flesh out their projects.

Published in: Education, Technology

  1. 1. Leveraging Transparency and Crowdsourcing in Chemistry Using Open Notebook Science. Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University. October 9, 2009, Northeast Regional Meeting of the American Chemical Society
  2. 2. The NaH oxidation controversy
  3. 3. Information spreads quickly through the blogosphere
  4. 4. 15% NMR yield
  5. 6. Khalid Mirza and Marshall Moritz
  6. 8. Top results on a Google search
  7. 9. Motivation: Faster Science, Better Science
  8. 10. Open Notebook Science Logos (Andy Lang, Shirley Wu) Sharing: how much and when
  9. 11. There are NO FACTS, only measurements embedded within assumptions Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
  10. 12. TRUST PROOF
  11. 13. The solubility of 4-chlorobenzaldehyde
  12. 14. The Log makes Assumptions Explicit
  13. 15. The Rationale of Findings Explicit
  14. 16. Raw Data Made Public Splatter? Some liquid
  15. 17. YouTube for demonstrating experimental set-up
  16. 18. Calculations Made Public on Google Spreadsheets
  17. 19. Link to Docking Results: Lists of SMILES in GoogleDocs (Rajarshi Guha)
  18. 20. Link to Docking Procedure (Rajarshi Guha)
  19. 21. Revision History on Google Spreadsheets
  20. 22. Wiki Page History
  21. 23. Comparing Wiki Page Versions
  22. 24. Proof of Purity with interactive NMR spectrum using JSpecView and JCAMP-DX
  23. 25. Linking to Molecules in Chemistry Databases
  24. 26. Experimental Spectra and User-Deposited Data on ChemSpider
  25. 27. (Andy Lang, Tony Williams) Open Data JCAMP spectra for education (Andy Lang, Tony Williams, Robert Lancashire)
  26. 28. Database Curation via Game Playing
  27. 29. Over 50,000 spectrum views so far - worldwide
  28. 30. Link Spectral Game to Open Educational Content
  29. 31. NMR game in Second Life (Andy Lang)
  30. 32. The Ugi reaction: can we predict precipitation? Can we predict solubility in organic solvents?
  31. 33. Crowdsourcing Solubility Data
  32. 34. ONS Submeta Award Winners
  33. 35. ONS Challenge Judges
  34. 36. Teaching Lab: Brent Friesen (Dominican University)
  35. 37. Solubility Experiment List
  36. 38. Solubilities collected in a Google Spreadsheet
  37. 39. Rajarshi Guha’s Live Web Query using Google Viz API
  38. 40. Rajarshi Guha and Andy Lang: Chemical Space Explorer
  39. 41. WE ARE HERE How can the scientific process become more automated?
  40. 42. The Robot Scientist
  41. 43. Semi-Automated Measurement of solubility via web service analysis of JCAMP-DX files (Andy Lang)
  42. 44. Solubility Measurement Requests: DoSol sheet
        • Outlier Bot: flags measurements with high standard deviation to mean ratios
        • Google Analytics queries – new solvent/solute searches
        • Solubility request form – researcher in Israel requesting pyrene in acetonitrile solubility for environmental soil contamination study
        • Application based models – high priority Ugi reactants
  43. 45. Solubility Prediction (Andy Lang’s Model)
  44. 46. Solubility prediction can generate requests for additional measurements
  45. 47. Solvent mixture and temperature: multidimensional solubility data Actual Data (4-nitrobenzaldehyde) From quadratic regression of 5D space Feeds DoSol Sheet the next points to measure to best cover the space
  46. 48. Understanding in addition to empirical modeling Missed in a prior publication on solubility for this compound
  47. 49. Data provenance: From Wikipedia to…
  48. 50. … the lab notebook and raw data
  49. 51. Including links to the literature
  50. 52. Pierre Lindenbaum’s Solubility Data as RDF Triples
  51. 53. Concentration (0.4, 0.2, 0.07 M); Solvent (methanol, ethanol, acetonitrile, THF); Excess of some reagents (1.2 eq.). How does Open Notebook Science fit with traditional publication?
  52. 54. Mettler-Toledo MiniMapper
  53. 55. Mettler-Toledo MiniBlock System
  54. 56. XML reports from MiniMapper robot
  55. 57. GoogleDoc to program and report
  56. 58. Paper written on Wiki
  57. 59. References to papers, blog posts, lab notebook pages, raw data
  58. 60. Paper on Journal of Visualized Experiments (JoVE)
  59. 61. Pre-print on Nature Precedings
  60. 62. ChemSpider Automated Mark-up of Chemical Names
  61. 63. Cameron Neylon’s Notebooks Other Open Notebooks
  62. 64. Anthony Salvagno’s Notebook (Steve Koch group)
  63. 65. Educational “Open Notebooks”
  64. 66. Educational “Open Notebooks”
  65. 67. Educational “Open Notebooks”
  66. 68. Crowdsourcing ChemInfo Resource Collection
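
Slide 44 describes an Outlier Bot that flags measurements with high standard deviation to mean ratios. The slides do not show how the bot is implemented, so the following is only a minimal Python sketch of that idea. The function name `flag_outliers`, the data layout, the 0.5 threshold, and the sample values are all assumptions made for illustration; they are not taken from the talk or from the actual solubility spreadsheet.

```python
from statistics import mean, stdev


def flag_outliers(measurements, threshold=0.5):
    """Return (solute, solvent, ratio) for pairs whose replicate
    measurements have a standard-deviation-to-mean ratio above threshold.

    measurements: dict mapping (solute, solvent) -> list of concentrations (M).
    threshold: hypothetical cutoff; the real bot's cutoff is not stated.
    """
    flagged = []
    for (solute, solvent), values in measurements.items():
        if len(values) < 2:
            continue  # a sample standard deviation needs at least two values
        average = mean(values)
        if average == 0:
            continue  # ratio is undefined when the mean is zero
        ratio = stdev(values) / average
        if ratio > threshold:
            flagged.append((solute, solvent, ratio))
    return flagged


# Made-up replicate values, for illustration only.
example = {
    ("solute A", "solvent X"): [1.0, 1.1, 0.9],  # replicates agree
    ("solute B", "solvent Y"): [0.1, 1.0, 2.0],  # replicates disagree
}

for solute, solvent, ratio in flag_outliers(example):
    print(f"{solute} in {solvent}: std/mean = {ratio:.2f}")
```

In this sketch, flagged pairs could then be added to a request list for re-measurement, which is the role the slide assigns to the DoSol sheet.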