The Role of Trust in Science at SLA 2011



Jean-Claude Bradley presents "International Year of Chemistry: Perils and Promises of Modern Communication in the Sciences: The Role of Trust" at the Special Libraries Association meeting on June 14, 2011. The talk mainly covers the problems with a trusted-source model for melting point data and demonstrates that an Open Data model, including Open Notebook Science where necessary, can be very helpful in curating datasets. Web services for experimental and predicted melting points are then reviewed.


  1. International Year of Chemistry: Perils and Promises of Modern Communication in the Sciences
     The Role of Trust
     Special Libraries Association
     Jean-Claude Bradley, Department of Chemistry, Drexel University
     June 14, 2011
  2. Unknown Perils of the Past
     Before online databases (early 1990s), searching for properties such as melting points in ONE “trusted source” was practical:
     • CRC Handbook
     • Merck Index
     • Chemical vendor catalogs (e.g. Sigma-Aldrich)
     • Peer-reviewed journals
     Known Perils of the Present
     Today, many librarians discourage the use of new online sources (like Wikipedia) for searching chemical data and recommend using only “trusted sources”.
     The problem is that the “trusted source” model is, and always was, fundamentally flawed.
     Ironically, most of Wikipedia’s chemical information is problematic BECAUSE it is based on “trusted sources”!
  3. Promises for the Future
     Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance.
  4. The current state of transparency in scientific communication
     Case study of melting point data
  5. The Chemical Information Validation Sheet
     567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course
  6. The Chemical Information Validation Explorer (Andrew Lang)
  7. Discovering outliers for melting points (standard deviation / average)
  8. Investigating the melting point inconsistencies of EGCG
  9. Investigating the melting point inconsistencies of cyclohexanone
  10. Most popular data sources
  11. Alfa Aesar donates melting points to the public
  12. Open Melting Point Explorer (Andrew Lang)
  13. Outliers
      MDPI dataset
      EPI (also donated all its data to the public)
  14. Outliers for ethanol: Alfa Aesar and Oxford MSDS
  15. Inconsistencies and SMILES problems within the MDPI dataset
  16. MDPI dataset labeled with a High Trust Level
  17. Open Melting Point Datasets
      Currently 20,000 compounds with Open melting points
  18. Live curation, on a public Google Spreadsheet, of the compounds with the highest melting point ranges (collaboration with Andrew Lang and Antony Williams)
  19. Some melting points can’t be resolved with the literature alone: 4-benzyltoluene
  20. The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 °C
  21. The quest to resolve the melting point of 4-benzyltoluene: ambiguous results upon heating, but it clearly remains a liquid at -15 °C after 2 days in a freezer
  22. Further investigation into the literature for the melting point of 4-benzyltoluene
      Although a general description of the method is provided, the raw data are not.
  23. Because of broken provenance, errors cascade through the literature
      Calculations in a patent based on incorrect data
  24. Open Random Forest modeling of Open Melting Point data using CDK descriptors (Andrew Lang)
      R² = 0.78; TPSA and nHdon are the most important descriptors
  25. Melting point prediction service
  26. Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)
  27. Using melting point for temperature-dependent solubility prediction
  28. Motivation: Faster Science, Better Science
  29. There are NO FACTS, only measurements embedded within assumptions
      Open Notebook Science maintains the integrity of data provenance by making assumptions explicit.
  30. TRUST
      PROOF
  31. Strategy for an Open Notebook:
      First record, then abstract structure
      To be discoverable, use Google-friendly formats (simple HTML, no login)
      To be replicable, use free hosted tools (Wikispaces, Google Spreadsheets)
  32. Crowdsourcing Solubility Data
  33. Data provenance: from Wikipedia to…
  34. …the lab notebook and raw data
  35. Calculations Made Public on Google Spreadsheets
  36. Interactive NMR spectra using JSpecView and JCAMP-DX
  37. Raw Data As Images
      Splatter?
      Some liquid
  38. YouTube for demonstrating the experimental set-up
  39. Solubilities collected in a Google Spreadsheet
  40. Rajarshi Guha’s Live Web Query using the Google Viz API
  41. Web services for summary data (Andrew Lang)
  42. Web service calls from within a Google Spreadsheet for solubility measurement and prediction (Andrew Lang)
  43. Integration of Multiple Web Services to Recommend Solvents for Reactions (Andrew Lang)
  44. Reaction Attempts Book
  45. Reaction Attempts Book: Reactants listed alphabetically
  46. ONS Challenge Solubility Book cited for a nanotechnology application
  47. Data Disks
  48. All ONS web services
  49. For all formats of ONS projects
  50. Conclusions
      • For science to progress quickly, there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance
      • Open Notebook Science offers an efficient way to make research transparent and discoverable
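The outlier screen in the melting point slides (flagging compounds whose reported values have a large standard deviation relative to their average) and the curation of compounds with the widest melting point ranges can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the Chemical Information Validation Explorer or the Open Melting Point Explorer: the function names, the record format (compound, melting point in °C) and the 0.1 threshold are all hypothetical choices for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_melting_points(records):
    """Group (compound, mp_celsius) pairs and compute per-compound spread statistics.

    Returns {compound: {"n", "mean", "stdev", "ratio", "range"}}, where "ratio" is
    stdev/average. Compounds with a single measurement get stdev and ratio of None.
    """
    by_compound = defaultdict(list)
    for compound, mp in records:
        by_compound[compound].append(mp)

    summary = {}
    for compound, values in by_compound.items():
        avg = mean(values)
        sd = stdev(values) if len(values) >= 2 else None  # sample standard deviation
        # abs() keeps the ratio positive for sub-zero Celsius averages (e.g. ethanol);
        # the ratio is undefined when the average is exactly 0 degrees C.
        ratio = sd / abs(avg) if sd is not None and avg != 0 else None
        summary[compound] = {
            "n": len(values),
            "mean": avg,
            "stdev": sd,
            "ratio": ratio,
            "range": max(values) - min(values),
        }
    return summary


def flag_outliers(summary, threshold=0.1):
    """Compounds whose stdev/average exceeds a hypothetical threshold, worst first."""
    flagged = [
        (compound, stats["ratio"])
        for compound, stats in summary.items()
        if stats["ratio"] is not None and stats["ratio"] > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def rank_by_range(summary):
    """Compounds ordered by max - min reported melting point, widest spread first."""
    return sorted(summary, key=lambda compound: summary[compound]["range"], reverse=True)


if __name__ == "__main__":
    # Synthetic placeholder values, not real measurements.
    records = [
        ("compound_A", 120.0), ("compound_A", 121.0), ("compound_A", 119.5),
        ("compound_B", 50.0), ("compound_B", 80.0),
        ("compound_C", 10.0),
    ]
    summary = summarize_melting_points(records)
    print(flag_outliers(summary))
    print(rank_by_range(summary))
```

Ranking by range avoids the instability of stdev/average for compounds that melt near 0 °C, which matters for low-melting compounds such as ethanol; converting to kelvin before taking the ratio would be another way to sidestep it.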
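The slide on using melting point for temperature-dependent solubility prediction does not show the model, so the following is only an illustration of how a melting point can enter such a prediction. It swaps in the textbook ideal-solubility approximation with Walden's rule (entropy of fusion of roughly 56.5 J/(mol·K)), log10 x = -ΔS_fus (T_m - T) / (2.303 R T). This is an assumption, not necessarily the model used in the talk, and the function name is hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)
WALDEN_ENTROPY_OF_FUSION = 56.5  # assumed entropy of fusion (Walden's rule), J/(mol*K)


def ideal_log10_solubility(mp_celsius, temp_celsius, entropy_of_fusion=WALDEN_ENTROPY_OF_FUSION):
    """Hypothetical helper: log10 of the ideal mole-fraction solubility of a solid.

    Uses ln x = -(dS_fus / R) * (Tm - T) / T, i.e. the ideal-solution equation with
    dH_fus = Tm * dS_fus. At or above the melting point the solute is a liquid and is
    treated as fully miscible (x = 1, log10 x = 0).
    """
    tm = mp_celsius + 273.15  # melting point in kelvin
    t = temp_celsius + 273.15  # solution temperature in kelvin
    if t >= tm:
        return 0.0
    # Divide by ln(10) to convert the natural log into log10.
    return -entropy_of_fusion * (tm - t) / (math.log(10) * R * t)


if __name__ == "__main__":
    # A hypothetical solute melting at 125 C: estimated solubility rises with temperature.
    for temp in (25, 50, 75, 100):
        print(temp, round(ideal_log10_solubility(125.0, temp), 3))
```

At 25 °C the expression reduces to roughly log10 x ≈ -0.01 (mp - 25), the melting point term of Yalkowsky's General Solubility Equation, since 56.5 / (2.303 × 8.314 × 298.15) ≈ 0.0099.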