Open Notebooks Science

A keynote talk I gave at OSU Research Week on the importance of Open Science, especially Open Notebook Science, illustrated by practical examples. Talk inspired by Jean-Claude Bradley. Slides inspired by Cameron Neylon.

Published in: Science, Technology, Education
  • http://usefulchem.wikispaces.com/D-EXP022
    From a library of derivatives, it was the top hit for the docking site of Taxol
  • Open Notebooks Science

    1. 1. Andrew Lang, Professor of Mathematics, Oral Roberts University. February 17, 2014, OSU Research Week
    2. 2. -Cameron Neylon
    3. 3. Eight committees investigated the allegations and published reports, finding no evidence of fraud or scientific misconduct. However, the reports* called on the scientists to avoid any such allegations in the future by taking steps to regain public confidence in their work, for example by opening up access to their supporting data, processing methods and software, and by promptly honouring freedom of information requests. * Archana Venkatraman, "Data Without the Doubts". Information World Review
    4. 4. Andrew Wakefield’s study linked the measles, mumps and rubella vaccine to autism. Vaccination rates in the developed world plummeted after the study’s publication and a heated anti-vaccination movement persists today.
    5. 5. http://www.cfr.org/interactives/GH_Vaccine_Map/#map
    6. 6. ?
    7. 7. Science has lost its way, at a big cost to humanity. Researchers are rewarded for splashy findings, not for double-checking accuracy. So many scientists looking for cures to diseases have been building on ideas that aren't even true. A few years ago, scientists at the Thousand Oaks biotech firm Amgen set out to double-check the results of 53 landmark papers in their fields of cancer research and blood biology. The idea was to make sure that research on which Amgen was spending millions of development dollars still held up. They figured that a few of the studies would fail the test — that the original results couldn't be reproduced because the findings were especially novel or described fresh therapeutic approaches. But what they found was startling: Of the 53 landmark papers, only six could be proved valid. Source: http://www.latimes.com/business/la-fi-hiltzik-20131027,0,1228881.column#axzz2ix1w9zGf
    8. 8. A special challenge for science writers covering research today arises from science’s growing credibility problem. It stems from the cumulative effect of errors and exaggerations that has fueled a recent rise in retractions, misconduct, and fraud among peer-reviewed researchers. For reporters covering major scientific developments – from the search for alien life and genomics, to particle physics, climate change and cancer — it can be difficult to distinguish error from fraud, sloppiness from deception, eagerness from greed or, increasingly, scientific conviction from partisan passion. Findings in fields from climate change to vaccines can also be deceptively cherry-picked in service of a political cause.
    9. 9. trust evidence
    10. 10. trust documentation
    11. 11. trust confidence
    12. 12. trust reproducibility
    13. 13. Anything produced is released under a CC0 license: Open Data, Open Access, Open Source.
    14. 14. Faster Science: failed experiments discoverable; unexpected collaborations; real-time data and results
    19. 19. no insider information; reusability; reproducibility; transparency
    24. 24. Open Drug Discovery for Neglected Diseases: malaria, schistosomiasis, gram-positive bacteria, breast cancer
    25. 25. Drugs for neglected diseases need to be…
    26. 26. cheap and…
    27. 27. easy to make.
    28. 28. docking combinatorial library synthesis solvent selection recrystallization biological assay solubility models solubility data melting point models melting point data The big picture
    29. 29. docking combinatorial library synthesis solvent selection recrystallization biological assay solubility models solubility data melting point models melting point data Let’s focus
    30. 30. Early models, before 2005, were…
    31. 31. …specialized: 1979 Martin – disubstituted benzenes; 1987 Hanson – normal alkanes; 1988 Needham – normal and branched alkanes; 1990 Abramowitz – non-hydrogen bonded benzenes; 1991 Dearden – anilines; 1993 Katritzky – aldehydes, amines, and ketones; 1994 Simamora – rigid aromatic; 1996 Charlton – alkanes; 1996 Katritzky – pyridines; 1999 Zhao – aliphatic; 2001 Chickos – homologous series; 2003 Bergstrom – druglike (N = 277, r² = 0.54)
    32. 32. In 2005… …everything changed
    33. 33. MDPI - cheminformatics.org: Karthikeyan 2005, N = 4173, r² = 0.65
    34. 34. PHYSPROP: Clark 2005, N = 6257, r² = 0.61
    35. 35. Recent melting point models use these datasets… …never reproducing r² = 0.65 (0.47 – 0.56)
    36. 36. Even though [a] melting point can be measured accurately, its prediction has been a notoriously difficult problem.
    37. 37. We began measuring, collecting, and curating melting points in the Fall of 2010
    38. 38. Jean-Claude Bradley’s Chemical Information Retrieval Course at Drexel 567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course
    39. 39. Most popular data sources… …chemical vendors
    40. 40. Alfa Aesar donates ~13,000 melting points to the public domain
    41. 41. ONS melting point workflow: collection, curation, modeling, validation, measurement
    42. 42. Collection: Open Data

        | source | data points | curated values | source year | data type |
        |---|---|---|---|---|
        | Bell | 2483 | 1631 | 1995 | donated-CC0 |
        | Bergstrom | 277 | 277 | 2003 | open |
        | MDPI-Karthikeyan | 4450 | 4084 | 2005 | open |
        | Hughes | 287 | 262 | 2008 | open |
        | Oxford-MSDS | 3217 | 1481 | 2010 | open |
        | Drugbank | 875 | 875 | 2011 | open |
        | Griffiths | 3757 | 278 | 2011 | donated-CC0 |
        | Alfa Aesar | 12986 | 8739 | 2011 | donated-CC0 |
        | PHYSPROP | 11645 | 9694 | 2011 | donated-CC0 |
        | ONS | 471 | 471 | 2012 | open |

        27792 curated measurements for 19515 compounds
    43. 43. Curation is… …lots of hard, tedious work (Jean-Claude Bradley and Antony Williams) Antony Williams – RSC ChemSpider
    44. 44. Inconsistencies and SMILES problems within the “high trust level” MDPI dataset
    45. 45. PHYSPROP Structure Errors (Incorrect Valence) 2315 out of 43543 contained pentavalent nitrogens
    46. 46. PHYSPROP Errors: Structure displayed is for the neutral compound dopamine but the associated CAS Number and chemical name in the file are for the hydrobromide salt.
    47. 47. unit errors: Kelvin/Celsius, Fahrenheit/Celsius; bad SMILES (non-rendering, hypervalency); salts associated with SMILES for free base; using boiling point for melting point
    48. 48. Some melting points can’t be resolved from the literature alone: 4-benzyltoluene
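
The unit errors listed on slide 47 lend themselves to a simple automated check. The following is a minimal sketch, not the curation code used in the project: the function name, the labels it returns, and the 1-degree tolerance are all assumptions made for illustration. It compares two values reported for the same compound, both nominally in Celsius, and tests whether one of them looks like a Kelvin or Fahrenheit value that was recorded as Celsius.

```python
def diagnose_unit_mismatch(a_c, b_c, tol=1.0):
    """Guess why two nominally-Celsius melting points disagree.

    Hypothetical helper: labels and tolerance are illustrative assumptions.
    """
    if abs(a_c - b_c) <= tol:
        return "consistent"
    # Try each value in turn as the suspect one.
    for suspect, reference in ((a_c, b_c), (b_c, a_c)):
        # Suspect is really Kelvin: it equals reference + 273.15.
        if abs(suspect - (reference + 273.15)) <= tol:
            return "kelvin_as_celsius"
        # Suspect is really Fahrenheit: converting it gives the reference.
        if abs((suspect - 32.0) * 5.0 / 9.0 - reference) <= tol:
            return "fahrenheit_as_celsius"
    return "unresolved"


# Illustrative values based on benzoic acid (melting point about 122 °C).
print(diagnose_unit_mismatch(122.0, 395.2))   # second value is in Kelvin
print(diagnose_unit_mismatch(122.0, 251.6))   # second value is in Fahrenheit
print(diagnose_unit_mismatch(122.0, 150.0))   # needs a human to look at it
```

A check like this only flags candidates. Pairs labeled "unresolved" still need the manual curation described on slide 43.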
    49. 49. Open lab notebook page measuring the melting point of 4-benzyltoluene
    50. 50. Melting Point Model: melting point data, CDK descriptor calculator, R statistical computing
    51. 51. use this model
    52. 52. compounds doubleplusgood single CDK descriptor calculator R statistical computing Melting Point Model
    53. 53. Comparison of model with double+ validated measurements: straight chain carboxylic acids from 1 to 10 carbons; straight chain alcohols from 1 to 10 carbons
    54. 54. Cyclic primary amines from 3 to 6 carbons: cyclobutylamine flagged for measurement; only a single source available
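
Slides 52 to 54 separate "doubleplusgood" compounds from "single" ones and flag single-source compounds for measurement. A rough sketch of that triage is below. The slides do not state the exact criterion, so this assumes that a compound counts as doubleplusgood when at least two different sources report values within a small range of each other. The 5 °C default, the "conflict" label, and the compound and source names are all placeholders.

```python
from collections import defaultdict


def classify_compounds(measurements, max_range_c=5.0):
    """Label each compound from (compound, source, melting_point_c) records.

    Assumed rule, not taken from the slides: two or more distinct sources
    agreeing within max_range_c makes a compound "doubleplusgood".
    """
    by_compound = defaultdict(list)
    for compound, source, mp_c in measurements:
        by_compound[compound].append((source, mp_c))

    labels = {}
    for compound, rows in by_compound.items():
        sources = {source for source, _ in rows}
        values = [mp_c for _, mp_c in rows]
        if len(sources) < 2:
            labels[compound] = "single"
        elif max(values) - min(values) <= max_range_c:
            labels[compound] = "doubleplusgood"
        else:
            labels[compound] = "conflict"
    return labels


# Placeholder records; the numbers are not real measurements.
records = [
    ("compound_a", "vendor_1", 80.0),
    ("compound_a", "paper_1", 81.5),
    ("compound_b", "vendor_1", -20.0),
    ("compound_c", "vendor_1", 150.0),
    ("compound_c", "paper_2", 120.0),
]
labels = classify_compounds(records)
to_measure = sorted(c for c, label in labels.items() if label != "doubleplusgood")
print(labels)
print("flag for measurement:", to_measure)
```

Anything not labeled doubleplusgood becomes a candidate for a new measurement recorded in the open notebook.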
    55. 55. Publication of double+ validated melting point dataset …as a preprint
    56. 56. Publication of double+ validated melting point dataset …as a book
    57. 57. Data and model deployed… …on the web (web service)
    58. 58. …in Google spreadsheets
    59. 59. …as an app
    60. 60. Can the solvents used to recrystallize compounds in organic teaching labs be improved?
        • Trans-dibenzalacetone
        • Aldol condensation between two molecules of benzaldehyde and one molecule of acetone
        [Matthew McBride: Undergraduate Research Assistant - Drexel]
    61. 61. First recrystallized in ethyl acetate in 1906: Straus and Ecker, Ber. 39, 2988 (1906)
        • Recrystallized in ethyl acetate in Organic Syntheses
    62. 62. Recommended recrystallization solvent: ethyl acetate (http://classes.kvcc.edu/chm230/mixed%20aldol%20condensation.pdf; http://www.xula.edu/chemistry/documents/orgleclab/Aldol_notes.pdf)
    63. 63. Enter compound identification and desired parameters
    64. 64. How does it work?
        1. Look up the solvent boiling point
        2. Look up the room temperature solubility or predict it via measured or predicted Abraham descriptors
        3. Look up the solute melting point or predict it via a model
        4. Use the melting point and the solubility at room temperature to predict the solubility at boiling
        5. Calculate the predicted recrystallization yield
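
The five steps on slide 64 can be sketched in a few lines. The slide does not say which functional form links the melting point to the solubility curve, so this sketch assumes one: ln(x), with x the solute mole fraction, is taken to be linear in 1/T and to reach x = 1 (fully miscible) at the melting point. Steps 1 to 3 are lookups or model predictions and appear here only as input arguments. All function and parameter names are hypothetical.

```python
import math

ROOM_K = 298.15  # assumed room temperature in Kelvin


def solubility_at(temp_k, x_room, melting_k, room_k=ROOM_K):
    """Step 4: mole-fraction solubility at temp_k.

    Assumes ln(x) is linear in 1/T through (room_k, x_room) and (melting_k, 1).
    """
    if not 0.0 < x_room < 1.0:
        raise ValueError("x_room must be a mole fraction between 0 and 1")
    if not room_k < melting_k:
        raise ValueError("solute must melt above room temperature")
    slope = math.log(x_room) / (1.0 / room_k - 1.0 / melting_k)
    return math.exp(slope * (1.0 / temp_k - 1.0 / melting_k))


def predicted_yield(x_room, melting_k, boiling_k, room_k=ROOM_K):
    """Step 5: fraction of solute recovered on cooling from boiling to room temp."""
    if boiling_k >= melting_k:
        raise ValueError("solute would melt before the solvent boils")
    x_hot = solubility_at(boiling_k, x_room, melting_k, room_k)
    # Moles of solute held per mole of solvent, hot and cold.
    hot = x_hot / (1.0 - x_hot)
    cold = x_room / (1.0 - x_room)
    return 1.0 - cold / hot


# Placeholder inputs: x_room = 0.01, melting point 400 K, solvent boiling at 350 K.
print(round(predicted_yield(0.01, 400.0, 350.0), 3))
```

Under this assumption a solvent scores well when the solute is sparingly soluble at room temperature and the solvent boils close to, but below, the solute's melting point.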
    65. 65. Lists solvents and their predicted recrystallization yield. Predictions are generated from temperature-dependent solubility curves.
    66. 66. Ethyl acetate (predicted yield of 72%) vs ethanol (predicted yield of 93%). Plot labels: ethyl acetate; ethanol; 0.09 M, 1.1 M, 0.62 M, 2.06 M
    67. 67. Dibenzalacetone derivatives docking against tubulin (paclitaxel site)
    68. 68. Derivatives of dibenzalacetone may be synthesized by altering the aldehyde used
        • From a library of derivatives, the following compound was the top hit for the docking site of Taxol
        • Uses phenanthrene-9-carboxaldehyde
    69. 69. Perform a Reaxys search to determine availability of synthesis procedures
        • No results
        [Matthew McBride: Undergraduate Research Assistant - Drexel]
    70. 70. Used methanol and benzene
        • Melting Point: 264-265°C (http://usefulchem.wikispaces.com/EXP286)
        [Matthew McBride: Undergraduate Research Assistant - Drexel]
    71. 71. trust reproducibility open notebook science
    72. 72. Acknowledgements: Jean-Claude Bradley (Drexel); Cameron Neylon (Advocacy Director at PLOS); Antony Williams (RSC ChemSpider); Drexel research assistants: Evan Curtin and Matthew McBride; ORU research assistants: David Bulger, Daryl Charron, Lizzie Clark, Lacey Condron, Samantha Gaines, Alejandro Hernandez, Maria Hernandez, Jesse Patsolic, and Matthew Wilson
