Open Notebook Science:
   Transparency in
      Research
   Georgia Tech Library
    Open Access Week
       Jean-Claude Bradley
     Associate Professor of Chemistry
            Drexel University

           October 23, 2012
Openness in Chemistry



      WHY?
Dibenzalacetone derivatives docking against
          tubulin (paclitaxel site)




                               (Andrew Lang)
“Simple” aldol condensation synthesis



                                Top Hit
                                (no reports
                                of synthesis)



                               In top ten
                               (a few reports
                               of synthesis)


                            (Andrew Lang)
What is the current standard for “sufficient
 information” in communicating organic
                chemistry?


       By definition, all peer-reviewed
       published documentation has
       been approved as sufficient by
       authors, editors and reviewers.
Searching for aldol condensations of acetone
     in the Reaction Attempts database




                               (Andrew Lang)
Information from the literature on the target synthesis
A successful synthesis by avoiding water, dramatically
      increasing NaOH and long reaction time
An example of a failed experiment in an Open
      Notebook with useful information
A failed experiment reveals the importance of aldehyde
                       solubility
Motivation: Faster Science, Better
             Science
An example of a successful experiment in an Open
                   Notebook
Never having to leave the Google Spreadsheet
      dashboard for access to key info




                            (Andrew Lang and Rich Apodaca)
A click away from an interactive NMR display (using
        JCAMP-DX format and ChemDoodle)




                                   (Andrew Lang)
Contributing to Science while Teaching it:
  Chemical Information Retrieval Class
The Chemical Information Validation Sheet

       567 curated and referenced measurements from
       Fall 2010 Chemical Information Retrieval course
Discovering outliers for melting points
          (stdev/average)
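
A minimal sketch of the stdev/average screen named on this slide, assuming each compound has a list of melting points (in C) collected from different sources. The compound names, the values, and the function names are hypothetical and are not taken from the validation sheet. Because the ratio blows up when the average is close to 0 C, the sketch returns infinity in that case so the compound is still flagged for review.

```python
from statistics import mean, stdev


def relative_spread(values):
    """Return stdev/|average| for one compound's reported melting points."""
    if len(values) < 2:
        return 0.0  # a single source cannot disagree with itself
    sd = stdev(values)
    if sd == 0:
        return 0.0
    avg = mean(values)
    if avg == 0:
        return float("inf")  # ratio undefined; flag for manual review
    return sd / abs(avg)


def rank_outliers(measurements):
    """Sort compounds from most to least inconsistent."""
    scores = {name: relative_spread(vals) for name, vals in measurements.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical melting points (C) from several sources per compound.
measurements = {
    "compound A": [120.0, 121.0, 122.0],
    "compound B": [80.0, 81.0, 140.0],
    "compound C": [55.0],
}

ranked = rank_outliers(measurements)
for name, score in ranked:
    print(f"{name}: stdev/average = {score:.3f}")
```

Compounds at the top of the ranking are the ones whose sources disagree most and are the natural candidates for a closer look.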
Investigating the m.p. inconsistencies of EGCG
Investigating the m.p. inconsistencies of
             cyclohexanone
Most popular data sources
Alfa Aesar donates melting points to the public
Open Melting Point Explorer




                        (Andrew Lang)
Outliers
MDPI dataset
EPI (donated all data to public also)
Outliers for ethanol: Alfa Aesar and Oxford
                   MSDS
Inconsistencies and SMILES problems within
               MDPI dataset
MDPI Dataset labeled with High Trust Level
Open Melting Point Datasets
Currently 20,000 compounds with Open MPs
What is the melting point of 4-benzyltoluene?



  Source                          Melting point
  American Petroleum Institute    5 C
  PHYSPROP                        -30 C
  PHYSPROP                        125 C
  peer reviewed journal (2008)    97.5 C
  government database             -30 C
  government database             4.58 C
The quest to resolve the melting point
of 4-benzyltoluene: liquid at room temperature
       and can be frozen below -30 C
Open Lab Notebook page measuring the
   melting point of 4-benzyltoluene
Ruling out all melting points above -15 C?
Oops – 4-benzyltoluene freezes after 16 days at -15 C!
Measuring the melting point by slowly heating
            from -15 C gives 5 C
There are NO FACTS,
  only measurements embedded
        within assumptions

Open Notebook Science maintains
the integrity of data provenance by
   making assumptions explicit
Open Random Forest modeling of Open Melting Point
           data using CDK descriptors
                 (Andrew Lang)
   R² = 0.78; TPSA and nHdon most important descriptors
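
For readers unfamiliar with the statistic quoted here, the sketch below shows one common way to compute a coefficient of determination from observed and predicted melting points. The slide does not say which definition was used (some tools report the squared correlation instead), so the formula, the function name, and the sample numbers are assumptions for illustration only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. model-predicted melting points (C).
observed = [10.0, 50.0, 90.0, 130.0]
predicted = [20.0, 45.0, 100.0, 120.0]
print(f"R^2 = {r_squared(observed, predicted):.2f}")
```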
Melting point prediction service
Web services for summary data




                      (Andrew Lang)
Using a Google Spreadsheet as a “dashboard interface”
          for reaction planning and analysis
Calling Google Apps Scripts




                   (Andrew Lang and Rich Apodaca)
Google Apps Scripts for conveniently
   exploring melting point data
Comparison of model with triple validated measurements
           Straight chain carboxylic acids from 1 to 10 carbons




              Straight chain alcohols from 1 to 10 carbons
Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for
               validation – only single source available)
Open Melting Points in Supplementary Data Pages
          of Wikipedia (Martin Walker)
Google Apps Scripts web services
Chemistry Google Apps Scripts description sheet




                            (Andrew Lang and Rich Apodaca)
Integration of Multiple Web Services to
  Recommend Solvents for Reactions




                             (Andrew Lang)
The Recrystallization App




                       (Andrew Lang)
The importance of recrystallization


• Generally preferred if there is a known
  solvent that gives a good yield

• Scales much more easily and cheaply than
  chromatography

• However, for new compounds much trial and
  error may be needed
How does it work?

1. Look up the solvent boiling point

2. Look up the room temperature solubility or predict it via
Abraham descriptors predicted from a model using the
CDK

3. Look up the solute melting point or predict it via a model
using the CDK

4. Use the melting point and the solubility at room
temperature to predict the solubility at boiling

5. Calculate the predicted recrystallization yield
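
The five steps above could be sketched roughly as follows. This is not the app's actual code. Step 4 is assumed here to use a van't Hoff style extrapolation, with the enthalpy of fusion estimated from the melting point via Walden's rule (an entropy of fusion of about 56.5 J/(mol K)), and solubility in concentration units is treated as proportional to mole fraction. The function names, the 25 C room temperature, and every input number are hypothetical.

```python
import math

R = 8.314              # gas constant, J/(mol K)
WALDEN_ENTROPY = 56.5  # assumed entropy of fusion, J/(mol K) (Walden's rule)


def solubility_at_temperature(s_room, mp_c, t_c, t_room_c=25.0):
    """Extrapolate room-temperature solubility to temperature t_c (step 4).

    s_room is in any concentration unit; the result is in the same unit.
    """
    t_melt = mp_c + 273.15
    t_room = t_room_c + 273.15
    # Above its melting point the solute is a liquid, so cap the temperature
    # at the melting point as a crude guard (an assumption of this sketch).
    t_eff = min(t_c + 273.15, t_melt)
    dh_fus = WALDEN_ENTROPY * t_melt  # estimated enthalpy of fusion, J/mol
    return s_room * math.exp(dh_fus / R * (1.0 / t_room - 1.0 / t_eff))


def recrystallization_yield(s_room, s_boil):
    """Fraction recovered when a solution saturated at boiling cools (step 5)."""
    if s_boil <= 0:
        raise ValueError("solubility at boiling must be positive")
    return max(0.0, 1.0 - s_room / s_boil)


# Hypothetical inputs standing in for the look-ups in steps 1 to 3.
solvent_bp_c = 78.0   # step 1: solvent boiling point (C)
s_room = 0.5          # step 2: solubility at room temperature (mol/L)
solute_mp_c = 122.0   # step 3: solute melting point (C)

s_boil = solubility_at_temperature(s_room, solute_mp_c, solvent_bp_c)
predicted_yield = recrystallization_yield(s_room, s_boil)
print(f"solubility at boiling: {s_boil:.2f} mol/L")
print(f"predicted yield: {predicted_yield:.0%}")
```

Under these assumptions, a solvent scores well when the solute is poorly soluble at room temperature but much more soluble at the solvent's boiling point.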
The Recrystallization App produces and uses
Open Data:
• Open Solubility Collection and Models
• Open Melting Point Collection and Models
• Modeling depends mainly on CDK (Open
  Source Software with Open Descriptors)
• Open Notebook Science
What are good solvents to recrystallize benzoic acid?




                                      (Andrew Lang)
Click on the solvent to see temp curve




                             (Andrew Lang)
Deliver melting point data via App




                           (Andrew Lang)
Chemical Information Retrieval 2012
       property assignment
Melting Point Outlier List
Melting Point Outlier example
Solubility Outlier List
Solubility of benzoic acid in 1-octanol
             discrepancies
Using ChemSpider to ensure all
stereocenters are defined before searching
              for properties
Using the InChIKey to find single isomers
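
A small sketch of why the InChIKey helps here: the first 14-character block of a standard InChIKey encodes only the connectivity, while the second block also reflects stereochemistry. Records sharing a first block are isomers of one skeleton, and a match on the full key picks out a single isomer. The record names and keys below are hypothetical placeholders with the right shape, not real identifiers, and the function names are illustrative.

```python
from collections import defaultdict


def connectivity_block(inchikey):
    """Return the first (connectivity) block of a standard-shaped InChIKey."""
    parts = inchikey.strip().upper().split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a standard-shaped InChIKey: {inchikey!r}")
    return parts[0]


def group_by_skeleton(records):
    """Group (name, inchikey) records that share the same connectivity."""
    groups = defaultdict(list)
    for name, key in records:
        groups[connectivity_block(key)].append((name, key))
    return dict(groups)


def find_single_isomer(records, inchikey):
    """Return names whose full InChIKey matches exactly."""
    target = inchikey.strip().upper()
    return [name for name, key in records if key.strip().upper() == target]


# Hypothetical placeholder records (not real InChIKeys).
records = [
    ("isomer 1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("isomer 2", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
    ("other compound", "DDDDDDDDDDDDDD-EEEEEEEESA-N"),
]

groups = group_by_skeleton(records)
matches = find_single_isomer(records, "AAAAAAAAAAAAAA-BBBBBBBBSA-N")
print(groups["AAAAAAAAAAAAAA"])
print(matches)
```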
Chemical Information Validation Sheet 2012
Each entry validated with an image
Avoiding redundant property data points with
  a single click within the validation sheet
Open Chemical Property Matrix (OCPM)
Properties in the matrix: boiling point, vapor pressure, flash point,
Abraham descriptors, melting point, logP, aqueous solubility, octanol solubility
OCPM relationships
OCPM melting point sheet
Conclusions

More openness in chemistry can make science more efficient

Provide interfaces that make sense to the end users:
Open Data, Open Models and Open Source Software to modelers
Apps (smartphones, Google Apps Scripts, etc.) for chemists at the bench



                  Acknowledgements
   Andrew Lang (code, modeling)
   Bill Acree (modeling, solubility data contribution)
   Antony Williams (ChemSpider services, mp data curation)
   Matthew McBride and Rida Atif (recrystallization and synthesis)
   Kayla Gogarty (OCPM)
