The collection, curation and modeling of Open Melting Point measurements

Jean-Claude Bradley and Andrew Lang present at the 5th Meeting on U.S. Government Chemical Databases and Open Chemistry on August 26, 2011 about "The collection, curation and modeling of Open Melting Point measurements". The talk also covers the role of Open Notebook Science and Google Apps Scripts in this effort.

Presentation Transcript

  • The collection, curation and modeling of Open Melting Point measurements
    5th Meeting on U.S. Government Chemical Databases and Open Chemistry
    Jean-Claude Bradley (Department of Chemistry, Drexel University)
    Andrew Lang (Department of Mathematics, Oral Roberts University)
    Antony Williams (ChemSpider, Royal Society of Chemistry)
    August 26, 2011
  • The Problem of Data Quality in Chemistry
    • Lack of provenance
    • Reliance on a system of “trusted sources”
    In the case of melting points:
    • CRC Handbook
    • Merck Index
    • Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
    • Peer-Reviewed Journals
  • Strategy for the curation of melting points
    Rely on redundancy when possible
    Provide the maximum level of provenance when necessary (Open Notebook Science)
    Adhere to Open Data, Open Descriptors and Open Algorithms for measurements and modeling
    Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance
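One way to picture the redundancy-plus-provenance strategy is as a small data model in which every measurement carries its source and values for the same compound are checked against each other. The following is a minimal sketch only: the `Measurement` class, the `flag_outliers` helper, the 5 °C tolerance, and the sample values are illustrative assumptions, not part of the talk or the Open Melting Point dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Measurement:
    compound: str      # compound identifier (a name here; could be a SMILES string)
    mp_celsius: float  # reported melting point in degrees Celsius
    source: str        # provenance: where the value came from


def flag_outliers(measurements, tolerance=5.0):
    """Return measurements that deviate from their compound's median
    by more than `tolerance` degrees Celsius.

    Compounds with a single measurement have no redundancy to rely on
    and are skipped (hypothetical policy chosen for this sketch).
    """
    by_compound = defaultdict(list)
    for m in measurements:
        by_compound[m.compound].append(m)

    flagged = []
    for group in by_compound.values():
        if len(group) < 2:
            continue
        mid = median(m.mp_celsius for m in group)
        flagged.extend(m for m in group if abs(m.mp_celsius - mid) > tolerance)
    return flagged


# Illustrative values only, not taken from any of the datasets discussed.
sample = [
    Measurement("benzoic acid", 122.0, "source A"),
    Measurement("benzoic acid", 122.4, "source B"),
    Measurement("benzoic acid", 249.0, "source C"),  # suspiciously high
    Measurement("compound X", 50.0, "source A"),     # single source, not checkable
]
flagged = flag_outliers(sample)
```

Because each flagged record keeps its `source`, a curator can go straight back to the origin of a suspicious value instead of deciding which source to trust.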
  • The Chemical Information Validation Sheet
    567 curated and referenced measurements from
    Fall 2010 Chemical Information Retrieval course
  • Investigating the m.p. inconsistencies of EGCG
  • Most popular data sources
  • Alfa Aesar donates melting points to the public
  • Open Melting Point Explorer
  • Outliers
    EPA/PhysProp (donated all data to public also)
    MDPI dataset
  • Outliers for ethanol: Alfa Aesar and Oxford MSDS
  • Inconsistencies and SMILES problems within MDPI dataset
  • MDPI Dataset labeled with High Trust Level
  • EPA/PHYSPROP Structure Errors (Incorrect Valence): 2315 out of 43543 structures contained pentavalent nitrogens
  • EPA/PHYSPROP Errors: Structure displayed is for the neutral compound dopamine but the associated CAS Number and chemical name in the file are for the hydrobromide salt.
  • Common errors in datasets
    multiple melting points for the same compound in the same database
    stereochemistry issues
    sign inversion
    conversion errors (Kelvin/Celsius, Fahrenheit/Celsius)
    bad SMILES (non-rendering)
    salts associated with SMILES for free base
    using boiling point for melting point
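Some of the errors listed above leave a recognizable numerical signature, so a suspect value can be tested against a trusted one. The sketch below is a hedged illustration: the `diagnose` function, the `ERROR_MODELS` table, and the 1 °C tolerance are hypothetical names and choices, not the procedure used by the authors.

```python
def c_to_k(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Each entry maps a correct Celsius value to what would be recorded
# if the named mistake had been made (assumed error models).
ERROR_MODELS = {
    "sign inversion": lambda c: -c,
    "Kelvin reported as Celsius": c_to_k,
    "Fahrenheit reported as Celsius": c_to_f,
}


def diagnose(suspect, reference, tolerance=1.0):
    """Return the names of error models that would turn `reference`
    into `suspect`. Both values are nominally in degrees Celsius.
    An empty list means the values agree or no model explains the gap.
    """
    if abs(suspect - reference) <= tolerance:
        return []
    return [
        name
        for name, corrupt in ERROR_MODELS.items()
        if abs(corrupt(reference) - suspect) <= tolerance
    ]


# Illustrative check using a reference value of -114.1 degrees Celsius.
sign_case = diagnose(114.1, -114.1)
kelvin_case = diagnose(159.05, -114.1)
fahrenheit_case = diagnose(-173.4, -114.1)
agreeing_case = diagnose(-114.0, -114.1)
```

A check like this only suggests a likely cause; the original record still has to be consulted before a value is corrected.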
  • Open melting point datasets
    Double+ validated: 2706 compounds (7413 highly curated measurements; range: 0.01-5 °C. Compounds that had at least one chiral center, possessed cis/trans isomerism, or were inorganic or a salt were removed.)
    Entire dataset: 19933 unique compounds (27684 measurements – no inorganics or salts)
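The selection of a double+ validated subset can be sketched as a filter over grouped measurements. This is a sketch under stated assumptions: it reads "range: 0.01-5" as the spread between the lowest and highest value for a compound, requires at least two measurements, and reports the mean, none of which is confirmed by the slide. The function name `double_plus_validated` and the sample rows are hypothetical.

```python
from collections import defaultdict


def double_plus_validated(rows, min_range=0.01, max_range=5.0):
    """Select compounds with two or more measurements whose spread
    (max minus min, in degrees Celsius) lies within [min_range, max_range].

    `rows` is an iterable of (compound_id, mp_celsius) pairs.
    Returns {compound_id: mean melting point} for the selected compounds.
    """
    values = defaultdict(list)
    for compound_id, mp in rows:
        values[compound_id].append(mp)

    selected = {}
    for compound_id, mps in values.items():
        if len(mps) < 2:
            continue  # a single value cannot be cross-validated
        spread = max(mps) - min(mps)
        if min_range <= spread <= max_range:
            selected[compound_id] = sum(mps) / len(mps)
    return selected


# Made-up rows for illustration.
rows = [
    ("A", 100.0), ("A", 102.0),  # spread 2.0: kept
    ("B", 50.0), ("B", 50.0),    # spread 0.0: identical values, excluded
    ("C", 10.0), ("C", 30.0),    # spread 20.0: sources disagree, excluded
    ("D", 75.0),                 # single measurement, excluded
]
validated = double_plus_validated(rows)
```

Under this reading, the lower bound would exclude identical values that may have been copied from one source rather than measured independently; that interpretation is a guess.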
  • Open Models with Open Data Using Open Descriptors (CDK)
  • Modeling Results
  • Melting point prediction service
  • Melting point predictions and measurements on iPhone/iPad (Alex Clark)
  • Publication of double+ validated melting point dataset to Nature Precedings and LuLu
  • For all Formats of ONS Projects
  • Open Melting Point Datasets
    Currently 20,000 compounds with Open MPs
  • Some melting points can’t be resolved with the literature alone: 4-benzyltoluene
  • Motivation: Faster Science, Better Science
  • Open Lab Notebook page measuring the melting point of 4-benzyltoluene
  • Using melting point for temperature dependent solubility prediction
  • Crowdsourcing Solubility Data
  • Integration of Multiple Web Services to Recommend Solvents for Reactions
  • All ONS web services
  • Google Apps Scripts web services
  • Google Apps Scripts for conveniently exploring melting point data
  • Comparison of model with triple validated measurements
    Straight chain carboxylic acids from 1 to 10 carbons
    Straight chain alcohols from 1 to 10 carbons
  • Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)
  • Google Apps Scripts for planning reactions and creating schemes
  • Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
  • Conclusions
    • For science to progress quickly there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance
    • Open Notebook Science offers an efficient way to make research transparent and discoverable