New Forms of Scholarly Communication in Science: The Role of Trust
Special Libraries Association
Jean-Claude Bradley
Department of Chemistry
Drexel University
June 15, 2011
Unknown Perils of the Past
Before online databases (early 90s), searching for properties like melting points using ONE “trusted source” was practical:
CRC Handbook
Merck Index
Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
Peer-Reviewed Journals
Known Perils of the Present
Today, many librarians discourage the use of new online sources (like Wikipedia) for searching chemical data and recommend using only “trusted sources”.
The problem is that the “trusted source” model is, and always was, fundamentally flawed.
Ironically, most of Wikipedia’s chemical information is problematic BECAUSE it is based on “trusted sources”!
Promises for the Future
Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance
The current state of transparency in scientific communication
Case study of melting point data
The Chemical Information Validation Sheet
567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course
Discovering outliers for melting points (stdev/average)
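The stdev/average screen named on this slide can be sketched as follows. This is a minimal illustration, not the course's actual spreadsheet formula: the function name `flag_outliers`, the 5% threshold, and the sample values are all assumptions made for the example.

```python
from statistics import mean, stdev

def flag_outliers(measurements, threshold=0.05):
    """Rank compounds by stdev/average of their reported melting points.

    `measurements` maps a compound name to a list of reported values.
    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    flagged = []
    for compound, values in measurements.items():
        if len(values) < 2:
            continue  # a single report cannot disagree with itself
        avg = mean(values)
        if avg == 0:
            continue  # the ratio is undefined when the average is zero
        ratio = stdev(values) / abs(avg)
        if ratio > threshold:
            flagged.append((compound, ratio))
    # Largest disagreement first, so curation starts with the worst cases
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Made-up values for illustration only (not from the validation sheet)
sample = {
    "compound A": [100.0, 101.0, 99.5],
    "compound B": [50.0, 80.0, 52.0],
}
for compound, ratio in flag_outliers(sample):
    print(f"{compound}: stdev/average = {ratio:.3f}")
```

Because the ratio divides by the average, it grows very large for compounds melting near zero on the chosen temperature scale, so flagged entries still need a manual look at the underlying sources.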
Investigating the m.p. inconsistencies of EGCG
Investigating the m.p. inconsistencies of cyclohexanone
Most popular data sources
Alfa Aesar donates melting points to the public
Open Melting Point Explorer (Andrew Lang)
Outliers
MDPI dataset
EPI (also donated all data to the public)
Outliers for ethanol: Alfa Aesar and Oxford MSDS
Inconsistencies and SMILES problems within MDPI dataset
MDPI Dataset labeled with High Trust Level
Open Melting Point Datasets
Currently 20,000 compounds with Open MPs
Live curation on a public Google Spreadsheet of compounds with highest mp ranges (collaboration with Andrew Lang and Antony Williams)
Some melting points can’t be resolved only with literature: 4-benzyltoluene
The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 °C
The quest to resolve the melting point of 4-benzyltoluene: ambiguous results upon heating, but clearly remains a liquid at -15 °C for 2 days in the freezer
Further investigation into the literature for the melting point of 4-benzyltoluene
Although a general description of the method is provided, the raw data are not
Because of broken provenance, errors cascade through the literature
Calculations in patent based on incorrect data
Open Random Forest modeling of Open Melting Point data using CDK descriptors (Andrew Lang)
R² = 0.78; TPSA and nHdon most important
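For readers unfamiliar with the R² figure quoted for the model, the sketch below shows one common way such a value is computed from measured and predicted melting points. The slide does not say which definition was used or which data it was evaluated on, so the formula (1 minus residual over total sum of squares) and the function name `r_squared` are assumptions, and the numbers are made up.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_measured = sum(measured) / len(measured)
    # Squared error of the model's predictions
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Squared error of always predicting the mean
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are all identical")
    return 1 - ss_res / ss_tot

# Made-up melting points in degrees C, for illustration only
measured = [25.0, 80.0, 120.0, 160.0]
predicted = [40.0, 70.0, 125.0, 150.0]
print(f"R^2 = {r_squared(measured, predicted):.2f}")
```

Under this definition, a value of 1 means the predictions match every measurement, and 0 means the model does no better than predicting the average melting point for every compound.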
Melting point prediction service
Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)
Using melting point for temperature dependent solubility prediction
Motivation: Faster Science, Better Science
There are NO FACTS, only measurements embedded within assumptions
Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
TRUST
PROOF
Crowdsourcing Solubility Data
Data provenance: From Wikipedia to…
…the lab notebook and raw data
Solubilities collected in a Google Spreadsheet
Web services for summary data (Andrew Lang)
Web service calls from within a Google Spreadsheet for solubility measurement and prediction (Andrew Lang)
Integration of Multiple Web Services to Recommend Solvents for Reactions (Andrew Lang)
Reaction Attempts Book
Reaction Attempts Book: Reactants listed Alphabetically
ONS Challenge Solubility Book cited for nanotechnology application
All ONS web services
For all Formats of ONS Projects

Bradley SLA Talk on Open Melting Point Collections