From Robert Boyle’s The Sceptical Chymist to Modern Data-Driven Chemistry
1. Prof. Geoffrey R. Hutchison, Department of Chemistry, University of Pittsburgh
From Robert Boyle’s The Sceptical Chymist to Modern Data-Driven Chemistry
(And where do we go from here?)
https://hutchison.chem.pitt.edu/
2. Scientific Prestige: Public Disputes
How do you decide the best?
1535 - Fontana solves cubic
Wins prestige (and job)
Writes a coded poem
Niccolò Fontana “Tartaglia”
Image: Wikipedia (Rijksmuseum NL)
Gerolamo Cardano
Image: Wikipedia (Unknown origin)
3. The Sceptical Chymist: pub. 1661
Transition between alchemy and more modern chemistry
Also, beginnings of scientific publishing…
Image Credit: Wikipedia (U. Penn Library)
4. 1660s
Chemistry = Alchemy
Boyle believed in transmutation
So did Newton
But they published observations and discoveries (sometimes reluctantly)
Philosophical Transactions - 1665
Image: Wikipedia (Royal Society)
5. Scientific Publishing in 1665
Philosophical Transactions and The Sceptical Chymist
• Words and static figures / drawings
• Four figures
- chimney, tools for mining
- unusual calf’s head (no nose!)
• Not that different from most modern scientific articles
Philosophical Transactions - 1665
Royal Society
https://royalsociety.org/blog/2017/02/images-from-the-archive/
6. Modern Chemistry
Data-intensive
.. a lot goes into the figures & tables
[Figure: chemical structures of theobromine and caffeine]
• Analytical Data (1H, 13C NMR, MS, IR, UV/Vis…)
• Crystallography
• Calculations (DFT, etc.)
• Applications (AFM, devices, …)
• Reactions, Chemical Diagrams..
7. Modern Chemistry
Submission
Peer Review / Reproducible?
Editing
Published Final Form
8. Modern Chemistry
Submission
Peer Review / Reproducible?
Editing
Published Final Form
Preprint
9. Modern Chemistry
Submission
Peer Review / Reproducible?
Editing
Published Final Form
Preprint
Generate PDF(s)
10. Extract it from the images…
• Copy and paste (?)
• WebPlotDigitizer
• OSRA (Igor Filippov)
• ChemDataExtractor (M. Swain)
• Pay an undergraduate
You Want the Data?
[Figure: # of conformers (3^n) vs. # of rotatable bonds, Confab - MMFF94, energy windows in kcal/mol]
[Figure: median R² vs. time (s, from msec to days) for Force Field, ML, GFN, DFT-D, and MP2 methods, with the “Holy Grail” region marked]
[Figure: NMR spectrum (ppm axis) with instrument parameter block]
Mo2-Tributyl phosphate, US Patent 8,198,437
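The 3^n label on the conformer plot can be made concrete with a few lines of Python. This is only a rough sketch: it assumes the common rule of thumb of about three low-energy states per rotatable bond, and the function name `estimated_conformers` is a hypothetical helper, not part of Confab or any other tool named on the slide.

```python
def estimated_conformers(n_rotatable_bonds: int, states_per_bond: int = 3) -> int:
    """Rule-of-thumb upper bound on candidate conformers.

    Assumption (not from the slides): each rotatable bond contributes
    roughly `states_per_bond` distinct low-energy torsion states.
    """
    if n_rotatable_bonds < 0:
        raise ValueError("number of rotatable bonds must be non-negative")
    return states_per_bond ** n_rotatable_bonds


# Same range as the plot's x-axis (0 to 15 rotatable bonds).
for n in (0, 5, 10, 15):
    print(f"{n:2d} rotatable bonds -> ~{estimated_conformers(n):,} conformers")
```

Real conformer searches prune this number heavily (for example with an energy window), so the estimate is best read as a scaling argument, not a count.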
11. Open Data, Open Standards, Open Source
• Peter Murray-Rust, Henry Rzepa, Rajarshi Guha, Christoph Steinbeck, Jörg Wegner, Rich Apodaca, Egon L. Willighagen…
• ACS San Diego 2005
• DOI: 10.1021/ci050400b
The Blue Obelisk
Blue Obelisk, Horton Plaza, San Diego
12. “I can plug my iPod into any computer and it will recognize my music and give me all sorts of metadata: artist, title, type of music... Why can’t I read the chemical data off my files?”
- Prof. Henry S. Rzepa (Imperial College), Spring 2005 ACS Meeting, San Diego, CA
13. But why do I care?
.. why chemistry needs to share open data
• It’s efficient. One student creates a PDF of data, and another extracts the data from that PDF??
• It enables reuse. Philosophical Transactions worked. Reuse is science.
• Data is king. There’s even a journal, Scientific Data.
• Crowdsourcing. Imagine every chemistry student measuring melting points, solubility, and spectra, and running calculations …
14. Some shared chemical data
• Cambridge Crystallographic Data Centre (CCDC)
• Inorganic Crystal Structure Database (ICSD)
• Crystallography Open Database (COD)
• Protein Data Bank (PDB)
• Ligand Expo
• American Mineralogist Database
Crystallography
Materials Horizons 2020, 7, 135-142
via https://crystallography.net/
15. Drug Discovery / Catalogues
• PubChem
• ZINC
• ChemSpider
• eMolecules
• ChEMBL
• DrugBank
Not only…
17. Often small…
• AIST Spectral Database SDBS (34k)
• NMRShiftDB (44k)
• (Commercial data)
• IR, NMR, MS databases
• Often hard to share
Spectroscopy
18. .. at least in chemistry
• Interactive figures:
• Not just 2D static images
• Supporting information as repository:
• Documents / text / README
• Raw data, spectra, etc.
• Code
• Jupyter notebooks (analysis)
Future of Publishing
19. Take-Home
Modern chemistry is data - publishing should be too
• Post your raw data and make it open - you can create a DOI via Zenodo, Figshare …
• Enable shared repositories - imagine searching all the chemical data
• Cheminformatics exists - exchange data and metadata
It’s 2021, not 1665
Publish 21st century science
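As one small illustration of "exchange data and metadata", here is a minimal Python sketch that writes a spectrum as plain CSV alongside a JSON metadata record, instead of burying it in a PDF figure. Everything specific here is an assumption made for illustration: the function name `export_spectrum`, the column names, the metadata keys, and the numeric values are hypothetical placeholders, not measurements or a format from the talk.

```python
import csv
import io
import json


def export_spectrum(shifts_ppm, intensities, metadata):
    """Return (csv_text, json_text) for a spectrum and its metadata.

    Hypothetical helper: column names and metadata keys are
    illustrative choices, not an established standard.
    """
    if len(shifts_ppm) != len(intensities):
        raise ValueError("shifts and intensities must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["shift_ppm", "intensity"])
    writer.writerows(zip(shifts_ppm, intensities))
    # Sorted keys keep the metadata file stable under version control.
    return buffer.getvalue(), json.dumps(metadata, indent=2, sort_keys=True)


# Placeholder values only, not real measurements.
csv_text, json_text = export_spectrum(
    [7.26, 2.15, 0.90],
    [1.0, 0.46, 0.78],
    {"technique": "1H NMR", "solvent": "CDCl3", "license": "CC0-1.0"},
)
print(csv_text)
print(json_text)
```

Files like these can be uploaded to a repository such as Zenodo or Figshare to obtain a DOI. Established exchange formats would be the better choice where a community standard already exists.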