Using online chemistry databases
to facilitate structure identification
in mass spectral data
Antony Williams,
Valery Tkachenko, Alexey Pshenichnov
ACS Denver, March 2015
Free and Easy
• Everything I will show in terms of ChemSpider
is available for free online today
• To make it easy to “take notes” these slides
will be available at:
www.slideshare.net/AntonyWilliams/
www.ChemSpider.com
ChemSpider
What will ChemSpider give us?
For Mass Spectrometrists
• Valuable searches for Mass Spec would be:
• Search the database by mass or formula for
structure identification
• Search subsets of data – e.g. “metabolism”,
pesticides, etc.
• Link structure-based data across the internet
• Provide “programming interfaces” to integrate
• Does ChemSpider provide value to Mass
Spectrometrists?
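A minimal sketch of what a mass- or formula-based lookup needs on the client side: a monoisotopic mass computed from a molecular formula, and a ppm tolerance window around a measured mass. The function names, the small element table and the caffeine example are illustrative assumptions, not part of ChemSpider.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few common elements.
# Only a handful are listed; extend the table as needed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "Cl": 34.96885268,
}


def parse_formula(formula):
    """Return {element: count} for a flat formula such as 'C8H10N4O2' (no brackets)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mass_window(mass, ppm):
    """Return the (low, high) search window for a mass at a given ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


# Hypothetical usage: caffeine, searched with a 5 ppm tolerance.
caffeine = monoisotopic_mass("C8H10N4O2")
low, high = mass_window(caffeine, 5)
```

The window (low, high) is what would be passed to a mass search; a formula search would use the parsed element counts instead.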
Pre-calculated data
Data Source Selection
• Sources for the >34 million chemicals include:
• Vendor collections
• Government databases
• Individual/Lab data
• Publication data
• All segregated allowing for data source
selection
Data Source Selection - Type
Data Source Selection - Individual
Mass Spec Analysis
Jim Little, Eastman Chemical
ChemSpider Interface
1287 Hits Ranked by Defect
1287 Hits Ranked by # of References
Top Ranked Hit
Tinuvin 328
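The ranking idea on these slides (many hits for one search, sorted so the most-referenced compound comes first) can be sketched as follows. The Hit record, its field names and the reference counts are hypothetical placeholders, not real ChemSpider output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    reference_count: int  # number of references associated with the compound


def rank_by_references(hits):
    """Return the hits sorted with the most-referenced compound first."""
    return sorted(hits, key=lambda h: h.reference_count, reverse=True)


# Hypothetical hits that all match the same search; the counts are invented.
hits = [Hit("candidate A", 3), Hit("candidate B", 120), Hit("candidate C", 0)]
ranked = rank_by_references(hits)
top_hit = ranked[0]
```

The assumption behind this ordering is that a well-documented compound is more likely to be the one actually observed than an obscure one with the same mass or formula.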
What can I find on ChemSpider?
What can I find?
Source and Purchase…
What can I find on ChemSpider?
External Calculation Engines
What can I find on ChemSpider?
…and in the RSC Databases…
Linked to the Publisher
What can I find?
And out to Google Patents
What About the Entire Web?
The InChI Identifier
InChI Strings Hash to InChIKeys
Searching Internet by Structure
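Because an InChIKey is a fixed-length text string, it can be pasted into an ordinary web search engine as a structure query. The sketch below checks that a string has the standard InChIKey shape (14 letters, 10 letters, 1 letter, separated by hyphens) and builds a quoted search URL. The helper names and the choice of search URL are assumptions for illustration; the example key is the InChIKey of cholesterol.

```python
import re
from urllib.parse import quote_plus

# Shape of an InChIKey: 14 uppercase letters, 10 uppercase letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey_shaped(text):
    """True if the text has the 27-character InChIKey layout (no checksum test)."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


def web_search_url(inchikey, base="https://www.google.com/search?q="):
    """Build a search URL that looks for the exact InChIKey as a quoted phrase."""
    if not is_inchikey_shaped(inchikey):
        raise ValueError("not an InChIKey-shaped string: %r" % inchikey)
    return base + quote_plus('"' + inchikey + '"')


# InChIKey of cholesterol, used here only as an example query.
url = web_search_url("HVYWMOMLDIMFJA-DPAQBDIFSA-N")
```

This only validates the layout of the key; it does not verify that the key was correctly generated from an InChI string.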
Extended Study
Sorting by references
Position sorted by references
Position 1 only
Web Services For Collaboration
• Many instrument vendors are using or
investigating our web-based services for
compound lookup
• Many academic sites integrating directly –
metabonomics, name lookup, mass-based
searching
Results of the ChemSpider Search
in the MarkerLynx Worksheet
Hit Details in ChemSpider
“REAL Spectral Data”
• Masses on ChemSpider are clearly valuable!
• We’d like to host “spectral curves”
• But we’re a publisher so what can we do?
Spectra: Cholesterol
ChemSpider ID 24528095 1H NMR
ChemSpider ID 24528095 HHCOSY
Publications & “Real Spectra”
• We are turning text into spectra
• We are turning figures into spectra
ESI – Text Spectra
1H NMR (CDCl3, 400 MHz):
δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t,
1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz,
C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
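A rough sketch of how a text spectrum like the one above could be turned into structured peak data. This is an illustrative regular-expression approach under simplifying assumptions (each peak is written as "shift (multiplicity, nH, ..."); it is not the text-mining pipeline used for the RSC archive, and the names PEAK and parse_peaks are made up for this example.

```python
import re

# One peak: a shift or a shift range, then "(multiplicity, nH".
PEAK = re.compile(
    r"(\d+\.\d+)"                      # chemical shift, or start of a range
    r"(?:\s*[\u2013-]\s*(\d+\.\d+))?"  # optional end of range (en dash or hyphen)
    r"\s*\(\s*([a-z]+)\s*,\s*(\d+)H"   # multiplicity and proton count
)


def parse_peaks(text):
    """Extract (shift range, multiplicity, proton count) for each reported peak."""
    peaks = []
    for start, end, multiplicity, protons in PEAK.findall(text):
        peaks.append({
            "shift_ppm": (float(start), float(end or start)),
            "multiplicity": multiplicity,
            "protons": int(protons),
        })
    return peaks


SPECTRUM = (
    "δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, "
    "1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, "
    "C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)"
)

peaks = parse_peaks(SPECTRUM)
total_protons = sum(p["protons"] for p in peaks)
```

Coupling constants and assignments are ignored here; a fuller parser would capture the "J = ... Hz" values as well.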
“Where is the real data please?”
FIGURE
DATA
Future Developments
• We have extracted 100s of 1000s of text strings
from patents – next we go into our archive
• We estimate many 1000s of figures with spectral
data in our ESI and articles
• We are aiming for a million spectra online…
• But YOU can submit your data today and share it
We want this…we need YOU!
Data Mining – it’s mine, mine!
New Repository Architecture
doi: 10.1007/s10822-014-9784-5
Acknowledgments
• Jim Little, Eastman Chemical Company
• Daniel Lowe – NextMove Software
• Bill Brouwer – Plot2Txt Development
• Carlos Cobas and Stan Sykora – MestreLabs
• Patrick Wheeler – ACD/Labs
Thank you
Email: williamsa@rsc.org
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
