High-throughput non-targeted analyses (NTA) rely on chemical reference databases for the tentative identification of observed chemical features. Many of these databases and online resources store chemical structures in forms that are not directly observed by mass spectrometry (e.g., structures containing salts or solvates, or with stereochemistry defined). Our center has previously reported on the production of “QSAR-ready structures”: forms of chemical structures, processed for use in prediction models, that are de-salted, de-solvated, and free of stereochemistry information. While appropriate for property prediction and modeling, QSAR-ready structures do not completely address the structural issues relevant to mass spectrometry. In this work we adapted the workflow used to generate QSAR-ready structures to create “MS-Ready structures”, adding procedures to separate mixtures and multi-component substances within our database. The goal of this research is to provide an open workflow so that all users and architects of chemical reference databases can generate and query structures as they are observed in a mass spectrometer. Chemical structures were processed using KNIME Analytics Platform® and incorporated into the US EPA’s CompTox Chemistry Dashboard, enabling tentative identification by querying MS-Ready structures by monoisotopic mass and molecular formula. Including MS-Ready structures in the Dashboard improved the coverage of candidate structures observed via high-resolution mass spectrometry (HRMS), indicating an increased potential for identifying unknowns. The availability of pre-processed MS-Ready structure files as Open Data, together with developing services for direct integration with other software tools, positions the Dashboard as a valuable resource for the mass spectrometry community. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
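Querying by monoisotopic mass usually means accepting every candidate whose mass falls inside a mass-accuracy window expressed in parts per million (ppm). The sketch below illustrates that idea only, under assumed inputs: the in-memory index, the function names, and the 5 ppm default are hypothetical and are not the Dashboard's actual search implementation.

```python
def ppm_window(mass: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) mass bounds for a +/- ppm tolerance."""
    delta = mass * ppm / 1_000_000
    return mass - delta, mass + delta


def search_by_mass(
    index: list[tuple[str, float]], observed_mass: float, ppm: float = 5.0
) -> list[str]:
    """Return MS-Ready formulas whose monoisotopic mass lies in the ppm window.

    `index` is a hypothetical list of (MS-Ready formula, monoisotopic mass) pairs.
    """
    low, high = ppm_window(observed_mass, ppm)
    return [formula for formula, mass in index if low <= mass <= high]


# Hypothetical index entries; masses are neutral monoisotopic masses.
INDEX = [("C14H22N2O3", 266.16304), ("C8HF17O3S", 499.93749)]

print(search_by_mass(INDEX, 266.1625))           # ['C14H22N2O3'] (about 2 ppm off)
print(search_by_mass(INDEX, 266.1625, ppm=1.0))  # [] (window too tight)
```

Because every MS-Ready form in such an index is a single neutral, de-salted structure, one mass query can reach all database records that collapse onto it; a formula query works the same way with an exact string match instead of a window.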
An open workflow to generate "MS-Ready" structures and improve non-targeted mass spectrometry
Office of Research and Development
National Center for Computational Toxicology, RTP, NC
August 24, 2017
Andrew D. McEachran, Kamel Mansouri, Chris Grulke, Antony J. Williams
http://orcid.org/0000-0003-1423-330X
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
ANYL 435
2. Comparing Analysis Approaches
• Targeted Analysis:
- We know exactly what we’re looking for
- 10s – 100s of chemicals
• Suspect Screening Analysis (SSA):
- We have chemicals of interest
- 100s – 1,000s of chemicals
• Non-Targeted Analysis (NTA):
- We have no preconceived lists
- 1,000s – 10,000s of chemicals
- In dust, soil, food, air, water, and products: potential exposure sources for plants, animals, and humans
Slide from Sobus, Williams
8/24/2017
5. Data Source Ranking of “known unknowns”
• Mass and/or formula unknown to a researcher, but contained within a reference database
• The most likely candidate chemicals are those with the most references/sources
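The ranking rule in the second bullet can be sketched as a filter on formula followed by a descending sort on a per-record source count. This is a minimal illustration, assuming each record carries a precomputed `source_count`; the `Candidate` class, the function name, and all names and counts in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical record identifier
    formula: str       # formula the record is indexed under
    source_count: int  # number of data sources/references linked to the record


def rank_by_sources(candidates: list[Candidate], formula: str) -> list[Candidate]:
    """Return candidates matching `formula`, most-referenced first."""
    hits = [c for c in candidates if c.formula == formula]
    # sorted() stays stable with reverse=True, so ties keep their database order.
    return sorted(hits, key=lambda c: c.source_count, reverse=True)


candidates = [
    Candidate("candidate-A", "C14H22N2O3", 12),
    Candidate("candidate-B", "C14H22N2O3", 87),
    Candidate("candidate-C", "C14H22N2O3", 3),
    Candidate("candidate-D", "C10H16N2O2", 50),
]
print([c.name for c in rank_by_sources(candidates, "C14H22N2O3")])
# ['candidate-B', 'candidate-A', 'candidate-C']
```

Source counts act here as a proxy for how commonly a chemical is reported, which is why the most-referenced isomer is listed first.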
[Figure: a molecular formula (C14H22N2O3) or monoisotopic mass (266.16304) is searched against a chemical reference database, returning sorted candidate structures]
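The monoisotopic mass in the example follows directly from the formula: each element contributes the mass of its most abundant isotope times its atom count. A minimal sketch, assuming a simple Hill-style formula string with no charges, isotopes, or parentheses; the element table covers only a few common elements and the function name is hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "F": 18.99840316273,
    "P": 30.97376199842,
    "S": 31.9720711744,
    "Cl": 34.968852682,
    "Br": 78.9183376,
}

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic atomic masses for a neutral formula such as 'C14H22N2O3'."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in _TOKEN.findall(formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"no mass tabulated for element {element!r}")
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


print(f"{monoisotopic_mass('C14H22N2O3'):.5f}")  # 266.16304
```

This is the neutral-molecule mass; an observed ion such as [M+H]+ sits one proton (about 1.00728 u) higher, so an adduct correction is needed before comparing against database masses.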
8. MS-Ready Structure Processing
1. Removal of inorganics and separation of mixtures
2. Removal of salts and counterions
3. Conversion of tautomers to consistent representations
4. Neutralization of charged structures and removal of stereochemistry information
5. Addition of explicit hydrogen atoms and aromatization of structures
6. Removal of duplicates
15. PFOS
- All of these forms of PFOS are returned with a single formula search
- The same retrieval requires multiple searches in ChemSpider, PubChem, etc.
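Why one search suffices can be shown with an index keyed on the MS-Ready formula instead of each record's registered formula. This is a minimal sketch: the four records are hypothetical stand-ins for several forms of PFOS (acid, potassium salt, ammonium salt, anion), and the function name is an assumption; only the idea of mapping every form to one MS-Ready parent comes from the slides.

```python
from collections import defaultdict

# Hypothetical records: name -> (formula as registered, formula of the MS-Ready form).
RECORDS = {
    "PFOS (acid)": ("C8HF17O3S", "C8HF17O3S"),
    "PFOS potassium salt": ("C8F17KO3S", "C8HF17O3S"),
    "PFOS ammonium salt": ("C8H4F17NO3S", "C8HF17O3S"),
    "PFOS anion": ("C8F17O3S-", "C8HF17O3S"),
}


def build_index(
    records: dict[str, tuple[str, str]], use_ms_ready: bool
) -> dict[str, list[str]]:
    """Map each formula to every record indexed under it."""
    index = defaultdict(list)
    for name, (registered, ms_ready) in records.items():
        index[ms_ready if use_ms_ready else registered].append(name)
    return dict(index)


query = "C8HF17O3S"
print(build_index(RECORDS, use_ms_ready=False).get(query, []))
# ['PFOS (acid)']
print(build_index(RECORDS, use_ms_ready=True).get(query, []))
# ['PFOS (acid)', 'PFOS potassium salt', 'PFOS ammonium salt', 'PFOS anion']
```

Against registered formulas, each salt or charged form needs its own query; against the MS-Ready key, all forms come back together from one formula search.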
16. ENTACT Trial
• Collaborative trial with >20 lab participants
• Blinded mixtures, environmental samples, etc.
• Identify what you can using what you have
23. Conclusions
• Database searching is a critical part of NTA/SSA workflows
• Providing accurate mappings between the MS-Ready form and all forms of a chemical contained within a database improves identification
• EPA’s CompTox Chemistry Dashboard provides MS-Ready structures for search and download
24. Acknowledgements
EPA NCCT
Tony Williams
Chris Grulke
John Wambaugh
Kamel Mansouri*
Jeff Edwards
Ann Richard
Jennifer Smith
EPA NERL
Katherine Phillips
Kristin Isaacs
Kathie Dionisio
Jon Sobus
Mark Strynar
Elin Ulrich
Seth Newton
Jarod Grossman
Sarah Laughlin-Toth*
Aurelie Marcotte*
*ORISE Research Participant