SlideShare a Scribd company logo
1 of 35
Download to read offline
1
An Actionable Annotation Scoring Framework
for Gas Chromatography - High Resolution
Mass Spectrometry (GC-HRMS)
Jeremy P. Koelmel1, Hongyu Xie 2, Elliott J. Price3, Elizabeth Z. Lin1,
Katherine E. Manz4, Paul Stelben1, Matthew Paige1, Stefano Papazian2,
Joseph Okeme1, Dean P. Jones5, Dinesh Barupal6, John Bowden11,12,
Pawel Rostkowski7, Kurt D. Pennell4, Vladimir Nikiforov8, Thanh Wang9,
Xin Hu5, Yunjia Lai10, Gary W. Miller10, Douglas I. Walker6*,
Jonathan W. Martin2, Krystal J. Godri Pollitt1*
1 Department of Environmental Health Science, Yale School of Public Health, New
Haven, CT, 06520, United States
2 Department of Environmental Science, Science for Life Laboratory, Stockholm
University, Stockholm 10691, Sweden
3 RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
4 School of Engineering, Brown University, Providence, RI 02912, United States
5 Department of Medicine, School of Medicine, Emory University, Atlanta 30322, United
States
6 Department of Environmental Medicine and Public Health, Icahn School of Medicine at
Mount Sinai, New York, NY, 10029, United States
7 NILU- Norwegian Institute for Air Research, 2007 Kjeller, Norway
8 NILU- Norwegian Institute for Air Research, Framsenteret, 9007 Tromsø, Norway
9 MTM Research Centre, Örebro University, SE-701 82, Örebro, Sweden
10 Department of Environmental Health Sciences, Mailman School of Public Health,
Columbia University, New York, NY 10032, United States
11 Center for Environmental and Human Toxicology & Department of Physiological
Sciences, University of Florida, Gainesville, FL, 32603, United States
12 Department of Chemistry, University of Florida, Gainesville, FL, 32603 United States
*Corresponding author
Krystal J. Godri Pollitt - krystal.pollitt@yale.edu
Douglas I. Walker - douglas.walker@mssm.edu
2
Abstract
Omics-based technologies have enabled comprehensive characterization of our
exposure to environmental chemicals (exposome) as well as assessment of the
corresponding biological responses at the molecular level (e.g., metabolome, lipidome,
proteome, and genome). By systematically measuring personal exposures and linking
these stimuli to biological perturbations, researchers can determine specific chemical
exposures of concern, identify mechanisms and biomarkers of toxicity, and design
interventions to reduce exposures. However, further advancement of metabolomics and
exposomics approaches is limited by a lack of standardization and approaches for
assigning confidence to chemical annotations. While a wealth of chemical data is
generated by gas chromatography high-resolution mass spectrometry (GC-HRMS),
incorporating GC-HRMS data into an annotation framework, and communicating
confidence in these assignments is challenging. It is essential to be able to compare
chemical data for exposomics studies across platforms to build upon prior knowledge and
advance the technology. Here we discuss the major pieces of evidence provided by
common GC-HRMS workflows, including retention time and retention index, electron
ionization, positive chemical ionization, electron capture negative ionization, and
atmospheric pressure chemical ionization spectral matching, molecular ion, accurate
mass, isotopic patterns, database occurrence, and occurrence in blanks. We then provide
a qualitative framework for incorporating these various lines of evidence for
communicating confidence in GC-HRMS data by adapting the Schymanski scoring
schema developed for reporting confidence levels by liquid chromatography high-
resolution mass spectrometry (LC-HRMS). Validation of our framework is presented using
standards spiked in plasma, and confident annotations in outdoor and indoor air samples,
showing a false positive rate of 12% for suspect screening for chemical identifications
assigned as Level 2 (when structurally similar isomers are not considered false positives).
This framework is easily adaptable to various workflows and provides a concise means
to communicate confidence in annotations. Further validation, refinements, and adoption
of this framework will ideally lead to harmonization across the field, helping to improve
the quality and interpretability of compound annotations obtained in GC-HRMS.
3
1. INTRODUCTION
Environmental risk factors are major determinants of health, with current estimates
suggesting that environmental exposures are responsible for about 23% of the global
disease burden (Prüss-Üstün et al., 2016) and chemical exposures responsible for one
in six deaths (Landrigan et al., 2018). However, research has historically prioritized the
characterization of genetic risk factors (Prüss-Üstün et al., 2016; Rappaport, 2016;
Wheelock and Rappaport, 2020; Wild, 2012, 2005). Characterization of environmental
factors primarily relies on self-reporting or community-level exposures (Rappaport, 2016).
Identifying and preventing/reducing harmful environmental exposures could lead to
significant reductions in global disease. Assessment of the chemical exposome, which
includes the comprehensive identification of endogenous and exogenous compounds at
the individual level, is essential to better understanding the link between the environment
and human health (Vermeulen et al., 2020).
Hundreds of thousands of chemicals are used in commercial and industrial applications.
Once released to the environment, these chemicals often transform, increasing the
number of potential chemical exposures to millions of compounds. Traditional methods
for high-confidence annotation and quantitation require synthesized chemical standards
for each analyte of interest, semi-manual peak integration, calibration curves, and often
specific chromatographic methods. Therefore based on cost and feasibility, the diversity
of chemicals existing in the exposome presents a monumental challenge to traditional
analytical methods, which typically interrogate less than 100 compounds (Doherty et al.,
2021; Mazur et al., 2021). To overcome this limitation, non-targeted analytical methods
that employ high resolution mass spectrometry (HRMS) have been developed to screen
and identify thousands of chemicals that may be present environmental and biological
samples. These methods use HRMS evidence indicative of universal chemical properties
to annotate structure and estimate chemical abundances, and hence these methods are
not solely focused on known or commonly expected chemicals. Because non-targeted
results are more tentative then targeted results, a key need for such exposome research
is to establish and apply transparency in instrumentation, data-processing, and reporting
criteria when communicating research findings.
Confirming the molecular structure of identified compounds using HRMS data remains a
challenge. Due to limitations in scope and spectral quality of chemical databases and the
difficulty of capturing mass fragmentation patterns for low abundance peaks, the
confidence of chemical annotations can vary widely. Therefore, reporting may be limited
to chemical formula, or even simply a set of retention time and mass spectral
measurements, rather than an exact chemical structure. Chemical features that are not
fully characterized can provide novel insights in exposomics research and links between
environment, biological effects, and disease. For example, chemical formulas assigned
to accurate mass features can aid in the characterization of unexpected and previously
unknown chemical exposures that can be prioritized for study in other populations and
sample types to understand their distribution in the environment. The advantage of using
this approach for identifying environmental exposures is highlighted by the discovery of
polychlorinated biphenyls (PCBs), which are one of the most widely studied persistent
4
organic pollutants. Their initial discovery in environmental and biological samples was
based upon detection of unknown peaks in multiple sample types, and it was their
measurement across multiple samples and recognition of their accumulation in the food
chain that led to the eventual identification of PCBs as highly persistent pollutants
(Jensen, 1972). More recent examples of the use of non-targeted mass spectrometry,
include the discovery of 6PPD-quinone as a chemical responsible for fish die offs
(Stokstad, 2020), and a brominated pesticide transformation products as responsible for
eagle die offs (US EPA, 2021).
When reporting annotations, it is essential to report measurements within a consistent
framework that allows interpretation of the uncertainty in compound annotation. Defining
a framework for reporting the confidence of chemical annotations is especially important
given the potential for high false positive rates and over-reporting in non-targeted and
suspect screening HRMS studies across various classes of compounds (Koelmel et al.,
2017; Scheubert et al., 2017; Schymanski et al., 2014; von Mering et al., 2002). One of
the first examples of a confidence scoring framework for non-targeted environmental
chemical profiling was proposed by Schymanski et al. for liquid chromatography high-
resolution tandem mass spectrometry (LC-HRMS/MS) (Schrimpe-Rutledge et al., 2016;
Schymanski et al., 2012). Gas Chromatography high-resolution mass spectrometry (GC-
HRMS) provides complementary coverage of exposome chemicals that are of lower
solubility, higher volatility, or are not ionized by LC-HRMS approaches (Zhang et al.,
2021) (Table S1). The applicability of Schymanski schema was reported to GC-HRMS to
be limited due to differences in ionization, data acquisition, and data-processing
(Rostkowski et al., 2019).
Recently, a scoring framework for GC-HRMS using electron ionization (EI) was proposed
(Travis et al., 2021) categorizing confidence of feature identifications into four levels.
Levels ranged from the most confident annotations in Level 1 with confirmed
identifications to the weakest in Level 4 with features described only by exact mass. While
this framework provides an important first step in defining scoring metrics for GC-HRMS,
the parameters proposed for assigning confidence levels are potentially subject to high
false positive and false negative rates. For example, Level 2 assignment requires isotopic
distribution while Level 3 is assigned for exact mass matches to fragment ions. The
necessity of an isotopic pattern match requires the molecular ion, which is often not
observed. Assignment of Level 3 is further challenged by the requirement of an isotopic
pattern and exact mass matches which require exact mass libraries with known formulas,
which are not currently available for many compounds. The schema does not incorporate
retention index, which we demonstrate to be an essential piece of information for reducing
false positives. Thus, this approach may be suitable for databases with establish, high
mass accuracy fragmentation spectra, but may be limited when using larger databases
and in silico approaches.
Here, we present a new annotation confidence scoring framework specific for non-
targeted GC-HRMS that can be used to determine and communicate confidence of
chemical assignments made using several strategies. The framework draws on input from
a community of GC-HRMS users and considers the breadth of chemical evidence that
5
can be obtained from these powerful instruments. The current work focuses on
environmental contaminants. While this framework can be applied to certain biological
chemicals, the use of derivatization (Halket et al., 2005) and class-based mass or
fragmentation patterns may be required.
2. LEVERAGING EVIDENCE FROM GC-HRMS TO REDUCE FALSE POSITIVES
AND NEGATIVES
GC-HRMS provides data-rich evidence that can be used in identifying structural
annotation of detected features, including various modes of ionization and fragmentation,
accurate mass, retention information, and isotopic pattern. It is important to understand
the unique advantages and limitations for each layer of evidence to decide the necessary
pieces of information for accurate assignment of chemicals (Table S1).
2.1. Ionization
The most common ionization method used in GC-HRMS is electron ionization (EI). Any
molecule introduced to the source in this ionization mode is theoretically ionized with
equal efficiency. In practice, however, on-column degradation, inlet/liner discrimination,
and other factors (Barwick, 1999), total summed ion signal can still be a factor of chemical
structure.
EI is a ‘hard’ ionization method, meaning that each molecule generates a spectrum with
a high degree of fragmentation in which only traces of the intact molecular ion are usually
detected in most cases. In a GC-EI-MS analysis, all the reported peaks have an
associated fragmentation spectrum which is reconstructed by a deconvolution algorithm.
Having fragmentation from all ions makes it possible to generate annotation levels for
every sample, therefore improving confidence of chemical assignments. Generation of
fragmentation spectra for all ionized compounds enables MS1 data can be used to
confirm identified chemicals, eliminating the need for precursor selection and
fragmentation by tandem mass spectrometry (MS/MS). To establish reproduceable EI
spectra libraries, ionization voltages of 70 eV are commonly used and combined with
libraries containing retention index. Retention index and fragmentation are often similar
irrespective of chromatographic method and instrument (within the same instrument type,
e.g., quadrupole time-of-flight MS (Q-TOF-MS)), and therefore have been successfully
shared across labs. (Mallett, 2014).
GC-EI-MS spectra provide key evidence for compound annotation, yet several challenges
may still lead to false positives. While the diverse fragments observed in EI spectra allow
for better structural characterization, the spread of signal across fragment ions can
reduces overall sensitivity, often with no molecular ion signal present. Without observation
of the molecular ion, many compounds can give similar spectra, even when they are not
isomers. Because there is rarely a high signal from the molecular ion, ion selection
followed by fragmentation cannot be applied in GC-EI-MS. Therefore, GC spectra must
be deconvoluted across chromatographic time, often leading to the inclusion of artifact
peaks due to high background or co-eluting compounds. Deconvolution can be a
significant issue in increasing false negatives and false positives.
6
Alternative ionization strategies can be used with GC-HRMS to improve detection of the
molecular ion, although these often lack the robustness, sensitivity, and reproducibility of
EI. These techniques include positive chemical ionization (PCI), electron capture negative
ionization (ECNI), and atmospheric pressure chemical ionization (APCI). The advantages
and disadvantages of these techniques are described in the supplemental information.
2.2. Accurate Mass
Continual advances in high resolution Orbitrap and Q-TOF MS instrumentation have
enabled accurate measurement of mass to charge ratios ranging from sub-ppm to 30
ppm mass accuracy windows and resolutions up to 240,000 to date. Accurate mass
(ability to measure the correct mass) and high resolution (ability to distinguish close
masses) provides additional evidence for compound annotation such as fine isotopic
patterns. Unequivocal formula prediction may be possible for low mass fragments,
especially when including isotopic patterns. Isotopic pattern and mass defects can further
be used to determine unique chemical structures containing certain functional groups
(e.g., Cl, Br, F), aiding in the detection of unknown chemical exposures.
Suspect screening using accurate mass matching can reduce false positives. The largest
GC-EI-MS libraries (e.g., NIST and Wiley) include spectra acquired using unit mass
instruments; only a handful of smaller libraries area available for spectral matching using
accurate mass. Algorithms can be used to reduce false positives when matching
experimental spectra obtained with GC-HRMS to unit resolution libraries (Stein, 2012).
One example is applying a high-resolution mass filter (HRMF) (Kwiecien et al., 2015).
HRMF calculates the percentage of fragment ions with formulae which can be predicted
when setting atom constraints for formula matching to only those contained in the
proposed molecular formula. Reverse HRMF can also be used, but this approach limits
scoring to only peaks found in the library, ignoring other peaks in the experimental spectra
for scoring purposes. Reverse HRMF reduces influences of artifacts during deconvolution
on scoring, hence reducing false negatives, but may also increase false positives when
experimental peaks are real and not found in the library.
2.3. Retention Index
Retention time is an orthogonal measurement that can improve annotation accuracy
when combined with EI MS spectral matches. As retention time is influenced by column
type, batch, and manufacturer as well as the method temperature gradient, standardized
retention indices can be calculated using a series of compounds that span the entire
chromatographic run (i.e., alkanes (Kováts, 1958), fatty acid methyl esters (Miwa et al.,
1960)). retention indices can also be useful for identifying isomers with the same or similar
EI spectra. Compared to LC, GC retention times are typically more consistent within and
across laboratories (Aalizadeh et al., 2021; Hall et al., 2012). This increases the reliability
of retention index calibrations and enables matching of experimental retention indices to
public libraries as well as prediction of retention time indices using chemical properties
(Peng, 2010a; Stein et al., 2007a; Strehmel et al., 2008).
7
Retention indices provide key evidence to improve accuracy of compound annotation.
There are a few considerations when implementing retention indices into a workflow.
Methods using linear gradients may be preferable, to reduce errors in retention index
calculation (Zellner et al., 2008), although non-linear temperature gradients may still be
deployed with accurate RI calculated (Peng, 2010b). While retention time indices improve
annotation confidence, their incomplete availability in spectral libraries may limit their use.
The largest libraries (e.g., NIST and Wiley) have a high percentage of chemicals without
retention index information, limiting the search space and coverage if retention index
matching is required. The current release of NIST libraries contain predicted retention
indices for many compounds; however, these predictions may not be accurate for all
chemical classes (Stein et al., 2007b). Careful consideration of retention time indices is
also necessary depending on the chemicals being measured. For example, alkanes are
not appropriate for calculating RI for persistent mobile organic compounds including per-
and polyfluoroalkyl substances (PFAS), organophosphate esters (OPEs), and certain
polar compounds amendable to GC (Boegelsack et al., 2021; Grőbler, 1972).
2.4. Compound Metadata for Improving Candidate Ranking
While spectral and retention time information can be used to assign a structure or
substructure to a chemical, additional evidence available from GC-HRMS can increase
confidence of chemical identifications. Study context and likelihood of exposure are also
important to consider. For example, a sizable portion of 100+ million chemical entries in
PubChem have never been synthesized, their existence is limited to a patent or database
(Kim et al., 2021), or if synthesized were never generated at large scale. Hence tens of
millions of these entries are irrelevant for exposomics studies and will lead to false positive
annotations (Schymanski et al., 2021). To reduce false positives and remove top-ranked
compounds which are not likely to exist in the environment, more selective databases
such as the EPA CompTox Chemistry Dashboard (Williams et al., 2017) or PubChem Lite
(Schymanski et al., 2021) can be used. For GC-HRMS a more actionable solution, given
that the NIST and WILEY libraries are commonly used, is to include meta-data on
database, patent and/or literature occurrence (reference counting), or production levels,
for determining the likelihood a chemical exists in the environment. This can drastically
reduce false positives by removing unlikely candidates and ease interpretation by
retaining chemical species with known information (McEachran et al., 2017; “National
Report on Human Exposure to Environmental Chemicals,” 2021; Williams et al., 2017).
This method is biased to more well characterized chemicals, and any scoring methods
are highly dependent on which databases/algorithms are used for determining compound
occurrences. Hence, this method may lead to false negatives in the case of uncommon
or unexpected chemicals, although the drastic reduction in false positives may far
outweigh the potential of missed compounds.
There are several tools that can provide meta-data on a chemical’s likely occurrence to
improve non-targeted annotations in exposomics including; NHANES Predicted
Exposures (“National Report on Human Exposure to Environmental Chemicals,” 2021),
Data Sources (McEachran et al., 2017), Number of PubMed articles containing chemical
name, Number of PubChem data sources, and CPDat Product Occurrence Count
(McEachran et al., 2017).
8
2.5. Peak Filtering to Remove Background Contaminants and Artifacts
Analytical artifacts can be introduced during sample collection, transport, storage,
preparation/ extraction, and acquisition, as well as through instrument noise. Complete
physical removal of contaminants is not possible, in particular when background and
artifact peaks overlap. Additional information, such as clustering of fragments across
multiple samples can be used to identify highly correlated fragments based on peak
intensities. Matrix effects in GC (Fujiyoshi et al., 2016; Sakthiselvi et al., 2020) are
important for determining the extent of background noise. These matrix effects will
depend on sample type, concentration, preparation, and introduction, as well as the
chromatography and acquisition strategies employed. These factors can impact spectral
matches and annotation confidence, and are challenging to incorporate into scoring
metrics.
One approach to removing these interferences is blank feature filtering where sample
abundances are retained if above a blank threshold. Blank feature filtering is ideally
conducted using field blanks that have been treated identical to samples, undergoing the
same transport, storage, sample preparation, and acquisition protocols. Multiple methods
are available for blank filtering to remove background contaminants and artifacts
(Borgsmüller et al., 2019; Koelmel et al., 2021; Mahieu et al., 2014; Matsuo et al., 2017;
Patterson et al., 2017; Risum and Bro, 2019). Various methods for blank filtering are
described in the supplemental information.
3. COMPILING GC-HRMS EVIDENCE TO COMMUNICATE ANNOTATION
CONFIDENCE
There are three approaches for communicating confidence in annotations: non-
probabilistic scores, probabilistic scores, and qualitative assignments of confidence.
Individual layers of GC-HRMS evidence have community accepted standards for scoring.
For example, EI match scores are generally calculated using dot-product or reverse dot-
product, retention index matches are usually based on an absolute deviation, meta-data
scores are based on numbers of instances, and HRMF scores are based on a percentage
of fragment formulae which can be predicted given the proposed structures composite
atoms. Cut-offs for spectral match scores, retention index delta, or other metrics based
on experience, community accepted values, or optimized via experiments can then be
assigned to remove compounds unlikely to be true positives.
Combining different scores from various layers of GC-HRMS evidence to communicate
confidence is non-trivial. Weighted linear models, neural networks, and mixed models, for
example, have all been used to combine spectral similarity and retention index
information to optimize a single match function (Matyushin et al., 2020; Wei et al., 2014).
Scores determined using these quantitative methods can improve candidate sorting
compared to qualitative methods; however, statistical and coding experience are required
which may be beyond the expertise of most users. Establishing harmonized algorithms
and development of user-friendly software would ensure consistency across different user
groups. Consistency across different software, workflows and laboratories is much easier
9
to achieve using qualitative assignment of confidence, e.g. (Schymanski et al., 2014).
Here we introduce a qualitative means to assign and communicate confidence of feature
annotations. We also show the use of more quantitative scoring metrics alongside
qualitative methods for ranking candidates within a certain qualitative confidence
assignment.
Proposed System for Assigning Confidence
Level 1: Confirmed Identification (Retention time, EI spectra and reference
masses) using In-House library
⮚ Retention time matches standard database generated in-house using the same
method, background matrix, and instrument (spectra from certified reference
standards, not from an external library). Retention times should match within the
RSD of the associated standards (no more than 1% deviation).
⮚ Observation of 2+ EI spectral reference peaks from standards at correct ratios
(within 20%) or EI spectral match with in-house library >600. Often more than two
reference masses and ratios are needed for confident annotation, 2 is minimum.
For high-throughput screening with automated data extraction, verification of
identity of each chemical in each sample by these criteria may be impractical. In
such cases, procedures and assumptions should be clearly defined.
Advantage:
⮚ Confident identification for exact structure (with a few possible exceptions,
especially enantiomers or structurally similar isomers which cannot be separated
by retention time).
Limitation:
⮚ Expensive to purchase, prepare, run, and generate database for large number of
chemicals. Standards may need to be reacquired each experiment if retention
times are not stable between batches (e.g., column clipping).
⮚ Coverage is limited to standards available for purchase. Synthesized standards
are often too costly for routine library building.
Level 2: Probable structure or close isomer using external libraries (RI match,
[molecular ion], and EI match to exact mass library or including metrics
incorporating accurate mass information)
⮚ Reverse dot-product EI spectral match (>600) and dot-product >500. As EI
spectral prediction improves, predicted spectra may also be used.
⮚ NCI/PCI/APCI when spectra contain more information than EI and at least 5
fragments are matched to a library database.
⮚ Retention index match (<50 and <1.5%) or predicted retention index (<100
difference) using validated methods.
⮚ Match to an exact mass library (reverse dot-product >600) or using metrics
incorporating exact mass (e.g., high reverse high-resolution mass filter score
(>75)).
Advantage:
⮚ Correct annotation, or the assigned structure is a structurally similar isomer (with
certain exceptions)
10
Limitation:
⮚ Retention index match requirement may limit annotations if databases do not
contain many compounds with retention index. Therefore, this assignment of
confidence is ideal when libraries have high retention index coverage
(experimental or predicted).
⮚ In certain edge cases, reverse high-resolution mass filter might lead to false
positives when gas phase reactions in Orbitraps lead to fragments which cannot
be predicted using molecular formula.
Level 3: Tentative candidate (EI accurate mass spectral match or EI match with
metrics incorporating accurate mass) using external library
⮚ Reverse dot-product EI spectral match (>600) and dot-product (>500). As EI
spectral prediction improves, predicted spectra may also be used.
⮚ NCI/PCI/APCI when spectra contain more information than EI and at least 5
fragments are matched to a library database.
⮚ Match to an exact mass library (reverse dot-product >600) or using metrics
incorporating exact mass (e.g., high reverse high-resolution mass filter score
(>75)).
Advantage:
⮚ Widest coverage while assigning a possible structure.
Limitation:
⮚ High rate of false positives (both in terms of exact structure and assigning a
structurally similar isomer).
⮚ Possibility of multiple peaks with similar fragmentation patterns (i.e., structurally
similar isomers), but different retention times.
⮚ In certain edge cases, reverse high-resolution mass filter might lead to false
positives when gas phase reactions lead to fragments which cannot be predicted
using molecular formula.
Level 4: Chemical Group or Exact Formula
Level 4A: Identification of unequivocal chemical formula from databases (PCI, NCI,
or in certain cases, EI)
⮚ Only one formula is possible given the exact mass of the precursor ion and isotopic
distribution.
o OR multiple formulas are possible, but after only including atoms which can
predict all dominant fragment ions, only one formula is possible for the
molecule.
Level 4B: Possible chemical series (for example homologous series with repeating
chemical constituents)
⮚ Follows mass defect series (e.g., using Kendrick mass defect (Kendrick, 1963)).
o Homologous series have linear retention indices – so detection via GC can
be precise – even for variable temperature programs.
⮚ Has one or more fragments (with exact mass match) indicative of class.
11
o OR shows defined subnetwork clustering in spectral similarity networks.
Level 4C: Possible chemical class (chemicals grouped based on structural motifs
or similarity)
⮚ Has one or more fragments (with exact mass match) indicative of class.
o OR clusters in chemical similarity networks.
OR other well accepted class specific method (e.g., Lee index for polycyclic
aromatic hydrocarbons (PAHs) (Lee et al., 1979; Škrbić and Onjia, 2006)
Note: Levels 4A-C are not necessarily in order of confidence or utility, for example in
PFAS analysis the goal may be to find series of 3 or more compounds (Level 4B). In non-
targeted analysis across all chemical classes molecular formula may be of more
importance (Level 4A). Note that for Level 4B and 4C one could also have an exact
formula match, these could be indicated as Level 4AB / 4AC.
Level 5: Unknown feature (retention time and reference mass or deconvoluted
spectra)
⮚ Reproducibly detected deconvoluted spectra with reference mass intensity or
combined intensity of all spectral ions. This feature may also have an annotation,
and other information, but does not fit any of the criteria to be deemed Level 1, 2,
3, or 4.
Important Considerations
1) To avoid annotation of background artifacts, peak filtering conditions must be met
for all confidence levels (e.g., see discussion of blank feature filtering in the
supplemental information).
2) If ion selection and fragmentation with CID/HCD is employed (PCI, NCI, and APCI,
for example), then the Schymanski schema can be used to assign confidence
levels.
3) For large studies it is important to account for spectral signals across multiple
samples - for example, was there only one good spectral match in one sample
for a study with 300 samples? This case often indicates a false positive (larger
sample size increases the number of detected features, enhancing the chance
for matching noise in library spectra). However, it is critical to note that in
exposomics studies, these features may represent unique exposures of interest.
Multiple injections of a single sample, when possible, could aid in annotating rare
chemical occurrences.
4) In reporting results, it should be clear whether multiple structures have met the
proposed criteria. If more than one structure meets the criteria for an assignment
of confidence level other information can be used to discern the likely correct
annotation: e.g., top hit has a significantly higher evidence count (by factor of 10),
the compound is the only one expected for the matrix studied, the compound has
a much more accurate retention index (by 30), the compound has a significant
higher reverse dot-product score (by at least 50) or the compound has a much
more accurate reverse high-resolution mass filter score (by at least 10). This is
also where quantitative scoring metrics for ranking candidates can be utilized
12
(total scores). Note that even if certain conditions above are met, there may still
be false positives (see Results Section). If multiple top-hits still exist after
applying our schema, the feature should be flagged accordingly.
5) While we have not made necessary observation of the molecular ion for Level 2,
especially since there is a retention index match, observation of the molecular ion
provides additional confidence and should be achieved when possible.
Figure 1: Proposed schema for assigning five levels of confidence in compound
annotation using common evidence provided by GC-HRMS. Acronyms: retention
time (RT), reverse search index (RSI), retention index match (ΔRI), and reverse high-
resolution mass filter (RHRMF).
13
Comparison to the Schymanski Schema for LC-HRMS/MS
While our schema is based on the Schymanski schema for LC-HRMS/MS, it was modified
to incorporate criteria specific to GC-HRMS. Included criteria were selected based on
feasibility for replication across labs with the overall objective of implementing a
harmonized scoring rubric. In addition, we include blank feature filtering, or similar, as a
necessary criterion for all annotations, as well as chemical series and class-based
annotation which is important in various fields.
Level 1 – Criteria for GC-HRMS is similar to Schymanski Level 1 and provides compound
identification. For GC-HRMS confidence scoring, the molecular ion and MS2 is not
required due to fragmentation spectra available from MS1 when using EI.
Level 2 – Criteria for GC-HRMS is similar to Schymanski Level 2, however the use of
retention indices has the potential to improve annotation confidence and assign
unambiguous structures to isomers. The Schymanski schema states “spectrum-structure
match is unambiguous” (Level 2A) or “represents the case where no other structure fits
the experimental information” (Level 2B), which may be too stringent of an interpretation
given the evidence provided. We would not confidently state this is true based on the
suggested criteria for either LC or GC.
Level 3 – Criteria for GC-HRMS fit within the Schymanski schema for Level 3, except
predicted retention index can be included as evidence for Level 2 given the accurate
predictions.
Levels 4A-C – These levels represent unequivocal formulae and/or chemical series and
chemical class, which are important in applications where the exact structure is not known
but the series or formula provides valuable information. For example, these could be
useful for polymers, PFAS, petrol chemicals, chlorinated by-products, and other
applications where repeating series and discernable chemical classes of importance may
be found. Unequivocal molecular formula is the same as Schymanski Level 4. It is useful
to note that there was no place for series/determination of chemical class in the original
Schymanski schema.
Level 5 – This is different than in Schymanski Level 5 in that the exact mass of the feature
cannot be discerned necessarily from EI spectra. Level 5 for GC-HRMS scoring to any
reproducibly detected deconvoluted spectra which does not meet the criteria to be ranked
in any other level and remains after filtering for artifacts and background contamination
(e.g., above blank signal thresholds).
4. METHODS: ESTIMATING FALSE POSITIVES AND FALSE NEGATIVES FOR
DIFFERENT CONFIDENT LEVELS USING AN OPEN-SOURCE SOFTWARE
We developed a software tool, SIF-GC (Scoring, Integrating total spectral ion abundance,
and Filtering for Gas Chromatography), to automatically assign confidence levels to
features identified in GC-HRMS data. The software covers all steps covered in the
described scoring framework provided the user has an output with retention index,
14
reverse similarity index, molecular ion, reverse high-resolution mass filter (or similar), and
integrated peak areas, heights, or total spectral ion abundance, for all samples and
blanks. Our software initially assigns Level 2, 3, or 5, and reports the top molecule in
terms of a weighted score or other metric for each level, and whether multiple hits exist
for each level (Level 1 is reserved for manual targeted techniques, and Level 4 may be
project specific and is not implemented at this time). The software then can be used to
apply blank feature filtering (user modifiable parameters), and further filter features to only
retain compounds with unique CAS-RN or other identifiers and compound names
(removes duplicates). Furthermore, the SIF-GC software can be used to compute total
spectral ion abundance for all compounds given an average ratio of total spectral ion
abundance to peak area or height. All steps are automated for Compound Discoverer
(Thermo) outputs but could work, with some minor adjustments, with other vendor
platforms. The SIF-GC software is open source and freely available at
innovativeomics.com/software/.
To evaluate the performance of our SIF-GC software and our schema for reporting
confidence, we applied the tool to datasets acquired from biological and environmental
samples and investigated the false positive and false negative rate assigned for each
level. The first dataset was of human serum samples spiked with 112 standards (90 native
and 22 isotopic labeled chemicals; Table S1A), with a final concentration range of 6.7 to
33.6 ppb. The spiked serum validation dataset consisted mainly of brominated flame
retardants, chlorinated flame retardants, organochlorine pesticides, polycyclic aromatic
hydrocarbons, oxygenated polycyclic aromatic hydrocarbons, nitrated polycyclic aromatic
hydrocarbons, polychlorinated biphenyls, polychlorinated dibenzodioxins, and
polychlorinated dibenzofurans. The second dataset describes indoor and outdoor air
samples collected using passive samplers from homes near natural gas compressor
stations. A targeted panel of 81 standards (70 native and 11 isotopic labeled chemicals;
Table S1B) with a final concentration range of 32 to 1000 ppb were assessed for these
environmental samples. This validation set consisted of an alkaloid, benzodioxoles,
benzopyrans, brominated flame retardants, chlorinated hydrocarbons, haloethers,
nitroaromatics, nitrosamines, phthalates, pyrethroids, organochlorine pesticides,
organophosphates, polycyclic aromatic hydrocarbons, polychlorinated biphenyls, and
volatile organic compounds. We compared Level 1 assigned compounds (targeted
methods used to confidently assigned compounds) with compounds assigned Level 2, 3,
or 5 using the GC-SIF software. A list of experimental protocols is provided in the
supplemental information.
5. RESULTS: HOW CONFIDENT ARE THE CONFIDENCE LEVELS?
We selected qualitative criteria that can be readily implemented given common metrics
provided by most GC open source and vendor library search software. These common
metrics included retention index (delta), spectral match (in this case reverse similarity
index was used), and a metric incorporating exact mass (in this case reverse high-
resolution mass filter criteria was used). Using the GC-SIF software, we determined the
false negative, false positive, and true positive rate when utilizing our criteria for assigning
confidence at Levels 2 and 3. For the human serum samples, of the 90 spiked unlabeled
15
standards which were detected by GC-HRMS, 80 were deconvoluted by Thermo
Compound Discoverer. Of these 80 chemicals, 61 were assigned a Level 2 annotation
(76%), and of these Level 2 annotations, the correct exact structure was only identified
for 28 compounds (46%). Hence in absolute terms, using our scoring criteria, the highest
level of confidence for suspect screening, there was a 54% false positive for Level 2
assignments, 60% false positive rate for Level 3 assignments, and 84% false positive rate
for top-ranked assignments without Level 2 or 3 criteria (Figure 2A). Of the deconvoluted
spiked standards, 19 (24%) had no Level 2 assignment and hence were considered false
negatives. In many cases these false negatives are due to unavailable libraries or
retention index values for the standard.
Figure 2: False positive rates and number of assignments across each
confidence level for 81 observed unlabeled standards spiked into human serum
(A), and 19 Level 1 assignments from outdoor and indoor air samples (B).
16
For air samples, there was a 22% false positive rate for Level 2 assignments, 89% false
positive rate for Level 3 assignments, and 74% false positive rate for top-ranked
assignments without Level 2 or 3 criteria (Figure 2B). It should be noted there was a
lower false positive rate in air samples compared to the human serum samples. This
difference is attributable the standards used with each sample type: serum samples were
spiked with a mixture that was mainly comprised of halogenated isomers which could not
be distinguished by retention index and EI spectra, whereas the air samples included a
wider range of chemicals. Therefore, for exact structure the air samples may be more
representative of the false positive rate for Level 2 when the range of chemicals being
annotated is a cross a diverse array of structures, many of which do not have close
isomers.
Even when using retention index, reverse high-resolution mass filter, reverse search
index, molecular ion, and search index there are subtle isomeric differences (e.g., Figure
3), the exact structure often cannot be discerned using GC-EI-HRMS (hence the 54%
false positive rate in serum samples and 22% false positive rate in air samples). When
subtle isomeric differences in the position of methyl, chlorine, heteroatoms, and aromatic
rings were not considered false positives, 13% of the 61 top-ranked Level 2 assignments
in the spiked serum were false positives, 25% of the top-ranked Level 3 assignments
were false positives, and 61% of the top-ranked assignments without Level 2 or 3 criteria
(Level 5) were false positives (Figure 2A). Similarly, when not considering highly similar
isomers false positives, for air samples, 11% of the 18 top-ranked Level 2 assignments
were false positives, 79% of the top-ranked Level 3 assignments were false positives,
and 68% of the top-ranked assignments without Level 2 or 3 criteria (Level 5) were false
positives (Figure 2B).
Figure 3: Examples of 10 isomers (dimethyl naphthalene and ethyl naphthalene)
which were all assigned Level 2 for a single feature. The dimethyl naphthalene
species represented one of the highest abundance features in air samples.
Application of the proposed GC-HRMS scoring framework in datasets from biological and
environmental samples highlighted that: 1) retention index is essential for confident
assignments and 2) structurally similar isomers often cannot be resolved even with a
retention index match. Exact mass filters (e.g., reverse high-resolution mass filter used
here) were also shown to be valuable, especially when heteroatoms exist. There were
26% false positives using reverse high-resolution mass filter (Level 3) compared to the
63% false positive rate identified without any filters for the validation set containing mostly
chlorinated species (Figure 2A). Reverse high-resolution mass filter was also found to
be valuable in the validation set containing mostly phthalates and PAHs (Table 1).
17
A summary of each annotation filter, the number of candidates retained, and the
percentage of candidates removed after applying the filters for air samples is shown in
Table 1. The individual filters which removed the most candidates on average were
requiring molecular ion (83 ± 14% removed), followed by retention index match < 100 (76
± 10% removed), reverse high-resolution mass filter > 75% (40 ± 27% removed), and
reverse search index (40 ± 12% removed) (Table 1). These findings highlight the
importance of the retention index criteria for removing false positives. When retention
index is combined with reverse high-resolution mass filter and reverse search index for a
Level 2 assignment, 84 ± 8% candidates were removed on average. Hence combining
these three filters further lowers the false positive rate. While use of molecular ion was
the most stringent filtering criterion, reducing false positives, decreasing the false positive
rate must be balanced against the increase in false negatives (number of correct
candidates removed using the respective filter). Use of the molecular ion drastically
increased false negatives.
Table 1: Summary of confidence in annotated chemicals detected in outdoor and
indoor air samples. Individual layers of GC-HRMS evidence were not sufficient at
differentiating between similar chemicals. Detected chemicals were filtered based on
multiple criteria including retention index, reverse search index, reverse high-resolution
mass filter, and observation of the molecular ion. The number of retained candidates after
applying all criteria for Levels 2 and 3 is shown for individual chemicals as well as each
Level 2 criteria. The overall percent of candidates removed is also presented.
False negatives were designated when the correct assignment (exact structural match)
was removed by the respective filter. Requiring the molecular ion led to five false
negatives out of the 19 validated assignments (26%; Table 1), use of retention index led
to 11% false negatives (2/19), reverse search index to 5% false negatives (1/19), and
reverse high-resolution mass filter to 0% false negatives (data not shown). When
retention index, reverse search index, and reverse high-resolution mass filter were
combined (Level 2 assignment) there were 16% false negatives (3/19; Table 1). The false
negative rate for requiring molecular ion may be even higher, as this validation set
consisted of PAHs rbons and derivatives (over 50% of validation set); PAHs have a high
molecular ion signal. Therefore, the generalized criteria we present do not necessitate
Total
Detected
Level 3,
All Criteria
Level 2,
All Criteria
Level 2,
Rentention Index
Criteria
Level 2,
Reverse Search
Index Criteria
Level 2,
Reverse High-Resolution
Mass Filter Criteria
Correct Molecular
Ion Observed
Benz[a]anthracene 25 14 3 5 18 16 11 Yes Yes
Chrysene 39 19 3 4 33 24 11 Yes Yes
Fluoranthene 54 28 9 12 34 43 6 Yes Yes
Fluorene 33 29 7 9 32 29 7 Yes Yes
Acenaphthylene 33 23 7 12 33 23 5 Yes Yes
Naphthalene 40 32 8 11 40 32 9 Yes Yes
Phenanthrene 38 38 7 7 38 38 13 Yes Yes
Pyrene 53 23 7 7 36 34 6 Yes Yes
2-Chloronaphthalene 27 13 2 6 27 13 3 Yes Yes
Dibenzofuran 49 29 5 16 48 29 12 Yes Yes
Hexachlorobenzene 32 9 6 12 31 9 5 Yes Yes
Hexachlorobutadiene 32 2 1 2 26 2 1 Yes Yes
Tris(1-chloro-2-propyl) phosphate 20 5 3 7 17 7 0 No No
4,4'-Dibromooctafluorobiphenyl 27 1 1 2 27 1 1 Yes Yes
Isophorone 49 46 19 19 49 46 8 Yes Yes
N-Nitrosodiphenylamine 46 35 4 4 46 35 23 No No
Butylbenzyl phthalate 21 7 4 7 18 7 3 Yes No
Diethyl phthalate 28 18 4 6 27 18 0 Yes No
Di-n-octyl phthalate 53 45 9 10 52 45 1 No No
Percent Filtered 44 ± 28 84 ± 8 76 ± 10 9 ± 12 40 ± 27 83 ± 14
Number of Candidates
Molecular Ion
Observed for
Correct Candidate
Correct Level 2
Assignment
Unlabeled Standard
18
the molecular ion (retention index removes many of the same candidates and the false
negative rate of requiring molecular ion is too high), but depending on chemical class, for
example when looking at PAHs, necessitating molecular ion may be a beneficial
additional filter. When adding molecular ion as a requirement for Level 2 assignments a
slight decrease in false positives was observed, but a significant increase in false
negatives was also observed (Figure S1).
6. CONCLUDING REMARKS: UTILITY, POTENTIAL CHALLENGES, AND FUTURE
WORK
It is important to note that the assignment of confidence is based on filters, and often
multiple candidates remain even after stringent filtering criteria. For example, for the
passive air sampler dataset, only 2 out of the 19 features with a Level 1 assignment had
a single candidate retained after Level 2 filtering, the remaining 89% of assigned features
had more than one Level 2 candidate (Table 1). Current filter cutoffs are voluntaristic,
based on expert consensus. Optimization of filters may be able to further reduce false
positives, while limiting false negatives, but this will be compound class and workflow
specific. Furthermore, multiple candidates will always exist for multiple features
regardless of thresholds to keep false negatives low. Therefore, ranking algorithms are
essential. When adding the Level 2 criteria to the ranking algorithm from Thermo
Compound Discoverer a significant number of false positives were removed without a
significant increase in false negatives for both validation datasets (Figure 2). Using
search index scores for ranking (e.g., dot product) performed similarly, and hence simple
scoring metrics can be chosen which are not vendor specific. Further ranking based on
meta-data can also be beneficial using information such as database occurrence. For
example, in the air samples evaluated, DEET was observed, but a close isomer which is
not commonly found the environment was ranked first and DEET was ranked second
(both Level 2 assignments). Database matching could be used to rank DEET higher; this
compound was expected as individuals residing in residences where air samples were
collected reported use of this insect repellant.
We outline a qualitative scoring framework for assigning confidence to chemical
assignments in GC-HRMS data. The proposed schema was developed to be actionable
with easy implementation using most deconvolution and annotation algorithm outputs.
Furthermore, we introduce an open access software tool (SIF-GC) to automate feature
scoring. The assignment of confidence follows a five-level scoring framework that has
been well-established by Schymanski et al. for LC-HRMS/MS. The criteria used to
determine confidence for GC-HRMS are common metrics in the GC field, including blank
filtering, (reverse) spectral matches, (reverse) high-resolution mass filters or exact mass
spectral matches, retention index, and molecular ion. Application of the proposed scoring
system to biological and environmental datasets resulted in a false positive rate of 12%
and 11%, respectively, for Level 2 assignments, the highest level which can be assigned
when using suspect screening approaches based on our schema. The false negative rate
(removal of the correct candidate) for Level 2 assignments was found to be 16% for the
air samples and 24% for the spiked human serum. When close isomers (e.g., position of
chlorine(s) on PCBs or methyl groups on PAHs) are considered false positives this false
positive rate is much higher (22% for air samples and 54% for human serum). Hence
19
when using suspect screening for GC, in most cases retention index alone could not
distinguish close isomers. In this case, elution order rather than retention index can be
valuable, although this is difficult to automate and implement on a large scale.
While Level 2 assignments can provide relatively confident annotations, at least at the
level of isomers with subtle difference from the exact structure, Level 3 assignments were
more tentative, and confidence varied widely depending on structure. Nonetheless Level
3 criteria improved assignment confidence over not using any criteria when chemicals
with heteroatoms are investigated. Furthermore, Level 4 assignments can be used to
assign chemical classes, when for example, all PAHs are of interest even when exact
structure cannot be determined. It is important to note that depending on matrix,
acquisition methods, and analyte classes, different false positive and negative rates may
be determined. Future work applying the confidence levels presented here on standard
reference material can be used to have quantitative definition of levels of confidence a
priori, e.g., Level 1 – 95%, Level 2 – 80%, and Level 3 – 50%.
The use of filtering criteria decreased the false positive rate as compared to common
ranking algorithms alone (e.g., Thermo Compound Discover’s ranking algorithm). For the
>100 molecules validated in this study, use of retention index, reverse search index, and
reverse high-resolution mass filtering did not substantially increase false negatives.
Therefore, Level 2 annotations can be exclusively considered for many suspect screening
applications. The requirement of molecular ion for EI did increase false negatives
significantly and is only recommended for certain chemical classes which are highly
stable, for example most PAHs. It is important to mention that the use of accurate mass,
either through molecular ion accurate mass matches, high-resolution mass spectral
libraries, or through reverse high-resolution mass filters or other techniques is beneficial
for reducing false positives. We did not discern between instruments with different
resolving power (our validation datasets were acquired at 40,000 and 60,000 for the air
and serum samples, respectively), but this may influence the false positive and negative
rate when using Level 2 and Level 3 filters. Furthermore, Orbitraps and other trap
instruments may have ions generated which cannot be predicted from molecular formula,
and hence Q-TOFs may have a lower false negative rate in this regard.
While the criteria included in the proposed GC-HRMS scoring framework substantially
decreased false positives and can be used to assign more confident annotations, other
metrics may also be explored including chemical database occurrence (higher occurrence
means it may be more likely to exist in samples), peak shape, and use of ECNI, PCI,
APCI, and other ionization techniques for validation. Database matching may be
especially helpful when close isomers are observed, but only a common isomer exists.
Furthermore, even after Level 2 filtering, in nearly all cases multiple candidates were
retained. Therefore, more quantitative scoring metrics for ranking need still be applied to
determine the correct candidate. We find that total score provided by Compound
Discoverer or simply using reverse search index to rank candidates works well in tandem
with Level 2 filters. The Level 2 filters were further found to reduce false positives
substantially using either of these ranking algorithms, then when using these ranking
algorithms alone.
20
The proposed confidence scores provide a strategy for communicating levels of evidence
that were used to predict annotation of chemicals detected using non-targeted analysis.
While false positives were still present across all annotation levels, this approach assists
in identifying the likelihood a match is correct and can be used to facilitate selection of
potential compounds for further evaluation and validation using additional analytical
techniques. This scoring framework can be easily extended to GC-HRMS data for
exposome analysis and provides a foundation for linking annotations performed across
multiple laboratories and studies and building cumulative databases.
21
REFERENCES
Aalizadeh, R., Alygizakis, N.A., Schymanski, E.L., Krauss, M., Schulze, T., Ibáñez, M.,
McEachran, A.D., Chao, A., Williams, A.J., Gago-Ferrero, P., Covaci, A., Moschet, C.,
Young, T.M., Hollender, J., Slobodnik, J., Thomaidis, N.S., 2021. Development and
Application of Liquid Chromatographic Retention Time Indices in HRMS-Based Suspect
and Nontarget Screening. Anal. Chem. 93, 11601–11611.
https://doi.org/10.1021/acs.analchem.1c02348
Allen, F., Pon, A., Greiner, R., Wishart, D., 2016. Computational Prediction of Electron
Ionization Mass Spectra to Assist in GC/MS Compound Identification. Anal. Chem. 88,
7689–7697. https://doi.org/10.1021/acs.analchem.6b01622
Barwick, V.J., 1999. Sources of uncertainty in gas chromatography and high-performance liquid
chromatography. Journal of Chromatography A 849, 13–33.
https://doi.org/10.1016/S0021-9673(99)00537-3
Boegelsack, N., Sandau, C., McMartin, D.W., Withey, J.M., O’Sullivan, G., 2021. Development
of retention time indices for comprehensive multidimensional gas chromatography and
application to ignitable liquid residue mapping in wildfire investigations. Journal of
Chromatography A 1635, 461717. https://doi.org/10.1016/j.chroma.2020.461717
Borgsmüller, N., Gloaguen, Y., Opialla, T., Blanc, E., Sicard, E., Royer, A.-L., Le Bizec, B.,
Durand, S., Migné, C., Pétéra, M., Pujos-Guillot, E., Giacomoni, F., Guitton, Y., Beule,
D., Kirwan, J., 2019. WiPP: Workflow for Improved Peak Picking for Gas
Chromatography-Mass Spectrometry (GC-MS) Data. Metabolites 9, 171.
https://doi.org/10.3390/metabo9090171
Doherty, B.T., Koelmel, J.P., Lin, E.Z., Romano, M.E., Godri Pollitt, K.J., 2021. Use of
Exposomic Methods Incorporating Sensors in Environmental Epidemiology. Curr Envir
Health Rpt 8, 34–41. https://doi.org/10.1007/s40572-021-00306-8
Elie, N., Santerre, C., Touboul, D., 2019. Generation of a Molecular Network from Electron
Ionization Mass Spectrometry Data by Combining MZmine2 and MetGem Software.
Anal. Chem. 91, 11489–11492. https://doi.org/10.1021/acs.analchem.9b02802
Fujiyoshi, T., Ikami, T., Sato, T., Kikukawa, K., Kobayashi, M., Ito, H., Yamamoto, A., 2016.
Evaluation of the matrix effect on gas chromatography – mass spectrometry with carrier
gas containing ethylene glycol as an analyte protectant. Journal of Chromatography A
1434, 136–141. https://doi.org/10.1016/j.chroma.2015.12.085
Grőbler, A., 1972. A Polar Retention Index System for Gas-Liquid Chromatography. Journal of
Chromatographic Science 10, 128. https://doi.org/10.1093/chromsci/10.2.128
Halket, J.M., Waterman, D., Przyborowska, A.M., Patel, R.K.P., Fraser, P.D., Bramley, P.M.,
2005. Chemical derivatization and mass spectral libraries in metabolic profiling by
GC/MS and LC/MS/MS. Journal of Experimental Botany 56, 219–243.
https://doi.org/10.1093/jxb/eri069
Hall, L.M., Hall, L.H., Kertesz, T.M., Hill, D.W., Sharp, T.R., Oblak, E.Z., Dong, Y.W., Wishart,
D.S., Chen, M.-H., Grant, D.F., 2012. Development of Ecom50 and Retention Index
Models for Nontargeted Metabolomics: Identification of 1,3-Dicyclohexylurea in Human
Serum by HPLC/Mass Spectrometry. J. Chem. Inf. Model. 52, 1222–1237.
https://doi.org/10.1021/ci300092s
Hummel, J., Strehmel, N., Selbig, J., Walther, D., Kopka, J., 2010. Decision tree supported
substructure prediction of metabolites from GC-MS profiles. Metabolomics 6, 322–333.
https://doi.org/10.1007/s11306-010-0198-7
Jensen, S., 1972. The PCB Story. Ambio 1, 123–131.
Kendrick, Edward., 1963. A Mass Scale Based on CH2 = 14.0000 for High Resolution Mass
Spectrometry of Organic Compounds. Anal. Chem. 35, 2146–2154.
https://doi.org/10.1021/ac60206a048
22
Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B.A., Thiessen,
P.A., Yu, B., Zaslavsky, L., Zhang, J., Bolton, E.E., 2021. PubChem in 2021: new data
content and improved web interfaces. Nucleic Acids Research 49, D1388–D1395.
https://doi.org/10.1093/nar/gkaa971
Koelmel, J.P., Lin, E.Z., Guo, P., Zhou, J., He, J., Chen, A., Gao, Y., Deng, F., Dong, H., Liu, Y.,
Cha, Y., Fang, J., Beecher, C., Shi, X., Tang, S., Godri Pollitt, K.J., 2021. Exploring the
external exposome using wearable passive samplers - The China BAPE study. Environ
Pollut 270, 116228. https://doi.org/10.1016/j.envpol.2020.116228
Koelmel, J.P., Ulmer, C.Z., Jones, C.M., Yost, R.A., Bowden, J.A., 2017. Common cases of
improper lipid annotation using high-resolution tandem mass spectrometry data and
corresponding limitations in biological interpretation. Biochim Biophys Acta Mol Cell Biol
Lipids 1862, 766–770. https://doi.org/10.1016/j.bbalip.2017.02.016
Koopman, J., Grimme, S., 2021. From QCEIMS to QCxMS: A Tool to Routinely Calculate CID
Mass Spectra Using Molecular Dynamics. J. Am. Soc. Mass Spectrom. 32, 1735–1751.
https://doi.org/10.1021/jasms.1c00098
Kováts, E., 1958. Gas-chromatographische Charakterisierung organischer Verbindungen. Teil
1: Retentionsindices aliphatischer Halogenide, Alkohole, Aldehyde und Ketone.
Helvetica Chimica Acta 41, 1915–1932. https://doi.org/10.1002/hlca.19580410703
Kwiecien, N.W., Bailey, D.J., Rush, M.J.P., Cole, J.S., Ulbrich, A., Hebert, A.S., Westphall,
M.S., Coon, J.J., 2015. High-Resolution Filtering for Improved Small Molecule
Identification via GC/MS. Anal. Chem. 87, 8328–8335.
https://doi.org/10.1021/acs.analchem.5b01503
Lai, Z., Tsugawa, H., Wohlgemuth, G., Mehta, S., Mueller, M., Zheng, Y., Ogiwara, A., Meissen,
J., Showalter, M., Takeuchi, K., Kind, T., Beal, P., Arita, M., Fiehn, O., 2018. Identifying
metabolites by integrating metabolome databases with mass spectrometry
cheminformatics. Nat Methods 15, 53–56. https://doi.org/10.1038/nmeth.4512
Landrigan, P.J., Fuller, R., Acosta, N.J.R., Adeyi, O., Arnold, R., Basu, N. (Nil), Baldé, A.B.,
Bertollini, R., Bose-O’Reilly, S., Boufford, J.I., Breysse, P.N., Chiles, T., Mahidol, C.,
Coll-Seck, A.M., Cropper, M.L., Fobil, J., Fuster, V., Greenstone, M., Haines, A.,
Hanrahan, D., Hunter, D., Khare, M., Krupnick, A., Lanphear, B., Lohani, B., Martin, K.,
Mathiasen, K.V., McTeer, M.A., Murray, C.J.L., Ndahimananjara, J.D., Perera, F.,
Potočnik, J., Preker, A.S., Ramesh, J., Rockström, J., Salinas, C., Samson, L.D.,
Sandilya, K., Sly, P.D., Smith, K.R., Steiner, A., Stewart, R.B., Suk, W.A., Schayck,
O.C.P. van, Yadama, G.N., Yumkella, K., Zhong, M., 2018. The Lancet Commission on
pollution and health. The Lancet 391, 462–512. https://doi.org/10.1016/S0140-
6736(17)32345-0
Lee, M.L., Vassilaros, D.L., White, C.M., 1979. Retention indices for programmed-temperature
capillary-column gas chromatography of polycyclic aromatic hydrocarbons. Anal. Chem.
51, 768–773. https://doi.org/10.1021/ac50042a043
Little, J.L., Howard, A.S., 2013. Qualitative Gas Chromatography–Mass Spectrometry Analyses
Using Amines as Chemical Ionization Reagent Gases. J. Am. Soc. Mass Spectrom. 24,
1913–1918. https://doi.org/10.1007/s13361-013-0740-8
Mahieu, N.G., Huang, X., Chen, Y.-J., Patti, G.J., 2014. Credentialing Features: A Platform to
Benchmark and Optimize Untargeted Metabolomic Methods. Anal. Chem. 86, 9583–
9589. https://doi.org/10.1021/ac503092d
Mallett, T., 2014. Mass Spectral Libraries - Reproducibility in EI, API and Tandem Mass
Spectrometry. Wiley Analytical Science.
Martin, K.A.V., Lin, E.Z., Hilbert, T.J., Pollitt, K.J.G., Haynes, E.N., 2021. Survey of airborne
organic compounds in residential communities near a natural gas compressor station:
Response to community concern. Environmental Advances 5, 100076.
https://doi.org/10.1016/j.envadv.2021.100076
23
Matsuo, T., Tsugawa, H., Miyagawa, H., Fukusaki, E., 2017. Integrated Strategy for Unknown
EI–MS Identification Using Quality Control Calibration Curve, Multivariate Analysis, EI–
MS Spectral Database, and Retention Index Prediction. Anal. Chem. 89, 6766–6773.
https://doi.org/10.1021/acs.analchem.7b01010
Matyushin, D.D., Sholokhova, A.Yu., Karnaeva, A.E., Buryak, A.K., 2020. Various aspects of
retention index usage for GC-MS library search: A statistical investigation using a
diverse data set. Chemometrics and Intelligent Laboratory Systems 202, 104042.
https://doi.org/10.1016/j.chemolab.2020.104042
Mazur, D.M., Detenchuk, E.A., Sosnova, A.A., Artaev, V.B., Lebedev, A.T., 2021. GC-HRMS
with Complementary Ionization Techniques for Target and Non-target Screening for
Chemical Exposure: Expanding the Insights of the Air Pollution Markers in Moscow
Snow. Science of The Total Environment 761, 144506.
https://doi.org/10.1016/j.scitotenv.2020.144506
McEachran, A.D., Sobus, J.R., Williams, A.J., 2017. Identifying known unknowns using the US
EPA’s CompTox Chemistry Dashboard. Anal Bioanal Chem 409, 1729–1735.
https://doi.org/10.1007/s00216-016-0139-z
Miwa, T.K., Mikolajczak, K.L., Earle, F.R., Wolbt, I.A., 1960. Gas chromatographic
characterization of fatty acids. Identification constants for mono- and tlicarboxylic methyl
esters. Analytical Chemistry 32, 1739–1742.
National Report on Human Exposure to Environmental Chemicals, 2021.
https://doi.org/10.15620/cdc:105345
Patterson, R.E., Kirpich, A.S., Koelmel, J.P., Kalavalapalli, S., Morse, A.M., Cusi, K., Sunny,
N.E., McIntyre, L.M., Garrett, T.J., Yost, R.A., 2017. Improved experimental data
processing for UHPLC–HRMS/MS lipidomics applied to nonalcoholic fatty liver disease.
Metabolomics 13, 142. https://doi.org/10.1007/s11306-017-1280-1
Peng, C.T., 2010a. Prediction of retention indices. VI: Isothermal and temperature-programmed
retention indices, methylene value, functionality constant, electronic and steric effects.
Journal of Chromatography A 1217, 3683–3694.
https://doi.org/10.1016/j.chroma.2010.02.005
Peng, C.T., 2010b. Prediction of retention indices. VI: Isothermal and temperature-programmed
retention indices, methylene value, functionality constant, electronic and steric effects.
Journal of Chromatography A 1217, 3683–3694.
https://doi.org/10.1016/j.chroma.2010.02.005
Prüss-Üstün, A., Wolf, J., Corvalán, C., Organization, W.H., Bos, R., Neira, D.M., 2016.
Preventing Disease Through Healthy Environments: A Global Assessment of the Burden
of Disease from Environmental Risks. World Health Organization.
Rappaport, S.M., 2016. Genetic Factors Are Not the Major Causes of Chronic Diseases. PLOS
ONE 11, e0154387. https://doi.org/10.1371/journal.pone.0154387
Risum, A.B., Bro, R., 2019. Using deep learning to evaluate peaks in chromatographic data.
Talanta 204, 255–260. https://doi.org/10.1016/j.talanta.2019.05.053
Rostkowski, P., Haglund, P., Aalizadeh, R., Alygizakis, N., Thomaidis, N., Arandes, J.B.,
Nizzetto, P.B., Booij, P., Budzinski, H., Brunswick, P., Covaci, A., Gallampois, C.,
Grosse, S., Hindle, R., Ipolyi, I., Jobst, K., Kaserzon, S.L., Leonards, P., Lestremau, F.,
Letzel, T., Magnér, J., Matsukami, H., Moschet, C., Oswald, P., Plassmann, M.,
Slobodnik, J., Yang, C., 2019. The strength in numbers: comprehensive characterization
of house dust using complementary mass spectrometric techniques. Anal Bioanal Chem
411, 1957–1977. https://doi.org/10.1007/s00216-019-01615-6
Sakthiselvi, T., Paramasivam, M., Vasanthi, D., Bhuvaneswari, K., 2020. Persistence, dietary
and ecological risk assessment of indoxacarb residue in/on tomato and soil using GC–
MS. Food Chemistry 328, 127134. https://doi.org/10.1016/j.foodchem.2020.127134
24
Scheubert, K., Hufsky, F., Petras, D., Wang, M., Nothias, L.-F., Dührkop, K., Bandeira, N.,
Dorrestein, P.C., Böcker, S., 2017. Significance estimation for large scale metabolomics
annotations by spectral matching. Nat Commun 8, 1494. https://doi.org/10.1038/s41467-
017-01318-5
Schreckenbach, S.A., Anderson, J.S.M., Koopman, J., Grimme, S., Simpson, M.J., Jobst, K.J.,
2021. Predicting the Mass Spectra of Environmental Pollutants Using Computational
Chemistry: A Case Study and Critical Evaluation. J. Am. Soc. Mass Spectrom. 32,
1508–1518. https://doi.org/10.1021/jasms.1c00078
Schrimpe-Rutledge, A.C., Codreanu, S.G., Sherrod, S.D., McLean, J.A., 2016. Untargeted
Metabolomics Strategies—Challenges and Emerging Directions. J. Am. Soc. Mass
Spectrom. 27, 1897–1905. https://doi.org/10.1007/s13361-016-1469-y
Schymanski, E.L., Gallampois, C.M.J., Krauss, M., Meringer, M., Neumann, S., Schulze, T.,
Wolf, S., Brack, W., 2012. Consensus Structure Elucidation Combining GC/EI-MS,
Structure Generation, and Calculated Properties. Anal. Chem. 84, 3287–3295.
https://doi.org/10.1021/ac203471y
Schymanski, E.L., Jeon, J., Gulde, R., Fenner, K., Ruff, M., Singer, H.P., Hollender, J., 2014.
Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating
Confidence. Environ. Sci. Technol. 48, 2097–2098. https://doi.org/10.1021/es5002105
Schymanski, E.L., Kondić, T., Neumann, S., Thiessen, P.A., Zhang, J., Bolton, E.E., 2021.
Empowering large chemical knowledge bases for exposomics: PubChemLite meets
MetFrag. Journal of Cheminformatics 13, 19. https://doi.org/10.1186/s13321-021-00489-
0
Škrbić, B., Onjia, A., 2006. Prediction of the Lee retention indices of polycyclic aromatic
hydrocarbons by artificial neural network. Journal of Chromatography A 1108, 279–284.
https://doi.org/10.1016/j.chroma.2006.01.080
Spackman, P.R., Bohman, B., Karton, A., Jayatilaka, D., 2018. Quantum chemical electron
impact mass spectrum prediction for de novo structure elucidation: Assessment against
experimental reference data and comparison to competitive fragmentation modeling.
International Journal of Quantum Chemistry 118, e25460.
https://doi.org/10.1002/qua.25460
Stein, S., 2012. Mass Spectral Reference Libraries: An Ever-Expanding Resource for Chemical
Identification. Anal. Chem. 84, 7274–7282. https://doi.org/10.1021/ac301205z
Stein, S.E., 1995. Chemical substructure identification by mass spectral library searching. J.
Am. Soc. Mass Spectrom. 6, 644–655. https://doi.org/10.1016/1044-0305(95)00291-K
Stein, S.E., Babushok, V.I., Brown, R.L., Linstrom, P.J., 2007a. Estimation of Kováts Retention
Indices Using Group Contributions. J. Chem. Inf. Model. 47, 975–980.
https://doi.org/10.1021/ci600548y
Stein, S.E., Babushok, V.I., Brown, R.L., Linstrom, P.J., 2007b. Estimation of Kováts Retention
Indices Using Group Contributions. J. Chem. Inf. Model. 47, 975–980.
https://doi.org/10.1021/ci600548y
Stokstad, E., 2020. Why were salmon dying? The answer washed off the road. Science 370,
1145–1145. https://doi.org/10.1126/science.370.6521.1145
Strehmel, N., Hummel, J., Erban, A., Strassburg, K., Kopka, J., 2008. Retention index
thresholds for compound matching in GC–MS metabolite profiling. Journal of
Chromatography B, Hyphenated Techniques for Global Metabolite Profiling 871, 182–
190. https://doi.org/10.1016/j.jchromb.2008.04.042
Travis, S.C., Kordas, K., Aga, D.S., 2021. Optimized workflow for unknown screening using gas
chromatography high-resolution mass spectrometry expands identification of
contaminants in silicone personal passive samplers. Rapid Commun Mass Spectrom 35,
e9048. https://doi.org/10.1002/rcm.9048
25
US EPA, O., 2021. What Killed the Eagles? EPA Researchers Help Solve 25+ Year Mystery
[WWW Document]. URL https://www.epa.gov/sciencematters/what-killed-eagles-epa-
researchers-help-solve-25-year-mystery (accessed 4.7.22).
Vermeulen, R., Schymanski, E.L., Barabási, A.-L., Miller, G.W., 2020. The exposome and
health: Where chemistry meets biology. Science 367, 392–396.
https://doi.org/10.1126/science.aay3164
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P., 2002.
Comparative assessment of large-scale data sets of protein–protein interactions. Nature
417, 399–403. https://doi.org/10.1038/nature750
Wang, S., Kind, T., Tantillo, D.J., Fiehn, O., 2020. Predicting in silico electron ionization mass
spectra using quantum chemistry. Journal of Cheminformatics 12, 63.
https://doi.org/10.1186/s13321-020-00470-3
Wei, J.N., Belanger, D., Adams, R.P., Sculley, D., 2019. Rapid Prediction of Electron–Ionization
Mass Spectrometry Using Neural Networks. ACS Cent. Sci. 5, 700–708.
https://doi.org/10.1021/acscentsci.9b00085
Wei, X., Koo, I., Kim, S., Zhang, X., 2014. Compound identification in GC-MS by simultaneously
evaluating the mass spectrum and retention index. Analyst 139, 2507–2514.
https://doi.org/10.1039/C3AN02171H
Wheelock, C.E., Rappaport, S.M., 2020. The role of gene–environment interactions in lung
disease: the urgent need for the exposome. European Respiratory Journal 55.
https://doi.org/10.1183/13993003.02064-2019
Wild, C.P., 2012. The exposome: from concept to utility. International Journal of Epidemiology
41, 24–32. https://doi.org/10.1093/ije/dyr236
Wild, C.P., 2005. Complementing the Genome with an “Exposome”: The Outstanding Challenge
of Environmental Exposure Measurement in Molecular Epidemiology. Cancer Epidemiol
Biomarkers Prev 14, 1847–1850. https://doi.org/10.1158/1055-9965.EPI-05-0456
Williams, A.J., Grulke, C.M., Edwards, J., McEachran, A.D., Mansouri, K., Baker, N.C.,
Patlewicz, G., Shah, I., Wambaugh, J.F., Judson, R.S., Richard, A.M., 2017. The
CompTox Chemistry Dashboard: a community data resource for environmental
chemistry. Journal of Cheminformatics 9, 61. https://doi.org/10.1186/s13321-017-0247-6
Zellner, B. d’Acampora, Bicchi, C., Dugo, P., Rubiolo, P., Dugo, G., Mondello, L., 2008. Linear
retention indices in gas chromatographic analysis: a review. Flavour and Fragrance
Journal 23, 297–314. https://doi.org/10.1002/ffj.1887
Zhang, P., Carlsten, C., Chaleckis, R., Hanhineva, K., Huang, M., Isobe, T., Koistinen, V.M.,
Meister, I., Papazian, S., Sdougkou, K., Xie, H., Martin, J.W., Rappaport, S.M.,
Tsugawa, H., Walker, D.I., Woodruff, T.J., Wright, R.O., Wheelock, C.E., 2021. Defining
the Scope of Exposome Studies and Research Needs from a Multidisciplinary
Perspective. Environ. Sci. Technol. Lett. 8, 839–852.
https://doi.org/10.1021/acs.estlett.1c00648
26
SUPPLEMENTAL INFORMATION
An Actionable Annotation Scoring Framework For GC-HRMS
Jeremy P. Koelmel1, Hongyu Xie 2, Elliott J. Price3, Elizabeth Z. Lin1, Katherine E.
Manz4, Paul Stelben1, Mathew Paige1, Stefano Papazian2, Joseph Okeme1, Dean P.
Jones5, Dinesh Barupal5, John Bowden11,12, Pawel Rostkowski6, Kurt D. Pennell4,
Vladimir Nikiforov7, Thanh Wang8, Xin Hu10, Yunjia Lai9, Gary W. Miller9, Douglas I.
Walker5*, Jonathan W. Martin2, and Krystal J. Godri Pollitt1*
1 Department of Environmental Health Science,
Yale School of Public Health, New Haven, CT, 06520, United States
2 Department of Environmental Science, Science for Life Laboratory,
Stockholm University, Stockholm 10691, Sweden
3 RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
4 School of Engineering, Brown University, Providence, RI 02912, United States
5 Department of Medicine, School of Medicine,
Emory University, Atlanta 30322, United States
6 Department of Environmental Medicine and Public Health,
Icahn School of Medicine at Mount Sinai, New York, NY, 10029, United States
7 NILU- Norwegian Institute for Air Research, 2007 Kjeller, Norway
8 NILU- Norwegian Institute for Air Research, Framsenteret, 9007 Tromsø, Norway
9 MTM Research Centre, Örebro University, SE-701 82, Örebro, Sweden
10 Department of Environmental Health Sciences, Mailman School of Public Health,
Columbia University, New York, NY 10032, United States
11 Center for Environmental and Human Toxicology & Department of Physiological
Sciences, University of Florida, Gainesville, FL, 32603, United States
12 Department of Chemistry, University of Florida, Gainesville, FL, 32603 United States
*Corresponding author
Krystal J. Godri Pollitt - krystal.pollitt@yale.edu
Douglas I. Walker - douglas.walker@mssm.edu
27
S1. LEVERAGING EVIDENCE FROM GC-HRMS TO REDUCE FALSE POSITIVES
AND NEGATIVES
S1.1. Feature filtering to remove artifacts and signal from background
After acquiring data from blanks which are both field and extraction blanks, algorithms
can be implemented to establish a compound specific blank threshold. If a certain
percentile of samples does not fall above this blank threshold, the feature can be
removed. For example, one method, blank feature filtering has been found to outperform
most other filtering methods (Koelmel et al., 2021; Patterson et al., 2017). Using this
method, the percentile or average of samples must be greater than the blank feature filter
threshold (BFFthreshold) calculated as:
BFFthreshold = c x (Baverage+3×Bstandard deviation)
Where c is a constant (usually ranging between 2 and 10) and B is the blank signal.
A complimentary method to remove blank signatures is to look for signals which do not
behave like well retained compounds (Borgsmüller et al., 2019; Risum and Bro, 2019).
Many background signals (e.g., column bleed) will not form sharp peaks and/or will be
highly variable in quality control samples injected across the instrument run. Consistency
of deconvoluted spectra with respect to differing intensity in samples across batch in
quality control samples or pools can therefore be used to remove background features
(when many QCs are acquired). Furthermore, quality control calibration curves can be
used to remove non-linear peaks, but this requires that the measured compounds are
generally within the linear range in the quality controls (Matsuo et al., 2017). In this case
non-linear peaks at higher concentrations of the quality controls likely indicate that the
source of the chemical is not from the sample, and if it is, that relative comparisons
between samples is challenging. All these methods which do not implement a blank
sample will not remove a well-behaved chromatographic peak which is a contaminant,
and hence blank feature filtering is recommended as well if these techniques are used.
Other methods can be used to generate a unique signature for compounds, therefore,
rather than determine and remove blank signals, the analytes can be determined uniquely
and retained. One such method is credentialing using isotopic labeling (Mahieu et al.,
2014). This method incorporates isotopic labeling into cell cultures, and screens for the
characteristic isotopic signals to only retain those metabolites which are generated in the
experiment, effectively removing nearly all background. This method is specific to in vitro
studies, and hence to only a subset of internal exposome studies.
S1.2. Alternative EI spectral libraries and ionization methods to improve coverage
For the discovery of unknown unknowns where no libraries or standards exist, the in-silico
prediction of EI spectra (Allen et al., 2016; Spackman et al., 2018; Wang et al., 2020; Wei
et al., 2019) is more straight forward than collision-induced dissociation (CID) (Koopman
and Grimme, 2021) and higher energy C-trap dissociation (HCD) spectra used for LC-
HRMS analysis. Although, it is important to recognize that false positive rates vary across
all in silico approaches (Schreckenbach et al., 2021). In addition to predicted spectral
matching, EI spectra can be used for molecular networks and substructure
characterization (e.g., (Elie et al., 2019; Hummel et al., 2010; Lai et al., 2018; Stein,
28
1995)). The high degree of information contained in EI fragmentation reduces the number
of similar spectral matches and allows for detailed substructural analysis (e.g., chemical
similarity networks).
Alternative ionization strategies can be used with GC-HRMS to improve detection of the
molecular ion, although these often lack the robustness, sensitivity, and reproducibility of
EI. These techniques include positive chemical ionization (PCI), electron capture negative
ionization (ECNI), and atmospheric pressure chemical ionization (APCI). Unlike EI, the
ionization efficiency for PCI, ECNI, and APCI is dependent on chemical structure. While
ECNI is the least universal ionization technique (in terms of ionization efficiencies across
chemical classes), this approach can be highly selective and provide high sensitivity for
ions which are formed favorably by electron capture (e.g., halide containing compounds
such as chlorinated and brominated byproducts). PCI can be adapted for various
chemical analytes because reagent gases are ionized and used to transfer positive
charge to the analytes. Whereas methane gas is usually used, other reagent gases can
be selected to increase sensitivity for specific classes of molecules (Little and Howard,
2013). APCI can provide higher sensitivity for certain molecules, for example, negative
mode APCI, like ECNI, can reveal halogenated compounds (such as halogenated
paraffins) that standard GC-EI-MS misses. While these methods can be challenging to
implement in standardized nontargeted workflows, they provide an important alternative
ionization method for detection molecular ions and predicting potential formulas in the
identification of unknowns.
S2. METHODS: ACQUISITION OF GC-HRMS DATASETS
S2.1. Acquisition of the human serum validation dataset
The GC-HRMS dataset of human serum samples was obtained by spiking 112 standards
(90 native and 22 labeled chemicals) to commercial pooled human serum (Type AB,
Sigma-Aldrich, Germany), and 500 µL aliquots were extracted after removal of proteins
and lipids at Stockholm University. After spiking with the standard mixture (list in Table
S1A), the extracts were analyzed by GC-HRMS (Q Exactive GC Orbitrap, Thermo
Scientific) using electron ionization (EI) in full scan mode (44–750 m/z, 60,000 resolution
FWHM at 200 m/z). The injection volume was 1 µL for each run. A capillary DB5 column
(30 m × 0.25 mm × 0.25 µm film thickness) was used with the temperature program: 50
°C held for 1 min, ramped 10 °C min−1 to 315 °C and held for 8 min.
S2.2. Acquisition of the outdoor and indoor air validation dataset
Passive air samplers (Fresh Air Clips) were deployed inside and outside homes proximal
to a natural gas compressor station in Jefferson County, Ohio. Methods have been
previously described (Martin et al., 2021). Briefly, airborne containments were collected
using polydimethylsiloxane (PDMS) sorbent bars that were housed in a
polytetrafluoroethylene (PTFE) chamber. This assembly was mounted in a magnetic clip
under a weather cover during the sampling period. PDMS sorbent bars were cleaned at
the Yale School of Public Health and shipped together with other materials to the study
site. Four cleaned PDMS sorbent bars were placed into the PTFE chamber and deployed
at indoor and outdoor locations at the home of study participants. At the end of the
sampling period, passive air samples were collected, shipped back to the Yale School of
29
Public Health using cold chain transport, and stored at -80 °C until analysis. Samples
were spiked with a mixture of labelled standards and extracted by thermal desorption
(Gerstel TDU) and analyzed by GC-HRMS (Q Exactive GC Orbitrap, Thermo Scientific)
using electron ionization (EI) in full scan mode (53–800 m/z, 40,000 resolution FWHM at
200 m/z). A capillary TG-5SILMS column (30 m × 0.25 mm × 0.25 µm film thickness) was
used with the temperature program: 70 °C held for 1 min, ramped 7 °C min−1 to 300 °C
and held for 4 min. A total of 81 chemicals were assessed (Table S1B).
30
Table S1A. List of native and isotopic labeled standards spiked into human serum
samples. Acronyms: brominated flame retardants (BFR); chlorinated flame retardants
(CFR); organochlorine pesticides (OCP); polycyclic aromatic hydrocarbons (PAH);
oxygenated polycyclic aromatic hydrocarbons (oxy-PAH); nitrated polycyclic aromatic
hydrocarbons (nitro-PAH); polychlorinated biphenyls (PCB); polychlorinated
dibenzodioxins (PcDD); polychlorinated dibenzofurans (PcDF).
Chemical
Class
Detected Metabolite CAS-RN
Molecular
Formula
Concentration
(ppb)
BFR 2,4,4´-Tribromodiphenyl ether (BDE-28) 41318-75-6 C12H7Br3O 16
BFR 2,2´,4,4´-Tetrabromodiphenyl ether (BDE-47) 5436-43-1 C12H6Br4O 16
BFR 2,3´,4´,4-Tetrabromodiphenyl ether (BDE-66) 189084-61-5 C12H6Br4O 16
BFR 2,2´,3,4,4´-Pentabromodiphenyl ether (BDE-85) 182346-21-0 C12H5Br5O 16
BFR 2,2´,4,4´,5-Pentabromodiphenyl ether (BDE-99) 60348-60-9 C12H5Br5O 16
BFR 2,2´,4,4´,6-Pentabromodiphenyl ether (BDE-100) 189084-64-8 C12H5Br5O 16
BFR 2,2´,4,4´,5,5´-Hexabromodiphenyl ether (BDE-153) 68631-49-2 C12H4Br6O 16
BFR 2,2´,4,4´,5,6´-Hexabromodiphenyl ether (BDE-154) 207122-15-4 C12H4Br6O 16
CFR Dechlorane 603 13560-92-4 C17H8Cl12 16
OCP Pentachlorobenzene (PeCB) 608-93-5 C6HCl5 16
OCP alpha-Hexachlorocyclohexane (α-HCH) 319-84-6 C6H6Cl6 16
OCP Hexachlorobenzene (HCB) 118-74-1 C6Cl6 16
OCP beta-Hexachlorocyclohexane (β-HCH) 319-85-7 C6H6Cl6 16
OCP gamma-Hexachlorocyclohexane (γ-HCH) 58-89-9 C6H6Cl6 16
OCP delta-Hexachlorocyclohexane (δ-HCH) 608-73-1 C6H6Cl6 16
OCP Heptachlor 76-44-8 C10H5Cl7 16
OCP Aldrin 309-00-2 C12H8Cl6 16
OCP Heptachlor epoxide B 1024-57-3 C10H5Cl7O 16
OCP Heptachlor Epoxide A 28044-83-9 C10H5Cl7O 16
OCP β-chlordane(trans-Chlordane) 5103-74-2 C10H6Cl8 16
OCP O, P’-DDE 3424-82-6 C14H8Cl4 16
OCP α-chlordane(cis-Chlordane) 5103-71-9 C10H6Cl8 16
OCP trans-Nonachlor 39765-80-5 C10H5Cl9 16
OCP p,p’-DDE 72-55-9 C14H8Cl4 16
OCP Dieldrin 60-57-1 C12H8OCl6 16
OCP O, P’-DDD 53-19-0 C14H10Cl4 16
OCP Endrin 72-20-8 C12H8Ocl6 16
OCP p,p’-DDD 72-54-8 C14H10Cl4 16
OCP cis-Nonachlor 5103-73-1 C10H5Cl9 16
OCP o,p’-DDT 789-02-6 C14H9Cl5 16
OCP p,p’-DDT 50-29-3 C14H9Cl5 16
Oxy-PAH Phenalen-1-one 548-39-0 C13H8O 16
Oxy-PAH Anthracene-9,10-dione 483-35-2 C14H8O2 16
Oxy-PAH 4H-Cyclopenta[def]phenanthrene-4-one 5737-13-3 C15H8O 16
Oxy-PAH 2-Methylanthracene-9,10-dione 84-54-8 C15H10O2 16
Oxy-PAH 9H-Fluoren-9-one 486-25-9 C13H8O 16
Nitro-PAH 1-Nitro naphthalene 86-57-7 C10H7NO2 18
PCB 2-Chlorobiphenyl (PCB 1) 2051-60-7 C12H9Cl 16
PCB 4-Chlorobiphenyl (PCB 3) 2051-62-9 C12H9Cl 16
PCB 2,2’-Dichlorobiphenyl (PCB 4) 13029-08-8 C12H8Cl2 16
PCB 2,2’,6-Trichlorobiphenyl (PCB19) 38444-73-4 C12H7Cl3 16
PCB 4,4’-Dichlorobiphenyl (PCB 15) 2050-68-2 C12H8Cl2 16
PCB 2,2’,6,6’-Tetrachlorobiphenyl (PCB54 15968-05-5 C12H6Cl4 16
PCB 2,2’,4,6,6’-Pentachlorobiphenyl (PCB 104) 56558-16-8 C12H5Cl5 16
PCB 3,4,4’-Trichlorobiphenyl (PCB37) 38444-90-5 C12H7Cl3 16
PCB 2,2’,4,4’,6,6’-Hexachlorobiphenyl (PCB 155 33979-03-2 C12H4Cl6 16
PCB 3,3’,4,4’-Tetrachlorobiphenyl (PCB 77 32598-13-3 C12H8Cl4 16
PCB 3,4,4’,5-Tetrachlorobiphenyl (PCB 81) 70362-50-4 C12H6Cl4 16
PCB 2’,3,4,4’,5-Pentachlorobiphenyl (PCB 123) 65510-44-3 C12H5Cl5 16
PCB 2,3’,4,4’,5-Pentachlorobiphenyl (PCB 118) 31508-00-6 C12H5Cl5 16
PCB 2,3,4,4’,5-Pentachlorobiphenyl (PCB 114) 74472-37-0 C12H5Cl5 16
31
PCB 2,2’,3,4’,5,6,6’-Heptachlorobiphenyl (PCB 188 74487-85-7 C12H3Cl7 16
PCB 2,2’,3,4,4’,5’ -Hexachlorobiphenyl (PCB 138) 52712-04-6 C12H4Cl6 16
PCB 2,3,3’,4,4’-Pentachlorobiphenyl (PCB 105) 32598-14-4 C12H5Cl5 16
PCB 3,3’,4,4’,5-Pentachlorobiphenyl (PCB 126) 57465-28-8 C12H5Cl5 16
PCB 2,3’,4,4’,5,5’-Hexachlorobiphenyl (PCB 167) 52663-72-6 C12H4Cl6 16
PCB 2,2’,3,3’,5,5’,6,6’-Octachlorobiphenyl (PCB 202) 2136-99-4 C12H2Cl8 16
PCB 2,3,3’,4,4’,5-Hexachlorobiphenyl (PCB 156) 38380-08-4 C12H4Cl6 16
PCB 2,3,3’,4,4’,5’-Hexachlorobiphenyl (PCB 157) 69782-90-7 C12H4Cl6 16
PCB 3,3’,4,4’,5,5’-Hexachlorobiphenyl (PCB 169) 32774-16-6 C12H4Cl6 16
PCB 2,3,3’,4,4’,5,5’-Heptachlorobiphenyl (PCB 189) 39635-31-9 C12H3Cl7 16
PCB 2,2’,3,3’,4,5,5’,6,6’-Nonachlorobiphenyl (PCB 208 52663-77-1 C12HCl9 16
PCB 2,3,3’,4,4’,5,5’,6-Octachlorobiphenyl (PCB 205 74472-53-0 C12H2Cl8 16
PCB 2,2’,3,3’,4,4’,5,5’,6-Nonachlorobiphenyl (PCB 206) 40186-72-9 C12HCl9 16
PCB
2,2’,3,3’,4,4’,5,5’,6,6’-Decachlorobiphenyl (PCB
209)
2051-24-3 C12Cl10 16
PcDD 2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) 1746-01-6 C12H4Cl4O2 6.7
PcDD
1,2,3,7,8-Pentachlorodibenzo-p-dioxin
(PnCDD/PeCDD)
40321-76-4 C12H3Cl5O2 16.8
PcDD 1,2,3,6,7,8-Hexachlorodibenzo-p-dioxin (HxCDD) 57653-85-7 C12H2Cl6O2 16.8
PcDD
1,2,3,4,6,7,8-Heptachlorodibenzo-p-dioxin
(HpCDD)
35822-46-9 C12HCl7O2 16.8
PcDD 1,2,3,4,6,7,8,9-Octachlorodibenzo-p-dioxin (OCDD) 3268-87-9 C12Cl8O2 33.6
PcDF 2,3,7,8-Tetrachlorodibenzofuran (TCDF) 51207-31-9 C12H4Cl4O 6.7
PcDF
1,2,3,7,8-Pentachlorodibenzofuran
(PnCDF/PeCDF)
57117-41-6 C12H3Cl5O 16.8
PcDF 1,2,3,6,7,8-Hexachlorodibenzofuran (HxCDF) 57117-44-9 C12H2Cl6O 16.8
PcDF 1,2,3,4,6,7,8-Heptachlorodibenzofuran (HpCDF) 67562-39-4 C12HCl7O 16.8
PcDF 1,2,3,4,6,7,8,9-Octachlorodibenzofuran (OCDF) 39001-02-0 C12Cl8O 33.6
Phthalate Dimethyl phthalate 131-11-3 C10H10O4 28
Phthalate Diethyl phthalate 84-66-2 C12H14O4 28
Phthalate Diisobutyl phthalate 84-69-5 C16H22O4 28
Phthalate Di-n-butyl phthalate 84-74-2 C16H22O4 28
Phthalate Diisohexyl phthalate 84-63-9 C20H30O4 28
Phthalate Diethoxyethyl phthalate 605-54-9 C16H22O6 28
Phthalate Dipentyl phthalate 131-18-0 C18H26O4 28
Phthalate Dihexyl phthalate 84-75-3 C20H30O4 28
Phthalate Benzyl butyl phthalate 85-68-7 C19H20O4 28
Phthalate Di(2-butoxyethyl) phthalate 117-83-9 C20H30O6 28
Phthalate Dicyclohexyl phthalate 84-61-7 C20H26O4 28
Phthalate Di-2-ethylhexyl phthalate 117-81-7 C24H38O4 28
Phthalate Diphenyl phthalate 84-62-8 C20H14O4 28
Phthalate Di-n-octyl phthalate 117-84-0 C24H38O4 28
Phthalate Dinonyl phthalate 84-76-4 C26H42O4 28
Labeled PAH Acenaphthene-d10 -- C12D10 28
Labeled PAH Fluorene-d10 -- C13D10 28
Labeled PAH Phenanthrene-d10 -- C14D10 28
Labeled PAH Anthracene-d10 -- C14D10 28
Labeled PAH Fluoranthene-d10 -- C16D10 28
Labeled PAH Pyrene-d10 -- C16D10 28
Labeled PAH Benz[a]anthracene-d12 -- C18D12 28
Labeled PAH Chrysene-d12 -- C18D12 28
Labeled PAH Benzo[b]fluoranthene-d12 -- C20D12 28
Labeled PAH Benzo[k]fluoranthene-d12 -- C20D12 28
Labeled PAH Benzo[a]pyrene-d12 -- C20D12 28
Labeled PAH Indeno[1,2,3-cd]pyrene-d12 -- C22D12 28
Labeled PAH Dibenzo[a,h]anthracene-d14 -- C22D14 28
Labeled PAH Benzo[g,h,i]perylene-d12 -- C22D12 28
Labeled PAH Naphthalene-d8 -- C10D8 28
Labeled PAH Acenaphthylene-d8 -- C12D8 28
Labeled PCB 2,4,4’-Trichlorobiphenyl-13C12 (PCB 28) -- C12H7Cl3 10
Labeled PCB 2,2’,5,5’-Tetrachlorobiphenyl-13C12 (PCB 52) -- 13C12H6Cl4 10
Labeled PCB 2,2’,4,5,5’-Pentachlorobiphenyl-13C12 (PCB 101) -- 13C12H5Cl5 10
32
Labeled PCB
2,2’,3,4,4’,5’ -Hexachlorobiphenyl-13C12 (PCB
138)
-- 13C12H4Cl6 10
Labeled PCB 2,2’,4,4’,5,5’-Hexachlorobiphenyl-13C12 (PCB 153) -- 13C12H4Cl6 10
Labeled PCB
2,2’,3,4,4’,5,5’-Heptachlorobiphenyl-13C12 (PCB
180)
-- 13C12H3Cl7 10
33
Table S1B. List of native and isotopic labeled standards used for targeted analysis
(Level 1 identifications) in air samples. Acronyms: Brominated flame retardants (BFR);
hydrocarbon (HC); organochlorine pesticides (OCP); organophosphates (OPE);
polycyclic aromatic hydrocarbons (PAH); polychlorinated biphenyls (PCB); VOC: volatile
organic compounds (VOC).
Chemical Class Detected Metabolite CAS-RN Molecular Formula
Alkaloid Nicotine 54-11-5 C10H14N2
Benzodioxole Fludioxonil 131341-86-1 C12H6F2N2O2
Benzopyran Delta-9-tetrahydrocannabinol 1972-08-3 C21H30O2
BFR 2,4,4'-Tribromodiphenyl ether (BDE-28) 41318-75-6 C12H7Br3O
BFR 2,2',4,4'-Tetrabromodiphenyl ether (BDE-66) 5436-43-1 C12H6Br4O
BFR 2,2',4,4',5-Pentabromodiphenyl ether (BDE-99) 60348-60-9 C12H5Br5O
BFR 2,2',4,4',6-Pentabromodiphenyl ether (BDE-100) 189084-64-8 C12H5Br5O
BFR 2,2',4,4',5,5'-Hexabromodiphenyl ether (BDE-153) 68631-49-2 C12H4Br6O
BFR 2,2',4,4',5,6'-Hexabromodiphenyl ether (BDE-154) 207122-15-4 C12H4Br6O
BFR 2-Ethylhexyl 2,3,4,5-tetrabromobenzoate 183658-27-7 C15H18Br4O2
Chlorinated HC 1,2,4-Trichlorobenzene 120-82-1 C6H3Cl3
Chlorinated HC Hexachlorobutadiene (HCBD) 87-68-3 C4Cl6
Chlorinated HC Hexachlorocyclopentadiene 77-47-4 C5Cl6
Chlorinated HC Hexachloroethane 67-72-1 C2Cl6
Haloethers Bis(2-chloro-1-methylethyl) ether (BCEE) 108-60-1 C6H12Cl2O
Haloethers Bis(2-chloroethyl)ether 111-44-4 C4H8Cl2O
Haloethers Carfentrazone-ethyl 128639-02-1 C15H14Cl2F3N3O3
Nitroaromatic 2,4-Dinitrotoluene 121-14-2 C7H6N2O4
Nitroaromatic 2,6-Dinitrotoluene 606-20-2 C7H6N2O4
Nitroaromatic 4-Chloroaniline 106-47-8 C6H6ClN
Nitroaromatic 4-Nitroaniline 100-01-6 C6H6N2O2
Nitroaromatic Isophorone 78-59-1 C9H14O
Nitrosoamine Nitrobenzene 98-95-3 C6H5NO2
Nitrosoamine N-Nitrosodi-n-propylamine 621-64-7 C6H14N2O
Nitrosoamine N-Nitrosodiphenylamine 86-30-6 C12H10N2O
OCP alpha-Hexachlorocyclohexane (α-HCH) 319-84-6 C6H6Cl6
OCP Chlorothalonil 1897-45-6 C8Cl4N2
OCP Dieldrin 60-57-1 C12H8Cl6O
OCP Endosulfan I 959-98-8 C9H6Cl6O3S
OCP Endrin 72-20-8 C12H8Cl6O
OCP gamma-Hexachlorocyclohexane (γ-HCH) 58-89-9 C6H6Cl6
OCP Hexachlorobenzene 118-74-1 C6Cl6
OCP Methoxychlor 72-43-5 C16H15Cl3O2
OCP p,p'-DDD 72-54-8 C14H10Cl4
OCP p,p'-DDT 50-29-3 C14H9Cl5
OCP Tetrachloro-m-xylene 877-09-8 C8H6Cl4
OPE Triphenyl phosphate (TPHP) 115-86-6 C18H15O4P
OPE Tris(1-chloro-2-propyl) phosphate (TCPP) 13674-84-5 C9H18Cl3O4P
PAH 1-Bromo-4-phenoxybenzene 101-55-3 C12H9BrO
PAH 1-Chloro-4-phenoxybenzene 7005-72-3 C12H9ClO
PAH 2-Chloronaphthalene 91-58-7 C10H7Cl
PAH 2-Methylnaphthalene 91-57-6 C11H10
PAH Acenaphthene 83-32-9 C12H10
PAH Acenaphthylene 208-96-8 C12H8
PAH Anthracene 120-12-7 C14H10
PAH Benz[a]anthracene 56-55-3 C18H12
PAH Benzo[a]pyrene 50-32-8 C20H12
PAH Benzo[b]fluoranthene 205-99-2 C20H12
PAH Benzo[ghi]perylene 191-24-2 C22H12
PAH Benzo[k]fluoranthene 207-08-9 C20H12
PAH Chrysene 218-01-9 C18H12
PAH Dibenz[a,h]anthracene 53-70-3 C22H14
PAH Dibenzofuran 132-64-9 C12H8O
34
PAH Fluoranthene 206-44-0 C16H10
PAH Fluorene 86-73-7 C13H10
PAH Indeno[1,2,3-cd]pyrene 193-39-5 C22H12
PAH Naphthalene 91-20-3 C10H8
PAH Phenanthrene 85-01-8 C14H10
PAH Pyrene 129-00-0 C16H10
PCB 2,2',3,3',4,4',5,5',6,6'-Decachlorobiphenyl (PCB 209) 2051-24-3 C12Cl10
Phthalate Bis(2-ethylhexyl) phthalate 117-81-7 C24H38O4
Phthalate Butylbenzyl phthalate 85-68-7 C19H20O4
Phthalate Diethyl phthalate 84-66-2 C12H14O4
Phthalate Dimethyl phthalate 131-11-3 C10H10O4
Phthalate Di-n-butyl phthalate 84-74-2 C16H22O4
Phthalate Di-n-octyl phthalate 117-84-0 C24H38O4
Pyrethroid Piperonyl butoxide 51-03-6 C19H30O5
VOC 1,2-dichlorobenzene 95-50-1 C6H4Cl2
VOC 1,3-dichlorobenzene 541-73-1 C6H4Cl2
VOC 1,4-dichlorobenzene 106-46-7 C6H4Cl2
35
Figure S1: Molecular ions can be used to reduce the false positive rate (albeit
increasing the false negative rate) as shown using a GC-HRMS dataset of air
samples.
A. Filtering by retention index, reverse
search index, Reverse high-resolution
mass filter criteria
B. Additional requirement for molecular
ion

More Related Content

Similar to An Actionable Annotation Scoring Framework For Gas Chromatography - High Resolution Mass Spectrometry (GCHRMS)

Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...
Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...
Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...Annex Publishers
 
00047 Jc Silva 2005 Anal Chem V77p2187
00047 Jc Silva 2005 Anal Chem V77p218700047 Jc Silva 2005 Anal Chem V77p2187
00047 Jc Silva 2005 Anal Chem V77p2187jcruzsilva
 
Molecular epidemiology.pptx
Molecular epidemiology.pptxMolecular epidemiology.pptx
Molecular epidemiology.pptxSumanLohani2
 
Open Access ToxCast/Tox21, Toxicological Priority Index (ToxPi) and Integrate...
Open Access ToxCast/Tox21, Toxicological Priority Index (ToxPi) and Integrate...Open Access ToxCast/Tox21, Toxicological Priority Index (ToxPi) and Integrate...
Open Access ToxCast/Tox21, Toxicological Priority Index (ToxPi) and Integrate...Premier Publishers
 
Poster on systems pharmacology of the cholesterol biosynthesis pathway
Poster on systems pharmacology of the cholesterol biosynthesis pathwayPoster on systems pharmacology of the cholesterol biosynthesis pathway
Poster on systems pharmacology of the cholesterol biosynthesis pathwayGuide to PHARMACOLOGY
 
Chromosomal damage risk assessment to benzene exposure
Chromosomal damage risk assessment to benzene exposureChromosomal damage risk assessment to benzene exposure
Chromosomal damage risk assessment to benzene exposureAlexander Decker
 
11.chromosomal damage risk assessment to benzene exposure
11.chromosomal damage risk assessment to benzene exposure11.chromosomal damage risk assessment to benzene exposure
11.chromosomal damage risk assessment to benzene exposureAlexander Decker
 
Nc state lecture v2 Computational Toxicology
Nc state lecture v2 Computational ToxicologyNc state lecture v2 Computational Toxicology
Nc state lecture v2 Computational ToxicologySean Ekins
 
A comparative risk assessment of burden of disease and injury attributable to...
A comparative risk assessment of burden of disease and injury attributable to...A comparative risk assessment of burden of disease and injury attributable to...
A comparative risk assessment of burden of disease and injury attributable to...Chuco Diaz
 
Talk at Yale University April 26th 2011: Applying Computational Models for To...
Talk at Yale University April 26th 2011: Applying Computational Modelsfor To...Talk at Yale University April 26th 2011: Applying Computational Modelsfor To...
Talk at Yale University April 26th 2011: Applying Computational Models for To...Sean Ekins
 
Biomedicine & Pharmacotherapy
Biomedicine & PharmacotherapyBiomedicine & Pharmacotherapy
Biomedicine & PharmacotherapyTrustlife
 
EPA Guidelines for Carcinogen Risk Assessment.pptx
EPA Guidelines for Carcinogen Risk Assessment.pptxEPA Guidelines for Carcinogen Risk Assessment.pptx
EPA Guidelines for Carcinogen Risk Assessment.pptxMegh Vithalkar
 
Isotope ratio mass spectrometry minireview
Isotope ratio mass spectrometry   minireviewIsotope ratio mass spectrometry   minireview
Isotope ratio mass spectrometry minireviewMahbubul Hassan
 
Ich guidelines
Ich guidelinesIch guidelines
Ich guidelinesR Nadaf
 

Similar to An Actionable Annotation Scoring Framework For Gas Chromatography - High Resolution Mass Spectrometry (GCHRMS) (20)

Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...
Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...
Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...
 
genotoxic_impurities-Gowtham
genotoxic_impurities-Gowthamgenotoxic_impurities-Gowtham
genotoxic_impurities-Gowtham
 
00047 Jc Silva 2005 Anal Chem V77p2187
00047 Jc Silva 2005 Anal Chem V77p218700047 Jc Silva 2005 Anal Chem V77p2187
00047 Jc Silva 2005 Anal Chem V77p2187
 
Molecular epidemiology.pptx
Molecular epidemiology.pptxMolecular epidemiology.pptx
Molecular epidemiology.pptx
 
The Future of Computational Models for Predicting Human Toxicities
The Future of Computational Models for Predicting Human ToxicitiesThe Future of Computational Models for Predicting Human Toxicities
The Future of Computational Models for Predicting Human Toxicities
 
Open Access ToxCast/Tox21, Toxicological Priority Index (ToxPi) and Integrate...
Open Access ToxCast/Tox21, Toxicological Priority Index (ToxPi) and Integrate...Open Access ToxCast/Tox21, Toxicological Priority Index (ToxPi) and Integrate...
Open Access ToxCast/Tox21, Toxicological Priority Index (ToxPi) and Integrate...
 
Poster on systems pharmacology of the cholesterol biosynthesis pathway
Poster on systems pharmacology of the cholesterol biosynthesis pathwayPoster on systems pharmacology of the cholesterol biosynthesis pathway
Poster on systems pharmacology of the cholesterol biosynthesis pathway
 
Chromosomal damage risk assessment to benzene exposure
Chromosomal damage risk assessment to benzene exposureChromosomal damage risk assessment to benzene exposure
Chromosomal damage risk assessment to benzene exposure
 
11.chromosomal damage risk assessment to benzene exposure
11.chromosomal damage risk assessment to benzene exposure11.chromosomal damage risk assessment to benzene exposure
11.chromosomal damage risk assessment to benzene exposure
 
Nc state lecture v2 Computational Toxicology
Nc state lecture v2 Computational ToxicologyNc state lecture v2 Computational Toxicology
Nc state lecture v2 Computational Toxicology
 
Carcinogenicity testing
Carcinogenicity testingCarcinogenicity testing
Carcinogenicity testing
 
A comparative risk assessment of burden of disease and injury attributable to...
A comparative risk assessment of burden of disease and injury attributable to...A comparative risk assessment of burden of disease and injury attributable to...
A comparative risk assessment of burden of disease and injury attributable to...
 
Eular recomendasi ra
Eular recomendasi   raEular recomendasi   ra
Eular recomendasi ra
 
PTSM 2.pptx
PTSM 2.pptxPTSM 2.pptx
PTSM 2.pptx
 
Talk at Yale University April 26th 2011: Applying Computational Models for To...
Talk at Yale University April 26th 2011: Applying Computational Modelsfor To...Talk at Yale University April 26th 2011: Applying Computational Modelsfor To...
Talk at Yale University April 26th 2011: Applying Computational Models for To...
 
Biomedicine & Pharmacotherapy
Biomedicine & PharmacotherapyBiomedicine & Pharmacotherapy
Biomedicine & Pharmacotherapy
 
EPA Guidelines for Carcinogen Risk Assessment.pptx
EPA Guidelines for Carcinogen Risk Assessment.pptxEPA Guidelines for Carcinogen Risk Assessment.pptx
EPA Guidelines for Carcinogen Risk Assessment.pptx
 
Aaa
AaaAaa
Aaa
 
Isotope ratio mass spectrometry minireview
Isotope ratio mass spectrometry   minireviewIsotope ratio mass spectrometry   minireview
Isotope ratio mass spectrometry minireview
 
Ich guidelines
Ich guidelinesIch guidelines
Ich guidelines
 

More from Nicole Heredia

Buy Essays Online From Successful Essay - Essay
Buy Essays Online From Successful Essay - EssayBuy Essays Online From Successful Essay - Essay
Buy Essays Online From Successful Essay - EssayNicole Heredia
 
PPT - Top Essay Writing Company In Australia, UK U
PPT - Top Essay Writing Company In Australia, UK UPPT - Top Essay Writing Company In Australia, UK U
PPT - Top Essay Writing Company In Australia, UK UNicole Heredia
 
Informative Essay Examples, Samples, Outline To Write
Informative Essay Examples, Samples, Outline To WriteInformative Essay Examples, Samples, Outline To Write
Informative Essay Examples, Samples, Outline To WriteNicole Heredia
 
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7TRANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7TNicole Heredia
 
Writing Objectives Part 1 - YouTube
Writing Objectives Part 1 - YouTubeWriting Objectives Part 1 - YouTube
Writing Objectives Part 1 - YouTubeNicole Heredia
 
How To Improve Your Essay Writing Quickly A Step-By
How To Improve Your Essay Writing Quickly A Step-ByHow To Improve Your Essay Writing Quickly A Step-By
How To Improve Your Essay Writing Quickly A Step-ByNicole Heredia
 
College Essay Help - Get Online College Essay Writi
College Essay Help - Get Online College Essay WritiCollege Essay Help - Get Online College Essay Writi
College Essay Help - Get Online College Essay WritiNicole Heredia
 
Typewriter With Blank Sheet Of Paper Arts Entertainment Stock
Typewriter With Blank Sheet Of Paper Arts Entertainment StockTypewriter With Blank Sheet Of Paper Arts Entertainment Stock
Typewriter With Blank Sheet Of Paper Arts Entertainment StockNicole Heredia
 
How To Write An Essay Introduction Paragraph
How To Write An Essay Introduction ParagraphHow To Write An Essay Introduction Paragraph
How To Write An Essay Introduction ParagraphNicole Heredia
 
Descriptive Essays GCE Guide - How To Write A Des
Descriptive Essays GCE Guide - How To Write A DesDescriptive Essays GCE Guide - How To Write A Des
Descriptive Essays GCE Guide - How To Write A DesNicole Heredia
 
Fall Scarecrow Stationary For Kids Fall Scarecrows, Fre
Fall Scarecrow Stationary For Kids Fall Scarecrows, FreFall Scarecrow Stationary For Kids Fall Scarecrows, Fre
Fall Scarecrow Stationary For Kids Fall Scarecrows, FreNicole Heredia
 
Apa Research Paper Order
Apa Research Paper OrderApa Research Paper Order
Apa Research Paper OrderNicole Heredia
 
Tips How To Write A Paper Presentation By Presentation Paper
Tips How To Write A Paper Presentation By Presentation PaperTips How To Write A Paper Presentation By Presentation Paper
Tips How To Write A Paper Presentation By Presentation PaperNicole Heredia
 
Dr. MLKJ Reading Passage - Read, Answer Qs, Underlin
Dr. MLKJ Reading Passage - Read, Answer Qs, UnderlinDr. MLKJ Reading Passage - Read, Answer Qs, Underlin
Dr. MLKJ Reading Passage - Read, Answer Qs, UnderlinNicole Heredia
 
Thesis Statement For Character Analysis Essay Example - Thesi
Thesis Statement For Character Analysis Essay Example - ThesiThesis Statement For Character Analysis Essay Example - Thesi
Thesis Statement For Character Analysis Essay Example - ThesiNicole Heredia
 
Writing Paper - Differentiated Writing Paper With Rubric
Writing Paper - Differentiated Writing Paper With RubricWriting Paper - Differentiated Writing Paper With Rubric
Writing Paper - Differentiated Writing Paper With RubricNicole Heredia
 
Sample Scholarship Announcement Maker Sc
Sample Scholarship Announcement Maker ScSample Scholarship Announcement Maker Sc
Sample Scholarship Announcement Maker ScNicole Heredia
 
Sample Essay On Human Resource Management Pra
Sample Essay On Human Resource Management PraSample Essay On Human Resource Management Pra
Sample Essay On Human Resource Management PraNicole Heredia
 
Lap Desk, 10 Writing Board, Best American Wood Leather Littleton
Lap Desk, 10 Writing Board, Best American Wood Leather LittletonLap Desk, 10 Writing Board, Best American Wood Leather Littleton
Lap Desk, 10 Writing Board, Best American Wood Leather LittletonNicole Heredia
 
They Say I Essay Templates For College
They Say I Essay Templates For CollegeThey Say I Essay Templates For College
They Say I Essay Templates For CollegeNicole Heredia
 

More from Nicole Heredia (20)

Buy Essays Online From Successful Essay - Essay
Buy Essays Online From Successful Essay - EssayBuy Essays Online From Successful Essay - Essay
Buy Essays Online From Successful Essay - Essay
 
PPT - Top Essay Writing Company In Australia, UK U
PPT - Top Essay Writing Company In Australia, UK UPPT - Top Essay Writing Company In Australia, UK U
PPT - Top Essay Writing Company In Australia, UK U
 
Informative Essay Examples, Samples, Outline To Write
Informative Essay Examples, Samples, Outline To WriteInformative Essay Examples, Samples, Outline To Write
Informative Essay Examples, Samples, Outline To Write
 
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7TRANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
 
Writing Objectives Part 1 - YouTube
Writing Objectives Part 1 - YouTubeWriting Objectives Part 1 - YouTube
Writing Objectives Part 1 - YouTube
 
How To Improve Your Essay Writing Quickly A Step-By
How To Improve Your Essay Writing Quickly A Step-ByHow To Improve Your Essay Writing Quickly A Step-By
How To Improve Your Essay Writing Quickly A Step-By
 
College Essay Help - Get Online College Essay Writi
College Essay Help - Get Online College Essay WritiCollege Essay Help - Get Online College Essay Writi
College Essay Help - Get Online College Essay Writi
 
Typewriter With Blank Sheet Of Paper Arts Entertainment Stock
Typewriter With Blank Sheet Of Paper Arts Entertainment StockTypewriter With Blank Sheet Of Paper Arts Entertainment Stock
Typewriter With Blank Sheet Of Paper Arts Entertainment Stock
 
How To Write An Essay Introduction Paragraph
How To Write An Essay Introduction ParagraphHow To Write An Essay Introduction Paragraph
How To Write An Essay Introduction Paragraph
 
Descriptive Essays GCE Guide - How To Write A Des
Descriptive Essays GCE Guide - How To Write A DesDescriptive Essays GCE Guide - How To Write A Des
Descriptive Essays GCE Guide - How To Write A Des
 
Fall Scarecrow Stationary For Kids Fall Scarecrows, Fre
Fall Scarecrow Stationary For Kids Fall Scarecrows, FreFall Scarecrow Stationary For Kids Fall Scarecrows, Fre
Fall Scarecrow Stationary For Kids Fall Scarecrows, Fre
 
Apa Research Paper Order
Apa Research Paper OrderApa Research Paper Order
Apa Research Paper Order
 
Tips How To Write A Paper Presentation By Presentation Paper
Tips How To Write A Paper Presentation By Presentation PaperTips How To Write A Paper Presentation By Presentation Paper
Tips How To Write A Paper Presentation By Presentation Paper
 
Dr. MLKJ Reading Passage - Read, Answer Qs, Underlin
Dr. MLKJ Reading Passage - Read, Answer Qs, UnderlinDr. MLKJ Reading Passage - Read, Answer Qs, Underlin
Dr. MLKJ Reading Passage - Read, Answer Qs, Underlin
 
Thesis Statement For Character Analysis Essay Example - Thesi
Thesis Statement For Character Analysis Essay Example - ThesiThesis Statement For Character Analysis Essay Example - Thesi
Thesis Statement For Character Analysis Essay Example - Thesi
 
Writing Paper - Differentiated Writing Paper With Rubric
Writing Paper - Differentiated Writing Paper With RubricWriting Paper - Differentiated Writing Paper With Rubric
Writing Paper - Differentiated Writing Paper With Rubric
 
Sample Scholarship Announcement Maker Sc
Sample Scholarship Announcement Maker ScSample Scholarship Announcement Maker Sc
Sample Scholarship Announcement Maker Sc
 
Sample Essay On Human Resource Management Pra
Sample Essay On Human Resource Management PraSample Essay On Human Resource Management Pra
Sample Essay On Human Resource Management Pra
 
Lap Desk, 10 Writing Board, Best American Wood Leather Littleton
Lap Desk, 10 Writing Board, Best American Wood Leather LittletonLap Desk, 10 Writing Board, Best American Wood Leather Littleton
Lap Desk, 10 Writing Board, Best American Wood Leather Littleton
 
They Say I Essay Templates For College
They Say I Essay Templates For CollegeThey Say I Essay Templates For College
They Say I Essay Templates For College
 

Recently uploaded

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 

Recently uploaded (20)

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 

An Actionable Annotation Scoring Framework For Gas Chromatography - High Resolution Mass Spectrometry (GCHRMS)

  • 1. 1 An Actionable Annotation Scoring Framework for Gas Chromatography - High Resolution Mass Spectrometry (GC-HRMS) Jeremy P. Koelmel1, Hongyu Xie 2, Elliott J. Price3, Elizabeth Z. Lin1, Katherine E. Manz4, Paul Stelben1, Matthew Paige1, Stefano Papazian2, Joseph Okeme1, Dean P. Jones5, Dinesh Barupal6, John Bowden11,12, Pawel Rostkowski7, Kurt D. Pennell4, Vladimir Nikiforov8, Thanh Wang9, Xin Hu5, Yunjia Lai10, Gary W. Miller10, Douglas I. Walker6*, Jonathan W. Martin2, Krystal J. Godri Pollitt1* 1 Department of Environmental Health Science, Yale School of Public Health, New Haven, CT, 06520, United States 2 Department of Environmental Science, Science for Life Laboratory, Stockholm University, Stockholm 10691, Sweden 3 RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic 4 School of Engineering, Brown University, Providence, RI 02912, United States 5 Department of Medicine, School of Medicine, Emory University, Atlanta 30322, United States 6 Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, United States 7 NILU- Norwegian Institute for Air Research, 2007 Kjeller, Norway 8 NILU- Norwegian Institute for Air Research, Framsenteret, 9007 Tromsø, Norway 9 MTM Research Centre, Örebro University, SE-701 82, Örebro, Sweden 10 Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032, United States 11 Center for Environmental and Human Toxicology & Department of Physiological Sciences, University of Florida, Gainesville, FL, 32603, United States 12 Department of Chemistry, University of Florida, Gainesville, FL, 32603 United States *Corresponding author Krystal J. Godri Pollitt - krystal.pollitt@yale.edu Douglas I. Walker - douglas.walker@mssm.edu
  • 2. 2 Abstract Omics-based technologies have enabled comprehensive characterization of our exposure to environmental chemicals (exposome) as well as assessment of the corresponding biological responses at the molecular level (e.g., metabolome, lipidome, proteome, and genome). By systematically measuring personal exposures and linking these stimuli to biological perturbations, researchers can determine specific chemical exposures of concern, identify mechanisms and biomarkers of toxicity, and design interventions to reduce exposures. However, further advancement of metabolomics and exposomics approaches is limited by a lack of standardization and approaches for assigning confidence to chemical annotations. While a wealth of chemical data is generated by gas chromatography high-resolution mass spectrometry (GC-HRMS), incorporating GC-HRMS data into an annotation framework, and communicating confidence in these assignments is challenging. It is essential to be able to compare chemical data for exposomics studies across platforms to build upon prior knowledge and advance the technology. Here we discuss the major pieces of evidence provided by common GC-HRMS workflows, including retention time and retention index, electron ionization, positive chemical ionization, electron capture negative ionization, and atmospheric pressure chemical ionization spectral matching, molecular ion, accurate mass, isotopic patterns, database occurrence, and occurrence in blanks. We then provide a qualitative framework for incorporating these various lines of evidence for communicating confidence in GC-HRMS data by adapting the Schymanski scoring schema developed for reporting confidence levels by liquid chromatography high- resolution mass spectrometry (LC-HRMS). Validation of our framework is presented using standards spiked in plasma, and confident annotations in outdoor and indoor air samples, showing a false positive rate of 12% for suspect screening for chemical identifications assigned as Level 2 (when structurally similar isomers are not considered false positives). This framework is easily adaptable to various workflows and provides a concise means to communicate confidence in annotations. Further validation, refinements, and adoption of this framework will ideally lead to harmonization across the field, helping to improve the quality and interpretability of compound annotations obtained in GC-HRMS.
  • 3. 3 1. INTRODUCTION Environmental risk factors are major determinants of health, with current estimates suggesting that environmental exposures are responsible for about 23% of the global disease burden (Prüss-Üstün et al., 2016) and chemical exposures responsible for one in six deaths (Landrigan et al., 2018). However, research has historically prioritized the characterization of genetic risk factors (Prüss-Üstün et al., 2016; Rappaport, 2016; Wheelock and Rappaport, 2020; Wild, 2012, 2005). Characterization of environmental factors primarily relies on self-reporting or community-level exposures (Rappaport, 2016). Identifying and preventing/reducing harmful environmental exposures could lead to significant reductions in global disease. Assessment of the chemical exposome, which includes the comprehensive identification of endogenous and exogenous compounds at the individual level, is essential to better understanding the link between the environment and human health (Vermeulen et al., 2020). Hundreds of thousands of chemicals are used in commercial and industrial applications. Once released to the environment, these chemicals often transform, increasing the number of potential chemical exposures to millions of compounds. Traditional methods for high-confidence annotation and quantitation require synthesized chemical standards for each analyte of interest, semi-manual peak integration, calibration curves, and often specific chromatographic methods. Therefore based on cost and feasibility, the diversity of chemicals existing in the exposome presents a monumental challenge to traditional analytical methods, which typically interrogate less than 100 compounds (Doherty et al., 2021; Mazur et al., 2021). To overcome this limitation, non-targeted analytical methods that employ high resolution mass spectrometry (HRMS) have been developed to screen and identify thousands of chemicals that may be present environmental and biological samples. These methods use HRMS evidence indicative of universal chemical properties to annotate structure and estimate chemical abundances, and hence these methods are not solely focused on known or commonly expected chemicals. Because non-targeted results are more tentative then targeted results, a key need for such exposome research is to establish and apply transparency in instrumentation, data-processing, and reporting criteria when communicating research findings. Confirming the molecular structure of identified compounds using HRMS data remains a challenge. Due to limitations in scope and spectral quality of chemical databases and the difficulty of capturing mass fragmentation patterns for low abundance peaks, the confidence of chemical annotations can vary widely. Therefore, reporting may be limited to chemical formula, or even simply a set of retention time and mass spectral measurements, rather than an exact chemical structure. Chemical features that are not fully characterized can provide novel insights in exposomics research and links between environment, biological effects, and disease. For example, chemical formulas assigned to accurate mass features can aid in the characterization of unexpected and previously unknown chemical exposures that can be prioritized for study in other populations and sample types to understand their distribution in the environment. The advantage of using this approach for identifying environmental exposures is highlighted by the discovery of polychlorinated biphenyls (PCBs), which are one of the most widely studied persistent
  • 4. 4 organic pollutants. Their initial discovery in environmental and biological samples was based upon detection of unknown peaks in multiple sample types, and it was their measurement across multiple samples and recognition of their accumulation in the food chain that led to the eventual identification of PCBs as highly persistent pollutants (Jensen, 1972). More recent examples of the use of non-targeted mass spectrometry, include the discovery of 6PPD-quinone as a chemical responsible for fish die offs (Stokstad, 2020), and a brominated pesticide transformation products as responsible for eagle die offs (US EPA, 2021). When reporting annotations, it is essential to report measurements within a consistent framework that allows interpretation of the uncertainty in compound annotation. Defining a framework for reporting the confidence of chemical annotations is especially important given the potential for high false positive rates and over-reporting in non-targeted and suspect screening HRMS studies across various classes of compounds (Koelmel et al., 2017; Scheubert et al., 2017; Schymanski et al., 2014; von Mering et al., 2002). One of the first examples of a confidence scoring framework for non-targeted environmental chemical profiling was proposed by Schymanski et al. for liquid chromatography high- resolution tandem mass spectrometry (LC-HRMS/MS) (Schrimpe-Rutledge et al., 2016; Schymanski et al., 2012). Gas Chromatography high-resolution mass spectrometry (GC- HRMS) provides complementary coverage of exposome chemicals that are of lower solubility, higher volatility, or are not ionized by LC-HRMS approaches (Zhang et al., 2021) (Table S1). The applicability of Schymanski schema was reported to GC-HRMS to be limited due to differences in ionization, data acquisition, and data-processing (Rostkowski et al., 2019). Recently, a scoring framework for GC-HRMS using electron ionization (EI) was proposed (Travis et al., 2021) categorizing confidence of feature identifications into four levels. Levels ranged from the most confident annotations in Level 1 with confirmed identifications to the weakest in Level 4 with features described only by exact mass. While this framework provides an important first step in defining scoring metrics for GC-HRMS, the parameters proposed for assigning confidence levels are potentially subject to high false positive and false negative rates. For example, Level 2 assignment requires isotopic distribution while Level 3 is assigned for exact mass matches to fragment ions. The necessity of an isotopic pattern match requires the molecular ion, which is often not observed. Assignment of Level 3 is further challenged by the requirement of an isotopic pattern and exact mass matches which require exact mass libraries with known formulas, which are not currently available for many compounds. The schema does not incorporate retention index, which we demonstrate to be an essential piece of information for reducing false positives. Thus, this approach may be suitable for databases with establish, high mass accuracy fragmentation spectra, but may be limited when using larger databases and in silico approaches. Here, we present a new annotation confidence scoring framework specific for non- targeted GC-HRMS that can be used to determine and communicate confidence of chemical assignments made using several strategies. The framework draws on input from a community of GC-HRMS users and considers the breadth of chemical evidence that
  • 5. 5 can be obtained from these powerful instruments. The current work focuses on environmental contaminants. While this framework can be applied to certain biological chemicals, the use of derivatization (Halket et al., 2005) and class-based mass or fragmentation patterns may be required. 2. LEVERAGING EVIDENCE FROM GC-HRMS TO REDUCE FALSE POSITIVES AND NEGATIVES GC-HRMS provides data-rich evidence that can be used in identifying structural annotation of detected features, including various modes of ionization and fragmentation, accurate mass, retention information, and isotopic pattern. It is important to understand the unique advantages and limitations for each layer of evidence to decide the necessary pieces of information for accurate assignment of chemicals (Table S1). 2.1. Ionization The most common ionization method used in GC-HRMS is electron ionization (EI). Any molecule introduced to the source in this ionization mode is theoretically ionized with equal efficiency. In practice, however, on-column degradation, inlet/liner discrimination, and other factors (Barwick, 1999), total summed ion signal can still be a factor of chemical structure. EI is a ‘hard’ ionization method, meaning that each molecule generates a spectrum with a high degree of fragmentation in which only traces of the intact molecular ion are usually detected in most cases. In a GC-EI-MS analysis, all the reported peaks have an associated fragmentation spectrum which is reconstructed by a deconvolution algorithm. Having fragmentation from all ions makes it possible to generate annotation levels for every sample, therefore improving confidence of chemical assignments. Generation of fragmentation spectra for all ionized compounds enables MS1 data can be used to confirm identified chemicals, eliminating the need for precursor selection and fragmentation by tandem mass spectrometry (MS/MS). To establish reproduceable EI spectra libraries, ionization voltages of 70 eV are commonly used and combined with libraries containing retention index. Retention index and fragmentation are often similar irrespective of chromatographic method and instrument (within the same instrument type, e.g., quadrupole time-of-flight MS (Q-TOF-MS)), and therefore have been successfully shared across labs. (Mallett, 2014). GC-EI-MS spectra provide key evidence for compound annotation, yet several challenges may still lead to false positives. While the diverse fragments observed in EI spectra allow for better structural characterization, the spread of signal across fragment ions can reduces overall sensitivity, often with no molecular ion signal present. Without observation of the molecular ion, many compounds can give similar spectra, even when they are not isomers. Because there is rarely a high signal from the molecular ion, ion selection followed by fragmentation cannot be applied in GC-EI-MS. Therefore, GC spectra must be deconvoluted across chromatographic time, often leading to the inclusion of artifact peaks due to high background or co-eluting compounds. Deconvolution can be a significant issue in increasing false negatives and false positives.
  • 6. 6 Alternative ionization strategies can be used with GC-HRMS to improve detection of the molecular ion, although these often lack the robustness, sensitivity, and reproducibility of EI. These techniques include positive chemical ionization (PCI), electron capture negative ionization (ECNI), and atmospheric pressure chemical ionization (APCI). The advantages and disadvantages of these techniques are described in the supplemental information. 2.2. Accurate Mass Continual advances in high resolution Orbitrap and Q-TOF MS instrumentation have enabled accurate measurement of mass to charge ratios ranging from sub-ppm to 30 ppm mass accuracy windows and resolutions up to 240,000 to date. Accurate mass (ability to measure the correct mass) and high resolution (ability to distinguish close masses) provides additional evidence for compound annotation such as fine isotopic patterns. Unequivocal formula prediction may be possible for low mass fragments, especially when including isotopic patterns. Isotopic pattern and mass defects can further be used to determine unique chemical structures containing certain functional groups (e.g., Cl, Br, F), aiding in the detection of unknown chemical exposures. Suspect screening using accurate mass matching can reduce false positives. The largest GC-EI-MS libraries (e.g., NIST and Wiley) include spectra acquired using unit mass instruments; only a handful of smaller libraries area available for spectral matching using accurate mass. Algorithms can be used to reduce false positives when matching experimental spectra obtained with GC-HRMS to unit resolution libraries (Stein, 2012). One example is applying a high-resolution mass filter (HRMF) (Kwiecien et al., 2015). HRMF calculates the percentage of fragment ions with formulae which can be predicted when setting atom constraints for formula matching to only those contained in the proposed molecular formula. Reverse HRMF can also be used, but this approach limits scoring to only peaks found in the library, ignoring other peaks in the experimental spectra for scoring purposes. Reverse HRMF reduces influences of artifacts during deconvolution on scoring, hence reducing false negatives, but may also increase false positives when experimental peaks are real and not found in the library. 2.3. Retention Index Retention time is an orthogonal measurement that can improve annotation accuracy when combined with EI MS spectral matches. As retention time is influenced by column type, batch, and manufacturer as well as the method temperature gradient, standardized retention indices can be calculated using a series of compounds that span the entire chromatographic run (i.e., alkanes (Kováts, 1958), fatty acid methyl esters (Miwa et al., 1960)). retention indices can also be useful for identifying isomers with the same or similar EI spectra. Compared to LC, GC retention times are typically more consistent within and across laboratories (Aalizadeh et al., 2021; Hall et al., 2012). This increases the reliability of retention index calibrations and enables matching of experimental retention indices to public libraries as well as prediction of retention time indices using chemical properties (Peng, 2010a; Stein et al., 2007a; Strehmel et al., 2008).
  • 7. 7 Retention indices provide key evidence to improve accuracy of compound annotation. There are a few considerations when implementing retention indices into a workflow. Methods using linear gradients may be preferable, to reduce errors in retention index calculation (Zellner et al., 2008), although non-linear temperature gradients may still be deployed with accurate RI calculated (Peng, 2010b). While retention time indices improve annotation confidence, their incomplete availability in spectral libraries may limit their use. The largest libraries (e.g., NIST and Wiley) have a high percentage of chemicals without retention index information, limiting the search space and coverage if retention index matching is required. The current release of NIST libraries contain predicted retention indices for many compounds; however, these predictions may not be accurate for all chemical classes (Stein et al., 2007b). Careful consideration of retention time indices is also necessary depending on the chemicals being measured. For example, alkanes are not appropriate for calculating RI for persistent mobile organic compounds including per- and polyfluoroalkyl substances (PFAS), organophosphate esters (OPEs), and certain polar compounds amendable to GC (Boegelsack et al., 2021; Grőbler, 1972). 2.4. Compound Metadata for Improving Candidate Ranking While spectral and retention time information can be used to assign a structure or substructure to a chemical, additional evidence available from GC-HRMS can increase confidence of chemical identifications. Study context and likelihood of exposure are also important to consider. For example, a sizable portion of 100+ million chemical entries in PubChem have never been synthesized, their existence is limited to a patent or database (Kim et al., 2021), or if synthesized were never generated at large scale. Hence tens of millions of these entries are irrelevant for exposomics studies and will lead to false positive annotations (Schymanski et al., 2021). To reduce false positives and remove top-ranked compounds which are not likely to exist in the environment, more selective databases such as the EPA CompTox Chemistry Dashboard (Williams et al., 2017) or PubChem Lite (Schymanski et al., 2021) can be used. For GC-HRMS a more actionable solution, given that the NIST and WILEY libraries are commonly used, is to include meta-data on database, patent and/or literature occurrence (reference counting), or production levels, for determining the likelihood a chemical exists in the environment. This can drastically reduce false positives by removing unlikely candidates and ease interpretation by retaining chemical species with known information (McEachran et al., 2017; “National Report on Human Exposure to Environmental Chemicals,” 2021; Williams et al., 2017). This method is biased to more well characterized chemicals, and any scoring methods are highly dependent on which databases/algorithms are used for determining compound occurrences. Hence, this method may lead to false negatives in the case of uncommon or unexpected chemicals, although the drastic reduction in false positives may far outweigh the potential of missed compounds. There are several tools that can provide meta-data on a chemical’s likely occurrence to improve non-targeted annotations in exposomics including; NHANES Predicted Exposures (“National Report on Human Exposure to Environmental Chemicals,” 2021), Data Sources (McEachran et al., 2017), Number of PubMed articles containing chemical name, Number of PubChem data sources, and CPDat Product Occurrence Count (McEachran et al., 2017).
  • 8. 8 2.5. Peak Filtering to Remove Background Contaminants and Artifacts Analytical artifacts can be introduced during sample collection, transport, storage, preparation/ extraction, and acquisition, as well as through instrument noise. Complete physical removal of contaminants is not possible, in particular when background and artifact peaks overlap. Additional information, such as clustering of fragments across multiple samples can be used to identify highly correlated fragments based on peak intensities. Matrix effects in GC (Fujiyoshi et al., 2016; Sakthiselvi et al., 2020) are important for determining the extent of background noise. These matrix effects will depend on sample type, concentration, preparation, and introduction, as well as the chromatography and acquisition strategies employed. These factors can impact spectral matches and annotation confidence, and are challenging to incorporate into scoring metrics. One approach to removing these interferences is blank feature filtering where sample abundances are retained if above a blank threshold. Blank feature filtering is ideally conducted using field blanks that have been treated identical to samples, undergoing the same transport, storage, sample preparation, and acquisition protocols. Multiple methods are available for blank filtering to remove background contaminants and artifacts (Borgsmüller et al., 2019; Koelmel et al., 2021; Mahieu et al., 2014; Matsuo et al., 2017; Patterson et al., 2017; Risum and Bro, 2019). Various methods for blank filtering are described in the supplemental information. 3. COMPILING GC-HRMS EVIDENCE TO COMMUNICATE ANNOTATION CONFIDENCE There are three approaches for communicating confidence in annotations: non- probabilistic scores, probabilistic scores, and qualitative assignments of confidence. Individual layers of GC-HRMS evidence have community accepted standards for scoring. For example, EI match scores are generally calculated using dot-product or reverse dot- product, retention index matches are usually based on an absolute deviation, meta-data scores are based on numbers of instances, and HRMF scores are based on a percentage of fragment formulae which can be predicted given the proposed structures composite atoms. Cut-offs for spectral match scores, retention index delta, or other metrics based on experience, community accepted values, or optimized via experiments can then be assigned to remove compounds unlikely to be true positives. Combining different scores from various layers of GC-HRMS evidence to communicate confidence is non-trivial. Weighted linear models, neural networks, and mixed models, for example, have all been used to combine spectral similarity and retention index information to optimize a single match function (Matyushin et al., 2020; Wei et al., 2014). Scores determined using these quantitative methods can improve candidate sorting compared to qualitative methods; however, statistical and coding experience are required which may be beyond the expertise of most users. Establishing harmonized algorithms and development of user-friendly software would ensure consistency across different user groups. Consistency across different software, workflows and laboratories is much easier
  • 9. 9 to achieve using qualitative assignment of confidence, e.g. (Schymanski et al., 2014). Here we introduce a qualitative means to assign and communicate confidence of feature annotations. We also show the use of more quantitative scoring metrics alongside qualitative methods for ranking candidates within a certain qualitative confidence assignment. Proposed System for Assigning Confidence Level 1: Confirmed Identification (Retention time, EI spectra and reference masses) using In-House library ⮚ Retention time matches standard database generated in-house using the same method, background matrix, and instrument (spectra from certified reference standards, not from an external library). Retention times should match within the RSD of the associated standards (no more than 1% deviation). ⮚ Observation of 2+ EI spectral reference peaks from standards at correct ratios (within 20%) or EI spectral match with in-house library >600. Often more than two reference masses and ratios are needed for confident annotation, 2 is minimum. For high-throughput screening with automated data extraction, verification of identity of each chemical in each sample by these criteria may be impractical. In such cases, procedures and assumptions should be clearly defined. Advantage: ⮚ Confident identification for exact structure (with a few possible exceptions, especially enantiomers or structurally similar isomers which cannot be separated by retention time). Limitation: ⮚ Expensive to purchase, prepare, run, and generate database for large number of chemicals. Standards may need to be reacquired each experiment if retention times are not stable between batches (e.g., column clipping). ⮚ Coverage is limited to standards available for purchase. Synthesized standards are often too costly for routine library building. Level 2: Probable structure or close isomer using external libraries (RI match, [molecular ion], and EI match to exact mass library or including metrics incorporating accurate mass information) ⮚ Reverse dot-product EI spectral match (>600) and dot-product >500. As EI spectral prediction improves, predicted spectra may also be used. ⮚ NCI/PCI/APCI when spectra contain more information than EI and at least 5 fragments are matched to a library database. ⮚ Retention index match (<50 and <1.5%) or predicted retention index (<100 difference) using validated methods. ⮚ Match to an exact mass library (reverse dot-product >600) or using metrics incorporating exact mass (e.g., high reverse high-resolution mass filter score (>75)). Advantage: ⮚ Correct annotation, or the assigned structure is a structurally similar isomer (with certain exceptions)
  • 10. 10 Limitation: ⮚ Retention index match requirement may limit annotations if databases do not contain many compounds with retention index. Therefore, this assignment of confidence is ideal when libraries have high retention index coverage (experimental or predicted). ⮚ In certain edge cases, reverse high-resolution mass filter might lead to false positives when gas phase reactions in Orbitraps lead to fragments which cannot be predicted using molecular formula. Level 3: Tentative candidate (EI accurate mass spectral match or EI match with metrics incorporating accurate mass) using external library ⮚ Reverse dot-product EI spectral match (>600) and dot-product (>500). As EI spectral prediction improves, predicted spectra may also be used. ⮚ NCI/PCI/APCI when spectra contain more information than EI and at least 5 fragments are matched to a library database. ⮚ Match to an exact mass library (reverse dot-product >600) or using metrics incorporating exact mass (e.g., high reverse high-resolution mass filter score (>75)). Advantage: ⮚ Widest coverage while assigning a possible structure. Limitation: ⮚ High rate of false positives (both in terms of exact structure and assigning a structurally similar isomer). ⮚ Possibility of multiple peaks with similar fragmentation patterns (i.e., structurally similar isomers), but different retention times. ⮚ In certain edge cases, reverse high-resolution mass filter might lead to false positives when gas phase reactions lead to fragments which cannot be predicted using molecular formula. Level 4: Chemical Group or Exact Formula Level 4A: Identification of unequivocal chemical formula from databases (PCI, NCI, or in certain cases, EI) ⮚ Only one formula is possible given the exact mass of the precursor ion and isotopic distribution. o OR multiple formulas are possible, but after only including atoms which can predict all dominant fragment ions, only one formula is possible for the molecule. Level 4B: Possible chemical series (for example homologous series with repeating chemical constituents) ⮚ Follows mass defect series (e.g., using Kendrick mass defect (Kendrick, 1963)). o Homologous series have linear retention indices – so detection via GC can be precise – even for variable temperature programs. ⮚ Has one or more fragments (with exact mass match) indicative of class.
  • 11. 11 o OR shows defined subnetwork clustering in spectral similarity networks. Level 4C: Possible chemical class (chemicals grouped based on structural motifs or similarity) ⮚ Has one or more fragments (with exact mass match) indicative of class. o OR clusters in chemical similarity networks. OR other well accepted class specific method (e.g., Lee index for polycyclic aromatic hydrocarbons (PAHs) (Lee et al., 1979; Škrbić and Onjia, 2006) Note: Levels 4A-C are not necessarily in order of confidence or utility, for example in PFAS analysis the goal may be to find series of 3 or more compounds (Level 4B). In non- targeted analysis across all chemical classes molecular formula may be of more importance (Level 4A). Note that for Level 4B and 4C one could also have an exact formula match, these could be indicated as Level 4AB / 4AC. Level 5: Unknown feature (retention time and reference mass or deconvoluted spectra) ⮚ Reproducibly detected deconvoluted spectra with reference mass intensity or combined intensity of all spectral ions. This feature may also have an annotation, and other information, but does not fit any of the criteria to be deemed Level 1, 2, 3, or 4. Important Considerations 1) To avoid annotation of background artifacts, peak filtering conditions must be met for all confidence levels (e.g., see discussion of blank feature filtering in the supplemental information). 2) If ion selection and fragmentation with CID/HCD is employed (PCI, NCI, and APCI, for example), then the Schymanski schema can be used to assign confidence levels. 3) For large studies it is important to account for spectral signals across multiple samples - for example, was there only one good spectral match in one sample for a study with 300 samples? This case often indicates a false positive (larger sample size increases the number of detected features, enhancing the chance for matching noise in library spectra). However, it is critical to note that in exposomics studies, these features may represent unique exposures of interest. Multiple injections of a single sample, when possible, could aid in annotating rare chemical occurrences. 4) In reporting results, it should be clear whether multiple structures have met the proposed criteria. If more than one structure meets the criteria for an assignment of confidence level other information can be used to discern the likely correct annotation: e.g., top hit has a significantly higher evidence count (by factor of 10), the compound is the only one expected for the matrix studied, the compound has a much more accurate retention index (by 30), the compound has a significant higher reverse dot-product score (by at least 50) or the compound has a much more accurate reverse high-resolution mass filter score (by at least 10). This is also where quantitative scoring metrics for ranking candidates can be utilized
  • 12. 12 (total scores). Note that even if certain conditions above are met, there may still be false positives (see Results Section). If multiple top-hits still exist after applying our schema, the feature should be flagged accordingly. 5) While we have not made necessary observation of the molecular ion for Level 2, especially since there is a retention index match, observation of the molecular ion provides additional confidence and should be achieved when possible. Figure 1: Proposed schema for assigning five levels of confidence in compound annotation using common evidence provided by GC-HRMS. Acronyms: retention time (RT), reverse search index (RSI), retention index match (ΔRI), and reverse high- resolution mass filter (RHRMF).
  • 13. 13 Comparison to the Schymanski Schema for LC-HRMS/MS While our schema is based on the Schymanski schema for LC-HRMS/MS, it was modified to incorporate criteria specific to GC-HRMS. Included criteria were selected based on feasibility for replication across labs with the overall objective of implementing a harmonized scoring rubric. In addition, we include blank feature filtering, or similar, as a necessary criterion for all annotations, as well as chemical series and class-based annotation which is important in various fields. Level 1 – Criteria for GC-HRMS is similar to Schymanski Level 1 and provides compound identification. For GC-HRMS confidence scoring, the molecular ion and MS2 is not required due to fragmentation spectra available from MS1 when using EI. Level 2 – Criteria for GC-HRMS is similar to Schymanski Level 2, however the use of retention indices has the potential to improve annotation confidence and assign unambiguous structures to isomers. The Schymanski schema states “spectrum-structure match is unambiguous” (Level 2A) or “represents the case where no other structure fits the experimental information” (Level 2B), which may be too stringent of an interpretation given the evidence provided. We would not confidently state this is true based on the suggested criteria for either LC or GC. Level 3 – Criteria for GC-HRMS fit within the Schymanski schema for Level 3, except predicted retention index can be included as evidence for Level 2 given the accurate predictions. Levels 4A-C – These levels represent unequivocal formulae and/or chemical series and chemical class, which are important in applications where the exact structure is not known but the series or formula provides valuable information. For example, these could be useful for polymers, PFAS, petrol chemicals, chlorinated by-products, and other applications where repeating series and discernable chemical classes of importance may be found. Unequivocal molecular formula is the same as Schymanski Level 4. It is useful to note that there was no place for series/determination of chemical class in the original Schymanski schema. Level 5 – This is different than in Schymanski Level 5 in that the exact mass of the feature cannot be discerned necessarily from EI spectra. Level 5 for GC-HRMS scoring to any reproducibly detected deconvoluted spectra which does not meet the criteria to be ranked in any other level and remains after filtering for artifacts and background contamination (e.g., above blank signal thresholds). 4. METHODS: ESTIMATING FALSE POSITIVES AND FALSE NEGATIVES FOR DIFFERENT CONFIDENT LEVELS USING AN OPEN-SOURCE SOFTWARE We developed a software tool, SIF-GC (Scoring, Integrating total spectral ion abundance, and Filtering for Gas Chromatography), to automatically assign confidence levels to features identified in GC-HRMS data. The software covers all steps covered in the described scoring framework provided the user has an output with retention index,
  • 14. 14 reverse similarity index, molecular ion, reverse high-resolution mass filter (or similar), and integrated peak areas, heights, or total spectral ion abundance, for all samples and blanks. Our software initially assigns Level 2, 3, or 5, and reports the top molecule in terms of a weighted score or other metric for each level, and whether multiple hits exist for each level (Level 1 is reserved for manual targeted techniques, and Level 4 may be project specific and is not implemented at this time). The software then can be used to apply blank feature filtering (user modifiable parameters), and further filter features to only retain compounds with unique CAS-RN or other identifiers and compound names (removes duplicates). Furthermore, the SIF-GC software can be used to compute total spectral ion abundance for all compounds given an average ratio of total spectral ion abundance to peak area or height. All steps are automated for Compound Discoverer (Thermo) outputs but could work, with some minor adjustments, with other vendor platforms. The SIF-GC software is open source and freely available at innovativeomics.com/software/. To evaluate the performance of our SIF-GC software and our schema for reporting confidence, we applied the tool to datasets acquired from biological and environmental samples and investigated the false positive and false negative rate assigned for each level. The first dataset was of human serum samples spiked with 112 standards (90 native and 22 isotopic labeled chemicals; Table S1A), with a final concentration range of 6.7 to 33.6 ppb. The spiked serum validation dataset consisted mainly of brominated flame retardants, chlorinated flame retardants, organochlorine pesticides, polycyclic aromatic hydrocarbons, oxygenated polycyclic aromatic hydrocarbons, nitrated polycyclic aromatic hydrocarbons, polychlorinated biphenyls, polychlorinated dibenzodioxins, and polychlorinated dibenzofurans. The second dataset describes indoor and outdoor air samples collected using passive samplers from homes near natural gas compressor stations. A targeted panel of 81 standards (70 native and 11 isotopic labeled chemicals; Table S1B) with a final concentration range of 32 to 1000 ppb were assessed for these environmental samples. This validation set consisted of an alkaloid, benzodioxoles, benzopyrans, brominated flame retardants, chlorinated hydrocarbons, haloethers, nitroaromatics, nitrosamines, phthalates, pyrethroids, organochlorine pesticides, organophosphates, polycyclic aromatic hydrocarbons, polychlorinated biphenyls, and volatile organic compounds. We compared Level 1 assigned compounds (targeted methods used to confidently assigned compounds) with compounds assigned Level 2, 3, or 5 using the GC-SIF software. A list of experimental protocols is provided in the supplemental information. 5. RESULTS: HOW CONFIDENT ARE THE CONFIDENCE LEVELS? We selected qualitative criteria that can be readily implemented given common metrics provided by most GC open source and vendor library search software. These common metrics included retention index (delta), spectral match (in this case reverse similarity index was used), and a metric incorporating exact mass (in this case reverse high- resolution mass filter criteria was used). Using the GC-SIF software, we determined the false negative, false positive, and true positive rate when utilizing our criteria for assigning confidence at Levels 2 and 3. For the human serum samples, of the 90 spiked unlabeled
  • 15. 15 standards which were detected by GC-HRMS, 80 were deconvoluted by Thermo Compound Discoverer. Of these 80 chemicals, 61 were assigned a Level 2 annotation (76%), and of these Level 2 annotations, the correct exact structure was only identified for 28 compounds (46%). Hence in absolute terms, using our scoring criteria, the highest level of confidence for suspect screening, there was a 54% false positive for Level 2 assignments, 60% false positive rate for Level 3 assignments, and 84% false positive rate for top-ranked assignments without Level 2 or 3 criteria (Figure 2A). Of the deconvoluted spiked standards, 19 (24%) had no Level 2 assignment and hence were considered false negatives. In many cases these false negatives are due to unavailable libraries or retention index values for the standard. Figure 2: False positive rates and number of assignments across each confidence level for 81 observed unlabeled standards spiked into human serum (A), and 19 Level 1 assignments from outdoor and indoor air samples (B).
  • 16. 16 For air samples, there was a 22% false positive rate for Level 2 assignments, 89% false positive rate for Level 3 assignments, and 74% false positive rate for top-ranked assignments without Level 2 or 3 criteria (Figure 2B). It should be noted there was a lower false positive rate in air samples compared to the human serum samples. This difference is attributable the standards used with each sample type: serum samples were spiked with a mixture that was mainly comprised of halogenated isomers which could not be distinguished by retention index and EI spectra, whereas the air samples included a wider range of chemicals. Therefore, for exact structure the air samples may be more representative of the false positive rate for Level 2 when the range of chemicals being annotated is a cross a diverse array of structures, many of which do not have close isomers. Even when using retention index, reverse high-resolution mass filter, reverse search index, molecular ion, and search index there are subtle isomeric differences (e.g., Figure 3), the exact structure often cannot be discerned using GC-EI-HRMS (hence the 54% false positive rate in serum samples and 22% false positive rate in air samples). When subtle isomeric differences in the position of methyl, chlorine, heteroatoms, and aromatic rings were not considered false positives, 13% of the 61 top-ranked Level 2 assignments in the spiked serum were false positives, 25% of the top-ranked Level 3 assignments were false positives, and 61% of the top-ranked assignments without Level 2 or 3 criteria (Level 5) were false positives (Figure 2A). Similarly, when not considering highly similar isomers false positives, for air samples, 11% of the 18 top-ranked Level 2 assignments were false positives, 79% of the top-ranked Level 3 assignments were false positives, and 68% of the top-ranked assignments without Level 2 or 3 criteria (Level 5) were false positives (Figure 2B). Figure 3: Examples of 10 isomers (dimethyl naphthalene and ethyl naphthalene) which were all assigned Level 2 for a single feature. The dimethyl naphthalene species represented one of the highest abundance features in air samples. Application of the proposed GC-HRMS scoring framework in datasets from biological and environmental samples highlighted that: 1) retention index is essential for confident assignments and 2) structurally similar isomers often cannot be resolved even with a retention index match. Exact mass filters (e.g., reverse high-resolution mass filter used here) were also shown to be valuable, especially when heteroatoms exist. There were 26% false positives using reverse high-resolution mass filter (Level 3) compared to the 63% false positive rate identified without any filters for the validation set containing mostly chlorinated species (Figure 2A). Reverse high-resolution mass filter was also found to be valuable in the validation set containing mostly phthalates and PAHs (Table 1).
  • 17. 17 A summary of each annotation filter, the number of candidates retained, and the percentage of candidates removed after applying the filters for air samples is shown in Table 1. The individual filters which removed the most candidates on average were requiring molecular ion (83 ± 14% removed), followed by retention index match < 100 (76 ± 10% removed), reverse high-resolution mass filter > 75% (40 ± 27% removed), and reverse search index (40 ± 12% removed) (Table 1). These findings highlight the importance of the retention index criteria for removing false positives. When retention index is combined with reverse high-resolution mass filter and reverse search index for a Level 2 assignment, 84 ± 8% candidates were removed on average. Hence combining these three filters further lowers the false positive rate. While use of molecular ion was the most stringent filtering criterion, reducing false positives, decreasing the false positive rate must be balanced against the increase in false negatives (number of correct candidates removed using the respective filter). Use of the molecular ion drastically increased false negatives. Table 1: Summary of confidence in annotated chemicals detected in outdoor and indoor air samples. Individual layers of GC-HRMS evidence were not sufficient at differentiating between similar chemicals. Detected chemicals were filtered based on multiple criteria including retention index, reverse search index, reverse high-resolution mass filter, and observation of the molecular ion. The number of retained candidates after applying all criteria for Levels 2 and 3 is shown for individual chemicals as well as each Level 2 criteria. The overall percent of candidates removed is also presented. False negatives were designated when the correct assignment (exact structural match) was removed by the respective filter. Requiring the molecular ion led to five false negatives out of the 19 validated assignments (26%; Table 1), use of retention index led to 11% false negatives (2/19), reverse search index to 5% false negatives (1/19), and reverse high-resolution mass filter to 0% false negatives (data not shown). When retention index, reverse search index, and reverse high-resolution mass filter were combined (Level 2 assignment) there were 16% false negatives (3/19; Table 1). The false negative rate for requiring molecular ion may be even higher, as this validation set consisted of PAHs rbons and derivatives (over 50% of validation set); PAHs have a high molecular ion signal. Therefore, the generalized criteria we present do not necessitate Total Detected Level 3, All Criteria Level 2, All Criteria Level 2, Rentention Index Criteria Level 2, Reverse Search Index Criteria Level 2, Reverse High-Resolution Mass Filter Criteria Correct Molecular Ion Observed Benz[a]anthracene 25 14 3 5 18 16 11 Yes Yes Chrysene 39 19 3 4 33 24 11 Yes Yes Fluoranthene 54 28 9 12 34 43 6 Yes Yes Fluorene 33 29 7 9 32 29 7 Yes Yes Acenaphthylene 33 23 7 12 33 23 5 Yes Yes Naphthalene 40 32 8 11 40 32 9 Yes Yes Phenanthrene 38 38 7 7 38 38 13 Yes Yes Pyrene 53 23 7 7 36 34 6 Yes Yes 2-Chloronaphthalene 27 13 2 6 27 13 3 Yes Yes Dibenzofuran 49 29 5 16 48 29 12 Yes Yes Hexachlorobenzene 32 9 6 12 31 9 5 Yes Yes Hexachlorobutadiene 32 2 1 2 26 2 1 Yes Yes Tris(1-chloro-2-propyl) phosphate 20 5 3 7 17 7 0 No No 4,4'-Dibromooctafluorobiphenyl 27 1 1 2 27 1 1 Yes Yes Isophorone 49 46 19 19 49 46 8 Yes Yes N-Nitrosodiphenylamine 46 35 4 4 46 35 23 No No Butylbenzyl phthalate 21 7 4 7 18 7 3 Yes No Diethyl phthalate 28 18 4 6 27 18 0 Yes No Di-n-octyl phthalate 53 45 9 10 52 45 1 No No Percent Filtered 44 ± 28 84 ± 8 76 ± 10 9 ± 12 40 ± 27 83 ± 14 Number of Candidates Molecular Ion Observed for Correct Candidate Correct Level 2 Assignment Unlabeled Standard
  • 18. 18 the molecular ion (retention index removes many of the same candidates and the false negative rate of requiring molecular ion is too high), but depending on chemical class, for example when looking at PAHs, necessitating molecular ion may be a beneficial additional filter. When adding molecular ion as a requirement for Level 2 assignments a slight decrease in false positives was observed, but a significant increase in false negatives was also observed (Figure S1). 6. CONCLUDING REMARKS: UTILITY, POTENTIAL CHALLENGES, AND FUTURE WORK It is important to note that the assignment of confidence is based on filters, and often multiple candidates remain even after stringent filtering criteria. For example, for the passive air sampler dataset, only 2 out of the 19 features with a Level 1 assignment had a single candidate retained after Level 2 filtering, the remaining 89% of assigned features had more than one Level 2 candidate (Table 1). Current filter cutoffs are voluntaristic, based on expert consensus. Optimization of filters may be able to further reduce false positives, while limiting false negatives, but this will be compound class and workflow specific. Furthermore, multiple candidates will always exist for multiple features regardless of thresholds to keep false negatives low. Therefore, ranking algorithms are essential. When adding the Level 2 criteria to the ranking algorithm from Thermo Compound Discoverer a significant number of false positives were removed without a significant increase in false negatives for both validation datasets (Figure 2). Using search index scores for ranking (e.g., dot product) performed similarly, and hence simple scoring metrics can be chosen which are not vendor specific. Further ranking based on meta-data can also be beneficial using information such as database occurrence. For example, in the air samples evaluated, DEET was observed, but a close isomer which is not commonly found the environment was ranked first and DEET was ranked second (both Level 2 assignments). Database matching could be used to rank DEET higher; this compound was expected as individuals residing in residences where air samples were collected reported use of this insect repellant. We outline a qualitative scoring framework for assigning confidence to chemical assignments in GC-HRMS data. The proposed schema was developed to be actionable with easy implementation using most deconvolution and annotation algorithm outputs. Furthermore, we introduce an open access software tool (SIF-GC) to automate feature scoring. The assignment of confidence follows a five-level scoring framework that has been well-established by Schymanski et al. for LC-HRMS/MS. The criteria used to determine confidence for GC-HRMS are common metrics in the GC field, including blank filtering, (reverse) spectral matches, (reverse) high-resolution mass filters or exact mass spectral matches, retention index, and molecular ion. Application of the proposed scoring system to biological and environmental datasets resulted in a false positive rate of 12% and 11%, respectively, for Level 2 assignments, the highest level which can be assigned when using suspect screening approaches based on our schema. The false negative rate (removal of the correct candidate) for Level 2 assignments was found to be 16% for the air samples and 24% for the spiked human serum. When close isomers (e.g., position of chlorine(s) on PCBs or methyl groups on PAHs) are considered false positives this false positive rate is much higher (22% for air samples and 54% for human serum). Hence
  • 19. 19 when using suspect screening for GC, in most cases retention index alone could not distinguish close isomers. In this case, elution order rather than retention index can be valuable, although this is difficult to automate and implement on a large scale. While Level 2 assignments can provide relatively confident annotations, at least at the level of isomers with subtle difference from the exact structure, Level 3 assignments were more tentative, and confidence varied widely depending on structure. Nonetheless Level 3 criteria improved assignment confidence over not using any criteria when chemicals with heteroatoms are investigated. Furthermore, Level 4 assignments can be used to assign chemical classes, when for example, all PAHs are of interest even when exact structure cannot be determined. It is important to note that depending on matrix, acquisition methods, and analyte classes, different false positive and negative rates may be determined. Future work applying the confidence levels presented here on standard reference material can be used to have quantitative definition of levels of confidence a priori, e.g., Level 1 – 95%, Level 2 – 80%, and Level 3 – 50%. The use of filtering criteria decreased the false positive rate as compared to common ranking algorithms alone (e.g., Thermo Compound Discover’s ranking algorithm). For the >100 molecules validated in this study, use of retention index, reverse search index, and reverse high-resolution mass filtering did not substantially increase false negatives. Therefore, Level 2 annotations can be exclusively considered for many suspect screening applications. The requirement of molecular ion for EI did increase false negatives significantly and is only recommended for certain chemical classes which are highly stable, for example most PAHs. It is important to mention that the use of accurate mass, either through molecular ion accurate mass matches, high-resolution mass spectral libraries, or through reverse high-resolution mass filters or other techniques is beneficial for reducing false positives. We did not discern between instruments with different resolving power (our validation datasets were acquired at 40,000 and 60,000 for the air and serum samples, respectively), but this may influence the false positive and negative rate when using Level 2 and Level 3 filters. Furthermore, Orbitraps and other trap instruments may have ions generated which cannot be predicted from molecular formula, and hence Q-TOFs may have a lower false negative rate in this regard. While the criteria included in the proposed GC-HRMS scoring framework substantially decreased false positives and can be used to assign more confident annotations, other metrics may also be explored including chemical database occurrence (higher occurrence means it may be more likely to exist in samples), peak shape, and use of ECNI, PCI, APCI, and other ionization techniques for validation. Database matching may be especially helpful when close isomers are observed, but only a common isomer exists. Furthermore, even after Level 2 filtering, in nearly all cases multiple candidates were retained. Therefore, more quantitative scoring metrics for ranking need still be applied to determine the correct candidate. We find that total score provided by Compound Discoverer or simply using reverse search index to rank candidates works well in tandem with Level 2 filters. The Level 2 filters were further found to reduce false positives substantially using either of these ranking algorithms, then when using these ranking algorithms alone.
  • 20. 20 The proposed confidence scores provide a strategy for communicating levels of evidence that were used to predict annotation of chemicals detected using non-targeted analysis. While false positives were still present across all annotation levels, this approach assists in identifying the likelihood a match is correct and can be used to facilitate selection of potential compounds for further evaluation and validation using additional analytical techniques. This scoring framework can be easily extended to GC-HRMS data for exposome analysis and provides a foundation for linking annotations performed across multiple laboratories and studies and building cumulative databases.
  • 21. 21 REFERENCES Aalizadeh, R., Alygizakis, N.A., Schymanski, E.L., Krauss, M., Schulze, T., Ibáñez, M., McEachran, A.D., Chao, A., Williams, A.J., Gago-Ferrero, P., Covaci, A., Moschet, C., Young, T.M., Hollender, J., Slobodnik, J., Thomaidis, N.S., 2021. Development and Application of Liquid Chromatographic Retention Time Indices in HRMS-Based Suspect and Nontarget Screening. Anal. Chem. 93, 11601–11611. https://doi.org/10.1021/acs.analchem.1c02348 Allen, F., Pon, A., Greiner, R., Wishart, D., 2016. Computational Prediction of Electron Ionization Mass Spectra to Assist in GC/MS Compound Identification. Anal. Chem. 88, 7689–7697. https://doi.org/10.1021/acs.analchem.6b01622 Barwick, V.J., 1999. Sources of uncertainty in gas chromatography and high-performance liquid chromatography. Journal of Chromatography A 849, 13–33. https://doi.org/10.1016/S0021-9673(99)00537-3 Boegelsack, N., Sandau, C., McMartin, D.W., Withey, J.M., O’Sullivan, G., 2021. Development of retention time indices for comprehensive multidimensional gas chromatography and application to ignitable liquid residue mapping in wildfire investigations. Journal of Chromatography A 1635, 461717. https://doi.org/10.1016/j.chroma.2020.461717 Borgsmüller, N., Gloaguen, Y., Opialla, T., Blanc, E., Sicard, E., Royer, A.-L., Le Bizec, B., Durand, S., Migné, C., Pétéra, M., Pujos-Guillot, E., Giacomoni, F., Guitton, Y., Beule, D., Kirwan, J., 2019. WiPP: Workflow for Improved Peak Picking for Gas Chromatography-Mass Spectrometry (GC-MS) Data. Metabolites 9, 171. https://doi.org/10.3390/metabo9090171 Doherty, B.T., Koelmel, J.P., Lin, E.Z., Romano, M.E., Godri Pollitt, K.J., 2021. Use of Exposomic Methods Incorporating Sensors in Environmental Epidemiology. Curr Envir Health Rpt 8, 34–41. https://doi.org/10.1007/s40572-021-00306-8 Elie, N., Santerre, C., Touboul, D., 2019. Generation of a Molecular Network from Electron Ionization Mass Spectrometry Data by Combining MZmine2 and MetGem Software. Anal. Chem. 91, 11489–11492. https://doi.org/10.1021/acs.analchem.9b02802 Fujiyoshi, T., Ikami, T., Sato, T., Kikukawa, K., Kobayashi, M., Ito, H., Yamamoto, A., 2016. Evaluation of the matrix effect on gas chromatography – mass spectrometry with carrier gas containing ethylene glycol as an analyte protectant. Journal of Chromatography A 1434, 136–141. https://doi.org/10.1016/j.chroma.2015.12.085 Grőbler, A., 1972. A Polar Retention Index System for Gas-Liquid Chromatography. Journal of Chromatographic Science 10, 128. https://doi.org/10.1093/chromsci/10.2.128 Halket, J.M., Waterman, D., Przyborowska, A.M., Patel, R.K.P., Fraser, P.D., Bramley, P.M., 2005. Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS. Journal of Experimental Botany 56, 219–243. https://doi.org/10.1093/jxb/eri069 Hall, L.M., Hall, L.H., Kertesz, T.M., Hill, D.W., Sharp, T.R., Oblak, E.Z., Dong, Y.W., Wishart, D.S., Chen, M.-H., Grant, D.F., 2012. Development of Ecom50 and Retention Index Models for Nontargeted Metabolomics: Identification of 1,3-Dicyclohexylurea in Human Serum by HPLC/Mass Spectrometry. J. Chem. Inf. Model. 52, 1222–1237. https://doi.org/10.1021/ci300092s Hummel, J., Strehmel, N., Selbig, J., Walther, D., Kopka, J., 2010. Decision tree supported substructure prediction of metabolites from GC-MS profiles. Metabolomics 6, 322–333. https://doi.org/10.1007/s11306-010-0198-7 Jensen, S., 1972. The PCB Story. Ambio 1, 123–131. Kendrick, Edward., 1963. A Mass Scale Based on CH2 = 14.0000 for High Resolution Mass Spectrometry of Organic Compounds. Anal. Chem. 35, 2146–2154. https://doi.org/10.1021/ac60206a048
  • 22. 22 Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B.A., Thiessen, P.A., Yu, B., Zaslavsky, L., Zhang, J., Bolton, E.E., 2021. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Research 49, D1388–D1395. https://doi.org/10.1093/nar/gkaa971 Koelmel, J.P., Lin, E.Z., Guo, P., Zhou, J., He, J., Chen, A., Gao, Y., Deng, F., Dong, H., Liu, Y., Cha, Y., Fang, J., Beecher, C., Shi, X., Tang, S., Godri Pollitt, K.J., 2021. Exploring the external exposome using wearable passive samplers - The China BAPE study. Environ Pollut 270, 116228. https://doi.org/10.1016/j.envpol.2020.116228 Koelmel, J.P., Ulmer, C.Z., Jones, C.M., Yost, R.A., Bowden, J.A., 2017. Common cases of improper lipid annotation using high-resolution tandem mass spectrometry data and corresponding limitations in biological interpretation. Biochim Biophys Acta Mol Cell Biol Lipids 1862, 766–770. https://doi.org/10.1016/j.bbalip.2017.02.016 Koopman, J., Grimme, S., 2021. From QCEIMS to QCxMS: A Tool to Routinely Calculate CID Mass Spectra Using Molecular Dynamics. J. Am. Soc. Mass Spectrom. 32, 1735–1751. https://doi.org/10.1021/jasms.1c00098 Kováts, E., 1958. Gas-chromatographische Charakterisierung organischer Verbindungen. Teil 1: Retentionsindices aliphatischer Halogenide, Alkohole, Aldehyde und Ketone. Helvetica Chimica Acta 41, 1915–1932. https://doi.org/10.1002/hlca.19580410703 Kwiecien, N.W., Bailey, D.J., Rush, M.J.P., Cole, J.S., Ulbrich, A., Hebert, A.S., Westphall, M.S., Coon, J.J., 2015. High-Resolution Filtering for Improved Small Molecule Identification via GC/MS. Anal. Chem. 87, 8328–8335. https://doi.org/10.1021/acs.analchem.5b01503 Lai, Z., Tsugawa, H., Wohlgemuth, G., Mehta, S., Mueller, M., Zheng, Y., Ogiwara, A., Meissen, J., Showalter, M., Takeuchi, K., Kind, T., Beal, P., Arita, M., Fiehn, O., 2018. Identifying metabolites by integrating metabolome databases with mass spectrometry cheminformatics. Nat Methods 15, 53–56. https://doi.org/10.1038/nmeth.4512 Landrigan, P.J., Fuller, R., Acosta, N.J.R., Adeyi, O., Arnold, R., Basu, N. (Nil), Baldé, A.B., Bertollini, R., Bose-O’Reilly, S., Boufford, J.I., Breysse, P.N., Chiles, T., Mahidol, C., Coll-Seck, A.M., Cropper, M.L., Fobil, J., Fuster, V., Greenstone, M., Haines, A., Hanrahan, D., Hunter, D., Khare, M., Krupnick, A., Lanphear, B., Lohani, B., Martin, K., Mathiasen, K.V., McTeer, M.A., Murray, C.J.L., Ndahimananjara, J.D., Perera, F., Potočnik, J., Preker, A.S., Ramesh, J., Rockström, J., Salinas, C., Samson, L.D., Sandilya, K., Sly, P.D., Smith, K.R., Steiner, A., Stewart, R.B., Suk, W.A., Schayck, O.C.P. van, Yadama, G.N., Yumkella, K., Zhong, M., 2018. The Lancet Commission on pollution and health. The Lancet 391, 462–512. https://doi.org/10.1016/S0140- 6736(17)32345-0 Lee, M.L., Vassilaros, D.L., White, C.M., 1979. Retention indices for programmed-temperature capillary-column gas chromatography of polycyclic aromatic hydrocarbons. Anal. Chem. 51, 768–773. https://doi.org/10.1021/ac50042a043 Little, J.L., Howard, A.S., 2013. Qualitative Gas Chromatography–Mass Spectrometry Analyses Using Amines as Chemical Ionization Reagent Gases. J. Am. Soc. Mass Spectrom. 24, 1913–1918. https://doi.org/10.1007/s13361-013-0740-8 Mahieu, N.G., Huang, X., Chen, Y.-J., Patti, G.J., 2014. Credentialing Features: A Platform to Benchmark and Optimize Untargeted Metabolomic Methods. Anal. Chem. 86, 9583– 9589. https://doi.org/10.1021/ac503092d Mallett, T., 2014. Mass Spectral Libraries - Reproducibility in EI, API and Tandem Mass Spectrometry. Wiley Analytical Science. Martin, K.A.V., Lin, E.Z., Hilbert, T.J., Pollitt, K.J.G., Haynes, E.N., 2021. Survey of airborne organic compounds in residential communities near a natural gas compressor station: Response to community concern. Environmental Advances 5, 100076. https://doi.org/10.1016/j.envadv.2021.100076
  • 23. 23 Matsuo, T., Tsugawa, H., Miyagawa, H., Fukusaki, E., 2017. Integrated Strategy for Unknown EI–MS Identification Using Quality Control Calibration Curve, Multivariate Analysis, EI– MS Spectral Database, and Retention Index Prediction. Anal. Chem. 89, 6766–6773. https://doi.org/10.1021/acs.analchem.7b01010 Matyushin, D.D., Sholokhova, A.Yu., Karnaeva, A.E., Buryak, A.K., 2020. Various aspects of retention index usage for GC-MS library search: A statistical investigation using a diverse data set. Chemometrics and Intelligent Laboratory Systems 202, 104042. https://doi.org/10.1016/j.chemolab.2020.104042 Mazur, D.M., Detenchuk, E.A., Sosnova, A.A., Artaev, V.B., Lebedev, A.T., 2021. GC-HRMS with Complementary Ionization Techniques for Target and Non-target Screening for Chemical Exposure: Expanding the Insights of the Air Pollution Markers in Moscow Snow. Science of The Total Environment 761, 144506. https://doi.org/10.1016/j.scitotenv.2020.144506 McEachran, A.D., Sobus, J.R., Williams, A.J., 2017. Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard. Anal Bioanal Chem 409, 1729–1735. https://doi.org/10.1007/s00216-016-0139-z Miwa, T.K., Mikolajczak, K.L., Earle, F.R., Wolbt, I.A., 1960. Gas chromatographic characterization of fatty acids. Identification constants for mono- and tlicarboxylic methyl esters. Analytical Chemistry 32, 1739–1742. National Report on Human Exposure to Environmental Chemicals, 2021. https://doi.org/10.15620/cdc:105345 Patterson, R.E., Kirpich, A.S., Koelmel, J.P., Kalavalapalli, S., Morse, A.M., Cusi, K., Sunny, N.E., McIntyre, L.M., Garrett, T.J., Yost, R.A., 2017. Improved experimental data processing for UHPLC–HRMS/MS lipidomics applied to nonalcoholic fatty liver disease. Metabolomics 13, 142. https://doi.org/10.1007/s11306-017-1280-1 Peng, C.T., 2010a. Prediction of retention indices. VI: Isothermal and temperature-programmed retention indices, methylene value, functionality constant, electronic and steric effects. Journal of Chromatography A 1217, 3683–3694. https://doi.org/10.1016/j.chroma.2010.02.005 Peng, C.T., 2010b. Prediction of retention indices. VI: Isothermal and temperature-programmed retention indices, methylene value, functionality constant, electronic and steric effects. Journal of Chromatography A 1217, 3683–3694. https://doi.org/10.1016/j.chroma.2010.02.005 Prüss-Üstün, A., Wolf, J., Corvalán, C., Organization, W.H., Bos, R., Neira, D.M., 2016. Preventing Disease Through Healthy Environments: A Global Assessment of the Burden of Disease from Environmental Risks. World Health Organization. Rappaport, S.M., 2016. Genetic Factors Are Not the Major Causes of Chronic Diseases. PLOS ONE 11, e0154387. https://doi.org/10.1371/journal.pone.0154387 Risum, A.B., Bro, R., 2019. Using deep learning to evaluate peaks in chromatographic data. Talanta 204, 255–260. https://doi.org/10.1016/j.talanta.2019.05.053 Rostkowski, P., Haglund, P., Aalizadeh, R., Alygizakis, N., Thomaidis, N., Arandes, J.B., Nizzetto, P.B., Booij, P., Budzinski, H., Brunswick, P., Covaci, A., Gallampois, C., Grosse, S., Hindle, R., Ipolyi, I., Jobst, K., Kaserzon, S.L., Leonards, P., Lestremau, F., Letzel, T., Magnér, J., Matsukami, H., Moschet, C., Oswald, P., Plassmann, M., Slobodnik, J., Yang, C., 2019. The strength in numbers: comprehensive characterization of house dust using complementary mass spectrometric techniques. Anal Bioanal Chem 411, 1957–1977. https://doi.org/10.1007/s00216-019-01615-6 Sakthiselvi, T., Paramasivam, M., Vasanthi, D., Bhuvaneswari, K., 2020. Persistence, dietary and ecological risk assessment of indoxacarb residue in/on tomato and soil using GC– MS. Food Chemistry 328, 127134. https://doi.org/10.1016/j.foodchem.2020.127134
  • 24. 24 Scheubert, K., Hufsky, F., Petras, D., Wang, M., Nothias, L.-F., Dührkop, K., Bandeira, N., Dorrestein, P.C., Böcker, S., 2017. Significance estimation for large scale metabolomics annotations by spectral matching. Nat Commun 8, 1494. https://doi.org/10.1038/s41467- 017-01318-5 Schreckenbach, S.A., Anderson, J.S.M., Koopman, J., Grimme, S., Simpson, M.J., Jobst, K.J., 2021. Predicting the Mass Spectra of Environmental Pollutants Using Computational Chemistry: A Case Study and Critical Evaluation. J. Am. Soc. Mass Spectrom. 32, 1508–1518. https://doi.org/10.1021/jasms.1c00078 Schrimpe-Rutledge, A.C., Codreanu, S.G., Sherrod, S.D., McLean, J.A., 2016. Untargeted Metabolomics Strategies—Challenges and Emerging Directions. J. Am. Soc. Mass Spectrom. 27, 1897–1905. https://doi.org/10.1007/s13361-016-1469-y Schymanski, E.L., Gallampois, C.M.J., Krauss, M., Meringer, M., Neumann, S., Schulze, T., Wolf, S., Brack, W., 2012. Consensus Structure Elucidation Combining GC/EI-MS, Structure Generation, and Calculated Properties. Anal. Chem. 84, 3287–3295. https://doi.org/10.1021/ac203471y Schymanski, E.L., Jeon, J., Gulde, R., Fenner, K., Ruff, M., Singer, H.P., Hollender, J., 2014. Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence. Environ. Sci. Technol. 48, 2097–2098. https://doi.org/10.1021/es5002105 Schymanski, E.L., Kondić, T., Neumann, S., Thiessen, P.A., Zhang, J., Bolton, E.E., 2021. Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag. Journal of Cheminformatics 13, 19. https://doi.org/10.1186/s13321-021-00489- 0 Škrbić, B., Onjia, A., 2006. Prediction of the Lee retention indices of polycyclic aromatic hydrocarbons by artificial neural network. Journal of Chromatography A 1108, 279–284. https://doi.org/10.1016/j.chroma.2006.01.080 Spackman, P.R., Bohman, B., Karton, A., Jayatilaka, D., 2018. Quantum chemical electron impact mass spectrum prediction for de novo structure elucidation: Assessment against experimental reference data and comparison to competitive fragmentation modeling. International Journal of Quantum Chemistry 118, e25460. https://doi.org/10.1002/qua.25460 Stein, S., 2012. Mass Spectral Reference Libraries: An Ever-Expanding Resource for Chemical Identification. Anal. Chem. 84, 7274–7282. https://doi.org/10.1021/ac301205z Stein, S.E., 1995. Chemical substructure identification by mass spectral library searching. J. Am. Soc. Mass Spectrom. 6, 644–655. https://doi.org/10.1016/1044-0305(95)00291-K Stein, S.E., Babushok, V.I., Brown, R.L., Linstrom, P.J., 2007a. Estimation of Kováts Retention Indices Using Group Contributions. J. Chem. Inf. Model. 47, 975–980. https://doi.org/10.1021/ci600548y Stein, S.E., Babushok, V.I., Brown, R.L., Linstrom, P.J., 2007b. Estimation of Kováts Retention Indices Using Group Contributions. J. Chem. Inf. Model. 47, 975–980. https://doi.org/10.1021/ci600548y Stokstad, E., 2020. Why were salmon dying? The answer washed off the road. Science 370, 1145–1145. https://doi.org/10.1126/science.370.6521.1145 Strehmel, N., Hummel, J., Erban, A., Strassburg, K., Kopka, J., 2008. Retention index thresholds for compound matching in GC–MS metabolite profiling. Journal of Chromatography B, Hyphenated Techniques for Global Metabolite Profiling 871, 182– 190. https://doi.org/10.1016/j.jchromb.2008.04.042 Travis, S.C., Kordas, K., Aga, D.S., 2021. Optimized workflow for unknown screening using gas chromatography high-resolution mass spectrometry expands identification of contaminants in silicone personal passive samplers. Rapid Commun Mass Spectrom 35, e9048. https://doi.org/10.1002/rcm.9048
  • 25. 25 US EPA, O., 2021. What Killed the Eagles? EPA Researchers Help Solve 25+ Year Mystery [WWW Document]. URL https://www.epa.gov/sciencematters/what-killed-eagles-epa- researchers-help-solve-25-year-mystery (accessed 4.7.22). Vermeulen, R., Schymanski, E.L., Barabási, A.-L., Miller, G.W., 2020. The exposome and health: Where chemistry meets biology. Science 367, 392–396. https://doi.org/10.1126/science.aay3164 von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P., 2002. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403. https://doi.org/10.1038/nature750 Wang, S., Kind, T., Tantillo, D.J., Fiehn, O., 2020. Predicting in silico electron ionization mass spectra using quantum chemistry. Journal of Cheminformatics 12, 63. https://doi.org/10.1186/s13321-020-00470-3 Wei, J.N., Belanger, D., Adams, R.P., Sculley, D., 2019. Rapid Prediction of Electron–Ionization Mass Spectrometry Using Neural Networks. ACS Cent. Sci. 5, 700–708. https://doi.org/10.1021/acscentsci.9b00085 Wei, X., Koo, I., Kim, S., Zhang, X., 2014. Compound identification in GC-MS by simultaneously evaluating the mass spectrum and retention index. Analyst 139, 2507–2514. https://doi.org/10.1039/C3AN02171H Wheelock, C.E., Rappaport, S.M., 2020. The role of gene–environment interactions in lung disease: the urgent need for the exposome. European Respiratory Journal 55. https://doi.org/10.1183/13993003.02064-2019 Wild, C.P., 2012. The exposome: from concept to utility. International Journal of Epidemiology 41, 24–32. https://doi.org/10.1093/ije/dyr236 Wild, C.P., 2005. Complementing the Genome with an “Exposome”: The Outstanding Challenge of Environmental Exposure Measurement in Molecular Epidemiology. Cancer Epidemiol Biomarkers Prev 14, 1847–1850. https://doi.org/10.1158/1055-9965.EPI-05-0456 Williams, A.J., Grulke, C.M., Edwards, J., McEachran, A.D., Mansouri, K., Baker, N.C., Patlewicz, G., Shah, I., Wambaugh, J.F., Judson, R.S., Richard, A.M., 2017. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. Journal of Cheminformatics 9, 61. https://doi.org/10.1186/s13321-017-0247-6 Zellner, B. d’Acampora, Bicchi, C., Dugo, P., Rubiolo, P., Dugo, G., Mondello, L., 2008. Linear retention indices in gas chromatographic analysis: a review. Flavour and Fragrance Journal 23, 297–314. https://doi.org/10.1002/ffj.1887 Zhang, P., Carlsten, C., Chaleckis, R., Hanhineva, K., Huang, M., Isobe, T., Koistinen, V.M., Meister, I., Papazian, S., Sdougkou, K., Xie, H., Martin, J.W., Rappaport, S.M., Tsugawa, H., Walker, D.I., Woodruff, T.J., Wright, R.O., Wheelock, C.E., 2021. Defining the Scope of Exposome Studies and Research Needs from a Multidisciplinary Perspective. Environ. Sci. Technol. Lett. 8, 839–852. https://doi.org/10.1021/acs.estlett.1c00648
  • 26. 26 SUPPLEMENTAL INFORMATION An Actionable Annotation Scoring Framework For GC-HRMS Jeremy P. Koelmel1, Hongyu Xie 2, Elliott J. Price3, Elizabeth Z. Lin1, Katherine E. Manz4, Paul Stelben1, Mathew Paige1, Stefano Papazian2, Joseph Okeme1, Dean P. Jones5, Dinesh Barupal5, John Bowden11,12, Pawel Rostkowski6, Kurt D. Pennell4, Vladimir Nikiforov7, Thanh Wang8, Xin Hu10, Yunjia Lai9, Gary W. Miller9, Douglas I. Walker5*, Jonathan W. Martin2, and Krystal J. Godri Pollitt1* 1 Department of Environmental Health Science, Yale School of Public Health, New Haven, CT, 06520, United States 2 Department of Environmental Science, Science for Life Laboratory, Stockholm University, Stockholm 10691, Sweden 3 RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic 4 School of Engineering, Brown University, Providence, RI 02912, United States 5 Department of Medicine, School of Medicine, Emory University, Atlanta 30322, United States 6 Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, United States 7 NILU- Norwegian Institute for Air Research, 2007 Kjeller, Norway 8 NILU- Norwegian Institute for Air Research, Framsenteret, 9007 Tromsø, Norway 9 MTM Research Centre, Örebro University, SE-701 82, Örebro, Sweden 10 Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032, United States 11 Center for Environmental and Human Toxicology & Department of Physiological Sciences, University of Florida, Gainesville, FL, 32603, United States 12 Department of Chemistry, University of Florida, Gainesville, FL, 32603 United States *Corresponding author Krystal J. Godri Pollitt - krystal.pollitt@yale.edu Douglas I. Walker - douglas.walker@mssm.edu
  • 27. 27 S1. LEVERAGING EVIDENCE FROM GC-HRMS TO REDUCE FALSE POSITIVES AND NEGATIVES S1.1. Feature filtering to remove artifacts and signal from background After acquiring data from blanks which are both field and extraction blanks, algorithms can be implemented to establish a compound specific blank threshold. If a certain percentile of samples does not fall above this blank threshold, the feature can be removed. For example, one method, blank feature filtering has been found to outperform most other filtering methods (Koelmel et al., 2021; Patterson et al., 2017). Using this method, the percentile or average of samples must be greater than the blank feature filter threshold (BFFthreshold) calculated as: BFFthreshold = c x (Baverage+3×Bstandard deviation) Where c is a constant (usually ranging between 2 and 10) and B is the blank signal. A complimentary method to remove blank signatures is to look for signals which do not behave like well retained compounds (Borgsmüller et al., 2019; Risum and Bro, 2019). Many background signals (e.g., column bleed) will not form sharp peaks and/or will be highly variable in quality control samples injected across the instrument run. Consistency of deconvoluted spectra with respect to differing intensity in samples across batch in quality control samples or pools can therefore be used to remove background features (when many QCs are acquired). Furthermore, quality control calibration curves can be used to remove non-linear peaks, but this requires that the measured compounds are generally within the linear range in the quality controls (Matsuo et al., 2017). In this case non-linear peaks at higher concentrations of the quality controls likely indicate that the source of the chemical is not from the sample, and if it is, that relative comparisons between samples is challenging. All these methods which do not implement a blank sample will not remove a well-behaved chromatographic peak which is a contaminant, and hence blank feature filtering is recommended as well if these techniques are used. Other methods can be used to generate a unique signature for compounds, therefore, rather than determine and remove blank signals, the analytes can be determined uniquely and retained. One such method is credentialing using isotopic labeling (Mahieu et al., 2014). This method incorporates isotopic labeling into cell cultures, and screens for the characteristic isotopic signals to only retain those metabolites which are generated in the experiment, effectively removing nearly all background. This method is specific to in vitro studies, and hence to only a subset of internal exposome studies. S1.2. Alternative EI spectral libraries and ionization methods to improve coverage For the discovery of unknown unknowns where no libraries or standards exist, the in-silico prediction of EI spectra (Allen et al., 2016; Spackman et al., 2018; Wang et al., 2020; Wei et al., 2019) is more straight forward than collision-induced dissociation (CID) (Koopman and Grimme, 2021) and higher energy C-trap dissociation (HCD) spectra used for LC- HRMS analysis. Although, it is important to recognize that false positive rates vary across all in silico approaches (Schreckenbach et al., 2021). In addition to predicted spectral matching, EI spectra can be used for molecular networks and substructure characterization (e.g., (Elie et al., 2019; Hummel et al., 2010; Lai et al., 2018; Stein,
  • 28. 28 1995)). The high degree of information contained in EI fragmentation reduces the number of similar spectral matches and allows for detailed substructural analysis (e.g., chemical similarity networks). Alternative ionization strategies can be used with GC-HRMS to improve detection of the molecular ion, although these often lack the robustness, sensitivity, and reproducibility of EI. These techniques include positive chemical ionization (PCI), electron capture negative ionization (ECNI), and atmospheric pressure chemical ionization (APCI). Unlike EI, the ionization efficiency for PCI, ECNI, and APCI is dependent on chemical structure. While ECNI is the least universal ionization technique (in terms of ionization efficiencies across chemical classes), this approach can be highly selective and provide high sensitivity for ions which are formed favorably by electron capture (e.g., halide containing compounds such as chlorinated and brominated byproducts). PCI can be adapted for various chemical analytes because reagent gases are ionized and used to transfer positive charge to the analytes. Whereas methane gas is usually used, other reagent gases can be selected to increase sensitivity for specific classes of molecules (Little and Howard, 2013). APCI can provide higher sensitivity for certain molecules, for example, negative mode APCI, like ECNI, can reveal halogenated compounds (such as halogenated paraffins) that standard GC-EI-MS misses. While these methods can be challenging to implement in standardized nontargeted workflows, they provide an important alternative ionization method for detection molecular ions and predicting potential formulas in the identification of unknowns. S2. METHODS: ACQUISITION OF GC-HRMS DATASETS S2.1. Acquisition of the human serum validation dataset The GC-HRMS dataset of human serum samples was obtained by spiking 112 standards (90 native and 22 labeled chemicals) to commercial pooled human serum (Type AB, Sigma-Aldrich, Germany), and 500 µL aliquots were extracted after removal of proteins and lipids at Stockholm University. After spiking with the standard mixture (list in Table S1A), the extracts were analyzed by GC-HRMS (Q Exactive GC Orbitrap, Thermo Scientific) using electron ionization (EI) in full scan mode (44–750 m/z, 60,000 resolution FWHM at 200 m/z). The injection volume was 1 µL for each run. A capillary DB5 column (30 m × 0.25 mm × 0.25 µm film thickness) was used with the temperature program: 50 °C held for 1 min, ramped 10 °C min−1 to 315 °C and held for 8 min. S2.2. Acquisition of the outdoor and indoor air validation dataset Passive air samplers (Fresh Air Clips) were deployed inside and outside homes proximal to a natural gas compressor station in Jefferson County, Ohio. Methods have been previously described (Martin et al., 2021). Briefly, airborne containments were collected using polydimethylsiloxane (PDMS) sorbent bars that were housed in a polytetrafluoroethylene (PTFE) chamber. This assembly was mounted in a magnetic clip under a weather cover during the sampling period. PDMS sorbent bars were cleaned at the Yale School of Public Health and shipped together with other materials to the study site. Four cleaned PDMS sorbent bars were placed into the PTFE chamber and deployed at indoor and outdoor locations at the home of study participants. At the end of the sampling period, passive air samples were collected, shipped back to the Yale School of
  • 29. 29 Public Health using cold chain transport, and stored at -80 °C until analysis. Samples were spiked with a mixture of labelled standards and extracted by thermal desorption (Gerstel TDU) and analyzed by GC-HRMS (Q Exactive GC Orbitrap, Thermo Scientific) using electron ionization (EI) in full scan mode (53–800 m/z, 40,000 resolution FWHM at 200 m/z). A capillary TG-5SILMS column (30 m × 0.25 mm × 0.25 µm film thickness) was used with the temperature program: 70 °C held for 1 min, ramped 7 °C min−1 to 300 °C and held for 4 min. A total of 81 chemicals were assessed (Table S1B).
  • 30. 30 Table S1A. List of native and isotopic labeled standards spiked into human serum samples. Acronyms: brominated flame retardants (BFR); chlorinated flame retardants (CFR); organochlorine pesticides (OCP); polycyclic aromatic hydrocarbons (PAH); oxygenated polycyclic aromatic hydrocarbons (oxy-PAH); nitrated polycyclic aromatic hydrocarbons (nitro-PAH); polychlorinated biphenyls (PCB); polychlorinated dibenzodioxins (PcDD); polychlorinated dibenzofurans (PcDF). Chemical Class Detected Metabolite CAS-RN Molecular Formula Concentration (ppb) BFR 2,4,4´-Tribromodiphenyl ether (BDE-28) 41318-75-6 C12H7Br3O 16 BFR 2,2´,4,4´-Tetrabromodiphenyl ether (BDE-47) 5436-43-1 C12H6Br4O 16 BFR 2,3´,4´,4-Tetrabromodiphenyl ether (BDE-66) 189084-61-5 C12H6Br4O 16 BFR 2,2´,3,4,4´-Pentabromodiphenyl ether (BDE-85) 182346-21-0 C12H5Br5O 16 BFR 2,2´,4,4´,5-Pentabromodiphenyl ether (BDE-99) 60348-60-9 C12H5Br5O 16 BFR 2,2´,4,4´,6-Pentabromodiphenyl ether (BDE-100) 189084-64-8 C12H5Br5O 16 BFR 2,2´,4,4´,5,5´-Hexabromodiphenyl ether (BDE-153) 68631-49-2 C12H4Br6O 16 BFR 2,2´,4,4´,5,6´-Hexabromodiphenyl ether (BDE-154) 207122-15-4 C12H4Br6O 16 CFR Dechlorane 603 13560-92-4 C17H8Cl12 16 OCP Pentachlorobenzene (PeCB) 608-93-5 C6HCl5 16 OCP alpha-Hexachlorocyclohexane (α-HCH) 319-84-6 C6H6Cl6 16 OCP Hexachlorobenzene (HCB) 118-74-1 C6Cl6 16 OCP beta-Hexachlorocyclohexane (β-HCH) 319-85-7 C6H6Cl6 16 OCP gamma-Hexachlorocyclohexane (γ-HCH) 58-89-9 C6H6Cl6 16 OCP delta-Hexachlorocyclohexane (δ-HCH) 608-73-1 C6H6Cl6 16 OCP Heptachlor 76-44-8 C10H5Cl7 16 OCP Aldrin 309-00-2 C12H8Cl6 16 OCP Heptachlor epoxide B 1024-57-3 C10H5Cl7O 16 OCP Heptachlor Epoxide A 28044-83-9 C10H5Cl7O 16 OCP β-chlordane(trans-Chlordane) 5103-74-2 C10H6Cl8 16 OCP O, P’-DDE 3424-82-6 C14H8Cl4 16 OCP α-chlordane(cis-Chlordane) 5103-71-9 C10H6Cl8 16 OCP trans-Nonachlor 39765-80-5 C10H5Cl9 16 OCP p,p’-DDE 72-55-9 C14H8Cl4 16 OCP Dieldrin 60-57-1 C12H8OCl6 16 OCP O, P’-DDD 53-19-0 C14H10Cl4 16 OCP Endrin 72-20-8 C12H8Ocl6 16 OCP p,p’-DDD 72-54-8 C14H10Cl4 16 OCP cis-Nonachlor 5103-73-1 C10H5Cl9 16 OCP o,p’-DDT 789-02-6 C14H9Cl5 16 OCP p,p’-DDT 50-29-3 C14H9Cl5 16 Oxy-PAH Phenalen-1-one 548-39-0 C13H8O 16 Oxy-PAH Anthracene-9,10-dione 483-35-2 C14H8O2 16 Oxy-PAH 4H-Cyclopenta[def]phenanthrene-4-one 5737-13-3 C15H8O 16 Oxy-PAH 2-Methylanthracene-9,10-dione 84-54-8 C15H10O2 16 Oxy-PAH 9H-Fluoren-9-one 486-25-9 C13H8O 16 Nitro-PAH 1-Nitro naphthalene 86-57-7 C10H7NO2 18 PCB 2-Chlorobiphenyl (PCB 1) 2051-60-7 C12H9Cl 16 PCB 4-Chlorobiphenyl (PCB 3) 2051-62-9 C12H9Cl 16 PCB 2,2’-Dichlorobiphenyl (PCB 4) 13029-08-8 C12H8Cl2 16 PCB 2,2’,6-Trichlorobiphenyl (PCB19) 38444-73-4 C12H7Cl3 16 PCB 4,4’-Dichlorobiphenyl (PCB 15) 2050-68-2 C12H8Cl2 16 PCB 2,2’,6,6’-Tetrachlorobiphenyl (PCB54 15968-05-5 C12H6Cl4 16 PCB 2,2’,4,6,6’-Pentachlorobiphenyl (PCB 104) 56558-16-8 C12H5Cl5 16 PCB 3,4,4’-Trichlorobiphenyl (PCB37) 38444-90-5 C12H7Cl3 16 PCB 2,2’,4,4’,6,6’-Hexachlorobiphenyl (PCB 155 33979-03-2 C12H4Cl6 16 PCB 3,3’,4,4’-Tetrachlorobiphenyl (PCB 77 32598-13-3 C12H8Cl4 16 PCB 3,4,4’,5-Tetrachlorobiphenyl (PCB 81) 70362-50-4 C12H6Cl4 16 PCB 2’,3,4,4’,5-Pentachlorobiphenyl (PCB 123) 65510-44-3 C12H5Cl5 16 PCB 2,3’,4,4’,5-Pentachlorobiphenyl (PCB 118) 31508-00-6 C12H5Cl5 16 PCB 2,3,4,4’,5-Pentachlorobiphenyl (PCB 114) 74472-37-0 C12H5Cl5 16
  • 31. 31 PCB 2,2’,3,4’,5,6,6’-Heptachlorobiphenyl (PCB 188 74487-85-7 C12H3Cl7 16 PCB 2,2’,3,4,4’,5’ -Hexachlorobiphenyl (PCB 138) 52712-04-6 C12H4Cl6 16 PCB 2,3,3’,4,4’-Pentachlorobiphenyl (PCB 105) 32598-14-4 C12H5Cl5 16 PCB 3,3’,4,4’,5-Pentachlorobiphenyl (PCB 126) 57465-28-8 C12H5Cl5 16 PCB 2,3’,4,4’,5,5’-Hexachlorobiphenyl (PCB 167) 52663-72-6 C12H4Cl6 16 PCB 2,2’,3,3’,5,5’,6,6’-Octachlorobiphenyl (PCB 202) 2136-99-4 C12H2Cl8 16 PCB 2,3,3’,4,4’,5-Hexachlorobiphenyl (PCB 156) 38380-08-4 C12H4Cl6 16 PCB 2,3,3’,4,4’,5’-Hexachlorobiphenyl (PCB 157) 69782-90-7 C12H4Cl6 16 PCB 3,3’,4,4’,5,5’-Hexachlorobiphenyl (PCB 169) 32774-16-6 C12H4Cl6 16 PCB 2,3,3’,4,4’,5,5’-Heptachlorobiphenyl (PCB 189) 39635-31-9 C12H3Cl7 16 PCB 2,2’,3,3’,4,5,5’,6,6’-Nonachlorobiphenyl (PCB 208 52663-77-1 C12HCl9 16 PCB 2,3,3’,4,4’,5,5’,6-Octachlorobiphenyl (PCB 205 74472-53-0 C12H2Cl8 16 PCB 2,2’,3,3’,4,4’,5,5’,6-Nonachlorobiphenyl (PCB 206) 40186-72-9 C12HCl9 16 PCB 2,2’,3,3’,4,4’,5,5’,6,6’-Decachlorobiphenyl (PCB 209) 2051-24-3 C12Cl10 16 PcDD 2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) 1746-01-6 C12H4Cl4O2 6.7 PcDD 1,2,3,7,8-Pentachlorodibenzo-p-dioxin (PnCDD/PeCDD) 40321-76-4 C12H3Cl5O2 16.8 PcDD 1,2,3,6,7,8-Hexachlorodibenzo-p-dioxin (HxCDD) 57653-85-7 C12H2Cl6O2 16.8 PcDD 1,2,3,4,6,7,8-Heptachlorodibenzo-p-dioxin (HpCDD) 35822-46-9 C12HCl7O2 16.8 PcDD 1,2,3,4,6,7,8,9-Octachlorodibenzo-p-dioxin (OCDD) 3268-87-9 C12Cl8O2 33.6 PcDF 2,3,7,8-Tetrachlorodibenzofuran (TCDF) 51207-31-9 C12H4Cl4O 6.7 PcDF 1,2,3,7,8-Pentachlorodibenzofuran (PnCDF/PeCDF) 57117-41-6 C12H3Cl5O 16.8 PcDF 1,2,3,6,7,8-Hexachlorodibenzofuran (HxCDF) 57117-44-9 C12H2Cl6O 16.8 PcDF 1,2,3,4,6,7,8-Heptachlorodibenzofuran (HpCDF) 67562-39-4 C12HCl7O 16.8 PcDF 1,2,3,4,6,7,8,9-Octachlorodibenzofuran (OCDF) 39001-02-0 C12Cl8O 33.6 Phthalate Dimethyl phthalate 131-11-3 C10H10O4 28 Phthalate Diethyl phthalate 84-66-2 C12H14O4 28 Phthalate Diisobutyl phthalate 84-69-5 C16H22O4 28 Phthalate Di-n-butyl phthalate 84-74-2 C16H22O4 28 Phthalate Diisohexyl phthalate 84-63-9 C20H30O4 28 Phthalate Diethoxyethyl phthalate 605-54-9 C16H22O6 28 Phthalate Dipentyl phthalate 131-18-0 C18H26O4 28 Phthalate Dihexyl phthalate 84-75-3 C20H30O4 28 Phthalate Benzyl butyl phthalate 85-68-7 C19H20O4 28 Phthalate Di(2-butoxyethyl) phthalate 117-83-9 C20H30O6 28 Phthalate Dicyclohexyl phthalate 84-61-7 C20H26O4 28 Phthalate Di-2-ethylhexyl phthalate 117-81-7 C24H38O4 28 Phthalate Diphenyl phthalate 84-62-8 C20H14O4 28 Phthalate Di-n-octyl phthalate 117-84-0 C24H38O4 28 Phthalate Dinonyl phthalate 84-76-4 C26H42O4 28 Labeled PAH Acenaphthene-d10 -- C12D10 28 Labeled PAH Fluorene-d10 -- C13D10 28 Labeled PAH Phenanthrene-d10 -- C14D10 28 Labeled PAH Anthracene-d10 -- C14D10 28 Labeled PAH Fluoranthene-d10 -- C16D10 28 Labeled PAH Pyrene-d10 -- C16D10 28 Labeled PAH Benz[a]anthracene-d12 -- C18D12 28 Labeled PAH Chrysene-d12 -- C18D12 28 Labeled PAH Benzo[b]fluoranthene-d12 -- C20D12 28 Labeled PAH Benzo[k]fluoranthene-d12 -- C20D12 28 Labeled PAH Benzo[a]pyrene-d12 -- C20D12 28 Labeled PAH Indeno[1,2,3-cd]pyrene-d12 -- C22D12 28 Labeled PAH Dibenzo[a,h]anthracene-d14 -- C22D14 28 Labeled PAH Benzo[g,h,i]perylene-d12 -- C22D12 28 Labeled PAH Naphthalene-d8 -- C10D8 28 Labeled PAH Acenaphthylene-d8 -- C12D8 28 Labeled PCB 2,4,4’-Trichlorobiphenyl-13C12 (PCB 28) -- C12H7Cl3 10 Labeled PCB 2,2’,5,5’-Tetrachlorobiphenyl-13C12 (PCB 52) -- 13C12H6Cl4 10 Labeled PCB 2,2’,4,5,5’-Pentachlorobiphenyl-13C12 (PCB 101) -- 13C12H5Cl5 10
  • 32. 32 Labeled PCB 2,2’,3,4,4’,5’ -Hexachlorobiphenyl-13C12 (PCB 138) -- 13C12H4Cl6 10 Labeled PCB 2,2’,4,4’,5,5’-Hexachlorobiphenyl-13C12 (PCB 153) -- 13C12H4Cl6 10 Labeled PCB 2,2’,3,4,4’,5,5’-Heptachlorobiphenyl-13C12 (PCB 180) -- 13C12H3Cl7 10
  • 33. 33 Table S1B. List of native and isotopic labeled standards used for targeted analysis (Level 1 identifications) in air samples. Acronyms: Brominated flame retardants (BFR); hydrocarbon (HC); organochlorine pesticides (OCP); organophosphates (OPE); polycyclic aromatic hydrocarbons (PAH); polychlorinated biphenyls (PCB); VOC: volatile organic compounds (VOC). Chemical Class Detected Metabolite CAS-RN Molecular Formula Alkaloid Nicotine 54-11-5 C10H14N2 Benzodioxole Fludioxonil 131341-86-1 C12H6F2N2O2 Benzopyran Delta-9-tetrahydrocannabinol 1972-08-3 C21H30O2 BFR 2,4,4'-Tribromodiphenyl ether (BDE-28) 41318-75-6 C12H7Br3O BFR 2,2',4,4'-Tetrabromodiphenyl ether (BDE-66) 5436-43-1 C12H6Br4O BFR 2,2',4,4',5-Pentabromodiphenyl ether (BDE-99) 60348-60-9 C12H5Br5O BFR 2,2',4,4',6-Pentabromodiphenyl ether (BDE-100) 189084-64-8 C12H5Br5O BFR 2,2',4,4',5,5'-Hexabromodiphenyl ether (BDE-153) 68631-49-2 C12H4Br6O BFR 2,2',4,4',5,6'-Hexabromodiphenyl ether (BDE-154) 207122-15-4 C12H4Br6O BFR 2-Ethylhexyl 2,3,4,5-tetrabromobenzoate 183658-27-7 C15H18Br4O2 Chlorinated HC 1,2,4-Trichlorobenzene 120-82-1 C6H3Cl3 Chlorinated HC Hexachlorobutadiene (HCBD) 87-68-3 C4Cl6 Chlorinated HC Hexachlorocyclopentadiene 77-47-4 C5Cl6 Chlorinated HC Hexachloroethane 67-72-1 C2Cl6 Haloethers Bis(2-chloro-1-methylethyl) ether (BCEE) 108-60-1 C6H12Cl2O Haloethers Bis(2-chloroethyl)ether 111-44-4 C4H8Cl2O Haloethers Carfentrazone-ethyl 128639-02-1 C15H14Cl2F3N3O3 Nitroaromatic 2,4-Dinitrotoluene 121-14-2 C7H6N2O4 Nitroaromatic 2,6-Dinitrotoluene 606-20-2 C7H6N2O4 Nitroaromatic 4-Chloroaniline 106-47-8 C6H6ClN Nitroaromatic 4-Nitroaniline 100-01-6 C6H6N2O2 Nitroaromatic Isophorone 78-59-1 C9H14O Nitrosoamine Nitrobenzene 98-95-3 C6H5NO2 Nitrosoamine N-Nitrosodi-n-propylamine 621-64-7 C6H14N2O Nitrosoamine N-Nitrosodiphenylamine 86-30-6 C12H10N2O OCP alpha-Hexachlorocyclohexane (α-HCH) 319-84-6 C6H6Cl6 OCP Chlorothalonil 1897-45-6 C8Cl4N2 OCP Dieldrin 60-57-1 C12H8Cl6O OCP Endosulfan I 959-98-8 C9H6Cl6O3S OCP Endrin 72-20-8 C12H8Cl6O OCP gamma-Hexachlorocyclohexane (γ-HCH) 58-89-9 C6H6Cl6 OCP Hexachlorobenzene 118-74-1 C6Cl6 OCP Methoxychlor 72-43-5 C16H15Cl3O2 OCP p,p'-DDD 72-54-8 C14H10Cl4 OCP p,p'-DDT 50-29-3 C14H9Cl5 OCP Tetrachloro-m-xylene 877-09-8 C8H6Cl4 OPE Triphenyl phosphate (TPHP) 115-86-6 C18H15O4P OPE Tris(1-chloro-2-propyl) phosphate (TCPP) 13674-84-5 C9H18Cl3O4P PAH 1-Bromo-4-phenoxybenzene 101-55-3 C12H9BrO PAH 1-Chloro-4-phenoxybenzene 7005-72-3 C12H9ClO PAH 2-Chloronaphthalene 91-58-7 C10H7Cl PAH 2-Methylnaphthalene 91-57-6 C11H10 PAH Acenaphthene 83-32-9 C12H10 PAH Acenaphthylene 208-96-8 C12H8 PAH Anthracene 120-12-7 C14H10 PAH Benz[a]anthracene 56-55-3 C18H12 PAH Benzo[a]pyrene 50-32-8 C20H12 PAH Benzo[b]fluoranthene 205-99-2 C20H12 PAH Benzo[ghi]perylene 191-24-2 C22H12 PAH Benzo[k]fluoranthene 207-08-9 C20H12 PAH Chrysene 218-01-9 C18H12 PAH Dibenz[a,h]anthracene 53-70-3 C22H14 PAH Dibenzofuran 132-64-9 C12H8O
  • 34. 34 PAH Fluoranthene 206-44-0 C16H10 PAH Fluorene 86-73-7 C13H10 PAH Indeno[1,2,3-cd]pyrene 193-39-5 C22H12 PAH Naphthalene 91-20-3 C10H8 PAH Phenanthrene 85-01-8 C14H10 PAH Pyrene 129-00-0 C16H10 PCB 2,2',3,3',4,4',5,5',6,6'-Decachlorobiphenyl (PCB 209) 2051-24-3 C12Cl10 Phthalate Bis(2-ethylhexyl) phthalate 117-81-7 C24H38O4 Phthalate Butylbenzyl phthalate 85-68-7 C19H20O4 Phthalate Diethyl phthalate 84-66-2 C12H14O4 Phthalate Dimethyl phthalate 131-11-3 C10H10O4 Phthalate Di-n-butyl phthalate 84-74-2 C16H22O4 Phthalate Di-n-octyl phthalate 117-84-0 C24H38O4 Pyrethroid Piperonyl butoxide 51-03-6 C19H30O5 VOC 1,2-dichlorobenzene 95-50-1 C6H4Cl2 VOC 1,3-dichlorobenzene 541-73-1 C6H4Cl2 VOC 1,4-dichlorobenzene 106-46-7 C6H4Cl2
  • 35. 35 Figure S1: Molecular ions can be used to reduce the false positive rate (albeit increasing the false negative rate) as shown using a GC-HRMS dataset of air samples. A. Filtering by retention index, reverse search index, Reverse high-resolution mass filter criteria B. Additional requirement for molecular ion