Cheminformatics Support for Mass
Spectrometry Supporting
Exposomics at the US-EPA
September 2023: Metabolomics Association of North America
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Antony Williams
Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
The role of cheminformatics at EPA
• Our branch is in the Center for Computational
Toxicology and Exposure (CCTE)
• We develop curated chemistry data streams to
support our applications and models
• We develop prediction models, web-based
applications and data streams to support others
• Today’s presentation: how do our efforts
support Exposomics and especially NTA efforts
– What’s public and what’s in development?
Why Does EPA Need Measurement Data?
• Measurement data needed to ensure chemical
safety
• Characterize risk
• Regulate use & disposal
• Manage human & ecological exposures
• Ensure compliance under federal statutes
Chemical Monitoring Needs: Exposure Assessment, Dose-Response Assessment, Risk Characterization, Hazard Identification
Challenges
• High-quality monitoring data are unavailable for most chemicals
• Measurement data normally generated using “targeted” methods
• Targeted analytical methods:
- Require a priori knowledge of chemicals of interest
- Produce data for few selected analytes (10s-100s)
- Require standards for method development & compound quantitation
- Are blind to emerging contaminants
- Can’t keep pace with needs of 21st century risk characterizations
• Data gaps being filled with exposure models and “NTA” methods
Relevant Questions of NTA Studies?
• Which chemicals are where?
• Do we see any “new” chemicals?
• Do observed co-occurrences highlight:
– Important exposure sources?
– Stressor-response relationships?
– What is the concentration of each chemical?
– Do estimated concentrations suggest unacceptable risk?
• How does cheminformatics support this effort?
Everything is underpinned by the
DSSTox Database
• >1.2M substances
• Highly curated data
• Mapped relationships
• The data are made
available via the
Dashboard…
Accessing DSSTox chemistry:
CompTox Chemicals Dashboard
• A publicly accessible website delivering:
– 1.2M chemicals with related property data
– Experimental/predicted physicochemical property data
– Experimental Human and Ecological hazard data
– Integration to “biological assay data” (ToxCast/Tox21)
– Information regarding chemicals in consumer products
– Links to other agency websites and public data resources
– Related substances: transformation products, metabolites
– “Batch searching” for tens to thousands of chemicals
CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
Detailed Chemical Pages
• Chemical page: intrinsic properties, structural identifiers,
linked substances, navigation to Hazard and Exposure data
SEARCH
TOX DATA
BIOACTIVITY
SIMILARITY
READ-ACROSS
PUBMED
BATCH SEARCH
CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
Batch Searching is a big enabler
https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273
Batch Searching
• Singleton searches are useful but we work
with thousands of masses and formulae!
• Typical questions
– What is the list of chemicals for the formula CxHyOz?
– What is the list of chemicals for a mass +/- error?
– Can I get chemical lists in Excel files? In SDF files?
– Can I include properties in the download file?
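The "mass +/- error" question can be sketched in a few lines of code. This is a minimal illustration and not the Dashboard's implementation: it assumes the error is a relative tolerance in ppm, and the reference records, placeholder identifiers and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def ppm_window(mass, ppm):
    """Return (low, high) bounds for a mass with a relative tolerance in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def batch_mass_search(query_masses, reference, ppm=5.0):
    """Match each query mass against (mass, identifier) reference records.

    `reference` must be sorted by mass. Returns {query_mass: [identifier, ...]}.
    """
    masses = [m for m, _ in reference]
    hits = {}
    for q in query_masses:
        low, high = ppm_window(q, ppm)
        start = bisect_left(masses, low)    # first record with mass >= low
        stop = bisect_right(masses, high)   # one past the last record with mass <= high
        hits[q] = [ident for _, ident in reference[start:stop]]
    return hits


# Hypothetical reference records: (monoisotopic mass, placeholder identifier).
REFERENCE = sorted([
    (180.06339, "record-A"),
    (194.08038, "record-B"),
    (266.16304, "record-C"),
    (266.16310, "record-D"),
    (266.17500, "record-E"),
])

results = batch_mass_search([266.16304, 194.08038, 300.0], REFERENCE, ppm=5.0)
for query, identifiers in results.items():
    print(query, identifiers)
```

Sorting the reference once and using binary search keeps each lookup cheap, which matters when thousands of masses are submitted in one batch.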
Batch Search
Batch Search – Excel, CSV, SDF file
Chemical Lists
• Chemical lists are focused on regulations,
specific research efforts and categories
• 425 lists and growing
– TSCA Inventory
– Clean Water Act Hazardous Substances
– Consumer Products database
– Chemicals of Emerging Concern
– PFAS lists
– Extractables and Leachables
– …lists are versioned and updated and new lists added
The ELSIE Database
https://comptox.epa.gov/dashboard/chemical-lists/ELSIE
Extractables
Tire Crumb Rubber (298)
Hydraulic Fracturing (1640)
Disinfection By-Products
PFAS lists of Chemicals
Consumer Products Database
Benefits of bringing it all together
• The true dashboard benefit is integration
• Rank potential candidates for toxicity using
available data – hazard, exposure, in vitro
Supporting Exposomics Research
• DSSTox database substances map to
– Their structures (mass/formulae/InChIs, etc.)
– Hazard data: human, mammalian and ecotox
– Exposure data: products in commerce, categories and
functional use, measured concentrations, etc.
• There are many types of metadata that can
be used for candidate ranking (old approach)
Data Source Ranking of
“known unknowns”
• A mass and/or formula search looks for
a chemical that is unknown to the analyst
but is a known chemical contained
within a reference database
• Most likely candidate chemicals
have the most associated data
sources, most associated
literature articles or both
Example query: C14H22N2O3 (monoisotopic mass 266.16304) → Chemical Reference Database → Sorted candidate structures
Data Streams for Ranking
• Dashboard Data Sources
• PubChem Data Source Count
• PubMed Reference Count
• ToxCast in vitro bioactivity
• Presence in Consumer Products database
• Predicted physicochemical Properties
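One way to picture ranking by these data streams is a simple sort over per-candidate metadata. The sketch below is only an assumption about how such streams could be combined: the lexicographic priority (data sources first, then literature counts), the field names and the candidate records are all hypothetical and are not the Dashboard's scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    data_sources: int = 0            # number of Dashboard data sources
    pubchem_sources: int = 0         # PubChem data source count
    pubmed_refs: int = 0             # PubMed reference count
    toxcast_active: bool = False     # any ToxCast in vitro bioactivity
    in_consumer_products: bool = False


def rank_candidates(candidates):
    """Sort candidates so the best-supported ones come first.

    The tuple order is an assumed priority, chosen only for illustration.
    """
    def key(c):
        return (
            c.data_sources,
            c.pubmed_refs,
            c.pubchem_sources,
            c.toxcast_active,
            c.in_consumer_products,
        )
    return sorted(candidates, key=key, reverse=True)


# Hypothetical candidates that share one formula.
candidates = [
    Candidate("candidate-1", data_sources=3, pubmed_refs=100),
    Candidate("candidate-2", data_sources=12, pubmed_refs=5),
    Candidate("candidate-3", data_sources=12, pubmed_refs=40, toxcast_active=True),
]

for c in rank_candidates(candidates):
    print(c.name, c.data_sources, c.pubmed_refs)
```

A weighted score would be an equally reasonable design; the tuple sort is used here because it needs no tuning parameters.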
BIG databases are GREAT!
Chart: number of chemical substances (log scale, 10^4 to 10^9) in PubChem, CAS Registry, ChemSpider, EPA DSSTox and Blood Exposome
• Thanks to all of the public database efforts
• So much benefit from what’s been done
• There are hundreds of them at this point…
Is a bigger database better?
• ChemSpider contained 26 million chemicals
at the time of the original work
• Much BIGGER today
• Is bigger better??
• Are there other metadata to use for ranking?
Comparing Search Performance
• Comparison made when the Dashboard contained 720k chemicals
• Only ~3% of the size of ChemSpider (26 million chemicals at the time)
• What was the comparison in performance?
Identifying “Known Unknowns”
Bigger is not necessarily better
How did performance compare?
For the same 162 chemicals,
Dashboard outperforms
ChemSpider for both Mass and
Formula Ranking
Data Quality is important
• Data quality in free web-based databases!
Vomitoxin
Vomitoxin - ChemSpider
• 19 “Vomitoxins” – 3 isotopically labeled
PubChem – “virtual chemistry”
• Other databases grow quickly…a lot of “virtual
chemistry” and “make on demand” compounds.
• Efforts such as the BloodExposome and
PubChemLite are critical to focus efforts
Applications at the EPA
• We have ongoing efforts applying
NTA to multiple challenges including
– PFAS identification
– Pesticides in various matrices
– CECs in water
– Biosolids
• Examples include…
Example 1: Consumer Product Analysis
Many chemicals
observed in
consumer product
extracts
More observed
chemicals not
known to be in
consumer products
Why might the
‘other’ chemicals be
in the products?
Many observed
chemicals known to
be in consumer
products
Example 2: Recycled Product Analysis
Significant differences
between chemicals in
recycled vs. virgin products
for certain product & use
categories
Most differences observed in
paper products and
construction materials
Some uses (e.g., fragrances)
highly represented across all
product/use categories
Example 3: Placental Tissue Analysis
Supporting Exposomics Research
• DSSTox database substances map to
– Their structures (mass/formulae/InChIs, etc.)
– Hazard data: human, mammalian and ecotox
– Exposure data: products in commerce, categories
and functional use, measured concentrations, etc.
• Structures have to be standardized…
“MS-Ready Chemicals”
• MS-Ready chemical standardization is ESSENTIAL to our
support of Non-Targeted Analysis
• It links chemicals across the Dashboard and facilitates
detection linking back to products in commerce
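The linking idea can be illustrated with a deliberately crude, text-level sketch. It assumes three simplified steps (break salts by keeping the largest fragment, drop stereochemistry markers, neutralize simple oxyanions) applied directly to SMILES strings. A real workflow would use a cheminformatics toolkit with proper canonicalization; the function names and the choice of steps here are assumptions for illustration, not the published MS-Ready rule set.

```python
import re
from collections import defaultdict

# Matches one atom token in a SMILES string: a bracket atom or an organic-subset atom.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def largest_fragment(smiles):
    """Break salts: keep the dot-separated fragment with the most atoms."""
    fragments = smiles.split(".")
    return max(fragments, key=lambda frag: len(ATOM_PATTERN.findall(frag)))


def strip_stereo(smiles):
    """Drop chirality (@) and double-bond direction (/ and \\) markers."""
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


def neutralize_simple(smiles):
    """Protonate [O-] only; crude, and wrong for groups such as nitro."""
    return smiles.replace("[O-]", "O")


def ms_ready_key(smiles):
    """Text-level stand-in for a standardized structure used as a link key."""
    return neutralize_simple(strip_stereo(largest_fragment(smiles)))


# The three SMILES are written with the same atom order on purpose, because
# this sketch does no canonicalization.
SUBSTANCES = {
    "fumaric acid": "OC(=O)/C=C/C(O)=O",
    "maleic acid": "OC(=O)/C=C\\C(O)=O",
    "disodium fumarate": "[Na+].[Na+].[O-]C(=O)/C=C/C([O-])=O",
}

linked = defaultdict(list)
for name, smiles in SUBSTANCES.items():
    linked[ms_ready_key(smiles)].append(name)

for key, names in linked.items():
    print(key, names)
```

All three substances collapse onto one key, which is the behaviour needed to link a detected structure back to the salt or stereoisomer forms that appear in product data.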
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
Predicted Mass Spectra
http://cfmid.wishartlab.com/
• MS/MS spectra prediction for ESI+, ESI-, and EI
• Predictions generated for MS-Ready structures
• Use experimental vs predicted spectral searches
for candidate identification
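Matching an experimental spectrum against predicted spectra needs a similarity score. The sketch below uses cosine similarity on binned peaks, a common generic choice that is swapped in here for illustration and is not confirmed as the scoring used with CFM-ID at EPA. The peak lists, bin width and candidate names are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned intensity vectors (0 to 1)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical experimental spectrum and predicted spectra for two candidates.
experimental = [(91.05, 100.0), (119.08, 40.0), (163.11, 15.0)]
predicted = {
    "candidate-A": [(91.05, 90.0), (119.08, 50.0), (147.00, 5.0)],
    "candidate-B": [(77.04, 100.0), (105.03, 60.0)],
}

scores = {name: cosine_similarity(experimental, peaks) for name, peaks in predicted.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Fixed-width binning can split near-identical peaks that fall on either side of a bin edge; tolerance-based peak matching avoids that at the cost of more code.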
Predicted Data Already Public
Publication and Data Files
https://epa.figshare.com/articles/CFM-ID_Paper_Data/7776212/1
We have proven the value…
CASMI
Candidate Identification is only
PART of the process
• Whatever the approach for candidate
identification, chemical hazard is important
• Hazard Comparison Profiling is important
https://www.epa.gov/chemical-research/cheminformatics
Related
Work-in-Progress
AMOS: Analytical Methods and
Spectra Database
• Three types of data in the database:
– Methods (regulatory, lab manuals and SOPs, publications,
tech notes)
– Spectra (from public domain and our own laboratories)
– Fact Sheets (harvested from SWGDRUG and other sites)
• Some methods have associated spectra
• Some data are just externally linked
• Currently contains around 200,000 spectra,
700,000 external links, 3000 “Fact Sheets”
and ~4000 methods
• ALL data are growing in number
Embedded Method PDFs
Literature articles, SOPs, Protocols
Integrated spectrum library
Linking to actual spectra
• We are doing a lot of chemical curation as
we build the database
Why not just Regulatory Methods?
Because we need methods faster
Rules need optimizing for
MS-Ready standardization
• We can now add/tweak the rules…add new
rules, edit existing rules
Example: Tautomer Rules
• We control rules for
– Tautomers
– Mesomers
– Neutralize/De-radicalize
– Break salts
– Standard checks
– etc….
• Necessary for mapping
chemicals in DSSTox
Related substance mappings
exist but are limited
Integration to Chemical Transformation
Database (ChET)
Chemical Transformation Simulator
https://qed.epa.gov/cts/
Manual Curation and Annotation
Analytical QC data for Tox21
• ~9000 chemicals with tens of thousands of
spectra (LCMS, GCMS & NMR)
• These data will feed prediction algorithms…
Amenability Prediction Algorithms
• New paper just submitted
VSSTox for UVCB Chemicals
“Markush Structures”
https://en.wikipedia.org/wiki/Markush_structure
UVCBs challenge in non-target analysis
Homologue screening plots from
Swiss Wastewater (Schymanski et al
2014, left) and Novi Sad (right)
• Complex mixtures (UVCBs) are a huge
and very challenging part of the
unknowns in many environmental
samples
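Homologous series can be picked out of a mass list numerically. The sketch below uses the Kendrick mass defect with CH2 as the repeating unit, a standard technique named here as an assumption; it is not necessarily the method behind the plots referenced above. The tolerance, minimum series size and function name are illustrative choices.

```python
from collections import defaultdict

CH2_EXACT = 14.01565   # monoisotopic mass of CH2 (12.00000 + 2 * 1.007825)
CH2_NOMINAL = 14.0


def find_homologue_series(masses, tolerance=0.001, min_size=3):
    """Group masses that differ by whole CH2 units.

    Members of a CH2 series share the same Kendrick mass defect (KMD) and the
    same nominal Kendrick mass modulo 14. Defects close to +/-0.5 can wrap
    around and are not handled in this sketch.
    """
    by_residue = defaultdict(list)
    for m in masses:
        kendrick_mass = m * CH2_NOMINAL / CH2_EXACT
        nominal = round(kendrick_mass)
        by_residue[nominal % 14].append((nominal - kendrick_mass, m))

    series = []
    for members in by_residue.values():
        members.sort()                      # order by KMD
        cluster = [members[0]]
        for item in members[1:]:
            if item[0] - cluster[-1][0] <= tolerance:
                cluster.append(item)
            else:
                if len(cluster) >= min_size:
                    series.append(sorted(m for _, m in cluster))
                cluster = [item]
        if len(cluster) >= min_size:
            series.append(sorted(m for _, m in cluster))
    return series


masses = [
    116.08373, 130.09938, 144.11503, 158.13068,  # C6 to C9 saturated acids, CnH2nO2
    194.08038,                                   # caffeine, unrelated
    151.06333,                                   # acetaminophen, unrelated
]

for s in find_homologue_series(masses):
    print(s)
```

Other repeating units (for example CF2 for PFAS or C2H4O for ethoxylates) follow the same pattern with different exact and nominal masses.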
Our cheminformatics work supports
the “NTA WebApp”
Full presentation
https://t.ly/4MxFe
Data and Services
used by the
Community
NORMAN Suspect List Exchange
https://www.norman-network.com/?q=node/236
Our Data via services
https://api-ccte.epa.gov/docs/
An API Key is required
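A keyed request might look like the sketch below. The host comes from the documentation URL on the slide; the endpoint path and the `x-api-key` header name are assumptions that should be checked against https://api-ccte.epa.gov/docs/ before use, and `YOUR-API-KEY` is a placeholder.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api-ccte.epa.gov"


def build_request(path, api_key):
    """Build a GET request that carries the API key in a header.

    The header name 'x-api-key' is an assumption; confirm it in the API docs.
    """
    url = BASE_URL + urllib.parse.quote(path)
    headers = {"x-api-key": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical endpoint path, used only to show the shape of a call.
request = build_request("/chemical/search/equal/caffeine", "YOUR-API-KEY")
print(request.get_method(), request.full_url)

# Uncomment once a real key and a confirmed endpoint are in place:
# records = fetch_json(request)
```

Separating request construction from the network call makes it possible to inspect the URL and headers without sending anything.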
Conclusions
• Our data resources underpin our research
efforts – data quality and curation are key
• Our web-based applications deliver our data
to the community for multiple use cases
• Our support for Exposomics is multi-fold
– Curated chemistry data streams
– Experimental and predicted properties, toxicity, etc.
• The NTA WebApp in development will use
all of these data streams to support analysis
Acknowledgments
• DSSTox curation team
• CCTE IT team for software development, DevOps
• Mass spectrometry scientists across EPA,
especially the NTA team
• Open Databases: PubChem, ChEMBL, MoNA,
MassBank, GNPS, SWGDRUG, Cayman Chems.
• Instrument vendors – many have contributed
methods to the AMOS database
• …and thank you for your time
Contact Information
• Contact info: williams.antony@epa.gov
• Send methods for inclusion in AMOS
• We fully support Open Data so ask us for what
you need
• Slides at: https://www.slideshare.net/AntonyWilliams/

Cheminformatics Support for MS Supporting Exposomics

  • 1. Cheminformatics Support for Mass Spectrometry Supporting Exposomics at the US-EPA September 2023: Metabolomics Association of North America http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA Antony Williams Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
  • 2. The role of cheminformatics at EPA • Our branch is in the Center for Computational Toxicology and Exposure (CCTE) • We develop curated chemistry data streams to support our applications and models • We develop prediction models, web-based applications and data streams to support others • Today’s presentation: how do our efforts support Exposomics and especially NTA efforts – What’s public and what’s in development? 1
  • 3. Why Does EPA Need Measurement Data? 2 • Measurement data needed to ensure chemical safety • Characterize risk • Regulate use & disposal • Manage human & ecological exposures • Ensure compliance under federal statutes Chemical Monitoring Needs Exposure Assessment Dose- Response Assessment Risk Characterization Hazard Identification
  • 4. Challenges • High-quality monitoring data are unavailable for most chemicals • Measurement data normally generated using “targeted” methods • Targeted analytical methods: - Require a priori knowledge of chemicals of interest - Produce data for few selected analytes (10s-100s) - Standards for method development & compound quantitation - Are blind to emerging contaminants - Can’t keep pace with needs of 21st century risk characterizations • Data gaps being filled with exposure models and “NTA” methods 3
  • 5. Relevant Questions of NTA Studies? • Which chemicals are where? • Do we see any “new” chemicals? • Do observed co-occurrences highlight: – Important exposure sources? – Stressor-response relationships? – What is the concentration of each chemical? – Do estimated concentrations suggest unacceptable risk? • How does cheminformatics support this effort? 4
  • 6. Everything is underpinned by the DSSTox Database 5 • >1.2M substances • Highly curated data • Mapped relationships • The data are made available via the Dashboard…
  • 7. Accessing DSSTox chemistry: CompTox Chemicals Dashboard • A publicly accessible website delivering: – 1.2M chemicals with related property data – Experimental/predicted physicochemical property data – Experimental Human and Ecological hazard data – Integration to “biological assay data” (ToxCast/Tox21) – Information regarding chemicals in consumer products – Links to other agency websites and public data resources – Related substances: transformation products, metabolites – “Batch searching” for tens to thousands of chemicals 6
  • 9. Detailed Chemical Pages • Chemical page: intrinsic properties, structural identifiers, linked substances, navigation to Hazard and Exposure data
  • 10. 9 SEARCH TOX DATA BIOACTIVITY SIMILARITY READ-ACROSS PUBMED BATCH SEARCH CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard
  • 11. Batch Searching is a big enabler https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273 10
  • 12. Batch Searching • Singleton searches are useful but we work with thousands of masses and formulae! • Typical questions – What is the list of chemicals for the formula CxHyOz – What is the list of chemicals for a mass +/- error – Can I get chemical lists in Excel files? In SDF files? – Can I include properties in the download file? 11
  • 14. Batch Search – Excel, CSV, SDF file
  • 16. Chemical Lists • Chemical lists are focused on regulations, specific research efforts and categories • 425 lists and growing – TSCA Inventory – Clean Water Act Hazardous Substances – Consumer Products database – Chemicals of Emerging Concern – PFAS lists – Extractables and Leachables – …lists are versioned and updated and new lists added 15
  • 19. Tire Crumb Rubber (298) 18
  • 22. PFAS lists of Chemicals 21
  • 24. Benefits of bringing it all together • The true dashboard benefit is integration • Rank potential candidates for toxicity using available data – hazard, exposure, in vitro 23
  • 25. Supporting Exposomics Research • DSSTox database substances map to – Their structures (mass/formulae/InChIs etc.) – Hazard data: human, mammalian and ecotox – Exposure data: products in commerce, categories and functional use, measured concentrations, etc. • There are many types of metadata that can be used for candidate ranking (old approach)
  • 26. Data Source Ranking of “known unknowns” • A “known unknown” is unknown to the analyst running a mass and/or formula search, but it is a known chemical contained within a reference database • The most likely candidate chemicals have the most associated data sources, the most associated literature articles, or both • Figure: a formula (C14H22N2O3) or mass (266.16304) is searched against a chemical reference database, which returns sorted candidate structures
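A minimal sketch of this ranking idea follows. The slide says only that likely candidates have the most data sources, the most literature articles, or both; sorting by data-source count with literature count as a tie-breaker is an assumption made here for illustration. The candidate names and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    data_sources: int      # number of associated data sources (hypothetical)
    literature_refs: int   # number of associated articles (hypothetical)


def rank_candidates(candidates):
    """Sort candidates so the best-supported structures come first.

    Assumed ordering: data-source count first, literature count second.
    """
    return sorted(
        candidates,
        key=lambda c: (c.data_sources, c.literature_refs),
        reverse=True,
    )


# Hypothetical candidates sharing one formula or mass
pool = [
    Candidate("candidate_A", data_sources=12, literature_refs=40),
    Candidate("candidate_B", data_sources=3, literature_refs=500),
    Candidate("candidate_C", data_sources=12, literature_refs=5),
]

ranked_names = [c.name for c in rank_candidates(pool)]
```

Other metadata streams, such as presence in consumer products or in vitro bioactivity, could be added to the sort key in the same way.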
  • 27. Data Streams for Ranking • Dashboard Data Sources • PubChem Data Source Count • PubMed Reference Count • ToxCast in vitro bioactivity • Presence in Consumer Products database • Predicted physicochemical properties
  • 28. BIG databases are GREAT! • Figure: number of chemical substances (log scale, 10^4 to 10^9) in PubChem, CAS Registry, ChemSpider, EPA DSSTox and Blood Exposome • Thanks to all of the public database efforts • So much benefit from what’s been done • There are hundreds of them at this point…
  • 29. Is a bigger database better? • ChemSpider contained 26 million chemicals at the time of the original work • It is much BIGGER today • Is bigger better?? • Are there other metadata to use for ranking?
  • 30. Comparing Search Performance • The comparison was made when the Dashboard contained 720k chemicals • That is only about 3% of the size of ChemSpider • How did performance compare?
  • 31. Identifying “Known Unknowns” Bigger is not necessarily better 30
  • 32. How did performance compare? • For the same 162 chemicals, Dashboard outperforms ChemSpider for both Mass and Formula Ranking
  • 33. Identifying “Known Unknowns” Bigger is not necessarily better 32
  • 34. Data Quality is important • Data quality in free web-based databases! 33
  • 36. Vomitoxin - ChemSpider • 19 “Vomitoxins” – 3 isotopically labeled 35
  • 37. PubChem – “virtual chemistry” • Other databases grow quickly…a lot of “virtual chemistry” and “make on demand” compounds. • Efforts such as the BloodExposome and PubChemLite are critical to focus efforts 36
  • 38. Applications at the EPA • We have ongoing efforts applying NTA to multiple challenges including – PFAS identification – Pesticides in various matrices – CECs in water – Biosolids • Examples include… 37
  • 39. Example 1: Consumer Product Analysis 38
  • 40. Example 1: Consumer Product Analysis • Many chemicals observed in consumer product extracts • Many observed chemicals known to be in consumer products • More observed chemicals not known to be in consumer products • Why might the ‘other’ chemicals be in the products?
  • 41. Example 2: Recycled Product Analysis 40
  • 42. Example 2: Recycled Product Analysis • Significant differences between chemicals in recycled vs. virgin products for certain product & use categories • Most differences observed in paper products and construction materials • Some uses (e.g., fragrances) highly represented across all product/use categories
  • 43. Example 3: Placental Tissue Analysis 42
  • 44. Supporting Exposomics Research • DSSTox database substances map to – Their structures (mass/formulae/InChIs etc.) – Hazard data: human, mammalian and ecotox – Exposure data: products in commerce, categories and functional use, measured concentrations, etc. • Structures have to be standardized…
  • 45. “MS-Ready Chemicals” • MS-Ready chemical standardization is ESSENTIAL to our support of Non-Targeted Analysis • It links chemicals across the Dashboard and facilitates linking detected chemicals back to products in commerce • https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
  • 46. Predicted Mass Spectra http://cfmid.wishartlab.com/ • MS/MS spectra prediction for ESI+, ESI-, and EI • Predictions generated for MS-Ready structures • Use experimental vs predicted spectral searches for candidate identification 46
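Comparing an experimental spectrum with predicted spectra needs a similarity score. The slide does not name one; the sketch below assumes a plain cosine similarity over binned peaks, a common choice but not necessarily the scoring used at EPA or by CFM-ID. The bin width and the peak lists are illustrative assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity between two peak lists (0 = no overlap, 1 = same)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: (m/z, relative intensity)
experimental = [(91.05, 100.0), (119.08, 45.0), (267.17, 20.0)]
predicted_same = [(91.05, 100.0), (119.08, 45.0), (267.17, 20.0)]
predicted_other = [(50.00, 80.0), (150.00, 60.0)]
```

Candidates can then be sorted by their score against the experimental spectrum, in the same way as the metadata ranking.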
  • 47. Predicted Data Already Public • Publication and data files: https://epa.figshare.com/articles/CFM-ID_Paper_Data/7776212/1
  • 48. We have proven the value… 48
  • 51. Candidate Identification is only PART of the process • Whatever the approach for candidate identification, chemical hazard is important • Hazard Comparison Profiling is important • https://www.epa.gov/chemical-research/cheminformatics
  • 53. AMOS: Analytical Methods and Spectra Database • Three types of data in the database: – Methods (regulatory, lab manuals and SOPs, publications, tech notes) – Spectra (from public domain and our own laboratories) – Fact Sheets (harvested from SWGDRUG and other sites) • Some methods have associated spectra • Some data are just externally linked • Currently contains around 200,000 spectra, 700,000 external links, 3000 “Fact Sheets” and ~4000 methods • ALL data are growing in number 53
  • 57. Linking to actual spectra 57
  • 58. Linking to actual spectra 58 • We are doing a lot of chemical curation as we build the database
  • 59. Why not just Regulatory Methods?
  • 60. Why not just Regulatory Methods? Because we need methods faster 60
  • 61. Rules need optimizing for MS-Ready standardization • We can now add/tweak the rules…add new rules, edit existing rules 61
  • 62. Example: Tautomer Rules • We control rules for – Tautomers – Mesomers – Neutralize/De-radicalize – Break salts – Standard checks – etc. • Necessary for mapping chemicals in DSSTox
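The idea of an ordered, editable rule set can be sketched as a list of functions applied in sequence. This is a toy string-level illustration, not the EPA MS-Ready workflow: the rule names, the counterion set and the use of dot-separated SMILES are all assumptions, and real tautomer, mesomer or neutralization rules need a cheminformatics toolkit.

```python
# Hypothetical counterion fragments, for illustration only
COUNTERIONS = {"[Na+]", "[K+]", "[Cl-]"}


def break_salts(components):
    """Split dot-disconnected SMILES into separate components."""
    parts = []
    for smiles in components:
        parts.extend(p for p in smiles.split(".") if p)
    return parts


def drop_counterions(components):
    """Remove components that exactly match a listed counterion."""
    return [c for c in components if c not in COUNTERIONS]


# Rules run in order; adding or editing a rule means editing this list.
RULES = [break_salts, drop_counterions]


def standardize(smiles, rules=RULES):
    """Apply each rule in turn to the list of components."""
    components = [smiles]
    for rule in rules:
        components = rule(components)
    return components


result = standardize("CC(=O)[O-].[Na+]")  # sodium acetate
```

The acetate component stays charged here because no neutralization rule is implemented; that step would be one more function in `RULES`.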
  • 63. Related substance mappings exist but are limited 63
  • 64. Integration to Chemical Transformation Database (ChET) 64
  • 66. Manual Curation and Annotation: Analytical QC data for Tox21 • ~9000 chemicals with tens of thousands of spectra (LCMS, GCMS & NMR) • These data will feed prediction algorithms…
  • 67. Amenability Prediction Algorithms • New paper just submitted 67
  • 68. VSSTox for UVCB Chemicals 68
  • 70. UVCBs challenge in non-target analysis • Complex mixtures (UVCBs) are a huge and very challenging part of the unknowns in many environmental samples • Figure: homologue screening plots from Swiss Wastewater (Schymanski et al. 2014, left) and Novi Sad (right)
  • 71. Our cheminformatics work supports the “NTA WebApp” 71
  • 72. Our cheminformatics work supports the “NTA WebApp” 72
  • 74. Data and Services used by the Community 74
  • 75. NORMAN Suspect List Exchange https://www.norman-network.com/?q=node/236 75
  • 76. Our Data via services https://api-ccte.epa.gov/docs/ 76
  • 77. An API Key is required 77
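A hedged sketch of attaching an API key to a request, using only the Python standard library. The base URL comes from the previous slide; the header name `x-api-key` and the example search path are assumptions that should be checked against https://api-ccte.epa.gov/docs/ before use. The request is only built here, not sent.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api-ccte.epa.gov"


def build_request(path, api_key):
    """Build (but do not send) a GET request carrying the API key.

    The header name "x-api-key" is an assumption; confirm it in the docs.
    """
    return urllib.request.Request(
        BASE_URL + path,
        headers={"x-api-key": api_key, "Accept": "application/json"},
        method="GET",
    )


# Hypothetical endpoint path, used for illustration only
name = quote("caffeine")
request = build_request(f"/chemical/search/equal/{name}", api_key="YOUR_KEY")

# To actually call the service (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     payload = response.read()
```

Keeping the key out of source code, for example by reading it from an environment variable, is the usual practice.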
  • 78. Conclusions • Our data resources underpin our research efforts – data quality and curation are key • Our web-based applications deliver our data to the community for multiple use cases • Our support for Exposomics is multi-fold – Curated chemistry data streams – Experimental and predicted properties, toxicity, etc. • The NTA WebApp in development will use all of these data streams to support analysis
  • 79. Acknowledgments • DSSTox curation team • CCTE IT team for software development, DevOps • Mass spectrometry scientists across EPA, especially the NTA team • Open Databases: PubChem, ChEMBL, MoNA, MassBank, GNPS, SWGDRUG, Cayman Chems. • Instrument vendors – many have contributed methods to the AMOS database • …and thank you for your time
  • 80. Contact Information • Contact info: williams.antony@epa.gov • Send methods for inclusion in AMOS • We fully support Open Data so ask us for what you need • Slides at: https://www.slideshare.net/AntonyWilliams/ 80