Cheminformatics tools and chemistry data
underpinning mass spectrometry
analyses at the US-EPA
March 2024: Spring Meeting, New Orleans, LA
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Antony Williams
Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
The role of cheminformatics at EPA
• Our branch is in the Center for Computational
Toxicology and Exposure (CCTE)
• We develop curated chemistry data streams to
support our applications and models
• We develop prediction models, web-based
applications and data streams to support others
• Today’s presentation: how do our efforts
support mass spec. and especially non-targeted analysis (NTA) efforts?
– What’s public and what’s in development?
Why Does EPA Need Measurement Data?
• Measurement data are needed to ensure chemical
safety
• Characterize risk
• Regulate use & disposal
• Manage human & ecological exposures
• Ensure compliance under federal statutes
Chemical Monitoring Needs (figure): Hazard Identification, Dose-Response Assessment, Exposure Assessment, Risk Characterization
Challenges
• High-quality monitoring data are unavailable for most chemicals
• Measurement data normally generated using “targeted” methods
• Targeted analytical methods:
- Require a priori knowledge of chemicals of interest
- Produce data for a few selected analytes (10s-100s)
- Require standards for method development & compound quantitation
- Are blind to emerging contaminants
- Can’t keep pace with needs of 21st century risk characterizations
• Data gaps are being filled with exposure models and “NTA” methods
Relevant Questions of NTA Studies?
• Which chemicals are where?
• Do we see any “new” chemicals?
• Do observed co-occurrences highlight:
– Important exposure sources?
– Stressor-response relationships?
• What is the concentration of each chemical?
• Do estimated concentrations suggest unacceptable risk?
• How does cheminformatics support this effort?
Everything is underpinned by the
DSSTox Database
• >1.2M substances
• Highly curated data
• Mapped relationships
• The data are made
available via the
Dashboard…
CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
Detailed Chemical Pages
• Chemical page: intrinsic properties, structural identifiers,
linked substances, navigation to Hazard and Exposure data
Dashboard functions shown: Search, Tox Data, Bioactivity, Similarity, Read-Across, PubMed, Batch Search
Batch Searching is a big enabler
https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273
Batch Searching
• Singleton searches are useful but we work
with thousands of masses and formulae!
• Typical questions
– What is the list of chemicals for the formula CxHyOz?
– What is the list of chemicals for a mass +/- error?
– Can I get chemical lists in Excel files? In SDF files?
– Can I include properties in the download file?
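The "mass +/- error" question above can be sketched in a few lines. This is a minimal illustration only: it assumes the error is given as a relative tolerance in ppm, and the candidate table, its names and the function names are all hypothetical, not part of the Dashboard or its batch search.

```python
def mass_window(mass, ppm):
    """Return (low, high) bounds for a mass with a relative error in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def batch_mass_search(query_masses, candidates, ppm=5.0):
    """Map each query mass to the candidate names whose mass lies in its window."""
    hits = {}
    for query in query_masses:
        low, high = mass_window(query, ppm)
        hits[query] = [name for name, mass in candidates if low <= mass <= high]
    return hits


# Hypothetical reference table of (name, monoisotopic mass) pairs.
CANDIDATES = [
    ("candidate_A", 266.16304),
    ("candidate_B", 266.16350),
    ("candidate_C", 266.17000),
]

# One query here; in practice the list would hold thousands of masses.
results = batch_mass_search([266.16304], CANDIDATES, ppm=5.0)
print(results)
```

A linear scan per query is enough for a sketch; with thousands of masses against a large reference table, sorting the table once and using binary search for each window would be the more natural design.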
Batch Search – Excel, CSV, SDF file
Batch Search
Chemical Lists
• Chemical lists are focused on regulations,
specific research efforts and categories
• 425 lists and growing
– TSCA Inventory
– Clean Water Act Hazardous Substances
– Consumer Products database
– Chemicals of Emerging Concern
– PFAS lists
– Extractables and Leachables
– …lists are versioned and updated and new lists added
The ELSIE Database
https://comptox.epa.gov/dashboard/chemical-lists/ELSIE
Extractables
Tire Crumb Rubber (298)
Hydraulic Fracturing (1640)
PFAS lists of Chemicals
Benefits of bringing it all together
• The true dashboard benefit is integration
• Rank potential candidates for toxicity using
available data – hazard, exposure, in vitro
Supporting Exposomics Research
• DSSTox database substances map to
– Their structures (mass/formulae/InChIs etc.)
– Hazard data: human, mammalian and ecotox
– Exposure data: products in commerce, categories and
functional use, measured concentrations, etc.
• There are many types of metadata that can
be used for candidate ranking (old approach)
Data Source Ranking of
“known unknowns”
• A mass and/or formula search is
for a chemical that is unknown to
the analyst but is a known
chemical contained within a
reference database
• Most likely candidate chemicals
have the most associated data
sources, most associated
literature articles or both
Figure: formula C14H22N2O3 (monoisotopic mass 266.16304) → Chemical Reference Database → Sorted candidate structures
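The formula and mass in the figure are two views of the same query, and the link between them can be checked with a short calculation. The sketch below is an illustration under stated assumptions: it handles only simple formulae (no parentheses, charges or isotope labels), the element table covers just C, H, N and O using standard monoisotopic masses, and the function names are hypothetical.

```python
import re

# Monoisotopic masses of the most abundant isotopes (standard values, rounded).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}


def parse_formula(formula):
    """Parse a simple formula such as 'C14H22N2O3' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms; unknown elements raise KeyError."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


mass = monoisotopic_mass("C14H22N2O3")
print(round(mass, 5))
```

The result agrees with the 266.16304 shown on the slide to five decimal places, which is why either the formula or the mass can serve as the search key.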
Data Streams for Ranking
• Dashboard Data Sources
• PubChem Data Source Count
• PubMed Reference Count
• ToxCast in vitro bioactivity
• Presence in Consumer Products database
• Predicted physicochemical properties
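A minimal sketch of metadata-based ranking, assuming only two of the data streams listed above (data source count and literature reference count) and a simple sort with ties broken by literature count. The candidate names, the counts and the tie-breaking rule are hypothetical; the Dashboard's actual scoring is not described here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    data_sources: int  # number of associated data sources
    pubmed_refs: int   # number of associated literature articles


def rank_candidates(candidates):
    """Put candidates with the most data sources first; break ties by literature count."""
    return sorted(
        candidates,
        key=lambda c: (c.data_sources, c.pubmed_refs),
        reverse=True,
    )


# Hypothetical candidates that share one formula or mass.
candidates = [
    Candidate("isomer_1", data_sources=3, pubmed_refs=0),
    Candidate("isomer_2", data_sources=41, pubmed_refs=1200),
    Candidate("isomer_3", data_sources=3, pubmed_refs=15),
]

ranked = [c.name for c in rank_candidates(candidates)]
print(ranked)
```

Further streams such as bioactivity or product presence could be added as extra fields in the sort key or combined into a weighted score; which combination works best is an empirical question.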
BIG databases are GREAT!
(Figure: chemical substance counts on a log scale from 10^4 to 10^9 for PubChem, CAS Registry, ChemSpider, EPA DSSTox and Blood Exposome)
• Thanks to all of the public database efforts
• So much benefit from what’s been done
• There are hundreds of them at this point…
Is a bigger database better?
• ChemSpider contained 26 million chemicals at
the time of the original work
• Much BIGGER today
• Is bigger better??
• Are there other metadata to use for ranking?
Comparing Search Performance
• The comparison was run when the Dashboard contained 720k chemicals
• That was only about 3% of the size of ChemSpider
• What was the comparison in performance?
How did performance compare?
For the same 162 chemicals,
Dashboard outperforms
ChemSpider for both Mass and
Formula Ranking
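One way such a comparison could be scored is by the position of the correct chemical in each ranked candidate list. The sketch below is an assumption about the evaluation, not the method used in the study: the metrics (top-1 fraction and mean rank), the function names and the example identifiers are all hypothetical.

```python
def rank_of_truth(ranked_ids, true_id):
    """Return the 1-based position of the correct chemical, or None if it is absent."""
    try:
        return ranked_ids.index(true_id) + 1
    except ValueError:
        return None


def summarize(ranks):
    """Summarize ranking performance over a set of queries."""
    found = [r for r in ranks if r is not None]
    return {
        "n_queries": len(ranks),
        "n_found": len(found),
        "top1_fraction": sum(1 for r in found if r == 1) / len(ranks),
        "mean_rank": sum(found) / len(found) if found else None,
    }


# Hypothetical queries: (ranked candidate identifiers, identifier of the true chemical).
queries = [
    (["x1", "x2", "x3"], "x1"),
    (["y4", "y2"], "y2"),
    (["z9"], "z1"),
]

ranks = [rank_of_truth(ranked, truth) for ranked, truth in queries]
summary = summarize(ranks)
print(ranks, summary)
```

Running the same queries against two databases and comparing the two summaries gives a like-for-like view of how database size and curation affect where the correct answer lands.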
Identifying “Known Unknowns”
Bigger is not necessarily better
Vomitoxin
Vomitoxin - ChemSpider
• 19 “Vomitoxins” – 3 isotopically labeled
PubChem – “virtual chemistry”
• Other databases grow quickly…a lot of “virtual
chemistry” and “make on demand” compounds.
• Efforts such as the BloodExposome and
PubChemLite are critical to focus efforts
Applications at the EPA
• We have ongoing efforts applying
NTA to multiple challenges including
– PFAS identification
– Pesticides in various matrices
– CECs in water
– Biosolids
• Examples include…
Example 1: Consumer Product Analysis
• Many chemicals observed in consumer product extracts
• More observed chemicals not known to be in consumer products
• Why might the ‘other’ chemicals be in the products?
• Many observed chemicals known to be in consumer products
Example 2: Recycled Product Analysis
• Significant differences between chemicals in recycled vs. virgin products for certain product & use categories
• Most differences observed in paper products and construction materials
• Some uses (e.g., fragrances) highly represented across all product/use categories
Example 3: Placental Tissue Analysis
Supporting Exposomics Research
• DSSTox database substances map to
– Their structures (mass/formulae/InChIs etc.)
– Hazard data: human, mammalian and ecotox
– Exposure data: products in commerce, categories
and functional use, measured concentrations, etc.
• Structures have to be standardized…
“MS-Ready Chemicals”
• MS-Ready chemical standardization is ESSENTIAL to our
support of Non-Targeted Analysis
• It links chemicals across the Dashboard and facilitates
detection linking back to products in commerce
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
Predicted Mass Spectra
http://cfmid.wishartlab.com/
• MS/MS spectra prediction for ESI+, ESI-, and EI
• Predictions generated for MS-Ready structures
• Use experimental vs predicted spectral searches
for candidate identification
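Comparing an experimental spectrum with predicted spectra needs a similarity score. The sketch below uses cosine similarity on binned peaks, which is a commonly used measure but is an assumption here: the slides do not state which scoring function is used, and the bin width, function names and peak lists are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Collapse (m/z, intensity) peaks into integer bins keyed by m/z."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no shared peaks)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical experimental spectrum and predicted spectra for two candidates.
experimental = [(100.05, 50.0), (150.10, 100.0)]
predicted = {
    "candidate_good": [(100.05, 40.0), (150.10, 100.0)],
    "candidate_bad": [(120.00, 100.0)],
}

scores = {name: cosine_similarity(experimental, p) for name, p in predicted.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Fixed-width binning is the simplest choice; a tolerance-based peak matcher would avoid splitting near-identical peaks across a bin boundary.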
Predicted Data Already Public
Publication and Data Files
https://epa.figshare.com/articles/CFM-ID_Paper_Data/7776212/1
We have proven the value…
CASMI
Candidate Identification is only
PART of the process
• Whatever the approach for candidate
identification, chemical hazard is important
• Hazard Comparison Profiling is important
https://www.epa.gov/chemical-research/cheminformatics
Related
Work-in-Progress
AMOS: Analytical Methods and
Spectra Database
• Three types of data in the database:
– Methods (regulatory, lab manuals and SOPs, publications,
tech notes)
– Spectra (from public domain and our own laboratories)
– Fact Sheets (harvested from SWGDRUG and other sites)
• Some methods have associated spectra
• Some data are just externally linked
• Currently contains around 200,000 spectra,
700,000 external links, 3000 “Fact Sheets”
and ~4000 methods
• ALL data are growing in number
Embedded Method PDFs
Literature articles, SOPs, Protocols
Integrated spectrum library
Linking to actual spectra
• We are doing a lot of chemical curation as
we build the database
Rules need optimizing for
MS-Ready standardization
• We can now add/tweak the rules…add new
rules, edit existing rules
Example: Tautomer Rules
• We control rules for
– Tautomers
– Mesomers
– Neutralize/De-radicalize
– Break salts
– Standard checks
– etc….
• Necessary for mapping
chemicals in DSSTox
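As an illustration of one rule type from the list above, the sketch below implements a crude "break salts" step on SMILES strings and uses the result to link records. It is a toy under stated assumptions: it keeps the dot-separated component with the most atoms, does no neutralization, tautomer handling or canonicalization, and the record identifiers are hypothetical. It is not the EPA rule set, and a production workflow would use a cheminformatics toolkit.

```python
import re

# Matches bracket atoms and the SMILES organic subset (aliphatic and aromatic).
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(fragment):
    """Approximate number of atoms written explicitly in a SMILES fragment."""
    return len(ATOM_PATTERN.findall(fragment))


def break_salt(smiles):
    """Keep the dot-separated component with the most atoms."""
    return max(smiles.split("."), key=atom_count)


# Hypothetical records: identifier -> SMILES as registered.
records = {
    "HYPOTHETICAL_ID_1": "CCN",     # free base
    "HYPOTHETICAL_ID_2": "CCN.Cl",  # hydrochloride salt of the same base
}

# Link records that share the same desalted form.
groups = {}
for record_id, smiles in records.items():
    groups.setdefault(break_salt(smiles), []).append(record_id)

print(groups)
```

String equality links the two records here only because both SMILES are written identically; real mapping needs a canonical identifier computed after all standardization rules have run.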
Manual Curation and Annotation
Analytical QC data for Tox21
• ~9000 chemicals with tens of thousands of
spectra (LCMS, GCMS & NMR)
• These data will feed prediction algorithms…
Amenability Prediction Algorithms
• New paper just submitted
Our cheminformatics work supports
the “NTA WebApp”
Full presentation
https://t.ly/4MxFe
Conclusions
• Our data resources underpin our research
efforts – data quality and curation are key
• Our web-based applications deliver our data
to the community for multiple use cases
• Our support for Exposomics is multi-fold
– Curated chemistry data streams
– Experimental and predicted properties, toxicity, etc.
• The NTA WebApp in development will use
all of these data streams to support analysis
Acknowledgments
• DSSTox curation team
• CCTE IT team for software development, DevOps
• Mass spectrometry scientists across EPA,
especially the NTA team
• Open Databases: PubChem, ChEMBL, MoNA,
MassBank, GNPS, SWGDRUG, Cayman Chemical
• Instrument vendors – many have contributed
methods to the AMOS database
• …and thank you for your time
Contact Information
• Contact info: williams.antony@epa.gov
• We fully support Open Data so ask us for what
you need

More Related Content

Similar to Cheminformatics tools and chemistry data underpinning mass spectrometry analyses at the US Environmental Protection Agency

The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...Andrew McEachran
 

Similar to Cheminformatics tools and chemistry data underpinning mass spectrometry analyses at the US Environmental Protection Agency (20)

TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
 
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted AnalysisThe US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Using Cheminformatics Approaches to Develop a Structure Searchable Database o...
Using Cheminformatics Approaches to Develop a Structure Searchable Database o...Using Cheminformatics Approaches to Develop a Structure Searchable Database o...
Using Cheminformatics Approaches to Develop a Structure Searchable Database o...
 
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
The US-EPA CompTox Chemicals Dashboard – a key player in the domain of Open S...
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
US-EPA Chemicals Dashboard and Applications to Digital Design of Molecules
US-EPA Chemicals Dashboard and Applications to Digital Design  of MoleculesUS-EPA Chemicals Dashboard and Applications to Digital Design  of Molecules
US-EPA Chemicals Dashboard and Applications to Digital Design of Molecules
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
 
Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...
 
Progress in delivering transparency in research data
Progress in delivering transparency in research dataProgress in delivering transparency in research data
Progress in delivering transparency in research data
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Success in decision making data relevance curation
Success in decision making data relevance curationSuccess in decision making data relevance curation
Success in decision making data relevance curation
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data DashboardsAccessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Delivering access to chemistry and bioassay data from the National Center for...
Delivering access to chemistry and bioassay data from the National Center for...Delivering access to chemistry and bioassay data from the National Center for...
Delivering access to chemistry and bioassay data from the National Center for...
 

Recently uploaded

Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)itwameryclare
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 

Cheminformatics tools and chemistry data underpinning mass spectrometry analyses at the US Environmental Protection Agency

  • 1. Cheminformatics tools and chemistry data underpinning mass spectrometry analyses at the US-EPA March 2024: Spring Fall Meeting, New Orleans, LA http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA Antony Williams Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
  • 2. The role of cheminformatics at EPA • Our branch is in the Center for Computational Toxicology and Exposure (CCTE) • We develop curated chemistry data streams to support our applications and models • We develop prediction models, web-based applications and data streams to support others • Today’s presentation: how do our efforts support mass spec. and especially NTA efforts – What’s public and what’s in development? 1
  • 3. Why Does EPA Need Measurement Data? 2 • Measurement data needed to ensure chemical safety • Characterize risk • Regulate use & disposal • Manage human & ecological exposures • Ensure compliance under federal statutes Chemical Monitoring Needs Exposure Assessment Dose- Response Assessment Risk Characterization Hazard Identification
  • 4. Challenges • High-quality monitoring data are unavailable for most chemicals • Measurement data normally generated using “targeted” methods • Targeted analytical methods: - Require a priori knowledge of chemicals of interest - Produce data for few selected analytes (10s-100s) - Standards for method development & compound quantitation - Are blind to emerging contaminants - Can’t keep pace with needs of 21st century risk characterizations • Data gaps being filled with exposure models and “NTA” methods 3
  • 5. Relevant Questions of NTA Studies? • Which chemicals are where? • Do we see any “new” chemicals? • Do observed co-occurrences highlight: – Important exposure sources? – Stressor-response relationships? – What is the concentration of each chemical? – Do estimated concentrations suggest unacceptable risk? • How does cheminformatics support this effort? 4
  • 6. Everything is underpinned by the DSSTox Database 5 • >1.2M substances • Highly curated data • Mapped relationships • The data are made available via the Dashboard…
  • 8. Detailed Chemical Pages • Chemical page: intrinsic properties, structural identifiers, linked substances, navigation to Hazard and Exposure data
  • 9. 8 SEARCH TOX DATA BIOACTIVITY SIMILARITY READ-ACROSS PUBMED BATCH SEARCH CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard
  • 10. Batch Searching is a big enabler https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273 9
  • 11. Batch Searching • Singleton searches are useful but we work with thousands of masses and formulae! • Typical questions – What is the list of chemicals for the formula CxHyOz – What is the list of chemicals for a mass +/- error – Can I get chemical lists in Excel files? In SDF files? – Can I include properties in the download file? 10
  • 12. Batch Search – Excel, CSV, SDF file
  • 14. Chemical Lists • Chemical lists are focused on regulations, specific research efforts and categories • 425 lists and growing – TSCA Inventory – Clean Water Act Hazardous Substances – Consumer Products database – Chemicals of Emerging Concern – PFAS lists – Extractables and Leachables – …lists are versioned and updated and new lists added 13
  • 17. Tire Crumb Rubber (298) 16
  • 19. PFAS lists of Chemicals 18
  • 20. Benefits of bringing it all together • The true dashboard benefit is integration • Rank potential candidates for toxicity using available data – hazard, exposure, in vitro 19
  • 21. Supporting Exposomics Research • DSSTox database substances map to – Their structures (mass/formulae/InChIs etc) – Hazard data : human, mammalian and ecotox – Exposure data: products in commerce, categories and functional use, measured concentrations, etc. • There are many types of metadata that can be used for candidate ranking (old approach) 20
  • 22. Data Source Ranking of “known unknowns” 21 • A mass and/or formula search is for an unknown chemical but it is a known chemical contained within a reference database • Most likely candidate chemicals have the most associated data sources, most associated literature articles or both C14H22N2O3 266.16304 Chemical Reference Database Sorted candidate structures
  • 23. Data Streams for Ranking • Dashboard Data Sources • PubChem Data Source Count • PubMed Reference Count • Toxcast in vitro bioactivity • Presence in Consumer Products database • Predicted physicochemical Properties
  • 24. BIG databases are GREAT! P u b C h e m C A S R e g i s t r y C h e m S p i d e r E P A D S S T o x B l o o d E x p o s o m e 1 0 4 1 0 5 1 0 6 1 0 7 1 0 8 1 0 9 C h e m ic a l S u b s ta n c e s • Thanks to all of the public database efforts • So much benefit from what’s been done • There are hundreds of them at this point…
  • 25. Is a bigger database better? 24 • ChemSpider was 26 million chemicals for the original work • Much BIGGER today • Is bigger better?? • Are there other metadata to use for ranking?
  • 26. Comparing Search Performance 25 • When dashboard contained 720k chemicals • Only 3% of ChemSpider size • What was the comparison in performance?
  • 27. How did performance compare? 26 For the same 162 chemicals, Dashboard outperforms ChemSpider for both Mass and Formula Ranking
  • 28. Identifying “Known Unknowns” Bigger is not necessarily better 27
  • 30. Vomitoxin - ChemSpider • 19 “Vomitoxins” – 3 isotopically labeled 29
  • 31. PubChem – “virtual chemistry” • Other databases grow quickly…a lot of “virtual chemistry” and “make on demand” compounds. • Efforts such as the BloodExposome and PubChemLite are critical to focus efforts 30
  • 32. Applications at the EPA • We have ongoing efforts applying NTA to multiple challenges including – PFAS identification – Pesticides in various matrices – CECs in water – Biosolids • Examples include… 31
  • 33. Example 1: Consumer Product Analysis
  • 34. Example 1: Consumer Product Analysis • Many chemicals observed in consumer product extracts • More observed chemicals not known to be in consumer products • Why might the ‘other’ chemicals be in the products? • Many observed chemicals known to be in consumer products
  • 35. Example 2: Recycled Product Analysis
  • 36. Example 2: Recycled Product Analysis • Significant differences between chemicals in recycled vs. virgin products for certain product & use categories • Most differences observed in paper products and construction materials • Some uses (e.g., fragrances) highly represented across all product/use categories
  • 37. Example 3: Placental Tissue Analysis
  • 38. Supporting Exposomics Research • DSSTox database substances map to – Their structures (mass/formulae/InChIs etc.) – Hazard data: human, mammalian and ecotox – Exposure data: products in commerce, categories and functional use, measured concentrations, etc. • Structures have to be standardized…
  • 39. “MS-Ready Chemicals” • MS-Ready chemical standardization is ESSENTIAL to our support of Non-Targeted Analysis • It links chemicals across the Dashboard and makes it possible to link detections back to products in commerce • https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
  • 40. Predicted Mass Spectra (CFM-ID: http://cfmid.wishartlab.com/) • MS/MS spectra prediction for ESI+, ESI-, and EI • Predictions generated for MS-Ready structures • Search experimental spectra against predicted spectra for candidate identification
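One common way to compare an experimental spectrum with predicted spectra is a cosine score over binned peaks. The sketch below assumes that approach for illustration only; the slide does not say which scoring function is used, and the peak lists, candidate names and 0.01 m/z bin width are invented.

```python
import math

def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) peaks to integer bins, summing intensity per bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no shared peaks)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical experimental spectrum and predicted spectra for two candidates.
experimental = [(91.05, 100.0), (119.08, 40.0), (147.08, 15.0)]
predicted = {
    "candidate_X": [(91.05, 80.0), (119.08, 50.0)],
    "candidate_Y": [(77.04, 100.0), (105.03, 60.0)],
}

scores = {
    name: cosine_similarity(experimental, peaks)
    for name, peaks in predicted.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```

Fixed-width binning is the simplest choice; a peak that falls near a bin edge can be split from its match, so tolerance-based peak matching is often preferred in practice.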
  • 41. Predicted Data Already Public • Publication and data files: https://epa.figshare.com/articles/CFM-ID_Paper_Data/7776212/1
  • 42. We have proven the value…
  • 45. Candidate Identification is only PART of the process • Whatever the approach for candidate identification, chemical hazard is important • Hazard Comparison Profiling is important • https://www.epa.gov/chemical-research/cheminformatics
  • 47. AMOS: Analytical Methods and Spectra Database • Three types of data in the database: – Methods (regulatory, lab manuals and SOPs, publications, tech notes) – Spectra (from public domain and our own laboratories) – Fact Sheets (harvested from SWGDRUG and other sites) • Some methods have associated spectra • Some data are just externally linked • Currently contains around 200,000 spectra, 700,000 external links, 3000 “Fact Sheets” and ~4000 methods • ALL data are growing in number
  • 51. Linking to actual spectra
  • 52. Linking to actual spectra • We are doing a lot of chemical curation as we build the database
  • 53. Rules need optimizing for MS-Ready standardization • We can now add new rules and edit existing rules
  • 54. Example: Tautomer Rules • We control rules for – Tautomers – Mesomers – Neutralize/De-radicalize – Break salts – Standard checks – etc. • Necessary for mapping chemicals in DSSTox
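To show why a rule such as “Break salts” matters for mapping, here is a toy sketch of that one step applied to SMILES strings. It is an assumption-laden illustration, not the EPA rule set: the counterion list and function names are hypothetical, and plain string handling stands in for a real cheminformatics toolkit, which would also be needed for tautomers, mesomers and neutralization.

```python
# Hypothetical counterion/acid fragments to discard; a real rule set is far larger.
COUNTERIONS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]", "Cl", "Br"}

def break_salts(smiles):
    """Split a SMILES string on '.' and drop fragments listed as counterions."""
    fragments = smiles.split(".")
    kept = [fragment for fragment in fragments if fragment not in COUNTERIONS]
    return sorted(set(kept))

def group_by_ms_ready(substances):
    """Group substance names that share the same desalted structure string."""
    groups = {}
    for name, smiles in substances.items():
        key = ".".join(break_salts(smiles))
        groups.setdefault(key, []).append(name)
    return groups

# Illustrative substances: a free base, its hydrochloride salt, and an unrelated chemical.
substances = {
    "methylamine": "CN",
    "methylamine hydrochloride": "CN.Cl",
    "ethanol": "CCO",
}

groups = group_by_ms_ready(substances)
print(groups)
```

The free base and its hydrochloride collapse to the same key, which is the kind of link that lets one detected structure map back to several registered substances. String comparison only works here because the example SMILES are written identically; real workflows compare canonical structures.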
  • 55. Manual Curation and Annotation: Analytical QC data for Tox21 • ~9000 chemicals with tens of thousands of spectra (LCMS, GCMS & NMR) • These data will feed prediction algorithms…
  • 56. Amenability Prediction Algorithms • New paper just submitted
  • 57. Our cheminformatics work supports the “NTA WebApp”
  • 58. Our cheminformatics work supports the “NTA WebApp”
  • 60. Conclusions • Our data resources underpin our research efforts – data quality and curation are key • Our web-based applications deliver our data to the community for multiple use cases • Our support for Exposomics is multi-fold – Curated chemistry data streams – Experimental and predicted properties, toxicity, etc. • The NTA WebApp in development will use all of these data streams to support analysis
  • 61. Acknowledgments • DSSTox curation team • CCTE IT team for software development, DevOps • Mass spectrometry scientists across EPA, especially the NTA team • Open Databases: PubChem, ChEMBL, MoNA, MassBank, GNPS, SWGDRUG, Cayman Chemical • Instrument vendors – many have contributed methods to the AMOS database • …and thank you for your time
  • 62. Contact Information • williams.antony@epa.gov • We fully support Open Data, so ask us for what you need