This presentation was given at the "Department of Defense's (DoD) Energy and Environment Innovation Symposium" in Arlington, Virginia, on December 1st, 2023 (https://serdp-estcp.org/events/details/04d444f1-aa19-4e66-bb5c-5163964cc4dd/symposium-2023)
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of Emerging Concern
1. US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of Emerging Concern
Antony Williams
Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
The views expressed in this presentation are those of the author
and do not necessarily reflect the views or policies of the U.S. EPA
2. The role of cheminformatics at EPA
• I am from the EPA Center for Computational Toxicology and Exposure
• We develop lots of prediction models and web-based applications
• Today’s presentation: how our efforts support data dissemination regarding chemicals of emerging concern and mass spectrometry non-targeted analysis (MS-NTA)
Figure: Chemical Monitoring Needs (Hazard Identification, Dose-Response Assessment, Exposure Assessment, Risk Characterization)
3. Free-Access Cheminformatics Tools
• The Center for Computational Toxicology and Exposure has many tools
• CompTox Chemicals Dashboard
• Proof-of-Concept cheminformatics modules
• Chemicals Hazard Profiling
• Chemical Transformations database (ChET)
• Analytical Methods and Open Spectra database (AMOS)
• All chemicals are stored/curated in DSSTox
5. Accessing DSSTox chemistry:
CompTox Chemicals Dashboard
• A publicly accessible website delivering:
• 1.2M chemicals with related property data
• Related substances: transformation products, mono/polymer
• Experimental/predicted physicochemical property data
• Experimental Human and Ecological hazard data
• Integration with “biological assay data”
• Information regarding chemicals in consumer products
• Links to other agency websites and public data resources
• “Batch searching” for tens to thousands of chemicals
9. Experimental Data
• Experimental data are harvested from public domain databases and journal articles
• Data link back to their provenance
• Data are used to build QSAR models for real-time predictions
• Data are available for download and reuse
10. What is PFOS Called?
Synonyms, CASRNs and more
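Because one chemical can carry many names and registry numbers, it can help to sanity-check CASRNs before using them as search inputs. The following is a minimal sketch of the standard CAS check-digit rule, not the Dashboard's own validation code; the function name `is_valid_casrn` is hypothetical, and 1763-23-1 is the CASRN for PFOS.

```python
import re


def is_valid_casrn(casrn: str) -> bool:
    """Return True if the string has CASRN layout and a correct check digit."""
    # Layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


if __name__ == "__main__":
    print(is_valid_casrn("1763-23-1"))  # PFOS
    print(is_valid_casrn("1763-23-2"))  # same digits, wrong check digit
```

A check like this only catches typing and transcription errors; it does not confirm that a well-formed CASRN is actually assigned to the intended substance.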
12. Relationships in the data
• Structure mappings: between parent and salts, multicomponent chemicals, isotopomers
• Related substances: monomer to polymer, parent to transformation products
13. Batch Searching is a big enabler
https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273
14. Batch Searching
• Singleton searches are useful but people work with groups of chemicals
• Typical questions
• Find me all data based on the input of 1000 CASRNs, or 1000 names
• What are the physicochemical properties for a set of identifiers?
• What is the list of chemicals for the formula CxHyOz?
• What is the list of chemicals for a mass +/- error?
• Can I get chemical lists in Excel files? In SDF files?
• Can I include properties in the download file?
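The "mass +/- error" question above can be illustrated with a small local sketch. It assumes the error is given in parts per million (ppm) and searches an in-memory list rather than the Dashboard itself; the `Chemical` class, the function names, and the approximate monoisotopic masses are illustrative assumptions, not Dashboard output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chemical:
    name: str
    monoisotopic_mass: float  # in Da, approximate illustrative value


def mass_window(target_mass: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) mass bounds for a ppm tolerance."""
    delta = target_mass * ppm / 1e6
    return target_mass - delta, target_mass + delta


def search_by_mass(chemicals, target_mass: float, ppm: float = 5.0):
    """Return chemicals inside the ppm window, closest mass first."""
    low, high = mass_window(target_mass, ppm)
    hits = [c for c in chemicals if low <= c.monoisotopic_mass <= high]
    return sorted(hits, key=lambda c: abs(c.monoisotopic_mass - target_mass))


# Hypothetical candidate list standing in for a curated chemical database.
CANDIDATES = [
    Chemical("PFOA", 413.9737),
    Chemical("PFOS", 499.9375),
    Chemical("Caffeine", 194.0804),
]

if __name__ == "__main__":
    for hit in search_by_mass(CANDIDATES, 499.9380, ppm=5.0):
        print(hit.name, hit.monoisotopic_mass)
```

In non-targeted analysis a single mass window often returns many candidates, so a real workflow would typically rank hits further using other evidence such as formula, spectra or data source counts.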
17. Batch Search
• All data can be downloaded into Excel files, CSV files or SDF files and reused
• All data are Open
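As a rough illustration of reuse, a downloaded CSV file could be read with Python's standard `csv` module. The column names (`INPUT`, `PREFERRED_NAME`, `CASRN`, `MONOISOTOPIC_MASS`), the sample rows and the `load_export` helper are assumptions made for this sketch; a real export should be inspected for its actual headers.

```python
import csv
import io

# Hypothetical sample standing in for a batch search CSV download.
SAMPLE_EXPORT = """INPUT,PREFERRED_NAME,CASRN,MONOISOTOPIC_MASS
PFOS,Perfluorooctanesulfonic acid,1763-23-1,499.9375
PFOA,Perfluorooctanoic acid,335-67-1,413.9737
Unknown-123,,,
"""


def load_export(handle):
    """Split export rows into matched records and unmatched input strings."""
    found, missing = [], []
    for row in csv.DictReader(handle):
        if row["CASRN"]:
            # Convert the mass column from text to a number for later use.
            row["MONOISOTOPIC_MASS"] = float(row["MONOISOTOPIC_MASS"])
            found.append(row)
        else:
            # Inputs with no match come back with empty fields in this sketch.
            missing.append(row["INPUT"])
    return found, missing


if __name__ == "__main__":
    found, missing = load_export(io.StringIO(SAMPLE_EXPORT))
    print(len(found), "matched;", "unmatched:", missing)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` wrapper.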
18. Chemical Lists
https://comptox.epa.gov/dashboard/chemical_lists
• Chemical lists are focused on regulations, research efforts and categories
• 425 lists and growing
• TSCA Inventory
• Clean Water Act Hazardous Substances
• Consumer Products database
• Chemicals of Emerging Concern
• PFAS lists
• Extractables and Leachables
• Lists are versioned and updated, and new lists are added regularly
25. Applications at the EPA
• We have ongoing efforts applying NTA to multiple challenges including
• PFAS identification
• Pesticides in various matrices
• CECs in water
• Biosolids
• Examples include…
27. Example 1: Consumer Product Analysis
• Many chemicals observed in consumer product extracts
• Many observed chemicals known to be in consumer products
• More observed chemicals not known to be in consumer products
• Why might the ‘other’ chemicals be in the products?
29. Example 2: Recycled Product Analysis
• Significant differences between chemicals in recycled vs. virgin products for certain product & use categories
• Most differences observed in paper products and construction materials
• Some uses (e.g., fragrances) highly represented across all product/use categories
31. Lots of “proof-of-concept” tools in development
• PoCs are research software builds to prove approaches before moving
into production software environments
• Assemble data, develop data model(s), test user interface approaches,
work with test user base to garner feedback
• Since PoCs are internal access, data refreshes and application updates
can be more
34. AMOS: Analytical Methods and Spectra Database
• Three types of data in the database:
• Methods (regulatory, lab manuals and SOPs, publications, tech notes)
• Spectra (from public domain and our own laboratories)
• Fact Sheets (harvested from SWGDRUG and other sites)
• Currently contains >210,000 spectra, >700,000 external links, 4000
“Fact Sheets” and ~4000 methods
• ALL data are growing in number weekly at present
42. Our Data via services
https://api-ccte.epa.gov/docs/
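A minimal sketch of how a client might prepare a request against these services using only the Python standard library. The endpoint path shown is a placeholder, and the `x-api-key` header name is an assumption; both should be checked against the documentation at the URL above before use. The helper names `build_request` and `fetch_json` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api-ccte.epa.gov"


def build_request(path: str, api_key: str, base_url: str = BASE_URL):
    """Build (but do not send) a GET request for a given endpoint path."""
    url = urllib.parse.urljoin(base_url + "/", path.lstrip("/"))
    headers = {
        "x-api-key": api_key,  # assumed header name; confirm in the API docs
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(request, timeout: float = 30.0):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "/hypothetical/endpoint" is a placeholder, not a real route.
    req = build_request("/hypothetical/endpoint", api_key="YOUR-KEY-HERE")
    print(req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any network call is made.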
43. Conclusions
• Our data resources underpin our research efforts – data quality is key
• Our web-based applications deliver our data to the community
• Our support for identifying chemicals of emerging concern is multi-fold
• Curated chemistry data streams
• Non-targeted analysis tool development and cheminformatics support
• NTA WebApp in development uses all data streams to support analysis
44. Acknowledgements and Contact Information
• The work presented here represents the efforts of an enormous team of contributors
• Chemical curators
• Software developers and contractors
• Postdocs, SMEs and PIs
• Contact info: williams.antony@epa.gov
• Slides will be available at: https://www.slideshare.net/AntonyWilliams/