US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of Emerging Concern

Antony Williams
Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
The views expressed in this presentation are those of the author
and do not necessarily reflect the views or policies of the U.S. EPA

The role of cheminformatics at EPA
• I am from the EPA Center for Computational Toxicology and Exposure
• We develop many prediction models and web-based applications
• Today’s presentation: how our efforts support the dissemination of data on chemicals of emerging concern and mass spectrometry non-targeted analysis (MS-NTA)
Chemical Monitoring Needs
Figure: the components of risk assessment (Hazard Identification, Dose-Response Assessment, Exposure Assessment, Risk Characterization)

Free-Access Cheminformatics Tools
• The Center for Computational Toxicology and Exposure develops many tools, including:
• CompTox Chemicals Dashboard
• Proof-of-Concept cheminformatics modules
• Chemicals Hazard Profiling
• Chemical Transformations database (ChET)
• Analytical Methods and Open Spectra database (AMOS)
• All chemicals are stored/curated in DSSTox

Accessing DSSTox chemistry: CompTox Chemicals Dashboard
• A publicly accessible website delivering:
• 1.2M chemicals with related property data
• Related substances: transformation products, monomer/polymer relationships
• Experimental/predicted physicochemical property data
• Experimental human and ecological hazard data
• Integration with biological assay data
• Information regarding chemicals in consumer products
• Links to other agency websites and public data resources
• “Batch searching” for tens to thousands of chemicals

CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard

Experimental Data
• Experimental data harvested from public-domain databases and journal articles
• Data link back to their provenance
• Data are used to build QSAR models for real-time predictions
• Data are available for download and reuse
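The slides do not say which modeling methods sit behind the real-time predictions. As a deliberately minimal illustration of the idea that curated experimental values train a model which then predicts properties for new chemicals, the sketch below fits a one-descriptor linear QSAR by ordinary least squares. The names `fit_line` and `predict` and the toy numbers are hypothetical; production QSAR models use many descriptors, validation and applicability-domain checks.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: tuple[float, float], descriptor: float) -> float:
    """Predict the property of a new chemical from its descriptor value."""
    slope, intercept = model
    return slope * descriptor + intercept


# Toy training set (hypothetical descriptor and property values, not EPA data).
descriptors = [1.0, 2.0, 3.0, 4.0]
measured = [3.0, 5.0, 7.0, 9.0]
model = fit_line(descriptors, measured)
print(model)                # (2.0, 1.0)
print(predict(model, 5.0))  # 11.0
```

Once fitted, a prediction is a single arithmetic step, which is what makes on-demand predictions cheap to serve.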

What is PFOS Called?
Synonyms, CASRNs and more
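Synonym and CASRN lookups depend on clean identifiers. A CASRN ends in a check digit, so a batch of pasted CASRNs can be screened for typos before searching. This is a minimal sketch of the standard check-digit rule; `is_valid_casrn` and `split_batch` are hypothetical helper names, not Dashboard functions, and a correct check digit does not prove that a number is actually registered.

```python
import re

# A CASRN has the form NNNNNNN-NN-N: 2 to 7 digits, 2 digits, then one check digit.
CASRN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_casrn(casrn: str) -> bool:
    """Return True if the string is well formed and its check digit is correct."""
    match = CASRN_PATTERN.match(casrn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def split_batch(identifiers: list[str]) -> tuple[list[str], list[str]]:
    """Split a batch of CASRN strings into (valid, invalid) lists."""
    valid = [c for c in identifiers if is_valid_casrn(c)]
    invalid = [c for c in identifiers if not is_valid_casrn(c)]
    return valid, invalid


print(is_valid_casrn("1763-23-1"))  # True (PFOS)
print(is_valid_casrn("1763-23-2"))  # False
```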

Substance Relationship Mappings
• Similar compounds, based on structure “fingerprints”
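Fingerprint-based similarity is commonly scored with the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. A minimal sketch, assuming fingerprints have already been computed (for example by a cheminformatics toolkit) and are held as sets of "on" bit positions; the Dashboard's fingerprint type and similarity threshold are not stated here, so the 0.8 default and the helper names are assumptions.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similar_compounds(
    query: set[int], library: dict[str, set[int]], threshold: float = 0.8
) -> list[tuple[str, float]]:
    """Library entries scoring at or above the threshold, best match first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each set holds the indices of bits that are "on".
library = {
    "compound-A": {1, 2, 3, 4},
    "compound-B": {1, 2, 3, 5},
    "compound-C": {7, 8},
}
print(similar_compounds({1, 2, 3, 4}, library, threshold=0.5))
# [('compound-A', 1.0), ('compound-B', 0.6)]
```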

Relationships in the data
• Structure mappings: between parents and their salts, multicomponent chemicals and isotopomers
• Related substances: monomer to polymer, parent to transformation products
Batch Searching is a big enabler
https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273

Batch Searching
• Single-chemical searches are useful, but people often work with groups of chemicals
• Typical questions:
• Find all data for an input list of 1,000 CASRNs or 1,000 names
• What are the physicochemical properties for a set of identifiers?
• What is the list of chemicals for the formula CxHyOz?
• What is the list of chemicals for a mass ± error?
• Can I get chemical lists in Excel files? In SDF files?
• Can I include properties in the download file?
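Two of these questions, searching by formula and by mass ± error, reduce to simple arithmetic on formulas. A minimal sketch, assuming plain formulas without brackets, charges or isotope labels and using standard monoisotopic element masses; the function names are hypothetical. It matches neutral masses only, whereas a real NTA query usually starts from an observed ion m/z that must first be corrected for the adduct (for example a lost or gained proton), a step skipped here.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope; extend for other elements.
MONOISOTOPIC_MASS = {
    "H": 1.00782503223,
    "C": 12.0,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "F": 18.99840316273,
    "P": 30.97376199842,
    "S": 31.9720711744,
    "Cl": 34.968852682,
}
FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a plain formula such as 'C8HF17O3S' into element counts."""
    if not FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: dict[str, int] = {}
    for element, number in ELEMENT.findall(formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass; raises KeyError for elements missing from the table."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def formula_search(candidates: dict[str, str], formula: str) -> list[str]:
    """Names whose formula has the same element counts, written in any order."""
    wanted = parse_formula(formula)
    return [name for name, f in candidates.items() if parse_formula(f) == wanted]


def mass_search(candidates: dict[str, str], target: float, ppm: float = 5.0) -> list[str]:
    """Names whose neutral monoisotopic mass lies within target ± ppm."""
    tolerance = target * ppm * 1e-6
    return [name for name, f in candidates.items()
            if abs(monoisotopic_mass(f) - target) <= tolerance]


candidates = {"water": "H2O", "methane": "CH4", "ammonia": "NH3"}
print(formula_search(candidates, "OH2"))         # ['water']
print(mass_search(candidates, 18.0106, ppm=10))  # ['water']
print(round(monoisotopic_mass("C8HF17O3S"), 4))  # 499.9375 (PFOS)
```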

Batch Search
• All data can be downloaded as Excel, CSV or SDF files and reused
• All data are open
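In an exported SDF file, each structure's properties are stored as named data items after the structure block. A minimal reader for those data items, assuming the standard V2000 SD file layout; the field names in the sample are hypothetical, so check an actual Dashboard export for its names. For production use, a cheminformatics toolkit's SD file reader (for example RDKit's) is the more robust choice.

```python
import re

# A data-item header line starts with ">" and holds the field name in angle brackets.
DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")
RECORD_END = re.compile(r"^\$\$\$\$[ \t]*$", re.MULTILINE)


def parse_sdf_properties(text: str) -> list[dict[str, str]]:
    """Return the data items of each SDF record as a {field name: value} dict."""
    records: list[dict[str, str]] = []
    for block in RECORD_END.split(text):
        lines = block.splitlines()
        # Data items follow the structure block, which ends with "M  END".
        end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), None)
        if end is None:
            continue  # e.g. the empty text after the final "$$$$"
        props: dict[str, str] = {}
        i = end + 1
        while i < len(lines):
            header = DATA_HEADER.match(lines[i])
            i += 1
            if header is None:
                continue
            values = []
            while i < len(lines) and lines[i].strip():  # a blank line ends the value
                values.append(lines[i])
                i += 1
            props[header.group(1)] = "\n".join(values)
        records.append(props)
    return records


SAMPLE = """water
  hypothetical example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <PREFERRED_NAME>
Water

>  <CASRN>
7732-18-5

$$$$
"""
print(parse_sdf_properties(SAMPLE))
# [{'PREFERRED_NAME': 'Water', 'CASRN': '7732-18-5'}]
```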

Chemical Lists
https://comptox.epa.gov/dashboard/chemical_lists
• Chemical lists are focused on regulations, research efforts and categories
• 425 lists and growing
• TSCA Inventory
• Clean Water Act Hazardous Substances
• Consumer Products database
• Chemicals of Emerging Concern
• PFAS lists
• Extractables and Leachables
• Lists are versioned and updated, and new lists are added regularly
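Because lists are versioned, common tasks are checking what changed between two versions of a list and finding which lists a chemical appears on. A sketch using plain set operations, assuming each list version has already been downloaded as a set of identifiers (for example DTXSIDs); the list names, identifiers and helper names below are hypothetical.

```python
def compare_versions(old: set[str], new: set[str]) -> dict[str, list[str]]:
    """Summarize how the membership of a chemical list changed between two versions."""
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "retained": sorted(old & new),
    }


def lists_containing(chemical_id: str, lists: dict[str, set[str]]) -> list[str]:
    """Names of all lists that include the given chemical identifier."""
    return sorted(name for name, members in lists.items() if chemical_id in members)


# Hypothetical list versions and identifiers, for illustration only.
v1 = {"chem-1", "chem-2", "chem-3"}
v2 = {"chem-2", "chem-3", "chem-4"}
print(compare_versions(v1, v2))
# {'added': ['chem-4'], 'removed': ['chem-1'], 'retained': ['chem-2', 'chem-3']}
print(lists_containing("chem-3", {"LIST-A": v1, "LIST-B": v2, "LIST-C": {"chem-9"}}))
# ['LIST-A', 'LIST-B']
```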

PFAS Lists of Chemicals (51 of 426 lists)

Applications at the EPA
• We have ongoing efforts applying NTA to multiple challenges, including:
• PFAS identification
• Pesticides in various matrices
• CECs in water
• Biosolids
• Examples include…

Example 1: Consumer Product Analysis
• Many chemicals observed in consumer product extracts
• Many observed chemicals known to be in consumer products
• More observed chemicals not known to be in consumer products
• Why might the ‘other’ chemicals be in the products?

Example 2: Recycled Product Analysis
• Significant differences between chemicals in recycled vs. virgin products for certain product and use categories
• Most differences observed in paper products and construction materials
• Some uses (e.g., fragrances) highly represented across all product/use categories

Example 3: Placental Tissue Analysis

Many “proof-of-concept” (PoC) tools in development
• PoCs are research software builds that prove approaches before they move into production software environments
• Assemble data, develop data model(s), test user interface approaches, and work with a test user base to garner feedback
• Since PoCs are internal-access only, data refreshes and application updates can be more frequent

Cheminformatics PoC Modules
https://www.epa.gov/chemical-research/cheminformatics

Easy Export of all data to Excel

AMOS: Analytical Methods and Spectra Database
• Three types of data in the database:
• Methods (regulatory, lab manuals and SOPs, publications, tech notes)
• Spectra (from public domain and our own laboratories)
• Fact Sheets (harvested from SWGDRUG and other sites)
• Currently contains >210,000 spectra, >700,000 external links, 4,000 “Fact Sheets” and ~4,000 methods
• All data types are growing weekly at present

Literature articles, SOPs, Protocols

Linking to actual spectra
• We are doing extensive chemical curation as we build the database
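The slides do not say how AMOS compares spectra. As general background, a widely used way to compare a measured mass spectrum with a library spectrum is the cosine similarity of peak intensities after binning by m/z. This sketch uses a fixed 0.01 Da bin width, an arbitrary assumption; peaks that straddle a bin edge will not be matched, which more careful implementations handle with tolerance-based peak matching. The helper names and spectra are hypothetical.

```python
import math


def bin_peaks(peaks: list[tuple[float, float]], bin_width: float) -> dict[int, float]:
    """Sum peak intensities into m/z bins of fixed width."""
    binned: dict[int, float] = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(
    peaks_a: list[tuple[float, float]],
    peaks_b: list[tuple[float, float]],
    bin_width: float = 0.01,
) -> float:
    """Cosine similarity of two binned spectra (0 to 1 for non-negative intensities)."""
    a = bin_peaks(peaks_a, bin_width)
    b = bin_peaks(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical centroided spectra as (m/z, relative intensity) pairs.
measured = [(100.0, 50.0), (150.0, 100.0), (200.0, 20.0)]
library_entry = [(100.0, 40.0), (150.0, 100.0), (250.0, 10.0)]
print(cosine_score(measured, library_entry))
```

Cosine scoring ignores overall intensity scale, so spectra recorded at different absolute intensities can still match.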

Why not just Regulatory Methods?
Because we need methods faster

Full presentation
https://t.ly/4MxFe

Our data via web services (APIs)
https://api-ccte.epa.gov/docs/
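A minimal standard-library sketch of preparing a request to these services. The endpoint path, the `x-api-key` header and the need for a registered API key are our understanding of the service and are assumptions to verify against the documentation page; `build_detail_request`, `fetch_json` and the placeholder identifier and key are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api-ccte.epa.gov"
# ASSUMPTION: path template and header name; confirm both in the API documentation.
DETAIL_PATH = "/chemical/detail/search/by-dtxsid/{dtxsid}"
API_KEY_HEADER = "x-api-key"


def build_detail_request(dtxsid: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for one chemical's details."""
    path = DETAIL_PATH.format(dtxsid=urllib.parse.quote(dtxsid, safe=""))
    return urllib.request.Request(
        BASE_URL + path,
        headers={API_KEY_HEADER: api_key, "Accept": "application/json"},
    )


def fetch_json(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_detail_request("EXAMPLE-DTXSID", "YOUR-API-KEY")
print(request.full_url)
# https://api-ccte.epa.gov/chemical/detail/search/by-dtxsid/EXAMPLE-DTXSID
# details = fetch_json(request)  # requires network access and a valid key
```

Separating request construction from sending keeps the URL and headers easy to inspect before any network call is made.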

Conclusions
• Our data resources underpin our research efforts; data quality is key
• Our web-based applications deliver our data to the community
• Our support for identifying chemicals of emerging concern is multifaceted:
• Curated chemistry data streams
• Non-targeted analysis tool development and cheminformatics support
• The NTA WebApp, now in development, uses all of these data streams to support analysis

Acknowledgements and Contact Information
• The work presented here represents an enormous team of contributors
• Chemical curators
• Software developers and contractors
• Postdocs, SMEs and PIs
• Contact info: williams.antony@epa.gov
• Slides will be available at: https://www.slideshare.net/AntonyWilliams/