Chemistry Data Delivery from the US-EPA Center for Computational Toxicology and Exposure to Support Environmental Chemistry

The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Chemistry data delivery from the US-EPA
to support environmental chemistry
Antony Williams
European Food Safety Authority: May 2024

US EPA: Office of Research and Development
• Office of Research and Development (ORD) is
the research arm of EPA
• Public health and environmental assessment
• Computational toxicology, exposure & modeling
• I work for the Center for Computational
Toxicology and Exposure in the Computational
Chemistry and Cheminformatics Branch

Data, Model and Tool Development
• There are many tools developed by our cheminformatics team
and across other centers in EPA. I will represent ours only…
• We have production level public-facing tools, proof-of-concept
public-facing tools, and many tools in development…
• We focus on FAIR data, releasing it to the community and
making it available through public APIs

Free-Access Cheminformatics Tools
• The Center for Computational Toxicology and Exposure has
delivered many tools including
– CompTox Chemicals Dashboard (primary tool from the center)
– Proof-of-Concept cheminformatics modules
• Chemicals Hazard Profiling
• Chemical Transformations Database
• Analytical Methods and Spectra
• Chemical Safety Profiling

Research Projects we apply them to


Curating Chemistry into the DSSTox Database
• Chemistry underpins all of our tools
• Data assembly and curation is critical
• DSSTox assembled over 25 years

Assembling data is easy. Curation is hard
https://pubs.acs.org/doi/10.1021/acs.jcim.2c00268
• It is very easy to harvest and download massive amounts
of data. FAIRness has expanded access…
• Open API and downloadable dataset – contributing
CASRNs, Names and Structures to Open Chemistry

Stoichiometry is important
• SIMPLE example…1 to 3 stoichiometry
• 1000s of structures with bad stoichiometry have been released into the wild
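As a minimal sketch of the kind of check that catches a wrong 1 to 3 stoichiometry, the snippet below derives the smallest charge-neutral ion ratio from formal charges. The function names are illustrative assumptions and are not part of the DSSTox curation workflow.

```python
from math import gcd


def neutral_ratio(cation_charge: int, anion_charge: int) -> tuple:
    """Return (n_cations, n_anions) for the smallest charge-neutral salt."""
    if cation_charge <= 0 or anion_charge >= 0:
        raise ValueError("expected a positive cation charge and a negative anion charge")
    common = gcd(cation_charge, -anion_charge)
    return (-anion_charge // common, cation_charge // common)


def is_charge_balanced(n_cations: int, cation_charge: int,
                       n_anions: int, anion_charge: int) -> bool:
    """True when the registered ion counts sum to zero net charge."""
    return n_cations * cation_charge + n_anions * anion_charge == 0
```

For a 3+ cation with a 1- anion, `neutral_ratio(3, -1)` gives the 1 to 3 ratio, so a record registered as 1 to 1 would fail `is_charge_balanced`.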

Data Quality issues proliferate
Taxol skeletons (105 CS/202 PubChem)

Assembly and curation of data
• Chemistry data as the foundation of identifiers, structures,
chemical list assemblies and relationship mappings
• Chemical property, fate and transport data (expt. and pred.)
• Toxicity data assembled from public domain and EPA
databases – in vivo and in vitro, ecotoxicity
• Exposure data from public resources including EPA databases,
safety data sheets, experimental and predicted
• Delivered via multiple applications based on context

CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard/

The Charge for the Dashboard
• Develop a “first-stop-shop” for environmental chemical data to
support EPA and partner decision making:
– Centralized location for relevant chemical data
– Chemistry, exposure, hazard and dosimetry
– Combination of existing data and predictive models
– Publicly accessible, periodically updated, curated
• Easy access to data improves efficiency and ultimately
accelerates chemical risk assessment

“Executive Summary”
• Overview of toxicity-
related info
• Quantitative values
• Physchem. and Fate &
Transport
• Adverse Outcome
Pathway links
• In vitro bioactivity
summary plot

Experimental and Predicted Data
• Physchem and Fate & Transport
experimental and predicted data
• Data can be downloaded as Excel,
TSV and CSV files
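A downloaded TSV file can be processed with the Python standard library. The sketch below groups values by property and by data type (experimental or predicted). The column names `PROPERTY`, `TYPE` and `VALUE` and the sample rows are placeholders, not the Dashboard's actual export layout.

```python
import csv
import io
from collections import defaultdict

# Placeholder content standing in for a downloaded file; the values are made up.
SAMPLE_TSV = (
    "PROPERTY\tTYPE\tVALUE\n"
    "PROP_A\texperimental\t1.0\n"
    "PROP_A\tpredicted\t1.5\n"
    "PROP_A\tpredicted\t2.5\n"
    "PROP_B\tpredicted\t10.0\n"
)


def group_values(tsv_text: str) -> dict:
    """Map (property, type) to the list of numeric values found in the file."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        grouped[(row["PROPERTY"], row["TYPE"])].append(float(row["VALUE"]))
    return dict(grouped)
```

With a real export, replace `io.StringIO(tsv_text)` with an open file handle and adjust the column names to match the file header.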

Hazard Data for Copper
• 2246 rows of human/eco hazard data harvested with 3 clicks

Sources of Exposure to Chemicals

Integrated Modules – Generalized Read-Across
https://comptox.epa.gov/genra/

Integrated Modules – Abstract Sifter
https://comptox.epa.gov/dashboard/chemical/pubmed-abstract-sifter/

Substance Relationship Mappings
contained in the data model
• Similar compounds - based on structure “fingerprints”
• Structure mappings - between parent and salts, isotopomers,
multi-component chemicals
• Related substances – monomer to polymer, parent to
transformation products
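Fingerprint-based similarity is commonly scored with the Tanimoto (Jaccard) coefficient. The sketch below represents each fingerprint as the set of its "on" bit positions. The slides do not state which fingerprint or metric the Dashboard uses, so this is an illustration of the general idea only.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)
```

Two fingerprints with bits {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving a similarity of 0.5.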

Complex Mappings for FORMULATIONS
• Example: AFFF formulations can be registered in the
database as reported in publications

Mixture Formulation contents

Polymers can map to components

Chemical Lists
(not all lists are created equal…)

Chemical Lists
• Chemical lists are focused on regulations, specific research
efforts and categories
• 450 lists and growing
– TSCA Inventory
– Clean Water Act Hazardous Substances
– Consumer Products database
– Chemicals of Emerging Concern
– PFAS lists
– Extractables and Leachables
– …lists are versioned and updated and new lists added
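Once lists are downloaded, a common question is which chemicals appear on more than one list. The sketch below inverts a mapping of list name to chemical identifiers. The list names and identifiers in the usage note are placeholders, not real Dashboard lists.

```python
from collections import defaultdict


def list_membership(lists: dict) -> dict:
    """Map each chemical identifier to the sorted names of the lists containing it."""
    membership = defaultdict(list)
    for list_name in sorted(lists):
        for chemical in lists[list_name]:
            membership[chemical].append(list_name)
    return dict(membership)


def on_multiple_lists(lists: dict, minimum: int = 2) -> list:
    """Return identifiers present on at least `minimum` lists, sorted."""
    membership = list_membership(lists)
    return sorted(chem for chem, names in membership.items() if len(names) >= minimum)
```

For example, `on_multiple_lists({"LIST_A": {"CHEM-1", "CHEM-2"}, "LIST_B": {"CHEM-2"}})` returns `["CHEM-2"]`.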

Remember those Research Projects?

Harvesting Data en masse
• Harvesting data for 726 biosolid related chemicals
– Physicochemical properties
– Fate and transport
– Toxicity values
– Bioactivity data from 100s of in vitro assays
– Exposure data
– Chemical identifiers
– Links to regulatory assessments

Batch Searching is a big enabler
https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273

Batch Search – Excel, CSV, SDF file

We supply predicted data for many endpoints
• Property prediction – e.g., water solubility, vapor pressure
• Fate and Transport – e.g., bioaccumulation, bioconcentration
• Bioactivity – e.g., endocrine disruption
• Models are constantly updated with fresh data, are transparent
in their data, and are open source

QSAR Modeled Data are available
• We build models, then apply them to our curated datasets
for release, PLUS deliver the models for real-time use

Where is all the calculation detail? Are
predictions within the applicability domain, etc.?
• For OPERA and TEST models we have all the details
– OPERA https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0263-1
– TEST https://www.epa.gov/comptox-tools/toxicity-estimation-software-tool-test
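To illustrate what "in the applicability domain" can mean, the sketch below uses the simplest possible definition: a query is in domain if each descriptor lies within the range seen in the training set. This bounding-box check is a deliberately simple stand-in and is not the approach documented for OPERA or TEST in the references above.

```python
def descriptor_ranges(training_rows: list) -> list:
    """Return the (min, max) of each descriptor column over the training set."""
    columns = zip(*training_rows)
    return [(min(column), max(column)) for column in columns]


def in_domain(query: list, ranges: list) -> bool:
    """True when every descriptor of the query falls inside the training range."""
    return all(low <= value <= high for value, (low, high) in zip(query, ranges))
```

A prediction for a query that fails this check would be flagged as an extrapolation rather than silently reported.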

Why this detail is required
• Predicted Fish Biotrans. Half-Life (Km) of PFOS is 2.7 days

QMRF details
https://comptox.epa.gov/dashboard-api/ccdapp1/qmrfdata/file/by-modelid/28

Access to Real Time Predictions

Multiple modeling approaches plus Consensus
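One common way to form a consensus is to average the predictions of the individual methods that returned a value. The sketch below assumes an unweighted mean and hypothetical method names; it is not taken from the WebTEST implementation.

```python
from statistics import mean
from typing import Dict, Optional


def consensus(predictions: Dict[str, Optional[float]]) -> Optional[float]:
    """Average the available predictions; None marks a method with no result."""
    usable = [value for value in predictions.values() if value is not None]
    return mean(usable) if usable else None
```

For example, `consensus({"method_a": 1.0, "method_b": 3.0, "method_c": None})` averages only the two methods that produced a value.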

Our approaches to building models
• Exemplified through our recent water solubility work

OECD Principles for Modeling
https://www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf
• To facilitate the consideration of a (Q)SAR model for
regulatory purposes, it should be associated with the
following information:
1) a defined endpoint
2) an unambiguous algorithm
3) a defined domain of applicability
4) appropriate measures of goodness-of-fit, robustness and predictivity
5) a mechanistic interpretation, if possible
• These principles have been around a long time…
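The five principles can be treated as a checklist when documenting a model. The sketch below is a hypothetical record type, not an official QMRF schema: it reports which of the first four principles are undocumented and treats the fifth as optional, following the "if possible" wording.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ModelReport:
    endpoint: Optional[str] = None                    # principle 1
    algorithm: Optional[str] = None                   # principle 2
    applicability_domain: Optional[str] = None        # principle 3
    validation_statistics: Optional[str] = None       # principle 4
    mechanistic_interpretation: Optional[str] = None  # principle 5, "if possible"


def missing_principles(report: ModelReport) -> List[str]:
    """List the required principles that have no documentation in the report."""
    required = [f.name for f in fields(report) if f.name != "mechanistic_interpretation"]
    return [name for name in required if not getattr(report, name)]
```

A report with only an endpoint and an algorithm filled in would be flagged as missing its applicability domain and validation statistics.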

Lots of descriptors to choose from
• Many descriptors to choose from, both commercial and open source
• We use PaDEL, Mordred and TEST descriptors (open)
• Example: http://www.yapcwsoft.com/dd/padeldescriptor/

Feature Selection and Variables can help
mechanistic understanding
Todd Martin, SERMACS, 2023
• Without feature selection – 427 variables, R2 = 0.822
• With feature selection – 19 variables, R2 = 0.816
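The slide does not say which feature selection method was used. As a generic illustration of how a descriptor set can shrink with little loss of information, the sketch below applies a simple filter: drop constant descriptors, then drop any descriptor highly correlated with one already kept. The threshold and descriptor names are assumptions.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def select_features(columns: dict, max_corr: float = 0.95) -> list:
    """Keep descriptors that vary and are not near-duplicates of a kept one."""
    kept = []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant descriptor carries no information
        if any(abs(pearson(values, columns[other])) > max_corr for other in kept):
            continue  # redundant with a descriptor already selected
        kept.append(name)
    return kept
```

Fewer, less redundant variables are easier to relate back to chemistry, which is the point the heading makes about mechanistic understanding.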

Coming Soon:
Excel report for models for each data set
• Cover sheet with model metadata
• Training and test set statistics
• Prediction results for each method

Where do we use predictions like this?
• Models are used in many places in our computational
toxicology research
• They are used in the analytical labs to help guide
non-targeted analysis
• They are used by stakeholders for hazard profiling of chemicals
• They provide predictions for breakdown products in the environment

So now you know the Dashboard…

Lots of “proof-of-concept” tools in development
• PoCs are research software builds to prove approaches
before moving into production software environments
• PoCs are to figure out how to address specific questions
• Assemble data, develop data model(s), test user interface
approaches, work with test user base to garner feedback
• Since PoCs are internal access only, data refreshes and application
updates can be more frequent
• Underlying APIs are being used in our research

PoCs have been rebuilt for production
• Examples of PoCs integrated into production apps
– WebTEST predictions on the Dashboard
– Structure/substructure/similarity search

How to compare Hazard Data?
NOT Easy to interpret…

Hazard Profile
• Hazard Comparison module profiles toxicity across chemicals
https://www.epa.gov/chemical-research/cheminformatics

Hazard Profile
On-Hover view of trumping scheme call

Hazard Profile
On-click view of underlying data

Linked to Chemical Transformation Simulator

Simple Analog “read-across”
• Suppose a chemical has limited data – perform an analog
search to find related chemicals with data
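The analog search described above can be sketched as a filter and sort: keep candidates that are similar enough to the target and that actually have data, most similar first. The similarity scores are taken as given, and the threshold and identifiers are illustrative assumptions.

```python
def analogs_with_data(similarities: dict, data: dict, threshold: float = 0.8) -> list:
    """Return (chemical, similarity) pairs at or above the threshold that have data."""
    hits = [
        (chemical, score)
        for chemical, score in similarities.items()
        if score >= threshold and chemical in data
    ]
    # Most similar first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

A close analog without data is excluded, since it cannot help fill the gap for the target chemical.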

Simple Analog “read-across”
Similarity

Where can our tools be applied
• Emergency Response utility is obvious…
• Consider East Palestine
https://www.cleveland19.com/2023/02/14/ntsb-announces-preliminary-malfunction-that-caused-east-palestine-train-derailment/
POLYPROPYLENE
POLYETHYLENE
Residue lube oil
VINYL CHLORIDE
DIPROPYLENE GLYCOL
PROPYLENE GLYCOL
DIETHYLENE GLYCOL
COMBUSTIBLE LIQ., NOS (ETHYLENE GLYCOL MONOBUTYL ETHER)
SEMOLINA
COMBUSTIBLE LIQ., NOS (ETHYLHEXYL ACRYLATE)
POLYVINYL
PETROLEUM LUBEOIL
POLYPROPYL GLYCOL
ISOBUTYLENE
BUTYL ACRYLATES, STABILIZED
PETRO OIL, NEC
ADDITIVES, FUEL
BALLS,CTN,M EDCL
SHEET STEEL
VEGTABLE, FROZEN
BENZENE
PARAFFIN WAX
FLAKES, POWDER
HYDRAULIC CEMENT
AUTOS PASSENGER
MALT LIQUORS

Hazard Comparison Profiling

Perfect Example of FAIR Data and APIs
• We owe a lot to FAIR data and availability of information
• We curate a lot of our chemistry data using public resources
such as PubChem, ChEBI, Common Chemistry and others
• The availability of Public APIs takes things to another level!
• We have been using the PubChem API to harvest data so
we can build new applications, like the Safety Module

Cheminformatics Safety Module (NOT PUBLIC)
Integrate multiple data streams…

WebTEST Batch Prediction
• Batch prediction of all WebTEST predictions
• Display of experimental and predicted data and reports

QSAR-Ready/MS-Ready Standardizer
• “QSAR and MS-Ready” standardization underpins models and linking
• MS-Ready is ESSENTIAL to our support of Non-Targeted Analysis
• QSAR-Ready rules need tweaking
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2

Structure Standardization
• We CONTROL the rules…add new rules, edit existing rules

Example: Tautomer Rules
• We control rules for
– Tautomers
– Mesomers
– Neutralize/De-radicalize
– Break salts
– Standard checks
– etc….
• Necessary for mapping
chemicals in DSSTox
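Of the rule types listed, "break salts" is the easiest to illustrate without a cheminformatics toolkit. The sketch below splits a SMILES string on "." and keeps the fragment with the most heavy atoms, counted with a rough regular expression. It is a toy illustration only: it does not neutralize charges, handle tautomers, or reflect the actual standardizer rules.

```python
import re

# Rough heavy-atom tokenizer: bracket atoms, two-letter halogens, then the
# organic-subset symbols in upper case (aliphatic) and lower case (aromatic).
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(fragment: str) -> int:
    """Approximate number of non-hydrogen atoms in a SMILES fragment."""
    return len(ATOM.findall(fragment))


def largest_fragment(smiles: str) -> str:
    """Return the disconnected component with the most heavy atoms."""
    return max(smiles.split("."), key=heavy_atom_count)
```

For sodium acetate written as `[Na+].CC(=O)[O-]`, the function keeps the acetate fragment and drops the counterion.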

Structure Alerts Module
• Structure “Alerts” module based on:
– SMARTS (PAINS)
– ToxPrints (Ashby and TTC)
– SMILES (IARC 1, 2, 3a and 3b)
Example output columns: ID, Chemical, aim, ashby, iarc1, …

EPA Measurement Data
• Measurement data are needed to ensure chemical safety
• Characterize risk
• Regulate use & disposal
• Manage human & ecological exposures
• Ensure compliance under federal statutes
Figure: Chemical Monitoring Needs (Hazard Identification, Dose-Response Assessment, Exposure Assessment, Risk Characterization)

Applications of Exposomics at EPA
• Ongoing efforts applying NTA to exposomics challenges including
– PFAS identification
– Pesticides in various matrices
– CECs in water
– Biosolids
• Examples include…

Example 1: Consumer Product Analysis

Example 2: Recycled Product Analysis

Example 3: Placental Tissue Analysis

Applications of Exposomics at EPA
• Cheminformatics is a key component of NTA
– Structure standardization (MS-Ready structure forms)
– Predictive models (LC-MS amenability, retention time prediction)
– In silico mass spectrometry prediction
– Chemical space mapping
– Chemical Transformations database
– Analytical Methods and Open Spectra database

AMOS: Analytical Methods and Open Spectra
(NOT PUBLIC yet)
• Simple Vision: I want to find the best method(s) associated with a
chemical and/or class of chemicals
• Respond to the request “I cannot find a method for my chemical. HELP!”
• The Approach:
– Aggregate MS method documents (and adjust the definition of “what is a useful method”)
– Extract chemistry (mostly CASRN and Names)
– Map CASRN and Names to structures
– Deliver a proof-of-concept application to search a database by names, CASRNs, InChIKeys
and ultimately structure
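The "extract chemistry" step can be sketched for CASRNs, which have a fixed pattern and a check digit. The snippet below finds candidate CASRNs in free text and keeps only those whose check digit is valid, which filters out many typos. It illustrates the idea and is not the AMOS extraction pipeline.

```python
import re

CASRN_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def is_valid_casrn(casrn: str) -> bool:
    """Check the CASRN format and its check digit."""
    match = CASRN_PATTERN.fullmatch(casrn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... from right to left; the sum mod 10 is the check digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def extract_casrns(text: str) -> list:
    """Return CASRN-like strings in the text that pass the check-digit test."""
    return [m.group(0) for m in CASRN_PATTERN.finditer(text)
            if is_valid_casrn(m.group(0))]
```

A string such as 71-43-3 matches the pattern but fails the check digit, so it is dropped before any name or structure mapping is attempted.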

AMOS: Analytical Methods and Open Spectra
(NOT PUBLIC yet)
• Three types of data in the database:
– Methods (regulatory, lab manuals and SOPs, publications, tech notes)
– Spectra (from public domain and our own laboratories)
– Monographs (harvested from SWGDRUG and other sites)
• Some methods have associated spectra
• Some data are just externally linked
• Currently contains around 285,000 spectra, 600,000 external
links, >5000 “Fact Sheets” and >5100 methods
• Spectra – LC-MS, GC-MS, NMR
• ALL data are growing in number with weekly releases

Chemical Transformation Simulator Database

ChET: Chemical Transformations Database

ChET Visual Reaction Maps
• Compare and overlap maps
• Load all maps containing a
particular chemical
• Prune and filter maps
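A reaction map can be modeled as a list of (parent, product) edges. Under that assumption, the sketch below shows two of the operations listed above: finding every map that contains a chemical, and walking a map to collect all downstream transformation products. The data layout and names are hypothetical and do not describe the ChET data model.

```python
from collections import deque


def maps_containing(maps: dict, chemical: str) -> list:
    """Return the names of maps in which the chemical is a parent or a product."""
    return sorted(
        name for name, edges in maps.items()
        if any(chemical in (parent, product) for parent, product in edges)
    )


def downstream_products(edges: list, start: str) -> list:
    """Breadth-first list of everything reachable from `start` in one map."""
    children = {}
    for parent, product in edges:
        children.setdefault(parent, []).append(product)
    found, queue = [], deque([start])
    while queue:
        current = queue.popleft()
        for product in children.get(current, []):
            if product != start and product not in found:
                found.append(product)
                queue.append(product)
    return found
```

Pruning or filtering a map then amounts to dropping edges before calling `downstream_products`.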

Chemical Space Mapping (CheMSTER)
Chemical Mapping of Space Translated into Enhanced
Representations
• Initially built to support
NTA research
• Functionality to overlap
and compare datasets
• Selection of chemicals
based on variables
(predicted properties)
• Plug-in growing model set
to add variables for
comparison

The CompTox API is now public
https://api-ccte.epa.gov/docs/index.html
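As a hedged sketch of calling the API from Python, the snippet below only builds the HTTP request and does not send it. The host comes from the documentation URL above; the resource path, the `x-api-key` header name and the need for a key are assumptions to confirm against that documentation. DTXSID7020182 (bisphenol A) is used as an example identifier.

```python
import urllib.request

BASE_URL = "https://api-ccte.epa.gov"  # host taken from the docs URL on the slide


def build_detail_request(dtxsid: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one chemical's detail record."""
    # ASSUMPTION: path and header name; check both against the API documentation.
    url = f"{BASE_URL}/chemical/detail/search/by-dtxsid/{dtxsid}"
    headers = {"x-api-key": api_key, "accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")
```

With a valid key, the request could be sent with `urllib.request.urlopen(build_detail_request("DTXSID7020182", key))` and the JSON body decoded with the `json` module.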

Conclusions
• Underpinning chemistry data is from the DSSTox database
• CompTox Chemicals Dashboard is public access to DSSTox
and other related databases
• Proof-of-Concept (PoC) tools are built to prove approaches
• Everything is increasingly API driven and APIs are now public

Some Related Publications of Interest

You want to know more…
• Lots of resources available
– Presentations: https://tinyurl.com/w5hqs55
– Communities of Practice Videos: https://rb.gy/qsbno1
– Manual: https://rb.gy/4fgydc
– Latest News: https://comptox.epa.gov/dashboard/news_info

This talk is an overview
• This talk is a high-level overview only. We
can provide training on the individual
modules and data as required
• LOTS of training materials are available
https://www.epa.gov/chemical-research/new-approach-methods-nams-training

Acknowledgments
• Our DSSTox curation team
• SCDCD software development and DevOps teams
• Scientists and students across CCTE
• Non-targeted analysis and mass spectrometry team
• Dashboard project team – Nisha Sipes & Phuc Do
• Cheminformatics Modules and Modeling Team – Valery
Tkachenko, Todd Martin, Nate Charest, Charlie Lowe
• ChET – Adam Edelman-Munoz, Caroline Stevens and team
• CheMSTER – Nate Charest and Adam Edelman-Munoz

Contact Information
• Contact info: williams.antony@epa.gov
• Slides available at: https://www.slideshare.net/AntonyWilliams/
• Obtain articles from Google Scholar Profile
