The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Chemistry data delivery from the US-EPA
to support environmental chemistry
Antony Williams
European Food Safety Authority: May 2024
US EPA: Office of Research and Development
• Office of Research and Development (ORD) is
the research arm of EPA
• Public health and environmental assessment
• Computational toxicology, exposure & modeling
• I work for the Center for Computational
Toxicology and Exposure in the Computational
Chemistry and Cheminformatics Branch
Data, Model and Tool Development
• There are many tools developed by our cheminformatics team
and across other centers in EPA. I will represent ours only…
• We have production level public-facing tools, proof-of-concept
public-facing tools, and many tools in development…
• We focus on FAIR data, releasing it to the community and
making it available through public APIs
Free-Access Cheminformatics Tools
• The Center for Computational Toxicology and Exposure has
delivered many tools including
– CompTox Chemicals Dashboard (primary tool from the center)
– Proof-of-Concept cheminformatics modules
• Chemicals Hazard Profiling
• Chemical Transformations Database
• Analytical Methods and Spectra
• Chemical Safety Profiling
Research Projects we apply them to
Curating Chemistry into the DSSTox Database
• Chemistry underpins all of our tools
• Data assembly and curation is critical
• DSSTox assembled over 25 years
Assembling data is easy. Curation is hard
https://pubs.acs.org/doi/10.1021/acs.jcim.2c00268
• It is very easy to harvest and download massive amounts
of data. FAIRness has expanded access…
• Open API and downloadable dataset – contributing
CASRNs, Names and Structures to Open Chemistry
Stoichiometry is important
• SIMPLE example…1 to 3 stoichiometry
• 1000s of structures with bad stoichiometry have been released into the wild
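The 1 to 3 case can be checked mechanically. The sketch below is only an illustration, not the DSSTox curation code: it assumes the formal charges of the cation and anion are already known, and the function names are hypothetical.

```python
from math import gcd


def neutral_ratio(cation_charge: int, anion_charge: int) -> tuple[int, int]:
    """Return (n_cations, n_anions) for the smallest charge-neutral salt."""
    if cation_charge <= 0 or anion_charge >= 0:
        raise ValueError("expected a positive cation charge and a negative anion charge")
    divisor = gcd(cation_charge, -anion_charge)
    # Each ion count is the magnitude of the other ion's charge, reduced
    # to the smallest whole numbers.
    return (-anion_charge // divisor, cation_charge // divisor)


def is_charge_balanced(n_cations: int, cation_charge: int,
                       n_anions: int, anion_charge: int) -> bool:
    """True when the registered ion counts sum to zero net charge."""
    return n_cations * cation_charge + n_anions * anion_charge == 0


# A 3+ cation with a 1- anion needs 1 cation and 3 anions.
print(neutral_ratio(3, -1))
# A record registered as 1:1 for the same ions would be flagged.
print(is_charge_balanced(1, 3, 1, -1))
```

A check like this only catches net-charge errors; it says nothing about whether the ions themselves are the right ones.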
Data Quality issues proliferate
Taxol skeletons (105 CS/202 PubChem)
Assembly and curation of data
• Chemistry data as the foundation of identifiers, structures,
chemical list assemblies and relationship mappings
• Chemical property, fate and transport data (expt. and pred.)
• Toxicity data assembled from public domain and EPA
databases – in vivo and in vitro, ecotoxicity
• Exposure data from public resources including EPA databases,
safety data sheets, experimental and predicted
• Delivered via multiple applications based on context
CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard/
The Charge for the Dashboard
• Develop a “first-stop-shop” for environmental chemical data to
support EPA and partner decision making:
– Centralized location for relevant chemical data
– Chemistry, exposure, hazard and dosimetry
– Combination of existing data and predictive models
– Publicly accessible, periodically updated, curated
• Easy access to data improves efficiency and ultimately
accelerates chemical risk assessment
Detailed Chemical Pages
“Executive Summary”
• Overview of toxicity-related info
• Quantitative values
• Physchem. and Fate & Transport
• Adverse Outcome Pathway links
• In vitro bioactivity summary plot
Experimental and Predicted Data
• Physchem and Fate & Transport
experimental and predicted data
• Data can be downloaded as Excel,
TSV and CSV files
Chemical Hazard Data
Hazard Data for Copper
• 2246 rows of human/eco hazard data harvested with 3 clicks
Safety Data
Sources of Exposure to Chemicals
ToxCast
Integrated Modules – Generalized Read-Across
https://comptox.epa.gov/genra/
Integrated Modules – Abstract Sifter
https://comptox.epa.gov/dashboard/chemical/pubmed-abstract-sifter/
DSSTox has a
RICH Data Model
Substance Relationship Mappings
contained in the data model
• Similar compounds - based on structure “fingerprints”
• Structure mappings - between parent and salts, isotopomers,
multi-component chemicals
• Related substances – monomer to polymer, parent to
transformation products
Relationships in the data
Complex Mappings for FORMULATIONS
• Example: AFFF formulations can be registered in the
database as reported in publications
Mixture Formulation contents
Polymers can map to components
Chemical Lists
(not all lists are created equal…)
Chemical Lists
• Chemical lists are focused on regulations, specific research
efforts and categories
• 450 lists and growing
– TSCA Inventory
– Clean Water Act Hazardous Substances
– Consumer Products database
– Chemicals of Emerging Concern
– PFAS lists
– Extractables and Leachables
– …lists are versioned and updated and new lists added
Remember those Research Projects?
Some Research Projects…
Harvesting Data en masse
• Harvesting data for 726 biosolid related chemicals
– Physicochemical properties
– Fate and transport
– Toxicity values
– Bioactivity data from 100s of in vitro assays
– Exposure data
– Chemical identifiers
– Links to regulatory assessments
Batch Searching
Batch Searching is a big enabler
https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273
Batch Search
Batch Search – Excel, CSV, SDF file
We supply predicted data for many endpoints
• Property prediction – e.g., water solubility, vapor pressure
• Fate and Transport – e.g., bioaccumulation, bioconcentration
• Bioactivity – e.g., endocrine disruption
• Models are constantly updated with fresh data, are transparent
in their data, and are open source
QSAR Modeled Data are available
• We build models then apply them to our curated datasets
for release, PLUS deliver the models for real-time use
Where is all the calculation detail? Are
predictions within the applicability domain, etc.?
• For OPERA and TEST models we have all the details
– OPERA https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0263-1
– TEST https://www.epa.gov/comptox-tools/toxicity-estimation-software-tool-test
OPERA Model Details
Why this detail is required
• Predicted Fish Biotrans. Half-Life (Km) of PFOS is 2.7 days
QMRF details
https://comptox.epa.gov/dashboard-api/ccdapp1/qmrfdata/file/by-modelid/28
TEST Prediction Reports
Access to Real Time Predictions
Multiple modeling approaches plus Consensus
Our approaches to building models
• Exemplified through our recent water solubility work
OECD Principles for Modeling
https://www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf
• To facilitate the consideration of a (Q)SAR model for
regulatory purposes, it should be associated with the
following information:
1) a defined endpoint
2) an unambiguous algorithm
3) a defined domain of applicability
4) appropriate measures of goodness-of-fit, robustness and predictivity
5) a mechanistic interpretation, if possible
• These principles have been around a long time…
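Principle 4 is the one most easily turned into numbers. The sketch below computes two commonly reported statistics, the coefficient of determination (R2) and the root-mean-square error, for observed versus predicted values. It is a generic illustration: the function names and the sample values are made up and are not taken from any EPA model.

```python
from math import sqrt


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root-mean-square error, in the same units as the endpoint."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical endpoint values for four chemicals (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

Reporting these separately for the training set and an external test set is one common way to cover both goodness-of-fit and predictivity.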
Lots of descriptors to choose from
• Many descriptors to choose from: commercial and open source
• We use PaDEL, Mordred and TEST descriptors (open)
• Example: http://www.yapcwsoft.com/dd/padeldescriptor/
Feature Selection and Variables can help
mechanistic understanding
Todd Martin, SERMACS, 2023
Without Feature Selection – 427 variables, R2 = 0.822
With Feature Selection – 19 variables, R2 = 0.816
Coming Soon:
Excel report for models for each data set
• Cover sheet with model metadata
• Training and test set statistics
• Prediction results for each method
Where do we use predictions like this?
• Models are used in many places in our computational
toxicology research
• They are used in the analytical labs to help guide
non-targeted analysis
• By stakeholders for Hazard
profiling of chemicals
• Predictions for breakdown
products in the environment
So now you know the Dashboard…
Lots of “proof-of-concept” tools in development
• PoCs are research software builds to prove approaches
before moving into production software environments
• PoCs are to figure out how to address specific questions
• Assemble data, develop data model(s), test user interface
approaches, work with test user base to garner feedback
• Since PoCs are internal access, data refreshes and application
updates can be more frequent
• Underlying APIs are being used in our research
PoCs have been rebuilt for production
• Examples of PoCs integrated into production apps
– WebTEST predictions on the Dashboard
– Structure/substructure/similarity search
How to compare Hazard Data?
NOT Easy to interpret…
Hazard Profile
• Hazard Comparison module profiles toxicity across chemicals
https://www.epa.gov/chemical-research/cheminformatics
Hazard Profile
On-Hover view of trumping scheme call
Hazard Profile
On-click view of underlying data
Data to Excel in <60s
Linked to Chemical Transformation Simulator
Simple Analog “read-across”
• Suppose a chemical has limited data – perform an analog
search to find related chemicals with data
Simple Analog “read-across”
Similarity
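An analog search of this kind boils down to ranking candidates by a similarity score. The sketch below uses the Tanimoto coefficient on fingerprint bit sets, which is a common choice; the slides do not say which metric or fingerprint the module uses, so treat both as assumptions. The candidate names and bit sets are invented for illustration.

```python
def tanimoto(bits_a: frozenset[int], bits_b: frozenset[int]) -> float:
    """Tanimoto coefficient: shared bits divided by all bits set in either."""
    if not bits_a and not bits_b:
        return 0.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def rank_analogs(target: frozenset[int],
                 candidates: dict[str, frozenset[int]],
                 threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(target, bits)) for name, bits in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature.
target = frozenset({1, 2, 3, 4})
candidates = {
    "analog A": frozenset({1, 2, 3, 4, 5}),
    "analog B": frozenset({1, 2}),
    "unrelated": frozenset({7, 8}),
}
print(rank_analogs(target, candidates))
```

The threshold is the knob that trades the number of analogs returned against how close they are to the data-poor chemical.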
Where can our tools be applied
• Emergency Response utility is obvious…
• Consider East Palestine
https://www.cleveland19.com/2023/02/14/ntsb-announces-preliminary-malfunction-that-caused-east-palestine-train-derailment/
POLYPROPYLENE
POLYETHYLENE
Residue lube oil
VINYL CHLORIDE
DIPROPYLENE GLYCOL
PROPYLENE GLYCOL
DIETHYLENE GLYCOL
COMBUSTIBLE LIQ., NOS (ETHYLENE GLYCOL MONOBUTYL ETHER)
SEMOLINA
COMBUSTIBLE LIQ., NOS (ETHYLHEXYL ACRYLATE)
POLYVINYL
PETROLEUM LUBEOIL
POLYPROPYL GLYCOL
ISOBUTYLENE
BUTYL ACRYLATES, STABILIZED
PETRO OIL, NEC
ADDITIVES, FUEL
BALLS,CTN,M EDCL
SHEET STEEL
VEGTABLE, FROZEN
BENZENE
PARAFFIN WAX
FLAKES, POWDER
HYDRAULIC CEMENT
AUTOS PASSENGER
MALT LIQUORS
Hazard Comparison Profiling
Perfect Example of FAIR Data and APIs
• We owe a lot to FAIR data and availability of information
• We curate a lot of our chemistry data using public resources
such as PubChem, ChEBI, Common Chemistry and others
• The availability of Public APIs takes things to another level!
• We have been using the PubChem API to harvest data so
we can build new applications, like the Safety Module
Cheminformatics Safety Module (NOT PUBLIC)
Integrate multiple data streams…
WebTEST Batch Prediction
• Batch prediction of all WebTEST predictions
• Display of experimental and predicted data and reports
QSAR-Ready/MS-Ready Standardizer
• “QSAR and MS-Ready” standardization underpins models and linking
• MS-Ready is ESSENTIAL to our support of Non-Targeted Analysis
• QSAR-Ready rules need tweaking
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
Structure Standardization
• We CONTROL the rules…add new rules, edit existing rules
Example: Tautomer Rules
• We control rules for
– Tautomers
– Mesomers
– Neutralize/De-radicalize
– Break salts
– Standard checks
– etc….
• Necessary for mapping
chemicals in DSSTox
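To make the "break salts" rule concrete, here is a deliberately crude sketch that works on SMILES text alone: it splits a multi-component SMILES on "." and keeps the fragment with the most heavy atoms. This is not the EPA rule set. A real standardizer works on parsed molecules in a cheminformatics toolkit and also handles neutralization, tautomers and the other rules listed above. The function names are hypothetical.

```python
import re

# One match per heavy atom: a bracket atom such as [Na+] or [O-], a
# two-letter halogen, or a single-letter organic-subset atom (aliphatic
# or aromatic). Bracketed hydrogens such as [H+] are also counted, which
# is one of the ways this sketch is cruder than a real toolkit.
_ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(fragment: str) -> int:
    """Approximate heavy-atom count for one SMILES fragment."""
    return len(_ATOM_TOKEN.findall(fragment))


def strip_salt(smiles: str) -> str:
    """Keep the largest component of a dot-separated SMILES string."""
    fragments = smiles.split(".")
    return max(fragments, key=heavy_atom_count)


# Sodium acetate: the acetate fragment is kept, still charged.
print(strip_salt("CC(=O)[O-].[Na+]"))
```

Note that the surviving fragment keeps its charge; neutralizing it is a separate rule in the list above.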
Structure Alerts Module
• Structure “Alerts” module based on:
– SMARTS (PAINS)
– ToxPrints (Ashby and TTC)
– SMILES (IARC 1, 2, 3a and 3b)
ID Chemical aim ashby iarc1 …
EPA Measurement Data
• Measurement data are needed to ensure chemical safety
• Characterize risk
• Regulate use & disposal
• Manage human & ecological exposures
• Ensure compliance under federal statutes
Chemical Monitoring Needs
Figure labels: Hazard Identification; Dose-Response Assessment; Exposure Assessment; Risk Characterization
Applications of Exposomics at EPA
• Ongoing efforts applying NTA to exposomics challenges including
– PFAS identification
– Pesticides in various matrices
– CECs in water
– Biosolids
• Examples include…
Example 1: Consumer Product Analysis
Example 2: Recycled Product Analysis
Example 3: Placental Tissue Analysis
• Cheminformatics is a key component of NTA
– Structure standardization (MS-Ready structure forms)
– Predictive models (LCMS amenability, retention time prediction)
– in silico mass spectrometry prediction
– Chemical Space Mapping
– Chemical Transformation database
– Analytical Methods and Open Spectral database
AMOS: Analytical Methods and Open Spectra
(NOT PUBLIC yet)
• Simple Vision: I want to find the best method(s) associated with a
chemical and/or class of chemicals
• Answer the question “I cannot find a method for my chemical” - HELP
• The Approach:
– Aggregate MS method documents (and adjust the definition of “what is a useful method”)
– Extract chemistry (mostly CASRN and Names)
– Map CASRN and Names to structures
– Deliver a proof-of-concept application to search a database by names, CASRNs, InChIKeys
and ultimately structure
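The "extract chemistry" step can be sketched for CASRNs, which carry a built-in check digit. The code below finds CASRN-shaped strings in free text and keeps only those whose check digit is consistent. It illustrates the general idea and is not the AMOS pipeline: the function names and the sample sentence are assumptions, and name extraction and structure mapping are out of scope here.

```python
import re

# CASRN shape: 2 to 7 digits, 2 digits, then 1 check digit.
_CASRN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def is_valid_casrn(casrn: str) -> bool:
    """Validate the CASRN check digit.

    Reading the digits before the check digit from right to left, each is
    multiplied by its position (1, 2, 3, ...); the sum modulo 10 must
    equal the check digit.
    """
    match = _CASRN.fullmatch(casrn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    weighted_sum = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(digits), start=1)
    )
    return weighted_sum % 10 == int(match.group(3))


def extract_casrns(text: str) -> list[str]:
    """Return check-digit-valid CASRNs in order of appearance."""
    return [m.group(0) for m in _CASRN.finditer(text) if is_valid_casrn(m.group(0))]


sample = "Method covers vinyl chloride (75-01-4) and benzene (71-43-2); see also 71-43-3."
print(extract_casrns(sample))  # the mistyped 71-43-3 is dropped
```

A valid check digit only shows the number is well formed; mapping it to the right structure still depends on curated data.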
• Three types of data in the database:
– Methods (regulatory, lab manuals and SOPs, publications, tech notes)
– Spectra (from public domain and our own laboratories)
– Monographs (harvested from SWGDRUG and other sites)
• Some methods have associated spectra
• Some data are just externally linked
• Currently contains around 285,000 spectra, 600,000 external
links, >5000 “Fact Sheets” and >5100 methods
• Spectra – LC-MS, GC-MS, NMR
• ALL data are growing in number with weekly releases
AMOS database
Linking to actual spectra
Chemical Transformation Simulator Database
ChET: Chemical Transformations Database
ChET Reaction Map Lists
ChET Visual Reaction Maps
• Compare and overlap maps
• Load all maps containing a
particular chemical
• Prune and filter maps
Chemical Space Mapping (CheMSTER)
Chemical Mapping of Space Translated into Enhanced
Representations
• Initially built to support NTA research
• Functionality to overlap and compare datasets
• Selection of chemicals based on variables (predicted properties)
• Plug-in growing model set to add variables for comparison
The CompTox API is now public
https://api-ccte.epa.gov/docs/index.html
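A request to the public API might be assembled as sketched below. The endpoint path (`/chemical/search/equal/`) and the `x-api-key` header name are assumptions that should be checked against the documentation linked above, and the function name is hypothetical. The code only builds the request and does not send it, so it runs without a key or network access.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api-ccte.epa.gov"


def build_search_request(identifier: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for an exact-match chemical search.

    The path and the header name are assumptions; confirm both in the API docs.
    """
    path = "/chemical/search/equal/" + quote(identifier, safe="")
    headers = {"x-api-key": api_key, "Accept": "application/json"}
    return Request(BASE_URL + path, headers=headers)


request = build_search_request("vinyl chloride", api_key="YOUR-KEY-HERE")
print(request.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` with a real key and decoding the JSON response.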
Conclusions
• Underpinning chemistry data is from the DSSTox database
• CompTox Chemicals Dashboard is public access to DSSTox
and other related databases
• Proof-of-Concept (PoC) tools are built to prove approaches
• Everything is increasingly API driven and APIs are now public
Some Related Publications of Interest
You want to know more…
• Lots of resources available
– Presentations: https://tinyurl.com/w5hqs55
– Communities of Practice Videos: https://rb.gy/qsbno1
– Manual: https://rb.gy/4fgydc
– Latest News: https://comptox.epa.gov/dashboard/news_info
This talk is an overview
• This talk is a high-level overview only. We
can provide training on the individual
modules and data as required
• LOTS of training materials are available
https://www.epa.gov/chemical-research/new-approach-methods-nams-training
Acknowledgments
• Our DSSTox curation team
• SCDCD software development and DevOps teams
• Scientists and students across CCTE
• Non-targeted analysis and mass spectrometry team
• Dashboard project team – Nisha Sipes & Phuc Do
• Cheminformatics Modules and Modeling Team – Valery
Tkachenko, Todd Martin, Nate Charest, Charlie Lowe
• ChET – Adam Edelman-Munoz, Caroline Stevens and team
• CheMSTER – Nate Charest and Adam Edelman-Munoz
Contact Information
• Contact info: williams.antony@epa.gov
• Slides available at: https://www.slideshare.net/AntonyWilliams/
• Obtain articles from Google Scholar Profile
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
ananya23nair
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
Frédéric Baudron
 
_Extraction of Ethylene oxide and 2-Chloroethanol from alternate matrices Li...
_Extraction of Ethylene oxide and 2-Chloroethanol from alternate matrices  Li..._Extraction of Ethylene oxide and 2-Chloroethanol from alternate matrices  Li...
_Extraction of Ethylene oxide and 2-Chloroethanol from alternate matrices Li...
LucyHearn1
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
PsychoTech Services
 

Recently uploaded (20)

Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSJAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
 
Introduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptxIntroduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptx
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfMending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
 
cathode ray oscilloscope and its applications
cathode ray oscilloscope and its applicationscathode ray oscilloscope and its applications
cathode ray oscilloscope and its applications
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
_Extraction of Ethylene oxide and 2-Chloroethanol from alternate matrices Li...
_Extraction of Ethylene oxide and 2-Chloroethanol from alternate matrices  Li..._Extraction of Ethylene oxide and 2-Chloroethanol from alternate matrices  Li...
_Extraction of Ethylene oxide and 2-Chloroethanol from alternate matrices Li...
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
 

Chemistry Data Delivery from the US-EPA Center for Computational Toxicology and Exposure to Support Environmental Chemistry

  • 1. The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA Chemistry data delivery from the US-EPA to support environmental chemistry Antony Williams European Food Safety Authority: May 2024
  • 2. US EPA: Office of Research and Development • Office of Research and Development (ORD) is the research arm of EPA • Public health and environmental assessment • Computational toxicology, exposure & modeling • I work for the Center for Computational Toxicology and Exposure in the Computational Chemistry and Cheminformatics Branch
  • 3. Data, Model and Tool Development • There are many tools developed by our cheminformatics team and across other centers in EPA. I will represent ours only… • We have production-level public-facing tools, proof-of-concept public-facing tools, and many tools in development… • We focus on FAIR data, releasing it to the community and making it available via public APIs
  • 4. Free-Access Cheminformatics Tools • The Center for Computational Toxicology and Exposure has delivered many tools including – CompTox Chemicals Dashboard (primary tool from the center) – Proof-of-Concept cheminformatics modules • Chemicals Hazard Profiling • Chemical Transformations Database • Analytical Methods and Spectra • Chemical Safety Profiling
  • 5. Research Projects we apply them to
  • 6. Research Projects we apply them to
  • 7. Research Projects we apply them to
  • 8. Research Projects we apply them to
  • 9. Curating Chemistry into the DSSTox Database • Chemistry underpins all of our tools • Data assembly and curation are critical • DSSTox assembled over 25 years
  • 10. Assembling data is easy. Curation is hard https://pubs.acs.org/doi/10.1021/acs.jcim.2c00268 • It is very easy to harvest and download massive amounts of data. FAIRness has expanded access… • Open API and downloadable dataset – contributing CASRNs, Names and Structures to Open Chemistry
  • 11. Stoichiometry is important • SIMPLE example…1 to 3 stoichiometry • 1000s of structures with bad stoichiometry into the wild
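
A stoichiometry check of this kind can be sketched in a few lines. The following is a minimal illustration, not the DSSTox curation code: it sums the formal charges written in the bracket atoms of a SMILES string and flags records whose net charge is not zero. The aluminium chloride strings are hypothetical examples chosen to show a 1 to 3 ratio; the slide does not name the compound it uses.

```python
import re

# Formal charge inside a bracket atom, e.g. [Al+3], [Cl-], [Fe++], [NH4+].
# An optional atom-class suffix such as ":1" is tolerated.
_CHARGED_ATOM = re.compile(r"\[[^\]]*?([+-]+)(\d*)(?::\d+)?\]")


def net_charge(smiles):
    """Return the sum of formal charges written in a SMILES string."""
    total = 0
    for signs, digits in _CHARGED_ATOM.findall(smiles):
        sign = 1 if signs[0] == "+" else -1
        # "+3" carries an explicit magnitude; "++" is counted by length.
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


def has_balanced_charge(smiles):
    """True when the formal charges of all components cancel out."""
    return net_charge(smiles) == 0


# Hypothetical records: a correct 1:3 salt and a 1:1 mis-registration.
good = "[Al+3].[Cl-].[Cl-].[Cl-]"
bad = "[Al+3].[Cl-]"
```

A non-zero net charge does not prove a record is wrong (some substances are registered as ions), so a check like this is best used to queue records for manual review rather than to reject them.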
  • 12. Data Quality issues proliferate Taxol skeletons (105 CS/202 PubChem)
  • 13. Assembly and curation of data • Chemistry data as the foundation of identifiers, structures, chemical list assemblies and relationship mappings • Chemical property, fate and transport data (expt. and pred.) • Toxicity data assembled from public domain and EPA databases – in vivo and in vitro, ecotoxicity • Exposure data from public resources including EPA databases, safety data sheets, experimental and predicted • Delivered via multiple applications based on context
  • 15. The Charge for the Dashboard • Develop a “first-stop-shop” for environmental chemical data to support EPA and partner decision making: – Centralized location for relevant chemical data – Chemistry, exposure, hazard and dosimetry – Combination of existing data and predictive models – Publicly accessible, periodically updated, curated • Easy access to data improves efficiency and ultimately accelerates chemical risk assessment
  • 17. “Executive Summary” • Overview of toxicity-related info • Quantitative values • Physchem. and Fate & Transport • Adverse Outcome Pathway links • In vitro bioactivity summary plot
  • 18. Experimental and Predicted Data • Physchem and Fate & Transport experimental and predicted data • Data can be downloaded as Excel, TSV and CSV files
  • 20. Hazard Data for Copper • 2246 rows of human/eco hazard data harvested with 3 clicks
  • 22. Sources of Exposure to Chemicals
  • 24. Integrated Modules – Generalized Read-Across https://comptox.epa.gov/genra/
  • 25. Integrated Modules – Abstract Sifter https://comptox.epa.gov/dashboard/chemical/pubmed-abstract-sifter/
  • 26. DSSTox has a RICH Data Model
  • 27. Substance Relationship Mappings contained in the data model • Similar compounds - based on structure “fingerprints” • Structure mappings - between parent and salts, isotopomers, multi-component chemicals • Related substances – monomer to polymer, parent to transformation products
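
Fingerprint-based similarity is commonly scored with the Tanimoto coefficient. The sketch below assumes each fingerprint is represented as a set of "on" bit positions; the slides do not say which fingerprint type or similarity metric DSSTox uses, so both the representation and the example bit sets are illustrative assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        # Two empty fingerprints share nothing; treat as dissimilar.
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def rank_by_similarity(query, candidates):
    """Sort (name, fingerprint) pairs from most to least similar to query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical bit sets, not real chemical fingerprints.
query_fp = {1, 2, 3}
library = [("A", {2, 3, 4}), ("B", {1, 2, 3}), ("C", {7, 8})]
ranking = rank_by_similarity(query_fp, library)
```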
  • 29. Complex Mappings for FORMULATIONS • Example: AFFF formulations can be registered in the database as reported in publications
  • 31. Polymers can map to components
  • 32. Chemical Lists (not all lists are created equal…)
  • 33. Chemical Lists • Chemical lists are focused on regulations, specific research efforts and categories • 450 lists and growing – TSCA Inventory – Clean Water Act Hazardous Substances – Consumer Products database – Chemicals of Emerging Concern – PFAS lists – Extractables and Leachables – …lists are versioned and updated and new lists added
  • 38. Harvesting Data en masse • Harvesting data for 726 biosolid related chemicals – Physicochemical properties – Fate and transport – Toxicity values – Bioactivity data in 100s of in vitro data – Exposure data – Chemical identifiers – Links to regulatory assessments
  • 40. Batch Searching is a big enabler https://pubs.acs.org/doi/10.1021/acs.jcim.0c01273
  • 42. Batch Search – Excel, CSV, SDF file
  • 44. We supply predicted data for many endpoints • Property prediction – e.g., water solubility, vapor pressure • Fate and Transport – e.g., bioaccumulation, bioconcentration • Bioactivity – e.g., endocrine disruption • Models are constantly updated with fresh data, are transparent in their data, and are open source
  • 45. QSAR Modeled Data are available • We build models, then apply them to our curated datasets for release, PLUS deliver the models for real-time use
  • 46. Where is all the calculation detail? Are predictions in applicability domain etc? • For OPERA and TEST models we have all the details – OPERA https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0263-1 – TEST https://www.epa.gov/comptox-tools/toxicity-estimation-software-tool-test
  • 50. Why this detail is required • Predicted Fish Biotrans. Half-Life (Km) of PFOS is 2.7 days
  • 51. Why this detail is required
  • 53. Why this detail is required
  • 55. Access to Real Time Predictions
  • 56. Multiple modeling approaches plus Consensus
  • 57. Our approaches to building models • Exemplified through our recent water solubility work
  • 58. OECD Principles for Modeling https://www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf • To facilitate the consideration of a (Q)SAR model for regulatory purposes, it should be associated with the following information: 1) a defined endpoint 2) an unambiguous algorithm 3) a defined domain of applicability 4) appropriate measures of goodness-of-fit, robustness and predictivity 5) a mechanistic interpretation, if possible • These principles have been around a long time…
  • 59. Lots of descriptors to choose from • Many Descriptors to choose: commercial and open source • We use PaDEL, Mordred and TEST descriptors (open) • Example: http://www.yapcwsoft.com/dd/padeldescriptor/
  • 60. Feature Selection and Variables can help mechanistic understanding (Todd Martin, SERMACS, 2023) • Without Feature Selection – 427 variables • With Feature Selection – 19 variables • R2 = 0.822, R2 = 0.816
  • 61. Coming Soon: Excel report for models for each data set • Cover sheet with model metadata • Training and test set statistics • Prediction results for each method
  • 62. Where do we use predictions like this? • Models are used in many places in our computational toxicology research • They are used in the analytical labs to help guide non-targeted analysis
  • 63. Where do we use predictions like this? • Models are used in many places in our computational toxicology research • They are used in the analytical labs to help guide non-targeted analysis • By stakeholders for Hazard profiling of chemicals
  • 64. Where do we use predictions like this? • Models are used in many places in our computational toxicology research • They are used in the analytical labs to help guide non-targeted analysis • By stakeholders for Hazard profiling of chemicals • Predictions for breakdown products in the environment
  • 65. So now you know the Dashboard…
  • 66. Lots of “proof-of-concept” tools in development • PoCs are research software builds to prove approaches before moving into production software environments • PoCs are to figure out how to address specific questions • Assemble data, develop data model(s), test user interface approaches, work with test user base to garner feedback • Since PoCs are internal access data refreshes and application updates can be more • Underlying APIs are being used in our research
  • 67. PoCs have been rebuilt for production • Examples of PoCs integrated into production apps – WebTEST predictions on the Dashboard – Structure/substructure/similarity search
  • 68. How to compare Hazard Data?
  • 69. How to compare Hazard Data? NOT Easy to interpret…
  • 70. Hazard Profile • Hazard Comparison module profiles toxicity across chemicals https://www.epa.gov/chemical-research/cheminformatics
  • 71. Hazard Profile On-Hover view of trumping scheme call
  • 72. Hazard Profile On-click view of underlying data
  • 73. Data to Excel in <60s
  • 74. Linked to Chemical Transformation Simulator
  • 75. Linked to Chemical Transformation Simulator
  • 76. Simple Analog “read-across” • Suppose a chemical has limited data – perform an analog search to find related chemicals with data
  • 78. Where can our tools be applied • Emergency Response utility is obvious… • Consider East Palestine https://www.cleveland19.com/2023/02/14/ntsb-announces-preliminary-malfunction-that-caused-east-palestine-train-derailment/ POLYPROPYLENE POLYETHYLENE Residue lube oil VINYL CHLORIDE DIPROPYLENE GLYCOL PROPYLENE GLYCOL DIETHYLENE GLYCOL COMBUSTIBLE LIQ., NOS (ETHYLENE GLYCOL MONOBUTYL ETHER) SEMOLINA COMBUSTIBLE LIQ., NOS (ETHYLHEXYL ACRYLATE) POLYVINYL PETROLEUM LUBEOIL POLYPROPYL GLYCOL ISOBUTYLENE BUTYL ACRYLATES, STABILIZED PETRO OIL, NEC ADDITIVES, FUEL BALLS,CTN,M EDCL SHEET STEEL VEGTABLE, FROZEN BENZENE PARAFFIN WAX FLAKES, POWDER HYDRAULIC CEMENT AUTOS PASSENGER MALT LIQUORS
  • 81. Perfect Example of FAIR Data and APIs • We owe a lot to FAIR data and availability of information • We curate a lot of our chemistry data using public resources such as PubChem, ChEBI, Common Chemistry and others • The availability of Public APIs takes things to another level! • We have been using the PubChem API to harvest data so we can build new applications, like the Safety Module
  • 82. Cheminformatics Safety Module (NOT PUBLIC) Integrate multiple data streams…
  • 83. WebTEST Batch Prediction • Batch prediction of all WebTEST predictions • Display of experimental and predicted data and reports
  • 84. QSAR-Ready/MS-Ready Standardizer • “QSAR and MS-Ready” standardization underpins models and linking • MS-Ready is ESSENTIAL to our support of Non-Targeted Analysis • QSAR-Ready rules need tweaking https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
  • 85. Structure Standardization • We CONTROL the rules…add new rules, edit existing rules
  • 86. Example: Tautomer Rules • We control rules for – Tautomers – Mesomers – Neutralize/De-radicalize – Break salts – Standard checks – etc…. • Necessary for mapping chemicals in DSSTox
  • 87. Structure Alerts Module • Structure “Alerts” module based on: – SMARTS (PAINS) – ToxPrints (Ashby and TTC) – SMILES (IARC 1, 2, 3a and 3b) ID Chemical aim ashby iarc1 …
  • 88. EPA Measurement Data • Measurement data are needed to ensure chemical safety • Characterize risk • Regulate use & disposal • Manage human & ecological exposures • Ensure compliance under federal statutes • Figure labels: Chemical Monitoring Needs, Exposure Assessment, Dose-Response Assessment, Risk Characterization, Hazard Identification
  • 89. Applications of Exposomics at EPA • Ongoing efforts applying NTA to exposomics challenges including – PFAS identification – Pesticides in various matrices – CECs in water – Biosolids • Examples include…
  • 90. Example 1: Consumer Product Analysis
  • 91. Example 2: Recycled Product Analysis
  • 92. Example 3: Placental Tissue Analysis
  • 93. Applications of Exposomics at EPA • Ongoing efforts applying NTA to exposomics challenges including – PFAS identification – Pesticides in various matrices – CECs in water – Biosolids • Cheminformatics is a key component of NTA analysis – Structure standardization (MS-Ready structure forms) – Predictive models (LCMS amenability, retention time prediction) – in silico mass spectrometry prediction – Chemical Space Mapping – Chemical Transformation database – Analytical Methods and Open Spectral database
  • 94. AMOS: Analytical Methods and Open Spectra (NOT PUBLIC yet) • Simple Vision: I want to find the best method(s) associated with a chemical and/or class of chemicals • Answer the question “I cannot find a method for my chemical” - HELP • The Approach: – Aggregate MS method documents (and adjust the definition of “what is a useful method”) – Extract chemistry (mostly CASRN and Names) – Map CASRN and Names to structures – Deliver a proof-of-concept application to search a database by names, CASRNs, InChIKeys and ultimately structure
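
When CASRNs are extracted from method documents, a cheap first filter is the CAS Registry Number check digit, which catches many OCR and transcription errors before any mapping to structures is attempted. The sketch below is an illustration of that standard checksum, not the AMOS pipeline; the function name is hypothetical.

```python
import re

# CASRN layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
_CASRN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_casrn(casrn):
    """Return True if the string is well-formed and its check digit matches."""
    match = _CASRN.match(casrn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    weighted = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(body), start=1)
    )
    return weighted % 10 == check_digit
```

For example, `is_valid_casrn("75-01-4")` (vinyl chloride) passes, while a single mistyped digit such as `"75-01-5"` fails. A valid checksum only shows the number is well-formed; it does not confirm that the number is assigned to the chemical named in the document.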
  • 95. AMOS: Analytical Methods and Open Spectra (NOT PUBLIC yet) • Three types of data in the database: – Methods (regulatory, lab manuals and SOPs, publications, tech notes) – Spectra (from public domain and our own laboratories) – Monographs (harvested from SWGDRUG and other sites) • Some methods have associated spectra • Some data are just externally linked • Currently contains around 285,000 spectra, 600,000 external links, >5000 “Fact Sheets” and >5100 methods • Spectra – LC-MS, GC-MS, NMR • ALL data are growing in number with weekly releases
  • 97. Linking to actual spectra
  • 98. Linking to actual spectra
  • 101. ChET Reaction Map Lists
  • 102. ChET Visual Reaction Maps • Compare and overlap maps • Load all maps containing a particular chemical • Prune and filter maps
  • 103. Chemical Space Mapping (CheMSTER) Chemical Mapping of Space Translated into Enhanced Representations • Initially built to support NTA research • Functionality to overlap and compare datasets • Selection of chemicals based on variables (predicted properties) • Plug-in growing model set to add variables for comparison
  • 104. The CompTox API is now public https://api-ccte.epa.gov/docs/index.html
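
As a rough sketch of what a client call might look like, the snippet below builds (but does not send) an HTTP request for one chemical record. Only the host comes from the slide. The endpoint path, the `x-api-key` header name and the example DTXSID are assumptions made for illustration and should be checked against the linked API documentation before use.

```python
from urllib.request import Request

# Host taken from the slide; everything below it is an assumed layout.
BASE_URL = "https://api-ccte.epa.gov"


def build_detail_request(dtxsid, api_key):
    """Build a GET request for chemical details by DTXSID (assumed endpoint)."""
    # Assumed path; confirm against the published API docs.
    url = f"{BASE_URL}/chemical/detail/search/by-dtxsid/{dtxsid}"
    headers = {
        "x-api-key": api_key,  # assumed header name for the API key
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


# Example identifier and placeholder key, for illustration only.
request = build_detail_request("DTXSID7020182", "YOUR-API-KEY")
```

The request object can then be passed to `urllib.request.urlopen` and the response body decoded with `json.loads`. Keeping request construction separate from sending makes the URL and headers easy to inspect or unit test without network access.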
  • 105. Conclusions • Underpinning chemistry data is from the DSSTox database • CompTox Chemicals Dashboard is public access to DSSTox and other related databases • Proof-of-Concept (PoC) tools are built to prove approaches • Everything is increasingly API driven and APIs are now public
  • 107. You want to know more… • Lots of resources available – Presentations: https://tinyurl.com/w5hqs55 – Communities of Practice Videos: https://rb.gy/qsbno1 – Manual: https://rb.gy/4fgydc – Latest News: https://comptox.epa.gov/dashboard/news_info
  • 108. This talk is an overview • This talk is a high-level overview only. We can provide trainings into the individual modules and data as required • LOTS of training materials are available https://www.epa.gov/chemical-research/new-approach-methods-nams-training
  • 109. Acknowledgments • Our DSSTox curation team • SCDCD software development and DevOps teams • Scientists and students across CCTE • Non-targeted analysis and mass spectrometry team • Dashboard project team – Nisha Sipes & Phuc Do • Cheminformatics Modules and Modeling Team – Valery Tkachenko, Todd Martin, Nate Charest, Charlie Lowe • ChET – Adam Edelman-Munoz, Caroline Stevens and team • CheMSTER – Nate Charest and Adam Edelman-Munoz
  • 110. Contact Information • Contact info: williams.antony@epa.gov • Slides available at: https://www.slideshare.net/AntonyWilliams/ • Obtain articles from Google Scholar Profile