The US-EPA first delivered the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) to the community in 2016, and it has become one of the definitive online resources for sourcing chemical toxicity data. Because thousands of scientists use it each day, the stability, uptime and data quality of this production resource are critical. The assembly of increasingly diverse data types, and new approaches to interrogating and delivering such data, are therefore best investigated through proof-of-concept applications released for community feedback. This has been achieved by the delivery of the EPA Cheminformatics Modules (https://www.epa.gov/comptox-tools/cheminformatics).
Methods
The Cheminformatics Modules suite of tools is available to the community and delivers novel approaches for the hazard profiling of chemicals, batch QSAR prediction of physicochemical properties and toxicological endpoints, identification of structure alerts based on structure and substructure, and profiling of chemicals with ToxPrint chemotypes using enrichment statistics associated with ToxCast in vitro bioactivity screening. These modules provide access to new capabilities not presently available to users of the Dashboard and enable new analyses for users interested in applications of computational toxicology data and resources.
Results
The Cheminformatics Modules have been used by multiple EPA regional offices, program offices and state offices to source data and perform analyses, including alternatives assessment for chlorinated solvents, aggregation of data related to chemicals of emerging concern (e.g., 6PPD and related phenylenediamines) and support for the development of state-specific prioritization lists for chemicals. The modules are being used to provide access to data for emergency responders and onsite coordinators, and are presently being expanded with safety data sourced from PubChem to provide GHS data, transportation details, and details regarding regulatory screening levels associated with federal and state-specific guidelines. The ease of use of the modules has made the harvesting of screening data, for potential use in decision making, faster and easier for a broad range of stakeholders.
Conclusions
The delivery of chemistry, toxicity, exposure and related data from the US-EPA’s Center for Computational Toxicology and Exposure to the community is facilitated using a number of web-based applications. Proof-of-concept tools provide a means by which to deliver novel search, visualization and data delivery approaches for testing and feedback prior to adding the capabilities into production-level tools.
This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Innovative Research for a Sustainable Future
www.epa.gov/research
Proof-of-Concept Publicly Accessible Data Dashboards from the US-EPA
Antony Williams1, Todd Martin1 and Valery Tkachenko2
1. Center for Computational Toxicology and Exposure (CCTE), US-EPA, Research Triangle Park, NC-27709, USA
2. ScienceDataExperts Inc., MD-20850, USA
Society of Toxicology
March 10th – 14th, 2024
The DSSTox database (1) is the chemistry database underlying
all software applications from CCTE, including the publicly
available CompTox Chemicals Dashboard (2) at
https://comptox.epa.gov/dashboard.
Hazard Comparison Module (Publicly Accessible)
DSSTox and the CompTox Chemicals Dashboard
Safety Module (not yet Publicly Accessible)
Description
The Center for Computational Toxicology and Exposure
(CCTE) at the US-EPA has delivered web-based applications
that provide data including chemical properties, in vivo and
in vitro hazard data, exposure data, and other data types. The
CompTox Chemicals Dashboard delivers data for >1.2 million
chemicals, has been available since 2016, and has thousands
of users daily. To garner interest in novel ways of interrogating
and visualizing data, proof-of-concept tools have also been
delivered. These include the Hazard and Safety Profile
modules, which integrate publicly available data streams into a
web interface delivering additional types of toxicity data, details
regarding accidental release measures (including cleanup
and disposal), and firefighting measures.
The authors thank the data curation team for their rigorous
work in annotating and identifying information in the records.
Chemical data extraction, curation and annotation are an
essential part of this work.
The Hazard Comparison Module, already publicly accessible at
https://www.epa.gov/comptox-tools/cheminformatics, allows for
the profiling of a set of chemicals according to various types of
toxicity data. Chemical search inputs can include the usual
identifiers: CASRNs, chemical names or DTXSIDs.
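As an illustration of how mixed search inputs might be sorted by identifier type before a search, the following minimal Python sketch classifies each input string. It is not the module's implementation. The CASRN check uses the standard CAS check-digit rule; the DTXSID pattern (the prefix "DTXSID" followed by digits) and the function names are assumptions made for this example.

```python
import re

# CASRN layout: 2-7 digits, 2 digits, 1 check digit, separated by hyphens.
CASRN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
# Assumed pattern for this sketch: "DTXSID" followed by one or more digits.
DTXSID_PATTERN = re.compile(r"^DTXSID\d+$")


def is_valid_casrn(text: str) -> bool:
    """Return True if text has CASRN layout and a correct check digit."""
    match = CASRN_PATTERN.match(text)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    weighted_sum = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit


def classify_identifier(text: str) -> str:
    """Classify an input as 'dtxsid', 'casrn', or (by default) 'name'."""
    cleaned = text.strip()
    if DTXSID_PATTERN.match(cleaned):
        return "dtxsid"
    if is_valid_casrn(cleaned):
        return "casrn"
    # Anything else is treated as a chemical name in this sketch.
    return "name"


if __name__ == "__main__":
    for value in ["1912-24-9", "DTXSID9020112", "Atrazine", "1912-24-8"]:
        print(value, "->", classify_identifier(value))
```

In this sketch a string that looks like a CASRN but fails the check digit falls through to "name"; a real tool might instead flag it as a probable typo.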
Other Cheminformatics Modules
Acknowledgements
Disclaimers
The views expressed in this poster are those of the authors and do
not necessarily reflect the views or policies of the US EPA.
References
1. EPA’s DSSTox database: History of development of a curated
chemistry resource supporting computational toxicology research:
https://doi.org/10.1016/j.comtox.2019.100096
2. The CompTox Chemistry Dashboard: a community data resource
for environmental chemistry:
https://doi.org/10.1186/s13321-017-0247-6
3. An automated framework for compiling and integrating chemical
hazard data: https://doi.org/10.1007/s10098-019-01795-w
A new Safety module, not yet publicly accessible, is presently in
development and does for safety profiling what is already offered
for hazard profiling. Following input of a set of chemicals for
searching, the safety profile provides access to data of particular
interest to emergency responders and onsite coordinators.
The Hazard Comparison and Safety modules are integrated with other
modules, including an identifier search module, a structure,
substructure and similarity search module, and a QSAR prediction
platform. A typical workflow would include searching all chemical
structures associated with DSSTox using either a substructure or
similarity search against the DSSTox content of 1.2 million
chemicals, and passing the resulting hit list to the Hazard or
Safety profiling modules in order to generate the relevant reports.
Chemical registrations include mappings to relevant identifiers
such as CASRNs, chemical names and chemical structures.
Chemical curation includes harvesting of data from public domain
databases, including PubChem, ECHA’s database and other resources.
Thorough processes are applied to curate and expand
chemical substance registrations to release in the Dashboard.
In the figure above, the structure of atrazine was retrieved from a
name-based search and edited to the triazine substructure in the
drawing canvas. A search for all chemicals above a Tanimoto
similarity of 0.6 then gave a hit list of 94 records.
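To make the similarity threshold concrete, the sketch below computes the Tanimoto coefficient between fingerprints and keeps hits above 0.6. It is a minimal illustration only: fingerprints are represented as sets of "on" bit positions, and the fingerprint values, compound names and function names are all hypothetical. The fingerprint type used by the search module is not stated in this poster.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(
    query: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    threshold: float = 0.6,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs scoring above threshold, best first."""
    hits = []
    for name, fingerprint in library.items():
        score = tanimoto(query, fingerprint)
        if score > threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints: sets of "on" bit positions (hypothetical values).
    query_fp = frozenset({1, 2, 3, 4, 5})
    toy_library = {
        "compound_a": frozenset({1, 2, 3, 4, 5, 6}),
        "compound_b": frozenset({1, 2, 3, 9}),
        "compound_c": frozenset({1, 2, 3, 4}),
    }
    for name, score in similarity_search(query_fp, toy_library):
        print(f"{name}: {score:.2f}")
```

With these toy values, compound_b shares 3 of 6 distinct bits with the query (0.5) and falls below the cutoff, while the other two are retained.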
The figure above shows a hazard profile subset of 5 chemicals,
from an input of 365 chemicals processed in less than 60 seconds,
to show different types of toxicity data. The Very High (VH),
High (H), Medium (M), Low (L) and Inconclusive (I) data are
profiled according to a trumping scheme defined in (3). Data
can be exported into an Excel file or SDF file for storage.
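The idea of a trumping scheme, where several source scores for the same chemical and endpoint are reduced to one displayed category, can be sketched as follows. The rule used here is an assumption made purely for illustration: the most severe category wins, and Inconclusive is displayed only when nothing else is available. The actual scheme is the one defined in (3) and may differ, for example by weighting the authority of each data source. All names and records in the code are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed ordering for this sketch, least to most severe.
SEVERITY_ORDER = ["I", "L", "M", "H", "VH"]
RANK = {category: rank for rank, category in enumerate(SEVERITY_ORDER)}


def trump(scores: List[str]) -> Optional[str]:
    """Reduce several category scores to one; most severe wins (assumed rule)."""
    if not scores:
        return None  # no data for this chemical/endpoint pair
    return max(scores, key=RANK.__getitem__)


def build_profile(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], str]:
    """Collapse (chemical, endpoint, score) records into one score per pair."""
    grouped: Dict[Tuple[str, str], List[str]] = {}
    for chemical, endpoint, score in records:
        grouped.setdefault((chemical, endpoint), []).append(score)
    return {key: trump(scores) for key, scores in grouped.items()}


if __name__ == "__main__":
    # Hypothetical records, not real hazard data.
    example_records = [
        ("chemical_1", "acute_oral", "L"),
        ("chemical_1", "acute_oral", "H"),
        ("chemical_1", "skin_irritation", "I"),
        ("chemical_2", "acute_oral", "M"),
    ]
    for (chemical, endpoint), score in build_profile(example_records).items():
        print(chemical, endpoint, score)
```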