Metabolic similarity is a key consideration in read-across but approaches to characterise and quantify the contribution metabolism plays are still evolving. A major challenge lies in the lack of a standardized database of human xenobiotic metabolism pathways for environmental chemicals. To address this issue, we developed a metabolic simulation framework called MetSim, comprised of three main components. First, we propose a harmonised graph-based representation for managing xenobiotic metabolism pathway information between different in silico tools and empirical evidence from the literature. This schema is implemented in a Mongo database to store, retrieve and analyze large-scale metabolic graphs. Second, MetSim includes a standardised application programming interface (API) for available metabolic simulators, including BioTransformer, the OECD Toolbox, and Tissue Metabolism Simulator (TIMES). Third, MetSim includes functions to systematically evaluate the performance of metabolism simulators using recall, precision and overall accuracy on benchmark data sets. Here we report on the overall architecture of MetSim, and performance results for two data sets: (a) 59 drugs (mostly NSAIDs) and their 179 published metabolites, and (b) 718 diverse substances in the EPA Distributed Structure-Searchable Toxicity (DSSTox) database and their 1632 metabolites. The 59 drugs were processed with MetSim using BioTransformer (CypReact model with 3 cycles of human Phase I metabolism), TIMES (in vivo rat simulator model) and OECD Toolbox (in vitro rat Liver S9), producing 11202, 590, and 539 metabolites, respectively. The recall for Biotransformer, TIMES and OECD Toolbox was 0.62, 0.41 and 0.52, respectively. For the larger DSSTox dataset, two cycles of human phase I (CypReact) and one cycle of phase II metabolism were modeled using BioTransformer, and both TIMES and OECD Toolbox using the same two rat liver models, producing 60097, 6654, and 5204 metabolites, respectively. The recall for Biotransformer, TIMES and OECD Toolbox was 0.16, 0.41 and 0.38, respectively. We summarized the performance of these tools by data set, chemical class (using ClassyFire) and metabolic simulator. All tools performed well for phenanthrenes, piperadines, lactams, and azoles but poorly for pyrrolines, organonitrogen compounds, and nucleotide analogues. BioTransformer performed well for benzoxazines, benzothaizepines, and quinolines, but poorly for steroids, benzothiazines, and diazines. Conversely, TIMES and the OECD Toolbox performed well for steroids, benzothiazines, and diazines, but poorly for benzoxazines, diazinines and organooxygen compounds. MetSim provides useful data and insights on the performance and limitations of in silico metabolism tools, which will inform our subsequent efforts in characterising metabolic similarity. This abstract does not reflect EPA policy.
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
In Silico Metabolism Tools Predict Phase I & II Metabolites
1. Innovative Research for a Sustainable Future
www.epa.gov/research
Background: In Silico Metabolism and its Potential
Use in Read-Across Frameworks
Results: Predictive Performance of MetSim Tools Discussion: Diversity of Tools Can Enhance
Metabolite Recall
ORCID: 0000-0002-6357-0508
Louis Groff l groff.louis@epa.gov l 919-541-4256
MetSim: Integrated Programmatic Access and Pathway Management for Xenobiotic
Metabolism Tools
Louis Groff1, Imran Shah1, and Grace Patlewicz1
1U.S. Environmental Protection Agency, Office of Research and Development
Center for Computational Toxicology and Exposure, Research Triangle Park, NC
• Substances were assigned into
Chemical classes using Wishart
Lab ClassyFire API
• Hierarchically clustered heatmap
on left shows mean recall rate per
chemical class, scaled white to red
from 0 to 1, respectively
• Histogram on right shows number
of compounds within each
chemical class, on a log 2 scale
• Pyrimidine nucleosides, fluorenes,
pyrenes, and benzimidazoles
classes reflected the highest mean
recall rates across all tools
• Classes vinyl halides, cinnamic
acids, organic oxoanionic
compounds, and flavonoids
showed the lowest mean recall
rates observed across all tools
• Highest frequency chemical
classes include benzenes and
substituted derivatives,
azobenzenes, naphthalenes, and
organooxygen compounds
• The predictions from each tool were
evaluated on a set of literature data for
pharmaceuticals and non-steroidal anti-
inflammatory drug (NSAID) chemicals
including their phase I metabolites
• Individual recall per tool was 0.6 for
BioTransformer, 0.4 for TIMES/Toolbox,
and combined recall is 0.8
• Of the five metabolites, all three tools
captured one hydroxylated aripiprazole
metabolite
• All tools missed the oxidized metabolite
dehydroaripiprazole
References
Upcoming Publication: Groff, L. C., et al. Chem. Res. Tox. 2023, In Final Preparation.
GenRA Tool: https://comptox.epa.gov/genra/
Cheminformatics Modules Standardizer: https://hazard-dev.sciencedataexperts.com/#/stdizer
ClassyFire API: http://classyfire.wishartlab.com/
Frequency
21 22
23 24
MetSim Model
Input
Data
DTXSID
SMILES
CASRN
Organism
Rat
Human
Type
of
Metabolism
Phase I Only
Phase I &
Phase II
Metabolism
Simulator
OECD Toolbox
TIMES
BioTransformer
Simulator
Outputs
Precursor
SMILES
Metabolite
SMILES
Reactions &
Enzymes
Cheminformatics
Modules
For Precursors &
Metabolites:
SMILES
InChIKey
CASRN
DTXSID
Chemical Names
Reactions &
Enzymes
Data
Storage
Harmonize
output
formats
Store in
Mongo
Database
Methods: Metabolism Simulation (MetSim) Algorithm
• Analysis of environmental samples via high-throughput screening methods has the potential to
uncover new or data-poor chemicals, or contaminants of emerging concern (CECs)
• Read-across methodologies such as EPA’s Generalized Read-Across (GenRA) can be helpful to
identify source analogue chemicals with available toxicity information including points of departure
(POD)
• Source analogues are typically identified on the basis of structural similarity
• We seek to extend this framework to incorporate metabolism similarity between CECs and their
analogues as an additional context for analogue identification within GenRA
Recall Rate Hierarchically Clustered on Chemical Class With Histogram of Class Frequency
Conclusions and Future Directions
MetSim Algorithm Summary Flow Chart
Example: Tool Specific Performance on Aripiprazole and its Phase I Metabolites
Chemical
Class
DSSTox 700
Phase I &
Phase II
Metabolites
BioTransformer
2x Phase I &
1x Phase II
TIMES
In Vitro
TIMES
In Vivo
OECD
Toolbox
In Vitro
OECD
Toolbox
In Vivo
True Positives 266 595 648 595 679
False Positives 58357 4416 6018 4485 6937
False Negatives 1629 1305 1249 1304 1222
Total Predictions 58623 5410 6666 5080 7616
Total Precision 0.004 0.11 0.10 0.12 0.09
Total Recall 0.14 0.31 0.34 0.31 0.36
Total Reported
Metabolites
1895 1895 1895 1895 1895
MetSim Performance Metrics by Choice of Tool/Model on DSSTox-Registered Metabolites
• Dataset consists of 700 drugs and
environmental chemicals of interest to
EPA
• Regardless of tool choice, recall rate
was low
• BioTransformer generates many
predictions, but yielded the lowest recall
for this dataset
• Toolbox/TIMES generated fewer
predictions, resulting in both higher
precision and higher recall rates than
BioTransformer for this dataset
• Toolbox In Vivo Rat Simulator had the
highest recall of all tools evaluated
• All metabolism prediction tools require either an application programming interface (API) or
command line interface (CLI) for incorporation into MetSim
• All tools in this study used SMILES as an input, which can be obtained from DSSTox Substance ID
(DTXSID) searches, if registered. CAS Registry Number (CASRN) is an alternate input that can be
used to query the OECD Toolbox WebAPI
• Human tissue metabolism is handled with BioTransformer, which performs both phase I & II
metabolism
• Both OECD Toolbox and TIMES contain simulators for In Vitro and In Vivo Rat tissue metabolism.
The Toolbox only performs phase I metabolism, TIMES includes phase II metabolism as well
• All tools were applied to a dataset of 59 common drugs to predict their phase I metabolites, and 700
EPA-relevant chemicals registered in DSSTox to predict their phase I and phase II metabolites
• Outputs are aggregated into a harmonized dictionary format containing precursor and metabolite
SMILES, as well as reaction and enzyme information from BioTransformer and TIMES
• DTXSID and CASRN for a metabolite were added to the dictionary where available by querying the
EPA Cheminformatics Modules Standardizer API using metabolite SMILES
• Output dictionaries for each parent are loaded into a Mongo database for subsequent use
• TIMES and OECD Toolbox captured the second of two aripiprazole hydroxides but missed the
other three metabolites
• BioTransformer identified 7-(4-hydroxybutoxy)-3,4-dihydroquinolin-2(1H)-one and 1-(2,3-
Dichlorophenyl) piperazine metabolites but missed the remaining two metabolites
Usefulness of Tools Applied in this Study
• Low precision of BioTransformer is offset by the fact that expansive sets of predictions can be
helpful for high-throughput screening studies aimed at identifying CECs, such as Non-Targeted
Analysis (NTA), where CECs could be metabolites in an environmental sample, and later correlated
with their respective parents
• Toolbox/TIMES training datasets contain more environmentally relevant chemicals
• Our MetSim algorithm successfully harmonized inputs and outputs between different in silico
metabolism tools, to facilitate storage and interaction of prediction data within a Mongo database
• MetSim could be extended to accommodate information from other in silico tools if they possess an
API or CLI
• Individual tools can be utilized to predict metabolism in human or rat tissues
• All tools yielded low recall on the dataset of 700 drugs and environmental chemicals of interest to
EPA
• Inspection of false negative predictions on a per chemical class basis can provide useful insights on
what refinements to the transformation rule sets are merited for each tool
• Future work in progress will focus on analyzing the graphs stored within MetSim to evaluate graph
similarity with different metrics to inform metabolic similarity within read-across (MetGraph)
Motivation for Metabolism Within Read-Across
Challenges to Address for Successful Implementation into GenRA
• A variety of in silico tools are available that enable metabolites to be simulated.
• Each tool outputs its prediction information in a different format. For comparison purposes, a
harmonized output data format is required.
• No single standardized database of xenobiotic metabolism pathways of environmental chemicals
exists at present.
• Develop a database to store prediction data to facilitate analysis of metabolic graphical networks
Disclaimer: The views expressed in this presentation are those of the
authors and do not necessarily reflect the views or policies of the U.S. EPA.