Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones at the receptor level and alter synthesis, transport and metabolism pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being partially addressed by the use of high-throughput screening (HTS) in vitro approaches and computational modeling. In the framework of the Endocrine Disruptor Screening Program (EDSP), the U.S. EPA led two worldwide consortiums to “virtually” (i.e., in silico) screen chemicals for their potential estrogenic and androgenic activities. The Collaborative Estrogen Receptor (ER) Activity Prediction Project (CERAPP) [1] predicted activities for 32,464 chemicals and the Collaborative Modeling Project for Androgen Receptor (AR) Activity (CoMPARA) generated predictions on the CERAPP list with additional simulated metabolites, totaling 55,450 unique structures. Modelers and computational toxicology scientists from 30 international groups contributed structure-based models and results for activity prediction to one or both projects, with methods ranging from QSARs to docking to predict binding, agonism and antagonism activities. Models were based on a common training set of 1746 chemicals having ToxCast/Tox21 HTS in vitro assay results (18 assays for ER and 11 for AR) integrated into computational networks. The models were then validated using curated literature data from different sources (~7,000 results for ER and ~5,000 results for AR). To overcome the limitations of single approaches, CERAPP and CoMPARA models were each combined into consensus models reaching high predictive accuracy. These consensus models were extended beyond the initially designed datasets by implementing them into the free and open-source application OPERA to avoid running every single model on new chemicals [2]. This implementation was used to screen the entire EPA DSSTox database of ~750,000 chemicals and predicted ER and AR activity is made freely available on the CompTox Chemistry dashboard (https://comptox.epa.gov/dashboard) [3].
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
Similar to Virtual screening of chemicals for endocrine disrupting activity: Case studies of the estrogen and androgen receptors. ACS 2018 (New Orleans, USA)
Using available tools for tiered assessments and rapid MoERebeccaClewell
Similar to Virtual screening of chemicals for endocrine disrupting activity: Case studies of the estrogen and androgen receptors. ACS 2018 (New Orleans, USA) (20)
Virtual screening of chemicals for endocrine disrupting activity: Case studies of the estrogen and androgen receptors. ACS 2018 (New Orleans, USA)
1. Virtual screening of chemicals for endocrine
disrupting activity: Case studies of the estrogen
and androgen receptors
Kamel Mansouri, PhD
Research Investigator
Computational Chemistry
919-558-1356
kmansouri@scitovation.com www.scitovation.com
American Chemical Society meeting
New Orleans, LA
March 21, 2018
Kamel Mansouri: ScitoVation LLC
Nicole Kleinstreuer, NICEATM/ NIEHS.
Christopher Grulke, Ann Richard, Imran Shah, Antony Williams, Richard Judson: NCCT/ US EPA
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
2. Background and Goals
• U.S. Congress mandated that the EPA screen chemicals for
their potential to be endocrine disruptors
• This led to the development of the Endocrine Disruptor
Screening Program (EDSP)
• The initial focus was on environmental estrogens, but the
program was expanded to include androgens and thyroid
pathway disruptors
3. Problem statement
• EDSP Consists of Tier 1 and Tier 2 tests
• Tier 1 is a battery of 11 in vitro and in vivo assays
• Cost ~$1,000,000 per chemical
• Throughput is ~50 chemicals / year
• Total cost of Tier 1 is billions of dollars and will take
100 years at the current rate
• Need pre-tier 1 filter
• Use combination of structure modeling tools and
high-throughput screening “EDSP21”
4. Computational Toxicology
Too many chemicals to test with standard animal-
based methods
–Cost (~$1,000,000/chemical), time, animal welfare
–10,000 chemicals to be tested for EDSP
–Fill the data gaps and bridge the lack of knowledge
Alternative
5. Quantitative Structure Activity/Property Relationships
(QSAR/QSPR)
Congenericity principle: QSARs correlate, within congeneric series of compounds,
their chemical or biological activities, either with certain structural features or with
atomic, group or molecular descriptors.
Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 279-287
Original Structure
Representation Feature selection
Activities
Descriptors
Y = f(bi , X)
X - descriptors (selected variables)
bi - fitted parameters
6. Development of a QSAR model
• Curation of experimental data
• Preparation of training and test sets
• Calculation of an initial set of descriptors
• Selection of a mathematical method
• Variable selection technique
• Validation of the model’s predictive ability
• Define the Applicability Domain
7. In silico screening of chemicals for ER and AR activity
CoMPARA
Collaborative Modeling Project
for Androgen Receptor Activity
Mansouri et al. (http://ehp.niehs.nih.gov/15-10267/) Mansouri et al. (DOI: 10.13140/RG.2.2.19612.80009)
9. Group ID Institution Country
1 ATSDR
Agency for Toxic Substances and
Disease Registry. CDC.
USA
2 DTU Technical University of Denmark. Denmark
3 ECUST
East China University of Science and
Technology.
China
4 IBMC Institute of Biomedical Chemistry. Russia
5 IDEA IdeaConsult, Ltd. Bulgaria
6 ILS Integrated Laboratory Systems, Inc. USA
7 INSLA University of Insubria. Italy
8 INSLA Lanzhou University. China
9 IRFMN
Istituto di Ricerche Farmacologiche
“Mario Negri”.
Italy
10 JRC
Joint Research Centre of the
European Commission.
Italy
11 LM Lockheed Martin IS&GS. USA
12 MTI Molecules Theurapetiques In silico. France
13 NCATS
National Center for Advancing
Translational Sciences.
USA
14 NCCT
National Center for Computational
Toxicology. EPA.
USA
15 NCI National Cancer Institute. USA
Group ID Institution Country
16 NCSU North Carolina State University. USA
17 NCTR
National Center for Toxicological Research.
FDA.
USA
18 NICEATM
NTP Interagency Center for the Evaluation of
Alternative Toxicological Methods.
USA
19 NRMRL
National Risk Management Research
Laboratory. EPA.
USA
20 RIFM Research Institute for Fragrance Materials USA
21 SWETOX Swedish toxicology research center. Sweden
22 TARTU University of Tartu. Estonia
23 TUM Technical University Munich. Germany
24 UFG Federal University of Golas. Brazil
25 UMEA University of UMEA. Sweden
26 UNC University of North Carolina. USA
27 UNIBA University of Bari. Italy
28 UNIMIB University of Milano-Bicocca. Italy
29 UNISTRA University of Strasbourg. France
30 VCCLAB Virtual Computational Chemistry Laboratory. Germany
10. Plan of the projects
Steps Tasks
1: Training and prioritization sets
Organizers
- ToxCast assays for training set data
- AUC values and discrete classes for continuous/classification modeling
- QSAR-ready training set and prioritization set
2: Experimental validation set
Organizers
- Collect and clean experimental data from the literature
- Prepare validation sets for qualitative and quantitative models
3: Modeling & predictions
All participants
- Train/refine the models based on the training set
- Deliver predictions and applicability domains for evaluation
4: Model evaluation
Organizers
- Evaluate the predictions of each model separately
- Assign a score for each model based on the evaluation step
5: Consensus modeling
Organizers
- Use the weighting scheme based on the scores to generate the consensus
- Use the same validation set to evaluate consensus predictions
6: Manuscript writing
All participants
- Descriptions of modeling approaches for each individual model
- Input of the participants on the draft of the manuscript
11. Judson et al Toxicol. Sci. (2015) 148: 137-154
Tox21/ToxCast Pathway Models
Kleinstreuer N. C. et al. 2017 30 (4), 946-964.
Tox21/ToxCast ER Pathway Model Tox21/ToxCast AR Pathway Model
12. ER & AR assyas combination to train the models
AUC=0.1
Equivalent to
AC50=100 uM
ER training data AR training data
Active Inactive Total Active Inactive Total
Binding 237 1440 1677 198 1464 1662
Agonist 219 1458 1677 43 1616 1659
Antagonist 41 1636 1677 159 1366 1525
Total 237 1440 1677 198 1648 1688
13. Remove of
duplicates
Normalize of
tautomers
Clean salts and
counterions
Remove inorganics
and mixtures
Final inspection
QSAR-ready
structures
Indigo
Aim of the workflow:
• Combine different procedures and ideas
• Minimize the differences between the structures used for prediction
• Produce a flexible free and open source workflow to be shared
QSAR-ready KNIME workflow
Mansouri et al. (http://ehp.niehs.nih.gov/15-10267/)
14. Chemicals for Prediction: covering the Human Exposure Universe
• EDSP Universe (10K)
• Chemicals with known use (40K) (CPCat & ACToR)
• Canadian Domestic Substances List (DSL) (23K)
• EPA DSSTox – structures of EPA/FDA interest (15K)
• ToxCast and Tox21 (In vitro ER data) (8K)
32,464 unique QSAR-ready structures
• CERAPP list: ~32k structures
• EINECS: European INventory
• ~60k structures
• ~55k QSAR-ready structures
• ~38k non overlapping with the CERAPP list
• ~18k overlap with DSSTox
• ToxCast metabolites: ~6k unique structures
47,888 + 6592 = 55,450 QSAR ready structures
CoMPARA Prediction setCERAPP Prediction set
15. a) Tox21: ~8k chemicals
b) FDA EDKB DB: ~8k chemicals;
c) METI DB: ~2k chemicals
d) ChEMBL DB: ~2k chemicals
~60k experimental values for ~15k chemicals
Mansouri et al. (2016) EHP 124:1023–1033
Experimental data for evaluation set
CERAPP validation set CoMPARA validation set
Harris and Judson (in preparation)
1.2 million assay records
2.1 million chemical structures
10 thousand protein targets
Scrub Chem
~80K experimental values for ~11k chemicals
ER validation data AR validation data
Active Inactive Total Active Inactive Total
Binding 1982 5301 7283 453 3429 3882
Agonist 350 5969 6319 167 4672 4839
Antagonist 284 6255 6539 355 3685 4040
Total 2017 7024 7522 487 4928 5273
16. Received models for both projects
Validation of the models:
CERAPP participants models CoMPARA participants models
Categorical Continuous Total Categorical Continuous Total
Binding 21 3 24 32 5 37
Agonist 11 3 14 20 5 25
Antagonist 8 2 10 22 3 25
Total 40 8 48 74 13 87
18. Coverage of the models
Distributions of the number of the predicted chemical structures by all binding models.
CERAPP models CoMPARA models
19. Consensus models assessment
Binding Agonist Antagonist
Train Test Train Test Train Test
Sn 0.93 0.58 0.85 0.94 0.67 0.18
Sp 0.97 0.92 0.98 0.94 0.94 0.90
BA 0.95 0.75 0.92 0.94 0.80 0.54
CERAPP CoMPARA
Active Inactive Active Inactive
Binding 4001 28463 8202 40656
Agonist 2475 29989 1764 47094
Antagonist 2793 29671 9899 38959
Total 4001 28463 10623 47613
ToxCast metabolites
Active Inactive
Binding 1609 4983
Agonist 428 6164
Antagonist 1820 4772
Total 1989 6325
Binding Agonist Antagonist
Train Test Train Test Train Test
Sn 0.99 0.69 0.95 0.74 1.00 0.61
Sp 0.91 0.87 0.98 0.97 0.95 0.87
BA 0.95 0.78 0.97 0.86 0.97 0.74
CERAPP consensus CoMPARA consensus
CoMPARA predictions
20. Concordance and prioritization
Concordance of binding models on the active compounds of the prediction sets.
757 chemicals
have >75%
positive
concordance
Most models predict most
chemicals as ER non-binders
1772 chemicals
have >75%
positive
concordance
Most models predict most
chemicals as AR non-binders
CERAPP models CoMPARA models
21. High general concordance across all binding models included in the consensus.
Confidence in prediction
CERAPP models CoMPARA models
22.
23. OPERA & the EPA CompTox Dashboard
https://github.com/kmansouri/OPERA
https://comptox.epa.gov/dashboard/
24. Summary
• Prioritized tens of thousands of chemicals for ER & AR in a fast accurate and
economic way to help with the EDSP program.
• Generated high quality data and models that can be reused
• Free & open-source code and workflows
• Published manuscripts in peer reviewed journals
• Data and predictions available for visualization on the EPA’s dashboard:
25. Thank you for your attention
National Center for Computational Toxicology, US EPA
CERAPP participants
CoMPARA participants
Acknowledgements: