Virtual screening of chemicals for endocrine disrupting activity: Case studies of the estrogen and androgen receptors. ACS 2018 (New Orleans, USA)

Virtual screening of chemicals for endocrine
disrupting activity: Case studies of the estrogen
and androgen receptors
Kamel Mansouri, PhD
Research Investigator
Computational Chemistry
919-558-1356
kmansouri@scitovation.com www.scitovation.com
American Chemical Society meeting
New Orleans, LA
March 21, 2018
Kamel Mansouri: ScitoVation LLC
Nicole Kleinstreuer, NICEATM/ NIEHS.
Christopher Grulke, Ann Richard, Imran Shah, Antony Williams, Richard Judson: NCCT/ US EPA
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA

Background and Goals
• U.S. Congress mandated that the EPA screen chemicals for
their potential to be endocrine disruptors
• This led to the development of the Endocrine Disruptor
Screening Program (EDSP)
• The initial focus was on environmental estrogens, but the
program was expanded to include androgens and thyroid
pathway disruptors

Problem statement
• EDSP consists of Tier 1 and Tier 2 tests
• Tier 1 is a battery of 11 in vitro and in vivo assays
• Cost ~$1,000,000 per chemical
• Throughput is ~50 chemicals / year
• Total cost of Tier 1 is billions of dollars and will take
100 years at the current rate
• Need a pre-Tier 1 filter
• Use combination of structure modeling tools and
high-throughput screening “EDSP21”

Computational Toxicology
Too many chemicals to test with standard animal-based methods
–Cost (~$1,000,000/chemical), time, animal welfare
–10,000 chemicals to be tested for EDSP
–Fill the data gaps and bridge the lack of knowledge
Alternative

Quantitative Structure Activity/Property Relationships
(QSAR/QSPR)
Congenericity principle: QSARs correlate, within congeneric series of compounds,
their chemical or biological activities, either with certain structural features or with
atomic, group or molecular descriptors.
Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 279-287
[Figure: QSAR workflow linking the original structure, its representation, descriptors, feature selection, and activities]
Y = f(b_i, X)
X: descriptors (selected variables)
b_i: fitted parameters
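As a minimal sketch of Y = f(b_i, X), assume the simplest case: one descriptor and a linear f (the slide does not specify the functional form). The fitted parameters b0 and b1 then follow from ordinary least squares. The descriptor and activity values below are made up for illustration.

```python
def fit_linear_qsar(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x_new):
    """Predicted activity for a new descriptor value."""
    return b0 + b1 * x_new


# Hypothetical descriptor values and measured activities (illustration only).
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 3.9, 6.1, 7.9]

b0, b1 = fit_linear_qsar(descriptor, activity)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"prediction at x = 5: {predict(b0, b1, 5.0):.2f}")
```

Real QSAR models typically use many descriptors and non-linear learners; the point here is only the split between descriptors X and fitted parameters b_i.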

Development of a QSAR model
• Curation of experimental data
• Preparation of training and test sets
• Calculation of an initial set of descriptors
• Selection of a mathematical method
• Variable selection technique
• Validation of the model’s predictive ability
• Define the Applicability Domain
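The last step, defining the applicability domain, can be illustrated with a bounding-box check. This is one simple approach among several, and the slides do not say which method each model used. The sketch assumes a chemical is inside the domain when each of its descriptor values lies within the range seen in the training set; the descriptor values are hypothetical.

```python
def descriptor_ranges(training_rows):
    """Per-descriptor (min, max) over the training set."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def in_domain(row, ranges):
    """True if every descriptor value lies inside the training range."""
    return all(low <= value <= high for value, (low, high) in zip(row, ranges))


# Hypothetical training set: two descriptors per chemical.
training = [[1.0, 200.0], [2.5, 350.0], [4.0, 280.0]]
ranges = descriptor_ranges(training)

print(in_domain([3.0, 300.0], ranges))  # inside both ranges
print(in_domain([5.0, 300.0], ranges))  # first descriptor out of range
```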

In silico screening of chemicals for ER and AR activity
CERAPP
Collaborative Estrogen Receptor Activity Prediction Project
Mansouri et al. (http://ehp.niehs.nih.gov/15-10267/)
CoMPARA
Collaborative Modeling Project
for Androgen Receptor Activity
Mansouri et al. (DOI: 10.13140/RG.2.2.19612.80009)

Participants
Interactive map:
https://batchgeo.com/map/9d3ff810a72d8a84093c74ab0601f01d

Group ID Institution Country
1 ATSDR Agency for Toxic Substances and Disease Registry. CDC. USA
2 DTU Technical University of Denmark. Denmark
3 ECUST East China University of Science and Technology. China
4 IBMC Institute of Biomedical Chemistry. Russia
5 IDEA IdeaConsult, Ltd. Bulgaria
6 ILS Integrated Laboratory Systems, Inc. USA
7 INSLA University of Insubria. Italy
8 INSLA Lanzhou University. China
9 IRFMN Istituto di Ricerche Farmacologiche “Mario Negri”. Italy
10 JRC Joint Research Centre of the European Commission. Italy
11 LM Lockheed Martin IS&GS. USA
12 MTI Molécules Thérapeutiques In Silico. France
13 NCATS National Center for Advancing Translational Sciences. USA
14 NCCT National Center for Computational Toxicology. EPA. USA
15 NCI National Cancer Institute. USA
16 NCSU North Carolina State University. USA
17 NCTR National Center for Toxicological Research. FDA. USA
18 NICEATM NTP Interagency Center for the Evaluation of Alternative Toxicological Methods. USA
19 NRMRL National Risk Management Research Laboratory. EPA. USA
20 RIFM Research Institute for Fragrance Materials. USA
21 SWETOX Swedish Toxicology Research Center. Sweden
22 TARTU University of Tartu. Estonia
23 TUM Technical University Munich. Germany
24 UFG Federal University of Goiás. Brazil
25 UMEA University of Umeå. Sweden
26 UNC University of North Carolina. USA
27 UNIBA University of Bari. Italy
28 UNIMIB University of Milano-Bicocca. Italy
29 UNISTRA University of Strasbourg. France
30 VCCLAB Virtual Computational Chemistry Laboratory. Germany

Plan of the projects
Steps Tasks
1: Training and prioritization sets
Organizers
- ToxCast assays for training set data
- AUC values and discrete classes for continuous/classification modeling
- QSAR-ready training set and prioritization set
2: Experimental validation set
Organizers
- Collect and clean experimental data from the literature
- Prepare validation sets for qualitative and quantitative models
3: Modeling & predictions
All participants
- Train/refine the models based on the training set
- Deliver predictions and applicability domains for evaluation
4: Model evaluation
Organizers
- Evaluate the predictions of each model separately
- Assign a score for each model based on the evaluation step
5: Consensus modeling
Organizers
- Use the weighting scheme based on the scores to generate the consensus
- Use the same validation set to evaluate consensus predictions
6: Manuscript writing
All participants
- Descriptions of modeling approaches for each individual model
- Input of the participants on the draft of the manuscript
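Step 5 of the plan can be sketched as a score-weighted vote. This is a minimal sketch, not the projects' actual weighting scheme. The model names, the scores, the 0.5 decision threshold, and the treatment of missing predictions as abstentions are all assumptions made for illustration.

```python
def weighted_consensus(predictions, scores, threshold=0.5):
    """Combine binary calls (1 = active, 0 = inactive) into one consensus call.

    predictions: {model_name: 0, 1, or None when the model gave no prediction}
    scores:      {model_name: evaluation score used as the voting weight}
    Returns (consensus_call, weighted_fraction_active), or (None, None)
    when no model predicted the chemical.
    """
    total_weight = 0.0
    active_weight = 0.0
    for model, call in predictions.items():
        if call is None:  # abstention, e.g. outside the applicability domain
            continue
        weight = scores[model]
        total_weight += weight
        active_weight += weight * call
    if total_weight == 0:
        return None, None
    fraction = active_weight / total_weight
    return (1 if fraction >= threshold else 0), fraction


# Hypothetical models and scores (illustration only).
scores = {"model_a": 0.9, "model_b": 0.6, "model_c": 0.3}
preds = {"model_a": 1, "model_b": 0, "model_c": None}

call, fraction = weighted_consensus(preds, scores)
print(call, round(fraction, 2))
```

Weighting by score lets a well-performing model outvote a weaker one, while abstaining models do not dilute the result.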

Tox21/ToxCast Pathway Models
Tox21/ToxCast ER Pathway Model: Judson et al. Toxicol. Sci. (2015) 148: 137-154
Tox21/ToxCast AR Pathway Model: Kleinstreuer N. C. et al. Chem. Res. Toxicol. (2017) 30 (4): 946-964

Combination of ER & AR assays to train the models
AUC = 0.1 is equivalent to AC50 = 100 µM
             ER training data           AR training data
             Active  Inactive  Total    Active  Inactive  Total
Binding      237     1440      1677     198     1464      1662
Agonist      219     1458      1677     43      1616      1659
Antagonist   41      1636      1677     159     1366      1525
Total        237     1440      1677     198     1648      1688
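Turning continuous AUC scores into the discrete classes used for classification modeling can be sketched as a threshold rule. The cutoff of 0.1 is taken from the "AUC = 0.1" annotation on the slide. Treating it as the active/inactive boundary is an assumption, and the AUC values below are hypothetical.

```python
ACTIVE_AUC_THRESHOLD = 0.1  # assumed cutoff, from the "AUC = 0.1" annotation


def auc_to_class(auc, threshold=ACTIVE_AUC_THRESHOLD):
    """Map a pathway-model AUC score to a discrete activity class."""
    return "active" if auc >= threshold else "inactive"


# Hypothetical AUC scores (illustration only).
auc_scores = {"chem_1": 0.45, "chem_2": 0.0, "chem_3": 0.1, "chem_4": 0.03}
classes = {name: auc_to_class(auc) for name, auc in auc_scores.items()}
n_active = sum(1 for label in classes.values() if label == "active")

print(classes)
print(f"{n_active} of {len(classes)} active")
```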

QSAR-ready KNIME workflow
Mansouri et al. (http://ehp.niehs.nih.gov/15-10267/)
[Figure: workflow steps (Indigo): removal of duplicates, normalization of tautomers, cleaning of salts and counterions, removal of inorganics and mixtures, final inspection, QSAR-ready structures]
Aim of the workflow:
• Combine different procedures and ideas
• Minimize the differences between the structures used for prediction
• Produce a flexible, free, and open-source workflow to be shared
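Three of the workflow's steps (cleaning salts and counterions, removing inorganics, removing duplicates) can be illustrated with a crude, standard-library-only sketch on SMILES strings. This is not the KNIME/Indigo workflow itself. It assumes that "organic" means "contains carbon" and that the largest carbon-containing fragment is the one to keep, and it compares structures as raw strings. Tautomer normalization, charge neutralization, mixture detection, and canonicalization need a cheminformatics toolkit and are omitted.

```python
import re

# One atom in a SMILES string: a bracket atom such as [Na+] or [13CH4],
# or an unbracketed organic-subset atom such as C, Cl, n.
_ATOM = re.compile(
    r"\[\d*(?P<bracket>[A-Z][a-z]?|[a-z]{1,2})[^\]]*\]"
    r"|(?P<plain>Cl|Br|[BCNOPSFI]|[bcnops])"
)


def atom_symbols(fragment):
    """Element symbols of the non-hydrogen atoms in one SMILES fragment."""
    symbols = []
    for match in _ATOM.finditer(fragment):
        symbol = match.group("bracket") or match.group("plain")
        if symbol != "H":
            symbols.append(symbol)
    return symbols


def clean_structure(smiles):
    """Largest carbon-containing fragment, or None for an inorganic record."""
    organic = [
        frag for frag in smiles.split(".")
        if {"C", "c"} & set(atom_symbols(frag))
    ]
    if not organic:
        return None
    return max(organic, key=lambda frag: len(atom_symbols(frag)))


def qsar_ready(smiles_list):
    """Clean each record, drop inorganics, and drop exact-string duplicates."""
    seen, result = set(), []
    for smiles in smiles_list:
        cleaned = clean_structure(smiles)
        if cleaned is not None and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Ethanol, sodium acetate, sodium chloride, water, ethanol hydrochloride.
raw = ["CCO", "[Na+].CC(=O)[O-]", "[Na+].[Cl-]", "O", "CCO.Cl"]
print(qsar_ready(raw))
```

Because duplicates are detected by string identity, two different SMILES for the same molecule would both survive; a real workflow would canonicalize structures first.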

Chemicals for Prediction: covering the Human Exposure Universe
CERAPP Prediction set
• EDSP Universe (10K)
• Chemicals with known use (40K) (CPCat & ACToR)
• Canadian Domestic Substances List (DSL) (23K)
• EPA DSSTox – structures of EPA/FDA interest (15K)
• ToxCast and Tox21 (In vitro ER data) (8K)
32,464 unique QSAR-ready structures
CoMPARA Prediction set
• CERAPP list: ~32k structures
• EINECS (European INventory of Existing Commercial chemical Substances):
– ~60k structures
– ~55k QSAR-ready structures
– ~38k non-overlapping with the CERAPP list
– ~18k overlap with DSSTox
• ToxCast metabolites: ~6k unique structures
48,858 + 6,592 = 55,450 QSAR-ready structures

Experimental data for evaluation set
CERAPP validation set
a) Tox21: ~8k chemicals
b) FDA EDKB DB: ~8k chemicals
c) METI DB: ~2k chemicals
d) ChEMBL DB: ~2k chemicals
~60k experimental values for ~15k chemicals
Mansouri et al. (2016) EHP 124:1023–1033
CoMPARA validation set
ScrubChem (Harris and Judson, in preparation):
• 1.2 million assay records
• 2.1 million chemical structures
• 10 thousand protein targets
~80K experimental values for ~11k chemicals
             ER validation data         AR validation data
             Active  Inactive  Total    Active  Inactive  Total
Binding      1982    5301      7283     453     3429      3882
Agonist      350     5969      6319     167     4672      4839
Antagonist   284     6255      6539     355     3685      4040
Total        2017    7024      7522     487     4928      5273

Received models for both projects
Validation of the models:
             CERAPP participants' models       CoMPARA participants' models
             Categorical  Continuous  Total    Categorical  Continuous  Total
Binding      21           3           24       32           5           37
Agonist      11           3           14       20           5           25
Antagonist   8            2           10       22           3           25
Total        40           8           48       74           13          87

Model validation & Consensus modeling
[Figure: two plots on a 0 to 1 axis, one for the CERAPP categorical models (40) and one for the CoMPARA categorical models (74), each grouped into Binding, Agonist, and Antagonist]

Coverage of the models
Distributions of the number of chemical structures predicted by all binding models.
[Figure panels: CERAPP models, CoMPARA models]

Consensus models assessment
CERAPP consensus
      Binding        Agonist        Antagonist
      Train  Test    Train  Test    Train  Test
Sn    0.93   0.58    0.85   0.94    0.67   0.18
Sp    0.97   0.92    0.98   0.94    0.94   0.90
BA    0.95   0.75    0.92   0.94    0.80   0.54
CoMPARA consensus
      Binding        Agonist        Antagonist
      Train  Test    Train  Test    Train  Test
Sn    0.99   0.69    0.95   0.74    1.00   0.61
Sp    0.91   0.87    0.98   0.97    0.95   0.87
BA    0.95   0.78    0.97   0.86    0.97   0.74
             CERAPP              CoMPARA
             Active  Inactive    Active  Inactive
Binding      4001    28463       8202    40656
Agonist      2475    29989       1764    47094
Antagonist   2793    29671       9899    38959
Total        4001    28463       10623   47613
CoMPARA predictions: ToxCast metabolites
             Active  Inactive
Binding      1609    4983
Agonist      428     6164
Antagonist   1820    4772
Total        1989    6325
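Sn, Sp, and BA in the consensus assessment denote sensitivity, specificity, and balanced accuracy. A minimal sketch of how they are computed from observed and predicted classes follows; the label vectors are hypothetical, and both classes are assumed to be present in the observed data.

```python
def classification_metrics(observed, predicted):
    """Sensitivity, specificity, and balanced accuracy for 0/1 labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {"Sn": sn, "Sp": sp, "BA": (sn + sp) / 2}


# Hypothetical evaluation set: 4 actives and 6 inactives.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(observed, predicted)
print({name: round(value, 2) for name, value in metrics.items()})
```

Balanced accuracy is the mean of the other two, so Sn = 0.93 and Sp = 0.97 give BA = 0.95. It is preferred over plain accuracy here because inactives far outnumber actives.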

Concordance and prioritization
Concordance of binding models on the active compounds of the prediction sets.
• ER: 757 chemicals have >75% positive concordance; most models predict most chemicals as ER non-binders
• AR: 1772 chemicals have >75% positive concordance; most models predict most chemicals as AR non-binders
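Positive concordance can be read as the fraction of models that call a chemical active. The sketch below assumes that definition, counts only models that actually returned a prediction, and selects chemicals above the 75% cutoff. The chemical names and calls are hypothetical.

```python
def positive_concordance(calls):
    """Fraction of models predicting active (1); None means no prediction."""
    made = [call for call in calls if call is not None]
    if not made:
        return None
    return sum(made) / len(made)


def high_concordance_actives(predictions, cutoff=0.75):
    """Chemicals whose positive concordance is strictly above the cutoff."""
    selected = []
    for chemical, calls in predictions.items():
        concordance = positive_concordance(calls)
        if concordance is not None and concordance > cutoff:
            selected.append(chemical)
    return selected


# Hypothetical binding calls from several models (illustration only).
predictions = {
    "chem_1": [1, 1, 1, 1],           # 100% positive
    "chem_2": [1, 0, 0, 0],           # 25% positive
    "chem_3": [1, 1, 1, 0],           # exactly 75%, not above the cutoff
    "chem_4": [None, 1, 1, 1, 1, 0],  # 80% of the five models that predicted
}

print(high_concordance_actives(predictions))
```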

High general concordance across all binding models included in the consensus.
Confidence in prediction

OPERA & the EPA CompTox Dashboard
https://github.com/kmansouri/OPERA
https://comptox.epa.gov/dashboard/

Summary
• Prioritized tens of thousands of chemicals for ER & AR activity in a fast, accurate, and
economical way to support the EDSP.
• Generated high quality data and models that can be reused
• Free & open-source code and workflows
• Published manuscripts in peer reviewed journals
• Data and predictions available for visualization on the EPA’s dashboard: https://comptox.epa.gov/dashboard/

Thank you for your attention
Acknowledgements:
• National Center for Computational Toxicology, US EPA
• CERAPP participants
• CoMPARA participants
