SlideShare a Scribd company logo
1 of 25
Download to read offline
Virtual screening of chemicals for endocrine
disrupting activity: Case studies of the estrogen
and androgen receptors
Kamel Mansouri, PhD
Research Investigator
Computational Chemistry
919-558-1356
kmansouri@scitovation.com www.scitovation.com
American Chemical Society meeting
New Orleans, LA
March 21, 2018
Kamel Mansouri: ScitoVation LLC
Nicole Kleinstreuer, NICEATM/ NIEHS.
Christopher Grulke, Ann Richard, Imran Shah, Antony Williams, Richard Judson: NCCT/ US EPA
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Background and Goals
• U.S. Congress mandated that the EPA screen chemicals for
their potential to be endocrine disruptors
• This led to the development of the Endocrine Disruptor
Screening Program (EDSP)
• The initial focus was on environmental estrogens, but the
program was expanded to include androgens and thyroid
pathway disruptors
Problem statement
• EDSP Consists of Tier 1 and Tier 2 tests
• Tier 1 is a battery of 11 in vitro and in vivo assays
• Cost ~$1,000,000 per chemical
• Throughput is ~50 chemicals / year
• Total cost of Tier 1 is billions of dollars and will take
100 years at the current rate
• Need pre-tier 1 filter
• Use combination of structure modeling tools and
high-throughput screening “EDSP21”
Computational Toxicology
Too many chemicals to test with standard animal-
based methods
–Cost (~$1,000,000/chemical), time, animal welfare
–10,000 chemicals to be tested for EDSP
–Fill the data gaps and bridge the lack of knowledge
Alternative
Quantitative Structure Activity/Property Relationships
(QSAR/QSPR)
Congenericity principle: QSARs correlate, within congeneric series of compounds,
their chemical or biological activities, either with certain structural features or with
atomic, group or molecular descriptors.
Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 279-287
Original Structure
Representation Feature selection
Activities
Descriptors
Y = f(bi , X)
X - descriptors (selected variables)
bi - fitted parameters
Development of a QSAR model
• Curation of experimental data
• Preparation of training and test sets
• Calculation of an initial set of descriptors
• Selection of a mathematical method
• Variable selection technique
• Validation of the model’s predictive ability
• Define the Applicability Domain
In silico screening of chemicals for ER and AR activity
CoMPARA
Collaborative Modeling Project
for Androgen Receptor Activity
Mansouri et al. (http://ehp.niehs.nih.gov/15-10267/) Mansouri et al. (DOI: 10.13140/RG.2.2.19612.80009)
Participants
8
Interactive map:
https://batchgeo.com/map/9d3ff810a72d8a84093c74ab0601f01d
Group ID Institution Country
1 ATSDR
Agency for Toxic Substances and
Disease Registry. CDC.
USA
2 DTU Technical University of Denmark. Denmark
3 ECUST
East China University of Science and
Technology.
China
4 IBMC Institute of Biomedical Chemistry. Russia
5 IDEA IdeaConsult, Ltd. Bulgaria
6 ILS Integrated Laboratory Systems, Inc. USA
7 INSLA University of Insubria. Italy
8 INSLA Lanzhou University. China
9 IRFMN
Istituto di Ricerche Farmacologiche
“Mario Negri”.
Italy
10 JRC
Joint Research Centre of the
European Commission.
Italy
11 LM Lockheed Martin IS&GS. USA
12 MTI Molecules Theurapetiques In silico. France
13 NCATS
National Center for Advancing
Translational Sciences.
USA
14 NCCT
National Center for Computational
Toxicology. EPA.
USA
15 NCI National Cancer Institute. USA
Group ID Institution Country
16 NCSU North Carolina State University. USA
17 NCTR
National Center for Toxicological Research.
FDA.
USA
18 NICEATM
NTP Interagency Center for the Evaluation of
Alternative Toxicological Methods.
USA
19 NRMRL
National Risk Management Research
Laboratory. EPA.
USA
20 RIFM Research Institute for Fragrance Materials USA
21 SWETOX Swedish toxicology research center. Sweden
22 TARTU University of Tartu. Estonia
23 TUM Technical University Munich. Germany
24 UFG Federal University of Golas. Brazil
25 UMEA University of UMEA. Sweden
26 UNC University of North Carolina. USA
27 UNIBA University of Bari. Italy
28 UNIMIB University of Milano-Bicocca. Italy
29 UNISTRA University of Strasbourg. France
30 VCCLAB Virtual Computational Chemistry Laboratory. Germany
Plan of the projects
Steps Tasks
1: Training and prioritization sets
Organizers
- ToxCast assays for training set data
- AUC values and discrete classes for continuous/classification modeling
- QSAR-ready training set and prioritization set
2: Experimental validation set
Organizers
- Collect and clean experimental data from the literature
- Prepare validation sets for qualitative and quantitative models
3: Modeling & predictions
All participants
- Train/refine the models based on the training set
- Deliver predictions and applicability domains for evaluation
4: Model evaluation
Organizers
- Evaluate the predictions of each model separately
- Assign a score for each model based on the evaluation step
5: Consensus modeling
Organizers
- Use the weighting scheme based on the scores to generate the consensus
- Use the same validation set to evaluate consensus predictions
6: Manuscript writing
All participants
- Descriptions of modeling approaches for each individual model
- Input of the participants on the draft of the manuscript
Judson et al Toxicol. Sci. (2015) 148: 137-154
Tox21/ToxCast Pathway Models
Kleinstreuer N. C. et al. 2017 30 (4), 946-964.
Tox21/ToxCast ER Pathway Model Tox21/ToxCast AR Pathway Model
ER & AR assyas combination to train the models
AUC=0.1
Equivalent to
AC50=100 uM
ER training data AR training data
Active Inactive Total Active Inactive Total
Binding 237 1440 1677 198 1464 1662
Agonist 219 1458 1677 43 1616 1659
Antagonist 41 1636 1677 159 1366 1525
Total 237 1440 1677 198 1648 1688
Remove of
duplicates
Normalize of
tautomers
Clean salts and
counterions
Remove inorganics
and mixtures
Final inspection
QSAR-ready
structures
Indigo
Aim of the workflow:
• Combine different procedures and ideas
• Minimize the differences between the structures used for prediction
• Produce a flexible free and open source workflow to be shared
QSAR-ready KNIME workflow
Mansouri et al. (http://ehp.niehs.nih.gov/15-10267/)
Chemicals for Prediction: covering the Human Exposure Universe
• EDSP Universe (10K)
• Chemicals with known use (40K) (CPCat & ACToR)
• Canadian Domestic Substances List (DSL) (23K)
• EPA DSSTox – structures of EPA/FDA interest (15K)
• ToxCast and Tox21 (In vitro ER data) (8K)
32,464 unique QSAR-ready structures
• CERAPP list: ~32k structures
• EINECS: European INventory
• ~60k structures
• ~55k QSAR-ready structures
• ~38k non overlapping with the CERAPP list
• ~18k overlap with DSSTox
• ToxCast metabolites: ~6k unique structures
47,888 + 6592 = 55,450 QSAR ready structures
CoMPARA Prediction setCERAPP Prediction set
a) Tox21: ~8k chemicals
b) FDA EDKB DB: ~8k chemicals;
c) METI DB: ~2k chemicals
d) ChEMBL DB: ~2k chemicals
~60k experimental values for ~15k chemicals
Mansouri et al. (2016) EHP 124:1023–1033
Experimental data for evaluation set
CERAPP validation set CoMPARA validation set
Harris and Judson (in preparation)
1.2 million assay records
2.1 million chemical structures
10 thousand protein targets
Scrub Chem
~80K experimental values for ~11k chemicals
ER validation data AR validation data
Active Inactive Total Active Inactive Total
Binding 1982 5301 7283 453 3429 3882
Agonist 350 5969 6319 167 4672 4839
Antagonist 284 6255 6539 355 3685 4040
Total 2017 7024 7522 487 4928 5273
Received models for both projects
Validation of the models:
CERAPP participants models CoMPARA participants models
Categorical Continuous Total Categorical Continuous Total
Binding 21 3 24 32 5 37
Agonist 11 3 14 20 5 25
Antagonist 8 2 10 22 3 25
Total 40 8 48 74 13 87
Model validation & Consensus modeling
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
CERAPP categorical models (40)
Binding Agonist Antagonist
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
CoMPARA categorical models (74)
Binding Agonist Antagonist
Coverage of the models
Distributions of the number of the predicted chemical structures by all binding models.
CERAPP models CoMPARA models
Consensus models assessment
Binding Agonist Antagonist
Train Test Train Test Train Test
Sn 0.93 0.58 0.85 0.94 0.67 0.18
Sp 0.97 0.92 0.98 0.94 0.94 0.90
BA 0.95 0.75 0.92 0.94 0.80 0.54
CERAPP CoMPARA
Active Inactive Active Inactive
Binding 4001 28463 8202 40656
Agonist 2475 29989 1764 47094
Antagonist 2793 29671 9899 38959
Total 4001 28463 10623 47613
ToxCast metabolites
Active Inactive
Binding 1609 4983
Agonist 428 6164
Antagonist 1820 4772
Total 1989 6325
Binding Agonist Antagonist
Train Test Train Test Train Test
Sn 0.99 0.69 0.95 0.74 1.00 0.61
Sp 0.91 0.87 0.98 0.97 0.95 0.87
BA 0.95 0.78 0.97 0.86 0.97 0.74
CERAPP consensus CoMPARA consensus
CoMPARA predictions
Concordance and prioritization
Concordance of binding models on the active compounds of the prediction sets.
757 chemicals
have >75%
positive
concordance
Most models predict most
chemicals as ER non-binders
1772 chemicals
have >75%
positive
concordance
Most models predict most
chemicals as AR non-binders
CERAPP models CoMPARA models
High general concordance across all binding models included in the consensus.
Confidence in prediction
CERAPP models CoMPARA models
OPERA & the EPA CompTox Dashboard
https://github.com/kmansouri/OPERA
https://comptox.epa.gov/dashboard/
Summary
• Prioritized tens of thousands of chemicals for ER & AR in a fast accurate and
economic way to help with the EDSP program.
• Generated high quality data and models that can be reused
• Free & open-source code and workflows
• Published manuscripts in peer reviewed journals
• Data and predictions available for visualization on the EPA’s dashboard:
Thank you for your attention
National Center for Computational Toxicology, US EPA
CERAPP participants
CoMPARA participants
Acknowledgements:

More Related Content

What's hot

EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...Kamel Mansouri
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in RRajarshi Guha
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Kamel Mansouri
 
Ovanes Mekenyan - Lush Prize Conference 2014
Ovanes Mekenyan - Lush Prize Conference 2014Ovanes Mekenyan - Lush Prize Conference 2014
Ovanes Mekenyan - Lush Prize Conference 2014LushPrize
 
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...Kamel Mansouri
 
SETAC Rome Non-Target Screening For Chemical Discovery
SETAC Rome Non-Target Screening For Chemical DiscoverySETAC Rome Non-Target Screening For Chemical Discovery
SETAC Rome Non-Target Screening For Chemical DiscoveryEmma Schymanski
 

What's hot (20)

EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in R
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...
 
Ovanes Mekenyan - Lush Prize Conference 2014
Ovanes Mekenyan - Lush Prize Conference 2014Ovanes Mekenyan - Lush Prize Conference 2014
Ovanes Mekenyan - Lush Prize Conference 2014
 
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
 
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
 
Exploiting enhanced non-testing approaches to meet the needs for sustainable ...
Exploiting enhanced non-testing approaches to meet the needs for sustainable ...Exploiting enhanced non-testing approaches to meet the needs for sustainable ...
Exploiting enhanced non-testing approaches to meet the needs for sustainable ...
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
SETAC Rome Non-Target Screening For Chemical Discovery
SETAC Rome Non-Target Screening For Chemical DiscoverySETAC Rome Non-Target Screening For Chemical Discovery
SETAC Rome Non-Target Screening For Chemical Discovery
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
 
The influence of data curation on QSAR Modeling – examining issues of qualit...
 The influence of data curation on QSAR Modeling – examining issues of qualit... The influence of data curation on QSAR Modeling – examining issues of qualit...
The influence of data curation on QSAR Modeling – examining issues of qualit...
 
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
 
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
 
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
 

Similar to Virtual screening of chemicals for endocrine disrupting activity: Case studies of the estrogen and androgen receptors. ACS 2018 (New Orleans, USA)

Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim
 
Development and sharing of ADME/Tox and Drug Discovery Machine learning models
Development and sharing of ADME/Tox and Drug Discovery Machine learning modelsDevelopment and sharing of ADME/Tox and Drug Discovery Machine learning models
Development and sharing of ADME/Tox and Drug Discovery Machine learning modelsSean Ekins
 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Sunghwan Kim
 
A comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsA comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsAndrew McEachran
 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology Sean Ekins
 
Predictive tox proposal_ver1
Predictive tox proposal_ver1Predictive tox proposal_ver1
Predictive tox proposal_ver1Ankur Khanna
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...Kamel Mansouri
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...Kamel Mansouri
 
Basics of QSAR Modeling
Basics of QSAR ModelingBasics of QSAR Modeling
Basics of QSAR ModelingPrachi Pradeep
 
Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Kamel Mansouri
 
Update on Phase 2 of the C4SL Project
Update on Phase 2 of the C4SL ProjectUpdate on Phase 2 of the C4SL Project
Update on Phase 2 of the C4SL ProjectIES / IAQM
 
Too good to be true? How validate your data
Too good to be true? How validate your dataToo good to be true? How validate your data
Too good to be true? How validate your dataAlex Henderson
 
Using available tools for tiered assessments and rapid MoE
Using available tools for tiered assessments and rapid MoEUsing available tools for tiered assessments and rapid MoE
Using available tools for tiered assessments and rapid MoERebeccaClewell
 

Similar to Virtual screening of chemicals for endocrine disrupting activity: Case studies of the estrogen and androgen receptors. ACS 2018 (New Orleans, USA) (20)

Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...Incorporating new technologies and High Throughput Screening in the design an...
Incorporating new technologies and High Throughput Screening in the design an...
 
Development and sharing of ADME/Tox and Drug Discovery Machine learning models
Development and sharing of ADME/Tox and Drug Discovery Machine learning modelsDevelopment and sharing of ADME/Tox and Drug Discovery Machine learning models
Development and sharing of ADME/Tox and Drug Discovery Machine learning models
 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...
 
Prediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source toolsPrediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source tools
 
QSAR Modeling.pdf
QSAR Modeling.pdfQSAR Modeling.pdf
QSAR Modeling.pdf
 
A comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsA comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction models
 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology
 
Predictive tox proposal_ver1
Predictive tox proposal_ver1Predictive tox proposal_ver1
Predictive tox proposal_ver1
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
Basics of QSAR Modeling
Basics of QSAR ModelingBasics of QSAR Modeling
Basics of QSAR Modeling
 
Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...
 
Lab'InSight Toxicological Risk Assessment présentation UNamur NNC 131024
Lab'InSight Toxicological Risk Assessment présentation UNamur NNC  131024Lab'InSight Toxicological Risk Assessment présentation UNamur NNC  131024
Lab'InSight Toxicological Risk Assessment présentation UNamur NNC 131024
 
Update on Phase 2 of the C4SL Project
Update on Phase 2 of the C4SL ProjectUpdate on Phase 2 of the C4SL Project
Update on Phase 2 of the C4SL Project
 
Chap3 1
Chap3 1Chap3 1
Chap3 1
 
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
 
Too good to be true? How validate your data
Too good to be true? How validate your dataToo good to be true? How validate your data
Too good to be true? How validate your data
 
Using available tools for tiered assessments and rapid MoE
Using available tools for tiered assessments and rapid MoEUsing available tools for tiered assessments and rapid MoE
Using available tools for tiered assessments and rapid MoE
 

Recently uploaded

Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
insect anatomy and insect body wall and their physiology
insect anatomy and insect body wall and their  physiologyinsect anatomy and insect body wall and their  physiology
insect anatomy and insect body wall and their physiologyDrAnita Sharma
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Masticationvidulajaib
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10ROLANARIBATO3
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxdharshini369nike
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 

Recently uploaded (20)

Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
insect anatomy and insect body wall and their physiology
insect anatomy and insect body wall and their  physiologyinsect anatomy and insect body wall and their  physiology
insect anatomy and insect body wall and their physiology
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Mastication
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptx
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 

Virtual screening of chemicals for endocrine disrupting activity: Case studies of the estrogen and androgen receptors. ACS 2018 (New Orleans, USA)

  • 1. Virtual screening of chemicals for endocrine disrupting activity: Case studies of the estrogen and androgen receptors Kamel Mansouri, PhD Research Investigator Computational Chemistry 919-558-1356 kmansouri@scitovation.com www.scitovation.com American Chemical Society meeting New Orleans, LA March 21, 2018 Kamel Mansouri: ScitoVation LLC Nicole Kleinstreuer, NICEATM/ NIEHS. Christopher Grulke, Ann Richard, Imran Shah, Antony Williams, Richard Judson: NCCT/ US EPA The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  • 2. Background and Goals • U.S. Congress mandated that the EPA screen chemicals for their potential to be endocrine disruptors • This led to the development of the Endocrine Disruptor Screening Program (EDSP) • The initial focus was on environmental estrogens, but the program was expanded to include androgens and thyroid pathway disruptors
  • 3. Problem statement • EDSP Consists of Tier 1 and Tier 2 tests • Tier 1 is a battery of 11 in vitro and in vivo assays • Cost ~$1,000,000 per chemical • Throughput is ~50 chemicals / year • Total cost of Tier 1 is billions of dollars and will take 100 years at the current rate • Need pre-tier 1 filter • Use combination of structure modeling tools and high-throughput screening “EDSP21”
  • 4. Computational Toxicology Too many chemicals to test with standard animal- based methods –Cost (~$1,000,000/chemical), time, animal welfare –10,000 chemicals to be tested for EDSP –Fill the data gaps and bridge the lack of knowledge Alternative
  • 5. Quantitative Structure Activity/Property Relationships (QSAR/QSPR) Congenericity principle: QSARs correlate, within congeneric series of compounds, their chemical or biological activities, either with certain structural features or with atomic, group or molecular descriptors. Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 279-287 Original Structure Representation Feature selection Activities Descriptors Y = f(bi , X) X - descriptors (selected variables) bi - fitted parameters
  • 6. Development of a QSAR model • Curation of experimental data • Preparation of training and test sets • Calculation of an initial set of descriptors • Selection of a mathematical method • Variable selection technique • Validation of the model’s predictive ability • Define the Applicability Domain
  • 7. In silico screening of chemicals for ER and AR activity CoMPARA Collaborative Modeling Project for Androgen Receptor Activity Mansouri et al. (http://ehp.niehs.nih.gov/15-10267/) Mansouri et al. (DOI: 10.13140/RG.2.2.19612.80009)
  • 9. Group ID Institution Country 1 ATSDR Agency for Toxic Substances and Disease Registry. CDC. USA 2 DTU Technical University of Denmark. Denmark 3 ECUST East China University of Science and Technology. China 4 IBMC Institute of Biomedical Chemistry. Russia 5 IDEA IdeaConsult, Ltd. Bulgaria 6 ILS Integrated Laboratory Systems, Inc. USA 7 INSLA University of Insubria. Italy 8 INSLA Lanzhou University. China 9 IRFMN Istituto di Ricerche Farmacologiche “Mario Negri”. Italy 10 JRC Joint Research Centre of the European Commission. Italy 11 LM Lockheed Martin IS&GS. USA 12 MTI Molecules Theurapetiques In silico. France 13 NCATS National Center for Advancing Translational Sciences. USA 14 NCCT National Center for Computational Toxicology. EPA. USA 15 NCI National Cancer Institute. USA Group ID Institution Country 16 NCSU North Carolina State University. USA 17 NCTR National Center for Toxicological Research. FDA. USA 18 NICEATM NTP Interagency Center for the Evaluation of Alternative Toxicological Methods. USA 19 NRMRL National Risk Management Research Laboratory. EPA. USA 20 RIFM Research Institute for Fragrance Materials USA 21 SWETOX Swedish toxicology research center. Sweden 22 TARTU University of Tartu. Estonia 23 TUM Technical University Munich. Germany 24 UFG Federal University of Golas. Brazil 25 UMEA University of UMEA. Sweden 26 UNC University of North Carolina. USA 27 UNIBA University of Bari. Italy 28 UNIMIB University of Milano-Bicocca. Italy 29 UNISTRA University of Strasbourg. France 30 VCCLAB Virtual Computational Chemistry Laboratory. Germany
  • 10. Plan of the projects Steps Tasks 1: Training and prioritization sets Organizers - ToxCast assays for training set data - AUC values and discrete classes for continuous/classification modeling - QSAR-ready training set and prioritization set 2: Experimental validation set Organizers - Collect and clean experimental data from the literature - Prepare validation sets for qualitative and quantitative models 3: Modeling & predictions All participants - Train/refine the models based on the training set - Deliver predictions and applicability domains for evaluation 4: Model evaluation Organizers - Evaluate the predictions of each model separately - Assign a score for each model based on the evaluation step 5: Consensus modeling Organizers - Use the weighting scheme based on the scores to generate the consensus - Use the same validation set to evaluate consensus predictions 6: Manuscript writing All participants - Descriptions of modeling approaches for each individual model - Input of the participants on the draft of the manuscript
  • 11. Judson et al Toxicol. Sci. (2015) 148: 137-154 Tox21/ToxCast Pathway Models Kleinstreuer N. C. et al. 2017 30 (4), 946-964. Tox21/ToxCast ER Pathway Model Tox21/ToxCast AR Pathway Model
  • 12. ER & AR assyas combination to train the models AUC=0.1 Equivalent to AC50=100 uM ER training data AR training data Active Inactive Total Active Inactive Total Binding 237 1440 1677 198 1464 1662 Agonist 219 1458 1677 43 1616 1659 Antagonist 41 1636 1677 159 1366 1525 Total 237 1440 1677 198 1648 1688
  • 13. Remove of duplicates Normalize of tautomers Clean salts and counterions Remove inorganics and mixtures Final inspection QSAR-ready structures Indigo Aim of the workflow: • Combine different procedures and ideas • Minimize the differences between the structures used for prediction • Produce a flexible free and open source workflow to be shared QSAR-ready KNIME workflow Mansouri et al. (http://ehp.niehs.nih.gov/15-10267/)
  • 14. Chemicals for Prediction: covering the Human Exposure Universe • EDSP Universe (10K) • Chemicals with known use (40K) (CPCat & ACToR) • Canadian Domestic Substances List (DSL) (23K) • EPA DSSTox – structures of EPA/FDA interest (15K) • ToxCast and Tox21 (In vitro ER data) (8K) 32,464 unique QSAR-ready structures • CERAPP list: ~32k structures • EINECS: European INventory • ~60k structures • ~55k QSAR-ready structures • ~38k non overlapping with the CERAPP list • ~18k overlap with DSSTox • ToxCast metabolites: ~6k unique structures 47,888 + 6592 = 55,450 QSAR ready structures CoMPARA Prediction setCERAPP Prediction set
  • 15. a) Tox21: ~8k chemicals b) FDA EDKB DB: ~8k chemicals; c) METI DB: ~2k chemicals d) ChEMBL DB: ~2k chemicals ~60k experimental values for ~15k chemicals Mansouri et al. (2016) EHP 124:1023–1033 Experimental data for evaluation set CERAPP validation set CoMPARA validation set Harris and Judson (in preparation) 1.2 million assay records 2.1 million chemical structures 10 thousand protein targets Scrub Chem ~80K experimental values for ~11k chemicals ER validation data AR validation data Active Inactive Total Active Inactive Total Binding 1982 5301 7283 453 3429 3882 Agonist 350 5969 6319 167 4672 4839 Antagonist 284 6255 6539 355 3685 4040 Total 2017 7024 7522 487 4928 5273
  • 16. Received models for both projects Validation of the models: CERAPP participants models CoMPARA participants models Categorical Continuous Total Categorical Continuous Total Binding 21 3 24 32 5 37 Agonist 11 3 14 20 5 25 Antagonist 8 2 10 22 3 25 Total 40 8 48 74 13 87
  • 17. Model validation & Consensus modeling 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 CERAPP categorical models (40) Binding Agonist Antagonist 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 CoMPARA categorical models (74) Binding Agonist Antagonist
  • 18. Coverage of the models Distributions of the number of the predicted chemical structures by all binding models. CERAPP models CoMPARA models
  • 19. Consensus models assessment Binding Agonist Antagonist Train Test Train Test Train Test Sn 0.93 0.58 0.85 0.94 0.67 0.18 Sp 0.97 0.92 0.98 0.94 0.94 0.90 BA 0.95 0.75 0.92 0.94 0.80 0.54 CERAPP CoMPARA Active Inactive Active Inactive Binding 4001 28463 8202 40656 Agonist 2475 29989 1764 47094 Antagonist 2793 29671 9899 38959 Total 4001 28463 10623 47613 ToxCast metabolites Active Inactive Binding 1609 4983 Agonist 428 6164 Antagonist 1820 4772 Total 1989 6325 Binding Agonist Antagonist Train Test Train Test Train Test Sn 0.99 0.69 0.95 0.74 1.00 0.61 Sp 0.91 0.87 0.98 0.97 0.95 0.87 BA 0.95 0.78 0.97 0.86 0.97 0.74 CERAPP consensus CoMPARA consensus CoMPARA predictions
  • 20. Concordance and prioritization Concordance of binding models on the active compounds of the prediction sets. 757 chemicals have >75% positive concordance Most models predict most chemicals as ER non-binders 1772 chemicals have >75% positive concordance Most models predict most chemicals as AR non-binders CERAPP models CoMPARA models
  • 21. High general concordance across all binding models included in the consensus. Confidence in prediction CERAPP models CoMPARA models
  • 22.
  • 23. OPERA & the EPA CompTox Dashboard https://github.com/kmansouri/OPERA https://comptox.epa.gov/dashboard/
  • 24. Summary • Prioritized tens of thousands of chemicals for ER & AR in a fast accurate and economic way to help with the EDSP program. • Generated high quality data and models that can be reused • Free & open-source code and workflows • Published manuscripts in peer reviewed journals • Data and predictions available for visualization on the EPA’s dashboard:
  • 25. Thank you for your attention National Center for Computational Toxicology, US EPA CERAPP participants CoMPARA participants Acknowledgements: