SlideShare a Scribd company logo
1 of 21
Download to read offline
Office of Research and Development
Kamel Mansouri
Richard Judson
The views expressed in this presentation are those of the author and do not
necessarily reflect the views or policies of the U.S. EPA
AAAS annual meeting (18 February 2017)
Consensus Models to Predict Endocrine
Disruption for All Human-Exposure
Chemicals
National Center for
Computational Toxicology,
U.S. EPA, RTP, NC, USA
Office of Research and Development
National Center for Computational Toxicology
Background and Goals
• U.S. Congress mandated that the EPA screen chemicals for
their potential to be endocrine disruptors
• This led to the development of the Endocrine Disruptor
Screening Program (EDSP)
• The initial focus was on environmental estrogens, but the
program was expanded to include androgens and thyroid
pathway disruptors
1/18/2017
Office of Research and Development
National Center for Computational Toxicology
Problem Statement
Too many chemicals to test with standard
animal-based methods
–Cost (~$1,000,000/chemical), time, animal welfare
–10,000 chemicals to be tested for EDSP
–Fill the data gaps and bridge the lack of knowledge
Alternative
Office of Research and Development
National Center for Computational Toxicology
Quantitative Structure Activity/Property
Relationships (QSAR/QSPR)
Congenericity principle: QSARs correlate, within congeneric series of compounds,
their chemical or biological activities, either with certain structural features or with
atomic, group or molecular descriptors.
Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 279-287
Original Structure
Representation Feature selection
Activities
Descriptors
Y = f(bi , X )
X - descriptors (selected variables)
bi - fitted parameters
Office of Research and Development
National Center for Computational Toxicology
Development of a QSAR model
• Curation of experimental data (Data may be noisy and
limits prediction accuracy)
• Preparation of training and test sets
• Calculation of an initial set of descriptors
• Selection of a mathematical method
• Variable selection technique
• Validation of the model’s predictive ability
• Define the Applicability Domain
Office of Research and Development
National Center for Computational Toxicology
Structure curation procedure
Remove of
duplicates
Normalize of
tautomers
Clean salts and
counterions
Remove inorganics
and mixtures
Final inspection
QSAR-ready
structures
Indigo
KNIME workflow
Aim of the workflow:
• Combine different procedures and ideas
• Minimize the differences between the structures used for
prediction
• Produce a flexible free and open source workflow to be shared
Office of Research and Development
National Center for Computational Toxicology
Molecular structures in the computer
Bitstrings in databases
Fragmental keys & fingerprints
- substructural search
- read-across
- similarity search
Office of Research and Development
National Center for Computational Toxicology
Classification methods
Kernel function maximizing the margin
between the classes
• kNN: k Nearest Neighbors • SVM: Support Vector Machines
classification according to the
majority class of the k neighbors
Other methods: Self organized maps (SOM), Kohonen maps, PLSDA, LDA
Office of Research and Development
National Center for Computational Toxicology
Regression methods
𝒚 = 𝐛𝐗
𝐛 = (𝐗′𝐗)−1
𝐗′𝐲
𝐗 = 𝐓𝐏′
+ 𝐄
𝐘 = 𝐔𝐐′
+ 𝐅
• MLR: Multiple
Linear
Regression
PLS is the vector on the PCR ellipse upon which MLR has the longest projection
• PLS: Partial
Least Squares
Other methods: Artificial Neural Networks (ANN), Random Forest, LASSO, PCR…
Office of Research and Development
National Center for Computational Toxicology
Variable selection procedure
Create initial descriptor population
Evaluate fitness of the populations
Select and reproduce
(Crossover, Mutation)
MLR (Multiple Linear Regression)
PLS (Partial Least squares)
SVM (Support Vector Machines)
….
Replace the descriptors of old
populations with new descriptors Stopping
criteria
Final
models
The Genetic Algorithms diagram
- Many more descriptors
than chemicals
- Many irrelevant
descriptors
Only the most important
descriptors are selected
Office of Research and Development
National Center for Computational Toxicology
2,000
1,500
1,000
1
0.75
0.5
198019751970
year
1965
Cross-validation and test-set to avoid
the “by chance” correlation problem
“There is a concern in West Germany over the falling birth rate. The accompanying
graph might suggest a solution that every child knows makes sense”.
H. Sies, Nature 332, 495 (1988)
5- Fold Cross Validation
Fold : 1 2 53 4
trainingset
initiadata
set
20 -
25%
training
set
test
Office of Research and Development
National Center for Computational Toxicology
• EPA/NCCT: U.S. Environmental Protection Agency / National Center for Computational Toxicology. USA
• DTU/food: Technical University of Denmark/ National Food Institute. Denmark
• FDA/NCTR/DBB: U.S. Food and Drug Administration. USA
• FDA/NCTR/DSB: U.S. Food and Drug Administration. USA
• Helmholtz/ISB: Helmholtz Zentrum Muenchen/Institute of Structural Biology. Germany
• ILS&EPA/NCCT: ILS Inc & EPA/NCCT. USA
• IRCSS: Istituto di Ricerche Farmacologiche “Mario Negri”. Italy
• JRC_Ispra: Joint Research Centre of the European Commission, Ispra. Italy
• LockheedMartin&EPA: Lockheed Martin IS&GS/ High Performance Computing. USA
• NIH/NCATS: National Institutes of Health/ National Center for Advancing Translational Sciences. USA
• NIH/NCI: National Institutes of Health/ National Cancer Institute. USA
• RIFM: Research Institute for Fragrance Materials, Inc. USA
• UMEA/Chemistry: University of UMEA/ Chemistry department. Sweden
• UNC/MML: University of North Carolina/ Laboratory for Molecular Modeling. USA
• UniBA/Pharma: University of Bari/ Department of Pharmacy. Italy
• UNIMIB/Michem: University of Milano-Bicocca/ Milano Chemometrics and QSAR Research Group. Italy
• UNISTRA/Infochim: University of Strasbourg/ ChemoInformatique. France
CERRAP : Collaborative Estrogen Receptor Activity Prediction Project
40 scientists, 17 research groups
Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267
Office of Research and Development
National Center for Computational Toxicology
CERAPP data and results
• Classification / Qualitative:
–Binding: 22 models
–Agonists: 11 models
–Antagonists: 9 models
40 Models received:
Regression / Quantitative:
Binding: 3 models
Agonists: 3 models
Antagonists: 2 models
Datasets of the project
Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267
Consensus modeling:
• Training set: 1,677 chemicals (EPA ToxCast data)
• Prediction set: 32,464 chemicals (The Human Exposure Universe)
• Evaluation set: 7,000 chemicals (Literature: Tox21, FDA, METI…)
Weighted vote based on rankings of the predictions accuracy scores
Total binders: 3961
Agonists: 2494
Antagonists: 2793
Consensus Qualitative Accuracy
ToxCast
data
Literature
data
(All: 7283)
Literature data
(>6 sources:
1209)
Sensitivity 0.93 0.30 0.87
Specificity 0.97 0.91 0.94
Balanced accuracy 0.95 0.61 0.91
ToxCast data
(training set)
Literature data
(test set)
ObservedPredicted Actives Inactives Actives Inactives
Actives 83 6 597 1385
Inactives 40 1400 463 4838
Prediction Accuracy Strongly Depends on Data Quality
ROC curve of the external validation set (literature)
Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267
• positive concordance < 0.6 => Potency class= Very weak
• 0.6=<positive concordance<0.75 => Potency class= Weak
• 0.75=<positive concordance<0.9 => Potency class= Moderate
• positive concordance>=0.9 => Potency class= Strong
Consensus Quantitative Accuracy
Box plot of the active classes of the
consensus model.
Variation of the balanced accuracy with
positive concordance thresholds
Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267
Modelsconcordance
Potency classes
Office of Research and Development
National Center for Computational Toxicology
Concordance of the qualitative models
Only 757 chemicals have >75% positive concordance
Actives
Inactives
Prioritization
Most models predict most chemicals as inactive
Only a small fraction of chemicals require further testing!
Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267
Office of Research and Development
National Center for Computational Toxicology
Mansouri et al. (2016) EHP 124:1023–
1033 DOI:10.1289/ehp.1510267
Office of Research and Development
National Center for Computational Toxicology
• Follow the CERAPP framework
• Use larger size prioritization set
• Use data from the combined EPA ToxCast AR assays
• Collect and curate data from the literature for validation
• Use agonists, antagonists, and binding data
• Build continuous and classification models
• Similar approach for consensus modeling and validation
From CERAPP to CoMPARA : Collaborative
Modeling Project for Androgen Receptor Activity
Office of Research and Development
National Center for Computational Toxicology
• EPA/NCCT. USA
• DTU/food. Denmark
• FDA/NCTR/DBB. USA
• Helmholtz. Germany
• ILS&EPA/NCCT. USA
• IRCSS. Italy
• LockheedMartin&EPA. USA
• NIH/NCATS. USA
• NIH/NCI. USA
• UMEA/Chemistry. Sweden
• UNC/MML. USA
• UniBA/Pharma. Italy
• UNIMIB/Michem. Italy
• UNISTRA/Infochim. France
• VCCLab. Germany
CoMPARA participants: 34 international groups
• NCSU. Department of Chemistry, Bioinformatics Research Center. USA
• EPA/NRMRL. National Risk Management Research Laboratory. USA
• INSUBRIA. University of Insubria. Environmental Chemistry. Italy
• Tartu. University of Tartu. Institute of Chemistry. Estonia
• NIH/NTP/NICEATM. USA
• Chemistry Institute. Lab of Chemometrics. Slovenia
• SWETOX. Swedish toxicology research center. Sweden
• Lanzhou University . China
• BDS. Biodetection Systems. Netherlands
• MTI. Molecules Theurapetiques in silico. France
• IBMC. Institute of Biomedical Chemistry. Russia
• UNIMORE. University of Modena Reggio-Emilia. Italy
• UFG. Federal University of Golas. Brazil
• MSU. Moscow State University. Russia
• ZJU. Zhejiang University. China
• JKU. Johannes Kepler University. Austria
• CTIS. Centre de Traitement de l'Information Scientifique. France
• IdeaConsult. Bulgaria
• ECUST. East China University of Science and Technology. China
From CERAPP
New research groups
Office of Research and Development
National Center for Computational Toxicology
Summary
20
• Prioritized tens of thousands of chemicals for ER & AR in a fast accurate
and economic way to help with the EDSP program.
• Generated high quality data and models that can be reused
• Free & open-source code and workflows shared with the community
• Published manuscripts in peer reviewed journals
• Data and predictions available for visualization on the EDSP dashboard:
http://actor.epa.gov/edsp21/ and soon on the CompTox dashboard:
https://comptox.epa.gov/dashboard/
Office of Research and Development
National Center for Computational Toxicology
21
Thank you for your attention

More Related Content

What's hot

Predictive tox proposal_ver1
Predictive tox proposal_ver1Predictive tox proposal_ver1
Predictive tox proposal_ver1Ankur Khanna
 
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...Andrew McEachran
 
Chemical Alternatives Assessments_HP
Chemical Alternatives Assessments_HPChemical Alternatives Assessments_HP
Chemical Alternatives Assessments_HPHans Plugge
 
News from EURL ECVAM - October 2017
News from EURL ECVAM - October 2017News from EURL ECVAM - October 2017
News from EURL ECVAM - October 2017Adelaide Dura
 
Oral Fluid as a Chemical Test for the DRE Program
Oral Fluid as a Chemical Test for the DRE ProgramOral Fluid as a Chemical Test for the DRE Program
Oral Fluid as a Chemical Test for the DRE ProgramNMS Labs
 
Guidance Document for Acute Lethality Testing.PDF
Guidance Document for Acute Lethality Testing.PDFGuidance Document for Acute Lethality Testing.PDF
Guidance Document for Acute Lethality Testing.PDFGuy Gilron
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarAnn-Marie Roche
 
Oral fluid drug testing
Oral fluid drug testingOral fluid drug testing
Oral fluid drug testingwolfeinc1
 
In vitro data and in silico models for predictive toxicology
In vitro data and in silico models for predictive toxicologyIn vitro data and in silico models for predictive toxicology
In vitro data and in silico models for predictive toxicologyEFSA EU
 

What's hot (20)

Predictive tox proposal_ver1
Predictive tox proposal_ver1Predictive tox proposal_ver1
Predictive tox proposal_ver1
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...
 
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...
 
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
 
Chemical Alternatives Assessments_HP
Chemical Alternatives Assessments_HPChemical Alternatives Assessments_HP
Chemical Alternatives Assessments_HP
 
News from EURL ECVAM - October 2017
News from EURL ECVAM - October 2017News from EURL ECVAM - October 2017
News from EURL ECVAM - October 2017
 
Resume_CIW
Resume_CIWResume_CIW
Resume_CIW
 
Oral Fluid as a Chemical Test for the DRE Program
Oral Fluid as a Chemical Test for the DRE ProgramOral Fluid as a Chemical Test for the DRE Program
Oral Fluid as a Chemical Test for the DRE Program
 
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
 
Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...
 
Translating research into practical tools: A case study of GenRA, a new read...
Translating research into practical tools: A case study of GenRA,  a new read...Translating research into practical tools: A case study of GenRA,  a new read...
Translating research into practical tools: A case study of GenRA, a new read...
 
Guidance Document for Acute Lethality Testing.PDF
Guidance Document for Acute Lethality Testing.PDFGuidance Document for Acute Lethality Testing.PDF
Guidance Document for Acute Lethality Testing.PDF
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinar
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 
Oral fluid drug testing
Oral fluid drug testingOral fluid drug testing
Oral fluid drug testing
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
In vitro data and in silico models for predictive toxicology
In vitro data and in silico models for predictive toxicologyIn vitro data and in silico models for predictive toxicology
In vitro data and in silico models for predictive toxicology
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 

Similar to Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemicals.

Virtual screening of chemicals for endocrine disrupting activity through CER...
Virtual screening of chemicals for endocrine disrupting activity through  CER...Virtual screening of chemicals for endocrine disrupting activity through  CER...
Virtual screening of chemicals for endocrine disrupting activity through CER...Kamel Mansouri
 
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...Kamel Mansouri
 
Developing tools for high resolution mass spectrometry-based screening via th...
Developing tools for high resolution mass spectrometry-based screening via th...Developing tools for high resolution mass spectrometry-based screening via th...
Developing tools for high resolution mass spectrometry-based screening via th...Andrew McEachran
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim
 
Application of Computational and High-Throughput in vitro Screening for Prior...
Application of Computational and High-Throughput in vitro Screening for Prior...Application of Computational and High-Throughput in vitro Screening for Prior...
Application of Computational and High-Throughput in vitro Screening for Prior...U.S. EPA Office of Research and Development
 
International Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsInternational Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsKamel Mansouri
 
High Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdfHigh Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdfsayedjannatfatema72
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekingeProf. Wim Van Criekinge
 
Structure identification using high resolution mass spectrometry data and the...
Structure identification using high resolution mass spectrometry data and the...Structure identification using high resolution mass spectrometry data and the...
Structure identification using high resolution mass spectrometry data and the...Andrew McEachran
 

Similar to Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemicals. (20)

Virtual screening of chemicals for endocrine disrupting activity through CER...
Virtual screening of chemicals for endocrine disrupting activity through  CER...Virtual screening of chemicals for endocrine disrupting activity through  CER...
Virtual screening of chemicals for endocrine disrupting activity through CER...
 
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
 
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
 
Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...
 
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
 
Developing tools for high resolution mass spectrometry-based screening via th...
Developing tools for high resolution mass spectrometry-based screening via th...Developing tools for high resolution mass spectrometry-based screening via th...
Developing tools for high resolution mass spectrometry-based screening via th...
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
 
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
 
Computational Toxicity in 21st Century Safety Sciences
Computational Toxicity in 21st Century Safety SciencesComputational Toxicity in 21st Century Safety Sciences
Computational Toxicity in 21st Century Safety Sciences
 
CompTox Chemicals Dashboard: Data and tools to support chemical and environme...
CompTox Chemicals Dashboard: Data and tools to support chemical and environme...CompTox Chemicals Dashboard: Data and tools to support chemical and environme...
CompTox Chemicals Dashboard: Data and tools to support chemical and environme...
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
Application of Computational and High-Throughput in vitro Screening for Prior...
Application of Computational and High-Throughput in vitro Screening for Prior...Application of Computational and High-Throughput in vitro Screening for Prior...
Application of Computational and High-Throughput in vitro Screening for Prior...
 
International Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsInternational Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology Problems
 
High Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdfHigh Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdf
 
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Structure identification using high resolution mass spectrometry data and the...
Structure identification using high resolution mass spectrometry data and the...Structure identification using high resolution mass spectrometry data and the...
Structure identification using high resolution mass spectrometry data and the...
 
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
TRIANGLE AREA MASS SPECTOMETRY MEETING: Structure Identification Approaches U...
 

More from Kamel Mansouri

OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELSOPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELSKamel Mansouri
 
Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Kamel Mansouri
 
OPERA: A free and open source QSAR tool for predicting physicochemical proper...
OPERA: A free and open source QSAR tool for predicting physicochemical proper...OPERA: A free and open source QSAR tool for predicting physicochemical proper...
OPERA: A free and open source QSAR tool for predicting physicochemical proper...Kamel Mansouri
 
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Kamel Mansouri
 
Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Kamel Mansouri
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Kamel Mansouri
 
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...Kamel Mansouri
 
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...Kamel Mansouri
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...Kamel Mansouri
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...Kamel Mansouri
 
The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...Kamel Mansouri
 

More from Kamel Mansouri (11)

OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELSOPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS
 
Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...
 
OPERA: A free and open source QSAR tool for predicting physicochemical proper...
OPERA: A free and open source QSAR tool for predicting physicochemical proper...OPERA: A free and open source QSAR tool for predicting physicochemical proper...
OPERA: A free and open source QSAR tool for predicting physicochemical proper...
 
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
 
Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...
 
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
 
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...
 
The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...
 

Recently uploaded

Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxdharshini369nike
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Masticationvidulajaib
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayZachary Labe
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10ROLANARIBATO3
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 

Recently uploaded (20)

Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptx
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Mastication
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 

Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemicals.

  • 1. Office of Research and Development Kamel Mansouri Richard Judson The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA AAAS annual meeting (18 February 2017) Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemicals National Center for Computational Toxicology, U.S. EPA, RTP, NC, USA
  • 2. Office of Research and Development National Center for Computational Toxicology Background and Goals • U.S. Congress mandated that the EPA screen chemicals for their potential to be endocrine disruptors • This led to the development of the Endocrine Disruptor Screening Program (EDSP) • The initial focus was on environmental estrogens, but the program was expanded to include androgens and thyroid pathway disruptors 1/18/2017
  • 3. Office of Research and Development National Center for Computational Toxicology Problem Statement Too many chemicals to test with standard animal-based methods –Cost (~$1,000,000/chemical), time, animal welfare –10,000 chemicals to be tested for EDSP –Fill the data gaps and bridge the lack of knowledge Alternative
  • 4. Office of Research and Development National Center for Computational Toxicology Quantitative Structure Activity/Property Relationships (QSAR/QSPR) Congenericity principle: QSARs correlate, within congeneric series of compounds, their chemical or biological activities, either with certain structural features or with atomic, group or molecular descriptors. Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 279-287 Original Structure Representation Feature selection Activities Descriptors Y = f(bi , X ) X - descriptors (selected variables) bi - fitted parameters
  • 5. Office of Research and Development National Center for Computational Toxicology Development of a QSAR model • Curation of experimental data (Data may be noisy and limits prediction accuracy) • Preparation of training and test sets • Calculation of an initial set of descriptors • Selection of a mathematical method • Variable selection technique • Validation of the model’s predictive ability • Define the Applicability Domain
  • 6. Office of Research and Development National Center for Computational Toxicology Structure curation procedure Remove of duplicates Normalize of tautomers Clean salts and counterions Remove inorganics and mixtures Final inspection QSAR-ready structures Indigo KNIME workflow Aim of the workflow: • Combine different procedures and ideas • Minimize the differences between the structures used for prediction • Produce a flexible free and open source workflow to be shared
  • 7. Office of Research and Development National Center for Computational Toxicology Molecular structures in the computer Bitstrings in databases Fragmental keys & fingerprints - substructural search - read-across - similarity search
  • 8. Office of Research and Development National Center for Computational Toxicology Classification methods Kernel function maximizing the margin between the classes • kNN: k Nearest Neighbors • SVM: Support Vector Machines classification according to the majority class of the k neighbors Other methods: Self organized maps (SOM), Kohonen maps, PLSDA, LDA
  • 9. Office of Research and Development National Center for Computational Toxicology Regression methods 𝒚 = 𝐛𝐗 𝐛 = (𝐗′𝐗)−1 𝐗′𝐲 𝐗 = 𝐓𝐏′ + 𝐄 𝐘 = 𝐔𝐐′ + 𝐅 • MLR: Multiple Linear Regression PLS is the vector on the PCR ellipse upon which MLR has the longest projection • PLS: Partial Least Squares Other methods: Artificial Neural Networks (ANN), Random Forest, LASSO, PCR…
  • 10. Office of Research and Development National Center for Computational Toxicology Variable selection procedure Create initial descriptor population Evaluate fitness of the populations Select and reproduce (Crossover, Mutation) MLR (Multiple Linear Regression) PLS (Partial Least squares) SVM (Support Vector Machines) …. Replace the descriptors of old populations with new descriptors Stopping criteria Final models The Genetic Algorithms diagram - Many more descriptors than chemicals - Many irrelevant descriptors Only the most important descriptors are selected
  • 11. Office of Research and Development National Center for Computational Toxicology 2,000 1,500 1,000 1 0.75 0.5 198019751970 year 1965 Cross-validation and test-set to avoid the “by chance” correlation problem “There is a concern in West Germany over the falling birth rate. The accompanying graph might suggest a solution that every child knows makes sense”. H. Sies, Nature 332, 495 (1988) 5- Fold Cross Validation Fold : 1 2 53 4 trainingset initiadata set 20 - 25% training set test
  • 12. Office of Research and Development National Center for Computational Toxicology • EPA/NCCT: U.S. Environmental Protection Agency / National Center for Computational Toxicology. USA • DTU/food: Technical University of Denmark/ National Food Institute. Denmark • FDA/NCTR/DBB: U.S. Food and Drug Administration. USA • FDA/NCTR/DSB: U.S. Food and Drug Administration. USA • Helmholtz/ISB: Helmholtz Zentrum Muenchen/Institute of Structural Biology. Germany • ILS&EPA/NCCT: ILS Inc & EPA/NCCT. USA • IRCSS: Istituto di Ricerche Farmacologiche “Mario Negri”. Italy • JRC_Ispra: Joint Research Centre of the European Commission, Ispra. Italy • LockheedMartin&EPA: Lockheed Martin IS&GS/ High Performance Computing. USA • NIH/NCATS: National Institutes of Health/ National Center for Advancing Translational Sciences. USA • NIH/NCI: National Institutes of Health/ National Cancer Institute. USA • RIFM: Research Institute for Fragrance Materials, Inc. USA • UMEA/Chemistry: University of UMEA/ Chemistry department. Sweden • UNC/MML: University of North Carolina/ Laboratory for Molecular Modeling. USA • UniBA/Pharma: University of Bari/ Department of Pharmacy. Italy • UNIMIB/Michem: University of Milano-Bicocca/ Milano Chemometrics and QSAR Research Group. Italy • UNISTRA/Infochim: University of Strasbourg/ ChemoInformatique. France CERRAP : Collaborative Estrogen Receptor Activity Prediction Project 40 scientists, 17 research groups Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267
  • 13. Office of Research and Development National Center for Computational Toxicology CERAPP data and results • Classification / Qualitative: –Binding: 22 models –Agonists: 11 models –Antagonists: 9 models 40 Models received: Regression / Quantitative: Binding: 3 models Agonists: 3 models Antagonists: 2 models Datasets of the project Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267 Consensus modeling: • Training set: 1,677 chemicals (EPA ToxCast data) • Prediction set: 32,464 chemicals (The Human Exposure Universe) • Evaluation set: 7,000 chemicals (Literature: Tox21, FDA, METI…) Weighted vote based on rankings of the predictions accuracy scores
  • 14. Total binders: 3961 Agonists: 2494 Antagonists: 2793 Consensus Qualitative Accuracy ToxCast data Literature data (All: 7283) Literature data (>6 sources: 1209) Sensitivity 0.93 0.30 0.87 Specificity 0.97 0.91 0.94 Balanced accuracy 0.95 0.61 0.91 ToxCast data (training set) Literature data (test set) ObservedPredicted Actives Inactives Actives Inactives Actives 83 6 597 1385 Inactives 40 1400 463 4838 Prediction Accuracy Strongly Depends on Data Quality ROC curve of the external validation set (literature) Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267
  • 15. • positive concordance < 0.6 => Potency class= Very weak • 0.6=<positive concordance<0.75 => Potency class= Weak • 0.75=<positive concordance<0.9 => Potency class= Moderate • positive concordance>=0.9 => Potency class= Strong Consensus Quantitative Accuracy Box plot of the active classes of the consensus model. Variation of the balanced accuracy with positive concordance thresholds Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267 Modelsconcordance Potency classes
  • 16. Office of Research and Development National Center for Computational Toxicology Concordance of the qualitative models Only 757 chemicals have >75% positive concordance Actives Inactives Prioritization Most models predict most chemicals as inactive Only a small fraction of chemicals require further testing! Mansouri et al. (2016) EHP 124:1023–1033 DOI:10.1289/ehp.1510267
  • 17. Office of Research and Development National Center for Computational Toxicology Mansouri et al. (2016) EHP 124:1023– 1033 DOI:10.1289/ehp.1510267
  • 18. Office of Research and Development National Center for Computational Toxicology • Follow the CERAPP framework • Use larger size prioritization set • Use data from the combined EPA ToxCast AR assays • Collect and curate data from the literature for validation • Use agonists, antagonists, and binding data • Build continuous and classification models • Similar approach for consensus modeling and validation From CERAPP to CoMPARA : Collaborative Modeling Project for Androgen Receptor Activity
  • 19. Office of Research and Development National Center for Computational Toxicology • EPA/NCCT. USA • DTU/food. Denmark • FDA/NCTR/DBB. USA • Helmholtz. Germany • ILS&EPA/NCCT. USA • IRCSS. Italy • LockheedMartin&EPA. USA • NIH/NCATS. USA • NIH/NCI. USA • UMEA/Chemistry. Sweden • UNC/MML. USA • UniBA/Pharma. Italy • UNIMIB/Michem. Italy • UNISTRA/Infochim. France • VCCLab. Germany CoMPARA participants: 34 international groups • NCSU. Department of Chemistry, Bioinformatics Research Center. USA • EPA/NRMRL. National Risk Management Research Laboratory. USA • INSUBRIA. University of Insubria. Environmental Chemistry. Italy • Tartu. University of Tartu. Institute of Chemistry. Estonia • NIH/NTP/NICEATM. USA • Chemistry Institute. Lab of Chemometrics. Slovenia • SWETOX. Swedish toxicology research center. Sweden • Lanzhou University . China • BDS. Biodetection Systems. Netherlands • MTI. Molecules Theurapetiques in silico. France • IBMC. Institute of Biomedical Chemistry. Russia • UNIMORE. University of Modena Reggio-Emilia. Italy • UFG. Federal University of Golas. Brazil • MSU. Moscow State University. Russia • ZJU. Zhejiang University. China • JKU. Johannes Kepler University. Austria • CTIS. Centre de Traitement de l'Information Scientifique. France • IdeaConsult. Bulgaria • ECUST. East China University of Science and Technology. China From CERAPP New research groups
  • 20. Office of Research and Development National Center for Computational Toxicology Summary 20 • Prioritized tens of thousands of chemicals for ER & AR in a fast accurate and economic way to help with the EDSP program. • Generated high quality data and models that can be reused • Free & open-source code and workflows shared with the community • Published manuscripts in peer reviewed journals • Data and predictions available for visualization on the EDSP dashboard: http://actor.epa.gov/edsp21/ and soon on the CompTox dashboard: https://comptox.epa.gov/dashboard/
  • 21. Office of Research and Development National Center for Computational Toxicology 21 Thank you for your attention