QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS
Kamel Mansouri, Tine Ringsted, Viviana Consonni,
Davide Ballabio, Roberto Todeschini
Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences,
University of Milano-Bicocca, P.za della Scienza 1 – 20126 Milano, Italy
Persistent organic pollutants are highly bioaccumulative and have toxic
effects on humans, wildlife and the environment. Their persistence has
been studied both experimentally and theoretically in the evaluation of new
chemicals, in order to avoid Persistent, Bioaccumulative and Toxic (PBT)
compounds. To fill data gaps, QSARs are increasingly used by the
scientific community as an alternative to animal testing, and they have
been implemented in legislation (REACH).
The goal of this study was to predict the ready biodegradability of
chemicals by QSAR modeling. The dataset used for this purpose was
produced by the Japanese Ministry of International Trade and Industry
(MITI), with experimental results obtained according to OECD test guideline
301C. Molecular descriptors were calculated with Dragon 6. Variable
selection coupled with classification methods was applied to find the
most predictive models, i.e. those with a low cross-validation error rate.
The best models were then validated on the preselected test set to check
their predictive reliability and to support further analysis.
Dataset:
+ 1314 compounds with ready biodegradability data (MITI-I test) were collected [1].
+ A molecule was removed if:
  - it had a disconnected structure;
  - its experimental BOD value did not agree with its ready/not ready classification under the 60% BOD threshold (Fig1);
  - replicate values differed by more than 20%;
  - its classification would change if nitrification were taken into account.
+ After removal, 1055 molecules remained (356 ready biodegradable / 699 not ready biodegradable) (Fig2).
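The four removal rules can be sketched in Python. This sketch uses a hypothetical record layout (`Record`, with replicate % BOD values and an optional nitrification-corrected % BOD). Two interpretations are assumptions and do not come from the poster. First, a disconnected structure is detected by a "." in the SMILES string. Second, "more than 20% difference" is read as a spread of more than 20 BOD percentage points between replicates.

```python
from dataclasses import dataclass
from typing import List, Optional

READY_THRESHOLD = 60.0  # % BOD pass level separating ready / not ready


@dataclass
class Record:
    # Hypothetical record layout; field names are illustrative only.
    smiles: str
    label_ready: bool                  # reported ready-biodegradability class
    bod_replicates: List[float]        # % BOD of each replicate vessel
    bod_nitrification_corrected: Optional[float] = None  # % BOD if corrected


def is_ready(bod_percent: float) -> bool:
    return bod_percent >= READY_THRESHOLD


def keep(rec: Record) -> bool:
    """Apply the four removal rules; True means the molecule is kept."""
    if "." in rec.smiles:                               # disconnected structure (assumed test)
        return False
    mean_bod = sum(rec.bod_replicates) / len(rec.bod_replicates)
    if is_ready(mean_bod) != rec.label_ready:           # value and class disagree
        return False
    if max(rec.bod_replicates) - min(rec.bod_replicates) > 20.0:
        return False                                    # replicates differ too much (assumed reading)
    corrected = rec.bod_nitrification_corrected
    if corrected is not None and is_ready(corrected) != is_ready(mean_bod):
        return False                                    # class flips with nitrification
    return True


def curate(records: List[Record]) -> List[Record]:
    return [r for r in records if keep(r)]
```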
Descriptors:
Several blocks of molecular descriptors were initially calculated with
Dragon6 [3]: 2D atom pairs, topological indices, ring descriptors,
constitutional indices, functional group counts, 2D matrix-based
descriptors, atom-centred fragments and atom-type E-state indices.
Highly correlated, constant and near-constant descriptors were then
removed automatically with the same software.
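Dragon does this filtering internally. The plain-Python sketch below only illustrates the idea. Both thresholds are hypothetical and are not Dragon's documented defaults: a descriptor is "near constant" when one value is shared by more than 90% of the molecules, and "highly correlated" means |r| > 0.95. The greedy pass keeps the first descriptor of each correlated pair, so the result depends on column order.

```python
import statistics
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def prefilter(columns: List[List[float]], max_mode_fraction: float = 0.9,
              max_abs_corr: float = 0.95) -> List[int]:
    """Return indices of descriptor columns that survive both filters.

    columns[j] holds the values of descriptor j for all molecules.
    Thresholds are illustrative assumptions."""
    kept: List[int] = []
    for j, col in enumerate(columns):
        # constant / near-constant: a single value dominates the column
        mode_count = max(col.count(v) for v in set(col))
        if mode_count / len(col) > max_mode_fraction:
            continue
        # highly correlated with a descriptor that was already kept
        if any(abs(pearson(col, columns[k])) > max_abs_corr for k in kept):
            continue
        kept.append(j)
    return kept
```

Because constant columns are rejected before any correlation is computed, `pearson` never divides by a zero variance.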
Variable selection:
Genetic algorithms (GA) [4], coupled with each classification method (SVM,
KNN, PLS-DA), were run in Matlab to select the best descriptors in two
filtering steps:
+ GA was first applied to each descriptor block separately, then to the
merged set of descriptors retained from all blocks.
+ The selection frequency over 100 GA runs was used to rank the descriptors
by importance, and only the 100 most frequently selected descriptors were
kept for the final modeling step.
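The two steps can be illustrated with a minimal bit-string genetic algorithm, followed by the frequency ranking described above. This is a generic sketch and not the Matlab implementation of Leardi and Lupianez [4]. Truncation selection, one-point crossover, bit-flip mutation and all parameter values are assumptions. `error_fn` stands in for the cross-validated error rate of SVM, KNN or PLS-DA on a candidate subset. The toy error function at the bottom is hypothetical.

```python
import random
from collections import Counter


def ga_select(n_features, error_fn, pop_size=30, generations=40,
              p_mutation=0.05, seed=None):
    """Minimal bit-string GA; returns the best descriptor subset found.

    error_fn(subset) must return an error to minimise, e.g. a
    cross-validated error rate computed on the selected descriptors."""
    rng = random.Random(seed)

    def decode(chrom):
        return tuple(i for i, bit in enumerate(chrom) if bit)

    def score(chrom):
        subset = decode(chrom)
        return error_fn(subset) if subset else float("inf")  # empty subset is invalid

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=score)[: pop_size // 2]  # keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)                    # one-point crossover
            child = [1 - bit if rng.random() < p_mutation else bit
                     for bit in a[:cut] + b[cut:]]                # bit-flip mutation
            children.append(child)
        population = parents + children
    return decode(min(population, key=score))


def rank_by_frequency(selected_subsets, top_n):
    """Sort descriptor indices by how often they were selected across runs."""
    counts = Counter(i for subset in selected_subsets for i in subset)
    return [index for index, _ in counts.most_common(top_n)]


# Usage with a toy error: distance from a hypothetical "true" subset {0, 1, 2}.
toy_error = lambda s: len(set(s) ^ {0, 1, 2}) / 10
runs = [ga_select(10, toy_error, seed=s) for s in range(100)]
top = rank_by_frequency(runs, top_n=3)
```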
Validation of models:
+ 5-fold cross-validation.
+ An external test set, obtained by randomly splitting the initial data set
into 80% training and 20% test molecules while preserving the ratio of ready
to not ready biodegradable compounds. The training set contained 837
molecules and the test set 218 molecules.
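The code below sketches a stratified 80/20 split and a stratified 5-fold partition. The per-class rounding and the round-robin fold assignment are assumptions. Applied to the 356/699 class counts, the split gives 211 test molecules rather than the 218 reported. The poster does not describe its exact splitting procedure.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=None):
    """Return sorted (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(test_fraction * len(idx))   # per-class rounding (assumption)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels, k=5, seed=None):
    """Yield (train_idx, val_idx) pairs; each class is dealt round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val
```

For example, `stratified_split([1] * 356 + [0] * 699, 0.2, seed=1)` returns 844 training and 211 test indices. Of the test molecules, 71 are ready biodegradable.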
[Fig1 plot: y-axis "Number of molecules" (0 to 80); legend: 28 days, <28 days]
Model ID | Descriptors | ER cv | Spec. cv | Sens. cv | ER test | Spec. test | Sens. test
SVM_1 | 20 | 0.151 | 0.775 | 0.924 | 0.135 | 0.806 | 0.925
SVM_2 | 23 | 0.153 | 0.785 | 0.910 | 0.131 | 0.806 | 0.932
SVM_3 | 24 | 0.156 | 0.775 | 0.913 | 0.131 | 0.819 | 0.918
Model ID | Descriptors | LVs | ER fit | Spec. fit | Sens. fit | ER cv | Spec. cv | Sens. cv | ER test | Spec. test | Sens. test
PLSDA_1 | 26 | 9 | 0.140 | 0.887 | 0.834 | 0.141 | 0.891 | 0.826 | 0.145 | 0.861 | 0.849
PLSDA_2 | 28 | 9 | 0.144 | 0.891 | 0.821 | 0.142 | 0.887 | 0.828 | 0.145 | 0.847 | 0.863
PLSDA_3 | 23 | 5 | 0.144 | 0.880 | 0.832 | 0.141 | 0.884 | 0.834 | 0.148 | 0.833 | 0.870
Model ID | Descriptors | Distance | K | ER cv | Spec. cv | Sens. cv | ER test | Spec. test | Sens. test
KNN_1 | 17 | Euclidean | 6 | 0.136 | 0.859 | 0.870 | 0.121 | 0.847 | 0.911
KNN_2 | 17 | City block | 6 | 0.139 | 0.852 | 0.870 | 0.138 | 0.847 | 0.877
KNN_3 | 15 | City block | 8 | 0.141 | 0.849 | 0.870 | 0.142 | 0.806 | 0.911
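The poster does not define ER. The tabulated values are, however, consistent with a balanced error rate, ER = 1 − (Spec. + Sens.)/2. For SVM_1 in cross-validation, for example, 1 − (0.775 + 0.924)/2 ≈ 0.151. Below is a sketch of these metrics computed from predicted labels. It assumes ready biodegradable is coded as the positive class (1), and that assignment is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and balanced error rate for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn)   # fraction of positives recognised
    spec = tn / (tn + fp)   # fraction of negatives recognised
    return {"sens": sens, "spec": spec, "er": 1 - (sens + spec) / 2}


def error_rate(spec, sens):
    """Balanced error rate from specificity and sensitivity."""
    return 1 - (spec + sens) / 2
```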
Acknowledgements:
The research leading to these results has received funding from the European
Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement
n° 238701 of the project Marie Curie ITN Environmental Chemoinformatics (ECO-ITN).
http://www.eco-itn.eu
References:
[1] Chemical Risk Information Platform (CHRIP), National Institute of Technology and
Evaluation, Japan. http://www.safe.nite.go.jp/english/kizon/KIZON_start_hazkizon.htm
[2] Chang, C.-C., Lin, C.-J. LIBSVM 3.1. http://www.csie.ntu.edu.tw/~cjlin/libsvm
[3] Dragon6. Talete srl, Milano, Italy. http://www.talete.mi.it
[4] Leardi, R., Lupianez, A., 1998. Genetic algorithms applied to feature selection in PLS
regression: how and when to use them. Chemometr. Intell. Lab. Syst. 41, 195–207.
$$\mathrm{BOD} = \frac{\text{mg O}_2 \text{ uptake by test substance} - \text{mg O}_2 \text{ uptake by blank}}{\text{mg test substance in vessel}}$$
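The BOD equation translates directly into code. The percentage helper is an addition based on the OECD 301 convention, which divides BOD by the theoretical oxygen demand (ThOD) of the test substance. ThOD is not given on the poster and must be supplied for each compound. The input values in the test are hypothetical.

```python
def bod(o2_uptake_test_mg, o2_uptake_blank_mg, substance_mg):
    """Blank-corrected BOD in mg O2 per mg of test substance in the vessel."""
    return (o2_uptake_test_mg - o2_uptake_blank_mg) / substance_mg


def percent_biodegradation(bod_mg_per_mg, thod_mg_per_mg):
    """BOD expressed as a percentage of the theoretical oxygen demand (ThOD)."""
    return 100.0 * bod_mg_per_mg / thod_mg_per_mg
```

The resulting percentage is the quantity compared with the 60% pass level used to label compounds as ready biodegradable.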
Table2: Selected best models using GA-SVM
Table1: Selected best models using GA-KNN
Table3: Selected best models using GA-PLSDA
Fig2: Multidimensional scaling plot
Fig1: Distribution of BOD values in ready biodegradable compounds.
The number of nearest neighbours (K) was optimized during the GA runs to
minimize the cross-validation error rate (ER cv). The most frequently selected
descriptors are: Kier benzene-likeliness index (BLI), number of atoms of type
'sssN', sum of 'dssC' E-states, number of substituted benzene C(sp2) and
number of ring tertiary C(sp3).
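Below is a plain-Python sketch of KNN classification using the two distances in Table1 (Euclidean and city block). Two details are assumptions. First, ties in the vote can occur with even K such as 6 or 8, and they go to the class of the closest tied neighbour. Second, `select_k` uses a leave-one-out misclassification rate for brevity. This replaces the 5-fold cross-validated error that the poster optimized inside the GA.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, x, k, distance=euclidean):
    """Majority vote among the k training molecules closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in order)
    top = max(votes.values())
    tied = {label for label, n in votes.items() if n == top}
    # With even k the vote can tie: fall back to the closest tied neighbour.
    for i in order:
        if train_y[i] in tied:
            return train_y[i]


def loo_error(X, y, k, distance=euclidean):
    """Leave-one-out misclassification rate (stand-in for 5-fold ER cv)."""
    wrong = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        wrong += knn_predict(rest_X, rest_y, X[i], k, distance) != y[i]
    return wrong / len(X)


def select_k(X, y, candidates, distance=euclidean):
    """Return the first candidate k with the lowest leave-one-out error."""
    return min(candidates, key=lambda k: loo_error(X, y, k, distance))
```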
The number of PLS-DA latent variables (LVs) was optimized during the GA runs to minimize
the cross-validation error rate (ER cv). The most frequently selected descriptors are: R-CX-R,
number of atoms of type 'sssN', spectral mean absolute deviation from the Laplace matrix,
presence of C-Cl at topological distance 1, eccentricity, number of N atoms, number of
aliphatic (thio-)carbamates, average Randic index from the Burden matrix weighted by mass,
and Cl attached to C1(sp3).
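This NumPy sketch treats PLS-DA as PLS1 regression (NIPALS) on a 0/1 class vector, then thresholds the predicted response. Two choices here are assumptions: the fixed 0.5 threshold, and mean-centring without scaling. PLS-DA implementations often use other class-assignment rules, such as a Bayesian threshold, and autoscaled descriptors. The number of latent variables `n_lv` plays the role of the LVs column in Table3.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """PLS1 (NIPALS) regression of a 0/1 class vector y on descriptors X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centring only (assumption)
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        c = yr @ t / tt                      # y loading
        Xr = Xr - np.outer(t, p)             # deflate X and y
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)      # regression coefficients
    return x_mean, y_mean, B


def plsda_predict(model, X, threshold=0.5):
    """Assign class 1 when the predicted response reaches the threshold."""
    x_mean, y_mean, B = model
    y_hat = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    return (y_hat >= threshold).astype(int)
```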
The SVM results were obtained with the LIBSVM 3.1 C library
compiled for Matlab [2], using the radial basis function (RBF) kernel
with the default parameters defined in the library. The most frequently
selected descriptors are: average molecular weight, number of terminal
primary C(sp3), mean first ionisation potential, number of N atoms, sum of
'aasC' E-states, number of heteroatoms, number of aromatic esters,
intrinsic state pseudoconnectivity index and frequency of C-P at
topological distance 2.
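Below is a plain-Python sketch of the RBF kernel and the form of the LIBSVM decision function. LIBSVM's defaults are gamma = 1/(number of features) and cost C = 1. Training, which solves for the dual coefficients and `rho`, is left to the library. The support vectors and coefficients used in the test are therefore hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def default_gamma(n_features):
    """LIBSVM's default kernel width: 1 / number of features."""
    return 1.0 / n_features


def svm_decision(x, support_vectors, dual_coefs, rho, gamma):
    """Decision value sum_i coef_i * K(sv_i, x) - rho; its sign gives the class.

    dual_coefs are the products y_i * alpha_i returned by training."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) - rho
```

With the defaults, the SVM_1 model (20 descriptors) would use gamma = 1/20 = 0.05.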

Presented at the 3rd Chemoinformatics Summer School, Strasbourg, France, 25–29 June 2012, and at the ESOF EuroScience Open Forum, Dublin, Ireland, 11-15 July 2012.

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 

Recently uploaded (20)

CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 

QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemoinformatics Summer School, Strasbourg, France, 25–29 June 2012, and at ESOF (EuroScience Open Forum), Dublin, Ireland, 11–15 July 2012.

Abstract:
Persistent organic pollutants are highly bioaccumulative and have toxic effects on humans, wildlife and the environment. Their persistence has been studied experimentally and theoretically in order to evaluate new chemicals and avoid Persistent, Bioaccumulative and Toxic (PBT) compounds. To fill data gaps, QSARs are increasingly used by the scientific community as an alternative to animal testing, and they are being implemented in legislation (REACH).
The goal of this study was to predict the ready biodegradability of chemicals by QSAR modeling. The data set was produced by the Japanese Ministry of International Trade and Industry (MITI), with experimental results obtained according to OECD test guideline 301C. Molecular descriptors were calculated with Dragon 6. Variable selection coupled with classification methods was applied to find the most predictive models, i.e. those with a low cross-validation error rate. The best models were then validated on the preselected test set to check their prediction reliability and for further analysis.

1314 compounds with ready biodegradation data (MITI-I test) were collected [1].
A molecule was removed if:
 - it had a disconnected structure;
 - its experimental BOD value did not agree with its classification under the 60% BOD threshold (Fig1);
 - its replicate values differed by more than 20%;
 - its classification would change if nitrification were taken into account.
After removal, 1055 molecules remained (356 ready biodegradable / 699 not ready biodegradable).
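The removal rules above can be written as a small record filter. The following is a minimal Python sketch under stated assumptions, not the procedure the authors ran: the record layout (a SMILES string, per-replicate % BOD values, the reported class and a hypothetical nitrification-corrected BOD field) is invented for illustration, a '.' in the SMILES is used as a simple test for a disconnected structure, and the 20% replicate criterion is read as an absolute spread of 20 percentage points. The records are toy values, not MITI data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

BOD_THRESHOLD = 60.0         # % BOD pass level used for the ready / not ready label
MAX_REPLICATE_SPREAD = 20.0  # assumption: max - min of replicate % BOD, in percentage points


@dataclass
class Record:
    """Hypothetical record layout; the poster does not describe its data format."""
    smiles: str
    bod_replicates: List[float]   # % BOD measured in each replicate
    ready_biodegradable: bool     # reported class label
    bod_nitrification_corrected: Optional[float] = None  # hypothetical field


def is_ready(bod: float) -> bool:
    """Apply the 60% BOD classification threshold."""
    return bod >= BOD_THRESHOLD


def keep(rec: Record) -> bool:
    # Rule 1: disconnected structure (more than one SMILES fragment, e.g. salts).
    if "." in rec.smiles:
        return False
    # Rule 2: the reported label must agree with the threshold applied to the mean BOD.
    if is_ready(mean(rec.bod_replicates)) != rec.ready_biodegradable:
        return False
    # Rule 3: replicate values must not differ by more than 20 points.
    if max(rec.bod_replicates) - min(rec.bod_replicates) > MAX_REPLICATE_SPREAD:
        return False
    # Rule 4: the class must not change when nitrification is taken into account.
    corrected = rec.bod_nitrification_corrected
    if corrected is not None and is_ready(corrected) != rec.ready_biodegradable:
        return False
    return True


def curate(records: List[Record]) -> List[Record]:
    return [r for r in records if keep(r)]


# Toy records (illustrative values, not MITI data).
records = [
    Record("CCO", [85.0, 90.0], True),                               # kept
    Record("CC(=O)[O-].[Na+]", [70.0], True),                        # removed: disconnected
    Record("c1ccccc1Cl", [65.0], False),                             # removed: label disagrees with 60%
    Record("CCCCCCCC", [40.0, 75.0], False),                         # removed: replicates differ by 35
    Record("NCCO", [62.0], True, bod_nitrification_corrected=55.0),  # removed: class flips
]
curated = curate(records)
print([r.smiles for r in curated])  # ['CCO']
```

Each rule returns early, so a record is dropped by the first criterion it fails.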
The curated data set is shown in Fig2.

[Fig1: Distribution of BOD values in ready biodegradable compounds (axis: number of molecules; legend: 28 days, <28 days)]

$$\mathrm{BOD} = \frac{\text{mg O}_2\ \text{uptake by test substance} - \text{mg O}_2\ \text{uptake by blank}}{\text{mg test substance in vessel}}$$

[Fig2: Multidimensional scaling plot]

Descriptors:
Different blocks of molecular descriptors were initially calculated using Dragon 6 [3]: 2D atom pairs, topological indices, ring descriptors, constitutional indices, functional group counts, 2D matrix-based descriptors, atom-centred fragments and atom-type E-state indices. Highly correlated, constant and near-constant descriptors were removed automatically with the same software.

Variable selection:
Variable selection was performed in Matlab using genetic algorithms (GA) [4] coupled with each classification method (SVM, KNN, PLSDA). Two filters were applied to select the best descriptors:
+ GA was first run on each descriptor block separately, then on the merged set of descriptors retained from all blocks.
+ The selection frequency over 100 GA runs was used to rank the descriptors by importance, and only the 100 most frequently selected descriptors were kept for the final modeling step.

Validation of models:
Models were validated by 5-fold cross-validation and on an external test set. The test set was chosen by randomly splitting the curated data set into 80% training and 20% test molecules while preserving the ratio of ready biodegradable to not ready biodegradable compounds. The training set contained 837 molecules and the test set 218.

Table1: Selected best models using GA-KNN
(ER = error rate, Spec. = specificity, Sens. = sensitivity; cv = 5-fold cross-validation, test = external test set)

Model ID  Descriptors  Distance    K  ER cv  Spec. cv  Sens. cv  ER test  Spec. test  Sens. test
KNN_1     17           Euclidean   6  0.136  0.859     0.870     0.121    0.847       0.911
KNN_2     17           City block  6  0.139  0.852     0.870     0.138    0.847       0.877
KNN_3     15           City block  8  0.141  0.849     0.870     0.142    0.806       0.911

The number of nearest neighbours K was optimized during the GA runs to minimize the cross-validation error rate (ER cv). The most frequently selected descriptors are: Kier benzene-likeliness index (BLI), number of atoms of type 'sssN', sum of 'dssC' E-states, number of substituted benzene C(sp2) and number of ring tertiary C(sp3).

Table2: Selected best models using GA-SVM

Model ID  Descriptors  ER cv  Spec. cv  Sens. cv  ER test  Spec. test  Sens. test
SVM_1     20           0.151  0.775     0.924     0.135    0.806       0.925
SVM_2     23           0.153  0.785     0.910     0.131    0.806       0.932
SVM_3     24           0.156  0.775     0.913     0.131    0.819       0.918

The SVM results were obtained with the LIBSVM 3.1 C library compiled for Matlab [2], using the radial basis function (RBF) kernel with the default parameters defined in the library. The most frequently selected descriptors are: average molecular weight, number of terminal primary C(sp3), mean first ionisation potential, number of N atoms, sum of 'aasC' E-states, number of heteroatoms, number of esters (aromatic), intrinsic state pseudoconnectivity index and frequency of C-P at topological distance 2.

Table3: Selected best models using GA-PLSDA

Model ID  Descriptors  LVs  ER fit  Spec. fit  Sens. fit  ER cv  Spec. cv  Sens. cv  ER test  Spec. test  Sens. test
PLSDA_1   26           9    0.140   0.887      0.834      0.141  0.891     0.826     0.145    0.861       0.849
PLSDA_2   28           9    0.144   0.891      0.821      0.142  0.887     0.828     0.145    0.847       0.863
PLSDA_3   23           5    0.144   0.880      0.832      0.141  0.884     0.834     0.148    0.833       0.870

The number of PLSDA latent variables (LVs) was optimized during the GA runs to minimize the cross-validation error rate (ER cv). The most frequently selected descriptors are: R-CX-R, number of atoms of type 'sssN', spectral mean absolute deviation from Laplace matrix, presence of C-Cl at topological distance 1, eccentricity, number of N atoms, number of (thio-)carbamates (aliphatic), average Randic index from Burden matrix weighted by mass, and Cl attached to C1(sp3).

Acknowledgements:
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement no. 238701 of the project Marie Curie ITN Environmental Chemoinformatics (ECO-ITN). http://www.eco-itn.eu

References:
[1] Chemical Risk Information Platform (CHRIP), National Institute of Technology and Evaluation, Japan. http://www.safe.nite.go.jp/english/kizon/KIZON_start_hazkizon.htm
[2] Chang, C.-C., Lin, C.-J. LIBSVM 3.1. http://www.csie.ntu.edu.tw/~cjlin/libsvm
[3] Dragon 6. Talete srl, Milano, Italy. http://www.talete.mi.it
[4] Leardi, R., Lupianez, A., 1998. Genetic algorithms applied to feature selection in PLS regression: how and when to use them. Chemometr. Intell. Lab. Syst. 41, 195–207.
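The KNN workflow behind Table1 (a class-balanced 80/20 split, 5-fold cross-validation to choose K, and error rate, specificity and sensitivity on the test set) can be illustrated with the minimal Python sketch below. It is not the authors' Matlab code and performs no GA descriptor selection: in the poster K was optimized inside the GA runs, whereas here it is simply chosen by cross-validation. The sketch assumes class 1 = ready biodegradable as the positive class, stratified cross-validation folds, tied votes broken by the single nearest neighbour, and a toy two-cluster data set standing in for Dragon descriptors. The error rate is computed as ER = 1 − (Spec + Sens)/2; this definition is an inference, but it matches the tabulated values to rounding (e.g. KNN_1 test set: 1 − (0.847 + 0.911)/2 = 0.121).

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X: List[Vector], train_y: List[int], x: Vector, k: int, dist: Distance) -> int:
    """Majority vote of the k nearest training molecules."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    ranked = Counter(train_y[i] for i in nearest).most_common()
    # Assumption: a tied vote (possible with even K) goes to the single nearest neighbour.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return train_y[nearest[0]]
    return ranked[0][0]


def error_rate(spec: float, sens: float) -> float:
    """ER = 1 - (Spec + Sens) / 2, i.e. one minus the mean of the class-wise accuracies."""
    return 1 - (spec + sens) / 2


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (ER, Spec, Sens), with 1 = ready biodegradable as the positive class (assumed)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sens, spec = tp / pos, tn / neg
    return error_rate(spec, sens), spec, sens


def stratified_folds(y: List[int], n_folds: int, seed: int = 0) -> List[List[int]]:
    """Shuffle each class separately and deal its indices round-robin into folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for cls in sorted(set(y)):
        idx = [i for i, c in enumerate(y) if c == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def stratified_split(y: List[int], test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Random train/test split that keeps the class ratio."""
    rng = random.Random(seed)
    test: List[int] = []
    for cls in sorted(set(y)):
        idx = [i for i, c in enumerate(y) if c == cls]
        rng.shuffle(idx)
        test += idx[: round(test_fraction * len(idx))]
    held = set(test)
    return [i for i in range(len(y)) if i not in held], sorted(test)


def cv_error_rate(X: List[Vector], y: List[int], k: int, dist: Distance, n_folds: int = 5) -> float:
    """Error rate of the pooled out-of-fold predictions."""
    y_pred = [0] * len(y)
    for fold in stratified_folds(y, n_folds):
        held = set(fold)
        tr = [i for i in range(len(y)) if i not in held]
        for i in fold:
            y_pred[i] = knn_predict([X[j] for j in tr], [y[j] for j in tr], X[i], k, dist)
    return classification_metrics(y, y_pred)[0]


# Toy data: two Gaussian clusters standing in for descriptor vectors.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)] + \
    [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(30)]
y = [0] * 60 + [1] * 30

train_idx, test_idx = stratified_split(y)
X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
best_k = min(range(1, 10), key=lambda k: cv_error_rate(X_tr, y_tr, k, euclidean))
y_hat = [knn_predict(X_tr, y_tr, X[i], best_k, euclidean) for i in test_idx]
er, spec, sens = classification_metrics([y[i] for i in test_idx], y_hat)
print(f"K={best_k}  ER test={er:.3f}  Spec={spec:.3f}  Sens={sens:.3f}")
```

Sorting every training molecule for each query is simple and adequate for a training set of a few hundred molecules; passing `city_block` instead of `euclidean` gives the City block variant listed in Table1.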