1
Evaluating Multiple Machine Learning Models for
Biodegradation and Aquatic Toxicity
Sean Ekins, Thomas R. Lane and Fabio Urbina
2
What are Biodegradation and Aquatic Toxicity?
Biodegradation
The ability of a material to decompose after interactions with biological elements
Aquatic Toxicity
Toxicity of industrial chemicals to organisms living in the water body to which the chemicals are discharged
Mansouri et al., PMID: 29520515
Teixodó et al., PMID: 31598995
3
Importance of Biodegradation and Aquatic Toxicity & Challenges
• Globally, industrial chemicals end up in all types of bodies of water, including freshwater
• Testing the biodegradation and aquatic toxicity of every compound is unrealistic, so alternative toxicological methods are required
• One alternative is to predict these properties using machine learning models
4
Machine Learning: Assay Central® Introduction
• Multiple algorithms (Deep Neural Networks, k-Nearest Neighbors, Bernoulli Naïve Bayes, Linear Logistic Regression, AdaBoost Decision Tree, Random Forest, XGBoost, Support Vector Machine, Elastic Net Regression)
• Fingerprint descriptors (ECFP4-8) with nested, 5-fold cross-validation
• Applicability domain is calculated based on the reliability-density neighborhood (RDN) method
Lane et al., PMID: 33325717
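The nested, 5-fold cross-validation named on this slide can be sketched as follows. This is an illustration only, not the Assay Central implementation: a small k-nearest-neighbors classifier with Tanimoto similarity stands in for the nine algorithms, the fingerprints are synthetic bit sets rather than real ECFP descriptors, and all function names and the candidate values of k are assumptions. The point it shows is that the inner folds pick the hyperparameter using training data only, while the outer folds estimate performance.

```python
import random

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_predict(train, query, k):
    """Majority vote among the k training fingerprints most similar to the query."""
    nearest = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def k_fold(items, n_folds):
    """Yield (train, test) pairs; every item is held out exactly once."""
    for i in range(n_folds):
        test = items[i::n_folds]
        train = [x for j, x in enumerate(items) if j % n_folds != i]
        yield train, test

def accuracy(train, test, k):
    """Fraction of held-out fingerprints whose class is predicted correctly."""
    hits = sum(knn_predict(train, fp, k) == label for fp, label in test)
    return hits / len(test)

def nested_cv(data, candidate_ks=(1, 3, 5), n_folds=5):
    """Outer folds estimate performance; inner folds choose k on training data only."""
    outer_scores = []
    for outer_train, outer_test in k_fold(data, n_folds):
        def inner_score(k):
            scores = [accuracy(tr, te, k) for tr, te in k_fold(outer_train, n_folds)]
            return sum(scores) / len(scores)
        best_k = max(candidate_ks, key=inner_score)
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return sum(outer_scores) / len(outer_scores)

def toy_dataset(n=60, seed=0):
    """Synthetic data (hypothetical): class 1 favours bits 0-19, class 0 bits 20-39."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        pool = range(0, 20) if label else range(20, 40)
        bits = set(rng.sample(pool, 8)) | set(rng.sample(range(40, 64), 3))
        data.append((bits, label))
    rng.shuffle(data)
    return data

print(f"nested 5-fold CV accuracy: {nested_cv(toy_dataset()):.2f}")
```

Because the outer test fold never influences the choice of k, the averaged outer score is a less optimistic estimate than a single cross-validation loop that tunes and evaluates on the same folds.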
5
Data Mining: Biodegradation and Ecotoxicity Datasets
Biodegradation: compounds classified using OECD guidelines (ECHA REACH, EPA’s Biowin software, literature)
Classification → 3428 unique compounds (962 readily biodegradable, RB); Regression → ~200
Ecotoxicity: multiple sources for acute and chronic ecotoxicity (ECOTOX, Ministry of the Environment of Japan, literature) using EPA’s ECOSAR-defined testing parameters:
Organism                         Acute Toxicity Values   Chronic Toxicity Values
Fish (Aquatic Vertebrate)        96-hour LC50            Chronic Value (ChV)
Daphnid (Aquatic Invertebrate)   48-hour LC50            Chronic Value (ChV)
Algae (Aquatic Plant)            72- or 96-hour EC50     Chronic Value (ChV)
6
Biodegradation Classification and Regression Models
Classification models
• Prediction of readily/non-biodegradable compounds
• All algorithms performed well using nested, 5-fold cross-validation (CV), with SVC outperforming the others
Regression models
• Prediction of "ultimate" biodegradation of compounds
• CV suggested a predictive model, with low MAE/RMSE and a moderate R² (SVR example)
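For reference, the three regression metrics named on this slide (MAE, RMSE and R²) can be computed as in the sketch below. The function name and the example values are hypothetical and are not taken from the biodegradation dataset.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R2 for paired observed and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

# Hypothetical observed vs. predicted values, for illustration only
observed = [10.0, 40.0, 60.0, 90.0]
predicted = [20.0, 30.0, 70.0, 80.0]
print(regression_metrics(observed, predicted))
```

MAE and RMSE are in the units of the endpoint, so "low" has to be judged against the range of the measured values, whereas R² is unitless and compares the model with simply predicting the mean.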
7
Aquatic Acute Toxicity Classification Models
Organism      Endpoint   High (≤1 mg/L)   Low (≥100 mg/L)   Total Compounds
Fish          LC50       880              664               2983
Daphnid       LC50       347              484               1379
Green Algae   EC50       390              126               1130
Thresholds are based on the EPA’s ECOSAR-defined aquatic toxicity concern levels:
• High Concern: any acute value <1 mg/L
• Moderate Concern: lowest acute value between 1 and 100 mg/L
• Low Concern: all acute values are >100 mg/L
Only the most sensitive species is considered
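The concern-level rule above can be written as a short function. This is a sketch under stated assumptions: the function name and the example values are hypothetical, and because the slide uses strict inequalities in the rule but ≤/≥ in the table header, values of exactly 1 or 100 mg/L are treated here as Moderate.

```python
def concern_level(acute_values_mg_per_l):
    """Assign a concern level from acute values (mg/L) keyed by organism.

    Only the most sensitive species (lowest value) decides the level.
    Boundary handling at exactly 1 and 100 mg/L is an assumption.
    """
    lowest = min(acute_values_mg_per_l.values())
    if lowest < 1.0:
        return "High"
    if lowest > 100.0:
        return "Low"
    return "Moderate"

# Hypothetical compound: fish and algae are insensitive, daphnid is not
example = {"fish": 250.0, "daphnid": 0.4, "green_algae": 120.0}
print(concern_level(example))
```

Taking the minimum first makes the "most sensitive species" rule explicit: a single low value is enough for High, while Low requires every organism to exceed 100 mg/L.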
8
Aquatic Acute Toxicity Classification Models
Organism      Endpoint   High (≤1 mg/L)                    Low (≥100 mg/L)
                         Ave AUC (best)   Ave F-1 (best)   Ave AUC (best)   Ave F-1 (best)
Fish          LC50       0.76 (0.80)      0.52 (0.61)      0.79 (0.82)      0.45 (0.56)
Daphnid       LC50       0.78 (0.83)      0.53 (0.62)      0.83 (0.86)      0.66 (0.70)
Green Algae   EC50       0.69 (0.72)      0.46 (0.53)      0.77 (0.79)      0.24 (0.36)
Total dataset sizes ranged from ~1100 to ~3000 compounds
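The two metrics reported in this table can be computed as sketched below. The function names, the 0.5 decision cutoff and the example scores are assumptions made for illustration. The example also shows how a model can rank compounds well (high AUC) and still have a modest F-1 once a single cutoff is applied to an imbalanced set.

```python
def f1_score(y_true, y_pred):
    """F-1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """ROC AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced example: 2 positives, 6 negatives
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
calls = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

print(f"AUC = {roc_auc(labels, scores):.2f}, F-1 = {f1_score(labels, calls):.2f}")
```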
9
Aquatic Acute Toxicity Regression Models
10
Aquatic Chronic Toxicity Classification Models
Organism      Endpoint   High (≤0.1 mg/L)   Low (≥10 mg/L)   Total Compounds
Fish          ChV        458                217              1087
Daphnid       ChV        231                80               566
Green Algae   ChV        77                 90               321
Data from the EPA’s ECOTOX website and literature
ChV is calculated as the geometric mean of the NOEC and LOEC
LOEC = lowest observed effect concentration; NOEC = no observed effect concentration
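The ChV calculation and the chronic class thresholds from this slide can be sketched as follows. The function names, the "Intermediate" label for values between the two thresholds, and the example concentrations are assumptions for illustration.

```python
import math

def chronic_value(noec_mg_per_l, loec_mg_per_l):
    """ChV as the geometric mean of NOEC and LOEC (both in mg/L)."""
    if noec_mg_per_l <= 0 or loec_mg_per_l <= 0:
        raise ValueError("NOEC and LOEC must be positive concentrations")
    return math.sqrt(noec_mg_per_l * loec_mg_per_l)

def chronic_class(chv_mg_per_l):
    """Apply the slide's chronic thresholds: High <= 0.1 mg/L, Low >= 10 mg/L."""
    if chv_mg_per_l <= 0.1:
        return "High"
    if chv_mg_per_l >= 10.0:
        return "Low"
    return "Intermediate"  # assumed label for compounds in neither class

# Hypothetical study: NOEC = 1.0 mg/L, LOEC = 4.0 mg/L
chv = chronic_value(1.0, 4.0)
print(chv, chronic_class(chv))
```

The geometric mean is used rather than the arithmetic mean because test concentrations are usually spaced on a multiplicative scale, so the ChV sits midway between NOEC and LOEC in log units.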
11
Aquatic Chronic Toxicity Classification Models
Organism      Endpoint   High (≤0.1 mg/L)                  Low (≥10 mg/L)
                         Ave AUC (best)   Ave F-1 (best)   Ave AUC (best)   Ave F-1 (best)
Fish          ChV        0.67 (0.67)      0.53 (0.58)      0.72 (0.75)      0.29 (0.47)
Daphnid       ChV        0.78 (0.80)      0.67 (0.69)      0.78 (0.82)      0.40 (0.48)
Green Algae   ChV        0.67 (0.71)      0.37 (0.49)      0.63 (0.66)      0.29 (0.42)
Total dataset sizes ranged from ~320 to ~1100 compounds
12
Aquatic Chronic Toxicity Regression Models
13
Summary
• Biodegradation models: SVC performed best
• Aquatic chronic and acute toxicity models: datasets from EPA’s ECOTOX and literature
• Support Vector Machine outperformed the other algorithms
• Future: many more descriptors to try!
• Implementation of conformal predictors to add a reliable confidence scoring system
Sheffield and Judson, PMID: 31560848
14
Improving the Quality of Predictions: Conformal Predictors for Biodegradation
Uses a calibration dataset to determine the optimal prediction score threshold for each class
Angelopoulos and Bates, arXiv:2107.07511
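A minimal sketch of how a calibration set can yield one score threshold per class, following the general split-conformal recipe with class-conditional calibration rather than the authors' own implementation. The class labels "RB" and "NRB", the error rate alpha, the helper names and all probabilities are hypothetical.

```python
import math

def class_thresholds(cal_probs, cal_labels, alpha=0.2):
    """Per-class probability thresholds from a held-out calibration set.

    cal_probs: list of dicts mapping class name to predicted probability.
    For each class, the nonconformity score is 1 - p(true class); the
    threshold comes from the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    thresholds = {}
    for c in sorted(set(cal_labels)):
        scores = sorted(1.0 - p[c] for p, y in zip(cal_probs, cal_labels) if y == c)
        n = len(scores)
        rank = math.ceil((n + 1) * (1 - alpha))
        qhat = scores[rank - 1] if rank <= n else 1.0  # too few points: accept all
        thresholds[c] = 1.0 - qhat
    return thresholds

def prediction_set(probs, thresholds):
    """Every class whose predicted probability reaches its calibrated threshold."""
    return {c for c, t in thresholds.items() if probs[c] >= t}

def two_class(p_rb):
    """Helper (hypothetical): probabilities for readily / not readily biodegradable."""
    return {"RB": p_rb, "NRB": 1.0 - p_rb}

# Hypothetical calibration compounds: model outputs and known classes
cal_probs = [two_class(p) for p in (0.9, 0.8, 0.7, 0.6, 0.05, 0.15, 0.25, 0.45)]
cal_labels = ["RB"] * 4 + ["NRB"] * 4
thresholds = class_thresholds(cal_probs, cal_labels, alpha=0.2)

for p_rb in (0.9, 0.5, 0.1):
    print(p_rb, sorted(prediction_set(two_class(p_rb), thresholds)))
```

A prediction set with one class is a confident call, an empty set flags a compound for which neither class reaches its threshold, and a set with both classes means the model cannot separate them at the chosen alpha.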
15
Acknowledgments
Josh Harris
Scott Snyder
Discussions with: Diedrich Bermudez, Daniel Mucs
Funding
NIGMS: R44GM122196-04A1
NIEHS: 1R43ES031038-02A1
Contact me at: sean@collaborationspharma.com
