A Blood-Brain Barrier Permeability Prediction Model
Aadarsh Singh (22072005)
Subhojit Paul (22072008)
CONTENTS
• Problem Statement
• Introduction
• Dataset
• Work Done In The Paper
• Our Progress So Far
• Results
• Future Work
Problem Statement
Improving the accuracy of predicting the blood-brain barrier
permeability of compounds for CNS-acting drug
development using deep learning and machine learning
algorithms.
Introduction
• The Blood-Brain Barrier (BBB) is a semipermeable boundary that protects the
central nervous system (CNS) and separates it from the bloodstream.
• Drugs that target the CNS need to cross the BBB to be effective.
• There is a high attrition rate of drug candidates due to their inability to permeate
the BBB.
• Clinical experiments to determine BBB permeability are accurate but
time-consuming and labor-intensive.
Introduction
• Computational methods using deep learning and machine learning algorithms
have been developed to predict BBB permeability, but accuracy has been an
issue.
• The major challenge while applying ML algorithms is selecting optimal features to
develop predictive models based on labeled BBB permeability datasets.
• To overcome this challenge, we applied DL algorithms and compared their
performance with traditional ML algorithms.
Dataset
• A total of 3,971 compounds with
information on their BBB permeability
were collected and later checked for
redundancy.
• After curation, the dataset consisted of
3,568 non-redundant compounds, with
2,592 BBB permeable and 976 BBB
non-permeable compounds.
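The counts above imply a noticeable class imbalance, which matters when reading accuracy figures later. A minimal sketch of that arithmetic, using only the numbers stated on this slide (the variable names are illustrative):

```python
# Counts taken directly from the Dataset slide.
collected = 3971        # compounds gathered before the redundancy check
curated = 3568          # non-redundant compounds after curation
permeable = 2592        # BBB permeable
non_permeable = 976     # BBB non-permeable

# The two classes should add up to the curated total.
assert permeable + non_permeable == curated

removed = collected - curated                  # compounds dropped during curation
permeable_share = permeable / curated          # fraction of the majority class
imbalance_ratio = permeable / non_permeable    # majority-to-minority ratio

print(f"dropped during curation: {removed}")
print(f"permeable share: {permeable_share:.1%}")
print(f"permeable : non-permeable = {imbalance_ratio:.2f} : 1")
```

Roughly 73% of the curated compounds are permeable, so a model that always predicts "permeable" would already score about that accuracy on a similarly distributed set.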
Work Done In The Paper
Features Used In The Paper
• Physicochemical Properties: The first set of
features used were the physicochemical properties of
the compounds. These included molecular weight,
hydrogen bond donors, hydrogen bond acceptors,
logP, polar surface area, and others.
• MACCS Keys: The MACCS (Molecular ACCess
System) keys are binary fingerprints that represent the
presence or absence of certain chemical
substructures in a compound. These fingerprints are
used to encode the chemical structure of a compound.
• Substructure Fingerprints: The substructure
fingerprints were generated using the Python package
RDKit. These fingerprints represent the presence or
absence of certain chemical substructures in a
compound, similar to the MACCS keys.
ML Based Models Used
• Support Vector Machine
• Naïve Bayes
• k-Nearest Neighbor
• Random Forest
DL Based Models Used
• Deep Neural Network
• One-Dimensional Convolutional Neural Network (CNN-1D)
• Convolutional Neural Network with VGG16 Transfer Learning (CNN-VGG16)
Our Progress So Far
Features For ML Based Models
• We used three types of features to represent the compounds:
  • Physicochemical properties
  • MACCS fingerprints
  • Substructure fingerprints
• A total of 1,917 features were calculated for each compound using the
PaDEL-Descriptor software.
• The dataset was split into a training set and a test set at a 3:1 ratio to
avoid bias.
• The feature set was preprocessed to remove all NaN values before it was
passed to the models.
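The preprocessing and split described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code used in the project: the slides do not say whether NaN values were removed by dropping columns or rows (columns are dropped here), nor whether the 3:1 split was stratified by class (it is here). The function names `drop_nan_columns` and `stratified_split` are hypothetical.

```python
import math
import random


def drop_nan_columns(rows):
    """Keep only feature columns that contain no NaN in any row (assumed strategy)."""
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols)
            if not any(math.isnan(row[j]) for row in rows)]
    return [[row[j] for j in keep] for row in rows], keep


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices), splitting each class at 3:1 by default."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy feature matrix: the middle column has a NaN and is dropped.
features = [[1.0, float("nan"), 3.0],
            [4.0, 5.0, 6.0]]
cleaned, kept_columns = drop_nan_columns(features)

# Labels with the class counts from the Dataset slide (1 = permeable).
labels = [1] * 2592 + [0] * 976
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the permeable to non-permeable proportion the same in both sets, which is one reasonable reading of "to avoid bias".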
ML Based Model
• We implemented the following machine-learning-based models:
1. Random Forest
2. K-Nearest Neighbours
3. Naive Bayes
4. Support Vector Machine
Features For DL Based Models
• For the deep-learning-based models, the Python package RDKit was used to
generate structure images of the compounds from their Simplified Molecular
Input Line Entry System (SMILES) notations.
• Dataset
  • Train:
    • Permeable: 2,092 images
    • Non-permeable: 776 images
  • Test:
    • Permeable: 500 images
    • Non-permeable: 200 images
SMILES Notation
• Example SMILES string: CC(=O)NCCc1c[nH]c2ccc(OC)cc12
[Figure: RDKit-generated image of the example compound]
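To make the notation concrete, the sketch below splits the example string into tokens and counts its non-hydrogen atoms using only the standard library. It is a simplified illustration, not a SMILES parser: it ignores charges, isotopes, stereochemistry and implicit hydrogens, and it is not how RDKit reads SMILES. The names `tokenize` and `heavy_atom_counts` are hypothetical.

```python
import re
from collections import Counter

# Bracket atoms, two-letter halogens, organic-subset atoms (aliphatic and
# aromatic), bond symbols, ring-closure labels, and branch parentheses.
TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#$:/\\.]|%\d\d|\d|[()]"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens


def heavy_atom_counts(tokens):
    """Count non-hydrogen atoms per element, treating aromatic 'c' as 'C', etc."""
    counts = Counter()
    for token in tokens:
        if token.startswith("["):
            # Element symbol follows an optional isotope number, e.g. [nH] -> n.
            match = re.match(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})", token)
            symbol = match.group(1) if match else ""
        elif token[0].isalpha():
            symbol = token
        else:
            continue  # bond, ring-closure digit, or parenthesis
        symbol = symbol.capitalize()
        if symbol and symbol != "H":
            counts[symbol] += 1
    return counts


example = "CC(=O)NCCc1c[nH]c2ccc(OC)cc12"
tokens = tokenize(example)
print(tokens)
print(dict(heavy_atom_counts(tokens)))
```

Lowercase letters mark aromatic atoms, digits open and close rings, and parentheses mark branches, which is why the example string encodes a fused two-ring system with two side chains.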
DL Based Model
• A CNN-2D model and VGG-16 were trained on the RDKit-generated images.
• The images were scaled to 300 × 300 pixels to develop and validate the
BBB permeability prediction model.
• The Sequential model was trained for 35 epochs (limited by time constraints).
• The developed model was tested on an independent test set of 700 images.
• The model achieved an accuracy of 91.4% on the test set.
CNN Model Summary
VGG-16 Model
Results
Model Name                Accuracy (%)   F1 Score (%)
Naïve Bayes               82.81          92.0
Random Forest             81.01          91.0
Support Vector Machine    85.73          95.0
K-Nearest Neighbours      83.65          93.0
CNN-2D                    81.57          88.0
VGG                       91.14          93.66
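For reference, the two metrics in the table can be computed from confusion-matrix counts as sketched below. The slides do not state which class was treated as positive for the F1 score; the sketch assumes "permeable" is the positive class. The confusion counts in the first example are made up for illustration and are not results from this project; the second example is the trivial "always predict permeable" baseline on the 500/200 image test set described earlier.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts on a 700-image test set (illustrative only).
tp, fn = 450, 50     # permeable images predicted correctly / incorrectly
fp, tn = 60, 140     # non-permeable images predicted incorrectly / correctly
print(f"example:  accuracy={accuracy(tp, fp, fn, tn):.4f}  F1={f1_score(tp, fp, fn):.4f}")

# Baseline that labels every test image as permeable (500 permeable, 200 not).
base_acc = accuracy(tp=500, fp=200, fn=0, tn=0)
base_f1 = f1_score(tp=500, fp=200, fn=0)
print(f"baseline: accuracy={base_acc:.4f}  F1={base_f1:.4f}")
```

Under this assumption the baseline already reaches about 71% accuracy and an F1 of about 83%, so the reported scores are best judged against those figures rather than against 50%.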
Future Works
• To train our models more effectively, we plan to apply feature reduction.
• Principal component analysis (PCA) can be used for feature reduction.
• CNN with VGG-16 can be implemented using transfer learning for better
accuracy.
• Deep neural networks can also be applied, using some of the parameters
given in the paper together with some modifications.
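As a rough illustration of the PCA idea mentioned above, the sketch below finds the first principal component of a tiny feature matrix by power iteration on its covariance matrix, then projects the data onto it. It is a toy under stated assumptions: it uses made-up data, recovers only one component, and would be far too slow for 1,917 features, where a numerical library would normally be used. All function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def first_principal_component(cov, iterations=100):
    """Power iteration: repeatedly apply cov to a vector and renormalise."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left along the current direction
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """One-dimensional score of each row along the component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


# Toy data: the second feature is twice the first, the third is constant.
toy = [[1.0, 2.0, 5.0], [2.0, 4.0, 5.0], [3.0, 6.0, 5.0], [4.0, 8.0, 5.0]]
centered = center_columns(toy)
pc1 = first_principal_component(covariance_matrix(centered))
scores = project(centered, pc1)
print(pc1)
print(scores)
```

Here three correlated or constant features collapse to a single score per compound without losing any variance, which is the effect feature reduction aims for on the real descriptor set.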
Thank You
