AI Project IIT.pptx

A Blood-Brain Barrier Permeability Prediction Model
Aadarsh Singh (22072005)
Subhojit Paul (22072008)

CONTENTS
• Problem Statement
• Introduction
• Dataset
• Work Done In The Paper
• Our Progress So Far
• Results
• Future Work

Problem Statement
Improve the accuracy of blood-brain barrier (BBB) permeability
prediction for compounds in CNS-acting drug development,
using deep learning and machine learning algorithms.

Introduction
• The Blood-Brain Barrier (BBB) is a semipermeable boundary that protects the
central nervous system (CNS) and separates it from the bloodstream.
• Drugs that target the CNS need to cross the BBB to be effective.
• There is a high attrition rate of drug candidates due to their inability to permeate
the BBB.
• Clinical experiments to determine BBB permeability are accurate but
time-consuming and labor-intensive.

Introduction (continued)
• Computational methods using deep learning and machine learning algorithms
have been developed to predict BBB permeability, but accuracy has been an
issue.
• The major challenge while applying ML algorithms is selecting optimal features to
develop predictive models based on labeled BBB permeability datasets.
• To overcome this challenge, we applied DL algorithms and compared their
performance with traditional ML algorithms.

Dataset
• A total of 3,971 compounds with
information on their BBB permeability
were collected and later checked for
redundancy.
• After curation, the dataset consisted of
3,568 non-redundant compounds, with
2,592 BBB permeable and 976 BBB
non-permeable compounds.
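
The counts above imply a noticeable class imbalance. The short sketch below only redoes the arithmetic on the numbers stated on this slide; the variable names are illustrative, and the "majority baseline" (always predicting the larger class) is a reference point added here, not something reported in the slides.

```python
# Counts taken from the Dataset slide.
collected = 3971
permeable = 2592
non_permeable = 976

total = permeable + non_permeable          # non-redundant compounds
removed_as_redundant = collected - total   # dropped during curation

# Share of each class in the curated dataset.
permeable_fraction = permeable / total
non_permeable_fraction = non_permeable / total

# Accuracy of a trivial model that always predicts the larger class.
majority_baseline_accuracy = max(permeable, non_permeable) / total

print(f"total={total}, removed={removed_as_redundant}")
print(f"permeable share={permeable_fraction:.3f}")
print(f"majority baseline accuracy={majority_baseline_accuracy:.3f}")
```

A baseline like this is useful context when reading accuracy figures for an imbalanced dataset.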

Features Used In The Paper
• Physicochemical Properties: The first set of
features was the physicochemical properties of
the compounds. These included molecular weight,
hydrogen bond donors, hydrogen bond acceptors,
logP, polar surface area, and others.
• MACCS Keys: The MACCS (Molecular ACCess
System) keys are binary fingerprints that represent the
presence or absence of certain chemical
substructures in a compound. These fingerprints are
used to encode the chemical structure of a compound.
• Substructure Fingerprints: The substructure
fingerprints were generated using the Python package
RDKit. These fingerprints represent the presence or
absence of certain chemical substructures in a
compound, similar to the MACCS keys.
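
To make the "presence or absence" encoding concrete, here is a minimal sketch of how a set of matched keys becomes a fixed-length bit vector that a model can consume. It assumes the public MACCS set of 166 keys; the helper name `keys_to_bit_vector` and the key numbers in `present_keys` are hypothetical, chosen only for illustration, and no actual substructure matching is performed here.

```python
N_MACCS_KEYS = 166  # size of the public MACCS key set


def keys_to_bit_vector(present_keys, n_keys=N_MACCS_KEYS):
    """Turn a set of 1-based key numbers into a 0/1 list of length n_keys."""
    vector = [0] * n_keys
    for key in present_keys:
        if not 1 <= key <= n_keys:
            raise ValueError(f"key {key} is outside 1..{n_keys}")
        vector[key - 1] = 1  # bit is 1 when the substructure is present
    return vector


# Hypothetical keys matched for one compound (illustration only).
present_keys = {42, 113, 160, 164}
fingerprint = keys_to_bit_vector(present_keys)

print(len(fingerprint), sum(fingerprint))
```

In practice the matching itself would be done by a cheminformatics toolkit; this sketch only shows the shape of the resulting feature vector.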

ML-Based Models Used
• Support Vector Machine
• Naïve Bayes
• k-Nearest Neighbor
• Random Forest

DL-Based Models Used
• Deep Neural Network (DNN)
• One-Dimensional Convolutional Neural Network
(CNN-1D)
• Convolutional Neural Network with VGG16
Transfer Learning (CNN-VGG16)

Features For ML-Based Models
• We used three types of features to represent the
compounds:
  • Physicochemical properties
  • MACCS fingerprints
  • Substructure fingerprints
• A total of 1,917 features were calculated for each
compound using the PaDEL-Descriptor software.
• The dataset was split into a training set and a test set at a
3:1 ratio (75% training, 25% test), so that the models are
evaluated on compounds they were not trained on.
• The feature set is preprocessed to remove all NaN values
before it is passed to the models.
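
The preprocessing and split described above can be sketched in plain Python. This is an illustration under assumptions, not the code used in the project: it assumes NaN handling means dropping every feature column that contains a NaN (dropping rows or imputing are other options), it uses a simple shuffled split rather than a stratified one, and the function names and toy feature values are hypothetical.

```python
import math
import random


def drop_nan_columns(rows):
    """Keep only the feature columns that contain no NaN in any row."""
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols)
            if not any(math.isnan(row[j]) for row in rows)]
    return [[row[j] for j in keep] for row in rows], keep


def split_3_to_1(rows, labels, seed=0):
    """Shuffle once, then put 3/4 of the samples in train and 1/4 in test."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = (3 * len(indices)) // 4
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


nan = float("nan")
# Toy descriptor table: 8 compounds x 3 features (values are made up).
features = [
    [310.4, nan, 1.0], [151.2, 2.1, 0.0], [289.4, 1.8, 1.0], [180.2, nan, 0.0],
    [232.3, 3.0, 1.0], [194.2, 0.1, 0.0], [315.5, 2.9, 1.0], [122.1, 0.4, 0.0],
]
labels = [1, 0, 1, 0, 1, 1, 1, 0]  # 1 = permeable, 0 = non-permeable

clean, kept_columns = drop_nan_columns(features)
X_train, y_train, X_test, y_test = split_3_to_1(clean, labels)
print(kept_columns, len(X_train), len(X_test))
```

Fixing the seed keeps the split reproducible, so different models are compared on the same test compounds.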

ML-Based Models
• We have implemented the following machine learning
based models:
1. Random Forest
2. K-Nearest Neighbours
3. Naive Bayes
4. Support Vector Machine
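
As an illustration of one of the listed models, the sketch below implements k-nearest-neighbour classification on binary fingerprints with a majority vote. It is a toy version under assumptions: Hamming distance is chosen here only for simplicity (the slides do not state which distance was used), the 4-bit fingerprints and labels are made up, and a real run would use a library implementation on the full feature set.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    ranked = sorted(range(len(train_X)),
                    key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up 4-bit fingerprints; 1 = permeable, 0 = non-permeable.
train_X = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
train_y = [1, 1, 1, 0, 0]

prediction_a = knn_predict(train_X, train_y, [1, 1, 0, 1])
prediction_b = knn_predict(train_X, train_y, [0, 0, 1, 1])
print(prediction_a, prediction_b)
```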

Features For DL-Based Models
• For the deep learning based models, the Python
package RDKit was used to generate structure images
of the compounds from their Simplified Molecular Input
Line Entry System (SMILES) notations.
• Dataset
  • Train:
    • Permeable: 2,092 images
    • Non-Permeable: 776 images
  • Test:
    • Permeable: 500 images
    • Non-Permeable: 200 images

SMILES Notation
• Example SMILES string:
CC(=O)NCCc1c[nH]c2ccc(OC)cc12
[Figure: RDKit-generated structure image of the example compound]
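
A structure image like the one on this slide can be produced with RDKit roughly as follows. This is a minimal sketch that assumes RDKit and Pillow are installed; the helper name `smiles_to_image` is hypothetical, and the 300 x 300 size is an assumption taken from the image size quoted later in the deck.

```python
from rdkit import Chem
from rdkit.Chem import Draw


def smiles_to_image(smiles, size=(300, 300)):
    """Parse a SMILES string and render a 2D depiction as a PIL image."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Draw.MolToImage(mol, size=size)


# Example SMILES string from the slide.
example_smiles = "CC(=O)NCCc1c[nH]c2ccc(OC)cc12"
image = smiles_to_image(example_smiles)
print(image.size)
```

The returned object is a PIL image, so `image.save(path)` can write it to a class-specific folder when building the train and test sets.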

DL-Based Models
• We trained a CNN-2D model and a VGG-16 model on the RDKit-generated
images.
• The images were scaled to 300 × 300 pixels to develop and validate the
BBB permeability prediction model.
• The Sequential model was trained for 35 epochs (due to time constraints).
• The developed model was tested on an independent test set of 700
images.
• The model gave an accuracy of 91.4% on the test set.

Results
Model Name               Accuracy (%)   F1 Score (%)
Naïve Bayes              82.81          92.0
Random Forest            81.01          91.0
Support Vector Machine   85.73          95.0
K-Nearest Neighbours     83.65          93.0
CNN-2D                   81.57          88.0
VGG-16                   91.14          93.66
Future Work
• To train our models more effectively, we will apply feature reduction.
• Principal component analysis (PCA) can be used for feature
reduction.
• CNN with VGG-16 can be implemented using transfer learning for better
accuracy.
• Deep neural networks can also be applied, starting from the parameters
given in the paper and modifying them as needed.
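
The idea behind PCA-based feature reduction can be sketched without any libraries. The code below finds only the first principal component using power iteration on the covariance matrix and projects the data onto it. It is a teaching sketch under assumptions: the toy 3-feature data and all function names are made up, and for the real 1,917-feature table a library routine (for example scikit-learn's PCA) would be the practical choice.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=200, seed=0):
    """Power iteration: converges to the direction of largest variance."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: the second feature is twice the first, the third is small noise.
data = [[1.0, 2.0, 0.1], [2.0, 4.0, -0.1], [3.0, 6.0, 0.1], [4.0, 8.0, -0.1]]
centered = center(data)
component = top_component(covariance(centered))

# Reduced representation: one score per compound instead of three features.
scores = [sum(a * b for a, b in zip(row, component)) for row in centered]
print(component, scores)
```

Keeping more than one component follows the same pattern: remove the variance already explained, then repeat.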
