2. CONTENTS
• Problem Statement
• Introduction
• Dataset
• Work Done In The Paper
• Our Progress So Far
• Results
• Future Work
3. Problem Statement
Using deep learning and machine learning algorithms to improve the accuracy of predicting the blood-brain barrier (BBB) permeability of compounds for the development of CNS-acting drugs.
5. Introduction
• The Blood-Brain Barrier (BBB) is a semipermeable boundary that separates the central nervous system (CNS) from the bloodstream and protects it.
• Drugs that target the CNS need to cross the BBB to be effective.
• Many drug candidates fail during development (a high attrition rate) because they cannot permeate the BBB.
• Experimental measurements of BBB permeability are accurate but time-consuming and labor-intensive.
6. Introduction
• Computational methods based on machine learning (ML) and deep learning (DL) algorithms have been developed to predict BBB permeability, but their accuracy remains limited.
• The major challenge in applying ML algorithms is selecting the optimal features for building predictive models from labeled BBB permeability datasets.
• To address this challenge, we applied DL algorithms, which can learn relevant features from the input data, and compared their performance with traditional ML algorithms.
8. Dataset
• A total of 3,971 compounds with information on their BBB permeability were collected and then checked for redundancy.
• After curation, the dataset consisted of 3,568 non-redundant compounds: 2,592 BBB-permeable and 976 BBB-non-permeable compounds.
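The redundancy check can be sketched as follows. This is a minimal illustration, not the curation procedure actually used: it assumes each compound is a (SMILES, label) pair with label 1 for permeable and 0 for non-permeable, that the SMILES strings are already canonical (so identical molecules have identical strings), and it adopts a hypothetical policy of dropping compounds that reappear with conflicting labels.

```python
from collections import Counter


def deduplicate(records):
    """Keep one copy of each compound and drop exact repeats.

    records: iterable of (smiles, label) pairs, label 1 = permeable, 0 = not.
    Assumes canonical SMILES. Compounds seen with both labels are dropped,
    since they cannot be assigned to a single class (hypothetical policy).
    """
    labels_seen = {}
    order = []  # first-seen order of unique SMILES
    for smiles, label in records:
        if smiles not in labels_seen:
            labels_seen[smiles] = {label}
            order.append(smiles)
        else:
            labels_seen[smiles].add(label)
    return [(s, next(iter(labels_seen[s])))
            for s in order if len(labels_seen[s]) == 1]


def class_counts(records):
    """Return (number permeable, number non-permeable)."""
    counts = Counter(label for _, label in records)
    return counts[1], counts[0]
```

Applied to the collected compounds, a check of this kind should leave 3,568 entries with `class_counts` equal to (2592, 976), matching the figures above.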
10. Features Used In The Paper
• Physicochemical Properties: The first feature set consisted of the physicochemical properties of the compounds, including molecular weight, hydrogen bond donors, hydrogen bond acceptors, logP, polar surface area, and others.
• MACCS Keys: The MACCS (Molecular ACCess System) keys are binary fingerprints that record the presence or absence of predefined chemical substructures in a compound. These fingerprints encode the chemical structure of a compound as a fixed-length bit vector.
• Substructure Fingerprints: The substructure fingerprints were generated using the Python package RDKit. Like the MACCS keys, they record the presence or absence of specific chemical substructures in a compound.
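To show how these three feature types can be combined into one input row, here is a minimal sketch. It assumes the fingerprints are available as sets of "on" bit indices and uses fingerprint lengths of 166 (MACCS) and 307 (substructure) as assumed defaults; the function names and the numeric property values in the example are hypothetical placeholders, not data from the paper.

```python
def bits_to_vector(on_bits, n_bits):
    """Turn a set of 'on' bit indices into a fixed-length 0/1 list."""
    vector = [0] * n_bits
    for i in on_bits:
        if not 0 <= i < n_bits:
            raise ValueError(f"bit index {i} out of range for {n_bits} bits")
        vector[i] = 1  # substructure i is present
    return vector


def build_feature_row(physchem, maccs_on, substructure_on,
                      n_maccs=166, n_substructure=307):
    """Concatenate physicochemical values with two binary fingerprints.

    n_maccs and n_substructure are assumed lengths, not taken from the paper.
    """
    return (list(physchem)
            + bits_to_vector(maccs_on, n_maccs)
            + bits_to_vector(substructure_on, n_substructure))


# Hypothetical example: three property values, two MACCS bits, one substructure bit.
row = build_feature_row([232.28, 2, 3], {0, 5}, {306})
```

Continuous properties and binary bits end up on very different scales, which matters for distance-based models such as KNN.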
12. DL-Based Models Used In The Paper
• Deep Neural Network (DNN)
• One-Dimensional Convolutional Neural Network (CNN-1D)
• Convolutional Neural Network with VGG16 Transfer Learning (CNN-VGG16)
14. Features For ML-Based Models
• We used three types of features to represent the compounds:
  • Physicochemical properties
  • MACCS fingerprints
  • Substructure fingerprints
• A total of 1,917 features were calculated for each compound using the PaDEL-Descriptor software.
• The dataset was split into a training set and a test set at a 3:1 ratio, so that each model is evaluated on compounds it has not seen during training.
• The feature set was preprocessed to remove all NaN values before training.
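The preprocessing and the 3:1 split can be sketched in plain Python. Two choices here are assumptions, since the slide does not specify them: NaN values are removed by dropping every feature column that contains one, and the split is stratified so that both sets keep the same permeable/non-permeable proportion. The helper names are hypothetical.

```python
import math
import random


def drop_nan_columns(rows):
    """Remove every feature column that contains a NaN in any row."""
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols)
            if not any(isinstance(r[j], float) and math.isnan(r[j]) for r in rows)]
    return [[r[j] for j in keep] for r in rows], keep


def stratified_split(X, y, test_fraction=0.25, seed=42):
    """Split X, y so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # 0.25 gives a 3:1 split
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)
```

With the 3,568 curated compounds and `test_fraction=0.25`, this stratified split would place 648 permeable and 244 non-permeable compounds (892 in total) in the test set and 2,676 in the training set.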
15. ML-Based Models
• We implemented the following machine learning models:
1. Random Forest
2. K-Nearest Neighbours
3. Naive Bayes
4. Support Vector Machine
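As an illustration of one of these models, here is a from-scratch sketch of k-nearest neighbours. The slides do not say which implementation was used; a library class such as scikit-learn's `KNeighborsClassifier` would be the usual choice in practice. The value k=5 and the Euclidean distance are assumptions, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, query, k=5):
    """Label a query by majority vote among its k nearest training points.

    Uses Euclidean distance (math.dist); an odd k avoids ties for two classes.
    """
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], query))[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because KNN compares raw distances, features such as molecular weight would dominate the binary fingerprint bits unless the features are standardized first.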
16. Features For DL-Based Models
• For the DL-based models, the Python package RDKit was used to generate structure images of the compounds from their Simplified Molecular Input Line Entry System (SMILES) notations.
• Image dataset:
  • Train: 2,092 permeable and 776 non-permeable images
  • Test: 500 permeable and 200 non-permeable images
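A few lines of arithmetic confirm that the image split is consistent with the curated dataset: the per-class totals match the 2,592 permeable and 976 non-permeable compounds, and the training share works out to about 80%, i.e. roughly a 4:1 split rather than the 3:1 ratio used for the ML models.

```python
# Per-class image counts as listed on the slide.
train = {"permeable": 2092, "non_permeable": 776}
test = {"permeable": 500, "non_permeable": 200}

# Per-class totals should match the curated dataset (2,592 and 976).
per_class = {c: train[c] + test[c] for c in train}
n_train, n_test = sum(train.values()), sum(test.values())
train_share = n_train / (n_train + n_test)

print(per_class, n_train, n_test, f"{train_share:.1%}")
# → {'permeable': 2592, 'non_permeable': 976} 2868 700 80.4%
```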
17. SMILES Notation
• Example SMILES string (melatonin):
CC(=O)NCCc1c[nH]c2ccc(OC)cc12
[Figure: RDKit-generated structure image of this compound]
18. DL-Based Models
• The CNN-2D model and VGG-16 were applied to the RDKit-generated images.
• The images were resized to 300 × 300 pixels to develop and validate the BBB permeability prediction model.
• The Sequential model was trained for 35 epochs (limited by time constraints).
• The developed model was tested on an independent test set of 700 images.
• The model achieved an accuracy of 91.4% on the test set.
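The resizing step can be illustrated with nearest-neighbour resampling followed by scaling pixel values to [0, 1]. This is a sketch under assumptions: the slides do not say which resampling method or normalization was used, the image is treated as a single-channel grid of 0 to 255 values (RDKit images are RGB in practice), and the helper names are hypothetical. In a real pipeline, RDKit's `Draw.MolToImage` accepts a `size` argument and Pillow's `Image.resize` performs the resampling.

```python
def resize_nearest(image, out_h=300, out_w=300):
    """Resize a 2-D grid of pixel values using nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    # Map each output pixel back to the source pixel it falls on.
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def normalize(image, max_value=255):
    """Map integer intensities in 0..max_value to floats in [0, 1]."""
    return [[p / max_value for p in row] for row in image]


# A tiny 2 x 2 checkerboard scaled up to the 300 x 300 input size.
big = resize_nearest([[0, 255], [255, 0]])
norm = normalize(big)
```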
22. Results
Model Name               Accuracy (%)   F1 Score
Naïve Bayes              82.81          92.0
Random Forest            81.01          91.0
Support Vector Machine   85.73          95.0
K-Nearest Neighbours     83.65          93.0
CNN-2D                   81.57          88.0
VGG-16                   91.14          93.66
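For reference, accuracy and F1 score can both be computed from the counts in a confusion matrix. The table does not state which class or averaging scheme its F1 values use; this sketch assumes F1 for the BBB-permeable (positive) class, scaled by 100 to match the table. The counts in the example are hypothetical and are not results from this study.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) in percent for the positive (permeable) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 100 * accuracy, 100 * f1


# Hypothetical confusion matrix for a 700-image test set.
acc, f1 = accuracy_and_f1(tp=450, fp=50, fn=50, tn=150)
```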
23. Future Work
• To train our models more effectively, we plan to apply feature reduction.
• Principal component analysis (PCA) can be used for this feature reduction.
• A CNN with VGG-16 can be implemented using transfer learning to improve accuracy.
• Deep neural networks can also be applied, starting from the parameters reported in the paper with some modifications.
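PCA finds the directions along which the features vary most and projects each compound onto them. The sketch below is a pure-Python illustration that finds only the first principal component, using power iteration on the sample covariance matrix. It is meant to show the idea; for 1,917 features a library class such as scikit-learn's `sklearn.decomposition.PCA` would be the practical choice, and the features should be standardized first because the descriptors differ widely in scale. The function names are hypothetical.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (unit direction of largest variance, feature means)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive random start
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v, means


def project(rows, component, means):
    """Score each row on the component, giving one reduced feature per row."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r)))
            for r in rows]


# Points lying on the line y = 2x: the first component is (1, 2) / sqrt(5).
rows = [[t, 2 * t] for t in range(-3, 4)]
comp, means = first_principal_component(rows)
scores = project(rows, comp, means)
```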