ADMET.pptx

Optimizing Drug Discovery using ADMET
Translating Data into Actionable Insights and Decisions using ML
Santu Chall
ME, MCA

SMILES: SMILES (Simplified Molecular Input Line Entry System) is a concise notation
for representing a chemical structure as a single line of text.
For example: OC(=O)CN1C(=O)Cc2c1cccc2 (molecular formula C10H9NO3)
Molecular Representation
Representation levels: 0D/1D, 2D, 3D, 4D
Descriptors: Molecular descriptors are quantitative values that characterize chemical
structures, supporting structure-property relationship studies and computational
chemistry analysis. For example: MW (molecular weight), HBA (hydrogen-bond acceptors), HBD (hydrogen-bond donors), number of atoms, etc.
Software: Various software packages can calculate and analyze chemical properties,
such as RDKit, ChemAxon, Dragon, PaDEL and MOE.
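To make the idea of a simple (0D) descriptor concrete, here is a minimal sketch that computes molecular weight (MW) from the example's molecular formula, C10H9NO3. It is pure Python with a hand-written table of standard atomic weights and a parser that only handles flat formulas (no brackets, hydrates or charges); both are simplifying assumptions for illustration. In practice a toolkit such as RDKit (`Descriptors.MolWt` applied to a molecule from `Chem.MolFromSmiles`) computes this and many other descriptors directly from the SMILES.

```python
import re

# Standard atomic weights (g/mol) for a few common elements; extend as needed.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45,
}


def parse_formula(formula):
    """Count atoms in a flat molecular formula such as 'C10H9NO3'.

    Assumption: no parentheses, hydrates or charges in the formula.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """MW = sum over elements of (atomic weight * atom count)."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


print(parse_formula("C10H9NO3"))               # {'C': 10, 'H': 9, 'N': 1, 'O': 3}
print(round(molecular_weight("C10H9NO3"), 2))  # 191.19
```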

Molecular Fingerprint
• Binary representation of a molecule that is fast,
objective and compact
• A “keyed” fingerprint indicates the presence or
absence of predefined structural features
• Tasks: similarity search and comparison, prediction and
clustering
• Types of fingerprint (e.g., keyed MACCS keys, circular ECFP2/ECFP4, PubChem fingerprints)
• Selecting the right fingerprint
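A keyed fingerprint and the similarity comparison it enables can be sketched in a few lines. This is an illustration under assumptions: the five structural keys are hypothetical labels, and each molecule's features are assumed to be already detected (real keyed fingerprints such as MACCS detect 166 SMARTS-defined substructures). The Tanimoto coefficient used here is the standard similarity for binary fingerprints; with RDKit the equivalent would be `MACCSkeys.GenMACCSKeys` plus `DataStructs.TanimotoSimilarity`.

```python
# Hypothetical structural keys for illustration; real MACCS keys use 166 SMARTS patterns.
KEYS = ["carboxylic_acid", "amide", "aromatic_ring", "halogen", "basic_amine"]


def keyed_fingerprint(features):
    """Set bit i when KEYS[i] is among the molecule's (pre-detected) features."""
    fp = 0
    for bit, key in enumerate(KEYS):
        if key in features:
            fp |= 1 << bit
    return fp


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity = |A AND B| / |A OR B| over the set bits."""
    both = bin(fp_a & fp_b).count("1")
    either = bin(fp_a | fp_b).count("1")
    return both / either if either else 0.0  # convention: two empty fingerprints score 0.0


mol_a = keyed_fingerprint({"carboxylic_acid", "amide", "aromatic_ring"})
mol_b = keyed_fingerprint({"amide", "aromatic_ring", "halogen"})
print(format(mol_a, "05b"), format(mol_b, "05b"))  # 00111 01110
print(tanimoto(mol_a, mol_b))                      # 0.5
```

Storing the bits in a Python int keeps the comparison to two bitwise operations and two popcounts, which is the same reason fingerprints are fast for similarity search at scale.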

ADMET
Absorption
Distribution
Metabolism
Excretion/Elimination
Toxicity

Data Selection
• Online databases: ChEMBL, PubChem, ChemDB,
ChemSpider, DrugBank, etc.
• Reputed scientific journals: Journal of Chemical
Information and Modeling, Journal of
Cheminformatics, Journal of Computer-Aided
Molecular Design, etc.
• Data retrieval from the literature: PubMed,
ScienceDirect, Google Scholar, ACS Publications,
open-access journals, etc.

Data Division
• Random Division: train_test_split(X, y, test_size=0.3, random_state=42), which holds out 30% of the data as the test set
• Kennard-Stone Division: start from the two data points that are
farthest apart in the feature space, then repeatedly add the point farthest from those already selected, so the training set spans the feature space.
• Activity-Based Division: split on the activity or property being
predicted or modelled so that both sets represent the full range of activity levels in the dataset.
• Euclidean Distance Based: compute the Euclidean distance
between all pairs of data points in the multidimensional feature space (euclidean_distances =
np.linalg.norm(X[:, np.newaxis] - X, axis=2))
• K-Medoids Based: a clustering algorithm that divides the data into groups
(clusterer = KMedoids(n_clusters=K, random_state=0), from the scikit-learn-extra package)
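The Kennard-Stone procedure described above can be sketched in pure Python. This is a minimal, unoptimized version (it builds the full O(n²) distance matrix, the same quantity as the NumPy one-liner above) intended for modest dataset sizes; the toy 2-D descriptor vectors are hypothetical.

```python
import math


def kennard_stone(X, n_train):
    """Split sample indices into (train, test) with the Kennard-Stone algorithm.

    X is a list of feature vectors; n_train is the desired training-set size.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")

    # Full pairwise Euclidean distance matrix (O(n^2) memory, fine for modest n).
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Step 1: seed the training set with the two samples farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    train = [first, second]
    remaining = [k for k in range(n) if k not in train]

    # Step 2: repeatedly add the sample whose nearest training neighbour is farthest away.
    while len(train) < n_train:
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in train))
        train.append(nxt)
        remaining.remove(nxt)

    return train, remaining


# Toy 2-D descriptor vectors (hypothetical data).
X = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0], [9.0, 0.0]]
print(kennard_stone(X, 3))  # ([0, 2, 3], [1, 4])
```

In the toy run, the two extremes (x = 0 and x = 10) are picked first, then the midpoint x = 5, because it is farthest from both; the points close to an already selected extreme end up in the test set.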

Feature Selection
• Genetic Algorithm: GA-based feature selection chooses a
subset of the most relevant features (variables) from the original feature set to
improve model performance and reduce computational complexity, evolving candidate subsets through selection, crossover and mutation.
ga = GeneticAlgorithm(num_features=X.shape[1], fitness_func=fitness_function)  # illustrative pseudocode: GeneticAlgorithm is not a scikit-learn class
• Lasso Feature Selection: Lasso (Least Absolute Shrinkage and
Selection Operator) adds an L1 penalty term to the linear regression or logistic
regression cost function, which drives the coefficients of some
features to exactly zero, effectively removing them from the model.
lasso = sklearn.linear_model.Lasso(alpha=1.0)  # linear regression with an L1 penalty
• Stepwise Selection: selects the most relevant features step by step (Forward
Selection, Backward Elimination, Bidirectional Selection) until a stopping criterion is met.
rfe = sklearn.feature_selection.RFE(LogisticRegression(), n_features_to_select=10)  # recursive feature elimination: keep the top 10 features
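Forward stepwise selection with an explicit stopping criterion can be sketched as a greedy loop. This is an illustration, not the deck's pipeline: the `score` callable is an assumption (in practice it would be a cross-validated model metric), and the toy scorer with per-feature weights and a fixed cost is hypothetical. scikit-learn offers a library version as `SequentialFeatureSelector` with `direction="forward"` or `"backward"`.

```python
from typing import Callable, List, Optional, Sequence


def forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: Optional[int] = None,
) -> List[int]:
    """Greedy forward stepwise selection.

    Start from an empty subset, add the single feature that most improves `score`,
    and stop when no candidate improves it (or when `max_features` is reached).
    """
    selected: List[int] = []
    best_score = score(selected)
    limit = n_features if max_features is None else min(max_features, n_features)
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        # Score every one-feature extension of the current subset.
        trial = {f: score(selected + [f]) for f in candidates}
        f_best = max(trial, key=trial.get)
        if trial[f_best] <= best_score:  # stopping criterion: no improvement
            break
        selected.append(f_best)
        best_score = trial[f_best]
    return selected


# Hypothetical scorer: per-feature usefulness minus a fixed cost per feature.
WEIGHTS = [0.40, 0.05, 0.30, 0.02, 0.20]


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset) - 0.1 * len(subset)


print(forward_selection(len(WEIGHTS), toy_score))  # [0, 2, 4]
```

Features 1 and 3 are never added because their usefulness is below the per-feature cost, which is the behaviour a penalized or cross-validated score is meant to produce.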

Learning Algorithm
• Supervised
– Regression: builds predictive models for tasks where the goal is to predict a
continuous numeric value. Examples: Random Forest Regression (RF), Support
Vector Regression (SVR), Decision Tree Regression, K-Nearest Neighbors Regression,
Neural Networks for Regression, etc.
– Classification: builds models that assign data to predefined classes
or categories. Examples: Logistic Regression, Decision Trees, Support Vector
Machines (SVM), K-Nearest Neighbors (KNN)
• Unsupervised
– Clustering: groups data into clusters based on inherent patterns or
similarities in the data. Examples: K-Means Clustering, X-Means, Gaussian Mixture
Models (GMM)
– Dimensionality Reduction: reduces the number of features or
dimensions in a dataset while preserving important information and
patterns. Examples: Principal Component Analysis (PCA), Independent Component
Analysis (ICA), Autoencoders
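K-Nearest Neighbors appears in both supervised lists, which makes it a convenient way to show the regression/classification difference: the neighbour search is identical, only the aggregation changes (mean of continuous targets versus majority vote over labels). The sketch below is pure Python and the one-descriptor training data are hypothetical.

```python
import math
from collections import Counter


def _nearest(X_train, x_query, k):
    """Indices of the k training samples closest to x_query (Euclidean distance)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x_query))
    return order[:k]


def knn_regress(X_train, y_train, x_query, k=3):
    """Regression: mean of the k nearest neighbours' continuous targets."""
    idx = _nearest(X_train, x_query, k)
    return sum(y_train[i] for i in idx) / len(idx)


def knn_classify(X_train, labels, x_query, k=3):
    """Classification: majority vote among the k nearest neighbours."""
    idx = _nearest(X_train, x_query, k)
    return Counter(labels[i] for i in idx).most_common(1)[0][0]


# Hypothetical one-descriptor training data.
X = [[1.0], [2.0], [3.0], [10.0], [11.0]]
y = [1.0, 2.0, 3.0, 10.0, 11.0]
labels = ["low", "low", "low", "high", "high"]

print(knn_regress(X, y, [2.1]))         # 2.0
print(knn_classify(X, labels, [10.4]))  # high
```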

Absorption
Property | Definition | Model and method
%Abs | Percentage of the compound absorbed through the (intestinal) barrier | RF and MACCS keys
%HIA | Percentage absorbed through the human gastrointestinal (GI) tract | RF and MACCS keys
Caco2 | Permeability across Caco-2 cell monolayers, an in vitro model that predicts intestinal absorption including paracellular and active transport | RF and descriptors
Pgp | Inhibition of P-glycoprotein (P-gp) function, which can enhance drug absorption | SVM and ECFP4
Amount absorbed | Amount of compound absorbed per kilogram of body weight | RF and descriptors

Distribution
Property | Definition | Model and method
BBB partitioning | Blood-brain barrier partitioning: ratio of brain to blood (serum/plasma) concentration | SVM and ECFP2
%PPB | Percentage of the compound bound to plasma proteins | RF and descriptors
Vd | Volume of distribution within the body | RF and descriptors
Fbt | Fraction bound in tissues | SVM and descriptors
Ktb | Tissue-blood partition coefficient: measures the distribution of a substance between a specific tissue and the blood | SVM and PubChem FP
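Several distribution properties are simple ratios, and a short sketch makes their definitions explicit. It assumes consistent units (mg for amounts, mg/L for concentrations) and uses hypothetical values; fraction unbound derived from %PPB and the one-compartment reading of Vd are standard pharmacokinetic relationships, not outputs of the models listed in the table.

```python
def fraction_unbound(ppb_percent):
    """fu = 1 - %PPB/100: fraction of the drug free (unbound) in plasma."""
    return 1.0 - ppb_percent / 100.0


def volume_of_distribution(amount_in_body_mg, plasma_conc_mg_per_l):
    """Vd = amount of drug in the body / plasma concentration (litres).

    For an IV bolus in a one-compartment model, use dose / C0 (C0 back-extrapolated).
    """
    return amount_in_body_mg / plasma_conc_mg_per_l


def tissue_blood_partition(c_tissue, c_blood):
    """Ktb (often written Kp) = tissue concentration / blood concentration."""
    return c_tissue / c_blood


# Hypothetical values for illustration only.
print(round(fraction_unbound(95.0), 3))    # 0.05
print(volume_of_distribution(100.0, 2.5))  # 40.0
print(tissue_blood_partition(12.0, 4.0))   # 3.0
```

BBB partitioning is the same ratio with brain as the tissue; it is commonly reported on a log10 scale (logBB).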

Metabolism
Property | Definition | Model and method
Primary enzyme | Predominant enzyme responsible for metabolism (cytochrome P450 isoforms CYP1A2, 2C9, 2C19, 2D6, 3A4, etc.) | 1A2: SVM and ECFP4; 2C9: RF and ECFP2; 2C19: SVM and ECFP2; 2D6: RF and ECFP4; 3A4: SVM and ECFP4
% metabolised | Overall percentage of the compound metabolised | SVM and MACCS keys
% excreted | Proportion of the compound excreted unchanged in urine | RF and descriptors
Vmax | Maximum velocity of the metabolic reaction | SVM and MACCS keys
Cliv | Clearance rate in the liver (hepatic clearance) | RF and descriptors
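Vmax and liver clearance are linked through standard enzyme-kinetic relationships, sketched below. The assumptions are loud ones: Michaelis-Menten kinetics, intrinsic clearance approximated as Vmax/Km (valid when substrate concentration is well below Km), and the well-stirred liver model for scaling to hepatic clearance. These are textbook models chosen for illustration, not necessarily how the Cliv values in the table were derived, and all parameter values are hypothetical.

```python
def michaelis_menten_rate(s, vmax, km):
    """v = Vmax * [S] / (Km + [S]); v approaches Vmax when [S] >> Km."""
    return vmax * s / (km + s)


def intrinsic_clearance(vmax, km):
    """CLint = Vmax / Km, the low-concentration ([S] << Km) limit of v / [S]."""
    return vmax / km


def hepatic_clearance_well_stirred(q_h, fu, cl_int):
    """Well-stirred liver model: CLh = Qh * fu * CLint / (Qh + fu * CLint)."""
    return q_h * fu * cl_int / (q_h + fu * cl_int)


# Hypothetical parameters; consistent units assumed (e.g. L/h for flows and clearances).
print(michaelis_menten_rate(s=5.0, vmax=10.0, km=5.0))  # 5.0 (half of Vmax at [S] = Km)
print(intrinsic_clearance(vmax=10.0, km=5.0))           # 2.0
print(round(hepatic_clearance_well_stirred(q_h=90.0, fu=0.1, cl_int=900.0), 6))  # 45.0
```

The well-stirred form shows why hepatic clearance saturates at liver blood flow (Qh): as fu * CLint grows much larger than Qh, CLh approaches Qh.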

Excretion/Elimination
Property | Definition | Model and method
Clr | Renal clearance | RF and descriptors
Cltot | Total clearance across all routes | SVM and MACCS keys
AUC | Area under the concentration-time curve | RF and descriptors
t1⁄2 | Half-life: time for the compound concentration to fall by 50% | RF and descriptors
Tmax | Time to reach peak concentration | RF and descriptors
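When measured concentration-time data are available, the AUC, t1⁄2, Tmax and total clearance defined above can be computed with non-compartmental formulas. The sketch below makes several assumptions: the linear trapezoidal rule for AUC, a log-linear least-squares fit over the last three points for the elimination rate constant, tail extrapolation as C_last/k_el, and CL = Dose/AUC(0-inf), which gives true clearance for an IV dose and apparent clearance (CL/F) for an oral dose. The profile is hypothetical.

```python
import math


def auc_trapezoid(t, c):
    """AUC(0-tlast) by the linear trapezoidal rule."""
    return sum((t[i + 1] - t[i]) * (c[i] + c[i + 1]) / 2 for i in range(len(t) - 1))


def cmax_tmax(t, c):
    """Peak concentration and the time at which it occurs."""
    i = max(range(len(c)), key=lambda k: c[k])
    return c[i], t[i]


def elimination_rate(t, c, n_terminal=3):
    """k_el from a least-squares fit of ln(C) versus t over the last n_terminal points."""
    ts = t[-n_terminal:]
    ys = [math.log(x) for x in c[-n_terminal:]]
    t_bar, y_bar = sum(ts) / len(ts), sum(ys) / len(ys)
    num = sum((a - t_bar) * (b - y_bar) for a, b in zip(ts, ys))
    den = sum((a - t_bar) ** 2 for a in ts)
    return -num / den  # the fitted slope is -k_el


def half_life(t, c, n_terminal=3):
    """t1/2 = ln(2) / k_el."""
    return math.log(2) / elimination_rate(t, c, n_terminal)


def total_clearance(dose, t, c, n_terminal=3):
    """CL = Dose / AUC(0-inf); the tail beyond tlast is extrapolated as C_last / k_el."""
    auc_inf = auc_trapezoid(t, c) + c[-1] / elimination_rate(t, c, n_terminal)
    return dose / auc_inf


# Hypothetical oral-style profile: time in h, concentration in mg/L, dose in mg.
t = [0, 1, 2, 4, 8]
c = [0.0, 6.0, 8.0, 4.0, 1.0]
print(auc_trapezoid(t, c))        # 32.0
print(cmax_tmax(t, c))            # (8.0, 2)
print(round(half_life(t, c), 3))  # 2.0
print(total_clearance(100, t, c))  # apparent clearance in L/h
```

The linear trapezoidal rule slightly overestimates AUC on the declining limb; the log-trapezoidal variant is a common alternative there.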

Toxicity
Property | Definition | Model and method
hERG | Inhibition of the potassium ion channel encoded by hERG, which can disrupt the heart's electrical activity | RF with descriptors and MACCS keys
LD50 | Median lethal dose, a measure of acute toxicity: the potential to cause harm within a short period after exposure | RF and descriptors
DILI | Drug-induced liver injury: damage, injury or dysfunction of the liver following ingestion of a drug or medication | RF and MACCS keys
Hepatotoxicity | Harmful effects or damage to the liver caused by drugs | RF and descriptors
SkinSen | Skin sensitisation: the skin's response to certain allergens | RF and MACCS keys

Model Analysis and Performance
• Predictive Variance and Error: predictive variance measures how much predictions vary; high variance
means less precision. Prediction error of regression models is quantified with MAPE (Mean Absolute Percentage Error) and MAE
(Mean Absolute Error).
• Model Quality: the effectiveness, reliability and performance of a
machine-learning model. For classifiers, compute the confusion matrix and derive Accuracy, Precision, Recall
(Sensitivity), Specificity and F1 Score.
• Error Analysis: investigate and analyze model errors to identify patterns or
areas where the model needs improvement, then fine-tune the model or collect
more relevant data. Check response times and throughput to ensure the model can
handle the required workload without causing delays.
• Model Versioning: keep track of model versions to know
which versions perform best and to allow easy rollback in case of issues.
• Scheduled Retraining: set up a retraining schedule to periodically
update the model with new data. This is essential for adapting to changing patterns in the
data.
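The regression errors (MAE, MAPE) and the confusion-matrix metrics listed above follow directly from their definitions, as in this pure-Python sketch. The example labels and predictions are hypothetical; returning 0.0 when a denominator is zero is a convention chosen here (scikit-learn's `sklearn.metrics` functions expose a `zero_division` parameter for the same situation).

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined if any true value is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


def confusion_metrics(y_true, y_pred, positive=1):
    """Binary confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def safe(num, den):
        return num / den if den else 0.0  # convention: 0.0 when the denominator is zero

    precision = safe(tp, tp + fp)
    recall = safe(tp, tp + fn)  # sensitivity
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": safe(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": safe(tn, tn + fp),
        "f1": safe(2 * precision * recall, precision + recall),
    }


print(mae([2.0, 4.0], [2.5, 3.0]))   # 0.75
print(mape([2.0, 4.0], [2.5, 3.0]))  # 25.0
m = confusion_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 0])
print(m["tp"], m["fn"], m["fp"], m["tn"])  # 2 2 1 3
```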

Model Deployment
• Source Code Management (Git)
• CI/CD (Jenkins)
• Containers (Docker)
• Orchestration (Ansible)
• Log Analysis (ELK, Grafana)

Model Monitoring
• Data Processing Issues: data quality checks, data consistency, input
validation, pipeline monitoring, logging and alerting
• Data Schema Changes: validate incoming data, automated alerts,
data transformation monitoring
• Data Loss at the Source: recovery mechanisms, data ingestion
monitoring, logging and auditing
• Anomaly Detection: detect unusual behavior in model outputs or predictions
that may indicate a problem, such as a sudden increase in errors
• Model Documentation: data sources, testing and validation,
model performance
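Two of these monitoring checks, input/schema validation and anomaly detection on a tracked metric, can be sketched with the standard library. Everything specific here is an assumption for illustration: the two-field schema, the daily MAE history, and the z-score rule with a threshold of 3 sample standard deviations (one simple choice among many anomaly detectors).

```python
import statistics

# Hypothetical input schema for a prediction service: field name -> expected Python type.
EXPECTED_SCHEMA = {"smiles": str, "mw": float}


def validate_record(record):
    """Return a list of problems (an empty list means the record passes)."""
    problems = []
    for field, expected in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    extra = sorted(set(record) - set(EXPECTED_SCHEMA))
    if extra:
        problems.append(f"unexpected fields: {extra}")  # possible upstream schema change
    return problems


def flag_anomalies(history, new_values, z_threshold=3.0):
    """Flag values more than z_threshold sample standard deviations from the history mean."""
    mean = statistics.fmean(history)
    sd = statistics.stdev(history)
    if sd == 0:
        return [v for v in new_values if v != mean]  # flat history: any change is unusual
    return [v for v in new_values if abs(v - mean) / sd > z_threshold]


print(validate_record({"smiles": "CCO"}))              # ['missing field: mw']
print(validate_record({"smiles": "CCO", "mw": "46"}))  # ['wrong type for mw: str']
daily_mae = [0.50, 0.52, 0.48, 0.51, 0.49]  # hypothetical monitoring history
print(flag_anomalies(daily_mae, [0.51, 0.90]))         # [0.9]
```

In a deployed pipeline these checks would feed the logging and alerting stack rather than print, so that a failed validation or a flagged error spike raises an automated alert.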

Current Work
• Generating molecules (or similar molecules)
with (nearly) the desired properties using generative
AI (RNN, GNN, etc.)
• Checking fit scores for compatibility
• Working on automated energy minimisation of
structures
• Working on DEL and EGFRvIII data analysis
• Working on various biological data
analysis projects (NGS, PacBio)
GitHub: https://github.com/santuchal/ADMET
Medium: https://medium.com/@santuchal/admet-an-essential-component-in-drug-discovery-and-development-f503a5aae5dd
Streamlit: https://hav8whwegtyvgwjixnhxqw.streamlit.app/
