Perceiver CPI.pptx

Perceiver CPI: a nested cross-attention network for compound-protein interaction prediction
Minjae Chung
2023 / 08 / 18

Drug Discovery and Repurposing
• Discovery of new drugs
• Repurposing of existing drugs
• Identification of novel interacting proteins for approved drugs

Compound-Protein Interaction (CPI)
• Wet lab
- Extremely costly
- Time-consuming
• Virtual Screening
- Lack of 3D structure
• Conventional Machine Learning
- Random Forest
- Support Vector Machine
- Over-simplification
• Deep Learning
- Increase in data
- High-capacity computing machines

Drug = Compound = Ligand = Atoms and bonds
• Extended-connectivity fingerprint (ECFP): Morgan/Circular fingerprint
• Graph representation
• Simplified molecular-input line-entry system (SMILES)
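
To make the fingerprint idea concrete, here is a toy Morgan-style circular fingerprint in plain Python. It is only a sketch under simplifying assumptions: atoms are described by element symbol and degree, bond orders are ignored, and no duplicate-substructure removal is performed, so it is not the ECFP algorithm as implemented in cheminformatics toolkits. The function name `circular_fingerprint` and its parameters are hypothetical.

```python
import hashlib


def _stable_hash(value):
    """Deterministic integer hash (Python's built-in hash() is salted for strings)."""
    return int(hashlib.sha256(repr(value).encode()).hexdigest(), 16)


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy circular fingerprint.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs (bond order is ignored here)
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: each atom starts from a hash of its own local invariants.
    identifiers = [
        _stable_hash((symbol, len(neighbors[i]))) for i, symbol in enumerate(atoms)
    ]
    on_bits = {ident % n_bits for ident in identifiers}

    # Each iteration widens every atom's neighborhood by one bond.
    for _ in range(radius):
        identifiers = [
            _stable_hash(
                (identifiers[i], tuple(sorted(identifiers[j] for j in neighbors[i])))
            )
            for i in range(len(atoms))
        ]
        on_bits.update(ident % n_bits for ident in identifiers)

    # Fold the identifiers into a fixed-length bit vector.
    return [1 if bit in on_bits else 0 for bit in range(n_bits)]


ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
```

Because every identifier depends only on an atom's neighborhood, renumbering the atoms leaves the bit vector unchanged.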

• Simplified molecular-input line-entry system (SMILES)
Melatonin (C13H16N2O2)
CC(=O)NCCC1=CNc2c1cc(OC)cc2
CC(=O)NCCc1c[nH]c2ccc(OC)cc12
Glucose (β-D-glucopyranose) (C6H12O6)
OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@H](O)1
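
Sequence-based models usually turn a SMILES string into a padded list of integer token ids before embedding it. The sketch below shows one way to do that. The regular expression is a deliberately simplified tokenizer (bracket atoms, Br, Cl, two-digit ring closures, then single characters), and the names `tokenize_smiles`, `build_vocab` and `encode_smiles` are hypothetical, not taken from any of the models discussed here.

```python
import re

# Simplified tokenizer: bracket atoms, two-letter halogens, %nn ring closures,
# then any remaining single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map each token to a positive integer; 0 is reserved for padding."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize_smiles(s)})
    return {tok: idx + 1 for idx, tok in enumerate(tokens)}


def encode_smiles(smiles, vocab, max_len):
    """Integer-encode a SMILES string, truncating or zero-padding to max_len.

    Raises KeyError if the string contains a token missing from vocab.
    """
    ids = [vocab[tok] for tok in tokenize_smiles(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


melatonin = "CC(=O)NCCc1c[nH]c2ccc(OC)cc12"
glucose = "OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@H](O)1"
vocab = build_vocab([melatonin, glucose])
encoded = encode_smiles(melatonin, vocab, max_len=40)
```

Keeping bracket atoms such as `[nH]` and `[C@@H]` as single tokens preserves aromatic-hydrogen and stereochemistry information that a purely character-level split would scatter across several symbols.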

Protein = Target
• Raw Amino Acid (AA) Sequence
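
A raw amino acid sequence can be fed to a network in the same way, as a fixed-length list of integer ids. The following is a minimal sketch assuming the 20 standard one-letter residue codes, one shared id for anything else, and 0 for padding. The helper name `encode_protein` and the id assignment are illustrative choices.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_TO_ID = {aa: idx + 1 for idx, aa in enumerate(AMINO_ACIDS)}  # 0 = padding
UNKNOWN_ID = len(AMINO_ACIDS) + 1  # shared id for non-standard symbols


def encode_protein(sequence, max_len):
    """Integer-encode a protein sequence, truncating or zero-padding to max_len."""
    ids = [AA_TO_ID.get(aa, UNKNOWN_ID) for aa in sequence.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


example = encode_protein("MKTAYIAKQR", max_len=16)
```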

Binding affinity
• Dissociation constant (𝐾𝑑): measures the tendency of a species to break up into its components. The higher the 𝐾𝑑 value, the weaker the binding affinity.
• Inhibitor constant (𝐾𝑖): the concentration of the inhibitor that is required to decrease the maximal rate of the reaction by half. The smaller the 𝐾𝑖 value, the greater the binding affinity.
• Half maximal inhibitory concentration (IC50): a quantitative measure that indicates how much of a particular inhibitory substance (e.g. drug) is needed to inhibit, in vitro, a given biological process or biological component by 50%. The smaller the IC50 value, the greater the binding affinity.
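
Because these constants span many orders of magnitude, affinity datasets are often moved to a negative log scale before regression, so that a larger value means stronger binding. The sketch below assumes Kd is given in nanomolar. It also includes the Cheng-Prusoff relation, which links IC50 and Ki under the assumption of competitive inhibition. Both function names are hypothetical.

```python
import math


def kd_nm_to_pkd(kd_nm):
    """Convert a dissociation constant in nM to pKd = -log10(Kd in mol/L)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nm * 1e-9)


def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Ki from IC50 for a competitive inhibitor: Ki = IC50 / (1 + [S] / Km).

    ic50, substrate_conc and km must use consistent concentration units.
    """
    return ic50 / (1.0 + substrate_conc / km)


weak = kd_nm_to_pkd(10_000)   # weak binder: large Kd, small pKd
strong = kd_nm_to_pkd(1)      # strong binder: small Kd, large pKd
```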

Baseline or state-of-the-art (SOTA) models
• DeepDTA
• DeepConv-DTI
• GraphDTA
• TransformerCPI
• HyperAttentionDTI

DeepDTA (Öztürk et al., 2018)
• Compound Input Format: SMILES
• Protein Input Format: Amino Acid Sequence
• 1-Dimensional Convolutional Neural Network (CNN)
• Fully-Connected (FC) layer
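
The 1D CNN block used by these sequence-based baselines can be pictured as a filter sliding along the embedded sequence, followed by a pooling step that yields one feature per filter. The toy code below shows a single filter with fixed weights, which is an assumption made purely for illustration; real models learn the embeddings and many filters, and the function names are hypothetical.

```python
def conv1d_relu(sequence, kernel, bias=0.0):
    """Valid 1D convolution over a list of embedding vectors, followed by ReLU.

    sequence: list of L vectors, each of length d
    kernel:   list of k vectors, each of length d (one filter)
    Returns a feature map of length L - k + 1.
    """
    k = len(kernel)
    feature_map = []
    for start in range(len(sequence) - k + 1):
        total = bias
        for offset in range(k):
            window_vec = sequence[start + offset]
            total += sum(x * w for x, w in zip(window_vec, kernel[offset]))
        feature_map.append(max(0.0, total))
    return feature_map


def global_max_pool(feature_map):
    """Keep the strongest response of the filter anywhere in the sequence."""
    return max(feature_map)


embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, 2-dim embeddings
motif_filter = [[1.0, 0.0], [0.0, 1.0]]          # filter spanning 2 positions
feature = global_max_pool(conv1d_relu(embedded, motif_filter))
```

The pooled features from the compound and protein branches are then typically concatenated and passed to the fully-connected layers.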

DeepConv-DTI (Lee et al., 2019)
• Compound Input Format: Morgan/Circular Fingerprint

GraphDTA (Nguyen et al., 2021)
• Protein Input Format: Amino Acid Sequence
• 1-Dimensional Convolutional Neural Network (CNN)
• RDKit
• Graph Neural Network (GNN)

TransformerCPI (Chen et al., 2020)
• RDKit
• Graph Convolutional Network (GCN)
• Attention Mechanism

HyperAttentionDTI (Zhao et al., 2022)
• Attention Mechanism

Table of Baseline Models
Baseline Model    | Protein Input | Compound Input | Major Blocks
DeepDTA           | AA sequence   | SMILES         | 1D CNN, FC
DeepConv-DTI      | AA sequence   | Fingerprint    | 1D CNN, FC
GraphDTA          | AA sequence   | SMILES         | 1D CNN, GNN, FC
TransformerCPI    | AA sequence   | SMILES         | 1D CNN, Attention, GCN, FC
HyperAttentionDTI | AA sequence   | SMILES         | 1D CNN, Attention, FC

Drawbacks
• 1. Molecular descriptor vectors and fingerprints encode useful chemical knowledge from the start, so they may outperform complex graph representations on small datasets. However, because these representations are simplified, models that rely on them may underfit larger datasets.
• 2. GNNs must always learn a meaningful chemical space embedding from scratch. In addition, because the global pooling step is usually just the sum or average of all atomic features, over-smoothing and information loss are also crucial issues for GNNs.
• 3. The representations from the compound network and the protein network are often integrated by simple concatenation, which is poorly suited to revealing the relationship between the two molecules.
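
To illustrate the alternative to concatenation raised in the third drawback, the sketch below implements plain scaled dot-product cross-attention, where each compound feature vector attends over all protein residue vectors. It omits the learned query, key and value projections and is not the Perceiver CPI architecture itself. The function names and the toy feature values are assumptions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries:      vectors from one molecule (e.g. compound atoms), length d each
    keys, values: vectors from the other molecule (e.g. protein residues)
    Returns (outputs, weights); weights[i][j] is how strongly query i attends to key j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[dim] for wj, v in zip(w, values)) for dim in range(len(values[0]))]
        )
    return outputs, weights


compound_feats = [[1.0, 0.0], [0.5, 0.5]]             # 2 atoms
protein_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # 3 residues
mixed, attn = cross_attention(compound_feats, protein_feats, protein_feats)
```

Unlike concatenation, the attention weights give an explicit atom-by-residue interaction map, and each compound vector is updated with information from the residues it matches best.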

Statistics of the Benchmark Datasets

Statistics of the G protein-coupled receptor (GPCR) dataset

Statistics of GPCR and diverse subsets from DUD-E database

Experimental procedure
• Novel pair (Davis, KIBA and Metz)
- No compound or protein from the training set appears in the test set
• Novel-hard pair (Davis)
- Test interactions were restricted to those with a similarity below 0.3 to the training interactions
• Novel compound (Davis)
- The training and test sets share no compounds
• Novel protein (Davis)
- The training and test sets share no proteins
• Cross-domain experiment (Davis and PDBbind)
- The model was trained on the Davis dataset and tested on the PDBbind dataset
• Enrichment factor analysis (GPCR, GPCR subset (DUD-E dataset), Diverse subset (DUD-E dataset))
- The model was trained on the GPCR dataset and tested on subsets from the DUD-E dataset
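
The novel compound, novel protein and novel pair settings can be sketched as a single splitting routine that holds out a random subset of compounds, proteins, or both. This is an illustrative version only: the held-out fraction, the random seed and the function name `cold_start_split` are assumptions, and the fold construction used in the actual experiments may differ.

```python
import random


def cold_start_split(interactions, setting, test_fraction=0.2, seed=0):
    """Split (compound_id, protein_id, affinity) triples for a cold-start setting.

    setting: "novel_compound", "novel_protein" or "novel_pair"
    """
    if setting not in ("novel_compound", "novel_protein", "novel_pair"):
        raise ValueError(f"unknown setting: {setting}")
    rng = random.Random(seed)

    def hold_out(ids):
        ids = sorted(ids)
        rng.shuffle(ids)
        return set(ids[: max(1, int(len(ids) * test_fraction))])

    test_compounds = set()
    test_proteins = set()
    if setting in ("novel_compound", "novel_pair"):
        test_compounds = hold_out({c for c, _, _ in interactions})
    if setting in ("novel_protein", "novel_pair"):
        test_proteins = hold_out({p for _, p, _ in interactions})

    train, test = [], []
    for compound, protein, affinity in interactions:
        new_c = compound in test_compounds
        new_p = protein in test_proteins
        row = (compound, protein, affinity)
        if setting == "novel_pair":
            if new_c and new_p:
                test.append(row)
            elif not new_c and not new_p:
                train.append(row)
            # Mixed pairs (only one side unseen) are discarded.
        elif setting == "novel_compound":
            (test if new_c else train).append(row)
        else:
            (test if new_p else train).append(row)
    return train, test


toy = [(f"c{i}", f"p{j}", 5.0) for i in range(5) for j in range(5)]
train_set, test_set = cold_start_split(toy, "novel_pair")
```

In the novel pair setting, interactions that involve exactly one held-out entity belong to neither set, which is why this split leaves less usable data than the other two.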

Evaluation metrics
• Mean squared error (MSE)
• Concordance index (CI)
• Enrichment factor (EF) score
• Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) score
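
The two regression metrics are simple to state in code. The sketch below uses a direct O(n²) pairwise loop for the concordance index and counts tied predictions as half-concordant, which is a common convention but still an assumption about the exact variant used. The function names are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    return concordant / comparable if comparable else float("nan")


measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 8.0]
mse = mean_squared_error(measured, predicted)
ci = concordance_index(measured, predicted)
```

MSE penalizes the size of each error, whereas CI only checks whether pairs of interactions are ranked in the right order, so a model can score well on one and poorly on the other.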

Comparison of the models in terms of three settings from the Davis dataset with 5-fold cross-validation
• MSE (the lower, the better) and CI (the higher, the better)

Comparison of the models on the novel-hard pair setting

Comparison of the models on the novel pair task for the KIBA and Metz datasets

Results of the cross-domain experiment (trained on Davis and tested on PDBbind)

Enrichment factor analysis results for subsets in the DUD-E database (top: EF1%, bottom: BEDROC with α = 80.5)
• EF1% (the higher, the better), BEDROC with α = 80.5 (the higher, the better, ranging from 0 to 1)
• Methods compared: data-driven methods and docking-based methods
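
As a concrete reading of EF1%, the sketch below ranks a screened library by predicted score and compares the hit rate in the top 1% with the hit rate in the whole library. The function name `enrichment_factor` and the rounding rule for the size of the top subset are assumptions. BEDROC is not implemented here.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked library.

    scores: predicted scores, higher meaning more likely active
    labels: 1 for known actives, 0 for decoys
    """
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, int(n_total * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    # Hit rate among the top-ranked compounds relative to the overall hit rate.
    return (actives_in_top / n_top) / (n_actives / n_total)


# Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
toy_scores = list(range(1000, 0, -1))
toy_labels = [0] * 1000
for idx in (0, 1, 2, 3, 4, 500, 501, 502, 503, 504):
    toy_labels[idx] = 1
ef1 = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

An EF of 1 corresponds to random ranking, and the maximum attainable value depends on the proportion of actives in the library.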

Future work
• Finding and extracting meaningful features from proteins remains a difficult but worthwhile task. For example, AlphaFold2 from DeepMind can be used to predict the 3D structure of proteins.
• The information taken from compounds can still be exploited more effectively, for example by using meta-learning to construct a better representation from small datasets.
• Utilizing information from 3D structures generated from SMILES is also a promising method because of its high information capacity.
• Transfer learning for the individual neural networks, which would let them start from improved representations built on prior knowledge, should also be considered.
• The interpretability of Perceiver CPI is limited by the dimensionality reduction performed by the MLP in the hidden state update of the message-passing step and by the attention blocks. Addressing this limitation would form a valuable part of future work.
