Perceiver CPI: a nested cross-attention
network for compound-protein interaction
prediction
Minjae Chung
2023 / 08 / 18
Journal article review
Drug Discovery and Repurposing
• Discovery of new drugs
• Repurposing of existing drugs
• Identification of novel interacting proteins for approved drugs
Compound-Protein Interaction (CPI)
Wet lab
• Extremely costly
• Time-consuming
Virtual screening
• Lack of 3D structure
Conventional machine learning
• Random Forest
• Support Vector Machine
• Over-simplification
Deep learning
• Increase in data
• High-capacity computing machines
Drug = Compound = Ligand = Atoms and bonds
• Extended-connectivity fingerprint (ECFP): Morgan/Circular fingerprint
• Graph representation
• Simplified molecular-input line-entry system (SMILES)
Melatonin (C13H16N2O2)
CC(=O)NCCC1=CNc2c1cc(OC)cc2
CC(=O)NCCc1c[nH]c2ccc(OC)cc12
Glucose (β-D-glucopyranose) (C6H12O6)
OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@H](O)1
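The two melatonin strings show that one molecule can have more than one valid SMILES. A minimal sketch in plain Python makes this concrete by splitting a SMILES string into tokens and counting the heavy (non-hydrogen) atoms. The regular expression and the helper names `tokenize_smiles` and `count_heavy_atoms` are assumptions for illustration and cover only the SMILES features used on this slide; a cheminformatics toolkit would be the robust choice.

```python
import re

# Simplified SMILES token pattern (illustrative, not a full SMILES grammar):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter
# atoms, ring-closure digits, then bond and branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.:]")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def count_heavy_atoms(tokens):
    """Count atom tokens; explicit hydrogens such as [H] are not handled here."""
    return sum(1 for token in tokens if token[0].isalpha() or token[0] == "[")


melatonin_a = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"
melatonin_b = "CC(=O)NCCc1c[nH]c2ccc(OC)cc12"
glucose = "OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@H](O)1"

for name, smiles in [("melatonin", melatonin_a), ("melatonin", melatonin_b), ("glucose", glucose)]:
    print(name, count_heavy_atoms(tokenize_smiles(smiles)))
```

Both melatonin strings yield the same heavy-atom count, matching C13H16N2O2 (13 C + 2 N + 2 O), even though their token sequences differ.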
Protein = Target
• Raw Amino Acid (AA) Sequence
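A raw amino-acid sequence is a character string, so a model needs it as numbers first. One common option is integer (label) encoding with padding to a fixed length, sketched below. The vocabulary order, the use of index 0 for padding and unknown residues, and the name `encode_sequence` are assumptions of this sketch, not details taken from the reviewed paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
# Index 0 is reserved for padding and unknown residues (assumption of this sketch).
AA_TO_INDEX = {residue: index + 1 for index, residue in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence, max_len):
    """Map a raw amino-acid string to a fixed-length list of integer indices."""
    ids = [AA_TO_INDEX.get(residue, 0) for residue in sequence.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad short sequences with 0


encoded = encode_sequence("MKTAYIAK", max_len=10)
print(encoded)
```

Sequences longer than `max_len` are truncated, which is a simplification: long proteins lose their C-terminal residues under this scheme.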
Binding affinity
Dissociation constant (Kd)
The dissociation constant (Kd) measures the tendency of a complex to break up into its components. The higher the Kd value, the weaker the binding affinity.
Inhibitor constant (Ki)
The inhibitor constant (Ki) is the dissociation constant of the enzyme-inhibitor complex, that is, the inhibitor concentration at which half of the enzyme is bound when no substrate competes. The smaller the Ki value, the greater the binding affinity.
Half maximal inhibitory concentration (IC50)
The half maximal inhibitory concentration (IC50) is a quantitative measure that indicates how much of a particular inhibitory substance (e.g. a drug) is needed to inhibit, in vitro, a given biological process or biological component by 50%. The smaller the IC50 value, the greater the binding affinity.
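Measured affinities span several orders of magnitude, so regression targets are commonly log-transformed before training. The sketch below shows the usual conversion pKd = -log10(Kd in mol/L) for a Kd given in nanomolar. The function name `kd_nm_to_pkd` is hypothetical, and whether the reviewed paper uses exactly this transform for every dataset is not stated on the slides.

```python
import math


def kd_nm_to_pkd(kd_nm):
    """Convert a dissociation constant in nanomolar to pKd = -log10(Kd in molar)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nm * 1e-9)


# A weak binder (10 uM) and a strong binder (1 nM): the stronger binder
# has the smaller Kd and therefore the larger pKd.
weak = kd_nm_to_pkd(10_000)
strong = kd_nm_to_pkd(1)
print(round(weak, 3), round(strong, 3))
```

After the transform, larger values mean tighter binding, which reverses the "smaller is better" reading of raw Kd, Ki and IC50.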
Baseline and state-of-the-art (SOTA) models
• DeepDTA
• DeepConv-DTI
• GraphDTA
• TransformerCPI
• HyperAttentionDTI
DeepDTA (Öztürk et al., 2018)
• Compound Input Format: SMILES
• Protein Input Format: Amino Acid Sequence
• 1-Dimensional Convolutional Neural Network (CNN)
• Fully-Connected (FC) layer
DeepConv-DTI (Lee et al., 2019)
• Compound Input Format: Morgan/Circular Fingerprint
• Protein Input Format: Amino Acid Sequence
• 1-Dimensional Convolutional Neural Network (CNN)
• Fully-Connected (FC) layer
GraphDTA (Nguyen et al., 2021)
• Compound Input Format: SMILES
• Protein Input Format: Amino Acid Sequence
• 1-Dimensional Convolutional Neural Network (CNN)
• RDKit
• Graph Neural Network (GNN)
• Fully-Connected (FC) layer
TransformerCPI (Chen et al., 2020)
• Compound Input Format: SMILES
• Protein Input Format: Amino Acid Sequence
• 1-Dimensional Convolutional Neural Network (CNN)
• RDKit
• Graph Convolutional Network (GCN)
• Attention Mechanism
• Fully-Connected (FC) layer
HyperAttentionDTI (Zhao et al., 2022)
• Compound Input Format: SMILES
• Protein Input Format: Amino Acid Sequence
• 1-Dimensional Convolutional Neural Network (CNN)
• Attention Mechanism
• Fully-Connected (FC) layer
Table of Baseline Models
Baseline Model    | Protein Input | Compound Input | Major Blocks
DeepDTA           | AA sequence   | SMILES         | 1D CNN, FC
DeepConv-DTI      | AA sequence   | Fingerprint    | 1D CNN, FC
GraphDTA          | AA sequence   | SMILES         | 1D CNN, GNN, FC
TransformerCPI    | AA sequence   | SMILES         | 1D CNN, Attention, GCN, FC
HyperAttentionDTI | AA sequence   | SMILES         | 1D CNN, Attention, FC
Drawbacks
• 1. Molecular descriptor vectors and fingerprints encode useful chemical knowledge from the start, so they may perform better than complex graphs on small datasets. However, because these representations are simplified, models that use them may underfit larger datasets.
• 2. GNNs must always learn a meaningful chemical-space embedding from scratch. In addition, because the global pooling step is usually just the sum or average of all atomic features, over-smoothing and information loss are serious issues for GNNs.
• 3. The compound network's and protein network's representations are often integrated by simple concatenation, which is poorly suited to revealing the relationship between the two molecules.
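To illustrate the third drawback, the sketch below contrasts concatenation with cross-attention, in which every atom vector is updated by a weighted mix of residue vectors. This is generic scaled dot-product attention without learned projections, written in plain Python on toy numbers; it is not the nested architecture of Perceiver CPI, and all names and feature values are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, keys, values):
    """Each query row attends over all key/value rows (no learned weights)."""
    scale = math.sqrt(len(keys[0]))
    attended = []
    for query in queries:
        weights = softmax([dot(query, key) / scale for key in keys])
        attended.append([
            sum(weight * value[d] for weight, value in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return attended


# Toy features: 3 atoms and 4 residues, each a 2-dimensional vector.
atom_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
residue_features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 0.0]]

# Concatenation of pooled vectors: the two molecules never interact.
pooled_atoms = [sum(col) / len(atom_features) for col in zip(*atom_features)]
pooled_residues = [sum(col) / len(residue_features) for col in zip(*residue_features)]
concatenated = pooled_atoms + pooled_residues

# Cross-attention: atoms query the residues, giving one residue-aware vector per atom.
atoms_given_protein = cross_attention(atom_features, residue_features, residue_features)
print(concatenated)
print(atoms_given_protein)
```

With concatenation, the compound half of the vector is identical whatever protein it is paired with; with cross-attention, each atom's output depends on which residues it matches best.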
Perceiver CPI
Statistics of the Benchmark Datasets
Statistics of the G protein-coupled receptor (GPCR) dataset
Statistics of the GPCR and diverse subsets from the DUD-E database
Experimental procedure
• Novel pair (Davis, KIBA and Metz)
- Neither the compound nor the protein of any test pair appears in the training set
• Novel-hard pair (Davis)
- Test interactions are restricted to those whose similarity to the training interactions is below 0.3
• Novel compound (Davis)
- No compound appears in both the training set and the test set
• Novel protein (Davis)
- No protein appears in both the training set and the test set
• Cross-domain experiment (Davis and PDBbind)
- The model is trained on the Davis dataset and tested on the PDBbind dataset
• Enrichment factor analysis (GPCR dataset; GPCR and Diverse subsets of the DUD-E dataset)
- The model is trained on the GPCR dataset and tested on subsets of the DUD-E dataset
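The novel compound, novel protein and novel pair settings can be sketched as a "cold" split over (compound, protein, affinity) triples. The function `cold_split`, its parameters and the choice to drop pairs in which only one side is held out are assumptions for illustration; the slides do not give the exact procedure, and the similarity-based novel-hard setting is not covered here.

```python
import random


def cold_split(pairs, mode, test_fraction=0.2, seed=0):
    """Split (compound_id, protein_id, affinity) triples so that test entities are unseen.

    mode is 'novel_compound', 'novel_protein' or 'novel_pair'.
    """
    if mode not in ("novel_compound", "novel_protein", "novel_pair"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)

    def hold_out(items):
        count = max(1, round(len(items) * test_fraction))
        return set(rng.sample(items, count))

    compounds = sorted({compound for compound, _, _ in pairs})
    proteins = sorted({protein for _, protein, _ in pairs})
    test_compounds = hold_out(compounds) if mode != "novel_protein" else set()
    test_proteins = hold_out(proteins) if mode != "novel_compound" else set()

    train, test = [], []
    for compound, protein, affinity in pairs:
        compound_held = compound in test_compounds
        protein_held = protein in test_proteins
        if mode == "novel_pair":
            if compound_held and protein_held:
                test.append((compound, protein, affinity))
            elif not compound_held and not protein_held:
                train.append((compound, protein, affinity))
            # Pairs with exactly one held-out side are dropped in this sketch.
        elif compound_held or protein_held:
            test.append((compound, protein, affinity))
        else:
            train.append((compound, protein, affinity))
    return train, test


# Toy data: 10 compounds x 10 proteins with dummy affinities.
toy_pairs = [(f"c{i}", f"p{j}", float(i + j)) for i in range(10) for j in range(10)]
train_pairs, test_pairs = cold_split(toy_pairs, mode="novel_pair")
print(len(train_pairs), len(test_pairs))
```

The novel pair setting is the strictest of the three because it discards every interaction that shares a compound or a protein with the other side of the split.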
Evaluation metrics
• Mean squared error (MSE)
• Concordance index (CI)
• Enrichment factor (EF) score
• Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) score
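The two regression metrics can be written in a few lines. The sketch below uses the standard definitions: MSE is the mean of squared errors, and CI is the fraction of pairs with different true affinities whose predictions are ordered the same way, with tied predictions counted as half. The function names are illustrative, and the quadratic pair loop is meant for clarity rather than speed.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs ranked in the correct order (ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity are not comparable
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


true_affinity = [5.0, 6.0, 7.0, 8.0]
predicted = [5.1, 5.9, 7.2, 7.0]
print(mean_squared_error(true_affinity, predicted))
print(concordance_index(true_affinity, predicted))
```

In this toy case only the last two predictions are in the wrong order, so five of six comparable pairs are concordant.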
Comparison of the models in three settings of the Davis dataset with 5-fold cross-validation
• MSE (the lower, the better) and CI (the higher, the better)
Comparison of the models in the novel-hard pair setting
• MSE (the lower, the better) and CI (the higher, the better)
Comparison of the models on the novel pair task with the KIBA and Metz datasets
• MSE (the lower, the better) and CI (the higher, the better)
Results of the cross-domain experiment (trained on Davis and tested on PDBbind)
• MSE (the lower, the better) and CI (the higher, the better)
Enrichment factor analysis results for subsets in the DUD-E database (top: EF1%, bottom: BEDROC with α = 80.5)
• EF1% (the higher, the better), BEDROC (the higher, the better, ranging from 0 to 1)
• Methods compared: data-driven and docking-based
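The enrichment factor at 1% compares the hit rate among the top-ranked 1% of compounds with the hit rate in the whole library. The sketch below follows that standard definition; the function name `enrichment_factor`, the rounding of the top-1% cutoff and the toy library are assumptions for illustration.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF = (actives in top fraction / size of top fraction) / (all actives / library size)."""
    total = len(scores)
    top_count = max(1, round(total * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:top_count])
    actives_total = sum(labels)
    if actives_total == 0:
        raise ValueError("the library contains no actives")
    return (actives_in_top / top_count) / (actives_total / total)


# Toy library: 200 compounds, 10 actives (label 1), scored so the actives rank first.
toy_labels = [1] * 10 + [0] * 190
toy_scores = [float(score) for score in range(200, 0, -1)]
ef_1_percent = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
print(ef_1_percent)
```

A random ranking gives an EF close to 1, so values well above 1 indicate that actives are concentrated at the top of the list.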
Future work
• Finding and extracting meaningful features from proteins remains a difficult but worthwhile task. For example, AlphaFold2 from DeepMind can be used to predict the 3D structure of proteins.
• The information taken from compounds can still be exploited more effectively, for example by using meta-learning to construct better representations from small datasets.
• Using information from 3D structures generated from SMILES is also promising because of its high information capacity.
• Transfer learning for the individual neural networks, so that they start from improved representations informed by prior knowledge, should also be considered.
• The interpretability of Perceiver CPI is limited by the dimensionality reduction of the MLPs in the hidden-state update of the message-passing step and in the attention blocks. Addressing this limitation would form a valuable part of future work.
Thank you for listening

Editor's Notes

  1. Extract local residue and atomic features to predict binding affinity