upload.pdf

1
INTRODUCTION
●
Protein-protein interactions (PPIs) participate
in all important biological processes in living
organisms, such as catalyzing metabolic
reactions, DNA replication, DNA transcription,
responding to stimuli and transporting
molecules from one location to another.
●
To reveal the function mechanisms in cells, it is
important to identify PPIs that take place in the
living organism.

2
What is Protein-Protein Interaction ?
Protein–protein
interactions (PPIs) are
physical contacts of
high specificity
established between
two or more protein
molecules as a result of
biochemical events .
https://media.geeksforgeeks.org/wp-content/uploads/20220704095158/C11.png

3
Why PPI prediction?
PPI prediction
models can be a
valuable tool for
screening protein
pairs while
developing new
drugs for
targeted protein
degradation.
https://sj.jst.go.jp/news/202109/n0924-03k.html

4
PPI prediction Techniques:
Performed in a controlled environment
outside a living organism.
Ex: X-ray crystallography...
Performed on the whole living
organism itself.
Ex: yeast two-hybrid (Y2H, Y3H)...
performed on a computer (or) via
computer simulation.
Ex: sequence-based approaches...
In Vitro
Silico
In Vivo

5
Computational Biology:
Computational
biology and
bioinformatics
have the
potential not only
to speed up the
drug discovery
process, thus
reducing the
costs, but also to
change the way
drugs are
designed.
https://www.cell.com/cms/attachment/6dce762f-d8a4-4284-baec-bc0bfb5d9666/gr1_lrg.jpg

6
Protein input for computational Biology:
The starting point (input) of protein structure
prediction or protein interaction prediction is the one-
dimensional amino acid sequence of target protein.
https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Histone_Alignment.png/595px-Histone_Alignment.png

9
Stage-I Result:
Precision, F-measure and MCC obtained by model
are 0.303, 0.397 and 0.206, respectively.

10
Problem statement:
The performance of model for protein-protein interaction
site detection by combining local and global features can
be improved by incorporating additional features or using
ensemble methods, or exploring different model
architectures such as graphical convolutional networks.
Illustration
of a
Sars-cov2
virus particle
structure
https://images.novusbio.com/design/dw-nb-sars-cov-2-viral-particle-v3.png

11
Gantt Chart:
Implementaton Date
Learning and literature review 5th
Feb 2023 - 10th
Mar 2023
Sampling the pre-processed data 12th
Mar 2023 - 15th
Mar 2023
Modifying the pre-processed data 15th
Mar 2023 - 20th
Mar 2023
Modifying the architecture 1st
Apr 2023 - 15th
Apr -2023
Training the model 16th
Apr 2023 - 20th
Apr 2023
Evaluating performance metrics 22nd
Apr 2023 - 30th
Apr 2023

12
LITERATURE REVIEW
[1] Graph Neural Network for Protein–Protein Interaction
Prediction: A Comparative Study
A comparative study for protein–protein interaction prediction using
multiple graph neural network-based models, including the GCN, GAT
and other GNN types. One of the future works of this study will be
combining global information to perform the link prediction on
graphs.
[2]DeepRank-GNN: a graph neural network framework to
learn patterns in protein–protein interfaces
We present a single GNN architecture in this article that can be
reused as such or derived and replaced. The DeepRank framework
offers many perspectives of extension such as the application to
single proteins (e.g. for genetic variant pathogenicity prediction), to
larger multimeric states (e.g. for larger complex(quality prediction).

13
Methodology:
Graph Convolutional network :

14
Graph Convolutional network :
A Graph Convolutional Network, or GCN, is one
of the flavour of GNN used for learning on
graph-structured data. It is based on an efficient
variant of convolutional neural networks which
operate directly on graphs.
The graphs are sequentially passed to
consecutive convolution/activation/pooling
layers. The final graph representations are
flattened using the mean value of each feature
and merged before applying fully connected
layers.

15
Implementation:
To predict PPI consists of three modules: protein
graph construction, feature extraction,
Concatenation and classifier to predict
interactions between them.
Graph representation of proteins:
we will construct the molecular graph of
proteins, also known as amino-acids/residues
contact network, using the PDB files. The PDB
file is a text file containing structural
information such as 3D atomic coordinates.

16
Graph Construction:
Let G(V, E) be a graph
representing the
proteins, where each
node ( v ∈ V ) is the
residue and interaction
between the residues is
described by an edge
( e ∈ E ). Two residues
are connected if they
have any pair of atoms
(one from each residue)
having the Euclidean
distance lessthan a
threshold distance.
Amino acids

17
Graph Construction:
Node features:
The first step is to get each protein’s PDB sequence
and molecular graph structure using a python script.
Then the residue-level features are merged with a
molecular graph, creating a final protein graph.
PSSM
DSSP
RPS

18
Features:
Name of
feature
Full name Description
No of
parameters
Type
Secondary
structure
DSSP
One-hot
encoded
9 Node feature
Evolutionary
Inofrmation
Position specific
scoring matrix
PSSM 20 Node feature
Raw protein
sequence
Raw protein
sequence
One-hot
encoded
20 Node feature
distance
Normalised
distance
(optional) 1 Edge feature

19
Feature Extraction:
The GCN is used to learn features from data
represented as graphs. The connection between
every pair of nodes is encoded in the adjacency
matrix, A ∈ RL,L . The matrix, X ∈ RL,F, contains
the residue-level features for all nodes of the
given graph. Each layer of GCN takes the
adjacency matrix (A) and node embeddings from
the previous layer as input and outputs the
node-level embeddings for the next layer. After
the GCN layer, each residue feature vector is
updated as a weighted sum of the features of
neighboring nodes in the graph, including
residue’s own feature.

20
Classification:
To ensure the fixed-size representations of all
proteins, there will be a global pooling layer
after the GNN layer.
After the pooling layer, we add a fully connected
(FC) layer with a LeakyReLU activation function
to get the final representation of a protein from
its pooled representation. On the other hand, we
get the learned local feature vector.
The concatenated feature vectors of protein are
then fed to the classifier having two FC layers
and an output layer with sigmoid activation.
Note: The LeakyReLU is used as an activation function to add non-linearity to the
output of a fully connected layer.

21
Architecture:
Local
feature
Extraction
PDB
file
Features
sequence
Protein
featured graph
Global
Feature
vector
Pooling layer
GCN layer
PSSM
RPS
DSSP
Sliding
window
Local
feature
vector
2 Fully
Connected
layers
Predic
tion
Global
feature
Extraction
Classification

22
Dataset:
Dset_186 has been built
from the PDB database
and consists of 186
protein sequences.
Dset_72 has 72 protein
sequences and
PDBset_164 consists of
164 protein sequences.
Thus, we have 422
different annotated
protein sequences.

23
Performance Metrics:
Accuracy = Correct Predictions / Total Predictions
Precision = TruePositive / (TruePositive +
FalsePositive)
Recall = TruePositive / (TruePositive +
FalseNegative)
F-Measure = (2 * Precision * Recall) / (Precision +
Recall)
MCC = TP*TN-FP*FN /
((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))^0.5

24
REFERENCES
[1]Zhou, J. et al. Graph neural networks: A review of methods and applications. AI Open 1, 57-81
(2020).
[2]Zhou, H,;Wang, W.; Jin, J,; zheng, z,; Zhou,B. Graph Neural Network for Protein-Protein
Interaction Prediction: A comparative Study. Molecules 2022, 27, 6135.
[3]Xiaotian Hu, Cong Feng, Tianyi Ling and Ming Chena ,Deep learning frameworks for protein–
protein interaction prediction. Comput Struct Biotechnol J. 2022; 20: 3223–3233. doi:
10.1016/j.csbj.2022.06.025.
[4]Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. arXiv
preprint arXiv:1609.02907 (2016).
[5]Hashemifar, S., Neyshabur, B., Khan, A. A. & Xu, J. Predicting protein–protein interactions
through sequence-based deep learning. Bioinformatics 34, i802–i810 (2018)
[6]Yang F., Fan K., Song D., Lin H. Graph-based prediction of Protein-protein interactions with
attributed signed graph embedding. BMC Bioinf. 2020;21:1–16. doi: 10.1186/s12859-020-03646-
8.
[7]Uversky V.N., Oldfield C.J., Dunker A.K. Intrinsically disordered proteins in human diseases:
Introducing the D 2 concept. Annu Rev Biophys. 2008;37:215–246. doi:
10.1146/annurev.biophys.37.032807.125924.

25
REFERENCES
[8]Du X., Sun S., Hu C., Yao Y
., Yan Y
., Zhang Y
. DeepPPI: Boosting Prediction of Protein-Protein
Interactions with Deep Neural Networks. J Chem Inf Model. 2017;57:1499–1510. doi:
10.1021/acs.jcim.7b00028.
[9]Licata L., Briganti L., Peluso D., Perfetto L., Iannuccelli M., Galeota E., et al. MINT, the molecular
interaction database: 2012 Update. Nucleic Acids Res. 2012;40:D572–D574. doi:
10.1093/nar/gkr930.
[10]Bock J.R., Gough D.A. Predicting protein–protein interactions from primary structure.
Bioinformatics. 2001;17:455–460.
[11]Ding Z., Kihara D. Computational methods for predicting protein-protein interactions using
various protein features. Curr Protoc Protein Sci. 2018;93:e62.
[12]Rabbani G., Baig M.H., Ahmad K., Choi I. Protein-protein Interactions and their Role in Various
Diseases and their Prediction Techniques. Curr Protein Pept Sci. 2017;19:948–957. doi:
10.2174/1389203718666170828122927.
[13]Shen, J. et al. Predicting protein–protein interactions based only on sequences information.
Proc. Natl. Acad. Sci. 104, 4337–4341 (2007).
[14]Uzair, M. & Jamil, N. Effects of hidden layers on the efficiency of neural networks. In 2020 IEEE
23rd International Multitopic Conference (INMIC), 1–6 (IEEE, 2020).

upload.pdf

Recommended

Recommended

More Related Content

Similar to upload.pdf

Similar to upload.pdf (20)

Recently uploaded

Recently uploaded (20)

upload.pdf