This document describes a novel approach called DEMOO-SLA for predicting protein-protein interactions. DEMOO-SLA uses differential evolution multi-objective optimization combined with stochastic learning automata. It extracts amino acid features from protein sequences using BLOSUM62 and reduces the feature dimensions. It then constructs a protein-protein interaction network and predicts interactions based on neighboring topology, functional characteristics, and accessible solvent area reduction. The paper compares DEMOO-SLA to existing methods on DIP and SCOP datasets, finding it achieves a higher symmetric substructure score and lower edge correctness, indicating better performance.
A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct... sipij
A new computational approach to protein sequence similarity analysis and functional classification is described; it is faster and easier than the conventional method. The technique uses Discrete Wavelet Transform decomposition followed by sequence correlation analysis, and it can also identify the functional class of a newly obtained protein sequence. Classification was performed on a sample set of 270 protein sequences obtained from organisms of diverse origins and functional classes, giving a classification accuracy of 94.81%. The accuracy and reliability of the technique are verified by comparing the results with those obtained from NCBI.
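The decomposition-then-correlation idea can be sketched as follows. This is a minimal illustration and not the authors' implementation: the Haar wavelet, the Kyte-Doolittle hydropathy encoding, the number of levels, and all function names are assumptions, since the abstract does not say which wavelet or numeric encoding was used.

```python
import math

# Assumption: residues are mapped to Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def encode(seq):
    """Turn an amino acid string into a numeric signal."""
    return [KD[r] for r in seq]

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:                      # pad odd-length signals
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_approximation(signal, levels):
    """Keep only the approximation coefficients after `levels` steps."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, _detail = haar_step(approx)
    return approx

def pearson(x, y):
    """Pearson correlation over the common prefix of two signals."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def similarity(seq_a, seq_b, levels=2):
    """Correlate the low-frequency wavelet coefficients of two sequences."""
    return pearson(haar_approximation(encode(seq_a), levels),
                   haar_approximation(encode(seq_b), levels))
```

A query sequence could then be assigned to the functional class whose reference sequences give the highest `similarity`, although the abstract does not spell out the actual decision rule.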
IRJET - A Framework for Predicting Drug Effectiveness in Human Body IRJET Journal
This document proposes a machine learning framework for predicting drug-target interactions. It extracts features from drug molecules using FP2 fingerprints and from protein sequences using PsePSSM. It then uses Lasso dimensionality reduction to select important features before balancing the data using SMOTE. Finally, it trains an SVM classifier on the processed data to predict drug-target interactions. The framework achieved better performance than traditional methods by leveraging machine learning techniques for efficient and effective prediction of interactions without costly experiments.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Genetic disease identification and medical diagnosis using MF, CC, BF, MicroR... KarthigaRavichandran3
The document describes a proposed system to integrate genomic and proteomic data to identify disease-associated genes. It uses cross ontology to classify gene functions and calculate protein values. Regulatory modules between transcription factors, genes, and miRNAs are identified using an integration technique. A multiplicative update algorithm solves the optimization function between these regulatory modules. Finally, a Bayesian rose tree represents the identified genetic diseases, their symptoms, and cures.
This document summarizes a framework for automatically extracting human protein-protein interaction data from biomedical literature. It describes benchmarking interaction datasets based on shared functional annotations and known physical interactions. It also outlines a method using a conditional random field tagger to identify protein names in text and two approaches for extracting interactions: co-citation analysis and learning interaction extractors from annotated sentences. Evaluation shows the extracted interactions have accuracy comparable to manually curated databases.
A comparative study of covariance selection models for the inference of gene ... Roberto Anglani
This study compares three methods for estimating gene regulatory networks from gene expression data: 1) a pseudoinverse method (PINV) that estimates the precision matrix using the Moore-Penrose pseudoinverse of the sample covariance matrix, 2) a regularized least squares method (RCM) that estimates partial correlations from regression residuals, and 3) a regularized log-likelihood method (ℓ2C) that maximizes a penalized log-likelihood function to estimate the precision matrix. Extensive simulations show that the ℓ2C method has the most predictive partial correlations and highest sensitivity for inferring conditional dependencies. Application to real datasets provides biological insights into gene pathways in Arabidopsis and human cells.
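For reference, the standard relation between a precision matrix and partial correlations, which all three estimators ultimately target, can be written as below. The symbols are assumptions introduced here for illustration: S is the sample covariance matrix, S^{+} its Moore-Penrose pseudoinverse, and \hat{\theta}_{ij} the entries of the estimated precision matrix.

```latex
\hat{\Theta} = S^{+}, \qquad
\rho_{ij \mid \mathrm{rest}} = -\,\frac{\hat{\theta}_{ij}}{\sqrt{\hat{\theta}_{ii}\,\hat{\theta}_{jj}}}
```

A zero partial correlation between genes i and j corresponds to conditional independence under a Gaussian model, which is why a sparse precision matrix can be read as a network.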
Protein Secondary Structure Prediction using HMM Abhishek Dabral
This document describes using hidden Markov models to predict protein secondary structure from amino acid sequences. It discusses:
1) Performing statistical correlation analysis to identify significant correlations between amino acid pairs in different secondary structure types. This found correlations between positions in alpha helices, beta strands, and coils.
2) Building a semi-Markov hidden Markov model to represent secondary structure assignments as segments defined by type and endpoint. The model considers correlations at segment borders and within segments.
3) Training the model by iteratively predicting secondary structure, removing inaccurate predictions from the training set, and re-estimating model parameters.
4) Evaluating prediction accuracy using Q3 scores, which measure the percentage of residues correctly
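Q3 is conventionally the percentage of residues whose predicted three-state class (helix, strand, or coil) matches the observed one. A minimal sketch, assuming both assignments are given as equal-length strings over the letters H, E, and C (the function name is hypothetical):

```python
def q3_score(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("assignments must have the same length")
    if not observed:
        raise ValueError("assignments must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# One mismatch in eight residues.
print(q3_score("HHHEECCC", "HHHEEECC"))  # 87.5
```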
Application of three graph Laplacian based semisupervised learning methods to... ijbbjournal
This document discusses applying three graph Laplacian based semi-supervised learning methods (un-normalized, symmetric normalized, and random walk) to predict protein functions using integrated networks from multiple sources. It provides detailed descriptions of the random walk and symmetric normalized graph Laplacian algorithms. Experimental results on yeast protein data show the un-normalized and symmetric normalized methods perform slightly better than the random walk method, and all three methods perform better on the integrated network than individual networks.
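The three graph Laplacians named above have standard textbook definitions, reproduced here for orientation. The notation is an assumption of this note: W is the symmetric similarity (adjacency) matrix of the protein network and D is its diagonal degree matrix.

```latex
D_{ii} = \sum_{j} W_{ij}, \qquad
L = D - W, \qquad
L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}, \qquad
L_{\mathrm{rw}} = I - D^{-1} W
```

Here L is the un-normalized Laplacian, L_sym the symmetric normalized one, and L_rw the random walk one.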
HMM’S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSIS ijcseit
HMMs have found application in almost every field. Applying HMMs to biological sequences has its own
advantages. HMMs, being more systematic and specific, yield better results than consensus techniques.
Profile HMMs use position specific scoring for the matching & substitution of a residue and for the
opening or extension of a gap. HMMs apply a statistical method to estimate the true frequency of a residue
at a given position in the alignment from its observed frequency while standard profiles use the observed
frequency itself to assign the score for that residue. This means that a profile HMM derived from only 10 to
20 aligned sequences can be of equivalent quality to a standard profile created from 40 to 50 aligned
sequences.
A protein can be represented by an amino acid interaction network. This network is a graph whose vertices are
the protein's amino acids and whose edges are the interactions between them. In this paper we
formalize amino acid interaction network prediction as a multi-objective evolutionary optimization
problem. This formalism is biologically plausible because interactions among amino acids depend
not only on a single factor such as atomic distance but also on other factors such as torsion angle,
hydrophobicity, and hydrophilicity. The problem is then solved using a multi-objective genetic algorithm
and subsequently optimized using an ant colony optimization technique. The results show that our algorithm
performs better than recent amino acid interaction network prediction algorithms that are based on a single
factor.
This document summarizes a research paper that proposes a new technique called Protein Tertiary Structure Prediction using Genetic Algorithm (PTSPGA) to predict the tertiary structure of proteins based on their primary amino acid sequences. The technique uses a genetic algorithm approach to find protein conformations with the lowest free energy, as evaluated by the Empirical Conformational Energy Program for Peptides (ECEPP/3) force field model. The proposed genetic algorithm was tested on Met-enkephalin and other proteins, and experimental results found it to be reliable and accurate at predicting protein tertiary structures computationally from sequence alone.
Algorithm for Predicting Compound Protein Interaction Using Tanimoto Similari... TELKOMNIKA JOURNAL
This research aimed to develop a method for predicting interactions between chemical compounds contained in herbs and proteins related to a particular disease. The method is based on the binary local models algorithm, with the protein similarity section omitted. The Klekota-Roth fingerprint is used to represent the compounds. During development, three similarity functions were compared: Tanimoto, Cosine, and Dice. Youden’s index was used to find the optimum threshold value. The results showed that the Tanimoto similarity function yielded higher similarity values and a higher AUC value than the other two functions, and the optimum threshold value obtained was 0.65. Therefore, the Tanimoto similarity function and a threshold of 0.65 were selected for the prediction method. The average evaluation accuracy of the developed algorithm is only about 50%. The authors attribute the low accuracy to the method relying on compound similarity alone, without protein similarity.
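The three similarity functions and the threshold selection can be sketched as follows. This is an illustrative sketch rather than the paper's code: fingerprints are assumed to be sets of "on" bit indices, and the function names and toy data are hypothetical.

```python
import math

def tanimoto(a, b):
    """|A and B| / |A or B| for two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def dice(a, b):
    """2 |A and B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def cosine(a, b):
    """|A and B| / sqrt(|A| |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0

def youden_threshold(scores, labels, candidates):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Assumes labels contain at least one positive (1) and one negative (0).
    Ties keep the first candidate that reaches the best J.
    """
    best_t, best_j = None, -2.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

fp_a, fp_b = {1, 2, 3, 4}, {3, 4, 5, 6}
print(tanimoto(fp_a, fp_b), dice(fp_a, fp_b), cosine(fp_a, fp_b))
```

In practice the candidate thresholds would be swept over the observed similarity scores; the toy values in the sketch have no connection to the 0.65 reported in the paper.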
ENHANCED POPULATION BASED ANT COLONY FOR THE 3D HYDROPHOBIC POLAR PROTEIN STR... ijbbjournal
The population-based Ant Colony algorithm is a stochastic local search algorithm that mimics the behavior of
real ants, simulating pheromone trails to search for solutions to combinatorial optimization problems. This
paper applies the population-based Ant Colony algorithm to the 3D Hydrophobic Polar Protein Structure
Prediction Problem, then introduces an enhanced variant
called the Enhanced Population-based Ant Colony algorithm (EP-ACO), which avoids the stagnation problem of the
population-based Ant Colony algorithm and increases exploration of the search space to escape from local
optima. The experiments show that our approach gives more efficient results than the state-of-the-art method.
This document presents a method for protein function prediction that integrates different data sources, including protein sequence similarity, protein-protein interaction data, and gene expression data. The authors use a weighted k-nearest neighbors algorithm to calculate likelihood scores for different protein-function pairs based on integrated scores from the different data sources. Their results show that integrating multiple data sources improves prediction accuracy over using individual sources alone, and that different data sources are better predictors for different types of protein functions.
The document discusses the field of proteomics, which is the large-scale study of proteins, including their functions and structures. It defines proteomics and describes several areas within it, such as functional proteomics, expressional proteomics, and structural proteomics. It outlines typical proteomics experiments and some key methods used, including two-dimensional electrophoresis, mass spectrometry, and protein-protein interaction prediction methods like phylogenetic profiling.
This document discusses protein structure determination using bioinformatics tools. It describes that proteins are made of amino acids and have four levels of structure: primary, secondary, tertiary, and quaternary. Tertiary structure prediction methods include de novo modeling and comparative modeling. Quaternary structure prediction identifies interacting protein pairs using phylogenetic analysis, homologous interactions, and structural pattern identification. Bioinformatics tools for structure prediction apply algorithms and techniques from computer science like neural networks and approximation algorithms.
This document describes a study that uses machine learning algorithms to efficiently predict DNA-binding proteins. Support vector machines and cascade correlation neural networks are optimized and compared to determine the best performing model. The SVM model achieves 86.7% accuracy at predicting DNA-binding proteins using features like overall charge, patch size, and amino acid composition of proteins. The CCNN model achieves lower accuracy of 75.4%. The study aims to improve on previous work by using the standard jack-knife validation technique to evaluate model performance on unseen data.
1) The document analyzes a system of differential equations to model the dynamics of messenger RNA (mRNA) concentration and protein concentration over time.
2) The stationary or experimental solution was obtained by solving the differential equations numerically using MATLAB. This represents the steady state concentrations after a certain time.
3) Computationally, it was shown that there are no periodic solutions, meaning the concentrations do not oscillate over time, according to the Poincare-Bendixson theorem. The solution asymptotically approaches the stationary solution.
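The summary does not give the equations themselves, so the following sketch uses a common textbook stand-in: linear production and degradation of mRNA and protein, integrated with forward Euler. The model form, parameter names, and values are all assumptions, and the original work used MATLAB rather than Python.

```python
def simulate(alpha, beta, delta_m, delta_p, m0=0.0, p0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of an assumed two-variable model:

        dm/dt = alpha - delta_m * m      (mRNA: transcription minus decay)
        dp/dt = beta * m - delta_p * p   (protein: translation minus decay)
    """
    m, p = m0, p0
    for _ in range(steps):
        dm = alpha - delta_m * m
        dp = beta * m - delta_p * p
        m += dt * dm
        p += dt * dp
    return m, p

# Analytical steady state of the assumed model: m* = alpha / delta_m,
# p* = beta * m* / delta_p.
m_end, p_end = simulate(alpha=2.0, beta=1.0, delta_m=1.0, delta_p=0.5)
print(m_end, p_end)  # approaches m* = 2 and p* = 4
```

For this assumed linear model the trajectory decays monotonically toward the stationary point, which matches the qualitative behavior described above (no oscillation, asymptotic approach to the stationary solution).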
Criterion based Two Dimensional Protein Folding Using Extended GA IJCSEIT Journal
In the dynamic field of biological and protein research, protein fold recognition for long-pattern
protein sequences has been a great challenge for many years. With that consideration, this paper contributes
to the protein folding research field and presents a novel procedure for mapping a protein
structure to its correct 2D fold with a concrete model that uses swarm intelligence. Moreover, the model
incorporates an Extended Genetic Algorithm (EGA) with a concealed Markov model (CMM) to effectively
fold protein sequences that have long chain lengths. The protein sequences are preprocessed,
classified, and then analyzed against criteria such as fitness, similarity, and sequence gaps
for optimal formation of protein structures. Fitness correlation is evaluated to determine the
bonding strength of molecules and thereby supports efficient fold recognition. Experimental results
show that the proposed method is more adept at 2D protein folding and outperforms the existing
algorithms.
Bacterial virulence proteins, which are classified by the structure of their virulence, cause
several diseases. For instance, adhesins play an important role in the host cells. They are
inserted DNA sequences for a variety of virulence properties. Several important methods
have been developed for predicting bacterial virulence proteins in order to find new drugs or vaccines.
In this study, we propose a feature selection method for the classification of bacterial
virulence proteins. The features are constituted directly from the amino acid sequence of a given
protein. Amino acids form proteins, which are critical to life and have many important
functions in living cells. Their different physicochemical properties are each represented by a vector of
20 numerical values, collected in the AAIndex database of 544 known indices.
The approach has two steps. First, the amino acid sequence of a given protein
is analysed with Lyapunov exponents to test whether it has a chaotic structure in accordance with
chaos theory. Then, if the results show characterization over the complete distribution in
the phase space from the point of view of a deterministic system, the related protein is taken to show a
chaotic structure.
Empirical results revealed that the generated feature vectors, based on the chaotic
structure of the physicochemical features of amino acids, give the best performance on the adhesin and non-adhesin data sets.
The Chaotic Structure of Bacterial Virulence Protein Sequences csandit
This document discusses analyzing the chaotic structure of bacterial virulence protein sequences using their amino acid physicochemical properties. It proposes a method involving two main steps: 1) Analyzing the amino acid sequence of a given protein using Lyapunov exponents to determine if it exhibits chaotic behavior according to chaos theory. 2) If the results characterize the complete distribution in phase space like a deterministic system, the related protein is considered to have a chaotic structure. The method is tested on adhesin and non-adhesin protein datasets. Results show that physicochemical feature vectors generated from the chaotic structure analysis perform best for classification, supporting the hypothesis that bacterial virulence protein sequences have chaotic structures derived from the physicochemical properties of their constituent amino acids.
A novel optimized deep learning method for protein-protein prediction in bioi... IJECEIAES
Proteins have been shown to perform critical activities in cellular processes and are required for the organism's existence and proliferation. On complicated protein-protein interaction (PPI) networks, conventional centrality approaches perform poorly. Machine learning algorithms based on enormous amounts of data do not make use of the temporal and spatial dimensions of biological information. As a result, we developed a sequence-dependent PPI prediction model using an Aquila and shark noses-based hybrid prediction technique. This model operates in two stages: feature extraction and prediction. The features are acquired using the semantic similarity technique. The acquired features are then used to predict PPIs with hybrid deep networks that combine long short-term memory (LSTM) networks and restricted Boltzmann machines (RBMs). The weighting parameters of these neural networks (NNs) were tuned using a novel optimization approach, a hybrid of aquila and shark noses (ASN), and the results revealed that our proposed ASN-based PPI prediction is more accurate and efficient than other existing techniques.
Stable Drug Designing by Minimizing Drug Protein Interaction Energy Using PSO csandit
1. The document proposes using a particle swarm optimization (PSO) algorithm to design stable drug molecules that minimize interaction energy with target proteins.
2. In the algorithm, drugs are represented as variable-length trees containing functional groups, and PSO is used to optimize van der Waals and electrostatic interaction energies.
3. Results show that PSO performs better than previous fixed-length tree methods at designing drugs that stably bind to active sites of human rhinovirus, malaria, and HIV proteins.
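The optimizer at the core of this approach can be sketched with a generic particle swarm on a toy objective. This is not the paper's method: it uses fixed-length real vectors instead of variable-length trees of functional groups, and a simple sphere function stands in for the van der Waals and electrostatic energy terms. All names and parameter values are assumptions.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=30, iterations=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_val.__getitem__)
    global_best, global_val = personal_best[g][:], personal_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to own best + pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + c_social * r2 * (global_best[d] - positions[i][d])
                )
                # Move and clamp to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = objective(positions[i])
            if val < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], val
                if val < global_val:
                    global_best, global_val = positions[i][:], val
    return global_best, global_val

def toy_energy(x):
    """Stand-in objective (sphere function); not a real interaction energy."""
    return sum(v * v for v in x)

best, energy = pso_minimize(toy_energy, dim=2, bounds=(-5.0, 5.0))
```

Replacing `toy_energy` with a real drug-protein interaction energy, and the vector encoding with the tree representation described above, is where the actual difficulty of the method lies.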
The document describes the complementarity plot (CP), a validation tool for protein structures based on packing and electrostatics of buried residues. The CP plots surface complementarity against electrostatic complementarity for buried residues. The document outlines how local and global scores are designed based on the CP to detect various errors, such as incorrect side chain orientations, diffuse main chain errors, and imbalanced charges. Validation results show the CP is effective at discriminating obsolete structures from updated ones and identifying other errors. Applications of the CP in protein modeling and design are also demonstrated.
MULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues csandit
Predicting and identifying functional details from protein sequences is an urgent task, given the growing number and diversity of protein sequences. Here, we develop a novel approach for identifying conserved residues and motifs of ligand-binding proteins. In this method, called MuLiSA (Multiple Ligand-bound Structure Alignment), we first superimpose the ligands of ligand-binding proteins, so that the residues of the ligand-binding sites are naturally aligned. We identify important residues and patterns based on the z-scores of the residue entropy and residue-segment entropy. After identifying new pattern candidates, profiles of the patterns are generated to predict protein function from protein sequences alone. We tested our approach on ATP-binding proteins and HEM-binding proteins. The experiments show that MuLiSA can identify conserved residues and novel patterns that are correlated with the protein functions of certain ligand-binding proteins. We found that MuLiSA can identify conservation patterns and performs better than traditional alignments such as CE and CLUSTALW for some ligand-binding proteins. We believe that MuLiSA is useful for discovering ligand-binding specificity-determining residues and functionally important patterns of proteins.
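The entropy-based scoring can be illustrated with a small sketch. It covers only per-residue (column) entropy and its z-score over an already aligned set of equal-length site sequences; the ligand superposition step and the residue-segment entropy are not reproduced, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues observed in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_z_scores(aligned):
    """Z-score of each column's entropy; low values mean high conservation."""
    columns = list(zip(*aligned))
    entropies = [column_entropy(col) for col in columns]
    mean = sum(entropies) / len(entropies)
    std = math.sqrt(sum((e - mean) ** 2 for e in entropies) / len(entropies))
    if std == 0:
        return [0.0] * len(entropies)
    return [(e - mean) / std for e in entropies]

# Toy alignment of four binding-site fragments (made-up data).
site_fragments = ["GKST", "GAST", "GKTT", "GRST"]
z = entropy_z_scores(site_fragments)
```

Columns with strongly negative z-scores (here the first and last, which are fully conserved) would be the candidates for conserved residues.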
Jiang Y., Xu W., Thompson L.P., Gutell R., and Miranker D. (2011).
R-PASS: A Fast Structure-based RNA Sequence Alignment Algorithm.
Proceedings of 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2011), Atlanta, GA. November 12-15, 2011. IEEE Computer Society, Washington, DC, USA. pp. 618-622.
Criterion based Two Dimensional Protein Folding Using Extended GA IJCSEIT Journal
In the dynamic field of biological and protein research, protein fold recognition for long protein sequences has been a great challenge for many years. With that in mind, this paper contributes to the protein folding research field and presents a novel procedure for mapping a protein structure to its correct 2D fold with a concrete model using swarm intelligence. Moreover, the model incorporates an Extended Genetic Algorithm (EGA) with a concealed Markov model (CMM) to fold protein sequences with long chain lengths effectively. The protein sequences are preprocessed, classified, and then analyzed with criteria such as fitness, similarity, and sequence gaps for optimal formation of protein structures. Fitness correlation is evaluated to determine the bonding strength of molecules, which contributes to efficient fold recognition. Experimental results have shown that the proposed method is more adept at 2D protein folding and outperforms the existing algorithms.
Bacterial virulence proteins, which are classified by the structure of their virulence, cause several diseases. For instance, adhesins play an important role in the host cells. They are inserted DNA sequences for a variety of virulence properties. Several important methods have been developed for the prediction of bacterial virulence proteins in order to find new drugs or vaccines. In this study, we propose a feature selection method for the classification of bacterial virulence proteins. The features are constituted directly from the amino acid sequence of a given protein. Amino acids form proteins, which are critical to life and have many important functions in living cells. Their differing physicochemical properties are each represented by a vector of 20 numerical values, collected in the AAIndex database of 544 known indices.
The approach has two steps. First, the amino acid sequence of a given protein is analysed with Lyapunov exponents to test whether it has a chaotic structure in accordance with chaos theory. After that, if the results show characterization over the complete distribution in the phase space from the point of view of a deterministic system, the related protein is taken to show a chaotic structure.
Empirical results revealed that the generated feature vectors give the best performance with the chaotic structure of the physicochemical features of amino acids on the adhesin and non-adhesin data sets.
The Chaotic Structure of Bacterial Virulence Protein Sequencescsandit
This document discusses analyzing the chaotic structure of bacterial virulence protein sequences using their amino acid physicochemical properties. It proposes a method involving two main steps: 1) Analyzing the amino acid sequence of a given protein using Lyapunov exponents to determine if it exhibits chaotic behavior according to chaos theory. 2) If the results characterize the complete distribution in phase space like a deterministic system, the related protein is considered to have a chaotic structure. The method is tested on adhesin and non-adhesin protein datasets. Results show that physicochemical feature vectors generated from the chaotic structure analysis perform best for classification, supporting the hypothesis that bacterial virulence protein sequences have chaotic structures derived from the physicochemical properties of their constituent amino acids
A novel optimized deep learning method for protein-protein prediction in bioi...IJECEIAES
Proteins have been shown to perform critical activities in cellular processes and are required for the organism's existence and proliferation. On complicated protein-protein interaction (PPI) networks, conventional centrality approaches perform poorly. Machine learning algorithms based on enormous amounts of data do not make use of biological information's temporal and spatial dimensions. As a result, we developed a sequence- dependent PPI prediction model using an Aquila and shark noses-based hybrid prediction technique. This model operates in two stages: feature extraction and prediction. The features are acquired using the semantic similarity technique for good results. The acquired features are utilized to predict the PPI using hybrid deep networks long short-term memory (LSTM) networks and restricted Boltzmann machines (RBMs). The weighting parameters of these neural networks (NNs) were changed using a novel optimization approach hybrid of aquila and shark noses (ASN), and the results revealed that our proposed ASN-based PPI prediction is more accurate and efficient than other existing techniques.
Stable Drug Designing by Minimizing Drug Protein Interaction Energy Using PSO csandit
1. The document proposes using a particle swarm optimization (PSO) algorithm to design stable drug molecules that minimize interaction energy with target proteins.
2. In the algorithm, drugs are represented as variable-length trees containing functional groups, and PSO is used to optimize van der Waals and electrostatic interaction energies.
3. Results show that PSO performs better than previous fixed-length tree methods at designing drugs that stably bind to active sites of human rhinovirus, malaria, and HIV proteins.
The document describes the complementarity plot (CP), a validation tool for protein structures based on packing and electrostatics of buried residues. The CP plots surface complementarity against electrostatic complementarity for buried residues. The document outlines how local and global scores are designed based on the CP to detect various errors, such as incorrect side chain orientations, diffuse main chain errors, and imbalanced charges. Validation results show the CP is effective at discriminating obsolete structures from updated ones and identifying other errors. Applications of the CP in protein modeling and design are also demonstrated.
MULISA : A New Strategy for Discovery of Protein Functional Motifs and Residuescsandit
To predict and identify details regarding function from protein sequences is an emergency task since the growing number and diversity of protein sequence. Here, we develop a novel approach for identifying conservation residues and motifs of ligand-binding proteins. In this method, called MuLiSA (Multiple Ligand-bound Structure Alignment), we first superimpose the ligands of ligand-binding proteins and then the residues of ligand-binding sites are naturally aligned. We identify important residues and patterns based on the z-scores of the residue entropy and residue-segment entropy. After identifying new pattern candidates, the profiles of patterns are generated to predict the protein function from only protein sequences. We tested our approach on ATP-binding proteins and HEM-binding proteins. The experiments show that MuLiSA can identify the conservation residues and novel patterns which are really correlated with protein functions of certain ligand-binding proteins. We found that our MuLiSA can identify conservation patterns and is better than traditional alignments such as CE and CLUSTALW in some ligand-binding proteins. We believe that our MuLiSA is useful to discover ligand-binding specificity-determining residues and functional important patterns of proteins.
Jiang Y., Xu W., Thompson L.P., Gutell R., and Miranker D. (2011).
R-PASS: A Fast Structure-based RNA Sequence Alignment Algorithm.
Proceedings of 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2011), Atlanta, GA. November 12-15, 2011. IEEE Computer Society, Washington, DC, USA. pp. 618-622.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
ANTIC-2021_paper_95.pdf
A Novel approach of Differential Evolution Multi-Objective Optimization with Stochastic Learning Automata Algorithm to predict Protein Interactions
P. Lakshmi 1, Dr. D. Ramyachitra 2*
1 Ph.D. Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore, Tamilnadu. visalaks@gmail
2 Assistant Professor, Department of Computer Science, Bharathiar University, Coimbatore, Tamilnadu. jaichitra1@yahoo.co.in
Abstract - Research on protein interaction prediction is growing, and various methods and technologies have been proposed in bioinformatics. Using the BLOSUM62 block substitution matrix, the mutation rates of amino acids and their features are extracted. For global alignment, protein interactions in the PPIN are predicted with the Incremental Depth Extension (INDEX) algorithm. However, this approach obtained a minimum symmetric substructure score and maximum edge correctness. Hence, a novel algorithm combining DEMOO and SLA is proposed to address the problem of protein interaction prediction. Three objective functions are considered when predicting an interaction between two proteins: the interactions of the proteins with their common neighbors, the functional similarity between the proteins, and the ratio of the proteins' accessible solvent area relative to the protein complex. The experimental results show that the technique provides better accuracy based on edge correctness and symmetric substructure score in the protein interaction network.
Keywords: Protein-Protein Interaction Network, Differential Evolution Multi-Objective Optimization with Stochastic Learning
Automata, INDEX algorithm, Blosum62.
1. Introduction
Chowdhury, A., et al. (2016) noted that various computational methods have been devised to predict protein-protein interactions from characteristics such as protein pairs and their complex formation, lower-complexity problem-solving methods, and similar functions and domains. Interaction information gathered from the PPIN in the STRING database helps to identify one part of lung cancer; the progression of dysfunctional genes at two levels is captured by a random walk with restart algorithm, which is used to predict the protein interactions (Yuan, F., & Lu, W., 2017). Protein-protein interactions identified from protein hubs have been used to describe novel drug targets, with the CD-HIT tool and a clustering method grouping the orthologous proteins (Uddin, R., & Jamil, F., 2018).
Zhang, C., et al. (2018) presented a novel method built on Meta Gene Ontology that describes protein interaction information from protein attributes, using annotation from homology-based structure prediction and network mapping of the protein interaction network. Protein interaction prediction is also used to find protein function: SMISS predicts protein function from homolog information fetched from PSI-BLAST, the PPIN, and gene interaction networks (Cao et al., 2016). Chemical features of the proteins can be extracted with a DNN model, and the DL-CPI method was proposed to predict protein interactions on either balanced or imbalanced datasets (Tian, K., et al., 2016). Various computational biology techniques have been applied to predict host-pathogen PPIs between MRSA and humans based on the characteristics of homologs and their interacting protein partners, which helps to identify potential drug targets using data collected from DIP with BLAST (Uddin, R., et al., 2017). Wei, Z. S., et al. (2016) proposed predicting protein interaction sites by classification with an ensemble of SVM and SSWRF, using lower-dimensional feature representations such as evolutionary conservation, hydrophobic property, and hydrophilic property derived from the target residue and its relations. To overcome the limitations of the existing methods, a new algorithm is proposed using DEMOO and SLA. The remainder of the paper is structured as follows: Section 2 presents the methodology of the proposed algorithms for protein-protein interaction networks, Section 3 presents the performance results of the proposed methods, and Section 4 gives the conclusion of the research and further enhancements.
2. Methodology
This section discusses the overall architecture of the proposed system and presents the flowchart and advantages of the proposed algorithm.
System Architecture
Fig -1 A Framework of DEMOO-SLA method for Protein-Protein Interaction Prediction
2.1 Feature Extraction
Amino acid mutation-rate features are extracted with the BLOSUM62 matrix: each protein sequence is mapped to an N x 20 block substitution matrix, which is then transferred into an HP (high-dimensional protein) matrix by a low-complexity transformation described as,
In the above equation, P = p1, p2, p3, …, pN denotes the amino acid sequence of N residues and B(i,j) denotes the 20 x 20 BLOSUM62 matrix. In the given equation, all sequences of the protein are taken as the size of the protein feature coefficients. Hence, each pair of proteins has a total number of feature coefficients.
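The per-residue lookup described above can be sketched as follows. This is a minimal illustration, not the authors' transformation: the equation for the HP matrix did not survive in the text, so only the first step (one substitution-score row per residue) is shown. The 4-letter alphabet, the `TOY_MATRIX` values, and the function name `encode_sequence` are hypothetical placeholders; a real run would use the full 20 x 20 BLOSUM62 matrix.

```python
# Placeholder substitution table over a 4-letter alphabet (assumption).
# A real run would load the full 20 x 20 BLOSUM62 matrix instead.
TOY_MATRIX = {
    "A": [4, -1, -2, -2],
    "R": [-1, 5, 0, -2],
    "N": [-2, 0, 6, 1],
    "D": [-2, -2, 1, 6],
}


def encode_sequence(sequence):
    """Return one substitution-score row per residue (an N x 4 matrix here)."""
    return [list(TOY_MATRIX[residue]) for residue in sequence]


features = encode_sequence("ARNDA")
print(len(features), "rows x", len(features[0]), "columns")
```

With the full BLOSUM62 table each row would have 20 entries, giving the N x 20 matrix the text refers to.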
A two-dimensional linear discriminant analysis (2DLDA) method is adopted to reduce the number of feature dimensions, concentrating the energy and reducing noise. Consider 800 protein pairs with high-dimensional features $HP_I$, where $I = 1, 2, \dots, N$ denotes a pair of proteins. The 2DLDA approach is designed with two mappings, $L \in \mathbb{R}^{r \times p}$ and $R \in \mathbb{R}^{c \times q}$, which map the original high-dimensional space $HP_I \in \mathbb{R}^{r \times c}$ into the lower-dimensional space $BP_I \in \mathbb{R}^{p \times q}$. The mapping is defined as:
To solve the optimal linear mapping problem for $L$ and $R$, the within-class matrix $F_w$ and the inter-class matrix $F_b$ are used. The optimum $L$ and $R$ are obtained by minimizing $F_w$ and maximizing $F_b$, using the formula given below,
where the mean of the $I$-th class is $m_I = \frac{1}{n_I} \sum_{x \in \Pi_I} x$ and the global mean is $m = \frac{1}{n} \sum_{I=1}^{k} \sum_{x \in \Pi_I} x$.
The values of Equations (3) and (4) are passed to the iterative algorithm. After a number of iterations, the high-dimensional protein-pair features HP are reduced to low-dimensional protein-pair features with the values r, c.
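The dimension reduction step can be illustrated with a small sketch. It assumes the usual 2DLDA bilinear projection, in which an r x c matrix is mapped to a p x q matrix as the transpose of L times HP times R; this form is inferred from the stated dimensions of L, R, HP and BP, since the mapping equation itself is missing from the text. The helper names and the example matrices are hypothetical, and learning L and R from the scatter matrices is not shown.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


def project(hp, left, right):
    """Map an r x c matrix to p x q as left^T * hp * right (assumed form)."""
    return matmul(matmul(transpose(left), hp), right)


# r = 4, c = 3 feature matrix for one protein pair (made-up numbers).
hp = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
left = [[1, 0], [0, 1], [0, 0], [0, 0]]  # r x p = 4 x 2, keeps the first two rows
right = [[1], [1], [1]]                  # c x q = 3 x 1, sums each row
bp = project(hp, left, right)
print(bp)
```

Here the 4 x 3 input is reduced to a 2 x 1 output, matching the p x q shape the text gives for the reduced features.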
2.2 Label Propagation Algorithm
In the graph-based approach, the physicochemical properties of the protein sequences are used to predict protein interactions with the Network Fusion Similarity method and the label propagation algorithm (LPA). Amino acid features and their mutation rates are extracted with the BLOSUM62 matrix, which maps the protein sequences into a block substitution matrix. Hydrophobicity and the amino acid mutation rate act as the protein sequence features.
2.3 Incremental depth extension (INDEX) approach
This algorithm performs global alignment in the PPIN in multiple stages. The initial alignment depends on a score-matching strategy, for which the biological and topological scores of the proteins are computed. Proteins are aligned using these measures, and the proteins with high scores are selected. New alignments are expanded from them until the final alignment is reached. A new method, DEMOO-SLA, is proposed for protein interaction prediction in the PPIN. The protein interaction network is represented by a solution vector that combines the weights, in [0,1], of the interacting protein pairs, and protein connections are established based on a threshold. In the proposed approach, edge correctness decreases and the symmetric substructure score increases.
2.4 Formation of a PPI Network
A PPIN with P proteins has at most P x (P-1)/2 possible interactions. To track them, the network is represented by a vector $\vec{V}$ with dimensions 1 x D where,
The $m$-th position of $\vec{V}$ is defined as $V_m \in (0,1)$, where m = 1, 2, …, D-1, and $w_{i,j}$ denotes the weight computed between the interacting proteins $p_i$ and $p_j$,
where i = 1, 2, …, P-1 and j = i+1, i+2, …, P.
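One way to picture the vector representation is the sketch below. It assumes D = P x (P-1)/2 and a row-by-row ordering of the pairs (i, j) with i < j; the exact index mapping is not recoverable from the text, so `pair_index` is an illustrative choice, as are the zero-based indices and the example weights.

```python
def vector_length(num_proteins):
    """Number of unordered protein pairs, P * (P - 1) / 2."""
    return num_proteins * (num_proteins - 1) // 2


def pair_index(i, j, num_proteins):
    """Position of pair (i, j), i < j, in the flattened weight vector.

    Zero-based, row-by-row ordering (an assumed convention).
    """
    if not 0 <= i < j < num_proteins:
        raise ValueError("expected 0 <= i < j < num_proteins")
    return i * num_proteins - i * (i + 1) // 2 + (j - i - 1)


def edges_from_vector(weights, num_proteins, threshold):
    """Keep the pairs whose weight exceeds the threshold."""
    return [(i, j)
            for i in range(num_proteins)
            for j in range(i + 1, num_proteins)
            if weights[pair_index(i, j, num_proteins)] > threshold]


weights = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7]  # made-up weights for P = 4
print(edges_from_vector(weights, 4, 0.5))
```

Encoding the network as a flat vector of weights is what lets a population-based optimizer such as differential evolution treat a whole candidate network as a single solution.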
2.5 Neighborhood Topology with Protein interaction prediction
A pair of proteins is more likely to interact when the two proteins share a large common neighborhood. For a candidate pair $p_i$ and $p_j$, the size of the common neighborhood is determined by identifying each protein $p_l$ in the PPIN that interacts with both.
The interaction weight $w_{i,j}$ between proteins $p_i$ and $p_j$ is derived from their common neighbors in the PPIN. Given the relevant weights, interacting proteins are identified according to a particular threshold Th.
In the PPIN, $p_l$ is a common neighbor of $p_i$ and $p_j$ when $w_{i,l}$ (or $w_{l,i}$) > Th and $w_{j,l}$ (or $w_{l,j}$) > Th. Let $n_{i,j}$ denote the set of all proteins $p_l$ that satisfy both conditions. The accuracy of a predicted interaction between $p_i$ and $p_j$ in the PPIN is judged by comparing the interaction weight $w_{i,j}$ with the common-neighborhood ratio $|n_{i,j}|/N$. This requirement is met by maximizing the equation given below,
In the above equation, $\varepsilon$ represents a small positive constant. By assigning the equation given below, protein interaction weights may be predicted accurately in a network.
2.6 Functional Characteristics of Protein Interaction Prediction
Interacting proteins tend to share molecular functions and to be located in the same cellular compartments, and they take part in similar functions and biological processes. The functional similarity of two interacting proteins $p_i$ and $p_j$ is maximized in the PPIN with the equation given below,
2.7 Predicting PPIs using ASA
The reduction in accessible solvent area (ASA) when proteins $p_i$ and $p_j$ interact, and the strength of the interaction, are computed from the binding as given below,
In the above equation, $ASA(p_i)$ and $ASA(p_{i\_j})$ denote the accessible solvent area of protein $p_i$ and of the protein complex formed between $p_i$ and $p_j$. The similarity to be maximized when predicting the interaction of a protein pair after ASA binding is computed by the equation shown below,
This objective estimates the predicted interactions of protein pairs. Maximizing $J_3$ ensures that predicted interactions between $p_i$ and $p_j$ with weights $w_{i,j}$ correspond to a high reduction in ASA when the individual proteins bind into the complex.
2.8 Differential Evolution algorithm for Multi-Objective Optimization
(a) Initialization: Let the first population $P_t$ of DEMOO be initialized with NP vectors of dimension D, as given below,
For generation t = 0, each vector $\vec{V}_i(0)$, i = [1, NP], is initialized randomly in the search area. The crossover rate CR is initialized in [0, 1]. The value of the $k$-th objective function, $J_k(\vec{V}_i(0))$, is computed for each vector $\vec{V}_i(0)$, where k = 1, …, K and i = 1, …, NP.
(b) Mutation: A donor vector $\vec{D}_i(t)$ is created for each related target vector $\vec{V}_i(t)$, where i = 1, …, NP, according to the DE/rand/1 mutation scheme.
In the above equation, $\vec{V}_{r1}(t)$, $\vec{V}_{r2}(t)$ and $\vec{V}_{r3}(t)$ are random solutions drawn from $P_t$, the scaling factor lies within [0, 2], and $i \neq r1 \neq r2 \neq r3$.
(c) Crossover: A trial vector $\vec{U}_i(t)$ is produced by binomial crossover between each pair of the donor vector $\vec{D}_i(t)$ and the target vector $\vec{V}_i(t)$, as represented by the equation shown below,
for $j = [1, D]$, where $j_{rand} \in [1, D]$ is a randomly selected index.
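The mutation and crossover steps can be sketched with the textbook forms of DE/rand/1 and binomial crossover. This is a sketch under stated assumptions rather than the authors' exact equations, which are missing from the text. Clipping each component to [0, 1] is an added assumption so that entries stay valid interaction weights, and the function names and parameter values are illustrative.

```python
import random


def mutate_rand_1(population, target_index, scale, rng):
    """DE/rand/1 donor: V_r1 + scale * (V_r2 - V_r3), with distinct r1, r2, r3 != i."""
    candidates = [k for k in range(len(population)) if k != target_index]
    r1, r2, r3 = rng.sample(candidates, 3)
    donor = [a + scale * (b - c)
             for a, b, c in zip(population[r1], population[r2], population[r3])]
    # Assumption: clip to [0, 1] so that each entry remains a valid weight.
    return [min(1.0, max(0.0, value)) for value in donor]


def binomial_crossover(target, donor, crossover_rate, rng):
    """Take the donor component with probability CR, and always at index j_rand."""
    j_rand = rng.randrange(len(target))
    return [donor[j] if (rng.random() < crossover_rate or j == j_rand) else target[j]
            for j in range(len(target))]


rng = random.Random(0)
population = [[rng.random() for _ in range(6)] for _ in range(5)]  # NP = 5, D = 6
donor = mutate_rand_1(population, 0, 0.5, rng)
trial = binomial_crossover(population[0], donor, 0.9, rng)
print(trial)
```

Forcing one component from the donor through `j_rand` guarantees that the trial vector differs from its target in at least one position whenever the donor does.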
2.9 Stochastic Learning Automata (SLA)
SLA is a form of reinforcement learning in which a learning agent adapts its behavior according to the responses it receives from the environment at each step. Let $S = \{s_1, s_2, \dots, s_m\}$ be the set of m states of the environment, and let $A = \{a_1, a_2, \dots, a_n\}$ be the set of n actions from which the agent selects at each state $s_i \in S$.
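A learning automaton of this kind is often implemented as a table of action probabilities per state. The text does not state which update rule DEMOO-SLA uses, so the sketch below substitutes the common linear reward-inaction scheme purely for illustration; the learning rate, the function names, and the uniform starting probabilities are all assumptions.

```python
import random


def choose_action(probabilities, rng):
    """Sample an action index according to the current action probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def reward_update(probabilities, chosen, learning_rate):
    """Linear reward-inaction: shift probability mass toward a rewarded action.

    On a penalty the probabilities are left unchanged, so only the reward
    case needs an update function.
    """
    return [p + learning_rate * (1.0 - p) if action == chosen
            else (1.0 - learning_rate) * p
            for action, p in enumerate(probabilities)]


rng = random.Random(0)
probabilities = [0.25, 0.25, 0.25, 0.25]  # n = 4 actions at one state
action = choose_action(probabilities, rng)
updated = reward_update(probabilities, action, 0.1)
print(action, updated)
```

The update keeps the probabilities summing to one, so repeated rewards gradually concentrate the choice on actions that performed well.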
Pseudo code for DEMOO-SLA
3. Results And Discussion
Performance measures of the proposed method are compared with the existing methods in terms of Edge
Correctness and Symmetric substructure score.
3.1 Performance measures
The comparison is made in terms of the performance metrics referred to as the Edge Correctness and Symmetric
substructure score that is defined in the following subsections.
3.1.1 Edge Correctness (EC)
Edge correctness is the percentage of edges in the first network that the alignment maps onto edges of the second network. An edge between nodes u and v of the first network is counted when the corresponding nodes g(u) and g(v) are connected to one another in the second network. Edge correctness is calculated using the formula shown below,
3.1.2 Symmetric substructure score ($S^3$)
The symmetric substructure score is another measure for evaluating topological alignment. EC penalizes only the unaligned edges in $G_1$, whereas $S^3$ penalizes the unaligned edges in both $G_1$ and $G_2$: the edges of the subgraph of $G_2$ induced by the images of the $V_1$ nodes count as penalties. $S^3$ is computed using the formula shown below,
In the above equation, $|f(E_1)|$ denotes the number of aligned edges and $G_2[f(V_1)]$ indicates the induced subgraph of $G_2$ corresponding to the $V_1$ nodes.
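Both alignment measures can be computed directly from an edge list and a node mapping. The formulas did not survive in the text, so the sketch below assumes the definitions commonly used in the network alignment literature: EC as aligned edges divided by the edges of the first network, and the symmetric substructure score as aligned edges divided by the edges of the first network plus the edges of the induced subgraph minus the aligned edges. The function names and the toy graphs are illustrative, the mapping is assumed to be one-to-one, and the scores are returned as fractions rather than percentages.

```python
def aligned_edges(edges1, edges2, mapping):
    """Edges of G1 whose images under the mapping are also edges of G2."""
    edges2_set = {frozenset(edge) for edge in edges2}
    images = {frozenset((mapping[u], mapping[v])) for u, v in edges1}
    return images & edges2_set


def edge_correctness(edges1, edges2, mapping):
    """Fraction of G1 edges conserved by the alignment (assumed definition)."""
    return len(aligned_edges(edges1, edges2, mapping)) / len(edges1)


def symmetric_substructure_score(edges1, edges2, mapping):
    """Aligned edges over |E1| + |E(G2[f(V1)])| - aligned (assumed definition)."""
    aligned = len(aligned_edges(edges1, edges2, mapping))
    image_nodes = set(mapping.values())
    induced = sum(1 for u, v in edges2 if u in image_nodes and v in image_nodes)
    return aligned / (len(edges1) + induced - aligned)


edges1 = [("a", "b"), ("b", "c")]    # first network G1
edges2 = [(1, 2), (1, 3), (3, 4)]    # second network G2
mapping = {"a": 1, "b": 2, "c": 3}   # alignment of G1 nodes onto G2 nodes
print(edge_correctness(edges1, edges2, mapping))
print(symmetric_substructure_score(edges1, edges2, mapping))
```

In this toy case only one of the two edges of the first network is conserved, and the extra edge (1, 3) inside the induced subgraph lowers the symmetric score further, which is the penalty the text describes.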
3.2 Performance comparison of existing and proposed methods for DIP and SCOP datasets
Here the proposed approach is compared with the existing approaches. The results show that the proposed method performs better than the existing approaches on the DIP and SCOP datasets. The tables and graphs below compare the performance measures for the PPI datasets.
Table-1 Performance comparison of existing and proposed methods for DIP dataset
Dataset   Algorithm    Edge correctness (EC)   Symmetric substructure score (3S)
DIP       LPA          89                      76
DIP       INDEX        82                      83
DIP       DEMOO-SLA    72                      91
Fig-2 Performance comparison with Edge correctness and Symmetric Substructures using DIP Dataset
The above figure compares the proposed approach with the existing methods in terms of edge correctness (EC) and symmetric substructure score (3S). EC and 3S are represented on the X-axis. The bar chart shows that the proposed approach provides a high symmetric substructure score and low edge correctness.
Table-2 Performance comparison of existing and proposed methods for SCOP dataset
Dataset   Algorithm    Edge correctness (EC)   Symmetric substructure score (3S)
SCOP      LPA          90                      68
SCOP      INDEX        81                      79
SCOP      DEMOO-SLA    70                      89
Fig-3 Performance comparison of existing and proposed methods for SCOP dataset
The figure compares the proposed approach with the existing methods in terms of edge correctness (EC) and symmetric substructure score (3S). EC and 3S are represented on the X-axis. The bar chart shows that the symmetric substructure score increases and edge correctness decreases for the proposed approach when compared with the existing methods. A further comparison is made in terms of the accuracy, sensitivity, specificity, and F1-score performance measures that are defined in the following subsections.
3.2.1 Sensitivity
Sensitivity or recall represents the percentage of positive values that are correctly identified and computed using
the formula given below,
3.2.2 Specificity
Specificity is the ratio of true negatives to all actual negatives, that is, the proportion of negative cases that are correctly identified as negative, as shown in the equation given below,
3.2.3 F1-score
The F-measure is the harmonic mean of the recall and precision measures, as shown below,
In the above equation, precision is defined as $Precision = \frac{TruePositive}{TruePositive + FalsePositive}$.
3.2.4 Accuracy
The overall accuracy rate of the classification is calculated by using the formula as follows,
Fig – 4 Comparison of performance measures with existing and proposed methods
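The four measures in Sections 3.2.1 to 3.2.4 follow their standard confusion-matrix definitions, which can be sketched as below. The counts in the example are hypothetical and are not taken from the paper's experiments; the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    sensitivity = tp / (tp + fn)   # recall: share of actual positives found
    specificity = tn / (tn + fp)   # share of actual negatives found
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    f1_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1_score": f1_score,
        "accuracy": accuracy,
    }


# Hypothetical counts, used only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```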
The results show that the proposed algorithm performs better than the existing methods on the PPI datasets. The
tables and graphs above present the comparison of performance measures for the existing and proposed algorithms on
these datasets. Across the experiments, the proposed method was found to perform better than the existing approaches.
4. Conclusion
This paper presents a novel algorithm that formulates protein interaction prediction as a Multi-Objective
Optimization (MOO) problem. Local filtering and global optimization search are carried out using multi-objective
optimization and Stochastic Learning Automata. The experimental results show that the proposed approach provides
better results in terms of edge correctness and symmetric substructure score. In the future, this work will be
extended to PPI prediction on unbalanced, small-sample datasets. In addition, new methods could be devised to deal
with the problem of imbalanced data and classes. Seeking discriminative features would also help to predict the sites
of proteins in Protein-Protein Interaction Networks.
References
[1] Chowdhury, A., Rakshit, P., & Konar, A. (2016). Protein-protein interaction network prediction using stochastic learning
automata-induced differential evolution. Applied Soft Computing, 49, 699-724.
[2] Feng, Z. J., Xu, S. C., Liu, N., Zhang, G. W., Hu, Q. Z., & Gong, Y. M. (2018). Soybean TCP transcription factors:
Evolution, classification, protein interaction and stress, and hormone responsiveness. Plant Physiology and Biochemistry.
[3] Du, T., Liao, L., Wu, C. H., & Sun, B. (2016). Prediction of residue-residue contact matrix for protein-protein interaction
with Fisher score features and deep learning. Methods, 110, 97-105.
[4] Tian, K., Shao, M., Wang, Y., Guan, J., & Zhou, S. (2016). Boosting compound-protein interaction prediction by deep
learning. Methods, 110, 64-72.
[5] Wei, Z. S., Han, K., Yang, J. Y., Shen, H. B., & Yu, D. J. (2016). Protein-protein interaction site prediction by ensembling
SVM and sample-weighted random forests. Neurocomputing, 193, 201-212.
[6] Cao, R., & Cheng, J. (2016). Integrated protein function prediction by mining function associations, sequences, and
protein-protein and gene-gene interaction networks. Methods, 93, 84-91.
[7] Uddin, R., & Jamil, F. (2018). Prioritization of potential drug targets against P. aeruginosa by core proteomic analysis
using computational subtractive genomics and protein-Protein interaction network. Computational Biology and
Chemistry.
[8] Lai, J. K., Ambia, J., Wang, Y., & Barth, P. (2017). Enhancing Structure Prediction and Design of Soluble and Membrane
Proteins with Explicit Solvent-Protein Interactions. Structure, 25(11), 1758-1770.
[9] Zhang, C., Zheng, W., Freddolino, P. L., & Zhang, Y. (2018). MetaGO: Predicting Gene Ontology of non-homologous
proteins through low-resolution protein structure prediction and protein-protein network mapping. Journal of molecular
biology.
[10] Uddin, R., Tariq, S. S., Azam, S. S., Wadood, A., & Moin, S. T. (2017). Identification of Histone Deacetylase (HDAC)
as a drug target against MRSA via interlock method of protein-protein interaction prediction. European Journal of
Pharmaceutical Sciences, 106, 198-211.
[11] Uddin, R., Tariq, S. S., Azam, S. S., Wadood, A., & Moin, S. T. (2017). Identification of Histone Deacetylase (HDAC)
as a drug target against MRSA via interlock method of protein-protein interaction prediction. European Journal of
Pharmaceutical Sciences, 106, 198-211.