Proteins have been shown to perform critical activities in cellular processes and are required for the organism's existence and proliferation. On complicated protein-protein interaction (PPI) networks, conventional centrality approaches perform poorly. Machine learning algorithms based on enormous amounts of data do not make use of biological information's temporal and spatial dimensions. As a result, we developed a sequence- dependent PPI prediction model using an Aquila and shark noses-based hybrid prediction technique. This model operates in two stages: feature extraction and prediction. The features are acquired using the semantic similarity technique for good results. The acquired features are utilized to predict the PPI using hybrid deep networks long short-term memory (LSTM) networks and restricted Boltzmann machines (RBMs). The weighting parameters of these neural networks (NNs) were changed using a novel optimization approach hybrid of aquila and shark noses (ASN), and the results revealed that our proposed ASN-based PPI prediction is more accurate and efficient than other existing techniques.
Novel modelling of clustering for enhanced classification performance on gene...IJECEIAES
Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expressiondata. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches.
Deep learning optimization for drug-target interaction prediction in COVID-19...IJECEIAES
The exponentially increasing bioinformatics data raised a new problem: the computation time length. The amount of data that needs to be processed is not matched by an increase in hardware performance, so it burdens researchers on computation time, especially on drug-target interaction prediction, where the computational complexity is exponential. One of the focuses of high-performance computing research is the utilization of the graphics processing unit (GPU) to perform multiple computations in parallel. This study aims to see how well the GPU performs when used for deep learning problems to predict drug-target interactions. This study used the gold-standard data in drug-target interaction (DTI) and the coronavirus disease (COVID-19) dataset. The stages of this research are data acquisition, data preprocessing, model building, hyperparameter tuning, performance evaluation and COVID-19 dataset testing. The results of this study indicate that the use of GPU in deep learning models can speed up the training process by 100 times. In addition, the hyperparameter tuning process is also greatly helped by the presence of the GPU because it can make the process up to 55 times faster. When tested using the COVID-19 dataset, the model showed good performance with 76% accuracy, 74% F-measure and a speed-up value of 179.
Design and development of learning model for compression and processing of d...IJECEIAES
This document describes a deep learning model for efficient compression and processing of genomic DNA sequence data. It proposes using computational optimization techniques across three stages of genomic data compression: extraction, storage, and retrieval of data. The model uses gene network embedding to combine gene interaction and expression data to learn lower dimensional representations. It predicts gene functions, reconstructs gene ontologies, and predicts gene interactions. Evaluation shows the model achieves an average precision score of 0.820 for gene interaction prediction on test data.
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET Journal
This document presents a methodology for analyzing gene mutation data using ontologies and association rule mining. It aims to develop a common knowledge base for genomic and proteomic analysis by integrating multiple data sources. The methodology involves using k-nearest neighbors algorithm to find similar genes, an iterative multiplicative updating algorithm to solve optimization problems, and SNCoNMF to identify co-regulatory modules between genes, microRNAs and transcription factors. The results are represented using a Bayesian rose tree for efficient visualization of associations between genetic components and diseases.
ABSTRACT
Genome-wide transcription profiling is a powerful technique in studying disease susceptible footprints. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. Past researches prove that neuro diseases damage the brain network interaction, protein- protein interaction and gene-gene interaction. A number of neurological research paper also analyze the relationship among damaged part. Analysis of gene-gene interaction network drawn by using state-of-the-art gene database of Alzheimer’s patient can conclude a lot of information. In this paper we used gene dataset affected with Alzheimer’s disease and normal patient’s dataset from NCBI databank. After proper processing the .CEL affymetrix data using RMA, we use the processed data to find gene interaction outputs. Then we filter the output files using probe set filtering attributes p-value and fold count and draw a gene-gene interaction network. Then we analyze the interaction network using GeneMania software.
Genome-wide transcription profiling is a powerful technique in studying disease susceptible footprints. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. Past researches prove that neuro diseases damage the brain network interaction, protein- protein interaction and gene-gene interaction. A number of neurological research paper also analyze the relationship among damaged part. Analysis of gene-gene interaction network drawn by using state-of-the-art gene database of Alzheimer’s patient can conclude a lot of information. In this paper we used gene dataset affected with Alzheimer’s disease and normal patient’s dataset from NCBI databank. After proper processing the .CEL affymetrix data using RMA, we use the processed data to find gene interaction outputs. Then we filter the output files using probe set filtering attributes p-value and fold count and draw a gene-gene interaction network. Then we analyze the interaction network using GeneMania software.
DSAGLSTM-DTA: Prediction of Drug-Target Affinity using Dual Self-Attention an...mlaij
The research on affinity between drugs and targets (DTA) aims to effectively narrow the target search space for drug repurposing. Therefore, reasonable prediction of drug and target affinities can minimize the waste of resources such as human and material resources. In this work, a novel graph-based model called DSAGLSTM-DTA was proposed for DTA prediction. The proposed model is unlike previous graph-based drug-target affinity model, which incorporated self-attention mechanisms in the feature extraction process of drug molecular graphs to fully extract its effective feature representations. The features of each atom in the 2D molecular graph were weighted based on attention score before being aggregated as molecule representation and two distinct pooling architectures, namely centralized and distributed architectures were implemented and compared on benchmark datasets. In addition, in the course of processing protein sequences, inspired by the approach of protein feature extraction in GDGRU-DTA, we continue to interpret protein sequences as time series and extract their features using Bidirectional Long Short-Term Memory (BiLSTM) networks, since the context-dependence of long amino acid sequences. Similarly, DSAGLSTM-DTA also utilized a self-attention mechanism in the process of protein feature extraction to obtain comprehensive representations of proteins, in which the final hidden states for element in the batch were weighted with the each unit output of LSTM, and the results were represented as the final feature of proteins. Eventually, representations of drug and protein were concatenated and fed into prediction block for final prediction. The proposed model was evaluated on different regression datasets and binary classification datasets, and the results demonstrated that DSAGLSTM-DTA was superior to some state-ofthe-art DTA models and exhibited good generalization ability.
Novel modelling of clustering for enhanced classification performance on gene...IJECEIAES
Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expressiondata. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches.
Deep learning optimization for drug-target interaction prediction in COVID-19...IJECEIAES
The exponentially increasing bioinformatics data raised a new problem: the computation time length. The amount of data that needs to be processed is not matched by an increase in hardware performance, so it burdens researchers on computation time, especially on drug-target interaction prediction, where the computational complexity is exponential. One of the focuses of high-performance computing research is the utilization of the graphics processing unit (GPU) to perform multiple computations in parallel. This study aims to see how well the GPU performs when used for deep learning problems to predict drug-target interactions. This study used the gold-standard data in drug-target interaction (DTI) and the coronavirus disease (COVID-19) dataset. The stages of this research are data acquisition, data preprocessing, model building, hyperparameter tuning, performance evaluation and COVID-19 dataset testing. The results of this study indicate that the use of GPU in deep learning models can speed up the training process by 100 times. In addition, the hyperparameter tuning process is also greatly helped by the presence of the GPU because it can make the process up to 55 times faster. When tested using the COVID-19 dataset, the model showed good performance with 76% accuracy, 74% F-measure and a speed-up value of 179.
Design and development of learning model for compression and processing of d...IJECEIAES
This document describes a deep learning model for efficient compression and processing of genomic DNA sequence data. It proposes using computational optimization techniques across three stages of genomic data compression: extraction, storage, and retrieval of data. The model uses gene network embedding to combine gene interaction and expression data to learn lower dimensional representations. It predicts gene functions, reconstructs gene ontologies, and predicts gene interactions. Evaluation shows the model achieves an average precision score of 0.820 for gene interaction prediction on test data.
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET Journal
This document presents a methodology for analyzing gene mutation data using ontologies and association rule mining. It aims to develop a common knowledge base for genomic and proteomic analysis by integrating multiple data sources. The methodology involves using k-nearest neighbors algorithm to find similar genes, an iterative multiplicative updating algorithm to solve optimization problems, and SNCoNMF to identify co-regulatory modules between genes, microRNAs and transcription factors. The results are represented using a Bayesian rose tree for efficient visualization of associations between genetic components and diseases.
ABSTRACT
Genome-wide transcription profiling is a powerful technique in studying disease susceptible footprints. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. Past researches prove that neuro diseases damage the brain network interaction, protein- protein interaction and gene-gene interaction. A number of neurological research paper also analyze the relationship among damaged part. Analysis of gene-gene interaction network drawn by using state-of-the-art gene database of Alzheimer’s patient can conclude a lot of information. In this paper we used gene dataset affected with Alzheimer’s disease and normal patient’s dataset from NCBI databank. After proper processing the .CEL affymetrix data using RMA, we use the processed data to find gene interaction outputs. Then we filter the output files using probe set filtering attributes p-value and fold count and draw a gene-gene interaction network. Then we analyze the interaction network using GeneMania software.
Genome-wide transcription profiling is a powerful technique in studying disease susceptible footprints. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. Past researches prove that neuro diseases damage the brain network interaction, protein- protein interaction and gene-gene interaction. A number of neurological research paper also analyze the relationship among damaged part. Analysis of gene-gene interaction network drawn by using state-of-the-art gene database of Alzheimer’s patient can conclude a lot of information. In this paper we used gene dataset affected with Alzheimer’s disease and normal patient’s dataset from NCBI databank. After proper processing the .CEL affymetrix data using RMA, we use the processed data to find gene interaction outputs. Then we filter the output files using probe set filtering attributes p-value and fold count and draw a gene-gene interaction network. Then we analyze the interaction network using GeneMania software.
DSAGLSTM-DTA: Prediction of Drug-Target Affinity using Dual Self-Attention an...mlaij
The research on affinity between drugs and targets (DTA) aims to effectively narrow the target search space for drug repurposing. Therefore, reasonable prediction of drug and target affinities can minimize the waste of resources such as human and material resources. In this work, a novel graph-based model called DSAGLSTM-DTA was proposed for DTA prediction. The proposed model is unlike previous graph-based drug-target affinity model, which incorporated self-attention mechanisms in the feature extraction process of drug molecular graphs to fully extract its effective feature representations. The features of each atom in the 2D molecular graph were weighted based on attention score before being aggregated as molecule representation and two distinct pooling architectures, namely centralized and distributed architectures were implemented and compared on benchmark datasets. In addition, in the course of processing protein sequences, inspired by the approach of protein feature extraction in GDGRU-DTA, we continue to interpret protein sequences as time series and extract their features using Bidirectional Long Short-Term Memory (BiLSTM) networks, since the context-dependence of long amino acid sequences. Similarly, DSAGLSTM-DTA also utilized a self-attention mechanism in the process of protein feature extraction to obtain comprehensive representations of proteins, in which the final hidden states for element in the batch were weighted with the each unit output of LSTM, and the results were represented as the final feature of proteins. Eventually, representations of drug and protein were concatenated and fed into prediction block for final prediction. The proposed model was evaluated on different regression datasets and binary classification datasets, and the results demonstrated that DSAGLSTM-DTA was superior to some state-ofthe-art DTA models and exhibited good generalization ability.
DSAGLSTM-DTA: PREDICTION OF DRUG-TARGET AFFINITY USING DUAL SELF-ATTENTION AN...mlaij
The research on affinity between drugs and targets (DTA) aims to effectively narrow the target search
space for drug repurposing. Therefore, reasonable prediction of drug and target affinities can minimize the
waste of resources such as human and material resources. In this work, a novel graph-based model called
DSAGLSTM-DTA was proposed for DTA prediction. The proposed model is unlike previous graph-based
drug-target affinity model, which incorporated self-attention mechanisms in the feature extraction process
of drug molecular graphs to fully extract its effective feature representations. The features of each atom in
the 2D molecular graph were weighted based on attention score before being aggregated as molecule
representation and two distinct pooling architectures, namely centralized and distributed architectures
were implemented and compared on benchmark datasets. In addition, in the course of processing protein
sequences, inspired by the approach of protein feature extraction in GDGRU-DTA, we continue to
interpret protein sequences as time series and extract their features using Bidirectional Long Short-Term
Memory (BiLSTM) networks, since the context-dependence of long amino acid sequences. Similarly,
DSAGLSTM-DTA also utilized a self-attention mechanism in the process of protein feature extraction to
obtain comprehensive representations of proteins, in which the final hidden states for element in the batch
were weighted with the each unit output of LSTM, and the results were represented as the final feature of
proteins. Eventually, representations of drug and protein were concatenated and fed into prediction block
for final prediction. The proposed model was evaluated on different regression datasets and binary
classification datasets, and the results demonstrated that DSAGLSTM-DTA was superior to some state-ofthe-art DTA models and exhibited good generalization ability.
This research article proposes a deep transfer learning technique to classify plant diseases in rice images. The researchers conducted a literature review of methods for detecting rice plant diseases from the past decade. They analyzed different classifiers and strategies. The article then proposes a convolutional neural network model for detecting rice diseases. Finally, the existing approaches are compared based on their accuracy.
Efficiency of Prediction Algorithms for Mining Biological DatabasesIOSR Journals
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
Multiple Features Based Two-stage Hybrid Classifier Ensembles for Subcellular...CSCJournals
Subcellular localization is a key functional characteristic of proteins. As an interesting ``bio-image informatics\'\' application, an automatic, reliable and efficient prediction system for protein subcellular localization can be used for establishing knowledge of the spatial distribution of proteins within living cells and permits to screen systems for drug discovery or for early diagnosis of a disease. In this paper, we propose a two-stage multiple classifier system to improve classification reliability by introducing rejection option. The system is built as a cascade of two classifier ensembles. The first ensemble consists of set of binary SVMs which generalizes to learn a general classification rule and the second ensemble, which also include three distinct classifiers, focus on the exceptions rejected by the rule. A new way to induce diversity for the classifier ensembles is proposed by designing classifiers that are based on descriptions of different feature patterns. In addition to the Subcellular Location Features (SLF) generally adopted in earlier researches, three well-known texture feature descriptions have been applied to cell phenotype images, which are the local binary patterns (LBP), Gabor filtering and Gray Level Coocurrence Matrix (GLCM). The different texture feature sets can provide sufficient diversity among base classifiers, which is known as a necessary condition for improvement in ensemble performance. Using the public benchmark 2D HeLa cell images, a high classification accuracy 96% is obtained with rejection rate $21\\%$ from the proposed system by taking advantages of the complementary strengths of feature construction and majority-voting based classifiers\' decision fusions.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da...IOSRjournaljce
The main purpose of data mining is to extract knowledge from large amount of data. Artificial Neural network (ANN) has already been applied in a variety of domains with remarkable success. This paper presents the application of hybrid model for stroke disease that integrates Genetic algorithm and back propagation algorithm. Selecting a good subset of features, without sacrificing accuracy, is of great importance for neural networks to be successfully applied to the area. In addition the hybrid model that leads to further improvised categorization, accuracy compared to the result produced by genetic algorithm alone. In this study, a new hybrid model of Neural Networks and Genetic Algorithm (GA) to initialize and optimize the connection weights of ANN so as to improve the performance of the ANN and the same has been applied in a medical problem of predicting stroke disease for verification of the results.
This document describes a study that uses machine learning algorithms to efficiently predict DNA-binding proteins. Support vector machines and cascade correlation neural networks are optimized and compared to determine the best performing model. The SVM model achieves 86.7% accuracy at predicting DNA-binding proteins using features like overall charge, patch size, and amino acid composition of proteins. The CCNN model achieves lower accuracy of 75.4%. The study aims to improve on previous work by using the standard jack-knife validation technique to evaluate model performance on unseen data.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...IJECEIAES
In many diseases classification an accurate gene analysis is needed, for which selection of most informative genes is very important and it require a technique of decision in complex context of ambiguity. The traditional methods include for selecting most significant gene includes some of the statistical analysis namely 2-Sample-T-test (2STT), Entropy, Signal to Noise Ratio (SNR). This paper evaluates gene selection and classification on the basis of accurate gene selection using structured complex decision technique (SCDT) and classifies it using fuzzy cluster based nearest neighborclassifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated for leave one out cross validation metric(LOOCV) along with sensitivity, specificity, precision and F1-score with four different classifiers namely 1) Radial Basis Function (RBF), 2) Multi-layer perception(MLP), 3) Feed Forward(FF) and 4) Support vector machine(SVM) for three different datasets of DLBCL, Leukemia and Prostate tumor. The proposed SCDT &FC-NNC exhibits superior result for being considered more accurate decision mechanism.
Delineation of techniques to implement on the enhanced proposed model using d...ijdms
In post genomic era with the advent of new technologies a huge amount of complex molecular data are
generated with high throughput. The management of this biological data is definitely a challenging task
due to complexity and heterogeneity of data for discovering new knowledge. Issues like managing noisy
and incomplete data are needed to be dealt with. Use of data mining in biological domain has made its
inventory success. Discovering new knowledge from the biological data is a major challenge in data
mining technique. The novelty of the proposed model is its combined use of intelligent techniques to classify
the protein sequence faster and efficiently. Use of FFT, fuzzy classifier, String weighted algorithm, gram
encoding method, neural network model and rough set classifier in a single model and in an appropriate
place can enhance the quality of the classification system .Thus the primary challenge is to identify and
classify the large protein sequences in a very fast and easy but intellectual way to decrease the time
complexity and space complexity.
ieee projects download, base paper for ieee projects, ieee projects list, ieee projects titles, ieee projects for cse, ieee projects on networking,ieee projects 2012, ieee projects 2013, final year project, computer science final year projects, final year projects for information technology, ieee final year projects, final year students projects, students projects in java, students projects download, students projects in java with source code, students projects architecture, free ieee papers
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...CSCJournals
Logistic Regression (LR) is a well known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems. Logistic regression enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal non-linear inference models and have gained extensive popularity in recent years. Research activities are considerable and literature is growing. The goal of this research work is to compare the performance of Logistic Regression and Neural Network models on publicly available medical datasets. The evaluation process of the model is as follows. The logistic regression and neural network methods with sensitivity analysis have been evaluated for the effectiveness of the classification. The Classification Accuracy is used to measure the performance of both the models. From the experimental results it is confirmed that the neural network model with sensitivity analysis model gives more efficient result.
Poster presented at the Thirty-Second International Joint Conference on Artificial Intelligence, 2023, Macao, SAR. https://doi.org/10.24963/ijcai.2023/554
Guava fruit disease identification based on improved convolutional neural net...IJECEIAES
Guava fruit cultivation is crucial for Asian economic development, with Indonesia producing 449,970 metric tons between 2022 and 2023. However, technology-based approaches can detect disease symptoms, enhancing production and mitigating economic losses by enhancing quality. In this paper, we introduce an accurate guava fruit disease detection (GFDI) system. It contains the generation of appropriate diseased images and the development of a novel improved convolutional neural network (improvedCNN) that is built depending on the principles of AlexNet. Also, several preprocessing techniques have been used, including data augmentation, contrast enhancement, image resizing, and dataset splitting. The proposed improved-CNN model is trained to identify three common guava fruit diseases using a dataset of 612 images. The experimental findings indicate that the proposed improved-CNN model achieve accuracy 98% for trains and 93% for tests using 0.001 learning rate, the model parameters are decreased by 50,106,831 compared with traditional AlexNet model. The findings of the investigation indicate that the deep learning model improves the accuracy and convergence rate for guava fruit disease prevention.
Classification of pathologies on digital chest radiographs using machine lear...IJECEIAES
This article is devoted to the research and development of methods for classifying pathologies on digital chest radiographs using two different machine learning approaches: the eXtreme gradient boosting (XGBoost) algorithm and the deep convolutional neural network residual network (ResNet50). The goal of the study is to develop effective and accurate methods for automatically classifying various pathologies detected on chest X-rays. The study collected an extensive dataset of digital chest radiographs, including a variety of clinical cases and different classes of pathology. Developed and trained machine learning models based on the XGBoost algorithm and the ResNet50 convolutional neural network using preprocessed images. The performance and accuracy of both models were assessed on test data using quality metrics and a comparative analysis of the results was carried out. The expected results of the article are high accuracy and reliability of methods for classifying pathologies on chest radiographs, as well as an understanding of their effectiveness in the context of clinical practice. These results may have significant implications for improving the diagnosis and care of patients with chest diseases, as well as promoting the development of automated decision support systems in radiology.
Network embedding in biomedical data scienceArindam Ghosh
Excerpts from the paper:
What is it?
Network embedding aims at converting the network into a low-dimensional space while structural information of the network is preserved.
In this way, nodes and/or edges of the network can be represented as compacted yet informative vectors in the embedding space.
Advantages:
Typical non-network-based machine learning methods such as linear regression, Support Vector Machine (SVM) and decision forest, which have been demonstrated to be effective and efficient as the state-of-the-art techniques, can be applied to such vectors.
Current status:
Efforts of applying network embedding to improve biomedical data analysis are already planned or underway.
Difficulties:
The biomedical networks are sparse, noisy, incomplete, heterogeneous and usually consist of biomedical text and other domain knowledge. It makes embedding tasks more complicated than other application fields.
This document describes two challenges presented as part of the DREAM initiative to evaluate methods for parameter estimation and network topology inference from experimental data. In the first challenge, participants were given the topology of a 9-gene network and asked to estimate 45 kinetic parameters. In the second challenge, participants were given an incomplete 11-gene network and asked to identify 3 missing links and associated parameters. Participants could purchase simulated experimental data using a credit system, allowing iterative experimental design. While parameter estimation was accomplished well using fluorescence data, topology inference was more difficult. Aggregating submissions produced better solutions than individual methods.
Praharshit Sharma is a bioinformatician currently working as a project fellow at CSIR-Centre for Cellular and Molecular Biology in Hyderabad, India studying microRNA-mRNA interactions in neurodegenerative diseases. He has work experience in bioinformatics industrial training and research internships. Praharshit has an MSc in Bioinformatics and PG Diploma in Bioinformatics. He has skills in Linux, programming languages, bioinformatics tools and databases. Praharshit has publications and presentations in bioinformatics and has done statistical analysis projects correlating genetic code, protein structure and crystallographic data.
introduction,history scope and applications of
relation to other fields , bioinformatics,biological databases,computers internet,sequence development, and
introduction to sequence development and alignment
A brief study on rice diseases recognition and image classification: fusion d...IJECEIAES
In the regions of southern Andhra Pradesh, rice brown spot, rice blast, and rice sheath blight have emerged as the most prevalent diseases. The goal of this research is to increase the precision and effectiveness of disease diagnosis by proposing a framework for the automated recognition and classification of rice diseases. Therefore, this work proposes a hybrid approach with multiple stages. Initially, the region of interest (ROI) is extracted from the dataset and test images. Then, the multiple features are extracted, such as color-moment-based features, grey-level cooccurrence matrix (GLCM)-based texture, and shape features. Then, the S-particle swarm optimization (SPSO) model selects the best features from the extracted features. Moreover, the deep belief network (DBN) model trained by SPSO is based on optimal features, which classify the different types of rice diseases. The SPSO algorithm also optimized the losses generated in the DBN model. The suggested model achieves a hit rate of 94.85% and an accuracy of 97.48% with the 10-fold cross-validation approach. The traditional machine learning (ML) model is significantly less accurate than the area under the receiver operating characteristic curve (AUC), which has an accuracy of 97.48%.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the features numbers. Secondly, we apply simple random sampling technique to dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate of 95% with 24 features by 10-fold CV method.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
More Related Content
Similar to A novel optimized deep learning method for protein-protein prediction in bioinformatics
DSAGLSTM-DTA: PREDICTION OF DRUG-TARGET AFFINITY USING DUAL SELF-ATTENTION AN...mlaij
The research on affinity between drugs and targets (DTA) aims to effectively narrow the target search
space for drug repurposing. Therefore, reasonable prediction of drug and target affinities can minimize the
waste of resources such as human and material resources. In this work, a novel graph-based model called
DSAGLSTM-DTA was proposed for DTA prediction. The proposed model is unlike previous graph-based
drug-target affinity model, which incorporated self-attention mechanisms in the feature extraction process
of drug molecular graphs to fully extract its effective feature representations. The features of each atom in
the 2D molecular graph were weighted based on attention score before being aggregated as molecule
representation and two distinct pooling architectures, namely centralized and distributed architectures
were implemented and compared on benchmark datasets. In addition, in the course of processing protein
sequences, inspired by the approach of protein feature extraction in GDGRU-DTA, we continue to
interpret protein sequences as time series and extract their features using Bidirectional Long Short-Term
Memory (BiLSTM) networks, since the context-dependence of long amino acid sequences. Similarly,
DSAGLSTM-DTA also utilized a self-attention mechanism in the process of protein feature extraction to
obtain comprehensive representations of proteins, in which the final hidden states for element in the batch
were weighted with the each unit output of LSTM, and the results were represented as the final feature of
proteins. Eventually, representations of drug and protein were concatenated and fed into prediction block
for final prediction. The proposed model was evaluated on different regression datasets and binary
classification datasets, and the results demonstrated that DSAGLSTM-DTA was superior to some state-ofthe-art DTA models and exhibited good generalization ability.
This research article proposes a deep transfer learning technique to classify plant diseases in rice images. The researchers conducted a literature review of methods for detecting rice plant diseases from the past decade. They analyzed different classifiers and strategies. The article then proposes a convolutional neural network model for detecting rice diseases. Finally, the existing approaches are compared based on their accuracy.
Efficiency of Prediction Algorithms for Mining Biological DatabasesIOSR Journals
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
Multiple Features Based Two-stage Hybrid Classifier Ensembles for Subcellular...CSCJournals
Subcellular localization is a key functional characteristic of proteins. As an interesting ``bio-image informatics\'\' application, an automatic, reliable and efficient prediction system for protein subcellular localization can be used for establishing knowledge of the spatial distribution of proteins within living cells and permits to screen systems for drug discovery or for early diagnosis of a disease. In this paper, we propose a two-stage multiple classifier system to improve classification reliability by introducing rejection option. The system is built as a cascade of two classifier ensembles. The first ensemble consists of set of binary SVMs which generalizes to learn a general classification rule and the second ensemble, which also include three distinct classifiers, focus on the exceptions rejected by the rule. A new way to induce diversity for the classifier ensembles is proposed by designing classifiers that are based on descriptions of different feature patterns. In addition to the Subcellular Location Features (SLF) generally adopted in earlier researches, three well-known texture feature descriptions have been applied to cell phenotype images, which are the local binary patterns (LBP), Gabor filtering and Gray Level Coocurrence Matrix (GLCM). The different texture feature sets can provide sufficient diversity among base classifiers, which is known as a necessary condition for improvement in ensemble performance. Using the public benchmark 2D HeLa cell images, a high classification accuracy 96% is obtained with rejection rate $21\\%$ from the proposed system by taking advantages of the complementary strengths of feature construction and majority-voting based classifiers\' decision fusions.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da...IOSRjournaljce
The main purpose of data mining is to extract knowledge from large amount of data. Artificial Neural network (ANN) has already been applied in a variety of domains with remarkable success. This paper presents the application of hybrid model for stroke disease that integrates Genetic algorithm and back propagation algorithm. Selecting a good subset of features, without sacrificing accuracy, is of great importance for neural networks to be successfully applied to the area. In addition the hybrid model that leads to further improvised categorization, accuracy compared to the result produced by genetic algorithm alone. In this study, a new hybrid model of Neural Networks and Genetic Algorithm (GA) to initialize and optimize the connection weights of ANN so as to improve the performance of the ANN and the same has been applied in a medical problem of predicting stroke disease for verification of the results.
This document describes a study that uses machine learning algorithms to efficiently predict DNA-binding proteins. Support vector machines and cascade correlation neural networks are optimized and compared to determine the best performing model. The SVM model achieves 86.7% accuracy at predicting DNA-binding proteins using features like overall charge, patch size, and amino acid composition of proteins. The CCNN model achieves lower accuracy of 75.4%. The study aims to improve on previous work by using the standard jack-knife validation technique to evaluate model performance on unseen data.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...IJECEIAES
In many diseases classification an accurate gene analysis is needed, for which selection of most informative genes is very important and it require a technique of decision in complex context of ambiguity. The traditional methods include for selecting most significant gene includes some of the statistical analysis namely 2-Sample-T-test (2STT), Entropy, Signal to Noise Ratio (SNR). This paper evaluates gene selection and classification on the basis of accurate gene selection using structured complex decision technique (SCDT) and classifies it using fuzzy cluster based nearest neighborclassifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated for leave one out cross validation metric(LOOCV) along with sensitivity, specificity, precision and F1-score with four different classifiers namely 1) Radial Basis Function (RBF), 2) Multi-layer perception(MLP), 3) Feed Forward(FF) and 4) Support vector machine(SVM) for three different datasets of DLBCL, Leukemia and Prostate tumor. The proposed SCDT &FC-NNC exhibits superior result for being considered more accurate decision mechanism.
Delineation of techniques to implement on the enhanced proposed model using d...ijdms
In post genomic era with the advent of new technologies a huge amount of complex molecular data are
generated with high throughput. The management of this biological data is definitely a challenging task
due to complexity and heterogeneity of data for discovering new knowledge. Issues like managing noisy
and incomplete data are needed to be dealt with. Use of data mining in biological domain has made its
inventory success. Discovering new knowledge from the biological data is a major challenge in data
mining technique. The novelty of the proposed model is its combined use of intelligent techniques to classify
the protein sequence faster and efficiently. Use of FFT, fuzzy classifier, String weighted algorithm, gram
encoding method, neural network model and rough set classifier in a single model and in an appropriate
place can enhance the quality of the classification system .Thus the primary challenge is to identify and
classify the large protein sequences in a very fast and easy but intellectual way to decrease the time
complexity and space complexity.
ieee projects download, base paper for ieee projects, ieee projects list, ieee projects titles, ieee projects for cse, ieee projects on networking,ieee projects 2012, ieee projects 2013, final year project, computer science final year projects, final year projects for information technology, ieee final year projects, final year students projects, students projects in java, students projects download, students projects in java with source code, students projects architecture, free ieee papers
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...CSCJournals
Logistic Regression (LR) is a well known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems. Logistic regression enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal non-linear inference models and have gained extensive popularity in recent years. Research activities are considerable and literature is growing. The goal of this research work is to compare the performance of Logistic Regression and Neural Network models on publicly available medical datasets. The evaluation process of the model is as follows. The logistic regression and neural network methods with sensitivity analysis have been evaluated for the effectiveness of the classification. The Classification Accuracy is used to measure the performance of both the models. From the experimental results it is confirmed that the neural network model with sensitivity analysis model gives more efficient result.
Poster presented at the Thirty-Second International Joint Conference on Artificial Intelligence, 2023, Macao, SAR. https://doi.org/10.24963/ijcai.2023/554
Guava fruit disease identification based on improved convolutional neural net...IJECEIAES
Guava fruit cultivation is crucial for Asian economic development, with Indonesia producing 449,970 metric tons between 2022 and 2023. However, technology-based approaches can detect disease symptoms, enhancing production and mitigating economic losses by enhancing quality. In this paper, we introduce an accurate guava fruit disease detection (GFDI) system. It contains the generation of appropriate diseased images and the development of a novel improved convolutional neural network (improvedCNN) that is built depending on the principles of AlexNet. Also, several preprocessing techniques have been used, including data augmentation, contrast enhancement, image resizing, and dataset splitting. The proposed improved-CNN model is trained to identify three common guava fruit diseases using a dataset of 612 images. The experimental findings indicate that the proposed improved-CNN model achieve accuracy 98% for trains and 93% for tests using 0.001 learning rate, the model parameters are decreased by 50,106,831 compared with traditional AlexNet model. The findings of the investigation indicate that the deep learning model improves the accuracy and convergence rate for guava fruit disease prevention.
Classification of pathologies on digital chest radiographs using machine lear...IJECEIAES
This article is devoted to the research and development of methods for classifying pathologies on digital chest radiographs using two different machine learning approaches: the eXtreme gradient boosting (XGBoost) algorithm and the deep convolutional neural network residual network (ResNet50). The goal of the study is to develop effective and accurate methods for automatically classifying various pathologies detected on chest X-rays. The study collected an extensive dataset of digital chest radiographs, including a variety of clinical cases and different classes of pathology. Developed and trained machine learning models based on the XGBoost algorithm and the ResNet50 convolutional neural network using preprocessed images. The performance and accuracy of both models were assessed on test data using quality metrics and a comparative analysis of the results was carried out. The expected results of the article are high accuracy and reliability of methods for classifying pathologies on chest radiographs, as well as an understanding of their effectiveness in the context of clinical practice. These results may have significant implications for improving the diagnosis and care of patients with chest diseases, as well as promoting the development of automated decision support systems in radiology.
Network embedding in biomedical data scienceArindam Ghosh
Excerpts from the paper:
What is it?
Network embedding aims at converting the network into a low-dimensional space while structural information of the network is preserved.
In this way, nodes and/or edges of the network can be represented as compacted yet informative vectors in the embedding space.
Advantages:
Typical non-network-based machine learning methods such as linear regression, Support Vector Machine (SVM) and decision forest, which have been demonstrated to be effective and efficient as the state-of-the-art techniques, can be applied to such vectors.
Current status:
Efforts of applying network embedding to improve biomedical data analysis are already planned or underway.
Difficulties:
The biomedical networks are sparse, noisy, incomplete, heterogeneous and usually consist of biomedical text and other domain knowledge. It makes embedding tasks more complicated than other application fields.
This document describes two challenges presented as part of the DREAM initiative to evaluate methods for parameter estimation and network topology inference from experimental data. In the first challenge, participants were given the topology of a 9-gene network and asked to estimate 45 kinetic parameters. In the second challenge, participants were given an incomplete 11-gene network and asked to identify 3 missing links and associated parameters. Participants could purchase simulated experimental data using a credit system, allowing iterative experimental design. While parameter estimation was accomplished well using fluorescence data, topology inference was more difficult. Aggregating submissions produced better solutions than individual methods.
Praharshit Sharma is a bioinformatician currently working as a project fellow at CSIR-Centre for Cellular and Molecular Biology in Hyderabad, India studying microRNA-mRNA interactions in neurodegenerative diseases. He has work experience in bioinformatics industrial training and research internships. Praharshit has an MSc in Bioinformatics and PG Diploma in Bioinformatics. He has skills in Linux, programming languages, bioinformatics tools and databases. Praharshit has publications and presentations in bioinformatics and has done statistical analysis projects correlating genetic code, protein structure and crystallographic data.
introduction,history scope and applications of
relation to other fields , bioinformatics,biological databases,computers internet,sequence development, and
introduction to sequence development and alignment
A brief study on rice diseases recognition and image classification: fusion d...IJECEIAES
In the regions of southern Andhra Pradesh, rice brown spot, rice blast, and rice sheath blight have emerged as the most prevalent diseases. The goal of this research is to increase the precision and effectiveness of disease diagnosis by proposing a framework for the automated recognition and classification of rice diseases. Therefore, this work proposes a hybrid approach with multiple stages. Initially, the region of interest (ROI) is extracted from the dataset and test images. Then, the multiple features are extracted, such as color-moment-based features, grey-level cooccurrence matrix (GLCM)-based texture, and shape features. Then, the S-particle swarm optimization (SPSO) model selects the best features from the extracted features. Moreover, the deep belief network (DBN) model trained by SPSO is based on optimal features, which classify the different types of rice diseases. The SPSO algorithm also optimized the losses generated in the DBN model. The suggested model achieves a hit rate of 94.85% and an accuracy of 97.48% with the 10-fold cross-validation approach. The traditional machine learning (ML) model is significantly less accurate than the area under the receiver operating characteristic curve (AUC), which has an accuracy of 97.48%.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the features numbers. Secondly, we apply simple random sampling technique to dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate of 95% with 24 features by 10-fold CV method.
Similar to A novel optimized deep learning method for protein-protein prediction in bioinformatics (20)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par...IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zoneIJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features like sea surface temperature (SST) and sea surface height (SSH), along with classification methods such as classifiers. The features like SST, SSH, and different classifiers used to classify the data, have been figured out in this review study. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN). In the previous result, the results of support vector machine (SVM) were 97.6% more accurate than naive Bayes's 94.2% to classify test data for fisheries classification. By considering the recent works in this area, several recommendations for future works are presented to further improve the performance of the potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make little devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This helped them understand three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and lower interconnects. Electrical involvement is a big worry with 3D integrates circuits. Researchers have developed and tested through silicon via (TSV) and substrates to decrease electrical wave involvement. This study illustrates a novel noise coupling reduction method using several electrical involvement models. A 22% drop in electrical involvement from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Design and optimization of ion propulsion dronebjmsejournal
Electric propulsion technology is widely used in many kinds of vehicles in recent years, and aircrafts are no exception. Technically, UAVs are electrically propelled but tend to produce a significant amount of noise and vibrations. Ion propulsion technology for drones is a potential solution to this problem. Ion propulsion technology is proven to be feasible in the earth’s atmosphere. The study presented in this article shows the design of EHD thrusters and power supply for ion propulsion drones along with performance optimization of high-voltage power supply for endurance in earth’s atmosphere.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
A novel optimized deep learning method for protein-protein prediction in bioinformatics
1. International Journal of Electrical and Computer Engineering (IJECE)
Vol. 14, No. 1, February 2024, pp. 749~758
ISSN: 2088-8708, DOI: 10.11591/ijece.v14i1.pp749-758 749
Journal homepage: http://ijece.iaescore.com
A novel optimized deep learning method for protein-protein
prediction in bioinformatics
Preeti Thareja, Rajender Singh Chillar
Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Article Info ABSTRACT
Article history:
Received Jun 9, 2023
Revised Sep 14, 2023
Accepted Oct 9, 2023
Proteins have been shown to perform critical activities in cellular processes
and are required for the organism's existence and proliferation. On
complicated protein-protein interaction (PPI) networks, conventional
centrality approaches perform poorly. Machine learning algorithms based on
enormous amounts of data do not make use of biological information's
temporal and spatial dimensions. As a result, we developed a sequence-
dependent PPI prediction model using an Aquila and shark noses-based hybrid
prediction technique. This model operates in two stages: feature extraction
and prediction. The features are acquired using the semantic similarity
technique for good results. The acquired features are utilized to predict the
PPI using hybrid deep networks long short-term memory (LSTM) networks
and restricted Boltzmann machines (RBMs). The weighting parameters of
these neural networks (NNs) were changed using a novel optimization
approach hybrid of aquila and shark noses (ASN), and the results revealed that
our proposed ASN-based PPI prediction is more accurate and efficient than
other existing techniques.
Keywords:
Aquila optimization
Long short-term memory
Protein-protein interaction
prediction
Restricted Boltzmann machines
Semantic similarity
Shark nose optimization
This is an open access article under the CC BY-SA license.
Corresponding Author:
Preeti Thareja
Department of Computer Science and Applications, Maharshi Dayanand University
Rohtak, Haryana, India
Email: preetithareja10@gmail.com
1. INTRODUCTION
Protein-protein interactions (PPIs) can be utilized to look into the mechanisms underlying many
biological processes, such as deoxyribonucleic acid (DNA) replication, protein modification, and signal
transmission. Due to their accurate understanding and analysis, which can reveal numerous roles at the
molecular and proteome levels, PPIs have been a research focus [1], [2]. On the other hand, there are problems
with incomplete and imprecise prediction using web-lab identification methods [3], [4]. Alternately, low-cost
candidates for future experimental validation could be obtained by applying precise bioinformatics methods
for PPI prediction [5], [6].
Using advanced techniques to calculate PPI is not only laborious and costly, but it also produces an
excessive number of false positives and false negatives [7], [8]. As a result, computational tools that can aid in
the process of discovering genuine protein interactions are required. This problem can be viewed as a
categorical classifying problem from the standpoint of machine learning, and it can be tackled using supervised
learning methods [9], [10]. With the accelerated growth of deep learning techniques and neural network
infrastructure, certain machine intelligence-based and sequence-based models for PPI prediction have been
developed. Table 1 shows a summary of the state-of-art-methods.
Li et al. [11] proposed DeepCellEss which is a methodology for easy-to-interpret deep learning (DL)
based on sequences and cell line-specific key protein predictions. To extract minute and prolonged-range
2. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 749-758
750
hidden features from protein sequences, DeepCellEss uses a convolutional network and bidirectional long
short-term memory (LSTM). Additionally, to enable the residue-level point process, a multi-head self-attention
technique is adopted. Numerous computer studies show that DeepCellEss beats previous sequence-based
approaches as well as network-based clear implications and provides effective prediction results for distinct
cell lines.
Hou et al. [12] created a method for identifying PPI sites based on an ensemble DL model called
ensemble deep learning method for protein-protein interaction (EDLMPPI). This would aid in solving the issue
of modeling the properties of amino acid (AA) sequences for PPI bindings by directly encoding them into
distributed vector representations. Additionally, their performance could stand to be improved when AA
sequences are directly encoded into distributed vector model to categorize PPI binding events because the
experiment numbers for detected PPI sites are significantly less than the number of PPIs or protein domains in
protein complexes.
Gao et al. [13] offered the HIGH-PPI two-side learning hierarchical graph network to forecast PPIs
and deduce the relevant chemical information. A vertex in the graph (top outer view) is a protein graph in this
model's hierarchical graph (bottom inside-of-protein view). To effectively depict the quality of support of the
protein, a set of chemically pertinent descriptors rather than protein sequences are used in the bottom view. To
create a solid machine understanding of PPIs, HIGH-PPI investigates the human interactome's inside and
outside of protein components. In terms of forecasting PPIs, this model has good accuracy and durability.
Yue et al. [14] introduced a deep learning framework for identifying important proteins. Their
research focused on three main objectives: investigating the significance of each element's value in model
prediction, improving the handling of unbalanced datasets, and assessing the model's accuracy in predicting
important proteins. They used node2vec for feature representation and depth-wise separable convolution for
gene expression profiles. Results on Saccharomyces cerevisiae (S. cerevisiae) data demonstrated their model's
superiority over traditional deep learning methods.
In their 2022 study, Díaz-Eufracio and Medina-Franco [15] developed ensemble models utilizing
support vector machine (SVM), logistic regression (LR), and random forest (RF) algorithms, employing an
extended connectivity fingerprint radius of 2. Their primary objective was to validate newly generated PPI
inhibitors from apothecary sources. The significance of their research lies in the predictive models they have
created, which will empower future initiatives in designing PPI inhibitors to make informed, data-driven
choices.
Table 1. Literature review on traditional models
Reference Techniques used Features Issues Dataset used
[11] CNN,
bidirectional long
short-term
memory neural
network
(BiLSTM)
Attention scores play a vital role in enabling
the identification of essential sequence
regions for predicting outcomes specific to
various cell lines. They facilitate in-depth
research and comparisons for critical cell
line-specific proteins.
Does not reflect the
relationships between several
cell lines within the same tissue
or cancer type.
Nucleotide,
Protein
Sequences
[12] BiLSTM and
capsule network
Work directly with AA sequences. Need to add more dynamic
word embedding models to the
model and modify them to
address further pertinent
protein-identifying issues.
Dset_448,
Dset_72, and
Dset_164
[13] Graph
convolutional
network
Its capacity to recognize residue significance
for PPI is a positive sign of great
interpretability.
Protein-level annotations
weren't fully explored, and
memory needs increase with
more views in a hierarchical
graph.
PPI
Sequences
[14] 1D convolution On gene expression profiles, the notion of
depth wise separable convolution is applied
to extract attributes.
Using a long vector to
represent subcellular
localization demands
significant processing
resources.
S. cerevisiae
[15] RF, LR, SVM Assess ML models for classifying new
inhibitors by chemists and maintain the PPI
inhibitors database regularly.
For classification, new models
and challenges can be applied.
PPI Inhibitors
Deep learning (DL) methodologies, as documented in references [16], [17], encompass a range of
techniques, including support vector machines (SVM) [18], artificial neural networks (ANN) [19], and others.
These approaches offer indispensable tools for the secure prediction of PPI by extracting essential peptide
information from amino acid sequences [20]. This research demonstrates that deep learning frameworks [21]
3. Int J Elec & Comp Eng ISSN: 2088-8708
A novel optimized deep learning method for protein-protein prediction in bioinformatics (Preeti Thareja)
751
excel at handling vast, unstructured datasets with intricate characteristics, thereby enhancing the
comprehension of pivotal elements in PPI prediction [22], [23]. Consequently, a novel deep learning concept
based on artificial neural networks, combined with a meticulous hyperparameter tuning strategy, has been
devised to facilitate precise and dependable PPI predictions.
This study offers significant contributions in three main areas. Firstly, the feature extraction process
has been improved by incorporating a semantic similarity-based feature alongside other features, resulting in
more accurate outcomes. Secondly, an approach combining LSTM and restricted Boltzmann machines (RBMs)
has been devised to ensure precise predictions and minimize loss. Lastly, a novel optimization technique named
Aquila and Shark nose optimization has been introduced to fine-tune the weights of both classifiers, thereby
enhancing the efficiency of PPI prediction. The structure of this article is organized as follows: section 2
discusses our proposed sequence-dependent ASN PPI prediction technique; section 3 presents the experimental
results; section 4 concludes the paper; and the subsequent section includes references.
2. METHOD
Proteins are macromolecules that are organic and composed of AAs that are required by cells to
support living activities [24]. They are significant in biology because they connect different important
physiological functions of cells to PPIs, allowing a variety of life processes such as apoptotic and
immunological responses [25]. The suggested ASN technique for predicting interactions from protein
sequences is described in this section. Figure 1 depicts the architecture. Our method for predicting PPIs is
comprised of two steps: i) To aid in reliable prediction, features are gathered using a standard sequence-
dependent and semantic similarity approach; and ii) LSTM and RBMs are employed to execute protein
interaction prediction tasks. The new ensemble Aquila and Shark Nose are applied at this step to produce more
dependable findings, with weighting parameters optimized. Finally, the prediction model uses feature
extraction, ensemble deep learning, and the best parameters to predict protein interactions.
Figure 1. Proposed method ASN PPI prediction model
2.1. Feature extraction
The provided inputs were employed to extract two distinct types of characteristics [26]. These
characteristics consist of one set based on sequence-based physical-chemical attributes and another set based
on semantic similarity features. To understand the feature extraction process in detail, a comprehensive
description is provided below.
2.1.1. Sequence-based physical-chemical features
To establish a strong foundation for predicting PPI, proteins have been thoroughly characterized using
a comprehensive set of 12 physical and chemical attributes. These attributes are derived from the constituent
amino acids of the proteins and include hydrophilicity, adaptability, accessibility, torsion, external surface,
polarizability, antigenic propensity, hydrophobicity, net charge of side chains, polarity, solvent-accessible
surface area, and side-chain volume. Notably, among these attributes, hydrophobicity and polarity were
assessed using two distinct measurement methods, as detailed in reference [27], which documented the values
of 14 different physical and chemical characteristic scales for the 20 essential amino acids
In this approach, each AA is transformed into a matrix consisting of 14 numerical data points,
corresponding to the various physicochemical scale ratings. Since proteins exhibit variations in length, this
transformation can result in a variable number of vectors, making it challenging to process uniformly. To
address this issue and provide a consistent input for the ensemble meta-base learner's classifier, a conversion
method is employed. This method transforms the protein descriptions into an even matrix format utilizing auto
4. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 749-758
752
covariance (AC). This transformation ensures that all proteins, regardless of their varying amino acid content,
are represented by matrices of the same length.
The lth
physicochemical property scale’s auto covariance 𝐴𝐶𝑙,𝑔 is provided by (1) and (2):
𝐴𝐶𝑙,𝑔 =
1
𝐿−𝑔
∑ (𝑃𝑙,𝑚 − 𝛾𝑙) × (𝑃𝑙,𝑚+𝑔 − 𝛾𝑙)
𝐿−𝑔
𝑚=1 (1)
𝛾𝑙 =
1
𝐿
∑ 𝑃𝑙,𝑚
𝐿
𝑚=1 (2)
where 𝑔 denotes the predefined gap, 𝐿 indicates the protein 𝑃's length, while 𝛾𝑙 indicates the mean of protein
𝑃's 𝑙th
physicochemical scale values. By fixing the greatest spacing to 𝐺(𝑔 = 1, 2, … , 𝐺), every protein may be
initialized of 𝑘 × 𝐺 elements, where k seems to be the physicochemical property scales count.
2.1.2. Semantic similarity-based feature extraction
The semantic similarity identification technique has been utilized to compute the degree of similarity.
To compute the resemblance, every source is represented as a vector. In particular, the vector model faces
issues such as word impropriety (e.g., disregard synonymy) as well as lacking semantic data. The point word
recognition (PWR) technique incorporates semantic meaning into the vector model, hence removing vector
semantic problems. Throughout this application, the species sensitivity distributions (SSD) approach is
primarily concerned with determining the associations between every pair of resources by using a cosine
similarity metric. Syntax, as well as semantic similarity, are integrated into SSD cosine similarity as can be
expressed with (3).
𝑆𝑒𝑚𝑠𝑖𝑚(𝑅𝑎, 𝑅𝑏) =
𝑅𝑏.𝑅𝑎
|𝑅𝑏|.|𝑅𝑎|
=
∑ (𝜔𝑏∗𝑆𝑅.𝜔𝑎)
𝑚
𝑎=1
√∑ (𝜔𝑏
𝑎
∗𝑆𝑅)
2
𝑚
𝑎=1 .∑ 𝜔𝑎
2
𝑚
𝑎=1
(3)
2.2. Optimal trained hybrid classifier
The extracted features are subjected to the prediction model where a hybrid model that combines
improved LSTM and the RBMs classifiers is used. The hybrid concept is as follows: initially, the features are
passed to both the individual classifiers, and finally the mean of the classifiers’ output will be considered as
the outcome. Here, to enhance the performance of prediction results, the training of both classifiers is carried
out by the proposed ASN via tuning the optimal weights.
2.2.1. LSTM networks
The most popular type of recurrent neural networks (RNNs) are LSTM networks. The memory cell
and the gates are the two essential parts of the LSTM. The input gates and forget gates alter the internal elements
of the memory cell.
LSTM networks rely on four essential gates. The forget gate (f), responsible for determining what
information from the previous state should be remembered or discarded. The input gate (i) comes into play,
deciding which incoming data should be integrated into the current state. The input modulation gate (g), often
considered as part of the input gate, alters incoming data to ensure its appropriateness for updating the internal
state. Finally, the output gate (o) combines various outputs, including the previous state, to generate the current
state. Together, these four gates orchestrate the flow of information, control state updates, and contribute to the
network's output.
In our work, we have employed the tanh activation function, denoted in (4). The use of the tanh
activation function is pivotal in our neural network framework, introducing essential non-linearity that aids in
capturing intricate data relationships. This function's significance lies in its widespread use across various
neural network architectures, contributing to tasks like feature transformation and classification.
𝐻𝑡 = 𝑡𝑎𝑛ℎ(𝑊𝐻𝐻𝐻𝑡−1 + 𝑊𝑥𝐻𝑥𝑡) (4)
To reduce the model's loss, we employed the following cross-entropy loss function in our work, as in (5).
𝑐𝑟𝑜𝑠𝑠𝐸𝑛𝑡 =
−1
𝑁
[∑ [𝑡𝑖 𝑙𝑜𝑔(𝑠𝑖𝑔𝑚𝑜𝑖𝑑(𝜒)) + (1 − 𝑡𝑖) 𝑙𝑜𝑔(𝑠𝑖𝑔𝑚𝑜𝑖𝑑(1 − 𝜍𝑖))]
𝑁
𝑖=1 ] (5)
2.2.2. Restricted Boltzmann machines
RBM layers, which were used for pre-training, were transformed into a feed-forward network to
enable weight fine-tuning by using a new strategy. A SoftMax layer was added to the top layer during the fine-
5. Int J Elec & Comp Eng ISSN: 2088-8708
A novel optimized deep learning method for protein-protein prediction in bioinformatics (Preeti Thareja)
753
tuning step to improve the characteristics of the tagged samples. The underlying features were learned using a
greedy tier unsupervised technique during the pre-training stage.
2.3. ASN optimizer
The proposed ASN is the combination of Aquila optimizer and shark nose smell optimization. Aquila
update is influenced by the Shark Nose algorithm. Normally, the hybrid concept of optimization ensures better
convergence rate and speed rather than executing as the individual algorithms. The objective function of our
proposed ASN-based PPI prediction model is provided in further subsections.
2.3.1. Objective function
The input for our proposed ASN-based method is visually represented in Figure 2. This figure serves
as a pivotal element in conveying the data and information that are crucial for the successful implementation
of our approach. Figure 2 provides a clear and concise visualization of the solution input, which can include
various data sources, parameters, or components, depending on the context of the method.
Figure 2. Solution encoding of proposed ASN-based PPI prediction technique
The primary objective of this work was centered around the minimization of the mean square error,
as described in (6). Mean square error (MSE) is a fundamental metric used in various fields, particularly in the
context of optimization problems and statistical analysis. In this work, (6) serves as the key representation of
the objective function, which outlines the specific mathematical criterion for minimizing the discrepancies
between predicted and observed values.
𝑂𝑏𝑗 = 𝑀𝑖𝑛(𝑀𝑆𝐸) (6)
Where, the MSE denotes the mean square error.
2.3.2. Aquila optimizer
The Aquila optimizer is a nature-inspired algorithm based on Aquila bird hunting behavior, primarily
used for optimization tasks. It may exhibit slower convergence and suboptimal results in complex optimization
tasks. Aquila birds, like true eagles, build their nests in high places, use speed and talons for hunting, with
ground squirrels being their common prey.
The four hunting methods used by the bird are described as: ii) expanded exploration, in which the
target is pursued (high soar with vertical scoop); ii) narrowed exploration, which is the preferred technique for
pursuing ground creatures like snakes and squirrels (contour flight with short glide attack); iii) expanded
exploitation, which is the technique for pursuing slow prey (low flight with a slow descending attack); and
iv) narrowed exploitation is a technique for hunting huge animals (walking and grabbing the target). The
mathematical expressions for the methods are expressed in (7)-(11).
𝑋1(𝑡 + 1) = 𝑋𝑏𝑒𝑠𝑡(𝑡) × (1 −
𝑡
𝑇
) + (𝑋𝑀(𝑡) − 𝑋𝑏𝑒𝑠𝑡(𝑡) × 𝑟𝑎𝑛𝑑) (7)
𝑋𝑀(𝑡) =
1
𝑁
∑ 𝑋𝑖(𝑡), ∀𝑗 = 1,2, … , 𝐷𝑖𝑚
𝑁
𝑖=1 (8)
𝑋2(𝑡 + 1) = 𝑋𝑏𝑒𝑠𝑡(𝑡) × 𝐿𝑒𝑣𝑦(𝐷) + (𝑋𝑅(𝑡) + (𝑦 − 𝑥) × 𝑟𝑎𝑛𝑑) (9)
𝑋3(𝑡 + 1) = (𝑋𝑏𝑒𝑠𝑡(𝑡) × 𝑋𝑀(𝑡)) × 𝛼 − 𝑟𝑎𝑛𝑑 + ((𝑈𝐵 − 𝐿𝐵) × 𝑟𝑎𝑛𝑑 + 𝐿𝐵) × 𝛿 (10)
𝑋4(𝑡 + 1) = 𝑄𝐹 × 𝑋𝑏𝑒𝑠𝑡(𝑡) − (𝐺1 × 𝑋(𝑡) × 𝑟𝑎𝑛𝑑) − 𝐺2 × 𝐿𝑒𝑣𝑦(𝐷) + 𝑟𝑎𝑛𝑑 × 𝐺1 (11)
𝑄𝐹(𝑡) = 𝑡
2 ×𝑟𝑎𝑛𝑑 ()−1
(1−𝑇)2
(12)
𝐺1 = 2 × 𝑟𝑎𝑛𝑑() − 1 (13)
6. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 749-758
754
𝐺2 = 2 × (1 −
𝑡
𝑇
) (14)
where 𝑋1, 𝑋2, and 𝑋3 represent the new solution for methods 1, 2, and 3, 𝑋𝑏𝑒𝑠𝑡 is the best solution, 𝑡 is the
current iteration, 𝑇 is the maximum iteration, 𝑁 is the population size, 𝐷𝑖𝑚 is the variable size, 𝑟𝑎𝑛𝑑 is the
random value in the range 0 to 1, and 𝑋𝑀 is the local mean value, as in (8). 𝐿𝑒𝑣𝑦 (𝐷) is Levy’s flight
distribution, 𝑈𝐵 is the upper bound, 𝐿𝐵 is the lower bound, α, δ are exploitation parameters, and 𝑄𝐹, 𝐺1, and
𝐺2 are quality factors as shown in (12)-(14).
2.3.3. Shark nose optimizer
Shark nose optimization algorithm is a population-based metaheuristic optimization algorithm. Shark
nose optimization algorithm is inspired by the Shark food foraging behavior. The entire algorithm is based on
calculating the shark’s position based on the movements of the shark which are: i) forward movement and
ii) rotational movement. The mathematical expression for the movements is expressed in (15)-(17).
𝑌𝑖
𝑘+1
= 𝑋𝑖
𝐾
+ 𝑉𝑖
𝑘
× ∆𝑡𝑘, 𝑖 = 1,2, … , 𝑛𝑒𝑤𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑘 = 1,2, … , 𝑘𝑚𝑎𝑥 (15)
𝑍𝑖
𝑘+1,𝑚
= 𝑌𝑖
𝑘+1
+ 𝑅3 × 𝑌𝑖
𝑘+1
, 𝑚 = 1,2, … , 𝑀, 𝑖 = 1,2, … , 𝑛𝑒𝑤𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑘 = 1,2, … , 𝑘𝑚𝑎𝑥 (16)
The shark’s new position is determined using the expression as shown in (17).
𝑋𝑖
𝑘+1
= arg max{𝑜𝑓(𝑌𝑖
𝑘+1
), 𝑜𝑓(𝑍𝑖
𝑘+1,𝑖
), … , 𝑜𝑓(𝑍𝑖
𝑘+1,
)} , 𝑖 = 1,2, … , 𝑛𝑒𝑤𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛 (17)
Gauss mutation was also performed in our work to provide an accurate and reliable optimization. To make a
new generation, Gaussian mutation simply adds a random value from a Gaussian distribution to every member
of an individual's vector. The pseudocode for the proposed algorithm is described in Table 2.
Table 2. Pseudocode for proposed ASN technique
Step Number Step Name Step procedure
1 Initialization Set the attributes new position, 𝑘𝑚𝑎𝑥, α, δ.
Create an initial population.
Create every decision randomly within the acceptable range.
Initializing the stage counter 𝑘 = 1
for 𝑘 = 1 to 𝑘𝑚𝑎𝑥
2 Forward
movement
Compute velocity vector using for every element.
Acquire a new location of the shark depending on its forward movement, using the
Aquila updating function.
3 Rotational
movement
Depending on the rotational movement, acquire the new location of the shark.
Depending on the two moves, choose the shark's upcoming location.
4 Gaussian
mutation
Apply Gaussian mutation to increase the local search ability.
End for 𝑘
Set 𝑘 = 𝑘 + 1
Choose the shark position with the greatest value in the final stage.
3. RESULTS AND DISCUSSION
3.1. Simulation setup
The proposed work has been implemented in the MATLAB tool. The datasets are typical UniProt
proteins with experimental gene ontology (GO) annotation and structure models predicted by I-TASSER.
Performance matrices of our proposed ASN PPI prediction technique were evaluated and compared with
conventional techniques such as Aquila, cat swarm optimization, hunger games search, poor rich optimization,
and shark nose optimization.
3.2. Error analysis
The error analysis in this study encompasses several performance metrics to evaluate the model's
accuracy. These metrics include mean absolute error (MAE), measuring the absolute size of discrepancies
between actual and predicted values, root mean square error (RMSE), assessing the overall magnitude of errors,
mean absolute relative error (MARE), evaluating prediction accuracy in relation to relative errors, and mean
squared error (MSE), which calculates the average of squared differences between predicted values and the
overall mean, offering insights into prediction variability. These metrics collectively provide a comprehensive
assessment of the model's predictive capabilities and its ability to minimize errors across a range of contexts.
7. Int J Elec & Comp Eng ISSN: 2088-8708
A novel optimized deep learning method for protein-protein prediction in bioinformatics (Preeti Thareja)
755
Our proposed ASN approach was compared to traditional optimization techniques, including cat
swarm optimization (CSO), hunger games search (HGS), and poor rich optimization (PRO), using various
evaluation metrics. For dataset-1, our approach achieved a lower MAE of 0.013 at 60% learning percentage
(LP) compared to PRO (0.017) and CSO (0.014). Additionally, our method demonstrated a MARE value of 1
for 60% and 70% LPs, highlighting its effectiveness. In contrast, HGS resulted in higher MSE and RMSE
values of 0.043 and 0.21, respectively, at 60% LP, indicating that our ASN approach is more reliable and
outperforms traditional methods. Figure 3 shows the comparison for ASN PPI prediction model with traditional
models when applied to dataset 1 giving results for MAE in Figure 3(a), MSE in Figure 3(b), MARE in
Figure 3(c) and RMSE in Figure 3(d).
(a) (b)
(c) (d)
Figure 3. Comparing results of (a) MAE, (b) MSE, (c) MARE, and (d) RMSE of proposed ASN PPI
prediction model with standard optimization algorithms for dataset-1
For dataset-2, we compared our ASN approach to traditional optimization algorithms, assessing
metrics like MAE, MARE, MASE, and RMSE. Figure 4 shows the comparison for ASN PPI prediction model
with traditional models when applied to dataset 2 giving results for MAE in Figure 4(a), MSE in Figure 4(b),
MARE in Figure 4(c) and RMSE in Figure 4(d). Notably, at 60-90% LPs, our approach achieves lower MAE
values (0.17, 0.18, 0.19, and 0.17) compared to CSO (0.180, 0.183, 0.194, and 0.195). Similarly, our MARE
values for dataset-2 are consistently lower (1.7, 1.3, 1.8, and 1.5) across LPs, showcasing the effectiveness of
our ASN technique for PPI prediction. In contrast, both HGS and PRO techniques yield higher MAE and
MARE values, underlining the superior performance of our proposed prediction strategy over traditional
methods.
Figure 5 serves as a visual representation of the performance comparison of the proposed ASN-based
prediction strategy with other alternative networks across multiple cases and datasets. The figure provides a
clear and concise summary of the evaluation results for dataset-1 and dataset-2. It is divided into two
sub-figures, Figures 5(a) and 5(b), each focusing on a specific dataset.
In Figure 5(a), the performance results for dataset-1 are presented. The key performance metric, MAE,
is highlighted, showing that the ASN-based strategy achieves a low MAE of 0.135. This is contrasted with
LSTM, CNN, and SVM, which exhibit significantly higher MARE values of 2.44, 2.99, and 1.96, respectively.
The results emphasize the superior performance of the proposed ASN-based strategy in dataset-1.
Figure 5(b) shifts the focus to dataset-2 and provides a comprehensive examination of performance
metrics, including MAE, RMSE, MARE, and MSE. The ASN-based approach in dataset-2 demonstrates MAE
and RMSE values of 0.173 and 0.228, respectively. In contrast, alternative networks like LSTM, CNN, and
8. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 749-758
756
SVM yield higher MAE and RMSE values, further highlighting the superior performance of the ASN-based
strategy in this dataset.
(a) (b)
(c) (d)
Figure 4. Comparing results of (a) MAE, (b) MSE, (c) MARE, and (d) RMSE of proposed ASN PPI
prediction model with standard optimization algorithms for dataset-2
(a) (b)
Figure 5. Comparing results of MAE, MSE, MARE, and RMSE of the proposed ASN model with standard
optimization algorithms for (a) dataset-1 and (b) dataset-2
3.3. Accuracy analysis
The performance of the projected model is evaluated for dataset 1 and dataset 2 by considering various
learning percentages such as 60, 70, 80, and 90 respectively. As per the obtained results, the projected model
has attained the highest accuracy over the conventional models for different learning percentages. The obtained
results are illustrated in Tables 3 and 4.
At a 60% learning percentage in dataset 1, the developed model achieves an impressive accuracy of
approximately 87.37%, surpassing traditional methods such as CSO, HGS, and PRO. Additionally, in
dataset 2, at a 70% learning percentage, the developed model consistently attains the highest accuracy among
the alternatives. These results underscore the model's robust performance and its superiority over traditional
methods in delivering accurate outcomes across various datasets and learning percentages.
9. Int J Elec & Comp Eng ISSN: 2088-8708
A novel optimized deep learning method for protein-protein prediction in bioinformatics (Preeti Thareja)
757
Table 3. Comparison of accuracy of the proposed ASN approach with traditional optimization
algorithms for dataset 1
60% 70% 80% 90%
Cat Swarm 84 85.6 84 83.9
Hunger Games 84.6 85.8 85 84
Poor Rich 84.9 85.8 86 84.5
Proposed ASN 87.37 86.6 86.5 87
Table 4. Comparison of accuracy of the proposed ASN approach with traditional optimization
algorithms for dataset 2
60% 70% 80% 90%
Cat Swarm 84 85.4 83.9 83.8
Hunger Games 85 86 85 84
Poor Rich 85.9 86 85.9 84.9
Proposed ASN 86.7 87.5 86.5 87
4. CONCLUSION
The current research work has emphasized predicting the protein-to-protein interaction by using
sequence-based features and optimized classifiers. Different physicochemical properties have different effects
on the classification of AAs in protein sequences. The classification criteria for AAs based on their
physicochemical properties is difficult to choose. This is also the direction of our efforts in the future.
In addition, the proposed machine learning approach has distinctive inherent biases, including representation
biases and process biases, which affect their learning behaviors and performances significantly even in the
same learning task. In the future, we will develop an ensemble meta-learning strategy to overcome these issues
and it will extensible to other domains also. And also, we would employ another sequence-based model with
an advanced deep-learning concept. In addition, the most effective optimization approach can be developed for
extending the current method.
REFERENCES
[1] H. Shi, S. Liu, J. Chen, X. Li, Q. Ma, and B. Yu, “Predicting drug-target interactions using Lasso with random forest based on
evolutionary information and chemical structure,” Genomics, vol. 111, no. 6, pp. 1839–1852, Dec. 2019,
doi: 10.1016/j.ygeno.2018.12.007.
[2] P. Thareja and R. S. Chhillar, “Power of deep learning models in bioinformatics,” in International Conference on Innovations in
Data Analytics, 2023, pp. 535–542, doi: 10.1007/978-981-99-0550-8_42.
[3] X. Hu, C. Feng, T. Ling, and M. Chen, “Deep learning frameworks for protein–protein interaction prediction,” Computational and
Structural Biotechnology Journal, vol. 20, pp. 3223–3233, 2022, doi: 10.1016/j.csbj.2022.06.025.
[4] S. Lim et al., “A review on compound-protein interaction prediction methods: data, format, representation and model,”
Computational and Structural Biotechnology Journal, vol. 19, pp. 1541–1556, 2021, doi: 10.1016/j.csbj.2021.03.004.
[5] L. Zhao, Y. Zhu, J. Wang, N. Wen, C. Wang, and L. Cheng, “A brief review of protein–ligand interaction prediction,”
Computational and Structural Biotechnology Journal, vol. 20, pp. 2831–2838, 2022, doi: 10.1016/j.csbj.2022.06.004.
[6] T. Tang et al., “Machine learning on protein–protein interaction prediction: models, challenges and trends,” Briefings in
Bioinformatics, vol. 24, no. 2, Mar. 2023, doi: 10.1093/bib/bbad076.
[7] P. Thareja and R. S. Chhillar, “Applications of deep learning models in bioinformatics,” in Machine Learning Algorithms for
Intelligent Data Analytics, Technoarete Research and Development Association, 2022, pp. 116–126.
[8] H. Askr, E. Elgeldawi, H. Aboul Ella, Y. A. M. M. Elshaier, M. M. Gomaa, and A. E. Hassanien, Deep learning in drug discovery:
an integrative review and future challenges, vol. 56, no. 7. Springer Netherlands, 2023.
[9] Y. Zhuang et al., “Deep learning on graphs for multi-omics classification of COPD,” PLoS ONE, vol. 18, 2023,
doi: 10.1371/journal.pone.0284563.
[10] H. Yang et al., “AdmetSAR 2.0: web-service for prediction and optimization of chemical ADMET properties,” Bioinformatics,
vol. 35, no. 6, pp. 1067–1069, Mar. 2019, doi: 10.1093/bioinformatics/bty707.
[11] Y. Li, M. Zeng, F. Zhang, F.-X. Wu, and M. Li, “DeepCellEss: cell line-specific essential protein prediction with attention-based
interpretable deep learning,” Bioinformatics, vol. 39, no. 1, pp. 1–9, Jan. 2023, doi: 10.1093/bioinformatics/btac779.
[12] Z. Hou, Y. Yang, Z. Ma, K. Wong, and X. Li, “Learning the protein language of proteome-wide protein-protein binding sites via
explainable ensemble deep learning,” Communications Biology, vol. 6, no. 1, Jan. 2023, doi: 10.1038/s42003-023-04462-5.
[13] Z. Gao et al., “Hierarchical graph learning for protein–protein interaction,” Nature Communications, vol. 14, no. 1, Feb. 2023,
doi: 10.1038/s41467-023-36736-1.
[14] Y. Yue et al., “A deep learning framework for identifying essential proteins based on multiple biological information,” BMC
Bioinformatics, vol. 23, no. 1, Dec. 2022, doi: 10.1186/s12859-022-04868-8.
[15] B. I. Díaz-Eufracio and J. L. Medina-Franco, “Machine learning models to predict protein–protein interaction inhibitors,”
Molecules, vol. 27, no. 22, Nov. 2022, doi: 10.3390/molecules27227986.
[16] Aman and R. S. Chhillar, “Disease predictive models for healthcare by using data mining techniques: state of the art,” International
Journal of Engineering Trends and Technology, vol. 68, no. 10, pp. 52–57, 2020, doi: 10.14445/22315381/IJETT-V68I10P209.
[17] A. - and R. S. Chhillar, “Analyzing predictive algorithms in data mining for cardiovascular disease using WEKA tool,” International
Journal of Advanced Computer Science and Applications, vol. 12, no. 8, pp. 144–150, 2021, doi: 10.14569/IJACSA.2021.0120817.
[18] Y. Li, C. Huang, L. Ding, Z. Li, Y. Pan, and X. Gao, “Deep learning in bioinformatics: introduction, application, and perspective
in the big data era,” Methods, vol. 166, pp. 4–21, 2019, doi: 10.1016/j.ymeth.2019.04.008.
10. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 749-758
758
[19] H. Gao, C. Chen, S. Li, C. Wang, W. Zhou, and B. Yu, “Prediction of protein-protein interactions based on ensemble residual
convolutional neural network,” Computers in Biology and Medicine, vol. 152, Jan. 2023, doi: 10.1016/j.compbiomed.2022.106471.
[20] P. S. Dholaniya and S. Rizvi, “Effect of various sequence descriptors in predicting human protein-protein interactions using ANN-based
prediction models,” Current Bioinformatics, vol. 16, no. 8, pp. 1024–1033, Nov. 2021, doi: 10.2174/1574893616666210402114623.
[21] R. Kaundal, C. D. Loaiza, N. Duhan, and N. Flann, “DeepHPI: a comprehensive deep learning platform for accurate prediction and
visualization of host–pathogen protein–protein interactions,” Briefings in Bioinformatics, vol. 23, no. 3, May 2022,
doi: 10.1093/bib/bbac125.
[22] J. Levy et al., “Artificial intelligence, bioinformatics, and pathology: emerging trends part i-an introduction to machine learning
technologies,” Advances in Molecular Pathology, vol. 5, no. 1, Nov. 2022, doi: 10.1016/j.yamp.2023.01.001.
[23] B. Robson and O. K. Baek, “An ontology for very large numbers of longitudinal health records to facilitate data mining and machine
learning,” Informatics in Medicine Unlocked, vol. 38, 2023, doi: 10.1016/j.imu.2023.101204.
[24] J. Pan, L.-P. Li, C.-Q. Yu, Z.-H. You, Z.-H. Ren, and J.-Y. Tang, “FWHT-RF: a novel computational approach to predict plant
protein-protein interactions via an ensemble learning method,” Scientific Programming, vol. 2021, pp. 1–11, Jul. 2021,
doi: 10.1155/2021/1607946.
[25] N. Renaud et al., “DeepRank: a deep learning framework for data mining 3D protein-protein interfaces,” Nature Communications,
vol. 12, no. 1, Dec. 2021, doi: 10.1038/s41467-021-27396-0.
[26] M. Quadrini, S. Daberdaku, and C. Ferrari, “Hierarchical representation for PPI sites prediction,” BMC Bioinformatics, vol. 23,
no. 1, Mar. 2022, doi: 10.1186/s12859-022-04624-y.
[27] K.-H. Chen, T.-F. Wang, and Y.-J. Hu, “Protein-protein interaction prediction using a hybrid feature representation and a stacked
generalization scheme,” BMC Bioinformatics, vol. 20, no. 1, Dec. 2019, doi: 10.1186/s12859-019-2907-1.
BIOGRAPHIES OF AUTHORS
Preeti Thareja is a computer science research scholar at Maharshi Dayanand
University in Rohtak, Haryana, India. Data mining, artificial intelligence, soft computing, and
deep learning are among her research interests. Over the last few years, she has published
3 journal papers, 4 conference papers, and 1 book chapter, as well as two books about Python
and soft computing. She can be contacted on preetithareja10@gmail.com.
Rajender Singh Chillar is a computer science professor at Maharshi Dayanand
University in Rohtak, Haryana, India. He was also the head of the Department of Computer
Science, the chairman of a board of studies, and a member of the executive and academic
councils. Software engineering, software testing, software metrics, web metrics, biometrics, data
warehouse and data mining, computer networking, and software design are among his research
interests. Over the last several years, he has produced over 91 journal papers and 65 conference
papers, as well as two books about software engineering and information technology. He is a
director of the CMAI Asia Association in New Delhi, as well as a senior member of the IACSIT
in Singapore and a member of the Computer Society of India. He can be contacted on
r.chhillar@mdurohtak.ac.in.