Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expressiondata. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
GRC-MS: A GENETIC RULE-BASED CLASSIFIER MODEL FOR ANALYSIS OF MASS SPECTRA DATAcscpconf
Many studies uses different data mining techniques to analyze mass spectrometry data and
extract useful knowledge about biomarkers. These Biomarkers allow the medical experts to
determine whether an individual has a disease or not. Some of these studies have proposed
models that have obtained high accuracy. However, the black-box nature and complexity of the
proposed models have posed significant issues. Thus, to address this problem and build an
accurate model, we use a genetic algorithm for feature selection along with a rule-based
classifier, namely Genetic Rule-Based Classifier algorithm for Mass Spectra data (GRC-MS).
According to the literature, rule-based classifiers provide understandable rules, but not
accurate. In addition, genetic algorithms have achieved excellent results when used with
different classifiers for feature selection. Experiments are conducted on real dataset and the
proposed classifier GRC-MS achieves 99.7% accuracy. In addition, the generated rules are
more understandable than those of other classifier models
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
Abstract
Mostly 5 to 15% of the women in the stage of reproduction face the disease called Polycystic Ovarian Syndrome (PCOS) which is the multifaceted, heterogeneous and complex. The long term consequences diseases like endometrial hyperplasia, type 2 diabetes mellitus and coronary disease are caused by the polycystic ovaries, chronic anovulation and hyperandrogenism are characterized with the resistance of insulin and the hypertension, abdominal obesity and dyslipidemia and hyperinsulinemia are called as Metabolic syndrome (frequent metabolic traits) The above cause the common disease called Anovulatory infertility. Computer based information along with advanced Data mining techniques are used for appropriate results. Classification is a classic data mining task, with roots in machine learning. Naïve Bayesian, Artificial Neural Network, Decision Tree, Support Vector Machines are the classification tasks in the data mining. Feature selection methods involve generation of the subset, evaluation of each subset, criteria for stopping the search and validation procedures. The characteristics of the search method used are important with respect to the time efficiency of the feature selection methods. PCA (Principle Component Analysis), Information gain Subset Evaluation, Fuzzy rough set evaluation, Correlation based Feature Selection (CFS) are some of the feature selection techniques, greedy first search, ranker etc are the search algorithms that are used in the feature selection. In this paper, a new algorithm which is based on Fuzzy neural subset evaluation and artificial neural network is proposed which reduces the task of classification and feature selection separately. This algorithm combines the neural fuzzy rough subset evaluation and artificial neural network together for the better performance than doing the tasks separately.
Keywords: ANN, SVM, PCA, CFS
Medical informatics growth can be observed now days. Advancement in different medical fields
discovers the various critical diseases and provides the guidelines for their cure. This has been possible
only because of well heeled medical databases as well as automation of data analysis process. Towards
this analysis process lots of learning and intelligence is required, the data mining techniques provides the
basis for that and various data mining techniques are available like Decision tree Induction, Rule Based
Classification or mining, Support vector machine, Stochastic classification, Logistic regression, Naïve
bayes, Artificial Neural Network & Fuzzy Logic, Genetic Algorithms. This paper provides the basic of
data mining with their effective techniques availability in medical sciences & reveals the efforts done on
medical databases using data mining techniques for human disease diagnosis.
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...rahulmonikasharma
Enormous generation of biological data and the need of analysis of that data led to the generation of the field Bioinformatics. Data mining is the stream which is used to derive, analyze the data by exploring the hidden patterns of the biological data. Though, data mining can be used in analyzing biological data such as genomic data, proteomic data here Gene Expression (GE) Data is considered for evaluation. GE is generated from Microarrays such as DNA and oligo micro arrays. The generated data is analyzed through the clustering techniques of data mining. This study deals with an implement the basic clustering approach K-Means and various clustering approaches like Hierarchal, Som, Click and basic fuzzy based clustering approach. Eventually, the comparative study of those approaches which lead to the effective approach of cluster analysis of GE.The experimental results shows that proposed algorithm achieve a higher clustering accuracy and takes less clustering time when compared with existing algorithms.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
GRC-MS: A GENETIC RULE-BASED CLASSIFIER MODEL FOR ANALYSIS OF MASS SPECTRA DATAcscpconf
Many studies uses different data mining techniques to analyze mass spectrometry data and
extract useful knowledge about biomarkers. These Biomarkers allow the medical experts to
determine whether an individual has a disease or not. Some of these studies have proposed
models that have obtained high accuracy. However, the black-box nature and complexity of the
proposed models have posed significant issues. Thus, to address this problem and build an
accurate model, we use a genetic algorithm for feature selection along with a rule-based
classifier, namely Genetic Rule-Based Classifier algorithm for Mass Spectra data (GRC-MS).
According to the literature, rule-based classifiers provide understandable rules, but not
accurate. In addition, genetic algorithms have achieved excellent results when used with
different classifiers for feature selection. Experiments are conducted on real dataset and the
proposed classifier GRC-MS achieves 99.7% accuracy. In addition, the generated rules are
more understandable than those of other classifier models
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
Abstract
Mostly 5 to 15% of the women in the stage of reproduction face the disease called Polycystic Ovarian Syndrome (PCOS) which is the multifaceted, heterogeneous and complex. The long term consequences diseases like endometrial hyperplasia, type 2 diabetes mellitus and coronary disease are caused by the polycystic ovaries, chronic anovulation and hyperandrogenism are characterized with the resistance of insulin and the hypertension, abdominal obesity and dyslipidemia and hyperinsulinemia are called as Metabolic syndrome (frequent metabolic traits) The above cause the common disease called Anovulatory infertility. Computer based information along with advanced Data mining techniques are used for appropriate results. Classification is a classic data mining task, with roots in machine learning. Naïve Bayesian, Artificial Neural Network, Decision Tree, Support Vector Machines are the classification tasks in the data mining. Feature selection methods involve generation of the subset, evaluation of each subset, criteria for stopping the search and validation procedures. The characteristics of the search method used are important with respect to the time efficiency of the feature selection methods. PCA (Principle Component Analysis), Information gain Subset Evaluation, Fuzzy rough set evaluation, Correlation based Feature Selection (CFS) are some of the feature selection techniques, greedy first search, ranker etc are the search algorithms that are used in the feature selection. In this paper, a new algorithm which is based on Fuzzy neural subset evaluation and artificial neural network is proposed which reduces the task of classification and feature selection separately. This algorithm combines the neural fuzzy rough subset evaluation and artificial neural network together for the better performance than doing the tasks separately.
Keywords: ANN, SVM, PCA, CFS
Medical informatics growth can be observed now days. Advancement in different medical fields
discovers the various critical diseases and provides the guidelines for their cure. This has been possible
only because of well heeled medical databases as well as automation of data analysis process. Towards
this analysis process lots of learning and intelligence is required, the data mining techniques provides the
basis for that and various data mining techniques are available like Decision tree Induction, Rule Based
Classification or mining, Support vector machine, Stochastic classification, Logistic regression, Naïve
bayes, Artificial Neural Network & Fuzzy Logic, Genetic Algorithms. This paper provides the basic of
data mining with their effective techniques availability in medical sciences & reveals the efforts done on
medical databases using data mining techniques for human disease diagnosis.
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...rahulmonikasharma
Enormous generation of biological data and the need of analysis of that data led to the generation of the field Bioinformatics. Data mining is the stream which is used to derive, analyze the data by exploring the hidden patterns of the biological data. Though, data mining can be used in analyzing biological data such as genomic data, proteomic data here Gene Expression (GE) Data is considered for evaluation. GE is generated from Microarrays such as DNA and oligo micro arrays. The generated data is analyzed through the clustering techniques of data mining. This study deals with an implement the basic clustering approach K-Means and various clustering approaches like Hierarchal, Som, Click and basic fuzzy based clustering approach. Eventually, the comparative study of those approaches which lead to the effective approach of cluster analysis of GE.The experimental results shows that proposed algorithm achieve a higher clustering accuracy and takes less clustering time when compared with existing algorithms.
Network embedding in biomedical data scienceArindam Ghosh
Excerpts from the paper:
What is it?
Network embedding aims at converting the network into a low-dimensional space while structural information of the network is preserved.
In this way, nodes and/or edges of the network can be represented as compacted yet informative vectors in the embedding space.
Advantages:
Typical non-network-based machine learning methods such as linear regression, Support Vector Machine (SVM) and decision forest, which have been demonstrated to be effective and efficient as the state-of-the-art techniques, can be applied to such vectors.
Current status:
Efforts of applying network embedding to improve biomedical data analysis are already planned or underway.
Difficulties:
The biomedical networks are sparse, noisy, incomplete, heterogeneous and usually consist of biomedical text and other domain knowledge. It makes embedding tasks more complicated than other application fields.
Internet becomes the most popular surfing environment which increases the
service oriented data size. As the data size grows, finding and retrieving the most
similar data from the large volume of data would become more difficult task. This
problem is focused in the various research methods, which attempts to cluster the
large volume of data. In the existing research method Clustering-based Collaborative
Filtering approach (ClubCF) is introduced whose main goal is to cluster the similar
kind of data together, so that retrieval time cost can be reduced considerably.
However, existing research methods cannot find the similar reviews accurately which
needs to be focused more for efficient and accurate recommendation system. This is
ensured in the proposed research method by introducing the novel research technique
namely Modified Collaborative Filtering and Clustering with Regression (MoCFCR).
In this research method, initially k means algorithm is used to cluster the similar
movie reviewer together, so that recommendation process can be done in the easier
way. In order to handle the large volume of data this research work adapts the map
reduce framework which will divide the entire data into subsets which will assigned
on separate nodes with individual key values. After clustering, the clustered outcome
is merged together using inverted index procedure in which similarity between movies
would be calculated. Here collaborative filtering is applied to remove the movies that
are not relevant to input. Finally recommendations of movies are made in the accurate
way by using the logistic regression method. The overall evaluation of the proposed
research method is done in Hadoop from which it can be proved that the proposed
research technique can lead to provide better outcome than the existing research
techniques
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
As the size of the biomedical databases are growing day by day, finding an essential features in the disease prediction have become more complex due to high dimensionality and sparsity problems. Also, due to the
availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to analyze, predict and interpret the feature information using the traditional feature selection based classification models. Most of the traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. Ensemble classifier is one of the scalable models for extreme learning machine due to its high efficiency, the fast processing speed for real-time applications. The main objective of the feature selection
based ensemble learning models is to classify the high dimensional data with high computational efficiency
and high true positive rate on high dimensional datasets. In this proposed model an optimized Particle swarm optimization (PSO) based Ensemble classification model was developed on high dimensional microarray
datasets. Experimental results proved that the proposed model has high computational efficiency compared to the traditional feature selection based classification models in terms of accuracy , true positive rate and error rate are concerned.
Classification of medical datasets using back propagation neural network powe...IJECEIAES
The classification is a one of the most indispensable domains in the data mining and machine learning. The classification process has a good reputation in the area of diseases diagnosis by computer systems where the progress in smart technologies of computer can be invested in diagnosing various diseases based on data of real patients documented in databases. The paper introduced a methodology for diagnosing a set of diseases including two types of cancer (breast cancer and lung), two datasets for diabetes and heart attack. Back Propagation Neural Network plays the role of classifier. The performance of neural net is enhanced by using the genetic algorithm which provides the classifier with the optimal features to raise the classification rate to the highest possible. The system showed high efficiency in dealing with databases differs from each other in size, number of features and nature of the data and this is what the results illustrated, where the ratio of the classification reached to 100% in most datasets).
Drug discovery and development is a long and expensive process and over time has notoriously bucked Moore’s law that it now has its own law called Eroom’s Law named after it (the opposite of Moore’s). It is estimated that the attrition rate of drug candidates is up to 96% and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes for the high attrition rate is drug safety, which accounts for 30% of the failures.
Even if a drug is approved in market, it could be withdrawn due to safety problems. Therefore, evaluating drug safety extensively as early as possible is paramount in accelerating drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, Deep learning and ML based techniques have had the most gains.
Specifically, this talk covers a variety of drug safety related AI and ML based techniques currently in use which can generally divided into 3 main categories:
1. Discovery,
2. Toxicity and Safety, and
3. Post-Market Monitoring.
We will address the recent progress in predictive models and techniques built for various toxicities. It will also cover some publicly available databases, tools and platforms available to easily leverage them.
We will also compare and contrast various modeling techniques including deep learning techniques and their accuracy using recent research. Finally, the talk will address some of the remaining challenges and limitations yet to be addressed in the area of drug discovery and safety assessment.
EVOLVING EFFICIENT CLUSTERING AND CLASSIFICATION PATTERNS IN LYMPHOGRAPHY DAT...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relative patterns from
large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an
established, proven structure from a voluminous collection of facts. A dominant area of modern-day
research in the field of medical investigations includes disease prediction and malady categorization. In
this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering
techniques and compare the performance of classification algorithms on the clinical data. Feature
selection is a supervised method that attempts to select a subset of the predictor features based on the
information gain. The Lymphography dataset comprises of 18 predictor attributes and 148 instances with
the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms
in detecting clusters of patient records and predictor attributes and highlights the performance of sixteen
classification algorithms on the Lymphography dataset that enables the classifier to accurately perform
multi-class categorization of medical data. Our work asserts the fact that the Random Tree algorithm and
the Quinlan’s C4.5 algorithm give 100 percent classification accuracy with all the predictor features and
also with the feature subset selected by the Fisher Filtering feature selection algorithm.. It is also stated
here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm
offers increased clustering accuracy in less computation time.
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
Over the past few years, there has been a considerable spread of microarray technology in many
biological patterns, particularly in those pertaining to cancer diseases like leukemia, prostate, colon
cancer, etc. The primary bottleneck that one experiences in the proper understanding of such datasets lies
in their dimensionality, and thus for an efficient and effective means of studying the same, a reduction in
their dimension to a large extent is deemed necessary. This study is a bid to suggesting different algorithms
and approaches for the reduction of dimensionality of such microarray datasets.This study exploits the
matrix-like structure of such microarray data and uses a popular technique called Non-Negative Matrix
Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification
accuracies are then compared for these algorithms.This technique gives an accuracy of 98%
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
Over the past few years, there has been a considerable spread of microarray technology in many biological patterns, particularly in those pertaining to cancer diseases like leukemia, prostate, colon cancer, etc. The primary bottleneck that one experiences in the proper understanding of such datasets lies in their dimensionality, and thus for an efficient and effective means of studying the same, a reduction in their dimension to a large extent is deemed necessary. This study is a bid to suggesting different algorithms and approaches for the reduction of dimensionality of such microarray datasets.This study exploits the matrix-like structure of such microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms.This technique gives an accuracy of 98%.
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSIONcsandit
Sequencing projects arising from high throughput technologies including those of sequencing DNA microarrays allowed to simultaneously measure the expression levels of millions of genes of a biological sample as well as annotate and identify the role (function) of those genes. Consequently, to better manage and organize this significant amount of information,
bioinformatics approaches have been developed. These approaches provide a representation and a more 'relevant' integration of data in order to test and validate the hypothesis of researchers throughout the experimental cycle. In this context, this article describes and discusses some of techniques used for the functional analysis of gene expression data.
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...CSCJournals
Logistic Regression (LR) is a well known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems. Logistic regression enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal non-linear inference models and have gained extensive popularity in recent years. Research activities are considerable and literature is growing. The goal of this research work is to compare the performance of Logistic Regression and Neural Network models on publicly available medical datasets. The evaluation process of the model is as follows. The logistic regression and neural network methods with sensitivity analysis have been evaluated for the effectiveness of the classification. The Classification Accuracy is used to measure the performance of both the models. From the experimental results it is confirmed that the neural network model with sensitivity analysis model gives more efficient result.
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da...IOSRjournaljce
The main purpose of data mining is to extract knowledge from large amount of data. Artificial Neural network (ANN) has already been applied in a variety of domains with remarkable success. This paper presents the application of hybrid model for stroke disease that integrates Genetic algorithm and back propagation algorithm. Selecting a good subset of features, without sacrificing accuracy, is of great importance for neural networks to be successfully applied to the area. In addition the hybrid model that leads to further improvised categorization, accuracy compared to the result produced by genetic algorithm alone. In this study, a new hybrid model of Neural Networks and Genetic Algorithm (GA) to initialize and optimize the connection weights of ANN so as to improve the performance of the ANN and the same has been applied in a medical problem of predicting stroke disease for verification of the results.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex
because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast
Cancer at the early stage was to apply artificial intelligence where machines are simulated with
intelligence and programmed to think and act like a human. This allows machines to passively learn and
find a pattern, which can be used later to detect any new changes that may occur. In general, machine
learning is quite useful particularly in the medical field, which depends on complex genomic
measurements such as microarray technique and would increase the accuracy and precision of results.
With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper
treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast
Cancer diagnostic system using complex genomic analysis via microarray technology. The system will
combine two machine learning methods, K-means cluster, and linear regression.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...IJECEIAES
In many diseases classification an accurate gene analysis is needed, for which selection of most informative genes is very important and it require a technique of decision in complex context of ambiguity. The traditional methods include for selecting most significant gene includes some of the statistical analysis namely 2-Sample-T-test (2STT), Entropy, Signal to Noise Ratio (SNR). This paper evaluates gene selection and classification on the basis of accurate gene selection using structured complex decision technique (SCDT) and classifies it using fuzzy cluster based nearest neighborclassifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated for leave one out cross validation metric(LOOCV) along with sensitivity, specificity, precision and F1-score with four different classifiers namely 1) Radial Basis Function (RBF), 2) Multi-layer perception(MLP), 3) Feed Forward(FF) and 4) Support vector machine(SVM) for three different datasets of DLBCL, Leukemia and Prostate tumor. The proposed SCDT &FC-NNC exhibits superior result for being considered more accurate decision mechanism.
Sample Work For Engineering Literature Review and Gap IdentificationPhD Assistance
Sample Work For Engineering Literature Review and Gap Identification - PhD Assistance - http://bit.ly/2E9fAVq
2.1 INTRODUCTION
2.2 RESEARCH GAPS IN EXISTING METHODS
2.3 OBJECTIVES OF THIS WORK
Read More : http://bit.ly/2Rl7XT5
#gapanalysis #strategicmanagement #datagapanalysis #gapanalysisppt #gapanalysishealthcare #gapanalysisfinance #gapanalysisEngineering
Network embedding in biomedical data scienceArindam Ghosh
Excerpts from the paper:
What is it?
Network embedding aims at converting the network into a low-dimensional space while structural information of the network is preserved.
In this way, nodes and/or edges of the network can be represented as compacted yet informative vectors in the embedding space.
Advantages:
Typical non-network-based machine learning methods such as linear regression, Support Vector Machine (SVM) and decision forest, which have been demonstrated to be effective and efficient as the state-of-the-art techniques, can be applied to such vectors.
Current status:
Efforts of applying network embedding to improve biomedical data analysis are already planned or underway.
Difficulties:
The biomedical networks are sparse, noisy, incomplete, heterogeneous and usually consist of biomedical text and other domain knowledge. It makes embedding tasks more complicated than other application fields.
Internet becomes the most popular surfing environment which increases the
service oriented data size. As the data size grows, finding and retrieving the most
similar data from the large volume of data would become more difficult task. This
problem is focused in the various research methods, which attempts to cluster the
large volume of data. In the existing research method Clustering-based Collaborative
Filtering approach (ClubCF) is introduced whose main goal is to cluster the similar
kind of data together, so that retrieval time cost can be reduced considerably.
However, existing research methods cannot find the similar reviews accurately which
needs to be focused more for efficient and accurate recommendation system. This is
ensured in the proposed research method by introducing the novel research technique
namely Modified Collaborative Filtering and Clustering with Regression (MoCFCR).
In this research method, initially k means algorithm is used to cluster the similar
movie reviewer together, so that recommendation process can be done in the easier
way. In order to handle the large volume of data this research work adapts the map
reduce framework which will divide the entire data into subsets which will assigned
on separate nodes with individual key values. After clustering, the clustered outcome
is merged together using inverted index procedure in which similarity between movies
would be calculated. Here collaborative filtering is applied to remove the movies that
are not relevant to input. Finally recommendations of movies are made in the accurate
way by using the logistic regression method. The overall evaluation of the proposed
research method is done in Hadoop from which it can be proved that the proposed
research technique can lead to provide better outcome than the existing research
techniques
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
As the size of the biomedical databases are growing day by day, finding an essential features in the disease prediction have become more complex due to high dimensionality and sparsity problems. Also, due to the
availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to analyze, predict and interpret the feature information using the traditional feature selection based classification models. Most of the traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. Ensemble classifier is one of the scalable models for extreme learning machine due to its high efficiency, the fast processing speed for real-time applications. The main objective of the feature selection
based ensemble learning models is to classify the high dimensional data with high computational efficiency
and high true positive rate on high dimensional datasets. In this proposed model an optimized Particle swarm optimization (PSO) based Ensemble classification model was developed on high dimensional microarray
datasets. Experimental results proved that the proposed model has high computational efficiency compared to the traditional feature selection based classification models in terms of accuracy , true positive rate and error rate are concerned.
Classification of medical datasets using back propagation neural network powe...IJECEIAES
The classification is a one of the most indispensable domains in the data mining and machine learning. The classification process has a good reputation in the area of diseases diagnosis by computer systems where the progress in smart technologies of computer can be invested in diagnosing various diseases based on data of real patients documented in databases. The paper introduced a methodology for diagnosing a set of diseases including two types of cancer (breast cancer and lung), two datasets for diabetes and heart attack. Back Propagation Neural Network plays the role of classifier. The performance of neural net is enhanced by using the genetic algorithm which provides the classifier with the optimal features to raise the classification rate to the highest possible. The system showed high efficiency in dealing with databases differs from each other in size, number of features and nature of the data and this is what the results illustrated, where the ratio of the classification reached to 100% in most datasets).
Drug discovery and development is a long and expensive process and over time has notoriously bucked Moore’s law that it now has its own law called Eroom’s Law named after it (the opposite of Moore’s). It is estimated that the attrition rate of drug candidates is up to 96% and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes for the high attrition rate is drug safety, which accounts for 30% of the failures.
Even if a drug is approved in market, it could be withdrawn due to safety problems. Therefore, evaluating drug safety extensively as early as possible is paramount in accelerating drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, Deep learning and ML based techniques have had the most gains.
Specifically, this talk covers a variety of drug safety related AI and ML based techniques currently in use which can generally divided into 3 main categories:
1. Discovery,
2. Toxicity and Safety, and
3. Post-Market Monitoring.
We will address the recent progress in predictive models and techniques built for various toxicities. It will also cover some publicly available databases, tools and platforms available to easily leverage them.
We will also compare and contrast various modeling techniques including deep learning techniques and their accuracy using recent research. Finally, the talk will address some of the remaining challenges and limitations yet to be addressed in the area of drug discovery and safety assessment.
EVOLVING EFFICIENT CLUSTERING AND CLASSIFICATION PATTERNS IN LYMPHOGRAPHY DAT...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relative patterns from
large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an
established, proven structure from a voluminous collection of facts. A dominant area of modern-day
research in the field of medical investigations includes disease prediction and malady categorization. In
this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering
techniques and compare the performance of classification algorithms on the clinical data. Feature
selection is a supervised method that attempts to select a subset of the predictor features based on the
information gain. The Lymphography dataset comprises of 18 predictor attributes and 148 instances with
the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms
in detecting clusters of patient records and predictor attributes and highlights the performance of sixteen
classification algorithms on the Lymphography dataset that enables the classifier to accurately perform
multi-class categorization of medical data. Our work asserts the fact that the Random Tree algorithm and
the Quinlan’s C4.5 algorithm give 100 percent classification accuracy with all the predictor features and
also with the feature subset selected by the Fisher Filtering feature selection algorithm.. It is also stated
here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm
offers increased clustering accuracy in less computation time.
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
Over the past few years, there has been a considerable spread of microarray technology in many
biological patterns, particularly in those pertaining to cancer diseases like leukemia, prostate, colon
cancer, etc. The primary bottleneck that one experiences in the proper understanding of such datasets lies
in their dimensionality, and thus for an efficient and effective means of studying the same, a reduction in
their dimension to a large extent is deemed necessary. This study is a bid to suggesting different algorithms
and approaches for the reduction of dimensionality of such microarray datasets.This study exploits the
matrix-like structure of such microarray data and uses a popular technique called Non-Negative Matrix
Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification
accuracies are then compared for these algorithms.This technique gives an accuracy of 98%
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
Over the past few years, there has been a considerable spread of microarray technology in many biological patterns, particularly in those pertaining to cancer diseases like leukemia, prostate, colon cancer, etc. The primary bottleneck that one experiences in the proper understanding of such datasets lies in their dimensionality, and thus for an efficient and effective means of studying the same, a reduction in their dimension to a large extent is deemed necessary. This study is a bid to suggesting different algorithms and approaches for the reduction of dimensionality of such microarray datasets.This study exploits the matrix-like structure of such microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms.This technique gives an accuracy of 98%.
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSIONcsandit
Sequencing projects arising from high throughput technologies including those of sequencing DNA microarrays allowed to simultaneously measure the expression levels of millions of genes of a biological sample as well as annotate and identify the role (function) of those genes. Consequently, to better manage and organize this significant amount of information,
bioinformatics approaches have been developed. These approaches provide a representation and a more 'relevant' integration of data in order to test and validate the hypothesis of researchers throughout the experimental cycle. In this context, this article describes and discusses some of techniques used for the functional analysis of gene expression data.
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...CSCJournals
Logistic Regression (LR) is a well known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems. Logistic regression enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal non-linear inference models and have gained extensive popularity in recent years. Research activities are considerable and literature is growing. The goal of this research work is to compare the performance of Logistic Regression and Neural Network models on publicly available medical datasets. The evaluation process of the model is as follows. The logistic regression and neural network methods with sensitivity analysis have been evaluated for the effectiveness of the classification. The Classification Accuracy is used to measure the performance of both the models. From the experimental results it is confirmed that the neural network model with sensitivity analysis model gives more efficient result.
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da...IOSRjournaljce
The main purpose of data mining is to extract knowledge from large amount of data. Artificial Neural network (ANN) has already been applied in a variety of domains with remarkable success. This paper presents the application of hybrid model for stroke disease that integrates Genetic algorithm and back propagation algorithm. Selecting a good subset of features, without sacrificing accuracy, is of great importance for neural networks to be successfully applied to the area. In addition the hybrid model that leads to further improvised categorization, accuracy compared to the result produced by genetic algorithm alone. In this study, a new hybrid model of Neural Networks and Genetic Algorithm (GA) to initialize and optimize the connection weights of ANN so as to improve the performance of the ANN and the same has been applied in a medical problem of predicting stroke disease for verification of the results.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex
because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast
Cancer at the early stage was to apply artificial intelligence where machines are simulated with
intelligence and programmed to think and act like a human. This allows machines to passively learn and
find a pattern, which can be used later to detect any new changes that may occur. In general, machine
learning is quite useful particularly in the medical field, which depends on complex genomic
measurements such as microarray technique and would increase the accuracy and precision of results.
With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper
treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast
Cancer diagnostic system using complex genomic analysis via microarray technology. The system will
combine two machine learning methods, K-means cluster, and linear regression.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...IJECEIAES
In many diseases classification an accurate gene analysis is needed, for which selection of most informative genes is very important and it require a technique of decision in complex context of ambiguity. The traditional methods include for selecting most significant gene includes some of the statistical analysis namely 2-Sample-T-test (2STT), Entropy, Signal to Noise Ratio (SNR). This paper evaluates gene selection and classification on the basis of accurate gene selection using structured complex decision technique (SCDT) and classifies it using fuzzy cluster based nearest neighborclassifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated for leave one out cross validation metric(LOOCV) along with sensitivity, specificity, precision and F1-score with four different classifiers namely 1) Radial Basis Function (RBF), 2) Multi-layer perception(MLP), 3) Feed Forward(FF) and 4) Support vector machine(SVM) for three different datasets of DLBCL, Leukemia and Prostate tumor. The proposed SCDT &FC-NNC exhibits superior result for being considered more accurate decision mechanism.
Sample Work For Engineering Literature Review and Gap IdentificationPhD Assistance
Sample Work For Engineering Literature Review and Gap Identification - PhD Assistance - http://bit.ly/2E9fAVq
2.1 INTRODUCTION
2.2 RESEARCH GAPS IN EXISTING METHODS
2.3 OBJECTIVES OF THIS WORK
Read More : http://bit.ly/2Rl7XT5
#gapanalysis #strategicmanagement #datagapanalysis #gapanalysisppt #gapanalysishealthcare #gapanalysisfinance #gapanalysisEngineering
Evolving Efficient Clustering and Classification Patterns in Lymphography Dat...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relative patterns from large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations includes disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on the information gain. The Lymphography dataset comprises of 18 predictor attributes and 148 instances with the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes and highlights the performance of sixteen classification algorithms on the Lymphography dataset that enables the classifier to accurately perform multi-class categorization of medical data. Our work asserts the fact that the Random Tree algorithm and the Quinlan’s C4.5 algorithm give 100 percent classification accuracy with all the predictor features and also with the feature subset selected by the Fisher Filtering feature selection algorithm.. It is also stated here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm offers increased clustering accuracy in less computation time.
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...mlaij
The classification of different types of tumor is of great importance in cancer diagnosis and drug discovery.
Earlier studies on cancer classification have limited diagnostic ability. The recent development of DNA
microarray technology has made monitoring of thousands of gene expression simultaneously. By using this
abundance of gene expression data researchers are exploring the possibilities of cancer classification.
There are number of methods proposed with good results, but lot of issues still need to be addressed. This
paper present an overview of various cancer classification methods and evaluate these proposed methods
based on their classification accuracy, computational time and ability to reveal gene information. We have
also evaluated and introduced various proposed gene selection method. In this paper, several issues
related to cancer classification have also been discussed.
A novel optimized deep learning method for protein-protein prediction in bioi...IJECEIAES
Proteins have been shown to perform critical activities in cellular processes and are required for the organism's existence and proliferation. On complicated protein-protein interaction (PPI) networks, conventional centrality approaches perform poorly. Machine learning algorithms based on enormous amounts of data do not make use of biological information's temporal and spatial dimensions. As a result, we developed a sequence- dependent PPI prediction model using an Aquila and shark noses-based hybrid prediction technique. This model operates in two stages: feature extraction and prediction. The features are acquired using the semantic similarity technique for good results. The acquired features are utilized to predict the PPI using hybrid deep networks long short-term memory (LSTM) networks and restricted Boltzmann machines (RBMs). The weighting parameters of these neural networks (NNs) were changed using a novel optimization approach hybrid of aquila and shark noses (ASN), and the results revealed that our proposed ASN-based PPI prediction is more accurate and efficient than other existing techniques.
Genome-wide transcription profiling is a powerful technique in studying disease susceptible footprints. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. Past researches prove that neuro diseases damage the brain network interaction, protein- protein interaction and gene-gene interaction. A number of neurological research paper also analyze the relationship among damaged part. Analysis of gene-gene interaction network drawn by using state-of-the-art gene database of Alzheimer’s patient can conclude a lot of information. In this paper we used gene dataset affected with Alzheimer’s disease and normal patient’s dataset from NCBI databank. After proper processing the .CEL affymetrix data using RMA, we use the processed data to find gene interaction outputs. Then we filter the output files using probe set filtering attributes p-value and fold count and draw a gene-gene interaction network. Then we analyze the interaction network using GeneMania software.
ABSTRACT
Genome-wide transcription profiling is a powerful technique in studying disease susceptible footprints. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. Past researches prove that neuro diseases damage the brain network interaction, protein- protein interaction and gene-gene interaction. A number of neurological research paper also analyze the relationship among damaged part. Analysis of gene-gene interaction network drawn by using state-of-the-art gene database of Alzheimer’s patient can conclude a lot of information. In this paper we used gene dataset affected with Alzheimer’s disease and normal patient’s dataset from NCBI databank. After proper processing the .CEL affymetrix data using RMA, we use the processed data to find gene interaction outputs. Then we filter the output files using probe set filtering attributes p-value and fold count and draw a gene-gene interaction network. Then we analyze the interaction network using GeneMania software.
An advance extended binomial GLMBoost ensemble method with synthetic minorit...IJECEIAES
Classification is an important activity in a variety of domains. Class imbalance problem have reduced the performance of the traditional classification approaches. An imbalance problem arises when mismatched class distributions are discovered among the instances of class of classification datasets. An advance extended binomial GLMBoost (EBGLMBoost) coupled with synthetic minority over-sampling technique (SMOTE) technique is the proposed model in the study to manage imbalance issues. The SMOTE is used to solve the proposed model, ensuring that the target variable's distribution is balanced, whereas the GLMBoost ensemble techniques are built to deal with imbalanced datasets. For the entire experiment, twenty different datasets are used, and support vector machine (SVM), Nu-SVM, bagging, and AdaBoost classification algorithms are used to compare with the suggested method. The model's sensitivity, specificity, geometric mean (G-mean), precision, recall, and F-measure resulted in percentages for training and testing datasets are 99.37, 66.95, 80.81, 99.21, 99.37, 99.29 and 98.61, 54.78, 69.88, 98.77, 96.61, 98.68, respectively. With the help of the Wilcoxon test, it is determined that the proposed technique performed well on unbalanced data. Finally, the proposed solutions are capable of efficiently dealing with the problem of class imbalance.
Multiple Features Based Two-stage Hybrid Classifier Ensembles for Subcellular...CSCJournals
Subcellular localization is a key functional characteristic of proteins. As an interesting ``bio-image informatics\'\' application, an automatic, reliable and efficient prediction system for protein subcellular localization can be used for establishing knowledge of the spatial distribution of proteins within living cells and permits to screen systems for drug discovery or for early diagnosis of a disease. In this paper, we propose a two-stage multiple classifier system to improve classification reliability by introducing rejection option. The system is built as a cascade of two classifier ensembles. The first ensemble consists of set of binary SVMs which generalizes to learn a general classification rule and the second ensemble, which also include three distinct classifiers, focus on the exceptions rejected by the rule. A new way to induce diversity for the classifier ensembles is proposed by designing classifiers that are based on descriptions of different feature patterns. In addition to the Subcellular Location Features (SLF) generally adopted in earlier researches, three well-known texture feature descriptions have been applied to cell phenotype images, which are the local binary patterns (LBP), Gabor filtering and Gray Level Coocurrence Matrix (GLCM). The different texture feature sets can provide sufficient diversity among base classifiers, which is known as a necessary condition for improvement in ensemble performance. Using the public benchmark 2D HeLa cell images, a high classification accuracy 96% is obtained with rejection rate $21\\%$ from the proposed system by taking advantages of the complementary strengths of feature construction and majority-voting based classifiers\' decision fusions.
The analysis of proteins and messenger RNA is commonly used in the comparison of gene expression patterns in tissues or cells of different types and under distinct conditions. In gene expression analysis, normalization is a critical step as it guarantees the validity of downstream analyses. Data preprocessing is an indispensable step in the extraction and normalization of microarray gene expression data. The normalization of gene expression data is essential in ensuring accurate inferences. A number of normalization methods in high throughput sequencing studies are being employed. The preprocessing activity begins by a careful analysis of the gene expression data and usually involves the classification of many raw signal intensities into one expression value. The Robust Multiarray Average (RMA) is a normalization approach for microarrays that involves background correction, normalization and summarization of probe levels information without using MM probes (Lim et al., 2007). It is an algorithm commonly used in the creation of an expression matrix for Affymetrix data and is one of the most commonly used modes of preprocessing to normalize gene expression data. Values of raw intensity are initially background corrected and log2 transformed before being normalized. In order to generate an expression measure for probe sets on each array, a linear model is fitted to the normalized data.
An Enhanced Feature Selection Method to Predict the Severity in Brain Tumorijtsrd
The level of severity of brain tumor is captured through MRI and then assessed by the physician for their medical interpretation. The facts behind the MRI images are then analyzed by the physician for further medication and follow up activities. An MRI image composed of large volume of features. It has irrelevant, missing and information which is not certain. In medical data analysis, an MRI image doesn't express facts very clearly to the physician for correct interpretation all the time. It also includes huge amount of redundant information within it. A mathematical model known as rough set theory has been applied to resolve this problem by eliminating the redundancy in medical image data. This paper uses a rough set method to find the severity level of the brain tumor of the given MRI image. Rough set feature selection algorithms are applied over the medical image data to select the prominent features. The classification accuracy of the brain tumor can be improved to a better level by using this rough set approach. The prominent features selected through this approach deliver a set of decision rules for the classification task. A search method based on the particle swarm optimization is proposed in this paper for minimizing the attribute set. This approach is compared with previously existing rough set reduction algorithm for finding the accuracy. The reducts originated from the proposed algorithm is more efficient and can generate decision rules that will better classify the tumor types. The rule based method provided by the rough set method delivers classification accuracy in higher level than other smart methods such as fuzzy rule extraction, neural networks, decision trees and Fuzzy Networks like Fuzzy Min Max Neural Networks. Parthiban J | Dr. B. Sathees Kumar "An Enhanced Feature Selection Method to Predict the Severity in Brain Tumor" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26802.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26802/an-enhanced-feature-selection-method-to-predict-the-severity-in-brain-tumor/parthiban-j
CLUSTERING ALGORITHM FOR A HEALTHCARE DATASET USING SILHOUETTE SCORE VALUEijcsit
The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining
interesting research areas. Data mining tools and techniques help to discover and understand hidden
patterns in a dataset which may not be possible by mainly visualization of the data. Selecting appropriate
clustering method and optimal number of clusters in healthcare data can be confusing and difficult most
times. Presently, a large number of clustering algorithms are available for clustering healthcare data, but
it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms.
This paper aims to analyze clustering techniques using healthcare dataset, in order to determine suitable
algorithms which can bring the optimized group clusters. Performances of two clustering algorithms (Kmeans
and DBSCAN) were compared using Silhouette score values. Firstly, we analyzed K-means
algorithm using different number of clusters (K) and different distance metrics. Secondly, we analyzed
DBSCAN algorithm using different minimum number of points required to form a cluster (minPts) and
different distance metrics. The experimental result indicates that both K-means and DBSCAN algorithms
have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means algorithm
performed better compare to DBSCAN algorithm in terms of clustering accuracy and execution time.
The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining interesting research areas. Data mining tools and techniques help to discover and understand hidden patterns in a dataset which may not be possible by mainly visualization of the data. Selecting appropriate clustering method and optimal number of clusters in healthcare data can be confusing and difficult most times. Presently, a large number of clustering algorithms are available for clustering healthcare data, but it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms. This paper aims to analyze clustering techniques using healthcare dataset, in order to determine suitable algorithms which can bring the optimized group clusters. Performances of two clustering algorithms (Kmeans and DBSCAN) were compared using Silhouette score values. Firstly, we analyzed K-means algorithm using different number of clusters (K) and different distance metrics. Secondly, we analyzed DBSCAN algorithm using different minimum number of points required to form a cluster (minPts) and different distance metrics. The experimental result indicates that both K-means and DBSCAN algorithms have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means algorithm performed better compare to DBSCAN algorithm in terms of clustering accuracy and execution time.
Evaluating the Impact of Color Normalization on Kidney Image SegmentationIJCI JOURNAL
The role of deep learning in the recognition of morphological structures in histopathological data has progressed significantly. But, less intensive preprocessing stages and their contribution to deep learning pipelines is often overlooked. Color normalization (CN) algorithms are among the most prominent methods in this stage, and they work by standardizing the staining pattern of a dataset. However, the impact of various color normalization algorithms on the detection of glomeruli functional tissue units (FTUs) in kidney tissue data has not been explored before. An advanced deep learning architecture was built with the U-NET segmentation model. The U-NET model is an architecture that specializes in the segmentation of biomedical data. A dataset of 15 kidney whole slide images (WSIs), each annotated with locations of glomeruli FTUs were processed and subsequently normalized according to three 3 different conventional color normalization techniques (Reinhard, Vahadane, Macenko), and fed into a U-NET model. The dice score coefficient (DSC) was used to compare the results of each run. It was determined that color normalization algorithms significantly impact the segmentation results of deep learning algorithms, with the Reinhard algorithm being the best technique. The implications of this work are immense, as it could contribute to the proliferation of color normalization techniques in preprocessing deep learning workflows, which could improve general segmentation accuracies.
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...IJTET Journal
inAbstract— Pattern Recognition (PR) plays an important role in field of Bioinformatics. PR is concerned with processing raw measurement data by a computer to arrive at a prediction that can be used to formulate a decision to be taken. The important problem in which pattern recognition are applied have common that they are too complex to model explicitly. Diverse methods of this PR are used to analyze, segment and manage the high dimensional microarray gene data for classification. PR is concerned with the development of systems that learn to solve a given problem using a set of instances, each instances represented by a number of features. The microarray expression technologies are possible to monitor the expression levels of thousands of genes simultaneously. The microarrays generated large amount of data has stimulate the development of various computational methods to different biological processes by gene expression profiling. Microarray Gene Expression Profiling (MGEP) is important in Bioinformatics, it yield various high dimensional data used in various clinical applications like cancer diagnostics and drug designing. In this work a new schema has developed for classification of unknown malignant tumors into known class. According to this work an new classification scheme includes the transformation of very high dimensional microarray data into mahalanobis space before classification. The eligibility of the proposed classification scheme has proved to 10 commonly available cancer gene datasets, this contains both the binary and multiclass data sets. To improve the performance of the classification gene selection method is applied to the datasets as a preprocessing and data extraction step.
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research is developing an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker eclipse mosquito using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. Furthermore, the experimental results produce as many as 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration model, each with 60 neurons and a lookback setting 6. This model produces an R 2 value of 0.934, with a root mean square error (RMSE) value of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R 2 value was also evaluated for each attribute used as input, with a result of values between 0.590 and 0.845.
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees which both are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and are known for having a unique flavor profile. Problem identified that many stingless bees collapsed due to weather, temperature and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions to improve honey production. Thus, this paper presents a review on stingless bee's honey production and prediction modeling. About 54 previous research has been analyzed and compared in identifying the research gaps. A framework on modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis on the internet of things (IoT) monitoring systems, honey production estimation, convolution neural networks (CNNs), and automatic identification methods on bee species. It is identified based on image detection method the top best three efficiency presents CNN is at 98.67%, densely connected convolutional networks with YOLO v3 is 97.7%, and DenseNet201 convolutional networks 99.81%. This study is significant to assist the researcher in developing a model for predicting stingless honey produced by bee's output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society including interactions, financial activity, and global security such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are the long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using a compact contiki/cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJoules at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers propose a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model using intuitionistic fuzzy numbers as they provide flexibility by defining both a membership and a non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the previous polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation, and then applied on numerical examples for clarity.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSveerababupersonal22
It consists of cw radar and fmcw radar ,range measurement,if amplifier and fmcw altimeterThe CW radar operates using continuous wave transmission, while the FMCW radar employs frequency-modulated continuous wave technology. Range measurement is a crucial aspect of radar systems, providing information about the distance to a target. The IF amplifier plays a key role in signal processing, amplifying intermediate frequency signals for further analysis. The FMCW altimeter utilizes frequency-modulated continuous wave technology to accurately measure altitude above a reference point.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
2. Int J Elec & Comp Eng ISSN: 2088-8708
Novel modelling of clustering for enhanced classification performance on gene expression data (Sudha V.)
2061
similarity. But still this method is also flawed as it cannot address the problems associated with outliers that
results in false positives. The biggest problem with correlation-based approach is that if there is a common
pattern between two different objects with single feature than discrete value of differences are not detected by
this method. This method is definitely not robust for data or patterns with non-Gaussian distribution [8].
The clustering approach utilized for sample-based methods are supervised and unsupervised approach of
selecting gene. As there are very less availability of the gene with potential information; therefore, selecting
a gene with higher score of information is quite a challenging one. Apart from this there are other problems
associated with sample-based clustering approach i.e. predefined number of cluster which is impractical and
possess time complexity. Apart from the existing system, there are various other schemes to perform
clustering approaches over gene expression data [9]. However, the problem is still unaddressed apart from
various studies in existing system which is about selection of precise algorithm for particular genomic data.
At present, there is no robust or full-proof approach to claim best clustering performance whereas it is found
that only candidate algorithms are opted by researchers in order to perform comparative analysis. Therefore,
this manuscript presents a discussion of a unique clustering algorithm which uses graph theory in order to
construct a network of the entire significant object obtained from microarray data of gene expression
structured logically. The paper gives a vivid explanation of the process undertaken in order to develop this
mechanism of clustering and shows that proposed system offers better outcome. The organization of
the paper is as follows: Section 1.1 discusses about the existing literatures where different techniques are
discussed for detection schemes used in power transmission lines followed by discussion of research
problems in Section 1.2 and proposed solution in 1.3. Section 2 discusses about algorithm implementation
followed by discussion of result analysis in Section 3. Finally, the conclusive remarks are provided in
Section 4.
- The background
There have been various approaches towards enhancing the clustering performance over the medical
data [10]. The recent work carried out by Chen et al. [11] has introduced a network model for analyzing
the redundant informationin the gene expression data. Identification of the specific form of cancer was
carried out by Farouq et al. [12] over gene expression data. The approach uses a profiling mechanism for
identification of disease condition using fusion-based mechanism. The work of Rosati et al. [13] has analyzed
gene expression data where a hierarchical clustering approach mainly has been introduced using spatial
information of the associated pattern. Existing system also finds that low rank clustering is another
frequency used mechanism for analyzing complex medical data. The work of Liu et al. [14] has used
regularization of hypergraph using a learning mechanism for performing subspace clustering. Usage of
spectral clustering is another robust mechanism in order to explore overlapping region considering the case
study of breast cancer (Luo et al. [15]). Sun et al. [16] have implemented a design of clustering mechanism
on the basis of the affinity propagation where hybrid kernel system is introduced for obtaining effective
precision. Deep learning is another frequently used clustering mechanism for investgating medical condition
from gene expression data. Suo et al. [17] have used deep learning for clustering along with fuzzy c- means
approach for carrying out clustering operation. Study towards involuntary training for the medical data is
carried out by Xia et al. [18] where sub-space clustering approach using representation approach for low
utilized ranks are harnessed. The work of Ahn et al. [19] have used time-series analysis integrated with
clustering approach over gene expression data for solving the detection of specific forms of genes.
Dominguez and Martin [20] have used a specific form of tools for computing similiarity score in order to
generate a sophisticated network of genes. The model is claimed to offer reduced computational complexity
in its clustering operation. Applying weights over the subspace is another strategy to improve clustering
performance as seen in the work of Chen et al. [21]. Principal Component Analysis is another proven strategy
to perform clustering over the gene expression data. The work of Feng et al. [22] has used Laplacian
regularization process in order to optimize the clustering performance. Singular Value Decomposition using
p-normalization approach as well as k-means clustering is another effective strategy to perform bimolecular
clustering as seen in the work of Kong et al.[23]. Apart from this other schemes toward clustering operation
are usage of ensemble classifier (Pratama [24]), Laplacian regularization with mix-norm (Wang et al. [25]),
clustering on the basis of available information (Leale et al. [26]), matrix factorization (Li et al. [27]),
random forest graph (Pouyan and Nourani [28]), integrated clustering using distance factor (Ushakov
et al. [29]), and weighted consensus matrix (Wu et al. [30]) ]. Sudha V and Girijamma H A [31] has
introduced a technique called SCDT for Gene study by using fuzzy cluster based closest neighbor
categorization. Therefore, there are different variants of clustering scheme directed towards leveraging
the clustering operation over gene expression data. All the mechanism has associated beneficial
charecteristics as well as pitfalls too. The next section briefs of the open end research problems identified
from this review.
3. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 2, April 2020 : 2060 - 2068
2062
- The research problem
The open end research issues associated with clustering approach in gene expression data are:
a. Existing clustering approach is based on implementing standard available clustering logic without any
form of amendments on the top of it and thereby ignoring the associated issues.
b. Few approaches of clustering are actually found to be cost effective computation model towards assisting
for solving complex disease classification problems.
c. There is no reported work on considering multi-dimensional data of the gene expression data for which
reason existing clustering algorithm are less practical.
d. There was no much analysis of the computational complexity associated with the existing clustering
approaches
Hence, the statement of the identified problem is “Obtaining a precise classification performance using cost-
efficient design of clustering approach over gene expression data is quite challenging to achieve”. The next
section outlines the solution adopted to address this problem.
- The proposed solution
The proposed system is an extension of the prior clustering model where fuzzy logic has been
used [31]. The model has focussed on framing up clustering framework; however, the proposed system
extends this proposition by incorporating the classification operation using a cost effective modeling
approach using graph theory. The schematic flow of the proposed system is highlighted in Figure 1.
Medical
Data
Compute
iteration
Find Non-Assymetric
directed graph
Capture maximum
value
Apply Graph Theory
Apply Differential
Operator
Apply Eigen value
Apply Eigen value &
vector
Graph Fourier
Transform
Perform classification
Figure 1. Schematic flow of proposed system
The proposed system considers a medical dataset in the form of complex gene expression data wih
multiple discrete handlers. Each handler is subjected to defined set of iteration followed by applying
non-assymetric directed graph. The system also considers maximum value of one of the handler in order to
compute weight. In order to make the decision making system easier for classification, the proposed system
constructs a network using graph theory followed by applying differential operator for better retrieval of
classification vector. The proposed system also considers that there are good probabilityof fluctuations in
the findings of the classification which can have an impact on the classification accuracy. Therefore,
this research challenge is addressed by using Eigen value and Eigen vector which is capable of capturing
the true information of any form of orientation. Finally, the proposed system applies Graph Fourier
Transform along with conventional clustering approach of k-Nearest Neighborhood algorithm for efficient
extraction of elite clusters associated with the condition of the cancer. Hypothetical clinical conventions of
dual stages of cancers are then obtained as the outcome of the study representing the classification operation.
In a nutshell, the proposed system offers a simplified approach which is capable of processing as well as
analyzing the complex gene expression data for performing classification of the conditions of cancer.
The next section illustrates about the system design and implementation of proposed classification process.
4. Int J Elec & Comp Eng ISSN: 2088-8708
Novel modelling of clustering for enhanced classification performance on gene expression data (Sudha V.)
2063
2. SYSTEM IMPLEMENTATION
The proposed algorithm basically harnesses the potential of graph partitioning system in order to
perform classification of the diseases associated with medical data in the form of the logical matrix of
variable dimensions. The research problem to solve in this state of implementation is associated with
exploring a unique pattern of the complex medical data, which is in the form of microarray database.
This section discusses about the strategies formulated for implementing the proposed algorithm following by
illustrative discussion of the algorithm execution flow.
2.1. Adopted strategy for implementation
The primary strategy of the proposed system is to ensure that there is always certain form of ground
truth values associated with the input of the gene expression data. Therefore, the dataset is considered in
a unqiue way where there is an explcit flag information to clearly state the indication of type of cancer. It has
to be understood that medical data in the form of image can be subjected for classification in the form of
malilgnant and benign state based on some morphological condition of image (i.e. the input signal).
However, this logic cannot be applied here as the signal is a form of gene expression data which is a logical
matrix of elements 0 and 1. Hence the classification will be carried out with respect to type-1 and type-2
cancer which is at par with the numerical inputs of the dataset. This consideration is more practical and more
realistic as it offers a true scale of numerical classification. The secondarystrategy of the proposed study is to
ensure that the proposed system offers significant less computational overhead while perform classification
operation and therefore, it is designed in such way that proposed system offers involvement of less iterative
mechanism to evolve up in precision calculation. The proposed system uses K-Nearest Neighboring as
clustering approach to obtained highly filtered outcome and has used graph foureir transform for better
formation of the network to represent an unique clustering approach.
2.2. Execution flow of the classification algorithm
The proposed algorithm takes the input of the d (gene expression database) which is basically a form
of complex medical data with an objective to perform cancer classification. The outcome of the algorithm is
basically a v classification vector. The steps of the algorithm are as follows:
Algorithm for Classification
Input: d (gene expression database)
Output: v (Classification Vector)
Start
1. init d={dn}, where n is number of gene expression database
2. Obtain hndn
3. gene_matrixg1(hi)
4. fluc_matg2(gene_matrix)
5. res_matg3(fluc_mat)
6. precg4(hi)
7. vpreci
End
The discussion of the execution flow of the proposed algorithm is as follows: The proposed algorithm
considers multiple gene expression dataset d where n is the number of the type of the dataset. The study
considers n=3, i.e. d1, d2, and d3. The dataset d1 represents signals of the genetic graph of the subject while d2
represents the histology aspect of it. The dataset d3 represents network of the gene which actually formulates
gene vectors for making the computation easier. The significance of d1 and d2 are that d1 offers
a representation of the muted state of the subject’s gene for flag value equivalent to 1 while the value of
the flag is equivalent to 0 if it is non-muted. Similarly, d2 dataset signifies Cancer State-I with a flag value of
1 while Cancer State-II is represented by flag value of 2 (Line-1). For simpler understanding it can be said
that proposed algorithm obtains a simplified handlers h in the form of structure for the given input dataset d
(Line-2). It will mean that h1, h2… hn will be used for representing d1, d2, …, d2 respectively. The next part of
the algorithm is about accessing the gene structure using a function g1(x) with an input argument of the one
handler hi (Line-3). In the proposed system, the first handler is considered as signal of genetic graph of
the subject. The working methodology of g1(x) is as follows: a) the dataset say d1 is considered that consist of
information associated with signal of genetic graph of the subject which is basically a logical matrix in nature
with elements of 0 and 1 and specific dimension of m x n. This matrix is represented by its handler i.e. h1.
b) An iterative process iter is constructed if the sum of diagonal elements of all the elements of the handler h1
is found to be non-zero. As the proposed system uses graph classification method therefore a directed graph
5. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 2, April 2020 : 2060 - 2068
2064
nsg is formed by accessing only the elements in the handler h1 which are non-symmetric in nature.
c) The algorithm than checks for the maximum value of the handler in order to obtain the weighted value
max_val, d) The next process is to obtain the sum of all the non-zero elements of handler h1, d) Finally,
the primary and secondary fluctuation parameters are obtained. The primary fluctuation prim_fluc is obtained
by subtracting diagnonal elements diag_elem with the handler h1 while the secondary fluctuation sec_fluc is
obtained by subtracting the handler matrix h1 with all the diagonal elements within it h1, e) The next task of
the algorithm is to obtain the network information using graph classification on obtained secondary
fluctuation sec_fluc, and f) differential operation diff_op is obtained by applying Laplacian operator to
obtained network. Finally the step used in Line-3 results in a sparsity matrix for the given handler h1.
The next part of the algorithm is to apply another function g2(x) which is responsible for evaluating
the amount of fluctuation over the given matrix by obtaining the charecteritsic root (Line-4). The input
argument of this function g2(x) are prior output arguments e.g. iteration iter, non-symmteric directed graph
nsg, maximum value of the handler max_val, diagonal elements diag_elem, primary & secondary fluctuation
prim_fluc and sec_fluc, network for classification network, and differential operation diff_op. Following
operations are carried out in the following process of g2(x) function viz. a) The first process of this step of
g2(x) function is to develop a matrix up_prim_fluc for updating the primary fluctuation value, b) the second
process of this function is to obtain eigen value of updated primary function that results in complete matrix
full_mat and diagonal matrix diag_mat, c) the third step is to obtain a sub-diagonal matrix sub_dia_mat from
original diagonal matrix dia_mat, d) the fourth step is to obtain sequence seq value by sorting the absolute
value of sub diagonal matrix sub_dia_mat in ascending order, e) this lead to generation of the updated
version of the full and diagonal matrix with respect to order, f) the next step is to apply a conditional
statement to check if the maximum value of the effective fluctuation is found less than cut-off value than
the system obtains the final value of the updated primary fluctuation otherwise it flags inappropriate eigen
decomposition outcome. g) Finally, all the shortlisted value of updated primary fluctuation are considered to
obtain the final distinct value distinct_val. Therefore, this algorithm is primarily responsible for comparing
the Eigen value of the signal with the cut-off variation to obtain the distinct value of fluctuation.
The next part of the algorithm implementation is all about applying another function g3(x) which
takes the input argument of first handler h1, second handler h2, and third handler h3. The flow of
the underlying process are as follows: a) all the elements of the first handler h1 is obtained and added up
followed by obtaining the diagonal elements of it. The obtained diagonal elements are retained in a matrix
diag_elem, b) The updated primary fluctuation prim_fluc is obtained in a similar way i.e subtracting
the obtained diagonal matrix with the first handler h1, c) obtain an eigen value as well as eigen vectors for
primary fluctuation matrix with respect to length of the matrix. The obtained value is stored in complet
matrix full_mat and inverse of this complete matrix is treated as diagonal matrix i.e. diag_elem, d) the next
process is an iterative process for lower range of transformation where the third handler h3 is multiplied with
recently obtained matrix with diagonal element i.e. diag_elem. This operation will lead to generation of all
possible transformation values explicitely for lower range of transformation to have a better control on
the computational complexity. e) the final process is to shortlist transformation matrix of order 1 and 2 as
stage-1 and stage-2 of cancer from the given gene expression dataset. The average value avg1 and avg2 are
subsequently obtained for both stage-1 and stage-2. vi) the total transform value p is obtained from obtaining
absolute value of transformation matrix followed by summation of it. This operation also leads to generation
of effective classification value eff_class by substracting both the obtained average and then divided by total
transformed value.
The next part of the algorithm is about checking the level of accuracy using a new function g4(x)
which takes the input arguments of third handler h3, second handler h2, transform matrix trans, and complete
matrix full_mat. The operational steps of this algorithm are as follow: a) The first step is to apply K-nearest
neighboring algorithm for clustering by considering the input argument of third handler h3 and second
handler h2 which yields the outcome of primary optimal number x1. b) this is followed by obtaining number
of optimal outcome of clustering x and primary precision value preci, c) The next process is to generate
primary optimal cluster O1 for lower range of transformation, d) this process is followed by obtaining
enhanced value of the transformation enh_trans by multiplying primary optimal cluster with graph fourier
transform trans, e) the obtained transformation value enh_trans is than multiplied with complete matrix
full_mat to obtain a scalar component scal_com of it. f) Similar process is continued to obtain secondary
optimal number x2 and secondary precision value sec_preci. Therefore, this stage of algorithm process is
assists in computing the overall precision of the system as shown in Figure 2.
6. Int J Elec & Comp Eng ISSN: 2088-8708
Novel modelling of clustering for enhanced classification performance on gene expression data (Sudha V.)
2065
Start
Database input
d
d1 h1 d2 h2 d3 h3
If
sum(diag(h1))≠
0
Obtain iter
Obtain all non-
symmetric no.
nsg max_val
Sum of all non-
zero h1
diag_elem-h1 Graph(h1-diag(h1))
sec_fluc
diag_elem Obtain
prim_fluc
Obtain
network
Apply Laplacian diff_opApply Eigen
value
full_mat
dia_mat sub_dia_mat
Obtain diagonal
elements
seq
sort
If effective
fluctuation
<cut-off
up_prim_fluc
distinct_val
Stop dia_mat prim_fluc
Apply eigen
value & vector
Obtain graph
Fourier
transform trans
Stage-I, avg1
Stage-I, avg1
effec_classTotal trans val
p
Apply KNN
x1 Prim_preci
Extract max
x, no.of
optimal o/p
o1,o2
x1, x2
sec_preci
obtain
Classification vector
v
Stop
Abort
T
F
T
F
Figure 2. Proposed flow of process
3. RESULT ANALYSIS
The scripting of the implementation plan of proposed system is carried out in MATLAB considering
normal system configuration of 4GM RAM and 2.20 GHz core-i3 processor in Windows system.
The outcome of the study is compared with the existing approaches of frequently used clustering.
The assessment is carried out over 100 iteration with respect to accuracy and processing time as performance
parameters. The outcome of the proposed system shows that proposed system offers better accuracy as shown
in Figure 3 and reduced processing time as shown in Figure 4 in comparison to existing clustering approach.
7. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 2, April 2020 : 2060 - 2068
2066
The prime reason behind this is that proposed system offer a comprehensive profiling scheme of the gene
pattern which is in logical structure. Usage of differential operator assists in identifying the stages of
the cancer with more accuracy. Apart from this, the usage of graph Fourier transform in a unique manner
assists in forming better form of graph filter which is capable of identifying as well as mitigating
the significant amount of artifacts or fluctuation obtained from the classification process. Apart from this
the iteration considered for the analysis is basically a mapping of the k-value of the clusters which shows that
with the increase of the cluster number, the proposed system do not have any negative impact on
the performance parameters. Apart from this, none of the variables are found to retain more than 15% of
the memory of complete graph processing. This fact will mean that proposed system also offer highly
reduced spatial complexity apart from the reduced time complexity. Therefore, the proposed clustering
proves not only to be cost effective but also a capability to perform classification of complex disease
condition for a given set of gene expression data using multi-dimensional approach of clustering.
Figure 3. Comparative analysis of accuracy Figure 4. Comparative analysis of processing
time
4. CONCLUSION
The significance of the clustering approach is realized in the proposed system where a multi-
dimensional scheme of clustering has been introduced. The rationale of multi-dimensional clustering in
proposed system is that it applies series of cluster formation using graph theory which facilitates in decision
making. The proposed system also introduces a novel clustering mechanism which targets to achieve optimal
number of clusters with highly reduced fluctutation degree in terms of its classification outcome. A significant
contribution in the proposed approach could be noticed when it exhibits the balance of reduced computational
complexity with increased accuracy of the classification for a given gene expression data.
REFERENCES
[1] Chris J. Myers, “Engineering Genetic Circuits”, CRC Press, 2016.
[2] Ho Nam Chang, Sang Yup Lee, Jens Nielsen, Gregory Stephanopoulos, “Emerging Areas in Bioengineering,” vol.1,
John Wiley & Sons, 2018.
[3] Kim, Kwang Baek, and Doo Heon Song. "Intelligent Automatic Extraction of Canine Cataract Object with Dynamic
Controlled Fuzzy C-Means based Quantization." International Journal of Electrical & Computer Engineering,
vol. 8, no. 2, 2018.
[4] Lundmark A, Gerasimcik N, Båge T, Jemt A, "Gene expression profiling of periodontitis-affected gingival tissue by
spatial transcriptomics", PubMed-Science Report, vol.8, iss.1, 2018.
[5] Omara, Hicham, Mohamed Lazaar, and Youness Tabii. "Effect of Feature Selection on Gene Expression Datasets
Classification Accuracy." International Journal of Electrical and Computer Engineering 8, no. 5, 3194, 2018.
[6] Thomas Karn, "High-Throughput Gene Expression and Mutation Profiling: Current Methods and Future
Perspectives," PMC-Breast care, vol.8, Iss.6, 2013.
[7] Jelili Oyelade, Itunuoluwa Isewon, Funke Oladipupo, Olufemi Aromolaran, "Clustering Algorithms: Their
Application to Gene Expression Data", PMC-Bioinform Biol Insights, vol.10, 2016.
[8] Chung, Gwo Chin, Mohamad Yusoff Alias, and Jun Jiat Tiang. "Bit-error-rate Optimization for CDMA Ultra-
wideband System Using Generalized Gaussian Approach." International Journal of Electrical and Computer
Engineering, vol. 7, no. 5, pp: 2661-2673, 2017.
8. Int J Elec & Comp Eng ISSN: 2088-8708
Novel modelling of clustering for enhanced classification performance on gene expression data (Sudha V.)
2067
[9] Jelili Oyelade, Itunuoluwa Isewon, Funke Oladipupo, Olufemi Aromolaran, "Clustering Algorithms: Their
Application to Gene Expression Data," PMC-Bioinform Biol Insights, vol.10,2016
[10] V. Sudha & H a, Girijamma, “Appraising Research Direction & Effectiveness of Existing Clustering Algorithm for
Medical Data,” International Journal of Advanced Computer Science and Applications, vol. 8, no. 3, 2017.
[11] Chen, Xiang, Min Li, Ruiqing Zheng, Siyu Zhao, Fang-Xiang Wu, Yaohang Li, and Jianxin Wang. "A novel method
of gene regulatory network structure inference from gene knock-out expression data," Tsinghua Science and
Technology, vol. 24, no. 4, pp. 446-455, Aug. 2019.
[12] Farouq, Muhamed Wael, Wadii Boulila, Medhat Abdel-Aal, Amir Hussain, and Abdel Badeh Salem. "FGEP: A
Multi-Stage Fusion-Based Approach for Gene Expression Profiling in Non-Small Lung Cancer of Non-Thermal
Plasma Treatment" IEEE Access, 7:37141-37150, February 2019.
[13] Rosati, Paolo, Carmen A. Lupaşcu, and Domenico Tegolo. "Analysis of low-correlated spatial gene expression
patterns: a clustering approach in the mouse brain data hosted in the Allen Brain Atlas" IET Computer Vision, vol.
12, no. 7, pp. 996-1006, 10 2018.
[14] J. Liu, Y. Cheng, X. Wang, X. Cui, Y. Kong and J. Du, "Low Rank Subspace Clustering via Discrete Constraint and
Hypergraph Regularization for Tumor Molecular Pattern Discovery," in IEEE/ACM Transactions on Computational
Biology and Bioinformatics, vol. 15, no. 5, pp. 1500-1512, 1 Sept.-Oct. 2018.
[15] Luo, Jiawei, Ying Yin, Chu Pan, Gen Xiang, and Nguyen Hoang Tu. "Identifying functional modules in co-
regulatory networks through overlapping spectral clustering." IEEE transactions on nanobioscience vol. 17, no. 2,
pp: 134-144, 2018.
[16] L. Sun, R. Liu, J. Xu, S. Zhang and Y. Tian, "An Affinity Propagation Clustering Method Using Hybrid Kernel
Function With LLE," in IEEE Access, vol. 6, pp. 68892-68909, 2018.
[17] Suo, Yina, Tingwei Liu, Xueyong Jia, and Fuxing Yu. "Application of clustering analysis in brain gene data based on
deep learning." IEEE Access, vol. 7, pp: 2947-2956, 2019.
[18] Xia, Chun-Qiu, Ke Han, Yong Qi, Yang Zhang, and Dong-Jun Yu. "A self-training subspace clustering algorithm
under low-rank representation for cancer classification on gene expression data." IEEE/ACM Transactions on
Computational Biology and Bioinformatics (TCBB), vol. 15, no. 4, pp: 1315-1324, 2018.
[19] Ahn, Hongryul, Heejoon Chae, Woosuk Jung, and Sun Kim. "Integration of heterogeneous time series gene
expression data by clustering on time dimension." In 2017 IEEE International Conference on Big Data and Smart
Computing (BigComp), pp. 332-335. IEEE, 2017.
[20] Gonzalez-Dominguez, Jorge, and Maria J. Martin. "MPIGeneNet: Parallel Calculation of Gene Co-Expression
Networks on Multicore Clusters." IEEE/ACM transactions on computational biology and bioinformatics, vol. 15,
no. 5 (2018): 1732-1737.
[21] Chen, Xiaojun, Joshua Zhexue Huang, Qingyao Wu, and Min Yang. "Subspace weighting co-clustering of gene
expression data," IEEE/ACM transactions on computational biology and bioinformatics, vol. 16, no. 2, pp. 352-364,
1 March-April 2019.
[22] Feng, Chun-Mei, Ying-Lian Gao, Jin-Xing Liu, Chun-Hou Zheng, and Jiguo Yu. "PCA based on graph laplacian
regularization and P-norm for gene selection and clustering." IEEE transactions on nanobioscience, vol. 16, no. 4,
pp: 257-265, 2017.
[23] Kong, Xiang-Zhen, Jin-Xing Liu, Chun-Hou Zheng, Mi-Xiao Hou, and Juan Wang. "Robust and Efficient
Biomolecular Clustering of Tumor Based on ${p} $-Norm Singular Value Decomposition." IEEE transactions on
nanobioscience, vol. 16, no. 5, pp: 341-348, 2017.
[24] Pratama, M. Octaviano. "Kidney transplant classification with gene expression profiles using L1 feature selection
ensemble classifier based on data clustering" In 2017 International Conference on Advanced Computer Science and
Information Systems (ICACSIS), pp. 239-244. IEEE, 2017.
[25] Wang, Juan, Jin-Xing Liu, Chun-Hou Zheng, Ya-Xuan Wang, Xiang-Zhen Kong, and Chang-Gang Wen. "A Mixed-
Norm Laplacian Regularized Low-Rank Representation Method for Tumor Samples Clustering." IEEE/ACM
Transactions on Computational Biology and Bioinformatics (TCBB) vol. 16, no. 1, pp: 172-182, 2019.
[26] Leale, Guillermo, Ariel Emilio Bayá, Diego H. Milone, Pablo M. Granitto, and Georgina Stegmayer. "Inferring
unknown biological function by integration of GO annotations and gene expression data." IEEE/ACM transactions
on computational biology and bioinformatics, vol. 15, no. 1, pp: 168-180, 2018.
[27] Li, Jianqiang, and Fei Wang. "Towards unsupervised gene selection: a matrix factorization framework." IEEE/ACM
Transactions on Computational Biology and Bioinformatics (TCBB), vol.14, no. 3, pp: 514-521, 2017.
[28] Pouyan, Maziyar Baran, and Mehrdad Nourani. "Clustering single-cell expression data using random forest
graphs." IEEE journal of biomedical and health informatics, vol. 21, no. 4, pp: 1172-1181, 2017.
[29] Ushakov, Anton V., Xenia Klimentova, and Igor Vasilyev. "Bi-level and bi-objective p-median type problems for
integrative clustering: application to analysis of cancer gene-expression and drug-response data," IEEE/ACM
transactions on computational biology and bioinformatics, vol. 15, no. 1, pp: 46-59, 2018.
[30] Wu, Min, Le Ou-Yang, and Xiao-Li Li. "Protein complex detection via effective integration of base clustering
solutions and co-complex affinity scores" IEEE/ACM Transactions on Computational Biology and Bioinformatics
(TCBB), vol. 14, no. 3, pp: 733-739, 2017.
[31] V. Sudha and H. A. Girijamma, "Novel clustering of bigger and complex medical data by enhanced fuzzy logic
structure," 2017 International Conference on Circuits, Controls, and Communications (CCUBE), Bangalore,
pp. 131-135, 2017.
9. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 2, April 2020 : 2060 - 2068
2068
BIOGRAPHIES OF AUTHORS
SudhaV, currently working as Assistant Professor in the Department of Information Science and
Engineering, RNS Institute of Technology, Bengaluru and having teaching experience of 11 year.
Her research interests are in the field of Data mining, Data analytics and machine learning
algorithms.
Dr. Girijamma H A, currently working as Professor in the Department of Computer Science and
Engineering, RNS Institute of Technology, Bengaluru and having teaching experience of 23 year.
Her research interests are in the field of automata, fuzzy logic and machine learning algorithms.