The document proposes a novel hybrid method, PCA-BEL, for classifying gene expression microarray data. PCA-BEL uses principal component analysis (PCA) for feature extraction, followed by classification with a Brain Emotional Learning (BEL) network. PCA reduces the dimensionality of the microarray data to overcome the high-dimensionality problem, and BEL is then used for classification because its low computational complexity makes it suitable for high-dimensional data. The method is tested on five cancer gene expression datasets, achieving average accuracies of 100%, 96%, 98.32%, 87.40% and 88% respectively, demonstrating its effectiveness for microarray classification tasks.
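The BEL network is not a standard library component, so the sketch below shows only the PCA feature-extraction stage on a synthetic stand-in for a microarray matrix, with an ordinary logistic regression substituting for the BEL classifier; the matrix size, shift strength and component count are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Toy stand-in for a microarray matrix: 60 samples x 2000 "genes", 2 classes.
X = rng.normal(size=(60, 2000))
y = np.repeat([0, 1], 30)
X[y == 1, :50] += 3.0  # a block of informative genes for class 1

# PCA compresses 2000 genes to 5 components before classification.
model = make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)
print(round(scores.mean(), 3))
```

With a clear class signal, the low-dimensional representation is enough for near-perfect cross-validated accuracy, which is the rationale for the PCA stage.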
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
As biomedical databases grow day by day, finding the essential features for disease prediction has become more complex due to high dimensionality and sparsity. Moreover, given the large number of microarray datasets in biomedical repositories, it is difficult to analyze, predict and interpret feature information using traditional feature-selection-based classification models, most of which suffer from computational issues such as dimension reduction, uncertainty and class imbalance on microarray data. The ensemble classifier is a scalable model for the extreme learning machine owing to its high efficiency and fast processing speed in real-time applications. The main objective of feature-selection-based ensemble learning models is to classify high-dimensional data with high computational efficiency and a high true positive rate. In this work, an optimized Particle Swarm Optimization (PSO) based ensemble classification model is developed for high-dimensional microarray datasets. Experimental results show that the proposed model outperforms traditional feature-selection-based classification models in terms of accuracy, true positive rate and error rate.
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
Over the past few years, microarray technology has spread considerably across many biological domains, particularly those pertaining to cancers such as leukemia, prostate and colon cancer. The primary bottleneck in properly understanding such datasets lies in their dimensionality, so an efficient and effective study of them requires reducing their dimension to a large extent. This study suggests different algorithms and approaches for reducing the dimensionality of such microarray datasets. It exploits the matrix-like structure of microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms; the technique achieves an accuracy of 98%.
Classification of Breast Cancer Diseases using Data Mining Techniquesinventionjournals
Medical data mining offers great potential for exploring new knowledge in large amounts of data, and classification is one of its important techniques. In this research work, we used various data-mining-based classification techniques to classify patients as having cancer or not. We applied the Breast Cancer-Wisconsin (Original) dataset to different data mining techniques and compared the accuracy of the models under two different data partitions. BayesNet achieved the highest accuracy, 97.13%, with 10-fold data partitions. We also applied the info-gain feature selection technique to BayesNet and the Support Vector Machine (SVM) and achieved the best accuracy, 97.28%, with BayesNet on a 6-feature subset.
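A hedged approximation of this pipeline: scikit-learn ships neither a BayesNet classifier nor the Wisconsin Original dataset, so the sketch uses Gaussian Naive Bayes on the Wisconsin Diagnostic variant, with mutual information standing in for info gain; the resulting numbers will differ from the paper's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features

# Keep the 6 most informative features by mutual information
# (an information-gain-style criterion), then classify.
model = make_pipeline(
    SelectKBest(mutual_info_classif, k=6),
    GaussianNB(),
)
acc = cross_val_score(model, X, y, cv=10).mean()
print(round(acc, 4))
```

The point mirrors the abstract's finding: a small, information-rich feature subset retains nearly all of the full model's accuracy.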
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
The goal of this paper is to compare individual classifiers and multi-classifier fusion with respect to accuracy in detecting breast cancer on four different datasets. We implement several classification techniques, representing the best-known algorithms in this field, on four breast cancer datasets: two for diagnosis and two for prognosis. We then fuse classifiers to find the best multi-classifier fusion approach for each dataset individually. Classification accuracy is obtained from the confusion matrix under 10-fold cross validation, and fusion uses majority voting (the mode of the classifier outputs). The experimental results show that no single classification technique is better than the others across all datasets, since the classification task is affected by the type of dataset. With multi-classifier fusion, accuracy improved on three of the four datasets.
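The majority-voting fusion can be sketched with scikit-learn's VotingClassifier, whose hard-voting mode is exactly the mode of the individual classifier outputs; the three base classifiers and the single dataset below are illustrative choices, not the paper's four datasets.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hard voting = the mode of the individual classifier predictions.
fusion = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("svm", SVC(random_state=0)),
    ],
    voting="hard",
)
acc = cross_val_score(fusion, X, y, cv=10).mean()
print(round(acc, 4))
```

Fusion helps when the base classifiers make different errors, which is consistent with the abstract's observation that no single technique wins on every dataset.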
Clustering and Classification of Cancer Data Using Soft Computing Technique IOSR Journals
Clustering and classification of cancer data have been used with success in the medical field. In this paper, two algorithms, K-means and fuzzy C-means, are compared to determine the accuracy of their results. The paper addresses the problem of learning to classify cancer data using these two different methods, with information derived from training and testing. It surveys various soft-computing-based classification techniques, compares them on this healthcare data, and presents the resulting accuracy on the cancer data.
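Fuzzy C-means is simple enough to sketch directly; the implementation below alternates the standard membership and centroid updates on a two-blob toy dataset. The cluster count, fuzzifier m = 2, and the data are assumptions for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy C-means: alternate membership and centroid updates."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)            # fuzzy memberships
    for _ in range(iters):
        um = u ** m
        # Centroids are membership-weighted means of the points.
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        # Standard FCM membership update: u_ik ∝ d_ik^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
u, centers = fuzzy_c_means(X)
labels = u.argmax(axis=1)
print(labels[:50].mean(), labels[50:].mean())  # each half nearly pure
```

Unlike K-means' hard assignments, each point carries graded memberships in `u`; taking the argmax recovers a hard clustering for comparison.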
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...mlaij
The classification of different types of tumor is of great importance in cancer diagnosis and drug discovery. Earlier studies on cancer classification had limited diagnostic ability. The recent development of DNA microarray technology has made it possible to monitor thousands of gene expressions simultaneously, and researchers are using this abundance of gene expression data to explore the possibilities of cancer classification. A number of methods have been proposed with good results, but many issues still need to be addressed. This paper presents an overview of various cancer classification methods and evaluates them based on classification accuracy, computational time and ability to reveal gene information. We also evaluate and introduce various proposed gene selection methods, and discuss several issues related to cancer classification.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of breast cancer, a deadly disease that mostly affects women, is extremely complex because it requires various features of the cell type. An efficient approach to diagnosing breast cancer at an early stage is therefore to apply artificial intelligence, where machines are programmed to learn, find patterns, and later detect any new changes that may occur. Machine learning is particularly useful in the medical field, which depends on complex genomic measurements such as the microarray technique, and can increase the accuracy and precision of results. With this technology, doctors can quickly diagnose patients with cancer and apply the proper treatment in a timely manner. The goal of this paper is therefore to propose a robust breast cancer diagnostic system using complex genomic analysis via microarray technology. The system combines two machine learning methods: K-means clustering and linear regression.
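The abstract does not detail how the two methods are combined, so the sketch below shows one plausible combination: K-means cluster distances appended as features for a regression-style classifier (logistic rather than linear regression, since the target is a class label). The dataset and cluster count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0, stratify=y)

# Step 1: unsupervised K-means structure over the training samples.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Xtr)

# Step 2: feed the cluster distances, alongside the raw features,
# to a (logistic) regression classifier.
Ztr = np.hstack([Xtr, km.transform(Xtr)])
Zte = np.hstack([Xte, km.transform(Xte)])
clf = LogisticRegression(max_iter=2000).fit(Ztr, ytr)
score = clf.score(Zte, yte)
print(round(score, 4))
```

The clustering stage supplies an unsupervised summary of sample structure that the supervised regression stage can exploit, which appears to be the spirit of the proposed combination.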
EVOLVING EFFICIENT CLUSTERING AND CLASSIFICATION PATTERNS IN LYMPHOGRAPHY DAT...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relevant patterns from large datasets. Clustering and classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations is disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and to compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with a class label taking four distinct values. This paper highlights the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes, and the performance of sixteen classification algorithms on the Lymphography dataset in multi-class categorization of medical data. Our work shows that the Random Tree algorithm and Quinlan's C4.5 algorithm give 100 percent classification accuracy both with all the predictor features and with the feature subset selected by the Fisher Filtering feature selection algorithm. It also finds that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm offers increased clustering accuracy in less computation time.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms in Parkinson’s disease subjects is vocal performance degradation, and patients are usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research aims to investigate the potential of using sustained vowel phonations to replicate speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model evaluated on an LSVT dataset. First, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search, to reduce the number of features. Second, we apply a simple random sampling technique to the dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification accuracy of 95% with 24 features under the 10-fold CV method.
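Setting aside the Cuckoo/Tabu search modification, the underlying mRMR criterion can be sketched as a plain greedy selection that trades mutual-information relevance against redundancy; the dataset and subset size below are illustrative, not the LSVT data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k, seed=0):
    """Greedy mRMR: at each step pick the feature maximizing
    relevance(y) minus mean redundancy with the already selected set,
    using mutual-information estimates for both terms."""
    relevance = mutual_info_classif(X, y, random_state=seed)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([
                mutual_info_regression(X[:, [j]], X[:, s],
                                       random_state=seed)[0]
                for s in selected
            ])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

X, y = load_breast_cancer(return_X_y=True)
sel = mrmr(X, y, k=5)
print(sel)
```

The metaheuristic variants in the paper replace this exhaustive greedy scan with a guided search over subsets, which matters when the feature count is large.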
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)IJCSEA Journal
Feature selection is an effective method used in text categorization for sorting a set of documents into a number of predefined categories, and an important way of improving the efficiency and accuracy of text categorization algorithms by removing redundant terms from the corpus. A genome contains the total genetic information in the chromosomes of an organism, including its genes and DNA sequences. In this paper, a hierarchical clustering technique is used to categorize the features from the genome documents, and a framework is proposed for genomic feature set selection. Filter-based feature selection methods such as the χ2 statistic and the CHIR statistic are used to select the feature set. The selected feature set is verified using the F-measure and validated for biological relevance using the BLAST tool.
Effective Feature Selection for Feature Possessing Group Structurerahulmonikasharma
Feature selection has become an interesting research topic in recent years as an effective way to tackle high-dimensional data. Previous feature selection methods evaluate features individually and ignore the underlying structure among them. We therefore focus on the problem where features possess some group structure, and present a group feature selection method that performs selection at the group level. Its objective is to perform feature selection both within and between groups of features, selecting discriminative features and removing redundant ones to obtain an optimal subset. We demonstrate our method on several datasets and evaluate the classification accuracy it achieves.
A chi-square-SVM based pedagogical rule extraction method for microarray data...IJAAS Team
Support Vector Machine (SVM) is an efficient classification technique due to its ability to capture nonlinearities in diagnostic systems, but it does not reveal the knowledge learnt during training. Understanding how a decision is reached is important in machine learning applications such as bioinformatics. A decision tree, on the other hand, has good comprehensibility; the process of converting an incomprehensible model into an understandable one is often called rule extraction. In this paper we propose an approach for extracting rules from an SVM on microarray data by combining the merits of the SVM and the decision tree. The proposed approach consists of three steps: SVM-CHI-SQUARE is employed to reduce the feature set; the dataset with reduced features is used to obtain the SVM model and generate synthetic data; and a Classification and Regression Tree (CART) is used to generate rules in the last phase. We use the breast masses dataset from the UCI repository, where comprehensibility is a key requirement. Because the reduced-feature dataset is used, the proposed approach extracts shorter rules, improving the comprehensibility of the system. We obtained an accuracy of 93.53%, sensitivity of 89.58%, specificity of 96.70%, and a training time of 3.195 seconds. A comparative analysis with other algorithms is also carried out.
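The pedagogical step, relabelling data with the black-box model's predictions and fitting a tree to those predictions, can be sketched as follows; the chi-square feature-reduction and synthetic-data stages are omitted, and the dataset and tree depth are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Opaque model: train the SVM on the real labels.
svm = make_pipeline(StandardScaler(), SVC()).fit(Xtr, ytr)

# Pedagogical step: relabel the training data with the SVM's own
# predictions, then fit a shallow CART tree to mimic (and explain) it.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(Xtr, svm.predict(Xtr))

# Fidelity: how often the tree agrees with the SVM on unseen data.
fidelity = (tree.predict(Xte) == svm.predict(Xte)).mean()
print(round(fidelity, 3))
print(export_text(tree, max_depth=2))  # human-readable rules
```

Fidelity, rather than raw accuracy, is the natural metric here: the tree's job is to reproduce the SVM's decisions in a readable form.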
Diagnosis of health conditions is a very challenging task for every human being, because life is directly related to health. Data-mining-based classification is one of the important applications for classifying such data. In this research work, we used various classification techniques to classify thyroid data; CART gave the highest accuracy, 99.47%, as the best model. Feature selection plays a very important role in making a model computationally efficient and in increasing its performance. This work focuses on the Info Gain and Gain Ratio feature selection techniques to remove irrelevant features from the original dataset and improve the model's performance. We applied both feature selection techniques to the best model, i.e. CART. Our proposed CART-Info Gain and CART-Gain Ratio give 99.47% and 99.20% accuracy with 25 and 3 features respectively.
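The two criteria can be computed directly; the sketch below implements info gain and gain ratio for a categorical feature, using a toy feature that perfectly determines the label.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain_and_ratio(feature, labels):
    """Info gain = H(y) - H(y | feature); gain ratio divides by the
    feature's own split entropy to penalize many-valued features."""
    h_y = entropy(labels)
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    h_cond = sum(w * entropy(labels[feature == v])
                 for v, w in zip(values, weights))
    gain = h_y - h_cond
    split_info = entropy(feature)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# A feature that perfectly determines the label:
y = np.array([0, 0, 1, 1])
f = np.array(["a", "a", "b", "b"])
gain, ratio = info_gain_and_ratio(f, y)
print(gain, ratio)  # 1.0 1.0
```

Gain ratio is the C4.5-style correction to info gain: a feature with many distinct values inflates `gain` but also inflates `split_info`, so the ratio stays honest.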
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many data mining applications, class imbalance arises when examples of one class are overrepresented. Traditional classifiers yield poor accuracy on the minority class due to this imbalance. Furthermore, within-class imbalance, where classes are composed of multiple sub-concepts with different numbers of examples, also affects classifier performance. In this paper, we propose an oversampling technique that handles between-class and within-class imbalance simultaneously, while also taking generalization ability in the data space into consideration. The proposed method has two steps: performing model-based clustering with respect to the classes to identify the sub-concepts, and then computing the separating hyperplane based on equal posterior probability between the classes. The method is tested on 10 publicly available datasets, and the results show that it is statistically superior to other existing oversampling methods.
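The paper's posterior-probability hyperplane step is not reproduced here, but the first stage, model-based clustering of the minority class followed by sampling from the fitted components, can be sketched as follows; the component count and the toy data are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_oversample(X, y, minority=1, seed=0):
    """Oversample the minority class per sub-concept: fit a Gaussian
    mixture (model-based clustering) to the minority class, then draw
    synthetic points from the fitted components until classes balance."""
    X_min = X[y == minority]
    need = int((y != minority).sum() - len(X_min))
    gm = GaussianMixture(n_components=2, random_state=seed).fit(X_min)
    X_new, _ = gm.sample(need)
    X_bal = np.vstack([X, X_new])
    y_bal = np.concatenate([y, np.full(need, minority)])
    return X_bal, y_bal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.0, (90, 2)),    # majority class
               rng.normal(5, 0.3, (5, 2)),     # minority sub-concept 1
               rng.normal(-5, 0.3, (5, 2))])   # minority sub-concept 2
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = cluster_oversample(X, y)
print(np.bincount(y_bal))  # [90 90]
```

Sampling per fitted component, rather than from the minority class as a whole, is what lets the method respect within-class sub-concepts instead of blurring them together.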
An approach for breast cancer diagnosis classification using neural networkacijjournal
Artificial neural networks have been widely used as intelligent tools in many fields in recent years, including artificial intelligence, pattern recognition, medical diagnosis and machine learning. The classification of breast cancer is a medical application that poses a great challenge for researchers and scientists, and the neural network has recently become a popular tool for classifying cancer datasets; classification is one of the most active research and application areas for neural networks. The major disadvantages of the artificial neural network (ANN) classifier are its sluggish convergence and its tendency to get trapped in local minima. To overcome these problems, the differential evolution (DE) algorithm has been used to determine optimal or near-optimal values for the ANN parameters, and previous studies have applied DE successfully to improve ANN learning. However, the DE approach still has issues such as long training time and low classification accuracy. To address these, an island-based model is proposed in this system. The aim of our study is to propose an approach for distinguishing between different classes of breast cancer, based on the Wisconsin Diagnostic and Prognostic Breast Cancer datasets. The proposed system implements island-based training to achieve better accuracy and shorter training time, comparing two different migration topologies.
A new model for large dataset dimensionality reduction based on teaching lear...TELKOMNIKA JOURNAL
One of the human diseases with a high mortality rate each year is breast cancer (BC); among all forms of cancer, BC is the commonest cause of death among women globally. Data mining and classification methods are effective ways to classify such data, and they are particularly efficient in the medical field because medical datasets contain irrelevant and redundant attributes that are not needed to obtain an accurate estimate of disease diagnosis. Teaching-learning-based optimization (TLBO) is a new metaheuristic that has been successfully applied to several intractable optimization problems in recent years. This paper presents the use of a multi-objective TLBO algorithm for selecting feature subsets in automatic BC diagnosis, with the logistic regression (LR) method deployed for the classification task. From the results, the proposed method produced better classification accuracy on the BC dataset (classified into malignant and benign), showing that TLBO is an efficient feature optimization technique for sustaining data-based decision-making systems.
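The continuous single-objective core of TLBO (the paper applies a multi-objective variant to feature subsets) can be sketched as its two canonical phases; the test function and population settings are illustrative assumptions.

```python
import numpy as np

def tlbo(f, dim, pop_size=20, iters=100, seed=0):
    """Teaching-learning-based optimization (minimization).
    Teacher phase pulls learners toward the current best solution;
    learner phase lets random pairs move toward the better partner."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (pop_size, dim))
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        # Teacher phase: shift by (teacher - TF * class mean).
        teacher = X[fit.argmin()]
        TF = rng.integers(1, 3)  # teaching factor, 1 or 2
        X_new = X + rng.random((pop_size, dim)) * (teacher - TF * X.mean(axis=0))
        fit_new = np.apply_along_axis(f, 1, X_new)
        improved = fit_new < fit
        X[improved], fit[improved] = X_new[improved], fit_new[improved]
        # Learner phase: move toward (or away from) a random partner.
        partners = rng.permutation(pop_size)
        sign = np.where(fit < fit[partners], 1.0, -1.0)[:, None]
        X_new = X + rng.random((pop_size, dim)) * sign * (X - X[partners])
        fit_new = np.apply_along_axis(f, 1, X_new)
        improved = fit_new < fit
        X[improved], fit[improved] = X_new[improved], fit_new[improved]
    return X[fit.argmin()], float(fit.min())

best, val = tlbo(lambda z: float(np.sum(z ** 2)), dim=5)
print(val)  # near 0
```

A notable property of TLBO, and part of its appeal as a metaheuristic, is that beyond population size and iteration count it has no algorithm-specific tuning parameters.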
Classification of medical datasets using back propagation neural network powe...IJECEIAES
Classification is one of the most indispensable tasks in data mining and machine learning. It has a good reputation in computer-aided disease diagnosis, where progress in smart computer technologies can be invested in diagnosing various diseases from real patient data documented in databases. This paper introduces a methodology for diagnosing a set of diseases, including two types of cancer (breast and lung), two diabetes datasets and a heart attack dataset. A back propagation neural network plays the role of classifier, and its performance is enhanced by a genetic algorithm that provides the classifier with the optimal features to raise the classification rate as high as possible. The system showed high efficiency in dealing with databases that differ from each other in size, number of features and nature of the data, as the results illustrate: the classification rate reached 100% on most datasets.
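The GA-selects-features-for-the-classifier idea can be sketched with a bitmask genetic algorithm; for speed, the sketch scores masks with logistic regression rather than a back propagation network, and the population size, generation count and mutation rate are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
rng = np.random.default_rng(0)
n_feat = X.shape[1]

def fitness(mask):
    """Cross-validated accuracy of the classifier on the masked features."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=2000)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Tiny genetic algorithm over feature bitmasks.
pop = rng.random((12, n_feat)) < 0.5
for _ in range(10):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]     # truncation selection
    children = []
    for _ in range(6):
        a, b = parents[rng.integers(0, 6, 2)]
        cut = rng.integers(1, n_feat)               # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_feat) < 0.02            # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])
best = pop[np.argmax([fitness(m) for m in pop])]
print(int(best.sum()), round(fitness(best), 4))
```

In the paper's setup the same loop would wrap a BPNN, so each fitness evaluation is a full network training run; that cost is exactly why the feature mask, not the network weights, is what the GA evolves.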
Design of Quadrature Mirror Filter Bank using Particle Swarm Optimization (PSO)IDES Editor
In this paper, the particle swarm optimization technique is used to design a two-channel quadrature mirror filter (QMF) bank. A new method is developed to optimize the prototype filter response in the passband, the stopband, and the overall filter bank response. The design problem is formulated as nonlinear unconstrained optimization of an objective function that is a weighted sum of the squared errors in the passband, the stopband, and the overall filter bank response at frequency ω = 0.5π. The particle swarm optimization (PSO) technique is used to solve this optimization problem. Compared to conventional design techniques, the proposed method gives better performance in terms of reconstruction error, mean square error in the passband and stopband, and computational time. Various design examples are presented to illustrate the benefits of the proposed method.
Experimental performance of pv fed led lighting system employing particle swa...eSAT Journals
This paper proposes a Particle Swarm Optimization (PSO) based approach for light intensity control in a photovoltaic (PV) supply fed LED lighting system. The output power regulation of the LED lighting system is formulated as an optimization problem and carried out using a real-time PSO technique. The concept is first simulated and verified, and subsequently implemented on a 16-bit PIC microcontroller. The proposed method has the major advantage of system independence while regulating the output power, and the computed and measured results clearly illustrate its effectiveness. Keywords: Particle Swarm Optimization, Light Emitting Diode, Photo Voltaic System.
An Approach to Reduce Noise in Speech Signals Using an Intelligent System: BE...CSCJournals
Two widespread families of noise reduction algorithms are spectral noise subtraction and adaptive filtering. Both have the disadvantage that there is no parameter to distinguish between speech and noise components of the same frequency. In this paper, an intelligent controller, BELBIC, based on mammalian limbic emotional learning algorithms, is used to improve speech quality in a noisy environment. The system is trained to recognize the fundamental frequency of the speech spectrum, and the output thus obtained reduces the noise level to a minimum. The parameters on which the reduction of noise from the input speech spectrum depends have also been studied. Real-time implementations were done in Simulink, and the results of the analysis are included at the end.
Identification and real time position control of a servo-hydraulic rotary act...ISA Interchange
This paper presents a new intelligent approach for adaptive control of a nonlinear dynamic system. A modified version of the brain emotional learning based intelligent controller (BELBIC), a bio-inspired algorithm based upon a computational model of emotional learning which occurs in the amygdala, is utilized for position controlling a real laboratorial rotary electro-hydraulic servo (EHS) system. EHS systems are known to be nonlinear and non-smooth due to many factors such as leakage, friction, hysteresis, null shift, saturation, dead zone, and especially fluid flow expression through the servo valve. The large value of these factors can easily influence the control performance in the presence of a poor design. In this paper, a mathematical model of the EHS system is derived, and then the parameters of the model are identified using the recursive least squares method. In the next step, a BELBIC is designed based on this dynamic model and utilized to control the real laboratorial EHS system. To prove the effectiveness of the modified BELBIC’s online learning ability in reducing the overall tracking error, results have been compared to those obtained from an optimal PID controller, an auto-tuned fuzzy PI controller (ATFPIC), and a neural network predictive controller (NNPC) under similar circumstances. The results demonstrate not only excellent improvement in control action, but also less energy consumption.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of breast cancer, a deadly disease that mostly affects women, is extremely complex because it requires examining various features of the cell type. Therefore, an efficient approach to diagnosing breast cancer at an early stage is to apply artificial intelligence, where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as the microarray technique, and it can increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust breast cancer diagnostic system using complex genomic analysis via microarray technology. The system combines two machine learning methods: K-means clustering and linear regression.
EVOLVING EFFICIENT CLUSTERING AND CLASSIFICATION PATTERNS IN LYMPHOGRAPHY DAT...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relative patterns from
large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an
established, proven structure from a voluminous collection of facts. A dominant area of modern-day
research in the field of medical investigations includes disease prediction and malady categorization. In
this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering
techniques and compare the performance of classification algorithms on the clinical data. Feature
selection is a supervised method that attempts to select a subset of the predictor features based on the
information gain. The Lymphography dataset comprises of 18 predictor attributes and 148 instances with
the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms
in detecting clusters of patient records and predictor attributes and highlights the performance of sixteen
classification algorithms on the Lymphography dataset that enables the classifier to accurately perform
multi-class categorization of medical data. Our work asserts the fact that the Random Tree algorithm and
the Quinlan’s C4.5 algorithm give 100 percent classification accuracy with all the predictor features and
also with the feature subset selected by the Fisher Filtering feature selection algorithm. It is also stated
here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm
offers increased clustering accuracy in less computation time.
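For reference, the DBSCAN procedure credited above with fast, accurate clustering is compact enough to sketch in full; this is the textbook algorithm in plain Python (Euclidean distance, label -1 for noise), not the exact implementation evaluated in the paper:

```python
def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: label each point with a cluster id, -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:          # not a core point: mark as noise for now
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:           # border point previously marked noise
                labels[j] = cluster
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:        # j is also a core point: keep expanding
                queue.extend(nb)
    return labels
```

On two well-separated blobs plus an outlier, this returns two cluster ids and one -1 label.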
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms in Parkinson’s disease subjects is vocal performance degradation. Patients are usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. First, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search, to reduce the number of features. Second, we apply a simple random sampling technique to the dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification accuracy rate of 95% with 24 features by the 10-fold CV method.
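For context, the classical greedy mRMR filter that the modified approach starts from can be sketched as follows; this is a generic, plain-Python illustration over discrete features (the paper's Cuckoo Search and Tabu Search refinements are not shown):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) for two discrete sequences, in nats."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mrmr(features, labels, k):
    """Greedy mRMR: pick features maximizing relevance I(f;y)
    minus the mean redundancy I(f;s) with already selected s."""
    relevance = {f: mutual_information(col, labels) for f, col in features.items()}
    selected = []
    while len(selected) < k:
        def score(f):
            red = (sum(mutual_information(features[f], features[s]) for s in selected)
                   / len(selected)) if selected else 0.0
            return relevance[f] - red
        best = max((f for f in features if f not in selected), key=score)
        selected.append(best)
    return selected
```

The greedy loop is the standard criterion; metaheuristics such as Cuckoo Search replace it with a global search over subsets.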
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)IJCSEA Journal
Feature selection is an effective method used in text categorization for sorting a set of documents into a certain number of predefined categories. It is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant terms from the corpus. A genome contains the total amount of genetic information in the chromosomes of an organism, including its genes and DNA sequences. In this paper a hierarchical clustering technique is used to categorize the features from the genome documents. A framework is proposed for genomic feature set selection. Filter-based feature selection methods such as the χ2 statistic and the CHIR statistic are used to select the feature set. The selected feature set is verified using the F-measure and is biologically validated for relevance using the BLAST tool.
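The χ2 filter score for a single term/category pair comes from a 2×2 contingency table; a minimal sketch (variable names are illustrative, not from the paper):

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table:
    a: docs in the class containing the term, b: in the class without it,
    c: docs outside the class with the term, d: outside without it."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + c) * (b + d) * (a + b) * (c + d)
    return num / den if den else 0.0
```

Terms are ranked by this score per category and the top-scoring ones kept as features.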
Effective Feature Selection for Feature Possessing Group Structurerahulmonikasharma
Feature selection has become an interesting research topic in recent years. It is an effective method for tackling data of high dimension. Previous feature selection methods have ignored the underlying structure and evaluated each feature individually. Considering this, we focus on the problem where features possess some group structure, and we present a method that performs feature selection at the group level. Its objective is to carry out feature selection both within and between groups of features, selecting discriminative features and removing redundant ones to obtain an optimal subset. We demonstrate our method on several data sets and evaluate the classification accuracy achieved.
A chi-square-SVM based pedagogical rule extraction method for microarray data...IJAAS Team
Support Vector Machine (SVM) is currently an efficient classification technique due to its ability to capture nonlinearities in diagnostic systems, but it does not reveal the knowledge learnt during training. In fields that apply machine learning, such as bioinformatics, it is important to understand how a decision is reached. A decision tree, on the other hand, has good comprehensibility; the process of converting such incomprehensible models into an understandable one is often regarded as rule extraction. In this paper we propose an approach for extracting rules from an SVM for a microarray dataset by combining the merits of both the SVM and the decision tree. The proposed approach consists of three steps: first, the SVM-CHI-SQUARE method is employed to reduce the feature set; second, the dataset with reduced features is used to obtain an SVM model and synthetic data is generated; finally, Classification and Regression Trees (CART) are used to generate rules. We use the breast masses dataset from the UCI repository, where comprehensibility is a key requirement. As the experiments show, when the reduced feature dataset is used, the proposed approach extracts shorter rules, thereby improving the comprehensibility of the system. We obtained an accuracy of 93.53%, sensitivity of 89.58%, specificity of 96.70%, and a training time of 3.195 seconds. A comparative analysis is carried out with other algorithms.
Diagnosis of health conditions is a very challenging task for every human being because life is directly
related to health. Data mining based classification is one of the important applications for classification
of data. In this research work, we have used various classification techniques for classification of thyroid
data. CART gives the highest accuracy, 99.47%, as the best model. Feature selection plays a very important
role in making a model computationally efficient and in increasing its performance. This research work
focuses on the Info Gain and Gain Ratio feature selection techniques to reduce the irrelevant features from
the original data set and computationally increase the performance of the model. We have applied both
feature selection techniques on the best model, i.e. CART. Our proposed CART-Info Gain and CART-Gain
Ratio give 99.47% and 99.20% accuracy with 25 and 3 features respectively.
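Info Gain and Gain Ratio as used above have standard definitions; a small plain-Python sketch for discrete columns (illustrative, not the authors' code):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X): how much a feature reduces class entropy."""
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)
    n = len(labels)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - cond

def gain_ratio(feature, labels):
    """IG normalized by the split information H(X), penalizing many-valued features."""
    split = entropy(feature)
    return info_gain(feature, labels) / split if split else 0.0
```

Features are ranked by either score and the lowest-ranked (irrelevant) ones dropped before training the classifier.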
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within class imbalance where classes are composed of
multiple sub-concepts with different numbers of examples also affects the performance of the
classifier. In this paper, we propose an oversampling technique that handles between class and
within class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps: performing model-based
clustering with respect to classes to identify the sub-concepts, and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
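A heavily simplified sketch of the two-step idea: given minority-class sub-concepts (clusters) from some clustering step, new points are interpolated inside each cluster so both between-class and within-class imbalance shrink. The paper's model-based clustering and equal-posterior hyperplane are not reproduced here:

```python
import random

def oversample_clusters(clusters, target_size, rng=None):
    """Grow a minority class to target_size by interpolating new points
    inside each sub-concept (cluster), reducing within-class imbalance too.
    `clusters` is a list of clusters, each a list of numeric feature vectors."""
    rng = rng or random.Random(0)
    total = sum(len(c) for c in clusters)
    synthetic = []
    for i in range(max(0, target_size - total)):
        cluster = clusters[i % len(clusters)]        # round-robin over sub-concepts
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()                             # point on the segment a -> b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic
```

Interpolating within a cluster keeps synthetic points inside that sub-concept, which is the generalization-ability concern the abstract raises.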
An approach for breast cancer diagnosis classification using neural networkacijjournal
Artificial neural network has been widely used in various fields as an intelligent tool in recent years, such
as artificial intelligence, pattern recognition, medical diagnosis, machine learning and so on. The
classification of breast cancer is a medical application that poses a great challenge for researchers and
scientists. Recently, the neural network has become a popular tool in the classification of cancer datasets.
Classification is one of the most active research and application areas of neural networks. Major
disadvantages of the artificial neural network (ANN) classifier are its sluggish convergence and its
tendency to become trapped in local minima. To overcome this problem, the differential evolution algorithm (DE) has
been used to determine optimal value or near optimal value for ANN parameters. DE has been applied
successfully to improve ANN learning from previous studies. However, there are still some issues on DE
approach such as longer training time and lower classification accuracy. To overcome these problems,
an island-based model has been proposed in this system. The aim of our study is to propose an approach
for distinguishing between different classes of breast cancer. This approach is based on the Wisconsin
Diagnostic and Prognostic Breast Cancer datasets and the classification of different types of breast cancer.
The proposed system implements the island-based training method to obtain better accuracy and less
training time by using and analysing two different migration topologies.
A new model for large dataset dimensionality reduction based on teaching lear...TELKOMNIKA JOURNAL
One of the human diseases with a high rate of mortality each year is breast cancer (BC). Among all the forms of cancer, BC is the commonest cause of death among women globally. Some of the effective ways of data classification are data mining and classification methods. These methods are particularly efficient in the medical field due to the presence of irrelevant and redundant attributes in medical datasets. Such redundant attributes are not needed to obtain an accurate estimation of disease diagnosis. Teaching learning-based optimization (TLBO) is a new metaheuristic that has been successfully applied to several intractable optimization problems in recent years. This paper presents the use of a multi-objective TLBO algorithm for the selection of feature subsets in automatic BC diagnosis. For the classification task in this work, the logistic regression (LR) method was deployed. From the results, the projected method produced better BC dataset classification accuracy (classified into malignant and benign). This result showed that the projected TLBO is an efficient features optimization technique for sustaining data-based decision-making systems.
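The TLBO metaheuristic mentioned above has a simple two-phase structure; a minimal, generic sketch (minimization over a box, toy objective, not the paper's multi-objective variant):

```python
import random

def tlbo(f, dim, bounds, pop=20, iters=100, seed=0):
    """Minimal TLBO (minimization): the teacher phase pulls learners toward
    the current best solution, the learner phase lets random pairs interact."""
    rng = random.Random(seed)
    lo, hi = bounds
    clip = lambda v: min(hi, max(lo, v))
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    for _ in range(iters):
        teacher = min(X, key=f)
        mean = [sum(x[d] for x in X) / pop for d in range(dim)]
        for i in range(pop):                               # teacher phase
            tf = rng.choice((1, 2))                        # teaching factor
            cand = [clip(X[i][d] + rng.random() * (teacher[d] - tf * mean[d]))
                    for d in range(dim)]
            if f(cand) < f(X[i]):                          # greedy acceptance
                X[i] = cand
        for i in range(pop):                               # learner phase
            j = rng.randrange(pop)
            if j == i:
                continue
            better, worse = (X[j], X[i]) if f(X[j]) < f(X[i]) else (X[i], X[j])
            cand = [clip(X[i][d] + rng.random() * (better[d] - worse[d]))
                    for d in range(dim)]
            if f(cand) < f(X[i]):
                X[i] = cand
    return min(X, key=f)
```

TLBO's appeal for feature-subset search is that it has no algorithm-specific tuning parameters beyond population size and iterations.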
Classification of medical datasets using back propagation neural network powe...IJECEIAES
The classification is a one of the most indispensable domains in the data mining and machine learning. The classification process has a good reputation in the area of diseases diagnosis by computer systems where the progress in smart technologies of computer can be invested in diagnosing various diseases based on data of real patients documented in databases. The paper introduced a methodology for diagnosing a set of diseases including two types of cancer (breast cancer and lung), two datasets for diabetes and heart attack. Back Propagation Neural Network plays the role of classifier. The performance of neural net is enhanced by using the genetic algorithm which provides the classifier with the optimal features to raise the classification rate to the highest possible. The system showed high efficiency in dealing with databases differs from each other in size, number of features and nature of the data and this is what the results illustrated, where the ratio of the classification reached to 100% in most datasets).
Design of Quadrature Mirror Filter Bank using Particle Swarm Optimization (PSO)IDES Editor
In this paper, the particle swarm optimization
technique is used for the design of a two channel quadrature
mirror filter (QMF) bank. A new method is developed to
optimize the prototype filter response in passband, stopband
and overall filter bank response. The design problem is
formulated as a nonlinear unconstrained optimization of an
objective function, which is a weighted sum of the squared errors
in the passband, stopband, and overall filter bank response at
the frequency ω = 0.5π. For solving the given optimization
problem, the particle swarm optimization (PSO) technique
is used. As compared to the conventional design techniques,
the proposed method gives better performance in terms of
reconstruction error, mean square error in passband,
stopband, and computational time. Various design examples
are presented to illustrate the benefits provided by the
proposed method.
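The canonical global-best PSO loop behind such designs can be sketched in a few lines; the QMF objective is replaced here by a toy sphere function, so this illustrates the optimizer only, not the filter-bank design:

```python
import random

def pso(f, dim, bounds, particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Canonical global-best PSO minimizing f over a box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    V = [[0.0] * dim for _ in range(particles)]
    P = [x[:] for x in X]                       # personal bests
    g = min(P, key=f)[:]                        # global best
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                V[i][d] = (w * V[i][d]
                           + c1 * rng.random() * (P[i][d] - X[i][d])   # cognitive pull
                           + c2 * rng.random() * (g[d] - X[i][d]))     # social pull
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))
            if f(X[i]) < f(P[i]):
                P[i] = X[i][:]
                if f(P[i]) < f(g):
                    g = P[i][:]
    return g
```

For the QMF design, `f` would evaluate the weighted passband, stopband, and reconstruction errors of the prototype filter encoded in each particle.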
Experimental performance of pv fed led lighting system employing particle swa...eSAT Journals
This paper proposes a Particle Swarm Optimization (PSO) based approach for light intensity control in a Photo Voltaic (PV) supply fed LED lighting system. The output power regulation of the LED lighting system is formulated as an optimization problem and is carried out using a real-time PSO technique. The concept is first simulated and verified, and subsequently implemented on a 16-bit PIC microcontroller. The proposed method possesses the major advantage of system independence while regulating the output power. The computed and measured results clearly illustrate the effectiveness of the new method. Keywords: Particle Swarm Optimization, Light Emitting Diode, Photo Voltaic System.
An Approach to Reduce Noise in Speech Signals Using an Intelligent System: BE...CSCJournals
Two widespread classes of noise reduction algorithms are spectral noise subtraction and adaptive filtering. Both have the disadvantage that no parameter distinguishes between speech and noise components of the same frequency. In this paper, an intelligent controller, BELBIC, based on the mammalian limbic emotional learning algorithm, is used to increase speech quality in a noisy environment. The system's learning ability is trained so that the output obtained is the fundamental frequency of the speech spectrum, thus reducing the noise level to a minimum. The parameters on which the reduction of noise from the input speech spectrum depends have also been studied. The real-time implementations have been done using Simulink, and the results of the analysis are included at the end.
Identification and real time position control of a servo-hydraulic rotary act...ISA Interchange
This paper presents a new intelligent approach for adaptive control of a nonlinear dynamic system. A modified version of the brain emotional learning based intelligent controller (BELBIC), a bio-inspired algorithm based upon a computational model of emotional learning which occurs in the amygdala, is utilized for position controlling a real laboratorial rotary electro-hydraulic servo (EHS) system. EHS systems are known to be nonlinear and non-smooth due to many factors such as leakage, friction, hysteresis, null shift, saturation, dead zone, and especially fluid flow expression through the servo valve. The large value of these factors can easily influence the control performance in the presence of a poor design. In this paper, a mathematical model of the EHS system is derived, and then the parameters of the model are identified using the recursive least squares method. In the next step, a BELBIC is designed based on this dynamic model and utilized to control the real laboratorial EHS system. To prove the effectiveness of the modified BELBIC’s online learning ability in reducing the overall tracking error, results have been compared to those obtained from an optimal PID controller, an auto-tuned fuzzy PI controller (ATFPIC), and a neural network predictive controller (NNPC) under similar circumstances. The results demonstrate not only excellent improvement in control action, but also less energy consumption.
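The brain emotional learning model underlying BELBIC (Moren–Balkenius style) has a compact update rule: amygdala weights learn monotonically from reward while orbitofrontal weights inhibit over-reaction. A generic single-node sketch, not either paper's controller:

```python
class BEL:
    """Minimal brain-emotional-learning node: amygdala weights v learn from
    reward and never decrease; orbitofrontal weights w correct the output."""
    def __init__(self, n, alpha=0.1, beta=0.05):
        self.v = [0.0] * n      # amygdala weights (monotonic learning)
        self.w = [0.0] * n      # orbitofrontal weights (can unlearn)
        self.alpha, self.beta = alpha, beta

    def output(self, s):
        a = sum(vi * si for vi, si in zip(self.v, s))   # amygdala activation
        o = sum(wi * si for wi, si in zip(self.w, s))   # orbitofrontal inhibition
        return a - o

    def learn(self, s, reward):
        a = sum(vi * si for vi, si in zip(self.v, s))
        e = self.output(s)
        for i, si in enumerate(s):
            self.v[i] += self.alpha * si * max(0.0, reward - a)  # never unlearns
            self.w[i] += self.beta * si * (e - reward)           # tracks the error
        return e
```

In BELBIC the stimulus `s` and reward are engineered from plant error and control effort; here they are left abstract.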
The modern power system around the world has grown in complexity of interconnection and
power demand. The focus has shifted towards enhanced performance, increased customer focus,
low cost, reliable and clean power. In this changed perspective, the scarcity of energy resources,
increasing power generation costs and environmental concerns necessitate optimal economic dispatch.
In reality, power stations are neither at equal distances from the load nor have similar fuel cost
functions. Hence for providing cheaper power, load has to be distributed among various power
stations in a way which results in lowest cost for generation. Practical economic dispatch (ED)
problems have highly non-linear objective function with rigid equality and inequality constraints.
Particle swarm optimization (PSO) is applied to allot the active power among the generating
stations satisfying the system constraints and minimizing the cost of power generated. The
viability of the method is analyzed for its accuracy and rate of convergence. The economic load
dispatch problem is solved for three and six unit system using PSO and conventional method for
both cases of neglecting and including transmission losses. The results of PSO method were
compared with conventional method and were found to be superior. The conventional
optimization methods are unable to solve such problems due to local optimum solution
convergence. Particle Swarm Optimization (PSO) since its initiation in the last 15 years has been
a potential solution to the practical constrained economic load dispatch (ELD) problem. The
optimization technique is constantly evolving to provide better and faster results.
While writing the report on our project seminar, we reflected that science and smart technology are
ever-expanding fields, with engineers working hard day and night to make life a gift for us.
ECONOMIC LOAD DISPATCH USING PARTICLE SWARM OPTIMIZATIONMln Phaneendra
In this presentation, particle swarm optimization (PSO) is applied to allot the active power among the generating stations, satisfying the system constraints and minimizing the cost of power generated. The viability of the method is analyzed for its accuracy and rate of convergence. The economic load dispatch problem is solved for three- and six-unit systems using PSO and a conventional method, both neglecting and including transmission losses. The results of the PSO method were compared with the conventional method and were found to be superior.
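The conventional dispatch method that PSO is compared against is typically lambda iteration: each unit's output follows its incremental cost, P = (λ - b) / (2a), and λ is found so the outputs meet demand. A loss-free sketch with illustrative unit data:

```python
def economic_dispatch(units, demand, tol=1e-6):
    """Lambda iteration for loss-free economic dispatch. Each unit has a
    quadratic cost a*P^2 + b*P + c, so its optimum is P(l) = (l - b) / (2a),
    clipped to its limits; bisect on l until total output meets demand."""
    def total(lmbda):
        return sum(min(u["pmax"], max(u["pmin"], (lmbda - u["b"]) / (2 * u["a"])))
                   for u in units)
    lo, hi = 0.0, 1000.0                 # bracket for the incremental cost
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < demand:
            lo = mid
        else:
            hi = mid
    lmbda = (lo + hi) / 2
    return [min(u["pmax"], max(u["pmin"], (lmbda - u["b"]) / (2 * u["a"])))
            for u in units]
```

The unit coefficients below are made up for illustration; PSO replaces this closed-form rule when losses and non-smooth constraints make it inapplicable.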
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...ijsc
As the size of biomedical databases grows day by day, finding the essential features for disease prediction has become more complex due to high dimensionality and sparsity problems. Also, due to the availability of a large number of microarray datasets in the biomedical repositories, it is difficult to
analyze, predict and interpret the feature information using traditional feature selection based classification models. Most traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. The ensemble classifier is one of the scalable models for the extreme learning machine due to its high efficiency and fast processing speed for real-time applications. The main objective of feature selection based ensemble learning models is to classify high dimensional data with high computational efficiency and a high true positive rate. In this work, an optimized particle swarm optimization (PSO) based ensemble classification model was developed for high dimensional microarray datasets. Experimental results proved that the proposed model has high computational efficiency compared to traditional feature selection based classification models as far as accuracy, true positive rate and error rate are concerned.
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
The development of microarray technology has suppli
ed a large volume of data to many fields. The gene
microarray analysis and classification have demonst
rated an effective way for the effective diagnosis
of
diseases and cancers. In as much as the data achiev
ing from microarray technology is very noisy and al
so
has thousands of features, feature selection plays
an important role in removing irrelevant and redund
ant
features and also reducing computational complexity
. There are two important approaches for gene
selection in microarray data analysis, the filters
and the wrappers. To select a concise subset of inf
ormative
genes, we introduce a hybrid feature selection whic
h combines two approaches. The fact of the matter i
s
that candidate’s features are first selected from t
he original set via several effective filters. The
candidate
feature set is further refined by more accurate wra
ppers. Thus, we can take advantage of both the filt
ers
and wrappers. Experimental results based on 11 micr
oarray datasets show that our mechanism can be
effected with a smaller feature set. Moreover, thes
e feature subsets can be obtained in a reasonable t
ime
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
The development of microarray technology has supplied a large volume of data to many fields. Gene microarray analysis and classification have demonstrated an effective way for the diagnosis of diseases and cancers. Because the data obtained from microarray technology is very noisy and also has thousands of features, feature selection plays an important role in removing irrelevant and redundant features and also reducing computational complexity. There are two important approaches for gene selection in microarray data analysis, the filters and the wrappers. To select a concise subset of informative genes, we introduce a hybrid feature selection which combines the two approaches. Candidate features are first selected from the original set via several effective filters. The candidate feature set is further refined by more accurate wrappers. Thus, we can take advantage of both the filters and the wrappers. Experimental results based on 11 microarray datasets show that our mechanism can be effective with a smaller feature set. Moreover, these feature subsets can be obtained in a reasonable time.
The analysis of proteins and messenger RNA is commonly used in the comparison of gene expression patterns in tissues or cells of different types and under distinct conditions. In gene expression analysis, normalization is a critical step as it guarantees the validity of downstream analyses. Data preprocessing is an indispensable step in the extraction and normalization of microarray gene expression data. The normalization of gene expression data is essential in ensuring accurate inferences. A number of normalization methods are employed in high throughput sequencing studies. The preprocessing activity begins with a careful analysis of the gene expression data and usually involves the summarization of many raw signal intensities into one expression value. The Robust Multiarray Average (RMA) is a normalization approach for microarrays that involves background correction, normalization and summarization of probe-level information without using MM probes (Lim et al., 2007). It is an algorithm commonly used in the creation of an expression matrix for Affymetrix data and is one of the most commonly used modes of preprocessing to normalize gene expression data. Raw intensity values are initially background corrected and log2 transformed before being normalized. In order to generate an expression measure for probe sets on each array, a linear model is fitted to the normalized data.
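The normalization step of RMA is usually quantile normalization, which forces every array to share one reference distribution; a minimal sketch (ties are not handled specially here):

```python
def quantile_normalize(arrays):
    """Quantile normalization as used in RMA: replace the k-th smallest
    value of every array with the mean k-th smallest value across arrays,
    so all arrays end up with an identical distribution."""
    n = len(arrays[0])
    ranked = [sorted(range(n), key=a.__getitem__) for a in arrays]
    # reference distribution: mean of the k-th smallest value across arrays
    ref = [sum(sorted(a)[k] for a in arrays) / len(arrays) for k in range(n)]
    out = []
    for order in ranked:
        col = [0.0] * n
        for rank, idx in enumerate(order):
            col[idx] = ref[rank]        # keep each probe's rank, swap its value
        out.append(col)
    return out
```

After this step every array has the same multiset of values, differing only in which probe holds which value.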
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
Feature selection is one of the most common and critical tasks in database classification. It
reduces the computational cost by removing insignificant and unwanted features. Consequently, this
makes the diagnosis process accurate and comprehensible. This paper presents the measurement of
feature relevance based on fuzzy entropy, tested with Radial Basis Classifier (RBF) network,
Bagging(Bootstrap Aggregating), Boosting and stacking for various fields of datasets. Twenty
benchmarked datasets which are available in UCI Machine Learning Repository and KDD have been
used for this work. The accuracy obtained from these classification process shows that the proposed
method is capable of producing good and accurate results with fewer features than the original
datasets.
In this research, a hybrid wrapper model is proposed to identify the featured gene subset from the gene expression data. To balance the gap between exploration
and exploitation, a hybrid model with a popular meta-heuristic algorithm named
spider monkey optimizer (SMO) and simulated annealing (SA) is applied. In the proposed model, ReliefF is used as a filter to obtain the relevant gene subset
from dataset by removing the noise and outliers prior to feeding the data to the
wrapper SMO. To enhance the quality of the solution, simulated annealing is
deployed as a local search with the SMO in the second phase, which guides the detection of the most optimal feature subset. To evaluate the performance of the proposed model, a support vector machine (SVM) is used as a fitness function to recognize the most informative biomarker genes from the cancer datasets along with University of California, Irvine (UCI) datasets. To further evaluate the model, 4 different classifiers (SVM, naïve Bayes (NB), decision tree (DT), and k-nearest neighbors (KNN)) are used. From the experimental results and analysis, it is noteworthy that ReliefF-SMO-SA-SVM performs relatively better than its state-of-the-art counterparts. For cancer datasets, our model performs better in terms of accuracy, with a maximum of 99.45%.
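The simulated-annealing local search used in the second phase can be sketched generically over a binary feature mask; the fitness below stands in for the paper's SVM accuracy and is purely illustrative:

```python
import math
import random

def sa_feature_search(fitness, n_features, iters=500, t0=1.0, cool=0.99, seed=0):
    """Simulated-annealing search over a binary feature mask: flip one bit
    at a time and accept worse subsets with probability exp(dF / T)."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    cur_fit = fitness(mask)
    best, best_fit = mask[:], cur_fit
    t = t0
    for _ in range(iters):
        cand = mask[:]
        i = rng.randrange(n_features)
        cand[i] = not cand[i]                  # neighborhood: single bit flip
        f = fitness(cand)
        if f >= cur_fit or rng.random() < math.exp((f - cur_fit) / t):
            mask, cur_fit = cand, f
            if f > best_fit:
                best, best_fit = cand[:], f
        t *= cool                              # geometric cooling schedule
    return best, best_fit
```

In the hybrid, the SMO provides candidate solutions and this SA step refines them, trading exploration for exploitation as the temperature drops.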
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...cscpconf
Optimization problems are dominantly being solved using Computational Intelligence. One of
the issues that can be addressed in this context is problems related to attribute subset selection
evaluation. This paper presents a computational intelligence technique for solving the
optimization problem using a proposed model called Modified Genetic Search Algorithms
(MGSA), which avoids poor local search spaces using merit and scaled fitness variables, detecting
and deleting bad candidate chromosomes, thereby reducing the number of individual
chromosomes in the search space and the subsequent iterations in later generations. This paper aims
to show that Rotation forest ensembles are useful in the feature selection method. The base
classifier is multinomial logistic regression method integrated with Haar wavelets as projection
filter and reproducing the ranks of each features with 10 fold cross validation method. It also
discusses the main findings and concludes with promising result of the proposed model. It
explores the combination of MGSA for optimization with Naïve Bayes classification. The result
obtained using proposed model MGSA is validated mathematically using Principal Component
Analysis. The goal is to improve the accuracy and quality of diagnosis of Breast cancer disease
with robust machine learning algorithms. Compared to other works in the literature survey,
the experimental results achieved in this paper show better results with statistical inference.
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...Kiogyf
Cancer is a globally recognized cause of death. A proper cancer analysis demands the classification of several types of tumor. Investigations into microarray gene expressions seem to be a successful platform for studying genetic diseases. Although standard machine learning (ML) approaches have been efficient in the identification of significant genes and in the classification of new types of cancer cases, their medical and practical application has faced several drawbacks, such as the limitations of DNA microarray data analysis, which include an enormous number of features and a relatively small number of instances. To obtain reasonable and efficient information from a DNA microarray dataset, there is a need to extend the level of interpretability of the forecasting approach while maintaining a high level of precision. In this work, a novel way of cancer classification based on gene expression profiles is presented. The method is a combination of the Firefly algorithm and the Mutual Information method. First, Mutual Information is used to select the features before the Firefly algorithm is applied for feature reduction. Finally, a Support Vector Machine is used to classify cancers into types. The performance of the proposed system was evaluated by using it to classify datasets from colon cancer; the results of the evaluation were compared with some recent approaches.
Keywords: Feature Selection, Firefly Algorithm, Cancer Disease, Mutual Information
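The Firefly algorithm referenced above follows a standard attractiveness rule; a minimal, generic minimization sketch (parameters are illustrative, not tuned to the paper):

```python
import math
import random

def firefly(f, dim, bounds, n=15, iters=80, beta0=1.0, gamma=0.01, alpha=0.2, seed=0):
    """Standard firefly algorithm (minimization): dimmer flies move toward
    brighter (lower-cost) ones with attractiveness beta0*exp(-gamma*r^2),
    plus an annealed random step; gamma is scaled to the search range."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        cost = [f(x) for x in X]                    # lower cost = brighter firefly
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:               # move i toward the brighter j
                    r2 = sum((X[i][d] - X[j][d]) ** 2 for d in range(dim))
                    beta = beta0 * math.exp(-gamma * r2)
                    for d in range(dim):
                        step = alpha * (rng.random() - 0.5) * (hi - lo)
                        X[i][d] = min(hi, max(lo, X[i][d]
                                              + beta * (X[j][d] - X[i][d]) + step))
                    cost[i] = f(X[i])
        alpha *= 0.97                               # anneal the random walk
    return min(X, key=f)
```

In feature-reduction use, each position encodes a candidate feature subset and the cost is a classifier's error on it.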
Leave one out cross validated Hybrid Model of Genetic Algorithm and Naïve Bay...IJERA Editor
This paper presents a new approach to selecting a reduced number of features in databases. Every database has a
given number of features, but it is observed that some of these features can be redundant, harmful, and liable to
confuse the process of classification. The proposed method first applies a binary-coded genetic algorithm
to select a small subset of features. The importance of these features is judged by applying the Naïve Bayes (NB)
method of classification. The best reduced subset of features, i.e. the one with high classification accuracy on the given
databases, is adopted. The classification accuracy obtained by the proposed method is compared with that reported
recently in publications on eight databases. It is noted that the proposed method performs satisfactorily on these
databases and achieves higher classification accuracy with a smaller number of features.
Feature Selection Approach based on Firefly Algorithm and Chi-square IJECEIAES
The dimensionality problem is a well-known challenging issue for most classifiers, in which datasets have an unbalanced number of samples and features. Features may contain unreliable data, which may lead the classification process to produce undesirable results. The feature selection approach is considered a solution for this kind of problem. In this paper, an enhanced firefly algorithm is proposed to serve as a feature selection solution for reducing dimensionality and picking the most informative features to be used in classification. The main purpose of the proposed model is to improve the classification accuracy through using the selected features produced by the model, so that classification errors will decrease. Modeling the firefly in this research is done by simulating the firefly position with a cell chi-square value, which is changed after every move, and simulating the firefly intensity by calculating a set of different fitness functions as a weight for each feature. K-nearest neighbor and discriminant analysis are used as classifiers to test the proposed firefly algorithm in selecting features. Experimental results showed that the proposed enhanced algorithm based on the firefly algorithm with chi-square and different fitness functions can provide better results than others. Results showed that reduction of the dataset is useful for gaining higher accuracy in classification.
Breast cancer is the leading cause of death for women worldwide. If cancer is discovered early, the rate of death can be lowered. Machine learning techniques are a hot field of research, and they have been shown to be helpful in cancer prediction and early detection. The primary purpose of this research is to identify which machine learning algorithms are the most successful in predicting and diagnosing breast cancer, according to five criteria: specificity, sensitivity, precision, accuracy, and F1 score. The project is carried out in the Anaconda environment, which uses Python's NumPy and SciPy numerical and scientific libraries as well as matplotlib and Pandas. In this study, the Wisconsin diagnostic breast cancer dataset was used to evaluate eleven machine learning classifiers: decision tree, quadratic discriminant analysis, AdaBoost, Bagging meta-estimator, extremely randomized trees, Gaussian process classifier, Ridge, Gaussian naïve Bayes, k-nearest neighbors, multilayer perceptron, and support vector classifier. In the performance analysis, extremely randomized trees outperformed all other classifiers with an F1-score of 96.77% after data collection and data analysis.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
The goal of this paper is to compare different classifiers and multi-classifier fusion with respect to accuracy in discovering breast cancer on four different datasets. We present an implementation of various classification techniques, representing the most widely known algorithms in this field, on four different breast cancer datasets: two for diagnosis and two for prognosis. We present a fusion between classifiers to find the best multi-classifier fusion approach for each dataset individually. Classification accuracy is obtained from the confusion matrix within a 10-fold cross-validation technique, and fusion uses majority voting (the mode of the classifier outputs). The experimental results show that no classification technique is better than the others when used on all datasets, since the classification task is affected by the type of dataset. Using multi-classifier fusion, the results show that accuracy improved in three datasets out of four.
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly.
For more information:
http://societyofdatascientists.com/controlling-informative-features-for-improved-accuracy-and-faster-predictions-in-omentum-cancer-models/
Evolving Efficient Clustering and Classification Patterns in Lymphography Dat...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relevant patterns from large datasets. Clustering and classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations includes disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes, and the performance of sixteen classification algorithms on the Lymphography dataset that enable the classifier to accurately perform multi-class categorization of medical data. Our work asserts that the Random Tree algorithm and Quinlan's C4.5 algorithm give 100 percent classification accuracy with all the predictor features and also with the feature subset selected by the Fisher Filtering feature selection algorithm. It is also noted that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm offers increased clustering accuracy in less computation time.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...IJECEIAES
In the classification of many diseases an accurate gene analysis is needed, for which the selection of the most informative genes is very important; this requires a decision technique for complex, ambiguous contexts. Traditional methods for selecting the most significant genes include statistical analyses such as the 2-Sample T-test (2STT), entropy, and Signal-to-Noise Ratio (SNR). This paper evaluates gene selection and classification on the basis of accurate gene selection using a structured complex decision technique (SCDT) and classifies the result using a fuzzy cluster based nearest neighbor classifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated with the leave-one-out cross-validation (LOOCV) metric along with sensitivity, specificity, precision and F1-score against four different classifiers, namely 1) Radial Basis Function (RBF), 2) Multi-layer perceptron (MLP), 3) Feed Forward (FF) and 4) Support Vector Machine (SVM), on three different datasets: DLBCL, Leukemia and Prostate tumor. The proposed SCDT & FC-NNC exhibits superior results and can be considered the more accurate decision mechanism.
Gene expression microarray classification using PCA–BEL

Ehsan Lotfi (Department of Computer Engineering, Torbat-e-Jam Branch, Islamic Azad University, Torbat-e-Jam, Iran)
Azita Keshavarz (Department of Psychology, Torbat-e-Jam Branch, Islamic Azad University, Torbat-e-Jam, Iran)

Article history: Received 21 February 2014; Accepted 16 September 2014
Keywords: Amygdala; BEL; Emotional neural network; Cancer; BELBIC; Diagnosis; Diagnostic method

Abstract
In this paper, a novel hybrid method is proposed based on Principal Component Analysis (PCA) and Brain
Emotional Learning (BEL) network for the classification tasks of gene-expression microarray data. BEL
network is a computational neural model of the emotional brain which simulates its neuropsychological
features. The distinctive feature of BEL is its low computational complexity which makes it suitable for
high dimensional feature vector classification. Thus BEL can be adopted in pattern recognition in order to
overcome the curse of dimensionality problem. In the experimental studies, the proposed model is
utilized for the classification problems of the small round blue cell tumors (SRBCTs), high grade gliomas
(HGG), lung, colon and breast cancer datasets. According to the results based on 5-fold cross validation,
the PCA–BEL provides an average accuracy of 100%, 96%, 98.32%, 87.40% and 88% in these datasets
respectively. Therefore, it can be effectively used in gene-expression microarray classification tasks.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Every cell in our body contains a number of genes that specify
the unique features of different types of cells. The gene expression
of cells can be obtained by DNA microarray technology which is
capable of showing simultaneous expressions of tens of thousands
of genes. This technology is widely used to distinguish between
normal and cancerous tissue samples and support clinical cancer
diagnosis [27]. There are certain challenges facing classification of
gene expression in cancer diagnosis. The main challenge is the
huge number of genes compared to the small number of available
training samples [47]. Microarray training samples are typically gathered from fewer than one hundred patients, while each sample usually contains many thousands of genes. Furthermore, microarray data contain an
abundance of redundancy, missing values [7] and noise due to
biological and technical factors [25,75]. In the literature, there are
two general approaches to these issues including feature selection
and feature extraction. A feature selection method selects a feature
subset from the original feature space and provides the marker
and causal genes [9,4,1] which are able to identify cancers quickly
and easily. Feature extraction methods, in contrast, transform the original data into other spaces to generate a new set of features with high information-packing properties. In both approaches, the reduced features are fed to a suitable classifier for diagnosis. A proper classifier increases the accuracy of detection and can influence the feature reduction step.
This paper aims to review these approaches, investigate the
recently developed methodology and propose a proper feature
reduction-classification method for cancer detection. The organi-
zation of the paper is as follows: feature selection methods are
reviewed in Section 1.1. Section 1.2 explains the feature extraction
methods and Section 2 offers the proposed method. Experimental
results on cancer classification are evaluated in Section 3. Finally,
conclusions are made in Section 4.
1.1. Feature selection methods
Researchers have developed various feature selection methods
for classification. Feature selection methods are categorized into
three techniques including the filter model [62], wrapper model
and embedded model [19]. The filter model considers feature
selection and classifier's learning as two separate steps and utilizes
the general characteristics of training data to select features. The
filter model includes both traditional methods which often eval-
uate genes separately and new methods which consider gene-to-
gene correlation. These methods rank the genes and select top
ranked genes as input features for the learning step. The gene
ranking methods need a threshold for the number of genes to be
selected. For example Golub et al. [20] proposed the selection of
the top 50 genes. Additionally the filter model needs a criterion to
rank the genes. Liu et al. [35] and Golub et al. [20] have
investigated some filter methods based on statistical tests and
information gain. Examples of the filter criterion include Pearson
correlation coefficient method [84], t-statistics method [2] and
[Journal front matter: Computers in Biology and Medicine 54 (2014) 180–187; doi:10.1016/j.compbiomed.2014.09.008; corresponding author: esilotf@gmail.com (E. Lotfi).]
signal-to-noise ratio method [20]. The time complexity of these methods is O(N), where N is the dimensionality. They are efficient but cannot remove redundant genes, an issue studied in recent literature [83,78,14,26,37].
In the wrapper model, a subset is selected and then the
accuracy of a predetermined learning algorithm is predicted to
determine the properness of a selected subset. In the wrapper
model of Xiong et al. [83], the selected subsets learn through three
learning algorithms including; linear discriminant analysis, logistic
regression and support vector machines. These classifiers should
be run for every subset of genes selected from the search space.
This procedure has a high computational complexity. Like the
wrapper methods, in the embedded models, the genes are selected
as part of the specific learning method but with lower computa-
tional complexity [19]. The subset selection methods of wrapper
model can be categorized into the population-based methods
[71,34,53] and backward selection methods. Recently Lee and
Leu [34], and Tong and Schierz [69] shed light on the effectiveness
of the hybrid model in feature selection. The elements of a hybrid
method include Neural Network (NN), Fuzzy System, Genetic
Algorithm (GA; [76,23]) and Ant Colony [79]. Lee and Leu [34]
examined the GA's ability in the feature selection. Furthermore,
the abilities of fuzzy theories have been successfully applied by
many researchers [12,72,10]. Tong and Schierz [69] used a genetic
algorithm-Neural Network approach (GANN) as a wrapper model.
The feature subset extraction is performed by GA and then the
extracted subset is applied to learn the NN. These processes are
repeated until the best subset is determined. Given the high dimensionality of the data, GA appears to be a suitable strategy for feature selection.
1.2. Feature extraction methods
In the literature, there are two well-known methods for feature
extraction including principal component analysis (PCA; [78]) and
linear discriminant analysis (LDA; [48]). They transform the original feature space into a lower-dimensional one. PCA transforms the original data into a set of reduced features that best approximates the original data. In the first
step, PCA calculates the data covariance matrix and then finds the
eigenvalues and the eigenvectors of the matrix. Finally it goes
through a dimensionality reduction step. According to the final step,
the only terms corresponding to the K largest eigenvalues are kept.
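The PCA steps just described (compute the covariance matrix, find its eigenvalues and eigenvectors, keep only the K largest) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' code; the function name `pca_features` is ours:

```python
import numpy as np

def pca_features(X, k):
    """Project X (m samples x s features) onto its first k principal
    components, via eigendecomposition of the data covariance matrix."""
    Xc = X - X.mean(axis=0)             # center the data
    cov = np.cov(Xc, rowvar=False)      # data covariance matrix (s x s)
    vals, vecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]    # indices of the K largest eigenvalues
    return Xc @ vecs[:, top]            # project onto their eigenvectors
```

For microarray data, where the number of features s far exceeds the number of samples m, one would in practice eigendecompose the small m x m Gram matrix Xc Xcᵀ instead of the s x s covariance; both share the same nonzero spectrum.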
In contrast to PCA, LDA first calculates the scatter matrices: a within-class scatter matrix computed for each class, and a between-class scatter matrix. The within-class scatter matrix measures the scatter of samples around their respective class means, while the between-class scatter matrix measures the scatter of class means around the mixture mean. LDA then transforms the data in a way that maximizes the between-class scatter and minimizes the within-class scatter, so the dimension is reduced and class separability is maximized.
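The two scatter matrices described above can be computed directly. The sketch below uses our own names and is not the paper's implementation; LDA's projection directions are then the leading eigenvectors of Sw⁻¹Sb:

```python
import numpy as np

def lda_scatter(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)       # scatter around each class mean
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)     # class means around the mixture mean
    return Sw, Sb
```

A useful sanity check is the standard identity that the total scatter of the centered data equals Sw + Sb.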
The feature extraction/selection method is the first step in gene
expression microarray classification and cancer detection. The
second step consists of a classifier learning the reduced features.
In the literature, various classifiers have been investigated in order
to find the best classifier. It seems that the NN and various types
of NN [29,36,57,6,56,68,74,81,69,16], k nearest neighbors [61,13],
k-means algorithms [32], Fuzzy c-means algorithm [11], bayesian
networks [4], vector quantization based classifier [59], manifold
methods [18,80], fuzzy approaches [54,58,30,60], complementary learn-
ing fuzzy neural network [64–67], ensemble learning [55,8,27,50],
logistic regression, support vector machines [22,5,82,73,63,46,70],
LSVM [44], wavelet transform [28] as well as radial basis-support
vector machines [51] have been investigated successfully in classi-
fication and cancer detection. But the recently developed classifiers
such as brain emotional learning (BEL) networks [42] have not been
examined in this field.
BEL networks are recently developed methodologies that use
simulated emotions to aid their learning process. BEL is motivated
by the neurophysiological knowledge of the human's emotional
brain. In contrast to the published models, the distinctive features of
the BEL are low computational complexity and fast training which
make it suitable for high dimensional feature vector classification.
In this paper, BEL is developed and examined for gene expression
microarray classification tasks. It is expected that a model with low
computational complexity can be more successful in solving the
challenges of high dimensional microarray classification.
2. Proposed PCA–BEL to microarray data classification
Fig. 1 shows the general view of the proposed methods and the
final proposed algorithm is presented in Fig. 2. In the proposed
framework, what's different from published diagnostic methods is
the application of BEL model to cancer classification. There are
various versions of BEL, including basic BEL [3], BELBIC (BEL based
intelligent controller; [45]), BELPR (BEL based pattern recognizer;
[39]), BELPIC (BEL based picture recognizer; [43]) and supervised
BEL [38,40–42]. They are learning algorithms of emotional neural
networks [42]. These models are inspired by the emotional brain.
The description of the relationship between the main components
of emotional brain is common among all these models. What
differs from one model to another is how they formulate the
reward signal in the learning process. For example in the model
presented by Balkenius and Morén [3], it is not clarified how the
reward is assigned. In the BELBIC, the reward signal is defined
explicitly and the formulization of other equations is formed
accordingly. However, the supervised BEL employs the target value
of input pattern instead of the reward signal in the learning phase.
Supervised BEL is therefore model-free and can be utilized in different applications; here, this version is developed for the gene expression microarray classification task. Generally, the computational complexity of BEL is very low [39–42]: it is O(n), which makes it suitable for high dimensional feature vector classification.
Fig. 1. General view of proposed method.
BEL [42] is inspired by the interactions of thalamus, amygdala
(AMYG) [15,17,21,24,31,33,77], orbitofrontal cortex (OFC) and
sensory cortex in the emotional brain [42].
The first step is associated with PCA dimension reduction
(Fig. 1). The first k principal components p1, p2, …, pk are the outputs of the first step and the inputs of the second step. In the second step, this pattern is normalized to the interval [0, 1]. The normalized k principal components p1, p2, …, pk are the outputs of the second step and the inputs of the third step. Fig. 2 illustrates the details of the proposed method. The input pattern of BEL is the vector p1, p2, …, pk and E is the final output. The
model consists of two main subsystems, the AMYG and the OFC. The AMYG receives the input pattern p1, p2, …, pk from the sensory cortex, and p_{k+1} from the thalamus. The OFC receives only the input pattern p1, p2, …, pk from the sensory cortex. The p_{k+1} is calculated by the following formula:

p_{k+1} = max_{j=1…k}(p_j)    (1)

The weight v_{k+1} belongs to the AMYG and the weight w_{k+1} to the OFC. Ea is the internal output of the AMYG, which is used to adjust the plastic connection weights v1, v2, …, v_{k+1} (Eq. (6)). Eo is the output of the OFC, which is used to inhibit the AMYG output. This inhibition is implemented by subtracting Eo from Ea (Eq. (5)). As the corrected AMYG response, E is the final output
Fig. 2. The flowchart of proposed method in learning step.
node. It is evaluated by the monotonically increasing activation function tansig and used to adjust the OFC connection weights w1, w2, …, w_{k+1} (Eq. (7)). The activation function is as follows:

tansig(x) = 2 / (1 + e^{-2x}) − 1    (2)
The AMYG output, the OFC output and the final output are simply calculated by the following formulas, respectively:

Ea = Σ_{j=1}^{k+1} (v_j · p_j) + b_a    (3)

Eo = Σ_{j=1}^{k} (w_j · p_j) + b_o    (4)

E = tansig(Ea − Eo)    (5)

Let t be the target value associated with the nth pattern p; t should be binary encoded. The supervised learning rules are then as follows:

v_j = v_j + lr · max(t − Ea, 0) · p_j,  for j = 1…k+1    (6)

w_j = w_j + lr · (Ea − Eo − t) · p_j,  for j = 1…k+1    (7)

b_a = b_a + lr · max(t − Ea, 0)    (8)

b_o = b_o + lr · (Ea − Eo − t)    (9)
where lr is the learning rate, t is the binary target, t − Ea is the calculated error, b_a is the bias of the AMYG neuron and b_o is the bias of the OFC neuron. The v1, v2, …, v_{k+1} are the AMYG learning weights and w1, w2, …, w_{k+1} are the OFC learning weights. Eqs. (3)–(9) describe the multiple-input single-output model; in Figs. 2 and 3, the equations are extended to multiple-input multiple-output usage. The input training microarray data in Fig. 2 consists of the two matrices P and T. The size of matrix P is m × s, where m is the number of patterns and s is the number of features in each pattern (s ≫ k). The size of matrix T is m × c, where c is the number of classes. The targets are binary encoded, so each row of matrix T contains a single "1" with "0" in all other columns. In the flowcharts, pi denotes the ith pattern and ti its related target.
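Eqs. (1)-(9) can be collected into a small single-output BEL unit. The sketch below is our reading of those update rules, not the source code released by the authors (linked in Section 3); the class name `BELUnit` and all variable names are ours:

```python
import numpy as np

def tansig(x):
    # Eq. (2): tansig(x) = 2 / (1 + e^(-2x)) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

class BELUnit:
    """Single-output supervised BEL unit implementing Eqs. (1) and (3)-(9)."""
    def __init__(self, k, lr=0.001, seed=0):
        rng = np.random.default_rng(seed)
        self.v = 0.01 * rng.standard_normal(k + 1)  # AMYG weights v_1..v_{k+1}
        self.w = 0.01 * rng.standard_normal(k + 1)  # OFC weights w_1..w_{k+1}
        self.ba = 0.0                               # AMYG bias
        self.bo = 0.0                               # OFC bias
        self.lr = lr

    def forward(self, p):
        p = np.append(p, p.max())                   # thalamic input p_{k+1}, Eq. (1)
        Ea = self.v @ p + self.ba                   # Eq. (3)
        Eo = self.w[:-1] @ p[:-1] + self.bo         # Eq. (4): OFC sums over j = 1..k
        E = tansig(Ea - Eo)                         # Eq. (5): OFC inhibits AMYG
        return p, Ea, Eo, E

    def update(self, p, t):
        p, Ea, Eo, E = self.forward(p)
        self.v += self.lr * max(t - Ea, 0.0) * p    # Eq. (6)
        self.w += self.lr * (Ea - Eo - t) * p       # Eq. (7)
        self.ba += self.lr * max(t - Ea, 0.0)       # Eq. (8)
        self.bo += self.lr * (Ea - Eo - t)          # Eq. (9)
        return E
```

For the multi-class case described in the text, one such unit would be kept per class (one column of T), and at test time the class with the maximum E is chosen.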
The learning rate lr can be adaptively adjusted to increase the performance. The final flowchart, Fig. 2, shows this adaptation and the related parameters: the ratio by which the learning rate is increased (lr_inc), initialized to 1.05; the ratio by which it is decreased (lr_dec), initialized to 0.7; the maximum performance increase (minc), initialized to 1.04; the first performance (perf_f; step 4 of the flowchart) and the last performance (perf_l), which can be calculated as MSE. The initial lr = 0.001 and the learning weights are initialized randomly (step 3 of the flowchart). According to the algorithm, if (perf_l / perf_f) > minc then lr = lr × lr_dec; else if perf_l < perf_f, then lr = lr × lr_inc. In Fig. 2, the stop criterion is reaching a predetermined number of learning epochs, i.e. the maximum epoch (for example 10,000 epochs). Fig. 2 presents the learning step and Fig. 3 shows the flowchart of the testing step. The inputs of the algorithm presented in Fig. 3 are a testing pattern, the number of classes and the weights adjusted in the learning step. The last step of the algorithm performs the diagnosis: the index of the maximum E gives the class number of the pattern.
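The learning-rate adaptation just described can be sketched as follows; this is our reading of the flowchart's rule, and the function name is ours:

```python
def adapt_lr(lr, perf_first, perf_last, lr_inc=1.05, lr_dec=0.7, minc=1.04):
    """If the error grew by more than the allowed factor minc, shrink lr;
    if it improved, grow lr; otherwise leave it unchanged."""
    if perf_last / perf_first > minc:
        return lr * lr_dec
    if perf_last < perf_first:
        return lr * lr_inc
    return lr
```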
3. Experimental studies
The source code of the proposed method is accessible from http://
www.bitools.ir/tprojects.html, and it is evaluated on the gene expression microarray data of the small round blue cell tumors (SRBCTs), high grade gliomas (HGG), lung, colon and breast cancer datasets. The
SRBCTs dataset is a 4-class cDNA microarray data and contains 2308
genes and 83 samples including 29 samples in Ewing's sarcoma
(EWS), 25 in rhabdomyosarcoma (RMS), 18 in neuroblastoma (NB)
and 11 in Burkitt lymphoma (BL). This data set can be obtained from
http://research.nghri.nih.gov/microarray/Supplement/. In the proposed algorithm, the maximum learning epoch = 10,000, k = 100 and the initial lr is set to 0.001, 0.000001 and 0.001 for the SRBCT, HGG and lung cancer datasets respectively. These parameters are picked empirically: k = 100 with these lr values gives better results for these datasets. However, in other applications these parameters should be optimized.
The HGG dataset applied here consists of 50 samples with 12,625 genes, including 14 classic glioblastomas, 14 non-classic glioblastomas, 7 classic anaplastic oligodendrogliomas and 15 non-classic anaplastic oligodendrogliomas. The HGG dataset is accessible from http://www.broadinstitute.org. In this dataset, the number of patterns is much smaller than the number of features in each sample, which may make the data difficult for classification methods to classify.
In the lung cancer dataset, there are 181 tissue samples in two
classes: 31 points are malignant pleural mesothelioma and 150 points
are adenocarcinoma. Each sample is described by 12,533 genes. This
data set is also accessible from http://datam.i2r.a-star.edu.sg/datasets/.
Other datasets, applied here, are colon and breast cancer datasets that
are accessible from http://genomics-pubs.princeton.edu/oncology/
affydata/index.html and http://datam.i2r.a-star.edu.sg/datasets/krbd/
BreastCancer/BreastCancer.html, respectively. The colon dataset includes 62 tissue samples with 2000 genes and the breast cancer dataset consists of 97 samples and 24,481 genes.
Here and prior to entering comparative numerical studies, let
us analyze the computational complexity of the proposed BEL.
Regarding the learning step, the algorithm adjusts O(2n) weights for each pattern-target sample, where n is the number of input attributes (for example, n = 12,625 for the HGG database). Let us compare this computational complexity with traditional neural networks and a supervised orthogonal discriminant projection classifier (SODP; [80]) applied in cancer detection. As mentioned above, the computational complexity of the proposed classifier is O(n). In contrast, the computational time is O(cn) for a neural network and O(n²) for SODP. In the NN architecture, c is the number of hidden neurons (generally c = 10), and SODP uses a Lagrangian multiplier that imposes a complexity of O(n²). So the proposed method has a lower computational complexity. This improved computing effi-
fication and cancer detection. The key to the proposed method is the
fast processing resulting from low computational complexity that
makes it suitable for cancer detection.
Another important point observed across the experimental implementations is that the results of the proposed model can change with the initial lr and k values; lr indicates the learning rate and k specifies the number of initial principal components in the algorithm. In other words, the values of lr and k should be optimized for each problem. Here, the optimum values 0.001, 0.000001, 0.001, 0.00001 and 0.000001 are assigned to lr, and 100 to k, for SRBCT, HGG, lung, colon and breast cancers respectively. The values assigned to lr are chosen from 0.1, 0.001, 0.0001, 0.00001… 0.0000000001 and, in the case of k, from the values 10, 50 and 100, through implementation and observation.
The proposed method is compared with the results of the
methods which have been reported by Zhang and Zhang [80].
They have reported the results based on the 5-fold cross validation
method. This implementation can result in the assessment of
accuracy and repeatability and it can be used to validate the
proposed method [46]. The compared methods include supervised
locally linear embedding (SLLE), probability-based locally linear
embedding (PLLE), locally linear discriminant embedding (LLDE),
constrained maximum variance mapping (CMVU), orthogonal
discriminant projection (ODP) and supervised orthogonal discri-
minant projection (SODP).
These methods are extended manifold approaches that have been
successfully used in tumor classification. SLLE, PLLE and LLDE are
extended versions of the locally linear embedding (LLE) that is a
classical manifold method. SODP is an extended version of ODP and
CMVU is a linear approximation of multi-manifolds learning method.
Figs. 4–8 show the comparative results based on average
accuracy of 5-fold cross validation. As illustrated in the figures,
the proposed model shows consistent results and provides higher
performance in SRBCT, HGG and Lung cancer (Figs. 4–6). Table 1
presents the percentage improvement of PCA–BEL with respect to
the best compared method reported by [80]. The best method in
SRBCT and HDD detection is a supervised orthogonal discriminant
projection (SODP) algorithm with 96.56% and 73.74% average
accuracy while in lung cancer classification the best method is a
locally linear discriminant embedding (LLDE) with average accu-
racy 93.18%. The proposed method improves these results.
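The 5-fold cross-validation protocol used for these comparisons can be sketched generically; `train_fn` and `predict_fn` below are placeholders for any classifier pipeline (such as a PCA-BEL model) and all names are ours, not the paper's:

```python
import numpy as np

def five_fold_accuracy(X, y, train_fn, predict_fn, seed=0):
    """Average accuracy over 5 shuffled folds: each fold serves once as
    the test set while the remaining four are used for training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, 5)
    accs = []
    for i in range(5):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(5) if j != i])
        model = train_fn(X[train], y[train])
        accs.append(np.mean(predict_fn(model, X[test]) == y[test]))
    return float(np.mean(accs))
```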
Fig. 3. The testing step of the proposed method for class diagnosis of an input tissue.
Fig. 4. The accuracy comparison between various methods and proposed PCA–BEL in SRBCTs classification problem.
Fig. 5. The accuracy comparison between various methods and proposed PCA–BEL
in HGG classification problem.
Fig. 6. The accuracy comparison between various methods and proposed PCA–BEL
in the lung cancer classification problem.
Fig. 7. The accuracy comparison between various methods reported by Zhang and
Zhang [80] and proposed PCA–BEL in the colon classification problem.
It seems that SRBCT and lung cancer are rather simple challenges for the classifiers in terms of complexity, since the best compared classifiers, i.e. SODP and LLDE (Table 1 and Figs. 4 and 6), have been able to exhibit a detection precision of 96.56% and 93.18%. The proposed model improves these numbers by 3.56% and 5.52%, turning the accuracy into 100% and 98.32% for SRBCT and lung cancer, respectively (Table 1).
At any rate, the detection precision of the proposed model is very significant for HGG. It seems that this dataset is too complex for the other classifiers, because the best detection precision achieved for HGG is 73.74%, using the SODP method (refer to Table 1 and Fig. 5). The proposed PCA–BEL achieves a 30.18% improvement, which results in a 96% precision rate. However, the results for colon and breast cancers obtained from PCA–BEL are 87.40% and 88% accuracy, which do not show any significant improvement over the existing methods (Figs. 7 and 8). The percentage improvement of the proposed PCA–BEL is summarized in Table 1 and calculated by the following formula:

Percentage improvement = 100 × (proposed method result − compared result) / (compared result)    (10)
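Eq. (10) is simple enough to check against Table 1 with a one-line helper (the function name is ours):

```python
def percentage_improvement(proposed, compared):
    # Eq. (10): relative gain of the proposed method, in percent.
    return 100.0 * (proposed - compared) / compared
```

For example, SRBCT's 100% accuracy against SODP's 96.56% gives approximately 3.56%, matching the first column of Table 1.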
As illustrated in Table 1, the average accuracy of SRBCT, HGG
and lung cancer classification are 100%, 96% and 98.32% respec-
tively obtained from proposed PCA–BEL. Table 2 shows the
statistical details of the improved results. The confidence level (confiLevel) in Table 2 is based on the Student's t-test with 95% confidence.
Finally Fig. 9 shows the averaged confusion matrix including
accuracy, precision and recall of improved results obtained from
proposed PCA–BEL in 5-fold. In Fig. 9a, the class numbers 1, 2,
3 and 4 belong to EWS, RMS, BL and NB respectively. In the
experimental results, 10,000 cycles is considered as the maximum
number of learning cycles in every run. However this parameter
can change for different problems. The maximum number of
cycles for the model is 220 in order to reach convergence and
100% accuracy while there is a need for more than 8000 or even
the whole 10,000 cycles to reach convergence in some folds of
HGG and lung cancer datasets. This parameter should preferably
have the maximum value and considering the low calculation
complexity of the method, increasing the number of learning
cycles even to 100,000 will result in an acceptable calculation
time in modern computers.
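The accuracy, precision and recall values summarized in Fig. 9 follow the standard confusion-matrix definitions. As an illustrative sketch (the 4-class matrix below is hypothetical and stands in for the EWS/RMS/BL/NB classes; it is not the paper's actual figures):

```python
import numpy as np

# Hypothetical 4-class confusion matrix (rows = true class, cols = predicted);
# classes 1-4 stand in for EWS, RMS, BL and NB. NOT the paper's reported matrix.
cm = np.array([
    [10, 0, 0, 0],
    [0,  9, 1, 0],
    [0,  0, 8, 0],
    [0,  1, 0, 7],
])

accuracy = np.trace(cm) / cm.sum()        # overall fraction of correct predictions
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column-wise)
recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row-wise)

print(f"accuracy = {accuracy:.3f}")
print("precision per class:", np.round(precision, 3))
print("recall per class:", np.round(recall, 3))
```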
4. Conclusions
In this paper, a novel gene-expression microarray classification
method is proposed based on PCA and BEL network. In contrast to
the many other classifiers, the proposed method shows lower
computational complexity. Thus BEL can be considered as an
alternative approach to overcome the curse of dimensionality
Table 1
Percentage improvement in classification of the small round blue cell tumor (SRBCT), high grade gliomas (HGG) and lung cancer datasets, obtained from the proposed method. The compared methods are the supervised orthogonal discriminant projection classifier (SODP) and locally linear discriminant embedding (LLDE), which are the best of the compared methods (Figs. 4–6).
Problem SRBCT HGG Lung cancer
Compared method SODP SODP LLDE
Detection accuracy of compared method 96.56% 73.74% 93.18%
Detection accuracy of our PCA–BEL method 100% 96% 98.32%
Percentage improvement 3.56% 30.18% 5.52%
Table 2
The statistical results of the proposed PCA–BEL on the three improved problems: the small round blue cell tumor (SRBCT), high grade gliomas (HGG) and lung cancer datasets. The rows F#1–F#5 show the detection accuracy of the individual folds, and the remaining rows present statistical information including the maximum, mean, standard deviation (STD) of the results, and the confidence level (ConfiLevel) based on the Student's t-test with 95% confidence.
Fold number SRBCT (%) HGG (%) Lung cancer (%)
F#1 100.00 100.00 100.00
F#2 100.00 80.00 94.40
F#3 100.00 100.00 97.22
F#4 100.00 100.00 100.00
F#5 100.00 100.00 100.00
Max 100.00 100.00 100.00
Average 100.00 96.00 98.32
STD 0.00 8.94 2.50
ConfiLevel 0.00 11.10 3.10
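The STD and ConfiLevel rows of Table 2 can be reproduced from the per-fold accuracies. A short sketch using the Python standard library, with the tabulated two-sided t critical value t(0.975, df=4) ≈ 2.776 for 5 folds:

```python
import math
import statistics

# Per-fold accuracies from Table 2.
folds = {
    "SRBCT":       [100.00, 100.00, 100.00, 100.00, 100.00],
    "HGG":         [100.00, 80.00, 100.00, 100.00, 100.00],
    "Lung cancer": [100.00, 94.40, 97.22, 100.00, 100.00],
}

T_CRIT = 2.776  # two-sided Student's t critical value for 95% confidence, df = 4

for name, accs in folds.items():
    mean = statistics.mean(accs)
    std = statistics.stdev(accs)                  # sample standard deviation
    confi = T_CRIT * std / math.sqrt(len(accs))   # half-width of the 95% CI
    print(f"{name}: mean={mean:.2f} STD={std:.2f} ConfiLevel={confi:.2f}")
# Reproduces the Average, STD and ConfiLevel rows of Table 2 up to rounding.
```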
Fig. 8. The accuracy comparison between various methods reported by Zhang and
Zhang [80] and proposed PCA–BEL in the breast cancer classification problem.
Fig. 9. The averaged confusion matrix of improved problems including (a) SRBCT, (b) HGG and (c) lung cancer datasets.
E. Lotfi, A. Keshavarz / Computers in Biology and Medicine 54 (2014) 180–187 185
problem. The proposed model is accessible from http://www.bitools.ir/projects.html and is utilized for classification tasks on the SRBCT, HGG, lung, colon and breast cancer datasets. According to the experimental results, the proposed method is more accurate than traditional methods on the SRBCT, HGG and lung datasets. PCA–BEL improves the detection accuracy by about 3.56%, 30.18% and 5.52% on SRBCT, HGG and lung cancer, respectively. The results indicate the superiority of the approach in terms of higher accuracy and lower computational complexity. Hence, it is expected that the proposed approach can be generally applicable to high dimensional feature vector classification problems.
However, the proposed approach has a drawback. Like many other methods that use PCA, it does not extract only the informative genes. As mentioned in Section 1, PCA is a feature extraction method and cannot select features. For future improvements, the informative genes should be determined; to this end, the proposed method should apply a feature selection step. This issue can be considered as the next step of this research effort, i.e. a proper feature selection method should be found to replace the PCA step of the proposed method. Furthermore, in order for the proposed method to provide a proper response in other cancer classification problems, the lr and k parameters should be optimized specifically for each problem. This issue can also be considered in future work, on other datasets such as prostate cancer.
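Since the distinction between feature extraction and feature selection is central to this limitation, a minimal NumPy sketch of the PCA front end may be helpful: the projection mixes all genes into each component rather than selecting a gene subset. The sample count, gene count and number of retained components below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))  # 60 samples x 2000 genes (synthetic stand-in)

def pca_transform(X, n_components):
    """Project samples onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                  # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Each component is a weighted mix of ALL genes: extraction, not selection.
    return Xc @ Vt[:n_components].T

Z = pca_transform(X, n_components=10)
print(Z.shape)  # (60, 10): reduced-dimension inputs for the BEL classifier
```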
Conflict of interest
There is no conflict of interest.
References
[1] M. Alshalalfa, G. Naji, A. Qabaja, R. Alhajj, Combining multiple perspective as
intelligent agents into robust approach for biomarker detection in gene
expression data, Int. J. Data Min. Bioinform. 5 (3) (2011) 332–350.
[2] P. Baldi, A.D. Long, A Bayesian framework for the analysis of microarray
expression data: regularized t-test and statistical inferences of gene changes,
Bioinformatics 17 (6) (2001) 509–519.
[3] C. Balkenius, J. Morén, Emotional learning: a computational model of amyg-
dala, Cybern. Syst. 32 (6) (2001) 611–636.
[4] R. Cai, Z. Zhang, Z. Hao, Causal gene identification using combinatorial
V-structure search, Neural Netw. 43 (2013) 63–71.
[5] A.H. Chen, C.H. Lin, A novel support vector sampling technique to improve
classification accuracy and to identify key genes of leukaemia and prostate
cancers, Expert Syst. Appl. 38 (4) (2011) 3209–3219.
[6] J.H. Chiang, S.H. Ho, A combination of rough-based feature selection and RBF
neural network for classification using gene expression data, NanoBiosci. IEEE
Trans. 7 (1) (2008) 91–99.
[7] W.K. Ching, L. Li, N.K. Tsing, C.W. Tai, T.W. Ng, A. Wong, K.W. Cheng, A
weighted local least squares imputation method for missing value estimation
in microarray gene expression data, Int. J. Data Min. Bioinform. 4 (3) (2010)
331–347.
[8] D. Chung, H. Kim, Robust classification ensemble method for microarray data,
Int. J. Data Min. Bioinform. 5 (5) (2011) 504–518.
[9] Y.R. Cho, A. Zhang, X. Xu, Semantic similarity based feature extraction from
microarray expression data, Int. J. Data Min. Bioinform. 3 (3) (2009) 333–345.
[10] J. Dai, Q. Xu, Attribute selection based on information gain ratio in fuzzy rough
set theory with application to tumor classification, Appl. Soft Comput. 13 (1)
(2013) 211–221.
[11] D. Dembele, P. Kastner, Fuzzy C-means method for clustering microarray data,
Bioinformatics 19 (8) (2003) 973–980.
[12] Z. Deng, K.S. Choi, F.L. Chung, S. Wang, EEW-SC: Enhanced Entropy-Weighting
Subspace Clustering for high dimensional gene expression data clustering
analysis, Appl. Soft Comput. 11 (8) (2011) 4798–4806.
[13] M. Dhawan, S. Selvaraja, Z.H. Duan, Application of committee kNN classifiers
for gene expression profile classification, Int. J. Bioinform. Res. Appl. 6 (4)
(2010) 344–352.
[14] C. Ding, H. Peng, Minimum redundancy feature selection from microarray
gene expression data, J. Bioinform. Comput. Biol. 3 (02) (2005) 185–205.
[15] J.P. Fadok, M. Darvas, T.M. Dickerson, R.D. Palmiter, Long-term memory for
pavlovian fear conditioning requires dopamine in the nucleus accumbens and
basolateral amygdala, PloS One 5 (9) (2010) e12751.
[16] F. Fernández-Navarro, C. Hervás-Martínez, R. Ruiz, J.C. Riquelme, Evolutionary
generalized radial basis function neural networks for improving prediction
accuracy in gene classification using feature selection, Appl. Soft Comput. 12
(6) (2012) 1787–1800.
[17] R. Gallassi, L. Sambati, R. Poda, M.S. Maserati, F. Oppi, M. Giulioni, P. Tinuper,
Accelerated long-term forgetting in temporal lobe epilepsy: evidence of
improvement after left temporal pole lobectomy, Epilepsy Behav. 22 (4)
(2011) 793–795.
[18] J.M. García-Gómez, J. Gómez-Sanchs, P. Escandell-Montero, E. Fuster-Garcia,
E. Soria-Olivas, Sparse Manifold Clustering and Embedding to discriminate
gene expression profiles of glioblastoma and meningioma tumors, Comput.
Biol. Med. 43 (11) (2013) 1863–1869.
[19] S. Ghorai, A. Mukherjee, P.K. Dutta, Gene expression data classification by
VVRKFA, Procedia Technol. 4 (2012) 330–335.
[20] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov,
E.S. Lander, Molecular classification of cancer: class discovery and class
prediction by gene expression monitoring, Science 286 (5439) (1999)
531–537.
[21] E.M. Griggs, E.J. Young, G. Rumbaugh, C.A. Miller, MicroRNA-182 regulates
amygdala-dependent memory formation, J. Neurosci. 33 (4) (2013) 1734–1740.
[22] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification
using support vector machines, Mach. Learn. 46 (1) (2002) 389–422.
[23] C. Gillies, N. Patel, J. Akervall, G. Wilson, Gene expression classification using
binary rule majority voting genetic programming classifier, Int. J. Adv. Intell.
Paradig. 4 (3) (2012) 241–255.
[24] O. Hardt, K. Nader, L. Nadel, Decay happens: the role of active forgetting in
memory, Trends Cogn. Sci. 17 (3) (2013) 111–120.
[25] H. Hong, Q. Hong, J. Liu, W. Tong, L. Shi, Estimating relative noise to signal in
DNA microarray data, Int. J. Bioinform. Res. Appl. 9 (5) (2013) 433–448.
[26] D.S. Huang, C.H. Zheng, Independent component analysis-based penalized
discriminant method for tumor classification using gene expression data,
Bioinformatics 22 (15) (2006) 1855–1862.
[27] N. Iam-On, T. Boongoen, S. Garrett, C. Price, New cluster ensemble approach to
integrative biological data analysis, Int. J. Data Min. Bioinform. 8 (2) (2013)
150–168.
[28] A. Jose, D. Mugler, Z.H. Duan, A gene selection method for classifying cancer
samples using 1D discrete wavelet transform, Int. J. Comput. Biol. Drug Des. 2
(4) (2009) 398–411.
[29] J. Khan, J.S. Wei, M. Ringner, L.H. Saal, M. Ladanyi, F. Westermann, P.S. Meltzer,
Classification and diagnostic prediction of cancers using gene expression
profiling and artificial neural networks, Nat. Med. 7 (6) (2001) 673–679.
[30] M. Khashei, A. Zeinal Hamadani, M. Bijari, A fuzzy intelligent approach to the
classification problem in gene expression data analysis, Knowl.-Based Syst. 27
(2012) 465–474.
[31] J.H. Kim, S. Li, A.S. Hamlin, G.P. McNally, R. Richardson, Phosphorylation of
mitogen-activated protein kinase in the medial prefrontal cortex and the
amygdala following memory retrieval or forgetting in developing rats,
Neurobiol. Learn. Mem. 97 (1) (2011) 59–68.
[32] Y.K. Lam, P.W. Tsang, eXploratory K-Means: a new simple and efficient
algorithm for gene clustering, Appl. Soft Comput. 12 (3) (2012) 1149–1157.
[33] R. Lamprecht, S. Hazvi, Y. Dudai, cAMP response element-binding protein in
the amygdala is required for long-but not short-term conditioned taste
aversion memory, J. Neurosci. 17 (21) (1997) 8443–8450.
[34] C.P. Lee, Y. Leu, A novel hybrid feature selection method for microarray data
analysis, Appl. Soft Comput. 11 (1) (2011) 208–213.
[35] H. Liu, J. Li, L. Wong, A comparative study on feature selection and classifica-
tion methods using gene expression profiles and proteomic patterns, Genome
Inform. Ser. 13 (2002) 51–60.
[36] B. Liu, Q. Cui, T. Jiang, S. Ma, A combinational feature selection and ensemble
neural network method for classification of gene expression data, BMC
Bioinform. 5 (1) (2004) 136.
[37] Y. Liu, Wavelet feature extraction for high-dimensional microarray data,
Neurocomputing 72 (4) (2009) 985–990.
[38] E. Lotfi, M.R. Akbarzadeh-T, Supervised brain emotional learning. IEEE Inter-
national Joint Conference on Neural Networks (IJCNN), 2012, pp. 1–6, http://
dx.doi.org/10.1109/IJCNN.2012.6252391.
[39] E. Lotfi, M.R. Akbarzadeh-T, Brain Emotional Learning-Based Pattern Recognizer, Cybern. Syst. 44 (5) (2013) 402–421.
[40] E. Lotfi, M.R. Akbarzadeh-T, Emotional brain-inspired adaptive fuzzy decayed learning for online prediction problems, in: 2013 IEEE International Conference on Fuzzy Systems (FUZZ), IEEE, 2013, July, pp. 1–7.
[41] E. Lotfi, M.R. Akbarzadeh-T, Adaptive brain emotional decayed learning for
online prediction of geomagnetic activity indices, Neurocomputing 126 (2014)
188–196.
[42] E. Lotfi, M.R. Akbarzadeh-T, Practical emotional neural networks, Neural
Networks 59 (2014) 61–72. http://dx.doi.org/10.1016/j.neunet.2014.06.012.
[43] E. Lotfi, S. Setayeshi, S. Taimory, A neural basis computational model of
emotional brain for online visual object recognition, Appl. Artif. Intell. 28
(2014) 1–21. http://dx.doi.org/10.1080/08839514.2014.952924.
[44] Z. Liu, D. Chen, Y. Xu, J. Liu, Logistic support vector machines and their
application to gene expression data, Int. J. Bioinform. Res. Appl. 1 (2) (2005)
169–182.
[45] C. Lucas, D. Shahmirzadi, N. Sheikholeslami, Introducing BELBIC: brain emo-
tional learning based intelligent controller, Int. J. Intell. Autom. Soft Comput.
10 (2004) 11–21.
[46] M. Meselhy Eltoukhy, I. Faye, B. Belhaouari Samir, A statistical based feature
extraction method for breast cancer diagnosis in digital mammogram using
multiresolution representation, Comput. Biol. Med. 42 (1) (2012) 123–128.
[47] V.S. Tseng, H.H. Yu, Microarray data classification by multi-information based
gene scoring integrated with Gene Ontology, Int. J. Data Min. Bioinform. 5 (4)
(2011) 402–416.
[48] M. Xiong, L. Jin, W. Li, E. Boerwinkle, Computational methods for gene
expression-based tumor classification, Biotechniques 29 (6) (2000) 1264–1271.
[50] M. Reboiro-Jato, D. Glez-Peña, F. Díaz, F. Fdez-Riverola, A novel ensemble approach for multicategory classification of DNA microarray data using biological relevant gene sets, Int. J. Data Min. Bioinform. 6 (6) (2012) 602–616.
[51] L. Nanni, A. Lumini, Ensemblator: an ensemble of classifiers for reliable
classification of biological data, Pattern Recognit. Lett. 28 (5) (2007) 622–630.
[53] T. Prasartvit, A. Banharnsakun, B. Kaewkamnerdpong, T. Achalakul, Reducing
bioinformatics data dimension with ABC-kNN, Neurocomputing 116 (2013)
367–381. http://dx.doi.org/10.1016/j.neucom.2012.01.045.
[54] M. Perez, D.M. Rubin, L.E. Scott, T. Marwala, W. Stevens, A hybrid fuzzy-svm
classifier, applied to gene expression profiling for automated leukaemia
diagnosis, in: IEEE 25th Convention of Electrical and Electronics Engineers
in Israel, 2008, IEEEI 2008, IEEE, 2008, December, pp. 041–045.
[55] Y. Peng, A novel ensemble machine learning for robust microarray data
classification, Comput. Biol. Med. 36 (6) (2006) 553–573.
[56] L.P. Petalidis, A. Oulas, M. Backlund, M.T. Wayland, L. Liu, K. Plant, V.P. Collins,
Improved grading and survival prediction of human astrocytic brain tumors
by artificial neural network analysis of gene expression microarray data, Mol.
Cancer Ther. 7 (5) (2008) 1013–1024.
[57] L.E. Peterson, M. Ozen, H. Erdem, A. Amini, L. Gomez, C.C. Nelson, M. Ittmann,
Artificial neural network analysis of DNA microarray-based prostate cancer
recurrence, in: Proceedings of the 2005 IEEE Symposium on Computational
Intelligence in Bioinformatics and Computational Biology, 2005, CIBCB'05,
IEEE, 2005, November, pp. 1–8.
[58] L.E. Peterson, M.A. Coleman, Machine learning-based receiver operating
characteristic (ROC) curves for crisp and fuzzy classification of DNA micro-
arrays in cancer research, Int. J. Approx. Reason. 47 (1) (2008) 17–36.
[59] I. Porto-Díaz, V. Bolón-Canedo, A. Alonso-Betanzos, O. Fontenla-Romero, A
study of performance on microarray data sets for a classifier based on
information theoretic learning, Neural Netw. 24 (8) (2011) 888–896.
[60] S. Saha, A. Ekbal, K. Gupta, S. Bandyopadhyay, Gene expression data clustering
using a multiobjective symmetry based clustering technique, Comput. Biol.
Med. 43 (11) (2013) 1965–1977.
[61] A. Statnikov, C.F. Aliferis, I. Tsamardinos, D. Hardin, S. Levy, A comprehensive
evaluation of multicategory classification methods for microarray gene
expression cancer diagnosis, Bioinformatics 21 (5) (2005) 631–643.
[62] X. Sun, Y. Liu, M. Xu, H. Chen, J. Han, K. Wang, Feature selection using dynamic
weights for classification, Knowl.-Based Syst. 37 (2013) 541–549. http://dx.doi.
org/10.1016/j.knosys.2012.10.001.
[63] M. Song, S. Rajasekaran, A greedy algorithm for gene selection based on SVM
and correlation, Int. J. Bioinform. Res. Appl. 6 (3) (2010) 296–307.
[64] T.Z. Tan, C. Quek, G.S. Ng, Ovarian cancer diagnosis by hippocampus and
neocortex-inspired learning memory structures, Neural Netw. 18 (5) (2005)
818–825.
[65] T.Z. Tan, C. Quek, G.S. Ng, E.Y.K. Ng, A novel cognitive interpretation of breast
cancer thermography with complementary learning fuzzy neural memory
structure, Expert Syst. Appl. 33 (3) (2007) 652–666.
[66] T.Z. Tan, G.S. Ng, C. Quek, Complementary learning fuzzy neural network: an
approach to imbalanced dataset, in: International Joint Conference on Neural
Networks, 2007, IJCNN 2007, IEEE, pp. 2306-2311, 2007.
[67] T.Z. Tan, C. Quek, G.S. Ng, K. Razvi, Ovarian cancer diagnosis with comple-
mentary learning fuzzy neural network, Artif. Intell. Med. 43 (3) (2008)
207–222.
[68] M. Takahashi, H. Hayashi, Y. Watanabe, K. Sawamura, N. Fukui, J. Watanabe,
T. Someya, Diagnostic classification of schizophrenia by neural network
analysis of blood-based gene expression signatures, Schizophr. Res. 119 (1)
(2010) 210–218.
[69] D.L. Tong, A.C. Schierz, Hybrid genetic algorithm-neural network: feature
extraction for unpreprocessed microarray data, Artif. Intell. Med. 53 (1) (2011)
47–56.
[70] M. Tong, K.H. Liu, C. Xu, W. Ju, An ensemble of SVM classifiers based on gene
pairs, Comput. Biol. Med. 43 (6) (2013) 729–737.
[71] M.H. Tseng, H.C. Liao, The genetic algorithm for breast tumor diagnosis – the
case of DNA viruses, Appl. Soft Comput. 9 (2) (2009) 703–710.
[72] P. Vadakkepat, L.A. Poh, Fuzzy-rough discriminative feature selection and
classification algorithm, with application to microarray and image datasets,
Appl. Soft Comput. 11 (4) (2011) 3429–3440.
[73] V. Vinaya, N. Bulsara, C.J. Gadgil, M. Gadgil, Comparison of feature selection
and classification combinations for cancer classification using microarray data,
Int. J. Bioinform. Res. Appl. 5 (4) (2009) 417–431.
[74] S.L. Wang, X. Li, S. Zhang, J. Gui, D.S. Huang, Tumor classification by combining
PNN classifier ensemble with neighborhood rough set based gene reduction,
Comput. Biol. Med. 40 (2) (2010) 179–189.
[75] Y.F. Wang, Z.G. Yu, V. Anh, Fuzzy C–means method with empirical mode
decomposition for clustering microarray data, Int. J. Data Min. Bioinform. 7 (2)
(2013) 103–117.
[76] A. Yardimci, Soft computing in medicine, Appl. Soft Comput. 9 (3) (2009)
1029–1043.
[77] S.H. Yeh, C.H. Lin, P.W. Gean, Acetylation of nuclear factor-κB in rat amygdala
improves long-term but not short-term retention of fear memory, Mol.
Pharmacol. 65 (5) (2004) 1286–1292.
[78] K.Y. Yeung, W.L. Ruzzo, Principal component analysis for clustering gene
expression data, Bioinformatics 17 (9) (2001) 763–774.
[79] Y. Zhang, J. Xuan, R. Clarke, H.W. Ressom, Module-based breast cancer
classification, Int. J. Data Min. Bioinform. 7 (3) (2013) 284–302.
[80] C. Zhang, S. Zhang, A supervised orthogonal discriminant projection for tumor
classification using gene expression data, Comput. Biol. Med. 43 (5) (2013)
568–575. http://dx.doi.org/10.1016/j.compbiomed.2013.01.019.
[81] Z. Zainuddin, P. Ong, Reliable multiclass cancer classification of microarray
gene expression profiles using an improved wavelet neural network, Expert
Syst. Appl. 38 (11) (2011) 13711–13722.
[82] X.L. Xia, K. Li, G.W. Irwin, Two-stage gene selection for support vector machine
classification of microarray data, Int. J. Model. Identif. Control 8 (2) (2009)
164–171.
[83] M. Xiong, X. Fang, J. Zhao, Biomarker identification by feature wrappers,
Genome Res. 11 (11) (2001) 1878–1887.
[84] H. Xiong, S. Shekhar, P.N. Tan, V. Kumar, Exploiting a support-based upper
bound of Pearson's correlation coefficient for efficiently identifying strongly
correlated pairs, in: Proceedings of the Tenth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, ACM, 2004, August,
pp. 334–343.