As the size of the biomedical databases are growing day by day, finding an essential features in the disease prediction have become more complex due to high dimensionality and sparsity problems. Also, due to the
availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to analyze, predict and interpret the feature information using the traditional feature selection based classification models. Most of the traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. Ensemble classifier is one of the scalable models for extreme learning machine due to its high efficiency, the fast processing speed for real-time applications. The main objective of the feature selection
based ensemble learning models is to classify the high dimensional data with high computational efficiency
and high true positive rate on high dimensional datasets. In this proposed model an optimized Particle swarm optimization (PSO) based Ensemble classification model was developed on high dimensional microarray
datasets. Experimental results proved that the proposed model has high computational efficiency compared to the traditional feature selection based classification models in terms of accuracy , true positive rate and error rate are concerned.
Delineation of techniques to implement on the enhanced proposed model using d...ijdms
In post genomic era with the advent of new technologies a huge amount of complex molecular data are
generated with high throughput. The management of this biological data is definitely a challenging task
due to complexity and heterogeneity of data for discovering new knowledge. Issues like managing noisy
and incomplete data are needed to be dealt with. Use of data mining in biological domain has made its
inventory success. Discovering new knowledge from the biological data is a major challenge in data
mining technique. The novelty of the proposed model is its combined use of intelligent techniques to classify
the protein sequence faster and efficiently. Use of FFT, fuzzy classifier, String weighted algorithm, gram
encoding method, neural network model and rough set classifier in a single model and in an appropriate
place can enhance the quality of the classification system .Thus the primary challenge is to identify and
classify the large protein sequences in a very fast and easy but intellectual way to decrease the time
complexity and space complexity.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
In the present study, the abilities of three classification methods of data mining namely artificial
neural networks with feed-forward back propagation algorithm, J48 decision tree method and
logistic regression analysis are compared in a medical real dataset. The prediction of
malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of
the correct predictions (the minimum error rate), the amount of time consuming in the
modelling process and the interpretability and simplicity of the results for clinical experts are
the factors considered to choose the best method
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...Devansh16
YouTube video: https://www.youtube.com/watch?v=Ao-19L0sLOI
SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation
Vajira Thambawita, Pegah Salehi, Sajad Amouei Sheshkal, Steven A. Hicks, Hugo L.Hammer, Sravanthi Parasa, Thomas de Lange, Pål Halvorsen, Michael A. Riegler
Processing medical data to find abnormalities is a time-consuming and costly task, requiring tremendous efforts from medical experts. Therefore, Ai has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. AI tools highly depend on data for training the models. However, there are several constraints to access to large amounts of medical data to train machine learning algorithms in the medical domain, e.g., due to privacy concerns and the costly, time-consuming medical data annotation process. To address this, in this paper we present a novel synthetic data generation pipeline called SinGAN-Seg to produce synthetic medical data with the corresponding annotated ground truth masks. We show that these synthetic data generation pipelines can be used as an alternative to bypass privacy concerns and as an alternative way to produce artificial segmentation datasets with corresponding ground truth masks to avoid the tedious medical data annotation process. As a proof of concept, we used an open polyp segmentation dataset. By training UNet++ using both the real polyp segmentation dataset and the corresponding synthetic dataset generated from the SinGAN-Seg pipeline, we show that the synthetic data can achieve a very close performance to the real data when the real segmentation datasets are large enough. In addition, we show that synthetic data generated from the SinGAN-Seg pipeline improving the performance of segmentation algorithms when the training dataset is very small. Since our SinGAN-Seg pipeline is applicable for any medical dataset, this pipeline can be used with any other segmentation datasets.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2107.00471 [eess.IV]
(or arXiv:2107.00471v1 [eess.IV] for this version)
Reach out to me:
Check out my other articles on Medium. : https://machine-learning-made-simple....
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn: https://www.linkedin.com/in/devansh-d...
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
My Substack: https://devanshacc.substack.com/
Live conversations at twitch here: https://rb.gy/zlhk9y
Get a free stock on Robinhood: https://join.robinhood.com/fnud75
Classification of data is a data mining technique based on machine learning is used to classification of each item set in as a set of dataset into a set of predefined labelled as classes or groups. Classification is tasks for different application such as text classification, image classification, class’s predictions, data Classification etc. In this paper, we presenting the major classification techniques used for prediction of classes using supervised learning dataset. Several major types of classification method including Random Forest, Naive Bayes, Support Vector Machine (SVM) techniques. The goal of this review paper is to provide a review, accuracy and comparative between different classification techniques in data mining.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Delineation of techniques to implement on the enhanced proposed model using d...ijdms
In post genomic era with the advent of new technologies a huge amount of complex molecular data are
generated with high throughput. The management of this biological data is definitely a challenging task
due to complexity and heterogeneity of data for discovering new knowledge. Issues like managing noisy
and incomplete data are needed to be dealt with. Use of data mining in biological domain has made its
inventory success. Discovering new knowledge from the biological data is a major challenge in data
mining technique. The novelty of the proposed model is its combined use of intelligent techniques to classify
the protein sequence faster and efficiently. Use of FFT, fuzzy classifier, String weighted algorithm, gram
encoding method, neural network model and rough set classifier in a single model and in an appropriate
place can enhance the quality of the classification system .Thus the primary challenge is to identify and
classify the large protein sequences in a very fast and easy but intellectual way to decrease the time
complexity and space complexity.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
In the present study, the abilities of three classification methods of data mining namely artificial
neural networks with feed-forward back propagation algorithm, J48 decision tree method and
logistic regression analysis are compared in a medical real dataset. The prediction of
malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of
the correct predictions (the minimum error rate), the amount of time consuming in the
modelling process and the interpretability and simplicity of the results for clinical experts are
the factors considered to choose the best method
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...Devansh16
YouTube video: https://www.youtube.com/watch?v=Ao-19L0sLOI
SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation
Vajira Thambawita, Pegah Salehi, Sajad Amouei Sheshkal, Steven A. Hicks, Hugo L.Hammer, Sravanthi Parasa, Thomas de Lange, Pål Halvorsen, Michael A. Riegler
Processing medical data to find abnormalities is a time-consuming and costly task, requiring tremendous efforts from medical experts. Therefore, Ai has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. AI tools highly depend on data for training the models. However, there are several constraints to access to large amounts of medical data to train machine learning algorithms in the medical domain, e.g., due to privacy concerns and the costly, time-consuming medical data annotation process. To address this, in this paper we present a novel synthetic data generation pipeline called SinGAN-Seg to produce synthetic medical data with the corresponding annotated ground truth masks. We show that these synthetic data generation pipelines can be used as an alternative to bypass privacy concerns and as an alternative way to produce artificial segmentation datasets with corresponding ground truth masks to avoid the tedious medical data annotation process. As a proof of concept, we used an open polyp segmentation dataset. By training UNet++ using both the real polyp segmentation dataset and the corresponding synthetic dataset generated from the SinGAN-Seg pipeline, we show that the synthetic data can achieve a very close performance to the real data when the real segmentation datasets are large enough. In addition, we show that synthetic data generated from the SinGAN-Seg pipeline improving the performance of segmentation algorithms when the training dataset is very small. Since our SinGAN-Seg pipeline is applicable for any medical dataset, this pipeline can be used with any other segmentation datasets.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2107.00471 [eess.IV]
(or arXiv:2107.00471v1 [eess.IV] for this version)
Reach out to me:
Check out my other articles on Medium. : https://machine-learning-made-simple....
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn: https://www.linkedin.com/in/devansh-d...
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
My Substack: https://devanshacc.substack.com/
Live conversations at twitch here: https://rb.gy/zlhk9y
Get a free stock on Robinhood: https://join.robinhood.com/fnud75
Classification of data is a data mining technique based on machine learning is used to classification of each item set in as a set of dataset into a set of predefined labelled as classes or groups. Classification is tasks for different application such as text classification, image classification, class’s predictions, data Classification etc. In this paper, we presenting the major classification techniques used for prediction of classes using supervised learning dataset. Several major types of classification method including Random Forest, Naive Bayes, Support Vector Machine (SVM) techniques. The goal of this review paper is to provide a review, accuracy and comparative between different classification techniques in data mining.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
ENHANCED SYSTEM FOR COMPUTER-AIDED DETECTION OF MRI BRAIN TUMORSsipij
The brain images are indicating what condition the brain has. The objective of this research is to design a software that will automatically classifies the brain images to their associated disorders. In order to achieve the objective of this research, a database for training and testing the software of brain images must to be found. In this research we have 105 number of images in data set. In order to differentiate between the classes of those brain images, features had to be extracted from the images. Then, images will be classified into two classes normal and abnormal by using SVM and KNN classifier. The features that were extracted were used in the classification process. The classifiers performed really well, whereas the SVM classifier performed better since its accuracy is 100% on testing set. In the end, the software was successful in separating the two classes.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
Abstract- The goal of this paper is to compare between different classifiers or multi-classifiers fusion with respect to accuracy in discovering breast cancer for four different data sets. We present an implementation among various classification techniques which represent the most known algorithms in this field on four different datasets of breast cancer two for diagnosis and two for prognosis. We present a fusion between classifiers to get the best multi-classifier fusion approach to each data set individually. By using confusion matrix to get classification accuracy which built in 10-fold cross validation technique. Also, using fusion majority voting (the mode of the classifier output). The experimental results show that no classification technique is better than the other if used for all datasets, since the classification task is affected by the type of dataset. By using multi-classifiers fusion the results show that accuracy improved in three datasets out of four.
When deep learners change their mind learning dynamics for active learningDevansh16
Abstract:
Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
A BINARY BAT INSPIRED ALGORITHM FOR THE CLASSIFICATION OF BREAST CANCER DATAIJSCAI Journal
Advancement in information and technology has made a major impact on medical science where the
researchers come up with new ideas for improving the classification rate of various diseases. Breast cancer
is one such disease killing large number of people around the world. Diagnosing the disease at its earliest
instance makes a huge impact on its treatment. The authors propose a Binary Bat Algorithm (BBA) based
Feedforward Neural Network (FNN) hybrid model, where the advantages of BBA and efficiency of FNN is
exploited for the classification of three benchmark breast cancer datasets into malignant and benign cases.
Here BBA is used to generate a V-shaped hyperbolic tangent function for training the network and a fitness
function is used for error minimization. FNNBBA based classification produces 92.61% accuracy for
training data and 89.95% for testing data.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the features numbers. Secondly, we apply simple random sampling technique to dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate of 95% with 24 features by 10-fold CV method.
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
Feature extraction is a method of capturing visual content of an image. The feature extraction is the process to represent raw image in its reduced form to facilitate decision making such as pattern classification. We have tried to address the problem of classification MRI brain images by creating a robust and more accurate classifier which can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. This approach combines the Intensity, Texture, shape based features and classifies the tumor as white matter, Gray matter, CSF, abnormal and normal area. The experiment is performed on 140 tumor contained brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been carried out over a larger database as compare to any previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets. The Support Vector Machine (SVM) classifier served as a comparison of nonlinear techniques Vs linear ones. PCA and LDA methods are used to reduce the number of features used. The feature selection using the proposed technique is more beneficial as it analyses the data according to grouping class variable and gives reduced feature set with high classification accuracy.
Classification of medical datasets using back propagation neural network powe...IJECEIAES
The classification is a one of the most indispensable domains in the data mining and machine learning. The classification process has a good reputation in the area of diseases diagnosis by computer systems where the progress in smart technologies of computer can be invested in diagnosing various diseases based on data of real patients documented in databases. The paper introduced a methodology for diagnosing a set of diseases including two types of cancer (breast cancer and lung), two datasets for diabetes and heart attack. Back Propagation Neural Network plays the role of classifier. The performance of neural net is enhanced by using the genetic algorithm which provides the classifier with the optimal features to raise the classification rate to the highest possible. The system showed high efficiency in dealing with databases differs from each other in size, number of features and nature of the data and this is what the results illustrated, where the ratio of the classification reached to 100% in most datasets).
A chi-square-SVM based pedagogical rule extraction method for microarray data...IJAAS Team
Support Vector Machine (SVM) is currently an efficient classification technique due to its ability to capture nonlinearities in diagnostic systems, but it does not reveal the knowledge learnt during training. It is important to understand of how a decision is reached in the machine learning technology, such as bioinformatics. On the other hand, a decision tree has good comprehensibility; the process of converting such incomprehensible models into an understandable model is often regarded as rule extraction. In this paper we proposed an approach for extracting rules from SVM for microarray dataset by combining the merits of both the SVM and decision tree. The proposed approach consists of three steps; the SVM-CHI-SQUARE is employed to reduce the feature set. Dataset with reduced features is used to obtain SVM model and synthetic data is generated. Classification and Regression Tree (CART) is used to generate Rules as the Last phase. We use breast masses dataset from UCI repository where comprehensibility is a key requirement. From the result of the experiment as the reduced feature dataset is used, the proposed approach extracts smaller length rules, thereby improving the comprehensibility of the system. We obtained accuracy of 93.53%, sensitivity of 89.58%, specificity of 96.70%, and training time of 3.195 seconds. A comparative analysis is carried out done with other algorithms.
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
بعض (وليس الكل) ملخصات الأبحاث الجيدة المنشورة فى بعض المجلات الجيدة وفيها تنوع من الافكار الابحاث الابتكارية التى يخدم فيها علوم الحاسبات فيها - انها تطبيقات حياتية
An Artificial Neural Network Model for Neonatal Disease DiagnosisWaqas Tariq
The significance of disease diagnosis by artificial intelligence is not obscure now days. The increasing demand of Artificial Neural Network application for predicting the disease shows better performance in the field of medical decision making. This paper represents the use of artificial neural networks in predicting neonatal disease diagnosis. The proposed technique involves training a Multi Layer Perceptron with a BP learning algorithm to recognize a pattern for the diagnosing and prediction of neonatal diseases. A comparative study of using different training algorithm of MLP, Quick Propagation, Conjugate Gradient Descent, shows the higher prediction accuracy. The Backpropogation algorithm was used to train the ANN architecture and the same has been tested for the various categories of neonatal disease. About 94 cases of different sign and symptoms parameter have been tested in this model. This study exhibits ANN based prediction of neonatal disease and improves the diagnosis accuracy of 75% with higher stability. Key words: Artificial Intelligence, Multi Layer Perceptron, Neural Network, Neonate
An approach for breast cancer diagnosis classification using neural networkacijjournal
Artificial neural network has been widely used in various fields as an intelligent tool in recent years, such
as artificial intelligence, pattern recognition, medical diagnosis, machine learning and so on. The
classification of breast cancer is a medical application that poses a great challenge for researchers and
scientists. Recently, the neural network has become a popular tool in the classification of cancer datasets.
Classification is one of the most active research and application areas of neural networks. Major
disadvantages of artificial neural network (ANN) classifier are due to its sluggish convergence and always
being trapped at the local minima. To overcome this problem, differential evolution algorithm (DE) has
been used to determine optimal value or near optimal value for ANN parameters. DE has been applied
successfully to improve ANN learning from previous studies. However, there are still some issues on DE
approach such as longer training time and lower classification accuracy. To overcome these problems,
island based model has been proposed in this system. The aim of our study is to propose an approach for
breast cancer distinguishing between different classes of breast cancer. This approach is based on the
Wisconsin Diagnostic and Prognostic Breast Cancer and the classification of different types of breast
cancer datasets. The proposed system implements the island-based training method to be better accuracy
and less training time by using and analysing between two different migration topologies
Accounting for variance in machine learning benchmarksDevansh16
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameters choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice impact markedly the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator approaches better the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
An efficient convolutional neural network-based classifier for an imbalanced ...IAESIJAI
Imbalanced datasets pose a major challenge for the researchers while addressing machine learning tasks. In these types of datasets, samples of different classes are not in equal proportion rather the gap between the numbers of individual class samples is significantly large. Classification models perform better for datasets having equal proportion of data tuples in both the classes. But, in reality, the medical image datasets are skewed and hence are not always suitable for a model to achieve improved classification performance. Therefore, various techniques have been suggested in the literature to overcome this challenge. This paper applies oversampling technique on an imbalanced dataset and focuses on a customized convolutional neural network model that classifies the images into two categories: diseased and non-diseased. Outcome of the proposed model can assist the health experts in the detection of oral cancer. The proposed model exhibits 99% accuracy after data augmentation. Performance metrics such as precision, recall and F1-score values are very close to 1. In addition, statistical test is performed to validate the statistical significance of the model. It has been found that the proposed model is an optimised classifier in terms of number of network layers and number of neurons.
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
Abstract
Mostly 5 to 15% of the women in the stage of reproduction face the disease called Polycystic Ovarian Syndrome (PCOS) which is the multifaceted, heterogeneous and complex. The long term consequences diseases like endometrial hyperplasia, type 2 diabetes mellitus and coronary disease are caused by the polycystic ovaries, chronic anovulation and hyperandrogenism are characterized with the resistance of insulin and the hypertension, abdominal obesity and dyslipidemia and hyperinsulinemia are called as Metabolic syndrome (frequent metabolic traits) The above cause the common disease called Anovulatory infertility. Computer based information along with advanced Data mining techniques are used for appropriate results. Classification is a classic data mining task, with roots in machine learning. Naïve Bayesian, Artificial Neural Network, Decision Tree, Support Vector Machines are the classification tasks in the data mining. Feature selection methods involve generation of the subset, evaluation of each subset, criteria for stopping the search and validation procedures. The characteristics of the search method used are important with respect to the time efficiency of the feature selection methods. PCA (Principle Component Analysis), Information gain Subset Evaluation, Fuzzy rough set evaluation, Correlation based Feature Selection (CFS) are some of the feature selection techniques, greedy first search, ranker etc are the search algorithms that are used in the feature selection. In this paper, a new algorithm which is based on Fuzzy neural subset evaluation and artificial neural network is proposed which reduces the task of classification and feature selection separately. This algorithm combines the neural fuzzy rough subset evaluation and artificial neural network together for the better performance than doing the tasks separately.
Keywords: ANN, SVM, PCA, CFS
Artificial neural networks (ANN) consider classification as one of the most dynamic research and
application areas. ANN is the branch of Artificial Intelligence (AI). The neural network was trained by
back propagation algorithm. The different combinations of functions and its effect while using ANN as a
classifier is studied and the correctness of these functions are analyzed for various kinds of datasets. The
back propagation neural network (BPNN) can be used as a highly successful tool for dataset classification
with suitable combination of training, learning and transfer functions. When the maximum likelihood
method was compared with backpropagation neural network method, the BPNN was more accurate than
maximum likelihood method. A high predictive ability with stable and well functioning BPNN is possible.
Multilayer feed-forward neural network algorithm is also used for classification. However BPNN proves to
be more effective than other classification algorithms.
In this research, a hybrid wrapper model is proposed to identify the featured gene subset from the gene expression data. To balance the gap between exploration
and exploitation, a hybrid model with a popular meta-heuristic algorithm named
spider monkey optimizer (SMO) and simulated annealing (SA) is applied. In the proposed model, ReliefF is used as a filter to obtain the relevant gene subset
from dataset by removing the noise and outliers prior to feeding the data to the
wrapper SMO. To enhance the quality of the solution, simulated annealing is
deployed as local search with the SMO in the second phase, which will guide to the detection of the most optimal feature subset. To evaluate the performance of the proposed model, support vector machine (SVM) as a fitness function to recognize the most informative biomarker gene from the cancer datasets along with University of California, Irvine (UCI) datasets. To further evaluate the model, 4 different classifiers (SVM, na¨ıve Bayes (NB), decision tree (DT), and k-nearest neighbors (KNN)) are used. From the experimental results and analysis, it’s noteworthy to accept that the ReliefF-SMO-SA-SVM performs relatively better than its state-of-the-art counterparts. For cancer datasets, our model performs better in terms of accuracy with a maximum of 99.45%.
ENHANCED SYSTEM FOR COMPUTER-AIDED DETECTION OF MRI BRAIN TUMORSsipij
The brain images are indicating what condition the brain has. The objective of this research is to design a software that will automatically classifies the brain images to their associated disorders. In order to achieve the objective of this research, a database for training and testing the software of brain images must to be found. In this research we have 105 number of images in data set. In order to differentiate between the classes of those brain images, features had to be extracted from the images. Then, images will be classified into two classes normal and abnormal by using SVM and KNN classifier. The features that were extracted were used in the classification process. The classifiers performed really well, whereas the SVM classifier performed better since its accuracy is 100% on testing set. In the end, the software was successful in separating the two classes.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
Abstract- The goal of this paper is to compare between different classifiers or multi-classifiers fusion with respect to accuracy in discovering breast cancer for four different data sets. We present an implementation among various classification techniques which represent the most known algorithms in this field on four different datasets of breast cancer two for diagnosis and two for prognosis. We present a fusion between classifiers to get the best multi-classifier fusion approach to each data set individually. By using confusion matrix to get classification accuracy which built in 10-fold cross validation technique. Also, using fusion majority voting (the mode of the classifier output). The experimental results show that no classification technique is better than the other if used for all datasets, since the classification task is affected by the type of dataset. By using multi-classifiers fusion the results show that accuracy improved in three datasets out of four.
When deep learners change their mind learning dynamics for active learningDevansh16
Abstract:
Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
A BINARY BAT INSPIRED ALGORITHM FOR THE CLASSIFICATION OF BREAST CANCER DATAIJSCAI Journal
Advancement in information and technology has made a major impact on medical science where the
researchers come up with new ideas for improving the classification rate of various diseases. Breast cancer
is one such disease killing large number of people around the world. Diagnosing the disease at its earliest
instance makes a huge impact on its treatment. The authors propose a Binary Bat Algorithm (BBA) based
Feedforward Neural Network (FNN) hybrid model, where the advantages of BBA and efficiency of FNN is
exploited for the classification of three benchmark breast cancer datasets into malignant and benign cases.
Here BBA is used to generate a V-shaped hyperbolic tangent function for training the network and a fitness
function is used for error minimization. FNNBBA based classification produces 92.61% accuracy for
training data and 89.95% for testing data.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the features numbers. Secondly, we apply simple random sampling technique to dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate of 95% with 24 features by 10-fold CV method.
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
Feature extraction is a method of capturing visual content of an image. The feature extraction is the process to represent raw image in its reduced form to facilitate decision making such as pattern classification. We have tried to address the problem of classification MRI brain images by creating a robust and more accurate classifier which can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. This approach combines the Intensity, Texture, shape based features and classifies the tumor as white matter, Gray matter, CSF, abnormal and normal area. The experiment is performed on 140 tumor contained brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been carried out over a larger database as compare to any previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets. The Support Vector Machine (SVM) classifier served as a comparison of nonlinear techniques Vs linear ones. PCA and LDA methods are used to reduce the number of features used. The feature selection using the proposed technique is more beneficial as it analyses the data according to grouping class variable and gives reduced feature set with high classification accuracy.
Classification of medical datasets using back propagation neural network powe...IJECEIAES
The classification is a one of the most indispensable domains in the data mining and machine learning. The classification process has a good reputation in the area of diseases diagnosis by computer systems where the progress in smart technologies of computer can be invested in diagnosing various diseases based on data of real patients documented in databases. The paper introduced a methodology for diagnosing a set of diseases including two types of cancer (breast cancer and lung), two datasets for diabetes and heart attack. Back Propagation Neural Network plays the role of classifier. The performance of neural net is enhanced by using the genetic algorithm which provides the classifier with the optimal features to raise the classification rate to the highest possible. The system showed high efficiency in dealing with databases differs from each other in size, number of features and nature of the data and this is what the results illustrated, where the ratio of the classification reached to 100% in most datasets).
A chi-square-SVM based pedagogical rule extraction method for microarray data...IJAAS Team
Support Vector Machine (SVM) is currently an efficient classification technique due to its ability to capture nonlinearities in diagnostic systems, but it does not reveal the knowledge learnt during training. It is important to understand of how a decision is reached in the machine learning technology, such as bioinformatics. On the other hand, a decision tree has good comprehensibility; the process of converting such incomprehensible models into an understandable model is often regarded as rule extraction. In this paper we proposed an approach for extracting rules from SVM for microarray dataset by combining the merits of both the SVM and decision tree. The proposed approach consists of three steps; the SVM-CHI-SQUARE is employed to reduce the feature set. Dataset with reduced features is used to obtain SVM model and synthetic data is generated. Classification and Regression Tree (CART) is used to generate Rules as the Last phase. We use breast masses dataset from UCI repository where comprehensibility is a key requirement. From the result of the experiment as the reduced feature dataset is used, the proposed approach extracts smaller length rules, thereby improving the comprehensibility of the system. We obtained accuracy of 93.53%, sensitivity of 89.58%, specificity of 96.70%, and training time of 3.195 seconds. A comparative analysis is carried out done with other algorithms.
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
بعض (وليس الكل) ملخصات الأبحاث الجيدة المنشورة فى بعض المجلات الجيدة وفيها تنوع من الافكار الابحاث الابتكارية التى يخدم فيها علوم الحاسبات فيها - انها تطبيقات حياتية
An Artificial Neural Network Model for Neonatal Disease DiagnosisWaqas Tariq
The significance of disease diagnosis by artificial intelligence is not obscure now days. The increasing demand of Artificial Neural Network application for predicting the disease shows better performance in the field of medical decision making. This paper represents the use of artificial neural networks in predicting neonatal disease diagnosis. The proposed technique involves training a Multi Layer Perceptron with a BP learning algorithm to recognize a pattern for the diagnosing and prediction of neonatal diseases. A comparative study of using different training algorithm of MLP, Quick Propagation, Conjugate Gradient Descent, shows the higher prediction accuracy. The Backpropogation algorithm was used to train the ANN architecture and the same has been tested for the various categories of neonatal disease. About 94 cases of different sign and symptoms parameter have been tested in this model. This study exhibits ANN based prediction of neonatal disease and improves the diagnosis accuracy of 75% with higher stability. Key words: Artificial Intelligence, Multi Layer Perceptron, Neural Network, Neonate
An approach for breast cancer diagnosis classification using neural networkacijjournal
Artificial neural network has been widely used in various fields as an intelligent tool in recent years, such
as artificial intelligence, pattern recognition, medical diagnosis, machine learning and so on. The
classification of breast cancer is a medical application that poses a great challenge for researchers and
scientists. Recently, the neural network has become a popular tool in the classification of cancer datasets.
Classification is one of the most active research and application areas of neural networks. Major
disadvantages of artificial neural network (ANN) classifier are due to its sluggish convergence and always
being trapped at the local minima. To overcome this problem, differential evolution algorithm (DE) has
been used to determine optimal value or near optimal value for ANN parameters. DE has been applied
successfully to improve ANN learning from previous studies. However, there are still some issues on DE
approach such as longer training time and lower classification accuracy. To overcome these problems,
island based model has been proposed in this system. The aim of our study is to propose an approach for
breast cancer distinguishing between different classes of breast cancer. This approach is based on the
Wisconsin Diagnostic and Prognostic Breast Cancer and the classification of different types of breast
cancer datasets. The proposed system implements the island-based training method to be better accuracy
and less training time by using and analysing between two different migration topologies
Accounting for variance in machine learning benchmarksDevansh16
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameters choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice impact markedly the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator approaches better the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
An efficient convolutional neural network-based classifier for an imbalanced ...IAESIJAI
Imbalanced datasets pose a major challenge for the researchers while addressing machine learning tasks. In these types of datasets, samples of different classes are not in equal proportion rather the gap between the numbers of individual class samples is significantly large. Classification models perform better for datasets having equal proportion of data tuples in both the classes. But, in reality, the medical image datasets are skewed and hence are not always suitable for a model to achieve improved classification performance. Therefore, various techniques have been suggested in the literature to overcome this challenge. This paper applies oversampling technique on an imbalanced dataset and focuses on a customized convolutional neural network model that classifies the images into two categories: diseased and non-diseased. Outcome of the proposed model can assist the health experts in the detection of oral cancer. The proposed model exhibits 99% accuracy after data augmentation. Performance metrics such as precision, recall and F1-score values are very close to 1. In addition, statistical test is performed to validate the statistical significance of the model. It has been found that the proposed model is an optimised classifier in terms of number of network layers and number of neurons.
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
Abstract
Mostly 5 to 15% of the women in the stage of reproduction face the disease called Polycystic Ovarian Syndrome (PCOS) which is the multifaceted, heterogeneous and complex. The long term consequences diseases like endometrial hyperplasia, type 2 diabetes mellitus and coronary disease are caused by the polycystic ovaries, chronic anovulation and hyperandrogenism are characterized with the resistance of insulin and the hypertension, abdominal obesity and dyslipidemia and hyperinsulinemia are called as Metabolic syndrome (frequent metabolic traits) The above cause the common disease called Anovulatory infertility. Computer based information along with advanced Data mining techniques are used for appropriate results. Classification is a classic data mining task, with roots in machine learning. Naïve Bayesian, Artificial Neural Network, Decision Tree, Support Vector Machines are the classification tasks in the data mining. Feature selection methods involve generation of the subset, evaluation of each subset, criteria for stopping the search and validation procedures. The characteristics of the search method used are important with respect to the time efficiency of the feature selection methods. PCA (Principle Component Analysis), Information gain Subset Evaluation, Fuzzy rough set evaluation, Correlation based Feature Selection (CFS) are some of the feature selection techniques, greedy first search, ranker etc are the search algorithms that are used in the feature selection. In this paper, a new algorithm which is based on Fuzzy neural subset evaluation and artificial neural network is proposed which reduces the task of classification and feature selection separately. This algorithm combines the neural fuzzy rough subset evaluation and artificial neural network together for the better performance than doing the tasks separately.
Keywords: ANN, SVM, PCA, CFS
Artificial neural networks (ANN) consider classification as one of the most dynamic research and
application areas. ANN is the branch of Artificial Intelligence (AI). The neural network was trained by
back propagation algorithm. The different combinations of functions and its effect while using ANN as a
classifier is studied and the correctness of these functions are analyzed for various kinds of datasets. The
back propagation neural network (BPNN) can be used as a highly successful tool for dataset classification
with suitable combination of training, learning and transfer functions. When the maximum likelihood
method was compared with backpropagation neural network method, the BPNN was more accurate than
maximum likelihood method. A high predictive ability with stable and well functioning BPNN is possible.
Multilayer feed-forward neural network algorithm is also used for classification. However BPNN proves to
be more effective than other classification algorithms.
In this research, a hybrid wrapper model is proposed to identify the featured gene subset from the gene expression data. To balance the gap between exploration
and exploitation, a hybrid model with a popular meta-heuristic algorithm named
spider monkey optimizer (SMO) and simulated annealing (SA) is applied. In the proposed model, ReliefF is used as a filter to obtain the relevant gene subset
from dataset by removing the noise and outliers prior to feeding the data to the
wrapper SMO. To enhance the quality of the solution, simulated annealing is
deployed as local search with the SMO in the second phase, which will guide to the detection of the most optimal feature subset. To evaluate the performance of the proposed model, support vector machine (SVM) as a fitness function to recognize the most informative biomarker gene from the cancer datasets along with University of California, Irvine (UCI) datasets. To further evaluate the model, 4 different classifiers (SVM, na¨ıve Bayes (NB), decision tree (DT), and k-nearest neighbors (KNN)) are used. From the experimental results and analysis, it’s noteworthy to accept that the ReliefF-SMO-SA-SVM performs relatively better than its state-of-the-art counterparts. For cancer datasets, our model performs better in terms of accuracy with a maximum of 99.45%.
Iganfis Data Mining Approach for Forecasting Cancer Threatsijsrd.com
Healthcare facilities have at their disposal vast amounts of cancer patients’ data. Medical practitioners require more efficient techniques to extract relevant knowledge from this data for accurate decision-making. However the challenge is how to extract and act upon it in a timely manner. If well engineered, the huge data can aid in developing expert systems for decision support that can assist physicians in diagnosing and predicting some debilitating life threatening diseases such as cancer. Expert systems for decision support can reduce the cost, the waiting time, and liberate medical practitioners for more research, as well as reduce errors and mistakes that can be made by humans due to fatigue and tiredness. The process of utilizing health data effectively however, involves many challenges such as the problem of missing feature values, the curse of dimensionality due to a large number of attributes, and the course of actions to determine the features that can lead to more accurate diagnosis. Effective data mining tools can assist in early detection of diseases such as cancer. In This paper, we propose a new approach called IGANFIS. This approach optimally minimizes the number of features using the information gain (IG) algorithm which is usually used in text categorization to select the quality of text. The IG will be used for selecting the quality of cancer features by virtue of reducing them in number. The reduced number quality features dataset will then be applied to the Adaptive Neuro Fuzzy Inference System (ANFIS) to train and test the proposed approach. ANFIS method of training is ideally the hybrid learning algorithm which uses the gradient descent method and Least Square Estimate (LSE) for computing the error measure for each training pair. Each cycle of the ANFIS hybrid learning consists of a forward pass to present the input vector calculating the node outputs layer by layer repeating the process for all data and a backward pass using the steepest descent algorithm to update parameters, a process called back propagation.
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly.
For more information:
http://societyofdatascientists.com/controlling-informative-features-for-improved-accuracy-and-faster-predictions-in-omentum-cancer-models/?src=slideshare
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
The goal of this paper is to compare between different classifiers or multi-classifiers fusion with respect to accuracy in discovering breast cancer for four different data sets. We present an implementation among various classification techniques which represent the most known algorithms in this field on four different datasets of breast cancer two for diagnosis and two for prognosis. We present a fusion between classifiers to get the best multi-classifier fusion approach to each data set individually. By using confusion matrix to get classification accuracy which built in 10-fold cross validation technique. Also, using fusion majority voting (the mode of the classifier output). The experimental results show that no classification technique is better than the other if used for all datasets, since the classification task is affected by the type of dataset. By using multi-classifiers fusion the results show that accuracy improved in three datasets out of four.
Predictive Data Mining with Normalized Adaptive Training Method for Neural Ne...IJERDJOURNAL
Abstract:- Predictive data mining is an upcoming and fast-growing field and offers a competitive edge for the benefit of organization. In recent decades, researchers have developed new techniques and intelligent algorithms for predictive data mining. In this research paper, we have proposed a novel training algorithm for optimizing neural networks for prediction purpose and to utilize it for the development of prediction models. Models developed in MATLAB Neural Network Toolbox have been tested for insurance datasets taken from a live data warehouse. A comparative study of the proposed algorithm with other popular first and second order algorithms has been presented to judge the predictive accuracy of the suggested technique. Various graphs have been presented to analyse the convergence behaviour of different algorithms towards point of minimum error.
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
Large amount of data have been stored and manipulated using various database
technologies. Processing all the attributes for the particular means is the difficult task. To avoid such
difficulties, feature selection process is processed.In this paper,we are collect a eight various benchmark
datasets from UCI repository.Feature selection process is carried out using fuzzy entropy based
relevance measure algorithm and follows three selection strategies like Mean selection strategy,Half
selection strategy and Neural network for threshold selection strategy. After the features are selected,
they are evaluated using Radial Basis Function (RBF) network,Stacking,Bagging,AdaBoostM1 and Antminer
classification methodologies.The test results depicts that Neural network for threshold selection
strategy works well in selecting features and Ant-miner methodology works best in bringing out better
accuracy with selected feature than processing with original dataset.The obtained result of this
experiment shows that clearly the Ant-miner is superiority than other classifiers.Thus, this proposed Antminer
algorithm could be a more suitable method for producing good results with fewer features than
the original datasets.
Unsupervised Feature Selection Based on the Distribution of Features Attribut...Waqas Tariq
Since dealing with high dimensional data is computationally complex and sometimes even intractable, recently several feature reductions methods have been developed to reduce the dimensionality of the data in order to simplify the calculation analysis in various applications such as text categorization, signal processing, image retrieval, gene expressions and etc. Among feature reduction techniques, feature selection is one the most popular methods due to the preservation of the original features. However, most of the current feature selection methods do not have a good performance when fed on imbalanced data sets which are pervasive in real world applications. In this paper, we propose a new unsupervised feature selection method attributed to imbalanced data sets, which will remove redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on the several imbalanced data sets, derived from UCI repository database, illustrate the effectiveness of our proposed methods in comparison with the other compared methods in terms of both accuracy and the number of selected features.
A class skew-insensitive ACO-based decision tree algorithm for imbalanced dat...nooriasukmaningtyas
Ant-tree-miner (ATM) has an advantage over the conventional decision tree algorithm in terms of feature selection. However, real world applications commonly involved imbalanced class problem where the classes have different importance. This condition impeded the entropy-based heuristic of existing ATM algorithm to develop effective decision boundaries due to its biasness towards the dominant class. Consequently, the induced decision trees are dominated by the majority class which lack in predictive ability on the rare class. This study proposed an enhanced algorithm called hellingerant-tree-miner (HATM) which is inspired by ant colony optimization (ACO) metaheuristic for imbalanced learning using decision tree classification algorithm. The proposed algorithm was compared to the existing algorithm, ATM in nine (9) publicly available imbalanced data sets. Simulation study reveals the superiority of HATM when the sample size increases with skewed class (Imbalanced Ratio < 50%). Experimental results demonstrate the performance of the existing algorithm measured by BACC has been improved due to the class skew-insensitiveness of hellinger distance. The statistical significance test shows that HATM has higher mean BACC score than ATM.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
Classification Of Iris Plant Using Feedforward Neural Networkirjes
The classification and recognition of type on the basis of individual features and behaviors constitute
a preliminary measure and is an important target in the behavioral sciences. Current statistical methods do not
always yield satisfactory answers. A Feed Forward Artificial Neural Network is the computer model inspired by
the structure of the Human Brain. It views as in the set of artificial nerve cells that are interconnected with the
other neurons. The primary aim of this paper is to demonstrate the process of developing the Artificial Neural
network based classifier which classifies the Iris database. The problem concerns the identification of Iris plant
species on the basis of plant attribute measurements. This paper is related to the use of feed forward neural
networks towards the identification of iris plants on the basis of the following measurements: sepal length, sepal
width, petal length, and petal width. Using this data set a Neural Network (NN) is used for the classification of
iris data set. The EBPA is used for training of this ANN. The results of simulations illustrate the effectiveness of
the neural system in iris class identification.
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
The development of microarray technology has suppli
ed a large volume of data to many fields. The gene
microarray analysis and classification have demonst
rated an effective way for the effective diagnosis
of
diseases and cancers. In as much as the data achiev
ing from microarray technology is very noisy and al
so
has thousands of features, feature selection plays
an important role in removing irrelevant and redund
ant
features and also reducing computational complexity
. There are two important approaches for gene
selection in microarray data analysis, the filters
and the wrappers. To select a concise subset of inf
ormative
genes, we introduce a hybrid feature selection whic
h combines two approaches. The fact of the matter i
s
that candidate’s features are first selected from t
he original set via several effective filters. The
candidate
feature set is further refined by more accurate wra
ppers. Thus, we can take advantage of both the filt
ers
and wrappers. Experimental results based on 11 micr
oarray datasets show that our mechanism can be
effected with a smaller feature set. Moreover, thes
e feature subsets can be obtained in a reasonable t
ime
Similar to AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATASETS (20)
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATASETS
1. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
DOI:10.5121/ijsc.2017.8401 1
AN EFFICIENT PSO BASED ENSEMBLE
CLASSIFICATION MODEL ON HIGH DIMENSIONAL
DATASETS
G. Lalitha Kumari1
and N. Naga Malleswara Rao2
1
Research Scholar, Acharya Nagarjuna University, Guntur, AP, India
2
Professor, Dept. of IT, RVR & JC College of Engineering, Guntur, AP, India
Abstract
As the size of the biomedical databases are growing day by day, finding an essential features in the disease
prediction have become more complex due to high dimensionality and sparsity problems. Also, due to the
availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to
analyze, predict and interpret the feature information using the traditional feature selection based
classification models. Most of the traditional feature selection based classification algorithms have
computational issues such as dimension reduction, uncertainty and class imbalance on microarray
datasets. Ensemble classifier is one of the scalable models for extreme learning machine due to its high
efficiency, the fast processing speed for real-time applications. The main objective of the feature selection
based ensemble learning models is to classify the high dimensional data with high computational efficiency
and high true positive rate on high dimensional datasets. In this proposed model an optimized Particle
swarm optimization (PSO) based Ensemble classification model was developed on high dimensional micro-
array datasets. Experimental results proved that the proposed model has high computational efficiency
compared to the traditional feature selection based classification models in terms of accuracy , true
positive rate and error rate are concerned.
Keywords
PSO, Neural network, Ensemble classification, High dimension dataset.
1. INTRODUCTION
Feature selection is an important and essential technique used in data filtering for improving
machine learning models on large databases. Different feature selection models have been
implemented in the literature, each of them uses a feature evaluation measure to extract feature
subsets on the same databases. Learning classification models with all the high dimensional
features may result serious issues such as performance and scalability. Feature selection is the
process of selecting subset of features so that the original feature space is reduced using the
selection measures. Feature selection measures can be categorized into three types: wrappers,
filters and embedded models.
Extreme learning has wide range of applications in different domains like face recognition,
human action recognition, landmark recognition and protein sequence classification, medical
disease prediction [1]. Extreme Learning Machine can be defined as a single-hidden layer feed-
forward neural network (SLFN) with learning model [2-4]. The traditional optimization
approaches like gradient descent based back-propagation [5] evaluate weights and biases. The
proposed technique is responsible for decreasing the training time effectively through random
assignment of weights and biases. The above extended EL method results better efficiency and
performance as compared to all traditional approaches. But, EL has two major issues, those are:-
2. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
2
1)This model has over fitting problem and the performance cannot be predicted for unknown
datasets. 2) This model is not applicable to binary classification and uncertain datasets.
Feed-forward neural network can be considered as most commonly and widely implemented
neural network. The method has one or more hidden layer(s) along with an output layer. The
output layer is responsible for transmitting final response on the training dataset [6]. A large
number of research works have been implemented in the field of Feed-Forward Neural Networks
since years. This model has complex linear or nonlinear structure directly mapping from inputs.
These structures are not appropriate for classical parametric constraints to manage large inputs in
the traditional models. Another important feature of feed-forward neural network is the inter-
dependency among the layers through parameter mapping. Single-Hidden-Layer Feed-Forward
Networks (SLFNs) are treated as the most efficient and widely used feed-forward neural networks
on small datasets.
In order to resolve the issues of traditional EL, weighted-EL approach is developed subsequently.
The weights are increased gradually with respect to time in case of large sample size. In most of
the feed forward ANN techniques, parameters of each and every layer is required to be tuned
through several learning approaches. Gradient Descent-Based Approaches and Back-Propagation
(BP) techniques are some important learning algorithms in feed forward neural networks [7]. The
speed of learning models is very slow in case of feed forward neural networks compared to ANN.
Because of better generalization capability and fast computational speed, EL approach is named
as 'Extreme Learning Machine' (EL). A lot of problems are detected in case of conventional
Gradient-Descent Algorithms like stopping criterion, learning rate, number of epochs and local
minima [9].
The process of feature extraction has high significance in the field of classification. All features
are divided into two groups, those are:-
1) According to the first group, features extraction using noisy attributes and contextual
information.
2) The second group contains correlated features.
Traditional feature extraction models discard noisy features in order to decrease the high
dimensional features to a lower dimensional feature. Let us assume Nmin and Nmax are
minimum and maximum numbers of hidden neurons respectively, where N denotes the present
value of hidden neurons. For each and every N, the average accuracy rate of EL through 10-fold
cross-validation scheme is evaluated. At last, hidden neurons having maximum average accuracy
is chosen as optimal. After selecting the optimal numbers of hidden neurons, the EL classifier is
implemented in order to evaluate the classification accuracy by considering the outcomes of
principle component analysis(PCA) and the outcomes are averaged later.
The training datasets used in this paper have a significant issue for any classification models as
they have large number of feature space, ranging from 100 to 12000 features. The larger the
feature space increases the search space and computational memory for disease prediction.
Another crucial issue for handling the high dimensional features is the small sample size problem.
The accuracy of the model employed will be reduced if the size of the training data is not
sufficient relative to the feature space.
In the past, machine learning models used a single classification model to predict the test data
using the training samples. However, multiple classifiers can be used to predict the same test data
using the training samples. This process is known as ensemble learning. Ensemble classification
3. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
3
has been successfully applied to different classification problems to improve the classification
accuracy using the optimal feature selection measures.
Particle swarm optimization (PSO) is very popular optimization techniques in machine learning
models. PSO is generally applied in the literature to adjust the initialization parameters of base
classifiers in the ensemble learning models. The main objective of this paper is to optimize the
traditional PSO parameters in the ensemble classification model in order to improve the accuracy
and error rate. In the ensemble model, neural network is used as one of the base classifier and
weights are initialized using the proposed PSO technique.
The main contribution to this paper includes:
• A novel PSO based ensemble model is usually designed and implemented to improve the
overall classification true positive rate on high dimensional feature selection.
• Ensemble learning model is constructed from a group of base classifiers to predict the
high dimensional class labels. Here, search space of the traditional PSO model is
optimized using the proposed measures such as optimized fitness measure and chaos
gauss based randomization measure.
2. RELATED WORKS
Medical data prediction is the most important and complicated requirement in recent era. Many
approaches are developed in the field of medical disease prediction such as medical disease
prediction. In [9], association rule mining approach is integrated with multi-layer
perceptron(MLP) and back propagation approaches in order to predict and detect the chances of
breast cancer. Modular neural network is basically implemented to recognise and analyse cardiac
diseases using Gravitational Search Algorithm (GSA) and fuzzy logic embedded algorithm for
better performance [3]. With the exponential growth of information technologies, data mining
approaches are implemented in various domains such as biomedical applications and disease
prediction. Special emphasis can be given in not only detecting cancer, but also predicting the
disease in an early stage [6].
According to [1], gene selection technique is used to enhance the performance of the classifier
through the detection of gene expression subsets. Among most widely implemented gene
selection approaches, some are given below:- principal component analysis [4], singular value
decomposition [5], independent component analysis [6], genetic algorithm (GA) [7], and
recursive feature elimination method [8, 9].A micro-array feature selection technique is generally
applied to the pre-existing approaches for pattern classification. It not only decreases the
influence of noise, but also reduces the error rate in medical datasets [4-6].
[10-12] Implemented a PSO based spectral filtering model to high dimensional features of the
original training data. Two reconstruction methods are used, one is the principle component
analysis and the other is Maximum likelihood estimation. Several distribution algorithms were
used in randomization models. In most of these techniques Bayesian analysis is used to predict
the original data distribution using the randomization operator and the randomization data.
Feature selection is an important step in defect prediction process and has been extensively
studied in the field of machine learning. [10-13] Investigated different feature selection models
that represent ranked list of features and applied them to UCI repository datasets. They concluded
that wrapper is best model for limited data and limited feature sets. Software defect prediction
models are commonly classified in the literature as Decision tree models, Support vector machine
models, artificial neural network models and Bayesian models are summarized in the table 1.
4. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
4
Table 1: Overview of Traditional Classification Models on High dimensional Datasets
Classification Model Imbalance and
High
dimensionality
Property
Advantages Disadvantages
Decision Trees[5] Affected Robust to noisy data;
and decision rules
evaluation.
1)Prone to over-fitting.
2)Performance issue under
imbalanced property.
Artificial neural
networks
[7]
Affected Able to learn non-linear
functions. Robust
against errors.
1) Difficult to interpret
results.
2) Slow training and
prediction process.
Ensemble Models
[9-10]
Affected Best model for high
dimensional datasets
with complex feature
selection models
1)Fast processing
2)Low performance under
imbalanced data and
missing values.
Feature selection +
wrapper method[11] Highly Affected
The wrapper feature
selection approach is
useful in identifying
informative feature
subsets from high-
dimensional datasets.
Used traditional
algorithms such as
C4.5, KNN, Random
forest..
Limitations:
Accuracy is computed on
the subset of features.
Considers only less
number of features for
decision making patterns.
In the microarray datasets,
subset of features failed to
give affective decision
making patterns due to
insufficient gene patterns.
In order to handle feature
subsets (>100), single
classifier failed to perform
efficient results due to
memory constraints.
Ensemble classifier+
Biomedicine Field
affected Ensemble methods
proved to be superior to
individual classification
method for high
dimensional datasets.
The performance of the
classification need to
improve much.Features are
transformed into new ones
hence loss of originality. It
leads to overfitting of data.
Enhancing Ensemble
on Tweet Sentiment
Data
Affected Supports high
dimensionality upto
200 features..
Implemented existing base
classifiers for ensemble
model such as C4.5,
NB,SVM ,LR.Greatly
affects the Accuracy.
5. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
5
3. PROPOSED MODEL
Proposed PSO based ensemble classifier is a multi-objective technique which finds the local and
global optimum solutions by iteratively searching in a high dimensional space. Traditionally, as
the size of the training dataset is small, medical disease prediction rate could be dramatically
reduced due to class imbalance and high dimension space. In this filtering method, each attribute
is tested for missing values. Most of the traditional methods are capable of extracting missing
values from training data that store numeric attributes. Filtering makes models more accurate and
faster. If the attribute is numerical and the value is null then it is replaced with computed value of
equation (1). Also, if the attribute is nominal and has missing values then it is replaced with
computed value of equation (2). Finally, if the class is numeric then it is converted to nominal.
Proposed PSO based ensemble model is usually designed and implemented to improve the overall
classification true positive rate on high dimensional feature selection. Generally, ensemble
learning model is constructed from a group of base classifiers to predict the high dimensional
class labels. Here, search space of the traditional PSO model is optimized using the proposed
measures such as optimized fitness measure and chaos gauss based randomization measure. In
our model, different base classifiers such as decision tree, Naïve Bayes, FFNN are used to test the
performance of proposed model to the traditional models.
Proposed Improved PSO based Ensemble Classification Algorithm:
Step 1: Data pre-processing on Training high dimensional data.
Load dataset 1 2 n
HD ,H D ....H D
For each attribute A(i) in the 1 2 n
HD ,H D ....H D
do
For each instance value I(j) in the A(i)
do
if(isNum(I(j)) && I(j)==null)
then
( ) ( ) ( ) ( )
n
A( j ) A( j ) A( j )
j 1/i j
I I I
j (| I(j) )I |) / (Max Min )
= ≠
= − µ −∑ ----(1)
end if
if(isNominal(Ai) && Ai(I)==null)
then
6. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
6
Figure 1: Proposed Model
( )j Prior Pr ob(A(i),class ;I (m))= ---(2)
Here, mth class of the missing value is used to find the prior probability in place of missing value
end if
End for
Step 2: // improved PSO for optimal feature selection
a) Initializing particles with feature space, number of iterations, velocity, number of
particles etc.
b) Compute hybrid velocity and position for each particle in ‘d’ dimensions using the
following equations
7. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
7
chaos1 i chaos2 i
2
chaos1 chaos2 chaos1 chaos2 chaos1 chaos2
v(d 1,i) .[ (d,i).v(d,i). (pBest X(d,i)) (gBest X(d,i))]
X(d 1,i) X(d,i) v(d 1,i)
is the convergence factor computed as
2
| 2 ( ) ( ) 4( ) |
+ = ψ ω θ − + θ −
+ = + +
ψ
ψ =
− θ + θ − θ + θ − θ + θ
chaos1 chaos2
------(3)
where , Ortho chaos gauss randomizationθ θ ∈
In this optimized model, inertia weight is computed as
max current max max min
max
min
max
(d,i) (I / I ).( )
: max inertia
: min inertia
I : max iteration
ω = ω − ω − ω
ω
ω
Step 3: Computing fitness value using ortho chaotic gauss randomization measure.
In this proposed PSO model, a random value between 0 to 1 is selected using the following
equation as
2
X
2
X
(X )
k k
i j j
X
k
j
1
R max{ . (1 ), e } -------(4)
2
K 1,2...iterations
(0,1)
−µ
−
σ
= µβ −β
πσ
=
β ∈
Proposed feature selection fitness measure is given as
|F|
i
i 1
i 1 i 2
f
1 2 i
i
f
i
f
Fitness w .acc w .(1 )
N
where w ,w R
f is the flag value 1 or 0 . '1' represents selectd feature , '0' non-selected feature.
N represents number of features.
acc : accuracy of the selecte
=
= + −
∈
∑
d features in the ith iteration
Step 4: Apply ensemble classification model with accuracy acci on the selected features in the ith
iteration. Here, Decision trees, NB, FFNN are used as base classifiers for high dimensional
classification.
Step 5: For each particle compute its fitness value and compute classification accuracy in the
previous step.
Step 6: Update particle velocity, position, global best and particle best according to the fitness
value conditions as shown in figure 1.
Step 7: This process is continuous until max iteration is reached. Otherwise go to step 2.
8. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
8
4. EXPERIMENTAL RESULTS
In this section, we performed experiments using proposed model on well known microarray
datasets and compared the statistical performance with traditional models. Dataset [15] used for
experimental evaluation is listed in Table 2. Proposed model was implemented in the java
environment with Netbeans IDE. From these experimental results, 10% of the training instances
are used as test data for cross validation and performance analysis.
From the experimental results, it is clear that optimizing PSO in the base classifiers improves
classification rate along with true positive and true negative rate. Also, the main advantage of
using proposed model is to reduce the error rate on high dimensional features.
Table 2: Datasets and its characteristics
Micro array Datasets Attributes Size Data-Type
lung-Michigan 7000 Continuous/Numeric
lungCancer_train 12000 Continuous/Numeric
DLBCL-Stanford 4000 Continuous/Numeric
Proposed ensemble methods increase the performance of true positive rate and accuracy on entire
high dimensional datasets. Proposed model uses the entire training data set for construction of
decision patterns; therefore the prediction accuracy of each cross validation tends to be more
accurate than the traditional ensemble classification models.
True negative measure evaluates the ratio of number of instances that are not affected with the
cancer disease, which are identified correctly.
True positive measure defines the ratio of number of cancer disease instances that have been
predicted as positive rate.
Mean Absolute error Error: Average misclassification rate of each test data in the cross
validation.
Precision measure computes the ratio of correctly predicted cancer instances among the entire
disease affected instances.
Gene based cancer disease patterns using proposed feature selection based ensemble learning
model on ovarian cancer dataset.
Table 3, describes the optimization of improved PSO of individual weak classifiers using the
ensemble classifier. Here, the performance evaluation are performed in terms of statistical
analysis, true positive rate and error rate on ovarian dataset. From the table, it is clearly observed
that the proposed optimization model has high computational efficiency compared to traditional
classification models.
Table 3: Performance analyses of classification accuracy on ovarian dataset on 10 cross validation.
10 % test data Ovarian Dataset
Model
True
positive
Error
Rate
Run
time(ms)
PSO+NaiveBayes 0.87 0.2492 4526
PSO+RandomForest 0.8923 0.2293 4297
PSO+Neuralnetwork 0.9055 0.1987 3975
IPSO-Ensemble 0.9575 0.1314 3297
9. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
9
Figure 3, describes the optimization of improved PSO of individual weak classifiers using the
ensemble classifier in graphical chart. Here, the performance evaluations are performed in terms
of statistical analysis, true positive rate and error rate on ovarian dataset. From the table, it is
clearly observed that the proposed optimization model has high computational efficiency
compared to traditional classification models.
Figure 3: Performance of ovarian cancer dataset using IPSO ensemble classification model with existing
models.
Table 4 and Figure 4 describe the performance of the improved PSO algorithm with ensemble
classification to the traditional models on microarray cancer datasets. From the table, it is clearly
observed that the proposed model has high accuracy and less error rate compare to traditional
models in terms of error rate, accuracy and runtime.
Table 4: Performance analysis of proposed model to the traditional models on three microarray datasets.
Datasets PSO+NaiveBayes PSO+C4.5 PSO+FFNN IPSO+Ensemble
Lung Cancer 0.697 0.8084 0.8525 0.9374
Lung Michigan 0.7935 0.8274 0.845 0.9153
Lymphoma 0.8153 0.8253 0.8574 0.9646
ErrorRate 0.364 0.304 0.294 0.225
Runtime(ms) 7351 6253 6364 5254
0
0.2
0.4
0.6
0.8
1
1.2
PerformanceAnalysis
Models
TruePositive
Error Rate
10. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
10
0
0.2
0.4
0.6
0.8
1
1.2
Analysis
Models
Lung Cancer
Lung Michigan
Lymphoma
ErrorRate
Figure 4 : Performance of proposed model to traditional models in terms of accuracy, error rate and run-
time.
5. CONCLUSION
In this paper, an optimized PSO based feature selection method is integrated with the ensemble
model on microarray datasets. Most of the traditional feature selection based classification
algorithms have computational issues such as dimension reduction, uncertainty and class
imbalance on microarray datasets. Ensemble classifier is one of the scalable models for extreme
learning machine due to its high efficiency, the fast processing speed for real-time applications.
Experimental results proved that the proposed model has high computational efficiency compared
to the traditional feature selection based classification models in terms of accuracy, true positive
rate and error rate are concerned.
REFERENCES
[1] Y. Yan, Q. Zhu, M. Shyu, and S. Chen, “A Classifier Ensemble Framework for Multimedia Big Data
Classification”, “IEEE 17th International Conference on Information Reuse and Integration”, pp. 615-
622, 2016.
[2] I. Triguero, M. Galar, S. Vluymans, C. Cornelis, H. Bustince, F. Herrera and Y. Saeys, “Evolutionary
Undersampling for Imbalanced Big Data Classification “,pp.715-722, 2015.
[3] I. Triguero, M. Galar, D. Merino, J. Maillo, H. Bustince, F. Herrera, “Evolutionary Undersampling for
Extremely Imbalanced Big Data Classification under Apache Spark”, “IEEE Congress on
Evolutionary Computation (CEC)”, pp.640-647, 2016.
[4] M. Popescu and J. M. Keller, “Random projections fuzzy k-nearest neighbor(RPFKNN) for big data
classification “, “IEEE International Conference on Fuzzy Systems (FUZZ)”,pp.1813-1817, 2016.
[5] R. Pakdel and J. Herbert, “Efficient Cloud-Based Framework for Big Data Classification”, “IEEE
Second International Conference on Big Data Computing Service and Applications “, pp.195-201,
2016.
[6] C. Lemnaru, M. Cuibus, A. Bona, A. Alic and R. Potolea, “A Distributed Methodology for
Imbalanced Classification Problems“, “11th International Symposium on Parallel and Distributed
Computing “, pp. 164-171, 2012.
[7] X. R. Jenifer and R. Lawrance, “An Adaptive Classification Model for Microarray Analysis using Big
Data”, 2016.
11. International Journal on Soft Computing (IJSC) Vol.8, No. 3/4, November 2017
11
[8] B. Jaison, A. Chilambuchelvan and K. A. Junaid, “Hybrid Classification Techniques for Microarray
Data “,”Springer Natl. Acad. Sci. Lett. “.
[9] N. C. Salvatore, A. Cerasa, I. Castiglioni, F. Gallivanone, A. Augimeri, M. Lopez, G. Arabia, M.
Morelli, M. C. Gilardi and A. Quattrone, “Machine learning on brain MRI data for differential
diagnosis of Parkinson’s disease and Progressive Supranuclear Palsy”, “Journal of Neuroscience
Methods “, pp.230– 237, 2014.
[10] M. K. Shahsavari, H. R. Bakhsh and H. Rashidi, “Efficient Classification of Parkinson’s Disease
Using Extreme Learning Machine and Hybrid Particle Swarm Optimization”, “4th International
Conference on Control, Instrumentation, and Automation (ICCIA) 27-28 January 2016, Qazvin
Islamic Azad University, Qazvin, Iran”, pp.148-154, 2016.
[11] C. Lemnaru, M. Cuibus, A. Bona, A. Alic and R. Potolea, “A Distributed Methodology for
Imbalanced Classification Problems“, “11th International Symposium on Parallel and Distributed
Computing “, pp. 164-171, 2012.
[12] X. R. Jenifer and R. Lawrance, “An Adaptive Classification Model for Microarray Analysis using Big
Data”, 2016.
[13] B. Jaison, A. Chilambuchelvan and K. A. Junaid, “Hybrid Classification Techniques for Microarray
Data “,”Springer Natl. Acad. Sci. Lett. “.
[14] A. Govada, V. S. Thomas, I. Samal and S. K. Sahay, “Distributed multi-class rule based classification
using RIPPER”, “IEEE International Conference on Computer and Information Technology”, pp.303-
309, 2016.
[15] http://datam.i2r.astar.edu.sg/datasets/krbd/index.html
AUTHORS
Lalitha Kumari Gaddala, B.Tech,M.Tech,has a work experience of 12 years. She is
pursuing her Ph.D in Computer Science and Engineering under Acharya Nagarjuna
University, Guntur and Andhra Pradesh. She is working as Senior Assistant Professor in
Prasad V Potluri Siddhartha Institute of Technology, Vijayawada. Her research interests
are Soft Computing-Optimization algorithms, Machine Learning. She has published few
papers in International conferences and International Journals.
Dr. N. Naga MalleswaraRao,B.Tech, M.Tech and Ph.D, has a work experience of 25
years and 9 Months. He is working as Professor, Department of IT, RVR& JC College
of Engineering, Guntur (Dt). His research interests are Computer Algorithms,
Compilers, and Image Processing. He has published few papers in International
conferences, National Conferences and International Journals.