International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
Feature Selection Algorithm for Supervised and Semisupervised Clustering - Editor IJCATR
This document summarizes a research paper on feature selection algorithms for supervised and semi-supervised clustering. It discusses how semi-supervised learning, which falls between unsupervised and supervised learning, uses both labeled and unlabeled data for training. It also describes a fast clustering-based feature selection algorithm (FAST) that works in two steps: 1) using graph-theoretic clustering to separate features into clusters, and 2) selecting the most representative feature from each cluster to form the feature subset. The algorithm aims to obtain a good feature subset efficiently by removing irrelevant and redundant features.
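As a concrete illustration of those two steps, here is a minimal Python sketch, not the paper's implementation: symmetric uncertainty (a relevance measure commonly paired with FAST) scores feature-feature and feature-class relationships, a simple threshold-and-connected-components pass stands in for the graph-theoretic clustering, and one representative feature per cluster survives. The threshold value and all function names are our own assumptions.

```python
# Minimal sketch of a FAST-style two-step selection (illustrative only;
# threshold-based clustering stands in for the paper's graph-theoretic step).
import numpy as np
from collections import Counter
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.metrics import mutual_info_score

def entropy_bits(x):
    """Shannon entropy (in bits) of a discrete sequence."""
    counts = np.array(list(Counter(x).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), in [0, 1]."""
    h = entropy_bits(x) + entropy_bits(y)
    # mutual_info_score returns nats; divide by ln(2) to get bits
    return 0.0 if h == 0 else 2.0 * mutual_info_score(x, y) / np.log(2) / h

def fast_like_select(X, y, su_threshold=0.2):
    n = X.shape[1]
    adj = np.zeros((n, n))
    for i in range(n):                       # step 1: cluster related features
        for j in range(i + 1, n):
            if symmetric_uncertainty(X[:, i], X[:, j]) > su_threshold:
                adj[i, j] = adj[j, i] = 1
    _, labels = connected_components(csr_matrix(adj), directed=False)
    relevance = [symmetric_uncertainty(X[:, i], y) for i in range(n)]
    return sorted(max(np.flatnonzero(labels == c), key=lambda i: relevance[i])
                  for c in set(labels))      # step 2: one representative each

X = np.random.default_rng(0).integers(0, 3, (200, 10))  # toy discrete data
y = X[:, 0].copy()                                      # toy labels
print(fast_like_select(X, y))
```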
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA - IJCI JOURNAL
This paper proposes a new method intended to reduce the size of high-dimensional datasets by identifying and removing irrelevant and redundant features. Dataset reduction is important in machine learning and data mining. A dependence measure is used to evaluate the relationship between each feature and the target concept, and between pairs of features, for irrelevant and redundant feature removal. The proposed work first removes all irrelevant features, then constructs a minimum spanning tree over the relevant features using Prim's algorithm. Splitting the minimum spanning tree according to the dependency between features produces a collection of forests, and a representative feature from each forest is taken to form the final feature subset.
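A toy rendering of that pipeline, under the assumption that absolute Pearson correlation serves as the dependence measure (the paper's measure may differ):

```python
# Sketch of the MST-then-split pipeline described above (our own toy version:
# |Pearson correlation| is turned into a distance so dependent features are close).
import numpy as np

def prim_mst(w):
    """Prim's algorithm on a dense symmetric distance matrix.
    Returns the tree as a list of (u, v, weight) edges."""
    n = w.shape[0]
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        best = min(((u, v, w[u, v]) for u in in_tree
                    for v in range(n) if v not in in_tree),
                   key=lambda e: e[2])
        edges.append(best)
        in_tree.add(best[1])
    return edges

def forests(n, edges, max_dist=0.7):
    """Drop weak-dependence edges (large distance); the surviving
    connected components are the 'forests' of the paper."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for u, v, d in edges:
        if d <= max_dist:
            parent[find(u)] = find(v)
    return [find(i) for i in range(n)]

X = np.random.default_rng(1).random((150, 8))   # toy samples x features
dist = 1.0 - np.abs(np.corrcoef(X.T))           # dependence as closeness
print("forest label per feature:", forests(8, prim_mst(dist)))
```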
An unsupervised feature selection algorithm with feature ranking for maximizi... - Asir Singh
Prediction plays a vital role in decision making. Correct prediction leads to the right decisions, saving lives, energy, effort, money and time, and preventing physical and material losses; it is practiced in all fields, including medicine, finance, environmental studies, engineering and emerging technologies. Prediction is carried out by a model called a classifier, and the predictive accuracy of the classifier depends strongly on the training dataset used to train it. Irrelevant and redundant features in the training dataset reduce the classifier's accuracy, so they must be removed through the process known as feature selection. This paper proposes a feature selection algorithm, namely unsupervised learning with ranking-based feature selection (FSULR). It removes redundant features by clustering and eliminates irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of the proposed algorithm is compared with seven other feature selection algorithms using well-known classifiers, namely naive Bayes (NB), instance-based (IB1) and the tree-based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for these classifiers.
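A minimal sketch of the FSULR idea, with k-means over feature vectors standing in for the paper's clustering and column variance standing in for its statistical measures (both are our assumptions):

```python
# Cluster the features (not the samples), then keep one feature per cluster
# ranked by a simple statistical score. Illustrative simplification of FSULR.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def fsulr_like(X, n_clusters=5):
    Z = StandardScaler().fit_transform(X)        # standardize columns
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(Z.T)                 # cluster feature vectors
    variance = X.var(axis=0)                     # stand-in statistical measure
    keep = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        keep.append(members[np.argmax(variance[members])])
    return sorted(keep)

X = np.random.default_rng(2).random((200, 30))
print("selected features:", fsulr_like(X))
```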
A Survey on Constellation Based Attribute Selection Method for High Dimension... - IJERA Editor
Attribute selection is an important topic in data mining because it is an effective way of reducing dimensionality, removing irrelevant and redundant data, and increasing accuracy. It is the process of identifying a subset of the most useful attributes that produces results compatible with the original, entire attribute set. Cluster analysis, or clustering, is the task of grouping a set of objects such that objects in the same group, called a cluster, are more similar in some sense to each other than to those in other groups (clusters). There are various approaches and techniques for attribute subset selection, namely the wrapper approach, the filter approach, the Relief algorithm, distributional clustering and so on, but each has disadvantages, such as an inability to handle large volumes of data, computational complexity, no guarantee of accuracy, difficulty of evaluation and weak redundancy detection. To get the upper hand on some of these issues, this paper proposes an effective clustering-based attribute selection method for high-dimensional data. First, attributes are divided into clusters using a graph-based clustering method such as a minimum spanning tree (MST). In the second step, the most representative attribute, the one most strongly related to the target classes, is selected from each cluster to form a subset of attributes. The aim is to increase accuracy, reduce dimensionality, shorten training time and improve generalization by reducing overfitting.
The International Journal of Engineering and Science (The IJES) - theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA... - ijsc
As biomedical databases grow day by day, finding the essential features for disease prediction has become more complex due to high-dimensionality and sparsity problems. Also, given the large number of microarray datasets in biomedical repositories, it is difficult to analyze, predict and interpret feature information using traditional feature-selection-based classification models, most of which suffer from computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. The ensemble classifier is one of the scalable models for extreme learning machines owing to its high efficiency and fast processing speed in real-time applications. The main objective of feature-selection-based ensemble learning models is to classify high-dimensional data with high computational efficiency and a high true positive rate. In this work an optimized particle swarm optimization (PSO) based ensemble classification model is developed for high-dimensional microarray datasets. Experimental results show that the proposed model has high computational efficiency compared to traditional feature-selection-based classification models where accuracy, true positive rate and error rate are concerned.
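To make the PSO wrapper concrete, here is a generic binary-PSO feature selection sketch; the decision-tree fitness function, swarm parameters and synthetic data are illustrative stand-ins, not the paper's ensemble setup:

```python
# Each particle is a 0/1 mask over features; fitness is cross-validated
# accuracy of a small classifier on the masked features. Generic sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

n_particles, n_feat, iters = 10, X.shape[1], 20
pos = rng.integers(0, 2, (n_particles, n_feat))
vel = rng.normal(0, 1, (n_particles, n_feat))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    # sigmoid transfer: a bit is 1 with probability sigmoid(velocity)
    pos = (rng.random(vel.shape) < 1 / (1 + np.exp(-vel))).astype(int)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", np.flatnonzero(gbest))
```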
A Threshold fuzzy entropy based feature selection method applied in various b... - IJMER
Large amounts of data are stored and manipulated using various database technologies, and processing all the attributes for a particular purpose is a difficult task. To avoid such difficulties, a feature selection process is applied. In this paper, we collect eight benchmark datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based relevance measure algorithm and follows three selection strategies: the mean selection strategy, the half selection strategy and a neural network for the threshold selection strategy. After the features are selected, they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1 and Ant-miner classification methodologies. The test results show that the neural network for threshold selection strategy works well in selecting features and that the Ant-miner methodology is best at delivering higher accuracy with the selected features than processing the original dataset. The results of this experiment clearly show that Ant-miner is superior to the other classifiers; thus, the proposed Ant-miner algorithm could be a suitable method for producing good results with fewer features than the original datasets.
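A hedged sketch of the thresholding idea: each feature gets a De Luca-Termini style fuzzy entropy score, and the mean selection strategy keeps features below the mean score. The membership function and the keep-below-threshold rule are our simplifications, not the paper's exact procedure:

```python
# Score each feature with a fuzzy entropy measure, then apply the
# "mean selection strategy": threshold = mean score. Illustrative only.
import numpy as np

def fuzzy_entropy(col):
    """De Luca-Termini fuzzy entropy of a feature scaled to [0, 1]."""
    lo, hi = col.min(), col.max()
    mu = (col - lo) / (hi - lo) if hi > lo else np.zeros_like(col)
    mu = np.clip(mu, 1e-12, 1 - 1e-12)           # avoid log(0)
    return -np.mean(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))

X = np.random.default_rng(3).random((100, 12))
scores = np.array([fuzzy_entropy(X[:, j]) for j in range(X.shape[1])])
selected = np.flatnonzero(scores < scores.mean())  # mean selection strategy
print("kept features:", selected)
```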
A survey of modified support vector machine using particle of swarm optimizat... - Editor Jacotech
This document summarizes a research paper that proposes a modified support vector machine (MSVM) classification algorithm using particle swarm optimization (PSO) for data classification in data streams. It discusses how new evolving features and concept drift in data streams can decrease the performance of traditional SVM classifiers. The proposed MSVM-PSO technique uses PSO to optimize feature selection and control the evaluation of new evolving features. PSO works in two phases - dynamic population selection and optimization of new evolved features. The methodology and implementation of MSVM-PSO is explained along with experimental results on three datasets showing it improves classification performance over traditional SVM.
This document discusses machine learning algorithms and their applications. It begins with an abstract discussing supervised, unsupervised, and reinforcement learning techniques. It then discusses machine learning in more detail, explaining that machine learning algorithms represent data instances with a set of features and classify instances based on their labels. The main focus is on supervised and unsupervised learning techniques and their performance parameters. It provides an overview of support vector machines, neural networks, and other machine learning algorithms. In summary, the document provides a survey of different machine learning techniques, how they work, and their applications.
This document compares using genetic algorithm (GA) optimization with artificial neural networks (ANN) and support vector machines (SVM) for intrusion detection. It first describes ANN, SVM, and GA techniques. It then applies GA to optimize the feature selection and classification performed by ANN and SVM on the KDD Cup 99 intrusion detection dataset. The results show that GA improved the performance of both ANN and SVM classifiers, achieving 100% detection rates. Specifically, GA-ANN achieved the highest detection rate using the fewest number of features (100% detection using only 18 features), demonstrating GA's greater effectiveness at optimizing ANN compared to SVM.
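The GA wrapper can be sketched generically as a bit-mask search; the classifier, dataset and GA parameters below are illustrative and do not reproduce the paper's ANN/SVM or KDD Cup 99 setup:

```python
# Generic GA feature-mask search: tournament selection, one-point
# crossover, bit-flip mutation; fitness is cross-validated accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=300, n_features=25, random_state=1)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(),
                           X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, (20, X.shape[1]))
for gen in range(15):
    fit = np.array([fitness(ind) for ind in pop])
    # tournament selection: the fitter of two random individuals survives
    parents = pop[[max(rng.choice(20, 2), key=lambda i: fit[i])
                   for _ in range(20)]]
    # one-point crossover between consecutive parent pairs
    children = parents.copy()
    for i in range(0, 20, 2):
        cut = rng.integers(1, X.shape[1])
        children[i, cut:] = parents[i + 1, cut:]
        children[i + 1, cut:] = parents[i, cut:]
    # bit-flip mutation with small probability
    flips = rng.random(children.shape) < 0.02
    pop = np.where(flips, 1 - children, children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```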
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of the related fields of Engineering and Technology.
Feature selection is one of the most fundamental steps in machine learning and is closely related to dimensionality reduction. A commonly used approach is to rank the individual features according to some criterion and then search for an optimal feature subset based on an evaluation criterion that tests optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed by integrating a popular rough-set-based feature ranking process with a modified backward feature elimination algorithm. The feature ranking process calculates the significance or priority of each LD symptom according to its contribution to representing the knowledge contained in the dataset; each symptom's significance or priority value reflects its relative importance for predicting LD across the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. For comparative analysis, and to establish the importance of rough set theory in feature selection, the backward feature elimination algorithm is also combined with two state-of-the-art filter-based feature ranking techniques, viz. information gain and gain ratio. The experimental results show the proposed feature selection approach outperforms the other two in terms of data reduction. The proposed method also eliminates all the redundant attributes from the LD dataset efficiently, without sacrificing classification performance.
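The ranked backward elimination loop might look like the following sketch, where scikit-learn's mutual information replaces the rough-set significance measure (an assumption on our part):

```python
# Rank features by a filter score, then try dropping the least significant
# one at a time, keeping each removal only if CV accuracy does not drop.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, random_state=2)
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))  # weakest first

subset = list(range(X.shape[1]))
score = cross_val_score(GaussianNB(), X, y, cv=5).mean()
for f in ranking:
    trial = [i for i in subset if i != f]
    trial_score = cross_val_score(GaussianNB(), X[:, trial], y, cv=5).mean()
    if trial_score >= score:            # keep the removal if accuracy holds
        subset, score = trial, trial_score

print(f"kept {len(subset)} features, CV accuracy {score:.3f}")
```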
Survey on semi supervised classification methods and feature selection - eSAT Journals
Abstract: Data mining, also called knowledge discovery, is a process of analyzing data from several perspectives and summarizing it into useful information. It has tremendous application in the area of classification, such as pattern recognition, discovering disease types, medical image analysis, speech recognition, biometric identification, drug discovery and so on. This is a survey of several semi-supervised classification methods used by classifiers, in which both labeled and unlabeled data can be used for classification; it is less expensive than other classification methods. The techniques surveyed in this paper are the low density separation approach, transductive SVM, a semi-supervised logistic discriminant procedure, and the self-training nearest neighbour rule using cut edges. Along with classification methods, a review of various feature selection methods is also given. Feature selection is performed to reduce the dimension of large datasets; after the attributes are reduced, the data is passed to the classifier, so the accuracy and performance of the classification system can be improved. The feature selection methods covered include consistency-based feature selection, fuzzy entropy measure feature selection with a similarity classifier, signal-to-noise ratio, and positive approximation; each method has its own benefits. Index Terms: Semisupervised classification, Transductive support vector machine, Feature selection, unlabeled samples
We conducted a comparative analysis of different supervised dimension reduction techniques by integrating a set of different data splitting algorithms, and demonstrate how the relative efficacy of learning algorithms depends on sample complexity. The issue of sample complexity is discussed in terms of its dependence on the data splitting algorithms. In line with expectations, every supervised learning classifier demonstrated different capability under different data splitting algorithms, and no way of calculating an overall ranking of techniques was directly available. We therefore focused on how classifier ranking depends on the data splitting algorithm and devised a weighted-average-rank model, the Weighted Mean Rank Risk Adjusted Model (WMRRAM), for consensus ranking of learning classifier algorithms.
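A toy version of weighted mean rank aggregation; the per-scheme weights and accuracy numbers below are invented for illustration, since the paper's risk-adjustment is not detailed here:

```python
# Rank classifiers within each data-splitting scheme, then aggregate the
# per-scheme ranks with assumed weights to get a consensus order.
import numpy as np
from scipy.stats import rankdata

# accuracy[i, j]: classifier i under data-splitting scheme j (toy numbers)
accuracy = np.array([[0.91, 0.88, 0.90],
                     [0.89, 0.92, 0.87],
                     [0.85, 0.86, 0.88]])
weights = np.array([0.5, 0.3, 0.2])          # assumed per-scheme weights

# rank within each column: rank 1 = highest accuracy
ranks = np.vstack([rankdata(-accuracy[:, j]) for j in range(3)]).T
weighted_mean_rank = ranks @ weights
print("consensus order (best first):", np.argsort(weighted_mean_rank))
```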
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase... - ijaia
Feature selection and classification are essential tasks when dealing with large data sets that comprise a large number of input attributes, and many search methods and classifiers have been used to find the optimal number of attributes. The aim of this paper is to find the optimal set of attributes and improve classification accuracy by adopting an ensemble rule classifiers method. The research process involves two phases: finding the optimal set of attributes, and an ensemble classifiers method for the classification task. Results are reported in terms of percentage accuracy, number of selected attributes and number of rules generated; six datasets were used for the experiment. The final output is an optimal set of attributes together with the ensemble rule classifiers method. The experimental results, conducted on public real datasets, demonstrate that the ensemble rule classifiers method consistently improves classification accuracy on the selected datasets; significant improvement in accuracy and an optimal set of selected attributes are achieved by adopting the ensemble rule classifiers method.
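The two-phase design (attribute subset search, then ensemble voting) can be sketched with scikit-learn, with tree-based learners standing in for the paper's rule inducers:

```python
# Phase 1: pick an attribute subset; phase 2: majority-vote ensemble.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
ensemble = VotingClassifier([
    ("stump",  DecisionTreeClassifier(max_depth=1, random_state=0)),
    ("tree",   DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
], voting="hard")

model = make_pipeline(SelectKBest(f_classif, k=10), ensemble)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```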
Survey on Supervised Method for Face Image Retrieval Based on Euclidean Dist... - Editor IJCATR
This document summarizes various supervised methods for face image retrieval based on Euclidean distance. It discusses literature on active shape models, principal component analysis, linear discriminant analysis, locality-constrained linear coding, bag-of-words models, local binary patterns, and support vector machines. It evaluates support vector machines as the best classifier for face image retrieval systems due to its ability to significantly reduce the need for labeled training data and accurately classify faces, proteins, and characters. The document concludes that a content-based face retrieval system using support vector machines improves detection performance by retrieving similar faces from a database based on Euclidean distance calculations between local binary pattern features of the query and database images.
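The core retrieval step reduces to nearest-neighbour search in feature space; a sketch, assuming the LBP histograms have already been extracted:

```python
# Rank database faces by Euclidean distance between the query's feature
# vector and each stored vector (LBP histograms assumed precomputed).
import numpy as np

def retrieve(query_vec, database, k=5):
    """Return indices of the k nearest database vectors to the query."""
    dists = np.linalg.norm(database - query_vec, axis=1)
    return np.argsort(dists)[:k]

database = np.random.default_rng(4).random((100, 59))  # 100 faces, toy LBP bins
query = np.random.default_rng(5).random(59)
print("top matches:", retrieve(query, database))
```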
Unsupervised Feature Selection Based on the Distribution of Features Attribut... - Waqas Tariq
Since dealing with high-dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of the data and simplify analysis in applications such as text categorization, signal processing, image retrieval and gene expression. Among feature reduction techniques, feature selection is one of the most popular methods because it preserves the original features. However, most current feature selection methods do not perform well when applied to imbalanced data sets, which are pervasive in real-world applications. In this paper, we propose a new unsupervised feature selection method suited to imbalanced data sets, which removes redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods were implemented and compared. Experimental results on several imbalanced data sets derived from the UCI repository illustrate the effectiveness of our proposed method in comparison with the other methods in terms of both accuracy and the number of selected features.
Software modules clustering: an effective approach for reusability - Alexander Decker
This document summarizes previous work on using clustering techniques for software module classification and reusability. It discusses hierarchical clustering and non-hierarchical clustering methods. Previous studies have used these techniques for software component classification, identifying reusable software modules, course clustering based on industry needs, mobile phone clustering based on attributes, and customer clustering based on electricity load. The document provides background on clustering analysis and its uses in various domains including software testing, pattern recognition, and software restructuring.
Effective Feature Selection for Feature Possessing Group Structure - rahulmonikasharma
This document proposes a new method called efficient group variable selection (EGVS) for feature selection when features have a group structure. EGVS has two stages: 1) within-group variable selection evaluates each feature individually to select discriminative features within each group. 2) Between-group variable selection re-evaluates all features to remove redundancy and obtain an optimal subset by considering relationships between groups. The method is demonstrated on benchmark datasets, showing it increases classification accuracy by leveraging the group structure during feature selection.
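A compact sketch of that two-stage flow, with ANOVA F-scores and a correlation cutoff as assumed stand-ins for the paper's within-group and between-group criteria:

```python
# Stage 1: keep the most discriminative features inside each group.
# Stage 2: re-score survivors jointly, dropping between-group redundancy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=300, n_features=12, random_state=3)
groups = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9, 10, 11]}

f_scores, _ = f_classif(X, y)

# stage 1: top 2 features per group by F-score
survivors = [f for members in groups.values()
             for f in sorted(members, key=lambda i: -f_scores[i])[:2]]

# stage 2: drop a survivor if it correlates strongly with a better one
final = []
for f in sorted(survivors, key=lambda i: -f_scores[i]):
    if all(abs(np.corrcoef(X[:, f], X[:, g])[0, 1]) < 0.9 for g in final):
        final.append(f)
print("selected:", sorted(final))
```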
Classification problems in high-dimensional data with a small number of observations are becoming common, especially in microarray data. Over the last two decades, many efficient classification models and Feature Selection (FS) algorithms, also referred to as FS techniques, have been proposed for higher prediction accuracy. However, the outcome of an FS algorithm with respect to prediction accuracy can be unstable over variations in the training set, particularly in high-dimensional data. In this paper we present a new evaluation measure, the Q-statistic, which incorporates the stability of the selected feature subset in addition to prediction accuracy. We then propose Booster, which boosts the Q-statistic value of the FS algorithm it is applied to. A study on synthetic data and 14 microarray data sets shows that Booster improves not only the Q-statistic but also the prediction accuracy of the applied algorithm.
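The stability half of such a measure can be illustrated by re-running a selector on bootstrap resamples and comparing the chosen subsets; mean pairwise Jaccard similarity below is our stand-in, not the paper's exact Q-statistic:

```python
# Measure how stable a feature selector is across resampled training sets:
# select the top-10 features on each bootstrap, then compare the subsets.
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=200, n_features=50, random_state=4)
rng = np.random.default_rng(4)

subsets = []
for _ in range(10):                       # bootstrap-resample the training set
    idx = rng.choice(len(y), size=len(y), replace=True)
    scores = mutual_info_classif(X[idx], y[idx], random_state=0)
    subsets.append(set(np.argsort(scores)[-10:]))   # top-10 features

jaccard = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
print("selection stability (mean Jaccard):", np.mean(jaccard))
```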
This document discusses using machine learning clustering algorithms to analyze stock market data. It compares the K-means, COBWEB, DBSCAN, EM and OPTICS clustering algorithms in the WEKA tool on a stock market dataset containing 420 instances and 6 attributes. The K-means algorithm had the best performance with the lowest error and fastest runtime. It clustered the data into 4 groups in 0.16 seconds. The COBWEB algorithm clustered the data into 107 groups in 27.88 seconds. The DBSCAN algorithm found 21 clusters in 3.97 seconds. The paper concludes that K-means is best suited for stock market data mining applications due to its simplicity and speed compared to other algorithms.
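A scikit-learn analogue of the K-means run (WEKA itself is Java; the random data below merely matches the 420 x 6 shape reported above):

```python
# Cluster a toy 6-attribute dataset into 4 groups with K-means and time it.
import time
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).random((420, 6))   # 420 instances x 6 attributes
start = time.perf_counter()
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(f"{km.n_clusters} clusters in {time.perf_counter() - start:.2f}s, "
      f"inertia {km.inertia_:.2f}")
```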
This document discusses the design and implementation of a decoupling network for phased array antennas. It begins with an introduction explaining that decoupling networks allow the input ports of an antenna array to be matched and decoupled independently, without loss or reciprocity. It then describes three main approaches to decoupling network design: eigen analysis networks, multiport coupling networks, and mutual coupling coefficients methods. The document concludes by summarizing the simulation results and benefits of the decoupling network in improving antenna performance and compensating for mutual coupling effects between array elements.
This document proposes a new multi-agent architecture for Pretty Good Privacy (PGP) to improve its performance. PGP currently uses a hierarchical structure where each component executes sequentially, causing idle time. The proposed architecture assigns each PGP component to an independent agent. Using semaphores, the agents can execute concurrently, eliminating bottlenecks and reducing overall execution time compared to the classic PGP architecture. Experimental results showed the new multi-agent approach runs 30% faster than classic PGP across different hardware configurations.
This document summarizes a research paper on computer-aided diagnosis of macular edema using color fundus images. It discusses how macular edema develops due to damage to blood vessels from diabetes, blurring vision. A computer system is proposed to detect edema in fundus images by comparing them to a reference image using rotational symmetry analysis. Hard exudates indicating edema risk are identified. The system also evaluates edema severity based on exudate proximity to the macula. Feature extraction and classification algorithms are introduced to automatically detect normal vs. edema images and diagnose patient condition.
International Journal of Engineering Research and Applications (IJERA) aims to cover the latest outstanding developments in all fields of engineering, technology and science.
International Journal of Engineering Research and Applications (IJERA) is a team of researchers, not a publication service or private publisher running journals for monetary benefit; we are an association of scientists and academics who focus on supporting authors who want to publish their work. The articles published in our journal can be accessed online, and all articles are archived for real-time access.
Our journal system primarily aims to bring out the research talent and the work done by scientists, academics, engineers, practitioners, scholars, and postgraduate students of engineering and science. The journal aims to cover scientific research in a broad sense rather than publishing a niche area of research, facilitating researchers from various verticals to publish their papers. It also aims to provide a platform for researchers to publish in a short time, enabling them to continue their work. All articles published are freely available to scientific researchers in government agencies, educators and the general public. We are making serious efforts to promote our journal across the globe in various ways, and we are confident that it will act as a scientific platform for all researchers to publish their works online.
Iaetsd an efficient and large data base using subset selection algorithmIaetsd Iaetsd
The document presents a new feature selection algorithm called FAST (Feature Cluster-based Subset Selection) that aims to efficiently reduce dimensionality by removing irrelevant and redundant features. The FAST algorithm works in two steps: (1) it clusters features using graph theoretic methods, and (2) it selects the most representative feature from each cluster. This clustering-based approach has a high probability of selecting useful and independent features. The algorithm is evaluated on high dimensional datasets and shown to improve learning accuracy while reducing dimensionality compared to other feature selection methods.
The document describes a proposed fast clustering-based feature subset selection (FAST) algorithm for high-dimensional data. The FAST algorithm works in two steps: 1) clustering features using minimum spanning tree methods, and 2) selecting the most representative feature from each cluster. This identifies useful and independent features efficiently. Experimental results on 35 real-world datasets demonstrate that FAST produces smaller feature subsets and improves classifier performance compared to other feature selection algorithms.
Clustering Based Attribute Subset Selection Using FAST AlgorithmIJCI JOURNAL
In machine learning and data mining, attribute selection is the practice of selecting a subset of the most consequential attributes for use in model construction. The premise of using an attribute selection method is that the data encloses many redundant or extraneous attributes: redundant attributes are those which supply no supplemental information beyond the presently selected attributes, and impertinent attributes offer no valuable information in any context.
Iaetsd an enhanced feature selection forIaetsd Iaetsd
The document discusses feature selection techniques for machine learning applications. It proposes an Enhanced Fast Clustering-based Feature Selection (EFAST) algorithm. The EFAST algorithm works in two steps: 1) features are clustered using graph-theoretic clustering methods, and 2) the most relevant representative feature strongly correlated with the target categories is selected from each cluster to form the optimal feature subset. Features from different clusters are relatively independent, so EFAST has a high chance of selecting a set of useful and independent features. The algorithm was tested on real-world data and showed improved performance over other feature selection methods by reducing features while also improving classifier performance.
A Survey on Classification of Feature Selection Strategiesijtsrd
Feature selection is an important part of machine learning. Feature selection refers to the process of reducing the inputs for processing and analysis, or of finding the most meaningful inputs. A related term, feature engineering (or feature extraction), refers to the process of extracting useful information or features from existing data. Mining of particular information related to a concept is done on the basis of the features of the data, and accessing these features for data retrieval can be termed the feature extraction mechanism. Different types of feature extraction methods are used. In this paper, the different feature selection methodologies are examined in terms of the need for, and the method adopted for, feature selection. Three types of method are mainly available: Shannon's Entropy, Bayesian with K2 Prior, and Bayesian Dirichlet with uniform prior (default). The objective of this survey paper is to identify the existing contributions made using the above-mentioned algorithms and the results obtained. R. Pradeepa | K. Palanivel, "A Survey on Classification of Feature Selection Strategies", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-2, February 2018, URL: http://www.ijtsrd.com/papers/ijtsrd9632.pdf http://www.ijtsrd.com/computer-science/data-miining/9632/a-survey-on-classification-of-feature-selection-strategies/r-pradeepa
A fast clustering based feature subset selection algorithm for high-dimension...JPINFOTECH JAYAPRAKASH
The document proposes a fast clustering-based feature selection algorithm (FAST) to efficiently and effectively select useful feature subsets from high-dimensional data. FAST works in two steps: (1) it clusters features using minimum spanning trees, partitioning clusters so each represents a subset of independent features; (2) it selects the most representative feature from each cluster to form the output subset. Experiments on 35 real-world datasets show FAST not only selects smaller feature subsets but also improves performance of four common classifiers compared to other feature selection methods.
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
Abstract
About 5 to 15% of women of reproductive age face Polycystic Ovarian Syndrome (PCOS), a multifaceted, heterogeneous and complex disease. Long-term consequences such as endometrial hyperplasia, type 2 diabetes mellitus and coronary disease are caused by polycystic ovaries, chronic anovulation and hyperandrogenism, which are characterized by insulin resistance; hypertension, abdominal obesity, dyslipidemia and hyperinsulinemia are together called metabolic syndrome (frequent metabolic traits). These cause the common condition known as anovulatory infertility. Computer-based information along with advanced data mining techniques is used for appropriate results. Classification is a classic data mining task, with roots in machine learning. Naive Bayes, artificial neural networks, decision trees and support vector machines are classification tasks in data mining. Feature selection methods involve generation of the subset, evaluation of each subset, criteria for stopping the search, and validation procedures. The characteristics of the search method used are important with respect to the time efficiency of the feature selection methods. PCA (Principal Component Analysis), information gain subset evaluation, fuzzy rough set evaluation and Correlation-based Feature Selection (CFS) are some of the feature selection techniques; greedy first search, ranker, etc. are search algorithms used in feature selection. In this paper, a new algorithm based on fuzzy neural subset evaluation and an artificial neural network is proposed, which reduces the separate tasks of classification and feature selection. This algorithm combines neural fuzzy rough subset evaluation and an artificial neural network for better performance than doing the tasks separately.
Keywords: ANN, SVM, PCA, CFS
High dimensional data (FAST clustering ALG) PPTdeepan v
The document presents a feature selection algorithm called FAST (Fast clustering-based feature selection algorithm). FAST uses minimum spanning trees and clustering to identify relevant feature subsets while removing irrelevant and redundant features. This achieves dimensionality reduction and improves the accuracy of learning algorithms. The algorithm was experimentally evaluated on datasets with over 10,000 features and was shown to outperform other feature selection methods in terms of time complexity and selected feature proportions.
Supervised Machine Learning: A Review of Classification ...butest
This document provides an overview of supervised machine learning classification techniques. It discusses 1) general issues in supervised learning such as data preprocessing, feature selection, and algorithm selection, 2) logical/symbolic techniques, 3) perceptron-based techniques, 4) statistical techniques, 5) instance-based learners, 6) support vector machines, and 7) directions for classifier selection. The goal is to describe various supervised machine learning algorithms and provide references for further research rather than provide a comprehensive review of all techniques.
Network Based Intrusion Detection System using Filter Based Feature Selection...IRJET Journal
This document proposes a mutual information-based feature selection algorithm to select optimal features for network intrusion detection classification. The algorithm aims to handle dependent data features better than previous methods. It evaluates the effectiveness of the algorithm on network intrusion detection cases. Most previous methods suffer from low detection rates and high false alarm rates. The proposed approach uses feature selection, filtering, clustering, and clustering ensemble techniques in a hybrid data mining method to achieve high accuracy for intrusion detection systems.
Feature Subset Selection for High Dimensional Data using Clustering TechniquesIRJET Journal
The document discusses feature subset selection for high dimensional data using clustering techniques. It proposes a FAST algorithm that has three steps: (1) removing irrelevant features, (2) dividing features into clusters, (3) selecting the most representative feature from each cluster. The FAST algorithm uses DBSCAN, a density-based clustering algorithm, to cluster the features. DBSCAN can identify clusters of arbitrary shape and detect noise, making it suitable for high dimensional data. The goal of feature subset selection is to find a small number of discriminative features that best represent the data.
For years, the Machine Learning community has focused on developing efficient algorithms that can produce very accurate classifiers. However, it is often much easier to find several good classifiers based on dataset combination instead of a single classifier applied on different datasets. The advantages of using classifier-dataset combinations instead of a single one are twofold: it helps lower the computational complexity by using simpler models, and it can improve classification accuracy and performance. Most data mining applications are based on pattern matching algorithms, so improving the performance of the classification has a positive impact on the quality of the overall data mining task. Since combination strategies proved very useful in improving performance, these techniques have become very important in applications such as cancer detection, speech technology and natural language processing. The aim of this paper is to propose a proprietary metric, the Normalized Geometric Index (NGI), based on the latent properties of datasets, for improving the accuracy of data mining tasks.
This document summarizes an article on using genetic algorithms for feature selection on brain tumor datasets. It discusses different feature selection methods like filter, wrapper and embedded methods. Specifically, it covers forward selection, backward elimination, recursive feature elimination, and genetic algorithms. It then reviews literature applying these various feature selection techniques to brain tumor classification problems. The goal is to identify the most important features to improve the accuracy of brain tumor detection systems.
Karthikeyan.P et al, Int. Journal of Engineering Research and Applications, ISSN: 2248-9622, Vol. 4, Issue 3 (Version 1), March 2014, pp. 65-71, www.ijera.com
High Dimensional Data Clustering Using Fast Cluster Based Feature Selection
Karthikeyan.P¹, Saravanan.P², Vanitha.E³
¹ PG Scholar, Department of Computer & Communication Engineering, PTR CET, Madurai, Tamil Nadu
² Assistant Professor, Department of Computer Science Engineering, PTR CET, Madurai, Tamil Nadu
³ Assistant Professor, Department of Computer Science Engineering, PTR CET, Madurai, Tamil Nadu
ABSTRACT
Feature selection involves identifying a subset of the most useful features that produces results compatible with those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While the efficiency concerns the time required to find a subset of features, the effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent; the clustering-based strategy of FAST therefore has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST) clustering method using Kruskal's algorithm. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.
Index Terms—Feature subset selection, filter method, feature clustering, graph-based clustering
I. INTRODUCTION
CLASSIFICATION
Data mining refers to "using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in areas such as decision support, prediction, forecasting and estimation. The data is often voluminous but, as it stands, of low value, as no direct use can be made of it; it is the hidden information in the data that is useful." Data mining tools have to infer a model from the database, and in the case of supervised learning this requires the user to define one or more classes.
The database contains one or more attributes that denote the class of a tuple; these are known as predicted attributes, whereas the remaining attributes are called predicting attributes. A combination of values for the predicted attributes defines a class. When learning classification rules, the system has to find the rules that predict the class from the predicting attributes. First the user defines conditions for each class; the data mining system then constructs descriptions for the classes. Given a case or tuple with certain known attribute values, the system should be able to predict what class the case belongs to; once classes are defined, the system should infer the rules that govern the classification and thus be able to find the description of each class.
With the aim of choosing a subset of good features with respect to the target concepts, feature subset selection is an effective way of reducing dimensionality, removing irrelevant data, increasing learning accuracy and improving result comprehensibility. Many feature subset selection methods have been proposed and studied for machine learning applications. They can be divided into four broad categories: the Embedded, Wrapper, Filter, and Hybrid approaches. The embedded methods incorporate feature selection as part of the training process and are usually specific to given learning algorithms, and therefore may be more efficient than the other three categories. Traditional machine learning algorithms like decision trees or artificial neural networks are examples of embedded approaches.
The wrapper methods use the predictive accuracy of a predetermined learning algorithm to determine the goodness of the selected subsets, so the accuracy of the learning algorithms is usually high. However, the generality of the selected features is limited and the computational complexity is large. The filter methods are independent of learning algorithms and have good generality. Their computational complexity is low, but the accuracy of the learning algorithms is not guaranteed. The hybrid methods are a combination of filter and wrapper methods, using a filter method to reduce the search space that will be considered by the subsequent wrapper. They mainly focus on combining filter and wrapper methods to achieve the best possible performance with a particular learning algorithm at a time complexity similar to that of the filter methods.
The wrapper methods are computationally expensive and tend to overfit on small training sets. The filter methods, in addition to their generality, are usually a good choice when the number of features is very large. Thus, we will focus on the filter method in this paper.
1.1 System Architecture
Fig 1. Architecture of Proposed Method
II. CLUSTERING
Clustering and segmentation are the
processes of creating a partition so that all the
members of each set of the partition are similar
according to some metric. A cluster is a set of objects
grouped together because of their similarity or
proximity. Objects are often decomposed into an
exhaustive and/or mutually exclusive set of clusters.
Clustering according to similarity is a very powerful
technique, the key to it being to translate some
intuitive measure of similarity into a quantitative
measure. When learning is unsupervised then the
system has to discover its own classes i.e. the system
clusters the data in the database. The system has to
discover subsets of related objects in the training set
and then it has to find descriptions that describe each
of these subsets. There are a number of approaches
for forming clusters. One approach is to form rules
which dictate membership in the same group based
on the level of similarity between members. Another
approach is to build set functions that measure some
property of partitions as functions of some parameter
of the partition.
III. FEATURE SELECTION
It is widely recognized that a large number
of features can adversely affect the performance of
inductive learning algorithms, and clustering is not an
exception. However, while there exists a large body
of literature devoted to this problem for supervised
learning task, feature selection for clustering has been
rarely addressed. The problem appears to be a
difficult one given that it inherits all the uncertainties
that surround this type of inductive learning.
Particularly, that there is not a single performance
measure widely accepted for this task and the lack of
supervision available.
In machine learning and statistics, feature
selection, also known as variable selection, attribute
selection or variable subset selection, is the process of
selecting a subset of relevant features for use in
model construction. The central assumption when
using a feature selection technique is that the data
contains many redundant or irrelevant features.
Redundant features are those which provide no more
information than the currently selected features, and
irrelevant features provide no useful information in
any context. Feature selection techniques are a subset
of the more general field of feature extraction.
Feature extraction creates new features from
functions of the original features, whereas feature
selection returns a subset of the features. Feature
selection techniques are often used in domains where
there are many features and comparatively few
samples (or data points). The archetypal case is the
use of feature selection in analyzing DNA
microarrays, where there are many thousands of
features, and a few tens to hundreds of samples.
Feature selection techniques provide three main benefits when constructing predictive models:
improved model interpretability,
shorter training times, and
enhanced generalization by reducing overfitting.
Feature selection is also useful as part of the data analysis process, as it shows which features are important for prediction and how these features are related. With such an aim of choosing a subset of
good features with respect to the target concepts,
feature subset selection is an effective way for
reducing dimensionality, removing irrelevant data,
increasing learning accuracy, and improving result
comprehensibility. Irrelevant features, along with
redundant features, severely affect the accuracy of the
learning machines. Thus, feature subset selection
should be able to identify and remove as much of the
irrelevant and redundant information as possible.
Moreover, "good feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other." Many feature subset selection methods have been proposed and studied for machine learning applications. They can be divided into four broad categories: the Embedded, Wrapper, Filter, and Hybrid approaches.
3.1 Wrapper and Filter Methods
Wrapper methods are widely recognized as a superior alternative in supervised learning problems, since by employing the inductive algorithm to evaluate alternatives they take into account the particular biases of the algorithm. However, even for algorithms that exhibit a moderate complexity, the number of executions that the search process requires results in a high computational cost, especially as we shift to more exhaustive search strategies. The wrapper methods use the predictive accuracy of a predetermined learning algorithm to determine the goodness of the selected subsets, so the accuracy of the learning algorithms is usually high. However, the generality of the selected features is limited and the computational complexity is large. The filter methods are independent of learning algorithms and have good generality. Their computational complexity is low, but the accuracy of the learning algorithms is not guaranteed.
3.2 Hybrid Approach
The hybrid methods are a combination of
filter and wrapper methods by using a filter method to
reduce search space that will be considered by the
subsequent wrapper. They mainly focus on
combining filter and wrapper methods to achieve the
best possible performance with a particular learning
algorithm with similar time complexity of the filter
methods.
In cluster analysis, graph-theoretic methods
have been well studied and used in many
applications. Their results sometimes show the best agreement with human performance. The general
graph-theoretic clustering is simple: compute a
neighborhood graph of instances, then delete any
edge in the graph that is much longer/shorter
(according to some criterion) than its neighbors. The
result is a forest and each tree in the forest represents
a cluster. In our study, we apply graph-theoretic
clustering methods to features. In particular, we adopt
the minimum spanning tree (MST)-based clustering
algorithms, because they do not assume that data
points are grouped around centers or separated by a
regular geometric curve and have been widely used in
practice.
Based on the MST method, we propose a
Fast clustering based feature Selection algorithm
(FAST). The FAST algorithm works in two steps. In
the first step, features are divided into clusters by
using graph-theoretic clustering methods. In the
second step, the most representative feature that is
strongly related to target classes is selected from each
cluster to form the final subset of features. Features in different clusters are relatively independent, so the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. The proposed feature subset selection algorithm FAST was tested on various numerical data sets. The experimental results show that, compared with five other types of feature subset selection algorithms, the proposed algorithm not only reduces the number of features but also improves the classification accuracy.
3.3 Using Mutual Information for Selecting
Features in Supervised Neural Net Learning
This work investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the "information content" of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes.
The fact that the mutual information is
independent of the coordinates chosen permits a
robust estimation. Nonetheless, the use of the mutual
information for tasks characterized by high input
dimensionality requires suitable approximations
because of the prohibitive demands on computation
and samples. An algorithm is proposed that is based
on a “greedy” selection of the features and that takes
both the mutual information with respect to the output
class and with respect to the already-selected features
into account. Finally the results of a series of
experiments are discussed.
The "preprocessing" stage, where an appropriate number of relevant features is extracted from the raw data, has a crucial impact both on the complexity of the learning phase and on the achievable generalization performance. While it is essential that the information contained in the input vector is sufficient to determine the output class, the presence of too many input features can burden the training process and can produce a neural network with more connection weights than those required by the problem.
A major weakness of these methods is that
they are not invariant under a transformation of the
variables. For example a linear scaling of the input
variables (that may be caused by a change of units for
the measurements) is sufficient to modify the PCA
results. Feature selection methods that are sufficient
for simple distributions of the patterns belonging to
different classes can fail in classification tasks with
complex decision boundaries. In addition, methods
based on a linear dependence (like the correlation)
cannot take care of arbitrary relations between the
pattern coordinates and the different classes. On the
contrary, the mutual information can measure
arbitrary relations between variables and it does not
depend on transformations acting on the different
variables.
Our objective was less ambitious, because
only the first of the above options was considered
(leaving the second for the capabilities of the neural
net to build complex features from simple ones). We
assumed that a set of candidate features with globally
sufficient information is available and that the
problem is that of extracting from this set a suitable
subset that is sufficient for the task, thereby reducing
the processing times in the operational phase and,
possibly, the training times and the cardinality of the
example set needed for a good generalization.
In particular we were interested in the
applicability of the mutual information measure. For
this reason we considered the estimation of the MI
from a finite set of samples, showing that the MI for
different features is over-estimated in approximately
the same way. This estimation is the building block of
the MIFS algorithm, where the features are selected
in a “greedy” manner, ranking them according to
their MI with respect to the class discounted by a
term that takes the mutual dependencies into account.
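As a rough, self-contained illustration of this greedy scheme (a sketch, not Battiti's exact formulation: the beta redundancy weight and the frequency-based MI estimator are assumptions), each candidate is scored by its MI with the class, discounted by its MI with the features already selected:

import numpy as np
from collections import Counter

def entropy(values):
    # Shannon entropy estimated from value frequencies.
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences.
    return entropy(list(x)) + entropy(list(y)) - entropy(list(zip(x, y)))

def mifs(features, target, k, beta=0.5):
    # Greedily pick k features: MI with the class, discounted by
    # beta times the summed MI with the already-selected features.
    remaining, selected = set(range(len(features))), []
    while remaining and len(selected) < k:
        def score(i):
            redundancy = sum(mutual_information(features[i], features[s])
                             for s in selected)
            return mutual_information(features[i], target) - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected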
3.4 On Feature Selection through Clustering
This work introduces an algorithm for feature selection that clusters attributes using a special metric and then makes use of the dendrogram of the resulting cluster hierarchy to choose the most relevant attributes. The main interest of the technique resides in the improved understanding of the structure of the analyzed data and of the relative importance of the attributes for the selection process.
The performance, robustness, and usefulness
of classification algorithms are improved when
relatively few features are involved in the
classification. Thus, selecting relevant features for the
construction of classifiers has received a great deal of
attention. The central idea of this work is to introduce
an algorithm for feature selection that clusters attributes using a special metric and then uses hierarchical clustering for feature selection. Hierarchical algorithms generate clusters that are placed in a cluster tree, which is commonly known as a dendrogram. Clusterings are obtained by extracting those clusters that are situated at a given height in this tree. It is shown that good classifiers can be built by using a small number of attributes located at the centers of the clusters identified in the dendrogram. This type of data compression can be achieved with little or no penalty in terms of the accuracy of the classifier produced, and it highlights the relative importance of attributes.
Clusterings were extracted from the tree produced by the algorithm by cutting the tree at various heights, starting with the maximum height of the tree (corresponding to a single cluster) and working down to a height of 0 (which consists of single-attribute clusters). A 'representative' attribute was created for each cluster as the attribute that has the minimum total distance to the other members of the cluster, again using the Barthélemy-Montjardet distance. A similar study was undertaken for the zoo database, after eliminating the attribute animal, which uniquely determines the type of the animal. These results suggest that this method has accuracy comparable to the wrapper method and CFS. However, the tree of attributes helps to understand the relationships between attributes and their relative importance.
Attribute clustering helps to build classifiers in a semi-supervised manner, allowing analysts a certain degree of choice in the selection of the features that may be considered by classifiers, and illuminating relationships between attributes and their relative importance for classification. With the increased interest of data miners in bio-computing in general, and in microarray data in particular, classification problems that involve thousands of features and relatively few examples came to the fore. We intend to apply our techniques to this type of data.
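A small sketch of this dendrogram-based selection, assuming numeric data: here 1 - |Pearson correlation| stands in for the Barthélemy-Montjardet distance used in the paper, and scipy's hierarchical clustering produces the tree that is then cut into a requested number of clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def representative_attributes(X, n_clusters):
    # X: (n_samples, n_features). Returns one column index per cluster.
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)          # stand-in attribute distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")  # cut the tree

    reps = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        # Representative: minimum total distance to the other members.
        totals = dist[np.ix_(members, members)].sum(axis=1)
        reps.append(int(members[np.argmin(totals)]))
    return reps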
IV. IRRELEVANT FEATURES REMOVAL
Irrelevant features, along with redundant
features, severely affect the accuracy of the learning
machines. Thus, feature subset selection should be
able to identify and remove as much of the irrelevant
and redundant information as possible. Moreover,
“good feature subsets contain features highly
correlated with (predictive of) the class, yet
uncorrelated with (not predictive of) each other.”
Keeping these in mind, we develop a novel algorithm
which can efficiently and effectively deal with both
irrelevant and redundant features, and obtain a good
feature subset. We achieve this through a new feature
selection framework which is composed of the two
connected components of irrelevant feature removal
and redundant feature elimination. The former
obtains features relevant to the target concept by
eliminating irrelevant ones, and the latter removes
redundant features from relevant ones via choosing
representatives from different feature clusters, and
thus produces the final subset.
The irrelevant feature removal is
straightforward once the right relevance measure is
defined or selected, while the redundant feature elimination is a bit more sophisticated. In our proposed
FAST algorithm, it involves 1) the construction of the
minimum spanning tree from a weighted complete
graph; 2) the partitioning of the MST into a forest
with each tree representing a cluster; and 3) the
selection of representative features from the clusters.
4.1 Load Data and Classify
Load the data into the process. The data has to be preprocessed to remove missing values, noise and outliers. The given dataset must then be converted into the ARFF format, which is the standard format for the WEKA toolkit. From the ARFF file, only the attributes and the values are extracted and stored into the database. The last column of the dataset is taken as the class attribute; the distinct class labels are selected from it, and the entire dataset is classified with respect to these labels.
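As a minimal illustration of this loading-and-classify step (a sketch only: the file name "dataset.arff" is hypothetical, and scipy's ARFF reader stands in for the WEKA toolkit workflow), the last column is treated as the class attribute and the records are grouped by its distinct labels:

import pandas as pd
from scipy.io import arff

data, meta = arff.loadarff("dataset.arff")   # parse the ARFF file
df = pd.DataFrame(data)                      # attributes and values

class_col = df.columns[-1]                   # last column = class attribute
labels = df[class_col].unique()              # distinct class labels

# Classify (partition) the entire dataset with respect to the class labels.
by_class = {label: df[df[class_col] == label] for label in labels}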
4.2 Information Gain Computation
Relevant features have strong correlation with the target concept and so are always necessary for a best subset, while redundant features are not, because their values are completely correlated with each other. Thus, notions of feature redundancy and feature relevance are normally expressed in terms of feature correlation and feature-target concept correlation.
To find the relevance of each attribute to the class label, the information gain is computed in this module. This is also known as the mutual information measure. Mutual information measures how much the distribution of the feature values and target classes differs from statistical independence. It is a nonlinear estimation of the correlation between feature values, or between feature values and target classes. The symmetric uncertainty (SU) is derived from the mutual information by normalizing it to the entropies of feature values, or of feature values and target classes, and has been used to evaluate the goodness of features for classification.
The information gain is defined as follows:

Gain(X|Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)

To calculate the gain, we need the entropy and conditional entropy values, which are given by:

H(X) = −∑_{x∈X} p(x) log₂ p(x)

H(X|Y) = −∑_{y∈Y} p(y) ∑_{x∈X} p(x|y) log₂ p(x|y)

where p(x) is the probability density function and p(x|y) is the conditional probability density function.
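These formulas translate directly into code. Below is a minimal sketch for discrete-valued features, estimating the probabilities from value frequencies (the paper does not specify its estimator, so that choice is an assumption):

import numpy as np
from collections import Counter

def entropy(x):
    # H(X) = -sum_x p(x) log2 p(x), with p estimated from value counts.
    counts = np.array(list(Counter(x).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(x, y):
    # H(X|Y) = sum_y p(y) * H(X | Y = y).
    x, y = np.asarray(x), np.asarray(y)
    h = 0.0
    for value in np.unique(y):
        mask = (y == value)
        h += mask.mean() * entropy(x[mask])
    return h

def gain(x, y):
    # Gain(X|Y) = H(X) - H(X|Y).
    return entropy(x) - conditional_entropy(x, y)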
4.3 T-Relevance Calculation
The relevance between a feature Fi ∈ F and the target concept C is referred to as the T-Relevance of Fi and C, denoted SU(Fi, C). If SU(Fi, C) is greater than a predetermined threshold, we say that Fi is a strong T-Relevance feature. The symmetric uncertainty is defined as:

SU(X, Y) = 2 × Gain(X|Y) / (H(X) + H(Y))

After the relevance values are computed, the attributes whose relevance does not exceed the threshold are removed as irrelevant.
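Reusing the entropy() and gain() helpers from the sketch above, the symmetric uncertainty and the T-Relevance filter could look as follows; theta is the user-chosen threshold from the text:

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * Gain(X|Y) / (H(X) + H(Y)).
    denom = entropy(x) + entropy(y)
    return 2.0 * gain(x, y) / denom if denom > 0 else 0.0

def relevant_features(features, target, theta):
    # Keep feature indices whose T-Relevance SU(Fi, C) exceeds theta.
    return [i for i, f in enumerate(features)
            if symmetric_uncertainty(f, target) > theta]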
4.4 F-Correlation Calculation
The correlation between any pair of features Fi and Fj (Fi, Fj ∈ F ∧ i ≠ j) is called the F-Correlation of Fi and Fj, and denoted by SU(Fi, Fj). The symmetric uncertainty equation used for finding the relevance between an attribute and the class is applied again to find the similarity between two attributes with respect to each label.
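As one possible rendering of this step, the SU measure is applied pairwise to build the weighted complete graph over the relevant features (again reusing symmetric_uncertainty() from the previous sketch):

from itertools import combinations

def f_correlation_edges(features):
    # One edge (i, j, weight) per feature pair, weighted by SU(Fi, Fj).
    return [(i, j, symmetric_uncertainty(features[i], features[j]))
            for i, j in combinations(range(len(features)), 2)]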
4.5 MST Construction
With the F-Correlation values computed above, the minimum spanning tree is constructed. For that, we use Kruskal's algorithm, which forms the MST effectively.
Kruskal's algorithm is a greedy algorithm in
graph theory that finds a minimum spanning tree for a
connected weighted graph. This means it finds a
subset of the edges that forms a tree that includes
every vertex, where the total weight of all the edges
in the tree is minimized. If the graph is not connected,
then it finds a minimum spanning forest (a minimum
spanning tree for each connected component).
Description:
1. Create a forest F (a set of trees), where each vertex in the graph is a separate tree.
2. Create a set S containing all the edges in the graph.
3. While S is nonempty and F is not yet spanning:
   - remove an edge with minimum weight from S;
   - if that edge connects two different trees, add it to the forest, combining two trees into a single tree;
   - otherwise discard that edge.
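A compact sketch of these steps, using a union-find structure for the "connects two different trees" test; edges are (u, v, weight) triples such as those produced by the F-Correlation sketch above:

def kruskal(n_vertices, edges):
    # Returns the edges of a minimum spanning tree (or forest).
    parent = list(range(n_vertices))

    def find(u):
        # Find the root of u's tree, with path compression.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # minimum weight first
        ru, rv = find(u), find(v)
        if ru != rv:              # edge connects two different trees
            parent[ru] = rv       # combine the two trees
            mst.append((u, v, w))
        # otherwise discard the edge
    return mst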
At the termination of the algorithm, the
forest forms a minimum spanning forest of the graph.
If the graph is connected, the forest has a single
component and forms a minimum spanning tree. The
sample tree is as follows,
Fig 2. Correlations
ALGORITHM
inputs:  D(F1, F2, ..., Fm, C) - the given data set
         θ - the T-Relevance threshold.
output:  S - selected feature subset.
//==== Part 1: Irrelevant Feature Removal ====
1  for i = 1 to m do
2      T-Relevance = SU(Fi, C)
3      if T-Relevance > θ then
4          S = S ∪ {Fi};
//==== Part 2: Minimum Spanning Tree Construction ====
5  G = NULL; // G is a complete graph
6  for each pair of features {F'i, F'j} ⊂ S do
7      F-Correlation = SU(F'i, F'j)
8      add F'i and/or F'j to G with F-Correlation as the weight of the corresponding edge;
9  minSpanTree = KRUSKALS(G); // use Kruskal's algorithm to generate the minimum spanning tree
//==== Part 3: Tree Partition and Representative Feature Selection ====
10 Forest = minSpanTree
11 for each edge Eij ∈ Forest do
12     if SU(F'i, F'j) < SU(F'i, C) ∧ SU(F'i, F'j) < SU(F'j, C) then
13         Forest = Forest − Eij
14 S = ∅
15 for each tree Ti ∈ Forest do
16     FjR = argmax_{F'k ∈ Ti} SU(F'k, C)
17     S = S ∪ {FjR};
18 return S
In this tree, the vertices represent the
relevance value and the edges represent the F-
Correlation value. The complete graph G reflects the
correlations among all the target-relevant features.
Unfortunately, graph G has k vertices and k(k-1)/2
edges. For high-dimensional data, it is heavily dense
and the edges with different weights are strongly
interwoven. Moreover, the decomposition of
complete graph is NP-hard. Thus, for graph G, we build an MST, which connects all vertices such that the sum of the weights of the edges is the minimum, using the well-known Kruskal algorithm. The weight of edge (F'i, F'j) is the F-Correlation SU(F'i, F'j).
4.6 Cluster Formation
After building the MST, in the third step, we first remove from the MST the edges whose weights are smaller than both of the T-Relevances SU(F'i, C) and SU(F'j, C). After removing all the unnecessary edges, a forest is obtained. Each tree Tj in the forest represents a cluster, denoted as V(Tj), which is the vertex set of Tj as well. As illustrated above, the features in each cluster are redundant, so for each cluster V(Tj) we choose a representative feature FjR whose T-Relevance SU(FjR, C) is the greatest.
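Putting the pieces together, the whole FAST pipeline described in Sections 4.2-4.6 could be sketched as below, reusing relevant_features(), f_correlation_edges(), symmetric_uncertainty() and kruskal() from the earlier sketches (a sketch under those assumptions, not the authors' implementation):

def fast_select(features, target, theta):
    # Part 1: irrelevant feature removal by the T-Relevance threshold.
    relevant = relevant_features(features, target, theta)
    feats = [features[i] for i in relevant]

    # Part 2: MST over the SU-weighted complete graph of relevant features.
    mst = kruskal(len(feats), f_correlation_edges(feats))

    # Part 3: drop edges whose F-Correlation is smaller than both endpoint
    # T-Relevances; the surviving forest defines the clusters.
    kept = [(u, v, w) for u, v, w in mst
            if not (w < symmetric_uncertainty(feats[u], target)
                    and w < symmetric_uncertainty(feats[v], target))]

    # Collect the connected components of the surviving forest.
    parent = list(range(len(feats)))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    for u, v, _ in kept:
        parent[find(u)] = find(v)
    clusters = {}
    for i in range(len(feats)):
        clusters.setdefault(find(i), []).append(i)

    # Choose the feature with the greatest T-Relevance from each cluster,
    # mapping back to the original feature indices.
    return [relevant[max(members,
                         key=lambda i: symmetric_uncertainty(feats[i], target))]
            for members in clusters.values()]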
V. EXPERIMENTAL RESULTS
Fig 3. Dataset Loading
Fig 4. Dataset Conversion
Fig 5. Entropy & Gain Values
Fig 6. F-Correlation & Relevance
VI. CONCLUSION AND FUTURE WORK
This paper presents a FAST clustering-based feature subset selection algorithm for high dimensional data. The algorithm involves 1) removing irrelevant features, 2) constructing a minimum spanning tree from the relevant ones, and 3) partitioning the MST and selecting representative features. In the proposed algorithm, a cluster consists of features; each cluster is treated as a single feature, and thus dimensionality is drastically reduced. The algorithm was evaluated on text data in terms of the proportion of selected features, the run time, and the classification accuracy of a given classifier. For future work, we plan to explore different types of correlation measures and to study some formal properties of feature space. We also intend to classify high dimensional data.
REFERENCES
[1] Almuallim H. and Dietterich T.G., Algorithms for Identifying Relevant Features, In Proceedings of the 9th Canadian Conference on AI, pp 38-45, 1992.
[2] Bell D.A. and Wang H., A formalism for relevance and its application in feature subset selection, Machine Learning, 41(2), pp 175-195, 2000.
[3] Biesiada J. and Duch W., Feature selection for high-dimensional data: a Pearson redundancy based filter, Advances in Soft Computing, 45, pp 242-249, 2008.
[4] Dash M., Liu H. and Motoda H., Consistency based feature selection, In Proceedings of the Fourth Pacific Asia Conference on Knowledge Discovery and Data Mining, pp 98-109, 2000.
[5] Das S., Filters, wrappers and a boosting-based hybrid for feature selection, In Proceedings of the Eighteenth International Conference on Machine Learning, pp 74-81, 2001.
[6] Dash M. and Liu H., Consistency-based search in feature selection, Artificial Intelligence, 151(1-2), pp 155-176, 2003.
[7] Demsar J., Statistical comparison of classifiers over multiple data sets, J. Mach. Learn. Res., 7, pp 1-30, 2006.
[8] Fleuret F., Fast binary feature selection with conditional mutual information, Journal of Machine Learning Research, 5, pp 1531-1555, 2004.
[9] Forman G., An extensive empirical study of feature selection metrics for text classification, Journal of Machine Learning Research, 3, pp 1289-1305, 2003.
[10] Garcia S. and Herrera F., An extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all pairwise comparisons, J. Mach. Learn. Res., 9, pp 2677-2694, 2008.
[11] Guyon I. and Elisseeff A., An introduction to variable and feature selection, Journal of Machine Learning Research, 3, pp 1157-1182, 2003.
[12] Hall M.A., Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning, In Proceedings of the 17th International Conference on Machine Learning, pp 359-366, 2000.