Prediction plays a vital role in decision making. Correct prediction leads to right decisions that save lives, energy, effort, money and time. The right decision prevents physical and material losses, and prediction is practiced in many fields, including medicine, finance, environmental studies, engineering and emerging technologies. Prediction is carried out by a model called a classifier. The predictive accuracy of a classifier depends highly on the training dataset used to train it. Irrelevant and redundant features in the training dataset reduce the accuracy of the classifier, so they must be removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm, unsupervised learning with ranking based feature selection (FSULR). It removes redundant features by clustering and eliminates irrelevant features by statistical measures, so as to select the most significant features from the training dataset. The performance of the proposed algorithm is compared with that of seven other feature selection algorithms using three well known classifiers: naive Bayes (NB), instance based (IB1) and the tree based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for these classifiers.
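The paper's full FSULR procedure is not reproduced in this listing, but its two-stage idea, clustering to remove redundancy and statistical scoring to remove irrelevance, can be sketched roughly as follows. This is a hedged illustration on a public dataset, not the authors' algorithm; the correlation-based distance, the 0.3 cut threshold and the F-score relevance measure are all assumptions.

```python
# A minimal sketch of the cluster-then-rank idea behind FSULR-style feature
# selection, NOT the authors' exact algorithm: features are grouped by
# correlation (redundancy), and one representative per group is kept based
# on a statistical relevance score (irrelevance).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif

X, y = load_breast_cancer(return_X_y=True)

# 1. Cluster features: distance = 1 - |correlation|, so highly correlated
#    (redundant) features fall into the same cluster.
corr = np.abs(np.corrcoef(X, rowvar=False))
dist = 1.0 - corr
Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
labels = fcluster(Z, t=0.3, criterion="distance")   # 0.3 is illustrative

# 2. Within each cluster keep the feature with the best F-score against the
#    class label (relevance), discarding the rest as redundant.
scores, _ = f_classif(X, y)
selected = [max(np.where(labels == c)[0], key=lambda i: scores[i])
            for c in np.unique(labels)]
print(f"kept {len(selected)} of {X.shape[1]} features:", sorted(selected))
```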
We conducted a comparative analysis of different supervised dimension reduction techniques by integrating a set of different data splitting algorithms, and demonstrate how the relative efficacy of the learning algorithms depends on sample complexity. The issue of sample complexity is discussed in terms of its dependence on the data splitting algorithms. In line with expectations, every supervised learning classifier demonstrated different capability under different data splitting algorithms, and no way to calculate an overall ranking of techniques was directly available. We specifically focused on how classifier rankings depend on the data splitting algorithms, and devised a weighted average rank model, the Weighted Mean Rank Risk Adjusted Model (WMRRAM), for consensus ranking of the learning classifier algorithms.
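WMRRAM itself is only named above, so the following is merely a hedged sketch of the underlying notion of a weighted mean rank over data splitting schemes; the ranks, weights and classifier names are invented for illustration.

```python
# Hedged sketch of weighted-mean-rank aggregation (not the paper's WMRRAM
# specification, which is not reproduced in this listing).
import numpy as np

# rank[i, j] = rank of classifier i under data splitting scheme j (1 = best)
ranks = np.array([[1, 2, 1, 3],    # e.g. naive Bayes (illustrative)
                  [2, 1, 3, 1],    # e.g. J48 (illustrative)
                  [3, 3, 2, 2]])   # e.g. IB1 (illustrative)

# Hypothetical per-scheme weights (e.g. reflecting how representative each
# data split is); they sum to 1.
w = np.array([0.4, 0.3, 0.2, 0.1])

weighted_mean_rank = ranks @ w          # lower = better overall
order = np.argsort(weighted_mean_rank)  # consensus ranking of classifiers
print(weighted_mean_rank, "->", order)
```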
Iaetsd an enhanced feature selection for... (Iaetsd Iaetsd)
The document discusses feature selection techniques for machine learning applications. It proposes an Enhanced Fast Clustering-based Feature Selection (EFAST) algorithm. The EFAST algorithm works in two steps: 1) features are clustered using graph-theoretic clustering methods, and 2) the most relevant representative feature strongly correlated with the target categories is selected from each cluster to form the optimal feature subset. Features from different clusters are relatively independent, so EFAST has a high chance of selecting a set of useful and independent features. The algorithm was tested on real-world data and showed improved performance over other feature selection methods by reducing features while also improving classifier performance.
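The EFAST description above (graph-theoretic feature clustering, then one representative per cluster) resembles minimum-spanning-tree methods in the FAST family; the exact EFAST procedure may differ. A hedged sketch along those lines, with symmetric uncertainty as the correlation measure and an illustrative cut threshold:

```python
# Hedged MST-based sketch in the spirit of the clustering step described for
# EFAST: features are nodes, edge weights are 1 - symmetric uncertainty, the
# MST is cut at weak edges, and each cluster contributes the feature most
# correlated with the class.
import numpy as np
import networkx as nx
from sklearn.datasets import load_iris
from sklearn.metrics import mutual_info_score

def symmetric_uncertainty(a, b):
    """SU(a,b) = 2*I(a;b) / (H(a) + H(b)) for discretized variables."""
    mi = mutual_info_score(a, b)
    ha = mutual_info_score(a, a)   # H(a) = I(a;a)
    hb = mutual_info_score(b, b)
    return 2.0 * mi / (ha + hb) if ha + hb > 0 else 0.0

X, y = load_iris(return_X_y=True)
# Discretize each feature at its quartiles so SU is well defined.
Xd = np.column_stack(
    [np.digitize(X[:, j], np.quantile(X[:, j], [0.25, 0.5, 0.75]))
     for j in range(X.shape[1])])

n = Xd.shape[1]
G = nx.Graph()
for i in range(n):
    for j in range(i + 1, n):
        G.add_edge(i, j, weight=1.0 - symmetric_uncertainty(Xd[:, i], Xd[:, j]))

mst = nx.minimum_spanning_tree(G)
# Cut weak MST edges (threshold 0.7 is illustrative), yielding clusters.
mst.remove_edges_from([(u, v) for u, v, d in mst.edges(data=True)
                       if d["weight"] > 0.7])
clusters = list(nx.connected_components(mst))

# From each cluster keep the feature with the highest SU with the class.
selected = [max(c, key=lambda f: symmetric_uncertainty(Xd[:, f], y))
            for c in clusters]
print("clusters:", clusters, "-> selected features:", sorted(selected))
```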
Feature selection is one of the most fundamental steps in machine learning. It is closely related to
dimensionality reduction. A commonly used approach in feature selection is to rank the individual features according to some criterion and then search for an optimal feature subset based on an evaluation criterion that tests optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a
novel hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The process of feature ranking
follows a method of calculating the significance or priority of each symptom of LD as per its contribution in representing the knowledge contained in the dataset. Each symptom's significance or priority value reflects its relative importance in predicting LD among the various cases. Then, by eliminating
least significant features one by one and evaluating the feature subset at each stage of the process, an
optimal feature subset is generated. For comparative analysis and to establish the importance of rough set
theory in feature selection, the backward feature elimination algorithm is combined with two state-of-the-art filter based feature ranking techniques, viz. information gain and gain ratio. The experimental results
show that the proposed feature selection approach outperforms the other two in terms of data reduction.
Also, the proposed method eliminates all the redundant attributes efficiently from the LD dataset without
sacrificing the classification performance.
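As a rough, hedged illustration of the ranking-plus-backward-elimination scheme (with mutual information standing in for the paper's Rough Set significance measure, and naive Bayes as an assumed evaluation classifier):

```python
# Hedged sketch of rank-guided backward feature elimination: features are
# ranked once, then the least significant are dropped one by one as long as
# cross-validated accuracy does not fall below the full-feature baseline.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)
rank = np.argsort(mutual_info_classif(X, y, random_state=0))  # worst first

subset = list(range(X.shape[1]))
baseline = cross_val_score(GaussianNB(), X, y, cv=5).mean()
for f in rank:                       # backward elimination, least relevant first
    if len(subset) == 1:
        break
    trial = [i for i in subset if i != f]
    if cross_val_score(GaussianNB(), X[:, trial], y, cv=5).mean() >= baseline:
        subset = trial               # drop f: accuracy did not degrade
print(f"kept {len(subset)} of {X.shape[1]} features:", subset)
```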
Survey on semi-supervised classification methods and feature selection (eSAT Journals)
Abstract: Data mining, also called knowledge discovery, is the process of analyzing data from several perspectives and summarizing it into useful information. It has tremendous application in classification tasks such as pattern recognition, discovering disease types, medical image analysis, speech recognition, biometric identification, drug discovery, etc. This is a survey of several semi-supervised classification methods, in which both labeled and unlabeled data can be used for classification; this is less expensive than other classification methods. The techniques surveyed in this paper are the low density separation approach, transductive SVM, the semi-supervised logistic discriminant procedure, and the self-training nearest neighbour rule using cut edges. Along with classification methods, a review of various feature selection methods is also given. Feature selection is performed to reduce the dimension of large datasets; after the attributes are reduced, the data is given to the classifier, so the accuracy and performance of the classification system can be improved. The feature selection methods covered include consistency based feature selection, fuzzy entropy measure feature selection with a similarity classifier, signal to noise ratio, and positive approximation; each method has its own benefits. Index Terms: Semi-supervised classification, Transductive support vector machine, Feature selection, unlabeled samples
Neighborhood search methods with moth optimization algorithm as a wrapper met... (IJECEIAES)
Feature selection methods select a subset of features from the data so that only the useful information is mined from the samples, improving both the accuracy and the computational efficiency of the learning model. The Moth-flame Optimization (MFO) algorithm is a population-based approach that simulates the behavior of real moths in nature. One drawback of the MFO algorithm, as we investigate in this paper, is that the solutions move toward the best solution and can easily get stuck in local optima. We therefore propose an MFO algorithm combined with a neighborhood search method for feature selection problems, in order to keep the MFO algorithm from getting trapped in local optima and to avoid premature convergence; the neighborhood search method is applied after a predefined number of unimproved iterations (the number of tries that fail to improve the current solution). The proposed algorithm shows good performance when compared with the original MFO algorithm and with state-of-the-art approaches.
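The modification is easiest to see in code. Below is a hedged toy sketch of just the stagnation-triggered neighborhood search, not the full MFO update equations; the fitness function is a stand-in for classifier accuracy on a candidate feature subset, and the patience and flip counts are assumptions.

```python
# Hedged sketch: when the best feature subset has not improved for `patience`
# iterations, a neighborhood search (random bit flips around the incumbent)
# is applied to pull the search out of a local optimum. The toy fitness just
# measures agreement with a hidden "ideal" mask.
import numpy as np

rng = np.random.default_rng(42)
ideal = rng.random(20) < 0.4                 # hidden optimal feature subset

def fitness(mask):
    return np.sum(mask == ideal)             # stand-in for classifier accuracy

def neighborhood_search(best, tries=30, flips=2):
    """Flip a few bits of the incumbent and keep the first improvement found."""
    for _ in range(tries):
        cand = best.copy()
        cand[rng.choice(cand.size, flips, replace=False)] ^= True
        if fitness(cand) > fitness(best):
            return cand
    return best

best = rng.random(20) < 0.5
stall, patience = 0, 5
for it in range(200):                        # simplified main search loop
    cand = best.copy()                       # the MFO update would go here
    cand[rng.integers(cand.size)] ^= True
    if fitness(cand) > fitness(best):
        best, stall = cand, 0
    else:
        stall += 1
    if stall >= patience:                    # unimproved too long -> escape
        best, stall = neighborhood_search(best), 0
print("fitness:", fitness(best), "of", ideal.size)
```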
A Threshold fuzzy entropy based feature selection method applied in various b... (IJMER)
Large amounts of data are stored and manipulated using various database technologies, and processing all the attributes for a particular purpose is a difficult task. To avoid such difficulties, feature selection is applied. In this paper we collect eight benchmark datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based relevance measure algorithm with three selection strategies: the mean selection strategy, the half selection strategy, and a neural network for the threshold selection strategy. After the features are selected, they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1 and Ant-miner classification methodologies. The test results show that the neural network threshold selection strategy works well in selecting features, and that the Ant-miner methodology is best at delivering better accuracy with the selected features than processing the original dataset. The results of this experiment clearly show that Ant-miner is superior to the other classifiers; thus the proposed Ant-miner algorithm could be a suitable method for producing good results with fewer features than the original datasets.
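Two of the three threshold strategies named above are simple enough to sketch; in this hedged example a generic mutual-information relevance score replaces the paper's fuzzy entropy based measure.

```python
# Hedged sketch of the mean and half selection strategies over a relevance
# score (mutual information stands in for the fuzzy entropy based measure).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
relevance = mutual_info_classif(X, y, random_state=0)

# Mean selection strategy: keep every feature scoring above the mean score.
mean_sel = np.where(relevance > relevance.mean())[0]

# Half selection strategy: keep the top half of features by score.
half_sel = np.argsort(relevance)[::-1][: X.shape[1] // 2]

print("mean strategy kept:", len(mean_sel), "features")
print("half strategy kept:", len(half_sel), "features")
```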
Study on Relevance Feature Selection Methods (IRJET Journal)
This document summarizes research on feature selection methods. It discusses how feature selection is used to reduce dimensionality when working with large datasets that have thousands of variables. Several feature selection algorithms are examined, including ant colony optimization, quadratic programming, variable ranking using filter, wrapper and embedded methods, and fast correlation-based filtering with sequential forward selection. Feature selection can improve classification efficiency and understanding of data by identifying the most meaningful features.
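One of the surveyed building blocks, sequential forward selection, is available off the shelf; a minimal, hedged example with scikit-learn (the surveyed papers' own implementations may differ, and the estimator and subset size here are assumptions):

```python
# Sequential forward selection: greedily add the feature that most improves
# cross-validated accuracy until the target subset size is reached.
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
sfs = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors=3),
                                n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("selected feature indices:", sfs.get_support(indices=True))
```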
IRJET - Comparison between Supervised Learning and Unsupervised Learning (IRJET Journal)
This document compares supervised and unsupervised learning models in artificial neural networks. It describes how supervised learning uses labeled training data to learn relationships between inputs and outputs, while unsupervised learning identifies hidden patterns in unlabeled data. The document outlines techniques for each like classification and regression for supervised learning, and clustering and density estimation for unsupervised learning. It then presents experiments applying multilayer perceptron (supervised) and k-means clustering (unsupervised) to datasets, finding unsupervised learning had higher accuracy. The document concludes unsupervised learning is favored for this task but supervised learning remains useful for problems with labeled data.
This document discusses using machine learning clustering algorithms to analyze stock market data. It compares the K-means, COBWEB, DBSCAN, EM and OPTICS clustering algorithms in the WEKA tool on a stock market dataset containing 420 instances and 6 attributes. The K-means algorithm had the best performance with the lowest error and fastest runtime. It clustered the data into 4 groups in 0.16 seconds. The COBWEB algorithm clustered the data into 107 groups in 27.88 seconds. The DBSCAN algorithm found 21 clusters in 3.97 seconds. The paper concludes that K-means is best suited for stock market data mining applications due to its simplicity and speed compared to other algorithms.
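A hedged scikit-learn equivalent of the K-means part of that comparison (the original experiment ran in WEKA, and the stock dataset is not available here, so random data of the same 420 x 6 shape stands in):

```python
# Time a K-means run into 4 clusters and report cluster sizes, mirroring the
# kind of measurement reported above (placeholder data, not the stock dataset).
import time
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(420, 6))              # stand-in for 420 instances x 6 attributes

t0 = time.perf_counter()
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
elapsed = time.perf_counter() - t0

print(f"4 clusters in {elapsed:.3f}s, sizes:", np.bincount(km.labels_))
```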
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP... (cscpconf)
In search based test data generation, the problem of test data generation is reduced to that of
function minimization or maximization. Traditionally, for branch testing, the problem of test data
generation has been formulated as a minimization problem. In this paper we define an alternate
maximization formulation and experimentally compare it with the minimization formulation. We
use a genetic algorithm as the search technique and in addition to the usual genetic algorithm
operators we also employ the path prefix strategy as a branch ordering strategy and memory and elitism. Results indicate that there is no significant difference in the performance or the coverage obtained through the two approaches and either could be used in test data generation when coupled with the path prefix strategy, memory and elitism.
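A hedged sketch of the two fitness formulations being compared, for a single numeric branch predicate `a == b` (the constant `k` in the maximization form is an assumption):

```python
# The classic minimization form drives the branch distance |a - b| toward 0;
# an equivalent maximization form rewards candidates as that distance shrinks.
def branch_distance(a: float, b: float) -> float:
    """Distance to satisfying the branch predicate a == b (0 means taken)."""
    return abs(a - b)

def fitness_min(a: float, b: float) -> float:
    return branch_distance(a, b)               # minimize toward 0

def fitness_max(a: float, b: float, k: float = 1.0) -> float:
    return 1.0 / (k + branch_distance(a, b))   # maximize toward 1/k

# A GA's selection step simply sorts candidates by one of the two scores;
# the paper reports no significant difference between the formulations.
candidates = [(3.0, 10.0), (7.5, 8.0), (9.0, 9.0)]
print(min(candidates, key=lambda c: fitness_min(*c)))
print(max(candidates, key=lambda c: fitness_max(*c)))
```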
Classification of Machine Learning Algorithms (AM Publications)
The goal of machine learning is to devise learning algorithms that learn automatically, without any human intervention or assistance; the emphasis is on automatic methods. Supervised learning, unsupervised learning and reinforcement learning are discussed in this paper. Machine learning is the core area of Artificial Intelligence. Although a subarea of AI, machine learning also intersects broadly with other fields, especially statistics, but also mathematics, physics, theoretical computer science and more.
IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi... (IRJET Journal)
This document describes a comparative analysis of GUI-based machine learning approaches for predicting Parkinson's disease. It analyzes various machine learning algorithms including logistic regression, decision trees, support vector machines, random forests, k-nearest neighbors, and naive Bayes. The document discusses data preprocessing techniques like variable identification, data validation, cleaning and preparing. It also covers data visualization and evaluating model performance using accuracy calculations. The goal is to compare the performance of these machine learning algorithms and identify the approach that predicts Parkinson's disease with the highest accuracy based on a given hospital dataset.
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase... (ijaia)
Feature selection and classification are essential when dealing with large data sets that comprise a large number of input attributes. Many search methods and classifiers have been used to find the optimal number of attributes. The aim of this paper is to find the optimal set of attributes and improve classification accuracy by adopting an ensemble rule classifiers method. The research process involves two phases: finding the optimal set of attributes, and an ensemble classifiers method for the classification task. Results are reported in terms of percentage accuracy, number of selected attributes and number of rules generated. Six datasets were used for the experiment. The final output is an optimal set of attributes with an ensemble rule classifiers method. The experimental results, conducted on public real datasets, demonstrate that the ensemble rule classifiers method consistently improves classification accuracy on the selected datasets. Significant improvement in accuracy and an optimal set of selected attributes are achieved by adopting the ensemble rule classifiers method.
This document summarizes a research paper that evaluated the effect of feature reduction using principal component analysis (PCA) on sentiment analysis of online product reviews. The researchers developed two models - Model I used unigram features directly, while Model II reduced the features to the top 57 principal components. Both support vector machines and naive Bayes classifiers showed improved accuracy when trained on the reduced feature set of Model II compared to the full feature set of Model I. Receiver operating characteristic curves also indicated better classification performance from both classifiers when using the reduced features. The results provide promising evidence that PCA can be an effective feature reduction method for sentiment analysis tasks.
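A hedged reconstruction of that pipeline on synthetic data (the review unigrams are not available; 57 components is the paper's figure, reused here purely for illustration):

```python
# Compare classifier accuracy on the full feature set versus a PCA-reduced
# set, mirroring the Model I / Model II comparison described above.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=300, n_informative=40,
                           random_state=0)   # stand-in for unigram features

for name, clf in [("naive Bayes", GaussianNB()),
                  ("linear SVM", LinearSVC(dual=False))]:
    full = cross_val_score(clf, X, y, cv=5).mean()
    reduced = cross_val_score(make_pipeline(PCA(n_components=57), clf),
                              X, y, cv=5).mean()
    print(f"{name}: full={full:.3f}  PCA-57={reduced:.3f}")
```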
IRJET - Prediction of Crime Rate Analysis using Supervised Classification Mach... (IRJET Journal)
This document presents a study that uses machine learning techniques to predict crime rates. Specifically, it aims to analyze crime data using supervised machine learning classification algorithms like decision trees, support vector machines, logistic regression, k-nearest neighbors, and random forests. The document outlines collecting and preprocessing crime data, selecting relevant features, training models on a portion of the data and testing them on the remaining data. It finds that random forest achieved the best prediction accuracy compared to other algorithms tested. The goal is to help law enforcement agencies better predict and reduce crime rates by analyzing historical crime data patterns.
Clustering Based Attribute Subset Selection Using FAST Algorithm (IJCI Journal)
In machine learning and data mining, attribute selection is the practice of selecting a subset of the most consequential attributes for use in model construction. The premise of using an attribute selection method is that the data encloses many redundant or extraneous attributes: redundant attributes are those which supply no supplemental information beyond the presently selected attributes, and extraneous attributes offer no valuable information in any context.
For years, the Machine Learning community has focused on developing efficient algorithms that can produce very accurate classifiers. However, it is often much easier to find several good classifiers based on dataset combination than a single classifier applied to different datasets. The advantages of using classifier dataset combinations instead of a single one are twofold: it helps lower computational complexity by using simpler models, and it can improve classification accuracy and performance. Most data mining applications are based on pattern matching algorithms, so improving the performance of the classification has a positive impact on the quality of the overall data mining task. Since combination strategies have proved very useful in improving performance, these techniques have become very important in applications such as cancer detection, speech technology and natural language processing. The aim of this paper is to propose a proprietary metric, the Normalized Geometric Index (NGI), based on the latent properties of datasets, for improving the accuracy of data mining tasks.
Applying supervised and unsupervised learning approaches for movie recommend... (IAEME Publication)
This paper compares supervised and unsupervised machine learning approaches for developing a movie recommender system. It applies classification and clustering techniques to movie rating data to classify movies as viewable or not and cluster users based on attributes. Experimental results show that classification accuracy is highest when 80% of data is used for training. Clustering identifies groups of similar movies and users. The authors conclude recommender systems built with machine learning can provide intelligent recommendations that approach or surpass human judgment.
Presented is an approach for supporting outcome grading for the activities of operators of complex technical systems. It is based on comparisons of current exercises with activity database patterns, in a wavelet representation metric associated with the observed parameters, as well as on probabilistic assessments of skill class recognition using sample distribution functions of exercise distances to cluster centers in a scaling space, and on Bayesian likelihood estimations aided by a probabilistic profile of staying within activity parameter ranges. These techniques have demonstrated the capability of recognizing sets of abnormal exercises in the scaling spaces with the wavelet coefficient metric, and of detecting the parameters characterizing operator mistakes so as to reveal the causes of abnormality. The techniques presented overcome limitations of existing methods and provide advantages over manual data analysis, since they greatly reduce the combinatorial enumeration of the options considered.
Association rule discovery for student performance prediction using metaheuri... (csandit)
With the increased use of data mining techniques for improving the operation of educational systems, Educational Data Mining has been introduced as a new and fast growing research area. Educational Data Mining aims to analyze data in educational environments in order to solve educational research problems. In this paper a new associative classification technique is proposed to predict students' final performance. Unlike several machine learning approaches such as ANNs and SVMs, associative classifiers maintain interpretability along with high accuracy. In this research work, we employ Honeybee Colony Optimization and Particle Swarm Optimization to extract association rules for student performance prediction as a multi-objective classification problem. Results indicate that the proposed swarm based algorithm outperforms well-known classification techniques on the student performance prediction classification problem.
Network Based Intrusion Detection System using Filter Based Feature Selection... (IRJET Journal)
This document proposes a mutual information-based feature selection algorithm to select optimal features for network intrusion detection classification. The algorithm aims to handle dependent data features better than previous methods. It evaluates the effectiveness of the algorithm on network intrusion detection cases. Most previous methods suffer from low detection rates and high false alarm rates. The proposed approach uses feature selection, filtering, clustering, and clustering ensemble techniques in a hybrid data mining method to achieve high accuracy for intrusion detection systems.
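The proposed hybrid algorithm is not spelled out here, so the sketch below shows only the plain mutual-information filter it builds on, on synthetic data shaped loosely like KDD-style intrusion records (the 41-feature shape is an assumption for illustration):

```python
# Hedged sketch of mutual-information feature filtering, the family of
# methods the proposed intrusion-detection algorithm belongs to.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=41, n_informative=10,
                           random_state=0)   # stand-in for intrusion records
selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
print("top features by mutual information:",
      np.sort(selector.get_support(indices=True)))
```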
ON FEATURE SELECTION ALGORITHMS AND FEATURE SELECTION STABILITY MEASURES: A C...ijcsit
Data mining is indispensable for business organizations for extracting useful information from the huge volume of stored data which can be used in managerial decision making to survive in the competition. Due to the day-to-day advancements in information and communication technology, these data collected from ecommerce and e-governance are mostly high dimensional. Data mining prefers small datasets than high dimensional datasets. Feature selection is an important dimensionality reduction technique. The subsets selected in subsequent iterations by feature selection should be same or similar even in case of small perturbations of the dataset and is called as selection stability. It is recently becomes important topic of research community. The selection stability has been measured by various measures. This paper analyses the selection of the suitable search method and stability measure for the feature selection algorithms and also the influence of the characteristics of the dataset as the choice of the best approach is highly problem dependent.
Data mining is indispensable for business organizations for extracting useful information from the huge
volume of stored data which can be used in managerial decision making to survive in the competition. Due
to the day-to-day advancements in information and communication technology, these data collected from e-
commerce and e-governance are mostly high dimensional. Data mining prefers small datasets than high
dimensional datasets. Feature selection is an important dimensionality reduction technique. The subsets
selected in subsequent iterations by feature selection should be same or similar even in case of small
perturbations of the dataset and is called as selection stability. It is recently becomes important topic of
research community. The selection stability has been measured by various measures. This paper analyses
the selection of the suitable search method and stability measure for the feature selection algorithms and
also the influence of the characteristics of the dataset as the choice of the best approach is highly problem
dependent.
A Review on Feature Selection Methods For Classification TasksEditor IJCATR
In recent years, application of feature selection methods in medical datasets has greatly increased. The challenging task in
feature selection is how to obtain an optimal subset of relevant and non redundant features which will give an optimal solution without
increasing the complexity of the modeling task. Thus, there is a need to make practitioners aware of feature selection methods that have
been successfully applied in medical data sets and highlight future trends in this area. The findings indicate that most existing feature
selection methods depend on univariate ranking that does not take into account interactions between variables, overlook stability of the
selection algorithms and the methods that produce good accuracy employ more number of features. However, developing a universal
method that achieves the best classification accuracy with fewer features is still an open research area.
An experimental study on hypothyroid using rotation forestIJDKP
This paper majorly focuses on hypothyroid medical diseases caused by underactive thyroid glands. The
dataset used for the study on hypothyroid is taken from UCI repository. Classification of this thyroid
disease is a considerable task. An experimental study is carried out using rotation forest using features
selection methods to achieve better accuracy. An important step to gain good accuracy is a pre- processing
step, thus here two feature selection techniques are used. A filter method, Correlation features subset
selection and wrappers method has helped in removing irrelevant as well as useless features from the data
set. Fourteen different machine learning algorithms were tested on hypothyroid data set using rotation
forest which successfully turned out giving positively improved results
Iaetsd an efficient and large data base using subset selection algorithmIaetsd Iaetsd
The document presents a new feature selection algorithm called FAST (Feature Cluster-based Subset Selection) that aims to efficiently reduce dimensionality by removing irrelevant and redundant features. The FAST algorithm works in two steps: (1) it clusters features using graph theoretic methods, and (2) it selects the most representative feature from each cluster. This clustering-based approach has a high probability of selecting useful and independent features. The algorithm is evaluated on high dimensional datasets and shown to improve learning accuracy while reducing dimensionality compared to other feature selection methods.
Prognostication of the placement of students applying machine learning algori...BIJIAM Journal
Placement is the process of connecting the selected candidate with the employer. Every student might have adream of having a job offer when he or she is about to complete her course. All educational institutions aim athaving their students well placed in good organizations. The reputation of any institution depends on the placementof its students. Hence, many institutions try hard to have a good placement cell. Classification using machinelearning may be utilized to retrieve data from the student-databases. A prediction model that can foretell theeligibility of the students based on their academic and extracurricular achievements is proposed. Related data wascollected from many institutions for which the placement-prediction is made. This paradigm is being weighed upwith the existing algorithms, and findings have been made regarding the accuracy of predictions. It was found thatthe proposed algorithm performed significantly better and yielded good results.
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
In this paper we have focused on an efficient feature selection method in classification of audio files.
The main objective is feature selection and extraction. We have selected a set of features for further
analysis, which represents the elements in feature vector. By extraction method we can compute a
numerical representation that can be used to characterize the audio using the existing toolbox. In this
study Gain Ratio (GR) is used as a feature selection measure. GR is used to select splitting attribute
which will separate the tuples into different classes. The pulse clarity is considered as a subjective
measure and it is used to calculate the gain of features of audio files. The splitting criterion is
employed in the application to identify the class or the music genre of a specific audio file from
testing database. Experimental results indicate that by using GR the application can produce a
satisfactory result for music genre classification. After dimensionality reduction best three features
have been selected out of various features of audio file and in this technique we will get more than
90% successful classification result.
In this paper we have focused on an efficient feature selection method in classification of audio files.
The main objective is feature selection and extraction. We have selected a set of features for further
analysis, which represents the elements in feature vector. By extraction method we can compute a
numerical representation that can be used to characterize the audio using the existing toolbox. In this
study Gain Ratio (GR) is used as a feature selection measure. GR is used to select splitting attribute
which will separate the tuples into different classes. The pulse clarity is considered as a subjective
measure and it is used to calculate the gain of features of audio files. The splitting criterion is
employed in the application to identify the class or the music genre of a specific audio file from
testing database. Experimental results indicate that by using GR the application can produce a
satisfactory result for music genre classification. After dimensionality reduction best three features
have been selected out of various features of audio file and in this technique we will get more than
90% successful classification result.
Classification problems specified in high dimensional data with smallnumber of observation are generally becoming common in specific microarray data. In the time of last two periods of years, manyefficient classification standard models and also Feature Selection (FS) algorithm which isalso referred as FS technique have basically been proposed for higher prediction accuracies. Although, the outcome of FS algorithm related to predicting accuracy is going to be unstable over the variations in considered trainingset, in high dimensional data. In this paperwe present a latest evaluation measure Q-statistic that includes the stability of the selected feature subset in inclusion to prediction accuracy. Then we are going to propose the standard Booster of a FS algorithm that boosts the basic value of the preferred Q-statistic of the algorithm applied. Therefore study on synthetic data and 14 microarray data sets shows that Booster boosts not only the value of Q-statistics but also the prediction accuracy of the algorithm applied.
The document describes a proposed fast clustering-based feature subset selection (FAST) algorithm for high-dimensional data. The FAST algorithm works in two steps: 1) clustering features using minimum spanning tree methods, and 2) selecting the most representative feature from each cluster. This identifies useful and independent features efficiently. Experimental results on 35 real-world datasets demonstrate that FAST produces smaller feature subsets and improves classifier performance compared to other feature selection algorithms.
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09666155510, 09849539085 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
Unsupervised Feature Selection Based on the Distribution of Features Attribut...Waqas Tariq
Since dealing with high dimensional data is computationally complex and sometimes even intractable, recently several feature reductions methods have been developed to reduce the dimensionality of the data in order to simplify the calculation analysis in various applications such as text categorization, signal processing, image retrieval, gene expressions and etc. Among feature reduction techniques, feature selection is one the most popular methods due to the preservation of the original features. However, most of the current feature selection methods do not have a good performance when fed on imbalanced data sets which are pervasive in real world applications. In this paper, we propose a new unsupervised feature selection method attributed to imbalanced data sets, which will remove redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on the several imbalanced data sets, derived from UCI repository database, illustrate the effectiveness of our proposed methods in comparison with the other compared methods in terms of both accuracy and the number of selected features.
This document discusses online feature selection (OFS) for data mining applications. It addresses two tasks of OFS: 1) learning with full input, where the learner can access all features to select a subset, and 2) learning with partial input, where only a limited number of features can be accessed for each instance. Novel algorithms are presented for each task, and their performance is analyzed theoretically. Experiments on real-world datasets demonstrate the efficacy of the proposed OFS techniques for applications in computer vision, bioinformatics, and other domains involving high-dimensional sequential data.
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A fast clustering based feature subse...IEEEGLOBALSOFTTECHNOLOGIES
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
Artificial intelligence based pattern recognition is
one of the most important tools in process control to identify
process problems. The objective of this study was to
evaluate the relative performance of a feature-based
Recognizer compared with the raw data-based recognizer.
The study focused on recognition of seven commonly
researched patterns plotted on the quality chart. The
artificial intelligence based pattern recognizer trained using
the three selected statistical features resulted in significantly
better performance compared with the raw data-based
recognizer.
Feature selection is a problem closely related to dimensionality reduction. A commonly used
approach in feature selection is ranking the individual features according to some criteria and
then search for an optimal feature subset based on an evaluation criterion to test the optimality.
The objective of this work is to predict more accurately the presence of Learning Disability
(LD) in school-aged children with reduced number of symptoms. For this purpose, a novel
hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The approach follows
a ranking of the symptoms of LD according to their importance in the data domain. Each
symptoms significance or priority values reflect its relative importance to predict LD among the
various cases. Then by eliminating least significant features one by one and evaluating the
feature subset at each stage of the process, an optimal feature subset is generated. The
experimental results shows the success of the proposed method in removing redundant
attributes efficiently from the LD dataset without sacrificing the classification performance.
Feature Selection : A Novel Approach for the Prediction of Learning Disabilit...csandit
Feature selection is a problem closely related to dimensionality reduction. A commonly used
approach in feature selection is ranking the individual features according to some criteria and
then search for an optimal feature subset based on an evaluation criterion to test the optimality.
The objective of this work is to predict more accurately the presence of Learning Disability
(LD) in school-aged children with reduced number of symptoms. For this purpose, a novel
hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The approach follows
a ranking of the symptoms of LD according to their importance in the data domain. Each
symptoms significance or priority values reflect its relative importance to predict LD among the
various cases. Then by eliminating least significant features one by one and evaluating the
feature subset at each stage of the process, an optimal feature subset is generated. The
experimental results shows the success of the proposed method in removing redundant
attributes efficiently from the LD dataset without sacrificing the classification performance.
This document is a research proposal on attribute selection and representation for software defect prediction. The proposal discusses limitations in existing attribute selection methods and the importance of pre-processing data. It aims to propose a new attribute selection method that improves accuracy by addressing shortcomings, and to study appropriate classifiers. The methodology involves a literature review on pre-processing, attribute selection and classification methods. It will then propose and implement a new attribute selection process, compare it using different classifiers and pre-processing, and evaluate it against existing techniques in a technical report.
Similar to An unsupervised feature selection algorithm with feature ranking for maximizing performance of the classifiers (20)
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Literature Review Basics and Understanding Reference Management.pptx
An unsupervised feature selection algorithm with feature ranking for maximizing performance of the classifiers
International Journal of Automation and Computing 12(5), October 2015, 511–517
DOI: 10.1007/s11633-014-0859-5
An Unsupervised Feature Selection Algorithm with Feature Ranking for Maximizing Performance of the Classifiers

Danasingh Asir Antony Gnana Singh (1), Subramanian Appavu Alias Balamurugan (2), Epiphany Jebamalar Leavline (3)

(1) Department of Computer Science and Engineering, Bharathidasan Institute of Technology, Anna University, Tiruchirappalli 620024, India
(2) Department of Information Technology, K.L.N College of Information Technology, Sivagangai 630611, India
(3) Department of Electronics and Communication Engineering, Bharathidasan Institute of Technology, Anna University, Tiruchirappalli 620024, India
Abstract: Prediction plays a vital role in decision making. Correct prediction leads to right decision making to save the life, energy,
efforts, money and time. The right decision prevents physical and material losses and it is practiced in all the fields including medical,
finance, environmental studies, engineering and emerging technologies. Prediction is carried out by a model called classifier. The
predictive accuracy of the classifier highly depends on the training datasets utilized for training the classifier. The irrelevant and
redundant features of the training dataset reduce the accuracy of the classifier. Hence, the irrelevant and redundant features must be
removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm
namely unsupervised learning with ranking based feature selection (FSULR). It removes redundant features by clustering and eliminates
irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of this
proposed algorithm is compared with the other seven feature selection algorithms by well known classifiers namely naive Bayes (NB),
instance based (IB1) and tree based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for
classifiers.
Keywords: Feature selection algorithm, classification, clustering, prediction, feature ranking, predictive model.
1 Introduction
Prediction plays a vital role in decision making. This is done by the predictive model that is known as classifier or supervised learner[1]. This predictive model is employed to predict unseen and unknown data in various applications[2]. Improving the predictive accuracy is a major challenge among researchers. The accuracy of the classifier highly relies on the training dataset which is used to train the predictive model. The irrelevant and redundant features of the training dataset reduce the predictive accuracy[3]. Hence, irrelevant and redundant features must be eliminated from the training dataset to improve the accuracy of the predictive model. Improving the performance of the predictive model includes improving the predictive accuracy, reducing the time taken to build the predictive model and reducing the number of features present in the training dataset[3]. Selecting significant features from the training dataset to improve the performance of the supervised or unsupervised learner is known as feature selection. Feature selection methods are classified as wrapper, filter, embedded and hybrid methods[4].
The filter method selects the important features and removes the redundant and irrelevant features from the given dataset using a mathematical statistical measure. It performs well regardless of the classifier chosen; therefore, it can work with any type of classification algorithm[5, 6].
The embedded method uses the training phase of the supervised learning algorithm to select the important features from the training dataset. Therefore, its performance depends on the classifier used[7].

The wrapper method uses the classifier itself to determine the important features available in the dataset[8]. The combination of the filter and wrapper methods is known as the hybrid method[4, 9].
Many recently proposed feature selection algorithms concentrate only on removing the irrelevant features; they do not deal with removing the redundant features from the training dataset. These redundant features reduce the accuracy of the classifiers. This paper proposes a filter based feature selection algorithm named unsupervised learning with ranking based feature selection (FSULR). This algorithm selects the most significant features from the dataset and removes the redundant and irrelevant features. The unsupervised learner learns the training dataset using the expectation maximization function and groups the features into various clusters; these clustered features are then ranked based on the χ² measure within each cluster. The significant features from each cluster are chosen based on a threshold function.
The remainder of the paper is organized as follows. The relevant literature is reviewed in Section 2. Section 3 discusses the proposed feature selection algorithm. Section 4 explores and discusses the experimental results. The conclusion is drawn in Section 5 with future enhancements of this work.
2 Related work
This section discusses various types of feature selection, clustering and classification algorithms as part of the related work of this proposed algorithm.
2.1 Feature selection algorithm
The feature selection algorithm selects the most significant features from the dataset using ranking based, subset based and unsupervised techniques. In the ranking based technique, the individual features "fi" are ranked by applying one of the mathematical measures, such as information gain or gain ratio, on the training dataset "TD". The ranked features are selected as significant features for the learning algorithm by a threshold value "TV" calculated by the threshold function. In the subset based technique, the features of the training dataset are separated into the maximum number of possible feature subsets "S" and each subset is evaluated by an evaluation criterion to identify the significance of the features present in the subset. The subset containing the most significant features is considered as the selected candidate feature subset. In the unsupervised technique, cluster analysis is carried out to identify the significant features from the training dataset[10−12].
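For illustration, the ranking based technique can be sketched as follows in Python; the mutual information score from scikit-learn stands in for an information gain style measure, and the mean-score threshold "TV" is a hypothetical choice rather than the paper's threshold function.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
scores = mutual_info_classif(X, y, random_state=0)  # rank each feature fi
threshold = scores.mean()                           # hypothetical threshold value "TV"
selected = np.where(scores >= threshold)[0]
print("scores:", np.round(scores, 3), "selected:", selected)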
2.1.1 Feature selection based on correlation (FSCor)

In this feature subset selection, the entire feature set F = {f1, f2, · · · , fx} of a training dataset "TD" is subdivided into feature subsets "FSi". Then, two types of correlation measures are calculated on each feature subset "FSi". One is the feature−feature correlation, that is, the correlation measure among the features present in a feature subset "FSi"; the other is the feature−class correlation, that is, the correlation measure between the individual feature and the class value of the training dataset. These two correlation measures are computed for all the individual feature subsets of the training dataset. The significant feature subset is identified based on the comparison between the feature−feature correlation and the feature−class correlation. If the feature−class correlation value is higher than the feature−feature correlation value, the corresponding feature subset is selected as a significant feature subset of the training dataset[13].
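A minimal sketch of the FSCor comparison, assuming Pearson correlation as the correlation measure and a single hand-picked candidate subset (the method itself enumerates many subsets):

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
subset = [0, 2, 3]                       # hypothetical candidate feature subset FSi

Xs = X[:, subset]
# feature-class correlation: |corr(fi, class)| averaged over the subset
f_class = np.mean([abs(np.corrcoef(Xs[:, i], y)[0, 1]) for i in range(Xs.shape[1])])
# feature-feature correlation: mean |corr| over distinct feature pairs
pairs = [(i, j) for i in range(Xs.shape[1]) for j in range(i + 1, Xs.shape[1])]
f_feat = np.mean([abs(np.corrcoef(Xs[:, i], Xs[:, j])[0, 1]) for i, j in pairs])

print("select subset" if f_class > f_feat else "reject subset", f_class, f_feat)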
2.1.2 Feature selection based on consistency measure (FSCon)

In this feature subset selection, the entire feature set "F" of the training dataset "TD" is subdivided into the maximum possible combinations of feature subsets "FSi", and the consistency measure is calculated for each subset to identify the significant feature subset. The consistency measure ensures that the same tuple "Ti" is not present in a different class "Ci"[5, 14, 15].
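The consistency measure can be sketched as the common inconsistency rate: tuples that agree on the subset "FSi" but disagree on the class are counted as inconsistent. A small illustration, assuming pandas:

import pandas as pd

df = pd.DataFrame({"f1": [0, 0, 1, 1], "f2": [1, 1, 0, 0],
                   "cls": ["a", "b", "a", "a"]})
subset = ["f1", "f2"]                      # hypothetical feature subset FSi

# for each group of identical tuples, count instances beyond the majority class
def inconsistency_rate(df, subset):
    grouped = df.groupby(subset)["cls"]
    inconsistent = sum(len(g) - g.value_counts().iloc[0] for _, g in grouped)
    return inconsistent / len(df)

print(inconsistency_rate(df, subset))      # 0.25: one tuple conflicts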
2.1.3 Feature selection based on χ² (FSChi)

This is a ranking based feature selection technique. The χ² statistical measure is applied on the training dataset "TD" to identify the significance level of each feature present in the training dataset. The χ² value "CV" is computed as the sum, over the possible instance value combinations of the features fi, fj, of the ratio of the squared difference between the observed (oij) and expected (eij) frequencies to the expected frequencies (eij)[16].
2.1.4 Feature selection based on information gain (FSInfo)

In this feature selection technique, the information gain measure is applied on the training dataset to identify the significant features based on the information gain value of the individual features[10] in terms of entropy. The entropy value of each feature of the training dataset "TD" is calculated and the features are ranked based on the information gain value[17].
2.1.5 Feature selection based on gain ratio (FSGai-ra)

In this feature selection technique, the information gain ratio GR(f) is calculated for each feature of the training dataset "TD" to identify the significant features based on the information present in the features of the "TD"[17].
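Both FSInfo and FSGai-ra reduce to entropy computations. A minimal sketch from the standard definitions of entropy, information gain and gain ratio (base-2 entropy; the example feature and labels are illustrative):

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature, labels):
    h = entropy(labels)
    for v in np.unique(feature):            # subtract expected entropy after the split
        mask = feature == v
        h -= mask.mean() * entropy(labels[mask])
    return h

def gain_ratio(feature, labels):
    split_info = entropy(feature)           # penalizes many-valued features
    return info_gain(feature, labels) / split_info if split_info else 0.0

f = np.array([0, 0, 1, 1, 2, 2]); y = np.array([0, 0, 1, 1, 1, 0])
print(info_gain(f, y), gain_ratio(f, y))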
2.1.6 ReliefF

This feature selection technique selects the significant features from the training dataset "TD" based on a weighted probabilistic function w(f) with the nearest neighbor principle. If the nearest neighbors of a tuple "T" belong to the same class, they are termed "nearest hits"; if they belong to a different class, they are termed "nearest misses". The probabilistic weight function value w(f) is calculated based on the "nearest hit" and "nearest miss" values[18−20].
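A minimal sketch of the basic Relief weight update with a single nearest hit and nearest miss per tuple; ReliefF proper averages over k neighbors and handles multi-class data, so this is an approximation:

import numpy as np

def relief(X, y):
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf
        hit = np.argmin(np.where(y == y[i], dist, np.inf))    # nearest hit
        miss = np.argmin(np.where(y != y[i], dist, np.inf))   # nearest miss
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

X = np.array([[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]])
y = np.array([0, 1, 0, 1])
print(relief(X, y))   # the second feature separates the classes, so it scores higher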
2.1.7 Feature selection based on symmetric uncertainty (FSUnc)

This technique uses the correlation measure to select the significant features from the training dataset "TD". In addition, the symmetric uncertainty "SU" is calculated using the entropy measure to identify the similarity between two features fi and fj[21, 22].
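A minimal sketch of symmetric uncertainty between two discrete features, assuming the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)):

import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(v):
    _, counts = np.unique(v, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))            # natural log, matching sklearn's MI

def symmetric_uncertainty(x, y):
    return 2.0 * mutual_info_score(x, y) / (entropy(x) + entropy(y))

x = np.array([0, 0, 1, 1]); y = np.array([0, 0, 1, 1])
print(symmetric_uncertainty(x, y))           # 1.0 for perfectly dependent features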
2.1.8 Feature selection based on unsupervised learning algorithm

Unsupervised learning is formally known as clustering. A clustering algorithm groups similar objects with respect to a given criterion such as density or distance; the objects within a group are more similar to each other than to the outliers. This technique is applied for selecting the significant features from the training dataset. Each unsupervised algorithm has its own advantages and disadvantages that determine its applicability. This paper utilizes the expectation maximization (EM) clustering technique to select the significant features by identifying the independent features, in order to remove the redundant features from a training dataset "TD"[23−25].
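A minimal sketch of EM based feature grouping, assuming scikit-learn's GaussianMixture as the EM clusterer; the dataset is transposed so that each feature becomes an object to be clustered, and the cluster count is a hypothetical choice:

from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)
features_as_objects = X.T                 # one row per feature fi

gm = GaussianMixture(n_components=2, covariance_type="diag",
                     random_state=0)      # hypothetical cluster count
groups = gm.fit_predict(features_as_objects)
print(groups)   # cluster label of each feature; correlated features co-cluster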
2.2 Supervised learning algorithm
The supervised learning algorithm builds the predictive model "PM" by learning the training dataset "TD" to predict the unlabeled tuple "T". This model can be built by various supervised learning techniques such as tree based, probabilistic and rule based techniques. This paper uses the supervised learners naive Bayes (NB), instance based IB1 and C4.5/J48 to evaluate and compare the performance of the proposed feature selection algorithm with the existing algorithms, in terms of predictive accuracy and time taken to build the prediction model[26−28].
2.2.1 Naive Bayes (NB) classifier

This classifier works based on Bayesian theory. The probabilistic function is applied on the training dataset "TD" that contains the features F = {f1, f2, · · · , fx}, tuples T = {t1, t2, · · · , ty} and classes C = {c1, c2, · · · , cz}. The unlabeled tuple "UT" is predicted to belong to class Ck under the probabilistic condition P(Ck|T) > P(Cw|T), where k ≠ w[29].
2.2.2 Decision tree based C4.5/J48 classifier

This tree based classifier constructs the predictive model using a decision tree. The statistical measure information gain is used to learn the dataset and construct the decision tree. Information gain is computed for each attribute present in the training dataset "TD"; the feature with the highest information gain value is taken as the root node and the dataset is divided into further levels. The information gain value is computed for all the nodes and this process is iterated until a single class value is obtained in each node[30, 31].
2.2.3 IB1

This classifier utilizes the nearest neighbor principle to measure the similarity among objects to predict the unlabeled tuple "UT". The Euclidean distance measure is used to compute the distance between the tuples present in the training dataset[32].
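For reference, scikit-learn analogues of the three classifiers can be sketched as follows; GaussianNB, an entropy based decision tree and a 1-nearest-neighbor classifier approximate NB, C4.5/J48 and IB1 respectively, rather than reproducing the Weka implementations exactly:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
models = {
    "NB": GaussianNB(),
    "J48-like": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "IB1-like": KNeighborsClassifier(n_neighbors=1),  # Euclidean by default
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))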
3 Unsupervised learning with ranking based feature selection (FSULR)

This algorithm follows a sequence of steps to select the significant features from the training dataset. Initially, the training dataset is transposed. The features are then clustered, and the features in each cluster are ranked based on their χ² values. A threshold value is then computed to select the highly significant features from each cluster. All the selected features from the different clusters are combined to form the candidate significant features selected from the training dataset.
Algorithm 1. FSULR
Input: Training dataset TD, tuple set T = {t1, t2, · · · , tl}, feature set F = {f1, f2, · · · , fm}, class labels C = {C1, C2, · · · , Cn} and T, F, C ∈ TD
Output: Selected significant feature subset SSF = {f1, f2, · · · , fi} ∈ F
Method:
1) Begin
2) Swap(TD)
3) {Return → swapped TD = SD} // Swap the tuples T (rows) into features F (columns)
4) EM Cluster(SD)
5) {Return n grouped feature sets GF = {GF1, GF2, GF3, · · · , GFn}} // GF ∈ F
6) for (L = 1, L ≤ n, L++)
7) {χ²(GFL, C)
8) {Return ranked feature members of grouped feature set RGFK, where K = 1, · · · , n, with χ² value CV}} // Compute and rank the χ² value for the grouped features
9) Threshold(CVMax, CVMin) // CVMax, CVMin are the maximum and minimum χ² values, respectively
10) {Return the value "ϕ"} // ϕ is the break-off threshold value
11) for (M = 1, M ≤ n, M++)
12) {Break off Feature(RGFM, CVM, ϕ)
13) {Return BFSSK, where K = 1, · · · , n}} // BFSSK is the break-off feature subset based on CV
14) for (N = 1, N ≤ n, N++)
15) {Return SSF = Union(BFSSN)} // SSF is the selected significant feature subset
16) End.
Phase 1. In this phase, the FSULR algorithm receives the training dataset "TD" as input with the features F = {f1, f2, · · · , fx}, tuples T = {t1, t2, · · · , ty} and classes C = {c1, c2, · · · , cz}. The class C is then removed. The function Swap(·) swaps the features "F" into tuples "T" and returns the swapped dataset "SD" for grouping the features "fi".
Phase 2. In this phase, the expectation maximization maximal likelihood function[33] and the Gaussian mixture model, as seen in (1) and (2), are used to group the features through the function EM-Cluster(·). It receives the swapped dataset "SD" and groups it into the grouped feature sets GF = {GF1, GF2, GF3, · · · , GFn}.

η(b|Dz) = P(Dz|b) = ∏_{k=1}^{m} P(zk|b) = ∏_{k=1}^{m} Σ_{l∈L} P(bk|l, b) P(l|b)    (1)

where D denotes the dataset containing an unknown parameter b and the known product Z, and L is the unseen data. Expectation maximization is used as the distribution function and this is derived as the Gaussian mixture model multivariate distribution P(z|l, b) as expressed in (2):

P(z|l, b) = (1 / ((2π)^(p/2) |Γl|^(1/2))) exp(−(1/2) (z − λl)^T Γl^(−1) (z − λl))    (2)

With this scenario, a value α is randomly picked from the set "S" with instances "I" from the non-continuous set N = {1, 2, 3, · · · , n}. Let λl be the mean vector of the unseen variable and Γl (l = 1, 2, · · · , n) the covariance matrix
with the discrete distribution P(l|b). The cross-validation computes the likelihood in correspondence to the value n to determine the n Gaussian components for computing the expectation maximization. This developed Gaussian mixture model[34] is used for clustering the features based on the level of the Gaussian component possessed by the individual features "fi"[35], and it returns the n grouped feature subsets GF = {GF1, GF2, GF3, · · · , GFn}.
Phase 3. In this phase, the most significant features are selected from each clustered feature subset by the χ²(·) function. It receives the clustered feature set GFK with the class feature C and computes the χ² value "CV" for all the features as shown in (3)[16]:

CV = Σ_{i=1}^{m} Σ_{j=1}^{n} (oij − eij)² / eij    (3)

The feature "fi" contains the distinct tuple values tv1, tv2, · · · , tvn. The χ² value "CV" is computed for the features fi and fj, where oij is the observed frequency and eij is the expected frequency of the possible tuple values. To identify the significant features, the features are ranked by their χ² value "CV" and the ranked grouped feature set is obtained. The Threshold(·) function receives the CVMin and CVMax values and returns the break-off threshold value ϕ as shown in (4). The function Break off Feature(·) receives the ranked grouped feature set RGF together with ϕ, breaks off the top ranked features down to the value ϕ and returns them as the break-off feature subset BFSS.

ϕ = H × (α − β)/2 + β    (4)

where ϕ is the break-off threshold value, H is a threshold constant equal to 1, α is the maximum χ² value (CVMax) of the features, and β is the minimum χ² value (CVMin) of the features.
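Phase 3 can be sketched as follows, assuming scikit-learn's chi2 as the χ² measure and the break-off rule of (4) with H = 1; the cluster membership is hand-picked for illustration:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

X, y = load_iris(return_X_y=True)
cluster = [0, 1, 2, 3]                     # hypothetical clustered feature set GFK

cv, _ = chi2(X[:, cluster], y)             # χ² value CV for each feature
alpha, beta = cv.max(), cv.min()           # CVMax and CVMin
phi = 1.0 * (alpha - beta) / 2 + beta      # break-off threshold, H = 1

bfss = [f for f, v in zip(cluster, cv) if v >= phi]   # break-off feature subset
print(np.round(cv, 1), "phi =", round(phi, 1), "BFSS =", bfss)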
Phase 4. This phase combines the break-off feature subsets BFSS obtained from the clustered features using the function Union(·) and produces the selected significant feature subset SSF as the candidate selected features.
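Putting the four phases together, a minimal end-to-end sketch of the FSULR pipeline under the assumptions used above (GaussianMixture for EM clustering, scikit-learn's chi2 for the ranking, H = 1); this is an illustrative composition, not the authors' reference implementation:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2
from sklearn.mixture import GaussianMixture

def fsulr(X, y, n_clusters=2):
    # Phases 1-2: transpose so features become objects, then EM-cluster them
    groups = GaussianMixture(n_clusters, covariance_type="diag",
                             random_state=0).fit_predict(X.T)
    selected = []
    for g in range(n_clusters):
        cluster = np.where(groups == g)[0]          # grouped feature set GF_g
        if cluster.size == 0:
            continue
        # Phase 3: chi-square ranking within the cluster, break off at phi
        cv, _ = chi2(X[:, cluster], y)
        phi = (cv.max() - cv.min()) / 2 + cv.min()  # threshold with H = 1
        selected.extend(cluster[cv >= phi])         # break-off subset BFSS_g
    return sorted(selected)                         # Phase 4: union gives SSF

X, y = load_iris(return_X_y=True)
print("selected significant features:", fsulr(X, y))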
4 Experimental setup
In total, 21 datasets are used to conduct the experiments, with the number of features ranging from 2 to 684, the number of instances from 5 to 1728 and the number of class labels from 2 to 24, as listed in Table 1. These datasets are taken from the Weka software[36] and the UCI Repository[37]. The performance of the proposed FSULR algorithm is analyzed and compared with the other feature selection algorithms FSCor, FSChi, FSCon, FSGai-ra, ReliefF, FSUnc and FSInfo using the NB, J48 and IB1 classifiers.
5 Experimental procedure, results and discussion

The experiment is conducted on the 21 well known publicly available datasets shown in Table 1 with the 7 feature selection algorithms FSCor, FSChi, FSCon, FSGai-ra, ReliefF, FSUnc and FSInfo. Initially, the datasets are fed as input to the feature selection algorithms and the selected features are obtained as their output. These selected features are then given to the NB, J48 and IB1 classifiers, and the predictive accuracy and the time taken to build the predictive model are calculated with the 10-fold cross validation test mode.
Table 1 Datasets
Serial number Dataset Number of features Number of instances Number of classes
1 Breast cancer 9 286 2
2 Contact lenses 24 5 5
3 Credit–g 20 1000 2
4 Diabetes 8 768 2
5 Glass 9 214 7
6 Ionosphere 34 351 2
7 Iris 2D 2 150 3
8 Iris 4 150 3
9 Labor 16 57 2
10 Segment challenge 19 1500 7
11 Soybean 683 36 14
12 Vote 435 17 3
13 Weather nominal 4 14 2
14 Weather numeric 14 5 2
15 Audiology 69 226 24
16 Car 6 1728 4
17 Cylinder bands 39 540 2
18 Dermatology 34 366 3
19 Ecoli 7 336 8
20 Anneal 39 898 6
21 Hay-train 10 373 9
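The evaluation protocol can be sketched as follows, assuming scikit-learn's 10-fold cross-validation in place of Weka's test mode:

import time
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
start = time.perf_counter()
acc = cross_val_score(GaussianNB(), X, y, cv=10).mean()   # predictive accuracy
elapsed = time.perf_counter() - start                     # model building time
print(f"accuracy = {acc:.3f}, time = {elapsed:.3f} s")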
The performance of the proposed FSULR algorithm is analyzed in terms of predictive accuracy, time taken to build the model and the number of features reduced. Fig. 1 expresses the overall performance of the feature selection algorithms in producing the predictive accuracy for the supervised learners; it is observed that FSULR performs better than all the other feature selection algorithms compared, with the second and third positions taken by FSCon and FSInfo, respectively. Fig. 2 shows that the proposed FSULR achieves better accuracy than the other algorithms compared for the NB and IB1 classifiers, while FSCon achieves the best results for the J48 classifier.
Fig. 1 Comparison of overall prediction accuracy (in percent) for each feature selection algorithm
Fig. 2 Comparison of prediction accuracy (in percent) with respect to feature selection algorithm and classifier
From Fig. 3, it is evident that FSULR takes more time to build the predictive model than all the other algorithms. From Fig. 4, it is observed that FSGai-ra and FSUnc require less time than the other algorithms compared to build the model for the NB classifier. FSUnc takes the least time to build the model for the J48 classifier, while FSGai-ra and ReliefF consume the least time to build the predictive model for the IB1 classifier. Fig. 5 shows that ReliefF reduces the number of features more significantly than the other algorithms compared, and FSCon reduces the least number of features.
Fig. 3 Comparison of overall time taken to build the predictive model (in seconds) for each feature selection algorithm
Fig. 4 Comparison of time taken to build the predictive model (in seconds) with respect to feature selection algorithm and classifier
Fig. 5 Comparison of the number of features reduced with respect to feature selection algorithm
6 Conclusions and future work
This paper proposed a feature selection algorithm, namely the unsupervised learning with ranking based feature selection algorithm (FSULR). The performance of this algorithm is
analyzed in terms of the accuracy produced for the classifiers, the time taken to build the predictive model and the feature reduction. The performance of this algorithm is compared with the other feature selection algorithms, namely correlation based FSCor, χ² based FSChi, consistency based FSCon, gain ratio based FSGai-ra, ReliefF, symmetric uncertainty based FSUnc and information gain based FSInfo, with the NB, J48 and IB1 classifiers. FSULR achieves better prediction accuracy than the other feature selection algorithms, and achieves higher accuracy for the IB1 and NB classifiers compared to the other feature selection algorithms. FSULR is also considerably good in reducing the time to build the model for IB1 compared to FSGai-ra and ReliefF, and in reducing the time to build the model for the J48 classifier compared to the FSCon algorithm. FSULR reduces the number of features compared to FSCor, FSCon and FSGai-ra. In future, this work can be extended with other statistical measures for ranking the features within the clusters.
References
[1] S. J. Pan, Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[2] M. R. Rashedur, R. M. Fazle. Using and comparing different decision tree classification techniques for mining ICDDR, B Hospital Surveillance data. Expert Systems with Applications, vol. 38, no. 9, pp. 11421–11436, 2011.
[3] M. Wasikowski, X. W. Chen. Combating the small sample class imbalance problem using feature selection. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1388–1400, 2010.
[4] Q. B. Song, J. J. Ni, G. T. Wang. A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, pp. 1–14, 2013.
[5] J. F. Artur, A. T. Mário. Efficient feature selection filters for high-dimensional data. Pattern Recognition Letters, vol. 33, no. 13, pp. 1794–1804, 2012.
[6] J. Wu, L. Chen, Y. P. Feng, Z. B. Zheng, M. C. Zhou, Z. H. Wu. Predicting quality of service for selection by neighborhood-based collaborative filtering. IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 43, no. 2, pp. 428–439, 2012.
[7] C. P. Hou, F. P. Nie, X. Li, D. Yi, Y. Wu. Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Transactions on Cybernetics, vol. 44, no. 6, pp. 793–804, 2014.
[8] P. Bermejo, L. de la Ossa, J. A. Gámez, J. M. Puerta. Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking. Knowledge-Based Systems, vol. 25, no. 1, pp. 35–44, 2012.
[9] S. Atulji, G. Shameek, V. K. Jayaraman. Hybrid biogeography based simultaneous feature selection and MHC class I peptide binding prediction using support vector machines and random forests. Journal of Immunological Methods, vol. 387, no. 1–2, pp. 284–292, 2013.
[10] H. L. Wei, S. Billings. Feature subset selection and ranking for data dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 162–166, 2007.
[11] X. V. Nguyen, B. James. Comments on supervised feature selection by clustering using conditional mutual information-based distances. Pattern Recognition, vol. 46, no. 4, pp. 1220–1225, 2013.
[12] A. G. Iffat, S. S. Leslie. Feature subset selection in large dimensionality domains. Pattern Recognition, vol. 43, no. 1, pp. 5–13, 2010.
[13] M. Hall. Correlation-based Feature Selection for Machine Learning, Ph.D. dissertation, The University of Waikato, New Zealand, 1999.
[14] Y. Lei, L. Huan. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, vol. 5, no. 1, pp. 1205–1224, 2004.
[15] M. Dash, H. Liu, H. Motoda. Consistency based feature selection. In Proceedings of the 4th Pacific Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, pp. 98–109, 2000.
[16] H. Peng, F. Long, C. Ding. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
[17] H. Uğuz. A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowledge-Based Systems, vol. 24, no. 7, pp. 1024–1032, 2011.
[18] M. Robnik-Šikonja, I. Kononenko. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, vol. 53, no. 1–2, pp. 23–69, 2003.
[19] S. Yijun, T. Sinisa, G. Steve. Local-learning-based feature selection for high-dimensional data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1610–1626, 2010.
[20] W. Peng, S. Cesar, S. Edward. Prediction based on integration of decisional DNA and a feature selection algorithm RELIEF-F. Cybernetics and Systems, vol. 44, no. 3, pp. 173–183, 2013.
[21] H. W. Liu, J. G. Sun, L. Liu, H. J. Zhang. Feature selection with dynamic mutual information. Pattern Recognition, vol. 42, no. 7, pp. 1330–1339, 2009.
[22] M. C. Lee. Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Systems with Applications, vol. 36, no. 8, pp. 10896–10904, 2009.
[23] P. Mitra, C. A. Murthy, S. K. Pal. Unsupervised feature selection using feature similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 301–312, 2002.
[24] J. Handl, J. Knowles. Feature subset selection in unsupervised learning via multiobjective optimization. International Journal of Computational Intelligence Research, vol. 2, no. 3, pp. 217–238, 2006.
[25] H. Liu, L. Yu. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, 2005.
[26] S. García, J. Luengo, J. A. Sáez, V. López, F. Herrera. A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning. IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, pp. 734–750, 2013.
[27] S. A. A. Balamurugan, R. Rajaram. Effective and efficient feature selection for large-scale data using Bayes theorem. International Journal of Automation and Computing, vol. 6, no. 1, pp. 62–71, 2009.
[28] J. A. Mangai, V. S. Kumar, S. A. alias Balamurugan. A novel feature selection framework for automatic web page classification. International Journal of Automation and Computing, vol. 9, no. 4, pp. 442–448, 2012.
[29] H. J. Huang, C. N. Hsu. Bayesian classification for data from the same unknown class. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 32, no. 2, pp. 137–145, 2002.
[30] S. Ruggieri. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.
[31] P. Kemal, G. Salih. A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems. Expert Systems with Applications, vol. 36, no. 2, pp. 1587–1592, 2009.
[32] W. W. Cheng, E. Hüllermeier. Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, vol. 76, no. 3, pp. 211–225, 2009.
[33] J. H. Hsiu, P. Saumyadipta, I. L. Tsung. Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Statistics and Computing, vol. 22, no. 1, pp. 287–299, 2012.
[34] M. H. C. Law, M. A. T. Figueiredo, A. K. Jain. Simultaneous feature selection and clustering using mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1154–1166, 2004.
[35] T. W. Lee, M. S. Lewicki, T. J. Sejnowski. ICA mixture models for unsupervised classification of non-Gaussian classes and automatic context switching in blind signal separation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1078–1089, 2000.
[36] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann. The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[37] K. Bache, M. Lichman. UCI Machine Learning Repository, [Online], Available: http://archive.ics.uci.edu/ml, Irvine, CA: University of California, School of Information and Computer Science, 2013.
Danasingh Asir Antony Gnana Singh received the M.Eng. and B.Eng. degrees from Anna University, India. He is currently working as a teaching fellow in the Department of Computer Science and Engineering, Bharathidasan Institute of Technology, Anna University, India. His research interests include data mining, wireless networks, parallel computing, mobile computing, computer networks, image processing, software engineering, soft computing, the teaching-learning process and engineering education.
E-mail: asirantony@gmail.com (Corresponding author)
ORCID iD: 0000-0002-4913-2886

Subramanian Appavu Alias Balamurugan received the Ph.D. degree from Anna University, India. He is currently working as a professor and the head of the Department of Information Technology in K.L.N College of Information Technology, India. His research interests include pattern recognition, data mining and informatics.
E-mail: app s@yahoo.com

Epiphany Jebamalar Leavline received the M.Eng. and B.Eng. degrees from Anna University, India, and the MBA degree from Alagappa University, India. She is currently working as an assistant professor in the Department of Electronics and Communication Engineering, Bharathidasan Institute of Technology, Anna University, India. Her research interests include image processing, signal processing, VLSI design, data mining, the teaching-learning process and engineering education.
E-mail: jebi.lee@gmail.com