Since dealing with high-dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of the data and simplify analysis in applications such as text categorization, signal processing, image retrieval, and gene expression analysis. Among feature reduction techniques, feature selection is one of the most popular because it preserves the original features. However, most current feature selection methods perform poorly on imbalanced data sets, which are pervasive in real-world applications. In this paper, we propose a new unsupervised feature selection method tailored to imbalanced data sets, which removes redundant features from the original feature space based on the distribution of the features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on several imbalanced data sets from the UCI repository illustrate the effectiveness of the proposed method over the compared methods in terms of both accuracy and the number of selected features.
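As a concrete illustration of removing redundant features from the distribution of the data, the sketch below uses a greedy filter over absolute Pearson correlation. This is an illustrative stand-in, not the paper's actual criterion, which is not specified here; the function names and the threshold are assumptions.

```python
import statistics

def remove_redundant(features, names, threshold=0.9):
    """Greedy unsupervised redundancy filter: drop any feature whose
    absolute Pearson correlation with an already-kept feature exceeds
    the threshold. Illustrative stand-in for the paper's
    distribution-based criterion."""
    def corr(x, y):
        mx, my = statistics.fmean(x), statistics.fmean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy) if sx and sy else 0.0

    kept = []  # indices of retained features
    for i, f in enumerate(features):
        if all(abs(corr(f, features[j])) <= threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]
```

Here `f1` is a scaled copy of `f0` and would be dropped, while an uncorrelated feature survives.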
Feature selection is one of the most fundamental steps in machine learning and is closely related to dimensionality reduction. A commonly used approach in feature selection is to rank the individual features according to some criterion and then search for an optimal feature subset based on an evaluation criterion that tests the optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular Rough Set based feature ranking process with a modified backward feature elimination algorithm. The feature ranking process calculates the significance or priority of each symptom of LD according to its contribution in representing the knowledge contained in the dataset. Each symptom's significance value reflects its relative importance for predicting LD among the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. For comparative analysis, and to establish the importance of rough set theory in feature selection, the backward feature elimination algorithm is also combined with two state-of-the-art filter-based feature ranking techniques, viz. information gain and gain ratio. The experimental results show that the proposed feature selection approach outperforms the other two in terms of data reduction. The proposed method also eliminates all the redundant attributes efficiently from the LD dataset without sacrificing classification performance.
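The ranking-plus-elimination scheme described above can be sketched generically. The `evaluate` callback and the stop-on-degradation rule are assumptions for illustration, not the authors' exact procedure.

```python
def backward_eliminate(ranked_features, evaluate):
    """Backward elimination over a ranked feature list: repeatedly drop
    the least significant feature and keep the smaller subset whenever
    the evaluation score does not degrade. `ranked_features` is ordered
    most- to least-significant; `evaluate` maps a subset to a score,
    e.g. cross-validated accuracy. A generic sketch of the hybrid
    scheme, not the paper's exact algorithm."""
    best = list(ranked_features)
    best_score = evaluate(best)
    while len(best) > 1:
        candidate = best[:-1]      # drop current least significant feature
        score = evaluate(candidate)
        if score >= best_score:    # accept reduction if no loss
            best, best_score = candidate, score
        else:
            break                  # further elimination hurts; stop
    return best, best_score
```

With a Rough Set ranking, `ranked_features` would be the symptoms sorted by their significance values.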
Classification problems on high-dimensional data with a small number of observations have become common, particularly for microarray data. Over the past two decades, many efficient classification models and Feature Selection (FS) algorithms have been proposed to achieve higher prediction accuracy. However, the prediction accuracy of an FS algorithm tends to be unstable under variations in the training set, especially in high-dimensional data. In this paper we present a new evaluation measure, the Q-statistic, that incorporates the stability of the selected feature subset in addition to prediction accuracy. We then propose Booster, a procedure applied on top of an FS algorithm that improves the Q-statistic value of the algorithm. A study on synthetic data and 14 microarray data sets shows that Booster improves not only the Q-statistic but also the prediction accuracy of the algorithm it is applied to.
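A simplified subset-stability score in the same spirit can be computed as the average pairwise Jaccard similarity of the subsets an FS algorithm selects on resampled training sets. Note this is only a sketch of the stability component; the paper's Q-statistic additionally folds in prediction accuracy, and its exact definition differs.

```python
def selection_stability(subsets):
    """Average pairwise Jaccard similarity between feature subsets
    selected on different resamplings of the training set. 1.0 means
    perfectly stable selection; values near 0 mean the selected
    features change wildly across resamplings. A simplified stand-in
    for the stability component of the Q-statistic."""
    sets = [set(s) for s in subsets]
    pairs, total = 0, 0.0
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            union = len(sets[i] | sets[j])
            total += len(sets[i] & sets[j]) / union if union else 1.0
            pairs += 1
    return total / pairs if pairs else 1.0
```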
In this paper we focus on an efficient feature selection method for the classification of audio files. The main objective is feature selection and extraction. We select a set of features for further analysis, which represent the elements of the feature vector. Through the extraction method we compute a numerical representation that characterizes the audio, using an existing toolbox. In this study, Gain Ratio (GR) is used as the feature selection measure. GR is used to select the splitting attribute that separates the tuples into different classes. Pulse clarity is considered as a subjective measure and is used to calculate the gain of the audio-file features. The splitting criterion is employed in the application to identify the class, i.e. the music genre, of a specific audio file from the testing database. Experimental results indicate that using GR the application produces satisfactory results for music genre classification. After dimensionality reduction, the best three features are selected from the various audio-file features, and with this technique the classification success rate exceeds 90%.
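Gain Ratio as used above is the C4.5 splitting criterion: information gain normalized by the split information of the attribute. A minimal sketch for a discrete feature:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Gain Ratio = Information Gain / Split Information for a
    discrete splitting attribute (the C4.5 criterion)."""
    n = len(labels)
    # Expected entropy of the labels after splitting on the feature
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)
    gain = entropy(labels) - cond
    split_info = entropy(feature_values)  # entropy of the split itself
    return gain / split_info if split_info else 0.0
```

A feature that perfectly separates the classes attains a gain ratio of 1.0; a constant feature scores 0.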
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase... (ijaia)
Feature selection and classification are essential processes when dealing with large data sets that comprise numerous input attributes. Many search methods and classifiers have been used to find the optimal number of attributes. The aim of this paper is to find the optimal set of attributes and to improve classification accuracy by adopting an ensemble rule classifiers method. The research process involves two phases: finding the optimal set of attributes, and applying the ensemble classifiers method to the classification task. Results are reported in terms of percentage accuracy, number of selected attributes, and number of rules generated. Six datasets were used for the experiment. The final output is an optimal set of attributes with the ensemble rule classifiers method. The experimental results, conducted on public real datasets, demonstrate that the ensemble rule classifiers method consistently improves classification accuracy on the selected datasets. Significant improvement in accuracy and an optimal set of selected attributes are achieved by adopting the ensemble rule classifiers method.
A Threshold fuzzy entropy based feature selection method applied in various b... (IJMER)
Large amounts of data are stored and manipulated using various database technologies, and processing all the attributes for a particular purpose is a difficult task. To avoid such difficulties, a feature selection process is applied. In this paper, we collect eight benchmark datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based relevance measure algorithm, following three selection strategies: the mean selection strategy, the half selection strategy, and a neural network for threshold selection. After the features are selected, they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1, and Ant-miner classification methodologies. The test results show that the neural network threshold selection strategy works well in selecting features, and that the Ant-miner methodology performs best, yielding better accuracy with the selected features than processing the original dataset. The results of this experiment clearly show that Ant-miner is superior to the other classifiers. Thus, the proposed Ant-miner algorithm could be a suitable method for producing good results with fewer features than the original datasets.
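The mean and half selection strategies named above can be read as simple thresholding rules over the relevance scores. This is an illustrative reading of the strategy names; the paper's exact formulations may differ.

```python
def mean_selection(relevance):
    """Mean selection strategy: keep features whose relevance score
    exceeds the mean score. `relevance` maps feature name -> score,
    e.g. a fuzzy entropy based relevance measure."""
    mean = sum(relevance.values()) / len(relevance)
    return [f for f, r in relevance.items() if r > mean]

def half_selection(relevance):
    """Half selection strategy: keep the top half of the features,
    ranked by relevance score."""
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return ranked[: len(ranked) // 2]
```

The third strategy, a neural network that learns the threshold, replaces these fixed rules with a trained cut-off.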
Feature Selection : A Novel Approach for the Prediction of Learning Disabilit... (csandit)
Feature selection is a problem closely related to dimensionality reduction. A commonly used approach in feature selection is to rank the individual features according to some criterion and then search for an optimal feature subset based on an evaluation criterion that tests the optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular Rough Set based feature ranking process with a modified backward feature elimination algorithm. The approach ranks the symptoms of LD according to their importance in the data domain. Each symptom's significance value reflects its relative importance for predicting LD among the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. The experimental results show that the proposed method succeeds in removing redundant attributes efficiently from the LD dataset without sacrificing classification performance.
A Survey on Classification of Feature Selection Strategies (ijtsrd)
Feature selection is an important part of machine learning. Feature selection refers to the process of reducing the inputs for processing and analysis, or of finding the most meaningful inputs. A related term, feature engineering (or feature extraction), refers to the process of extracting useful information or features from existing data. Mining of information related to a concept is done on the basis of the features of the data, and accessing these features for data retrieval can be termed the feature extraction mechanism. Different types of feature extraction methods are in use. In this paper, different feature selection methodologies are examined in terms of the need for feature selection and the method adopted. Three types of method are mainly available: Shannon's Entropy, Bayesian with K2 prior, and Bayesian Dirichlet with uniform prior (default). The objective of this survey paper is to identify the existing contributions made using the above-mentioned algorithms and the results obtained. R. Pradeepa | K. Palanivel, "A Survey on Classification of Feature Selection Strategies", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-2, February 2018. URL: http://www.ijtsrd.com/papers/ijtsrd9632.pdf http://www.ijtsrd.com/computer-science/data-miining/9632/a-survey-on-classification-of-feature-selection-strategies/r-pradeepa
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS) (IJCSEA Journal)
Feature selection is an effective method used in text categorization for sorting a set of documents into a number of predefined categories. It is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant terms from the corpus. A genome contains the total genetic information in the chromosomes of an organism, including its genes and DNA sequences. In this paper, a hierarchical clustering technique is used to categorize the features from the genome documents, and a framework is proposed for genomic feature set selection. Filter-based feature selection methods, the χ² statistic and the CHIR statistic, are used to select the feature set. The selected feature set is verified using the F-measure and is biologically validated for relevance using the BLAST tool.
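The χ² (chi-square) filter statistic used in this framework measures the dependence between a term and a category via a 2×2 contingency table; a minimal sketch:

```python
def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 term/category
    contingency table: n11 = docs in the category containing the
    term, n10 = docs outside the category containing the term,
    n01/n00 analogous for docs without the term. Higher values mean
    stronger term-category dependence."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den if den else 0.0
```

Terms are then ranked by their χ² scores and the top-scoring ones are kept as the feature set.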
Multi Label Spatial Semi Supervised Classification using Spatial Associative ... (cscpconf)
Multi-label spatial classification based on association rules with multi-objective genetic algorithms (MOGA), enriched by semi-supervised learning, is proposed in this paper to deal with the multiple-class-labels problem. We adopt problem transformation for the multi-label classification and use a hybrid evolutionary algorithm to optimize the generation of spatial association rules, which addresses single labels. MOGA is used to combine the single labels into multi-labels under the conflicting objectives of predictive accuracy and comprehensibility. Semi-supervised learning is performed through a process of rule cover clustering. Finally, an associative classifier is built with a sorting mechanism. The algorithm is simulated and the results are compared with a MOGA-based associative classifier, which the proposed approach outperforms.
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI... (csandit)
Attribute reduction and classification are essential processes when dealing with large data sets that comprise numerous input attributes. Many search methods and classifiers have been used to find the optimal number of attributes. The aim of this paper is to find the optimal set of attributes and to improve classification accuracy by adopting an ensemble rule classifiers method. The research process involves two phases: finding the optimal set of attributes, and applying the ensemble classifiers method to the classification task. Results are reported in terms of percentage accuracy, number of selected attributes, and number of rules generated. Six datasets were used for the experiment. The final output is an optimal set of attributes with the ensemble rule classifiers method. The experimental results, conducted on public real datasets, demonstrate that the ensemble rule classifiers method consistently improves classification accuracy on the selected datasets. Significant improvement in accuracy and an optimal set of selected attributes are achieved by adopting the ensemble rule classifiers method.
Survey on semi supervised classification methods and feature selection (eSAT Journals)
Abstract: Data mining, also called knowledge discovery, is a process of analyzing data from several perspectives and summarizing it into useful information. It has tremendous application in the area of classification, such as pattern recognition, discovering disease types, analysis of medical images, speech recognition, biometric identification, drug discovery, etc. This is a survey of several semi-supervised classification methods used by classifiers, in which both labeled and unlabeled data can be used for classification; this is less expensive than other classification approaches. The techniques surveyed in this paper are the low-density separation approach, transductive SVM, a semi-supervised logistic discriminant procedure, and the self-training nearest neighbour rule using cut edges. Along with classification methods, a review of various feature selection methods is also included. Feature selection is performed to reduce the dimension of large datasets; after reducing the attributes, the data is passed to classification, so the accuracy and performance of the classification system can be improved. Feature selection methods covered include consistency-based feature selection, a fuzzy entropy measure with a similarity classifier, signal-to-noise ratio, and positive approximation; each method has its own benefits. Index Terms: Semi-supervised classification, Transductive support vector machine, Feature selection, unlabeled samples
Feature selection or extraction is the most important task in Opinion Mining and Sentiment Analysis (OSMA) for calculating the polarity score. These scores are used to determine the positive, negative, and neutral polarity of products, user reviews, user comments, and similar content in social media, for the purpose of decision making and business intelligence for individuals or organizations. In this paper, we perform an experimental study of the different feature extraction and selection techniques available for the opinion mining task. The study is carried out in four stages. First, data is collected from readily available sources. Second, pre-processing techniques are applied automatically, using tools to extract the terms and POS (parts-of-speech) tags. Third, different feature selection and extraction techniques are applied to the content. Finally, an empirical study is carried out analyzing the sentiment polarity obtained with the different features.
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION (IJDKP)
Many classification algorithms are available in the area of data mining for solving the same kind of problem, with little guidance on recommending the most appropriate algorithm, i.e. the one that gives the best results for the dataset at hand. As a way of optimizing the chances of recommending the most appropriate classification algorithm for a dataset, this paper focuses on the different factors considered by data miners and researchers in different studies when selecting the classification algorithms that will yield the desired knowledge for the dataset at hand. The paper divides the factors affecting classification algorithm recommendation into business and technical factors. The proposed technical factors are measurable and can be exploited by recommendation software tools.
Feature selection for multiple water quality status: integrated bootstrapping... (IJECEIAES)
STORET is a method for determining river water quality and classifying it into four classes (very good, good, medium, and bad) based on the water data for each attribute or feature. The success of a pattern recognition model depends strongly on the quality of the data. Two issues concern this research: data with a disproportionate amount among the classes (class imbalance), and noise in the attributes. This research therefore integrates the SMOTE technique with bootstrapping to handle the class imbalance problem, while an experiment is conducted to eliminate attribute noise using several filter-approach feature selection algorithms (information gain, rule, derivation, correlation, and chi-square). The research stages are: data understanding, pre-processing, class imbalance handling, feature selection, classification, and performance evaluation. Based on testing with 10-fold cross-validation, the SMOTE-bootstrapping technique increases accuracy from 83.3% to 98.8%, while eliminating noise from the data attributes further increases accuracy to 99.5% (using the feature subset produced by the information gain algorithm and the decision tree classification algorithm).
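SMOTE balances a dataset by generating synthetic minority samples through interpolation between a minority point and one of its nearest minority neighbours. The stdlib sketch below shows the core interpolation idea only; it is a minimal sketch of SMOTE, not the full algorithm and not this paper's exact SMOTE-bootstrapping pipeline.

```python
import random

def smote_like(minority, k=2, n_new=4, rng=None):
    """Generate `n_new` synthetic minority samples: pick a minority
    point p, pick one of its k nearest minority neighbours q, and
    emit a random point on the segment between p and q."""
    rng = rng or random.Random(0)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbours = sorted((q for q in minority if q is not p),
                            key=lambda q: dist(p, q))[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic
```

Each synthetic point lies on a line segment between two real minority samples, so it stays inside the minority region rather than duplicating existing points.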
Effective Feature Selection for Feature Possessing Group Structure (rahulmonikasharma)
Feature selection has become an interesting research topic in recent years and is an effective method for tackling high-dimensional data. Previous feature selection methods ignore the underlying structure and evaluate features individually. Considering this, we focus on the problem where the features possess a group structure, and present a group feature selection method that operates at the group level. Its objective is to perform feature selection both within and between the groups of features, selecting discriminative features and removing redundant ones to obtain an optimal subset. We demonstrate our method on several data sets and evaluate the classification accuracy achieved.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA... (ijsc)
As biomedical databases grow day by day, finding the essential features for disease prediction becomes more complex due to high dimensionality and sparsity. Moreover, with the large number of micro-array datasets now available in biomedical repositories, it is difficult to analyze, predict, and interpret feature information using traditional feature-selection-based classification models, most of which suffer from computational issues such as dimension reduction, uncertainty, and class imbalance on microarray data. The ensemble classifier is a scalable model for the extreme learning machine owing to its high efficiency and fast processing speed in real-time applications. The main objective of feature-selection-based ensemble learning models is to classify high dimensional data with high computational efficiency and a high true positive rate. In the proposed model, an optimized Particle Swarm Optimization (PSO) based ensemble classification model is developed for high dimensional microarray datasets. Experimental results show that the proposed model achieves higher computational efficiency than traditional feature-selection-based classification models in terms of accuracy, true positive rate, and error rate.
Integrated bio-search approaches with multi-objective algorithms for optimiza... (TELKOMNIKA JOURNAL)
Optimal feature selection is difficult yet crucial, particularly for classification, because traditional methods select features independently and can produce collections of irrelevant features that degrade classification accuracy. The goal of this paper is to leverage bio-inspired search algorithms, together with a wrapper, to optimize the multi-objective algorithms ENORA and NSGA-II so that they generate an optimal set of features. The main steps are to combine ENORA and NSGA-II with suitable bio-search algorithms in which multiple subset generation is implemented, and then to validate the optimal feature set through subset evaluation. Eight comparison datasets of various sizes were deliberately selected for testing. The results show that combining ENORA and NSGA-II with the selected bio-inspired search algorithms is a promising route to a better optimal solution (i.e., the best features with higher classification accuracy) for the selected datasets. This finding implies that bio-inspired wrapper/filter search algorithms can boost the efficiency of ENORA and NSGA-II for feature selection and classification.
DEVELOPING A CASE-BASED RETRIEVAL SYSTEM FOR SUPPORTING IT CUSTOMERS (IJCSEA Journal)
Case-based reasoning (CBR) is a modern approach to designing knowledge-based expert systems. It relies heavily on stored experiences, which serve as cases that can be employed to solve new problems by retrieving similar cases from the system and reusing their solutions. The present study sheds light on a CBR application: a new system to assist Internet users in solving the problems they face, with a focus on the case retrieval stage. The aim was to design and build an experience inquiry system that resolves any problem Internet users might encounter through a case-based reasoning dialogue. The system developed by the researcher displays similar cases through a dialogue, using a purpose-built algorithm informed by a review of the relevant previous studies. The success rate of the proposed system was found to be high.
A Customizable Model of Head-Related Transfer Functions Based on Pinna Measur... (Waqas Tariq)
This paper proposes a method to model Head-Related Transfer Functions (HRTFs) based on the shape and size of the outer ear. Using signal processing tools, such as Prony’s signal modeling method, a dynamic model of the pinna has been obtained, that completes the structural model of HRTFs used for digital audio spatialization. Listening tests conducted on 10 subjects showed that HRTFs created using this pinna model were 5% more effective than generic HRTFs in the frontal plane. This model has been able to reduce the computational and storage demands of audio spatialization, while preserving a sufficient number of perceptually relevant spectral cues.
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS) (IJCSEA Journal)
Feature selection is an effective method used in text categorization for sorting a set of documents into a number of predefined categories. It improves the efficiency and accuracy of text categorization algorithms by removing redundant terms from the corpus. A genome contains the total genetic information in the chromosomes of an organism, including its genes and DNA sequences. In this paper a hierarchical clustering technique is used to categorize the features from the genome documents, and a framework is proposed for genomic feature set selection. Filter-based feature selection methods such as the χ2 statistic and the CHIR statistic are used to select the feature set. The selected feature set is verified using the F-measure and validated for biological relevance using the BLAST tool.
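For reference, the χ2 score used by such filters can be computed directly from a 2x2 term/category contingency table. A small sketch with invented document counts (not data from the paper):

```python
def chi_square(n_tc, n_t, n_c, n):
    """Chi-square statistic for one (term, category) pair.

    n_tc: docs in category c containing term t; n_t: docs containing t;
    n_c: docs in category c; n: total docs (standard 2x2 contingency form).
    """
    a = n_tc                      # t present, class c
    b = n_t - n_tc                # t present, not c
    c_ = n_c - n_tc               # t absent, class c
    d = n - n_t - n_c + n_tc      # t absent, not c
    num = n * (a * d - b * c_) ** 2
    den = (a + b) * (c_ + d) * (a + c_) * (b + d)
    return num / den if den else 0.0

# A term found in 80 of 100 class-c docs but only 10 of 100 other docs
# scores high; a term spread evenly across classes scores zero.
print(chi_square(80, 90, 100, 200))   # large score: class-associated term
print(chi_square(50, 100, 100, 200))  # -> 0.0 (term independent of class)
```

Terms are then ranked by their χ2 score and the top-scoring ones kept as the feature set.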
Multi Label Spatial Semi Supervised Classification using Spatial Associative ... (cscpconf)
Multi-label spatial classification based on association rules with multi-objective genetic algorithms (MOGA), enriched by semi-supervised learning, is proposed in this paper to deal with the multiple-class-labels problem. We adapt problem transformation for multi-label classification and use a hybrid evolutionary algorithm to optimize the generation of spatial association rules, which addresses single labels. MOGA is then used to combine the single labels into multi-labels under the conflicting objectives of predictive accuracy and comprehensibility. Semi-supervised learning is performed through rule-cover clustering. Finally, an associative classifier is built with a sorting mechanism. The algorithm is simulated and the results are compared with the MOGA-based associative classifier, which it outperforms.
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI... (csandit)
Attribute reduction and classification are essential processes in dealing with large data sets that comprise numerous input attributes. Many search methods and classifiers have been used to find the optimal number of attributes. The aim of this paper is to find the optimal set of attributes and improve classification accuracy by adopting an ensemble rule classifiers method. The research involves two phases: finding the optimal set of attributes, and applying the ensemble classifiers method to the classification task. Results are reported as the percentage accuracy and the number of selected attributes and generated rules, over six datasets. The final output is an optimal set of attributes combined with the ensemble rule classifiers method. Experimental results on public real datasets demonstrate that the ensemble rule classifiers method consistently improves classification accuracy on the selected datasets; significant improvements in accuracy and an optimal set of selected attributes are achieved.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Survey on semi supervised classification methods and feature selection (eSAT Journals)
Abstract: Data mining, also called knowledge discovery, is the process of analyzing data from several perspectives and summarizing it into useful information. It has tremendous application in classification areas such as pattern recognition, discovering disease types, medical image analysis, speech recognition, biometric identification, and drug discovery. This is a survey of several semi-supervised classification methods used by classifiers, in which both labeled and unlabeled data can be used for classification; this is less expensive than other classification methods. The techniques surveyed in this paper are the low density separation approach, transductive SVM, a semi-supervised logistic discriminant procedure, and the self-training nearest neighbour rule using cut edges. Along with classification methods, a review of various feature selection methods is also given. Feature selection is performed to reduce the dimension of large datasets; after attribute reduction the data is passed to classification, so the accuracy and performance of the classification system can be improved. The feature selection methods covered include consistency-based feature selection, a fuzzy entropy measure with a similarity classifier, signal-to-noise ratio, and positive approximation; each method has its own benefits. Index Terms: Semi-supervised classification, transductive support vector machine, feature selection, unlabeled samples
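One of the surveyed ideas, self-training with a nearest-neighbour rule, can be sketched compactly: label the unlabeled point whose nearest labeled neighbour is closest, add it to the labeled set, and repeat. The toy points below are illustrative only, and this sketch omits the cut-edge test used in the surveyed variant:

```python
def self_train_1nn(labeled, unlabeled, rounds=3):
    """Self-training with a 1-NN rule (illustrative sketch).

    labeled: list of (vector, label) pairs; unlabeled: list of vectors.
    Each round, the most confident unlabeled point (the one closest to
    any labeled point) is labeled by its nearest labeled neighbour and
    moved into the labeled set.
    """
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(min(rounds, len(unlabeled))):
        best = min(unlabeled,
                   key=lambda u: min(dist(u, v) for v, _ in labeled))
        nearest = min(labeled, key=lambda pair: dist(best, pair[0]))
        labeled.append((best, nearest[1]))   # pseudo-label the point
        unlabeled.remove(best)
    return labeled

seed = [([0.0, 0.0], "a"), ([4.0, 4.0], "b")]
pool = [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]]
out = self_train_1nn(seed, pool)
print([lab for _, lab in out])  # -> ['a', 'b', 'b', 'a', 'a']
```

Labeling the most confident point first is what keeps early pseudo-label errors from propagating too quickly.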
Feature selection or extraction is the most important task in Opinion Mining and Sentiment Analysis (OSMA) for calculating the polarity score. These scores determine the positive, negative, or neutral polarity of products, user reviews, user comments, and so on in social media, for the purposes of decision making and business intelligence for individuals or organizations. In this paper, we present an experimental study of the different feature extraction and selection techniques available for the opinion mining task. The study is carried out in four stages. First, data is collected from readily available sources. Second, pre-processing techniques are applied automatically using tools that extract terms and POS (part-of-speech) tags. Third, different feature selection or extraction techniques are applied to the content. Finally, an empirical study analyzes the sentiment polarity obtained with the different features.
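A minimal example of the polarity-scoring step: sum per-word scores from a sentiment lexicon, flipping the sign after a negation. The tiny lexicon and negation rule here are hypothetical stand-ins for the real resources such studies use:

```python
def polarity(tokens, lexicon):
    """Lexicon-based polarity score (hypothetical tiny lexicon).

    Sums per-word scores; a 'not' immediately before a scored word
    flips its sign. Positive total -> positive polarity, negative
    total -> negative, zero -> neutral.
    """
    score = 0
    for i, tok in enumerate(tokens):
        s = lexicon.get(tok, 0)
        if i > 0 and tokens[i - 1] == "not":
            s = -s                       # simple negation handling
        score += s
    return score

lex = {"good": 1, "great": 2, "bad": -1, "awful": -2}
print(polarity("the camera is great but not good".split(), lex))  # -> 1
```

Feature selection in OSMA then amounts to deciding which terms, POS patterns, or n-grams feed into this kind of scoring or into a downstream classifier.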
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION (IJDKP)
Many classification algorithms are available in data mining for solving the same kind of problem, with little guidance on which algorithm will give the best results for the dataset at hand. To improve the chances of recommending the most appropriate classification algorithm for a dataset, this paper examines the factors that data miners and researchers consider in different studies when selecting the classification algorithms that will yield the desired knowledge. The paper divides the factors affecting classification algorithm recommendation into business and technical factors; the proposed technical factors are measurable and can be exploited by recommendation software tools.
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA (IJCI JOURNAL)
This paper proposes a new method for reducing the size of high dimensional datasets by identifying and removing irrelevant and redundant features. Dataset reduction is important in machine learning and data mining. A dependence measure is used to evaluate the relationship between each feature and the target concept, and between features, for irrelevant and redundant feature removal. The proposed method first removes all irrelevant features, then constructs a minimum spanning tree of the relevant features using Prim's algorithm. Splitting the minimum spanning tree according to the dependency between features generates forests, and a representative feature from each forest is taken to form the final feature subset.
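The MST-based grouping step can be sketched as follows: build a minimum spanning tree over the features with Prim's algorithm, where edge weights reflect lack of dependence, then cut heavy edges to obtain forests. The toy dependence matrix and threshold below are our own illustrative choices:

```python
def prim_mst(n, weight):
    """Prim's algorithm over a complete graph of n feature nodes.

    weight(i, j) could be e.g. 1 - |correlation|, so strongly dependent
    features are joined by light edges. Returns the MST edge list.
    """
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n)
                    if j not in in_tree), key=lambda e: weight(*e))
        edges.append((i, j, weight(i, j)))
        in_tree.add(j)
    return edges

def split_forests(n, edges, threshold):
    """Drop MST edges heavier than threshold; return connected components."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for i, j, w in edges:
        if w <= threshold:
            parent[find(i)] = find(j)      # union the two components
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Toy weights: features 0,1 strongly dependent; likewise 2,3.
w = [[0, .1, .9, .8], [.1, 0, .9, .9], [.9, .9, 0, .2], [.8, .9, .2, 0]]
mst = prim_mst(4, lambda i, j: w[i][j])
print(split_forests(4, mst, threshold=0.5))  # -> [[0, 1], [2, 3]]
```

One representative feature per forest (e.g. the one most related to the target) would then form the final subset.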
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... (theijes)
Feature selection is considered a problem of global combinatorial optimization in machine learning: it reduces the number of features and removes irrelevant, noisy, and redundant data. However, identifying useful features among hundreds or even thousands of candidates is not easy. Selecting relevant genes from microarray data is even more challenging owing to the high dimensionality of the features, the multiclass categories involved, and the usually small sample size. To improve prediction accuracy and avoid incomprehensibility due to the number of features, different feature selection techniques can be implemented. This survey classifies and analyzes the different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three categories: filter, wrapper, and hybrid.
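The filter/wrapper distinction is easy to state in code: a filter ranks features by individual scores, while a wrapper searches subsets with an evaluator in the loop. In the hypothetical setup below, the evaluator rewards an interaction between two features that per-feature scores cannot see:

```python
def filter_select(scores, k):
    """Filter approach: keep the k features with the best individual scores."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

def wrapper_select(n, evaluate, k):
    """Wrapper approach: greedy forward search driven by a subset
    evaluator (a stand-in for cross-validated classifier accuracy)."""
    chosen = []
    while len(chosen) < k:
        best = max((f for f in range(n) if f not in chosen),
                   key=lambda f: evaluate(chosen + [f]))
        chosen.append(best)
    return chosen

# Toy evaluator: features 1 and 3 help individually AND jointly; the
# joint bonus is invisible to any per-feature score.
def evaluate(subset):
    return (1 in subset) + (3 in subset) + (1 in subset and 3 in subset)

print(filter_select([0.6, 0.5, 0.4, 0.5], k=2))   # -> [0, 1]
print(wrapper_select(4, evaluate, k=2))           # -> [1, 3]
```

The filter is cheap but misses the interaction; the wrapper finds it at the cost of many evaluator calls, which is the trade-off the survey's taxonomy captures.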
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES (cscpconf)
In this paper we focus on an efficient feature selection method for the classification of audio files. The main objective is feature selection and extraction: we select a set of features for further analysis, which form the elements of the feature vector, and by extraction we compute a numerical representation that characterizes the audio using an existing toolbox. In this study Gain Ratio (GR) is used as the feature selection measure; GR selects the splitting attribute that best separates the tuples into different classes. Pulse clarity is considered as a subjective measure and is used to calculate the gain of features of audio files. The splitting criterion is employed in the application to identify the class, or music genre, of a specific audio file from the testing database. Experimental results indicate that using GR the application produces satisfactory results for music genre classification: after dimensionality reduction the best three of the various audio features are selected, and with this technique more than 90% of files are classified successfully.
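Gain Ratio itself is information gain normalized by the split information of the attribute. A compact sketch over a toy genre-labelled sample (illustrative data, not the paper's audio features):

```python
import math

def entropy(values):
    """Shannon entropy of a list of discrete values, in bits."""
    n = len(values)
    counts = {v: values.count(v) for v in set(values)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_ratio(feature_values, labels):
    """Gain Ratio = information gain / split information for one
    discrete splitting attribute (as used to rank features)."""
    n = len(labels)
    groups = {}
    for v, l in zip(feature_values, labels):
        groups.setdefault(v, []).append(l)
    info_gain = entropy(labels) - sum(
        len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info else 0.0

# A feature that separates the genres perfectly vs a useless one.
labels = ["rock", "rock", "jazz", "jazz"]
print(gain_ratio(["hi", "hi", "lo", "lo"], labels))  # -> 1.0
print(gain_ratio(["a", "b", "a", "b"], labels))      # -> 0.0
```

Dividing by split information is what keeps many-valued attributes from being unfairly favoured over binary ones, which plain information gain would do.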
Enhancing feature selection with a novel hybrid approach incorporating geneti... (IJECEIAES)
Computing advances in data storage are leading to rapid growth of large-scale datasets. Using all features increases temporal and spatial complexity and degrades performance. Feature selection is a fundamental stage of data preprocessing that removes redundant and irrelevant features, minimizing the number of features and enhancing classification accuracy. Numerous optimization algorithms have been employed to handle feature selection (FS) problems, and they outperform conventional FS techniques; however, no single metaheuristic FS method outperforms the others across many datasets. This motivated our study to combine the advantages of various optimization techniques into a powerful technique that outperforms other methods on many datasets from different domains. In this article, a novel combined method, GASI, is developed using swarm intelligence (SI) based feature selection techniques and genetic algorithms (GA), with a multi-objective fitness function that seeks the optimal subset of features. To assess the performance of the proposed approach, seven datasets collected from the UCI repository were used to test the new feature selection technique. The experimental results demonstrate that GASI outperforms many powerful SI-based feature selection techniques studied, obtaining a better average fitness value and improved classification performance.
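The GA half of such a hybrid can be sketched as a tiny genetic algorithm over 0/1 feature masks with a multi-objective fitness (an accuracy term minus a subset-size penalty). The accuracy function below is a made-up stand-in, and the swarm intelligence component of GASI is not reproduced here:

```python
import random

def ga_feature_select(n_features, fitness, pop=10, gens=15, seed=3):
    """Tiny GA over 0/1 feature masks (sketch of the GA half of a
    GASI-style hybrid). Top half survives each generation (elitism);
    the rest are one-point-crossover children with a point mutation.
    """
    rng = random.Random(seed)
    popl = [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop)]
    for _ in range(gens):
        popl.sort(key=fitness, reverse=True)
        survivors = popl[: pop // 2]
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]          # one-point crossover
            m = rng.randrange(n_features)
            child[m] = 1 - child[m]            # point mutation
            children.append(child)
        popl = survivors + children
    return max(popl, key=fitness)

# Multi-objective fitness: hypothetical accuracy minus a size penalty.
acc = lambda mask: 0.5 + 0.2 * mask[0] + 0.2 * mask[2]
fit = lambda mask: acc(mask) - 0.05 * sum(mask)
best = ga_feature_select(5, fit)
print(best)
```

In GASI the fitness would be driven by real classifier accuracy, and SI-based searches would contribute candidate masks alongside the GA operators.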
Feature selection is a problem closely related to dimensionality reduction. A commonly used approach is to rank the individual features according to some criteria and then search for an optimal feature subset based on an evaluation criterion that tests optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular Rough Set based feature ranking process with a modified backward feature elimination algorithm. The approach ranks the symptoms of LD according to their importance in the data domain; each symptom's significance or priority value reflects its relative importance for predicting LD among the various cases. Then, by eliminating the least significant feature one at a time and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. The experimental results show that the proposed method removes redundant attributes efficiently from the LD dataset without sacrificing classification performance.
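The ranking-guided backward elimination described above can be sketched directly: starting from the full set, drop the least significant feature whenever the evaluation score does not degrade. The feature names and the accuracy function below are hypothetical:

```python
def backward_eliminate(features, ranked, evaluate):
    """Backward elimination guided by a feature ranking (sketch of the
    hybrid scheme: rank first, then drop the least significant feature
    while the evaluation score does not degrade).

    ranked: features ordered most- to least-significant.
    evaluate: subset -> score (stand-in for classification accuracy).
    """
    subset = list(features)
    for f in reversed(ranked):            # least significant first
        trial = [x for x in subset if x != f]
        if trial and evaluate(trial) >= evaluate(subset):
            subset = trial                # keep the smaller subset
    return subset

# Hypothetical accuracies: only f1 and f2 matter; extras add nothing.
def evaluate(subset):
    return 0.6 + 0.15 * ("f1" in subset) + 0.15 * ("f2" in subset)

ranked = ["f1", "f2", "f3", "f4"]
print(backward_eliminate(ranked, ranked, evaluate))  # -> ['f1', 'f2']
```

In the paper's setting the ranking would come from Rough Set significance values and the evaluator from the LD classifier's performance.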
Similar to Unsupervised Feature Selection Based on the Distribution of Features Attributed to Imbalanced Data Sets
The Use of Java Swing's Components to Develop a Widget (Waqas Tariq)
A widget is a kind of application that provides a single service, such as a map, news feed, simple clock, or battery-life indicator. This kind of interactive software object has been developed to facilitate user interface (UI) design; the same UI function may be implemented using different widgets. In this article, we present the widget as a platform used in various applications, such as the desktop, the web browser, and the mobile phone. We also describe a visual menu of Java Swing's components that will be used to build a widget, assuming the reader has successfully compiled and run a program that uses Swing components.
3D Human Hand Posture Reconstruction Using a Single 2D Image (Waqas Tariq)
Passive sensing of the 3D geometric posture of the human hand has been studied extensively over the past decade, but these research efforts have been hampered by the computational complexity of inverse kinematics and 3D reconstruction. In this paper, we focus on 3D hand posture estimation from a single 2D image, with robotic applications in mind. We introduce a human hand model with 27 degrees of freedom (DOFs) and analyze some of its constraints to reduce the DOFs without significant degradation of performance. A novel algorithm is proposed to estimate the 3D hand posture from eight 2D projected feature points. Experimental results using real images confirm that our algorithm gives good estimates of the 3D hand pose. Keywords: 3D hand posture estimation; model-based approach; gesture recognition; human-computer interface; machine vision.
Camera as Mouse and Keyboard for Handicap Person with Troubleshooting Ability... (Waqas Tariq)
The camera mouse is widely used by handicapped persons to interact with a computer. Above all, a camera mouse must be able to replace all roles of the typical mouse and keyboard: it must provide all mouse click events and keyboard functions (including all shortcut keys), it must allow users to troubleshoot by themselves, and it must eliminate neck fatigue when used over long periods. In this paper, we propose a camera mouse system with a timer as the left-click event and blinking as the right-click event. We also modify the original on-screen keyboard layout by adding two buttons (a "drag/drop" button for mouse drag-and-drop events and a button that calls the task manager for troubleshooting) and by changing the behavior of the CTRL, ALT, SHIFT, and CAPS LOCK keys to provide keyboard shortcuts. In addition, we develop a recovery method that allows users to move away from the camera and come back again, eliminating neck fatigue. Experiments involving several users were performed in our laboratory. The results show that our camera mouse allows users to type, perform left and right clicks, drag and drop, and troubleshoot without using their hands. With this system, handicapped persons can use a computer more comfortably and with less dryness of the eyes.
A Proposed Web Accessibility Framework for the Arab DisabledWaqas Tariq
The Web is providing unprecedented access to information and interaction for people with disabilities. This paper presents a Web accessibility framework which offers the ease of the Web accessing for the disabled Arab users and facilitates their lifelong learning as well. The proposed framework system provides the disabled Arab user with an easy means of access using their mother language so they don’t have to overcome the barrier of learning the target-spoken language. This framework is based on analyzing the web page meta-language, extracting its content and reformulating it in a suitable format for the disabled users. The basic objective of this framework is supporting the equal rights of the Arab disabled people for their access to the education and training with non disabled people. Key Words : Arabic Moon code, Arabic Sign Language, Deaf, Deaf-blind, E-learning Interactivity, Moon code, Web accessibility , Web framework , Web System, WWW.
Real Time Blinking Detection Based on Gabor FilterWaqas Tariq
New method of blinking detection is proposed. The utmost important of blinking detections method is robust against different users, noise, and also change of eye shape. In this paper, we propose blinking detections method by measuring of distance between two arcs of eye (upper part and lower part). We detect eye arcs by apply Gabor filter onto eye image. As we know that Gabor filter has advantage on image processing application since it able to extract spatial localized spectral features, such line, arch, and other shape are more easily detected. After two of eye arcs are detected, we measure the distance between both by using connected labeling method. The open eye is marked by the distance between two arcs is more than threshold and otherwise, the closed eye is marked by the distance less than threshold. The experiment result shows that our proposed method robust enough against different users, noise, and eye shape changes with perfectly accuracy.
Computer Input with Human Eyes-Only Using Two Purkinje Images Which Works in ...Waqas Tariq
A method for computer input with human eyes-only using two Purkinje images which works in a real time basis without calibration is proposed. Experimental results shows that cornea curvature can be estimated by using two light sources derived Purkinje images so that no calibration for reducing person-to-person difference of cornea curvature. It is found that the proposed system allows usersf movements of 30 degrees in roll direction and 15 degrees in pitch direction utilizing detected face attitude which is derived from the face plane consisting three feature points on the face, two eyes and nose or mouth. Also it is found that the proposed system does work in a real time basis.
Toward a More Robust Usability concept with Perceived Enjoyment in the contex...Waqas Tariq
Mobile multimedia service is relatively new but has quickly dominated people¡¯s lives, especially among young people. To explain this popularity, this study applies and modifies the Technology Acceptance Model (TAM) to propose a research model and conduct an empirical study. The goal of study is to examine the role of Perceived Enjoyment (PE) and what determinants can contribute to PE in the context of using mobile multimedia service. The result indicates that PE is influencing on Perceived Usefulness (PU) and Perceived Ease of Use (PEOU) and directly Behavior Intention (BI). Aesthetics and flow are key determinants to explain Perceived Enjoyment (PE) in mobile multimedia usage.
Collaborative Learning of Organisational KnolwedgeWaqas Tariq
This paper presents recent research into methods used in Australian Indigenous Knowledge sharing and looks at how these can support the creation of suitable collaborative envi- ronments for timely organisational learning. The protocols and practices as used today and in the past by Indigenous communities are presented and discussed in relation to their relevance to a personalised system of knowledge sharing in modern organisational cultures. This research focuses on user models, knowledge acquisition and integration of data for constructivist learning in a networked repository of or- ganisational knowledge. The data collected in the repository is searched to provide collections of up-to-date and relevant material for training in a work environment. The aim is to improve knowledge collection and sharing in a team envi- ronment. This knowledge can then be collated into a story or workflow that represents the present knowledge in the organisation.
Our research aims to propose a global approach for specification, design and verification of context awareness Human Computer Interface (HCI). This is a Model Based Design approach (MBD). This methodology describes the ubiquitous environment by ontologies. OWL is the standard used for this purpose. The specification and modeling of Human-Computer Interaction are based on Petri nets (PN). This raises the question of representation of Petri nets with XML. We use for this purpose, the standard of modeling PNML. In this paper, we propose an extension of this standard for specification, generation and verification of HCI. This extension is a methodological approach for the construction of PNML with Petri nets. The design principle uses the concept of composition of elementary structures of Petri nets as PNML Modular. The objective is to obtain a valid interface through verification of properties of elementary Petri nets represented with PNML.
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 BoardWaqas Tariq
The main aim of this paper is to build a system that is capable of detecting and recognizing the hand gesture in an image captured by using a camera. The system is built based on Altera’s FPGA DE2 board, which contains a Nios II soft core processor. Image processing techniques and a simple but effective algorithm are implemented to achieve this purpose. Image processing techniques are used to smooth the image in order to ease the subsequent processes in translating the hand sign signal. The algorithm is built for translating the numerical hand sign signal and the result are displayed on the seven segment display. Altera’s Quartus II, SOPC Builder and Nios II EDS software are used to construct the system. By using SOPC Builder, the related components on the DE2 board can be interconnected easily and orderly compared to traditional method that requires lengthy source code and time consuming. Quartus II is used to compile and download the design to the DE2 board. Then, under Nios II EDS, C programming language is used to code the hand sign translation algorithm. Being able to recognize the hand sign signal from images can helps human in controlling a robot and other applications which require only a simple set of instructions provided a CMOS sensor is included in the system.
An overview on Advanced Research Works on Brain-Computer InterfaceWaqas Tariq
A brain–computer interface (BCI) is a proficient result in the research field of human- computer synergy, where direct articulation between brain and an external device occurs resulting in augmenting, assisting and repairing human cognitive. Advanced works like generating brain-computer interface switch technologies for intermittent (or asynchronous) control in natural environments or developing brain-computer interface by Fuzzy logic Systems or by implementing wavelet theory to drive its efficacies are still going on and some useful results has also been found out. The requirements to develop this brain machine interface is also growing day by day i.e. like neuropsychological rehabilitation, emotion control, etc. An overview on the control theory and some advanced works on the field of brain machine interface are shown in this paper.
Exploring the Relationship Between Mobile Phone and Senior Citizens: A Malays...Waqas Tariq
There is growing ageing phenomena with the rise of ageing population throughout the world. According to the World Health Organization (2002), the growing ageing population indicates 694 million, or 223% is expected for people aged 60 and over, since 1970 and 2025.The growth is especially significant in some advanced countries such as North America, Japan, Italy, Germany, United Kingdom and so forth. This growing older adult population has significantly impact the social-culture, lifestyle, healthcare system, economy, infrastructure and government policy of a nation. However, there are limited research studies on the perception and usage of a mobile phone and its service for senior citizens in a developing nation like Malaysia. This paper explores the relationship between mobile phones and senior citizens in Malaysia from the perspective of a developing country. We conducted an exploratory study using contextual interviews with 5 senior citizens of how they perceive their mobile phones. This paper reveals 4 interesting themes from this preliminary study, in addition to the findings of the desirable mobile requirements for local senior citizens with respect of health, safety and communication purposes. The findings of this study bring interesting insight to local telecommunication industries as a whole, and will also serve as groundwork for more in-depth study in the future.
Principles of Good Screen Design in WebsitesWaqas Tariq
Visual techniques for proper arrangement of the elements on the user screen have helped the designers to make the screen look good and attractive. Several visual techniques emphasize the arrangement and ordering of the screen elements based on particular criteria for best appearance of the screen. This paper investigates few significant visual techniques in various web user interfaces and showcases the results for better understanding and their presence.
Virtual teams are used more and more by companies and other organizations to receive benefits. They are a great way to enable teamwork in situations where people are not sitting in the same physical place at the same time. As companies seek to increase the use of virtual teams, a need exists to explore the context of these teams, the virtuality of a team and software that may help these teams working virtualy. Virtual teams have the same basic principles as traditional teams, but there is one big difference. This difference is the way the team members communicate. Instead of using the dynamics of in-office face-to-face exchange, they now rely on special communication channels enabled by modern technologies, such as e-mails, faxes, phone calls and teleconferences, virtual meetings etc. This is why this paper is focused on the issues regarding virtual teams, and how these teams are created and progressing in Albania.
Cognitive Approach Towards the Maintenance of Web-Sites Through Quality Evalu...Waqas Tariq
It is a well established fact that the Web-Applications require frequent maintenance because of cutting– edge business competitions. The authors have worked on quality evaluation of web-site of Indian ecommerce domain. As a result of that work they have made a quality-wise ranking of these sites. According to their work and also the survey done by various other groups Futurebazaar web-site is considered to be one of the best Indian e-shopping sites. In this research paper the authors are assessing the maintenance of the same site by incorporating the problems incurred during this evaluation. This exercise gives a real world maintainability problem of web-sites. This work will give a clear picture of all the quality metrics which are directly or indirectly related with the maintainability of the web-site.
USEFul: A Framework to Mainstream Web Site Usability through Automated Evalua...Waqas Tariq
A paradox has been observed whereby web site usability is proven to be an essential element in a web site, yet at the same time there exist an abundance of web pages with poor usability. This discrepancy is the result of limitations that are currently preventing web developers in the commercial sector from producing usable web sites. In this paper we propose a framework whose objective is to alleviate this problem by automating certain aspects of the usability evaluation process. Mainstreaming comes as a result of automation, therefore enabling a non-expert in the field of usability to conduct the evaluation. This results in reducing the costs associated with such evaluation. Additionally, the framework allows the flexibility of adding, modifying or deleting guidelines without altering the code that references them since the guidelines and the code are two separate components. A comparison of the evaluation results carried out using the framework against published evaluations of web sites carried out by web site usability professionals reveals that the framework is able to automatically identify the majority of usability violations. Due to the consistency with which it evaluates, it identified additional guideline-related violations that were not identified by the human evaluators.
Robot Arm Utilized Having Meal Support System Based on Computer Input by Huma...Waqas Tariq
A robot arm utilized having meal support system based on computer input by human eyes only is proposed. The proposed system is developed for handicap/disabled persons as well as elderly persons and tested with able persons with several shapes and size of eyes under a variety of illumination conditions. The test results with normal persons show the proposed system does work well for selection of the desired foods and for retrieve the foods as appropriate as usersf requirements. It is found that the proposed system is 21% much faster than the manually controlled robotics.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech interactive systems have gained increasing importance. Performance of an ASR system mainly depends on the availability of large corpus of speech. The conventional method of building a large vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires large speech corpus with sentence or phoneme level transcription of the speech utterances. The transcriptions must also include different speech order so that the recognizer can build models for all the sounds present. But, for Telugu language, because of its complex nature, a very large, well annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands and millions of word forms. A significant part of grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu. Phrases including several words (that is, tokens) in English would be mapped on to a single word in Telugu.Telugu language is phonetic in nature in addition to rich in morphology. That is why the speech technology developed for English cannot be applied to Telugu language. This paper highlights the work carried out in an attempt to build a voice enabled text editor with capability of automatic term suggestion. Main claim of the paper is the recognition enhancement process developed by us for suitability of highly inflecting, rich morphological languages. This method results in increased speech recognition accuracy with very much reduction in corpus size. It also adapts Telugu words to the database dynamically, resulting in growth of the corpus.
An Improved Approach for Word Ambiguity RemovalWaqas Tariq
Word ambiguity removal is a task of removing ambiguity from a word, i.e. correct sense of word is identified from ambiguous sentences. This paper describes a model that uses Part of Speech tagger and three categories for word sense disambiguation (WSD). Human Computer Interaction is very needful to improve interactions between users and computers. For this, the Supervised and Unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding best suitable domain of word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Waqas Tariq
From the existing research it has been observed that many techniques and methodologies are available for performing every step of Automatic Speech Recognition (ASR) system, but the performance (Minimization of Word Error Recognition-WER and Maximization of Word Accuracy Rate- WAR) of the methodology is not dependent on the only technique applied in that method. The research work indicates that, performance mainly depends on the category of the noise, the level of the noise and the variable size of the window, frame, frame overlap etc is considered in the existing methods. The main aim of the work presented in this paper is to use variable size of parameters like window size, frame size and frame overlap percentage to observe the performance of algorithms for various categories of noise with different levels and also train the system for all size of parameters and category of real world noisy environment to improve the performance of the speech recognition system. This paper presents the results of Signal-to-Noise Ratio (SNR) and Accuracy test by applying variable size of parameters. It is observed that, it is really very hard to evaluate test results and decide parameter size for ASR performance improvement for its resultant optimization. Hence, this study further suggests the feasible and optimum parameter size using Fuzzy Inference System (FIS) for enhancing resultant accuracy in adverse real world noisy environmental conditions. This work will be helpful to give discriminative training of ubiquitous ASR system for better Human Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction (HCI)
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts(both negative and positive) creating all-consuming feel , making us difficult to manage with associated suffering. Good thoughts are like our Mob Signal (Positive thought) amidst noise(negative thought) in the atmosphere. Negative thoughts like noise outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our mind as well as chaos in our physical world. Negative thoughts are also known as “distorted thinking”.
How to Split Bills in the Odoo 17 POS ModuleCeline George
Bills have a main role in point of sale procedure. It will help to track sales, handling payments and giving receipts to customers. Bill splitting also has an important role in POS. For example, If some friends come together for dinner and if they want to divide the bill then it is possible by POS bill splitting. This slide will show how to split bills in odoo 17 POS.
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxEduSkills OECD
Andreas Schleicher presents at the OECD webinar ‘Digital devices in schools: detrimental distraction or secret to success?’ on 27 May 2024. The presentation was based on findings from PISA 2022 results and the webinar helped launch the PISA in Focus ‘Managing screen time: How to protect and equip students against distraction’ https://www.oecd-ilibrary.org/education/managing-screen-time_7c225af4-en and the OECD Education Policy Perspective ‘Students, digital devices and success’ can be found here - https://oe.cd/il/5yV
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Unsupervised Feature Selection Based on the Distribution of Features Attributed to Imbalanced Data Sets

Mina Alibeigi, Sattar Hashemi & Ali Hamzeh
International Journal of Artificial Intelligence and Expert Systems (IJAE), Volume (2) : Issue (1) : 2011
Mina Alibeigi alibeigi@cse.shirazu.ac.ir
Computer Science and Engineering Department
Shiraz University
Shiraz, 71348-51154, Iran
Sattar Hashemi s_hashemi@shirazu.ac.ir
Computer Science and Engineering Department
Shiraz University
Shiraz, 71348-51154, Iran
Ali Hamzeh hamzeh@shirazu.ac.ir
Computer Science and Engineering Department
Shiraz University
Shiraz, 71348-51154, Iran
Abstract
Since dealing with high dimensional data is computationally complex and sometimes even
intractable, several feature reduction methods have recently been developed to reduce the
dimensionality of the data and thereby simplify analysis in various applications such as text
categorization, signal processing, image retrieval and gene expression analysis, among many
others. Among feature reduction techniques, feature selection is one of the most popular methods
due to its preservation of the original meaning of the features. However, most current feature
selection methods do not perform well when fed imbalanced data sets, which are pervasive in
real world applications.
In this paper, we propose a new unsupervised feature selection method attributed to imbalanced
data sets, which will remove redundant features from the original feature space based on the
distribution of features. To show the effectiveness of the proposed method, several popular
feature selection methods have been implemented and compared against it. Experimental results
on several imbalanced data sets from the UCI repository illustrate the effectiveness of the
proposed method in comparison with its rivals, in terms of the AUC and F1 performance
measures of the 1-Nearest Neighbor and Naïve Bayes classifiers as well as the percentage of
selected features.
Keywords: Feature, Feature Selection, Filter Approach, Imbalanced Data Sets.
1. INTRODUCTION
Since data mining is capable of finding new useful information in data sets, it has been widely
applied in various domains such as pattern recognition, decision support systems, signal
processing and financial forecasting [1]. However, with the growth of the Internet, data sets
are getting larger and larger, which may cause traditional data mining and machine learning
algorithms to run slowly and inefficiently. One key solution to this problem is to reduce the
amount of data by sampling methods [2], [3]. But in many applications, the number of instances
in the data set is not very large, whereas the number of features can reach one thousand or
even more. In such cases, sampling is not a good choice.
Theoretically, having more features should yield higher discriminative power in classification.
However, this does not always hold in practice, since some features may be unimportant for
predicting the class labels or even irrelevant [4], [5]. Since many factors, such as the quality of the
data, influence the success of a learning algorithm, the data set should not contain irrelevant,
noisy or redundant features if information is to be extracted efficiently [6].
Furthermore, the high dimensionality of a data set may cause the “curse of dimensionality” problem [7].
Feature reduction (dimensionality reduction) methods are one of the key solutions to all these
problems.
Feature reduction refers to the problem of reducing the number of dimensions by which the data
set is described [8]. The general purpose of these methods is to represent the data set with fewer
features, reducing the computational complexity while preserving or even improving the
discriminative capability [8]. Since feature reduction can bring many advantages to learning
algorithms, such as avoiding over-fitting, robustness in the presence of noise and higher
accuracy, it has attracted a lot of attention over the last three decades. A wide variety of feature
reduction methods have therefore been suggested, which fall into two major categories: feature
extraction and feature subset selection. Feature extraction techniques project the data into a new
reduced subspace in which the initial meaning of the features is no longer kept. Some of the
well-known state-of-the-art feature extraction methods are principal component analysis (PCA)
[5], non-linear PCA [13] and linear discriminant analysis (LDA) [13]. In comparison, feature
selection methods preserve the primary information and meaning of the features in the selected
subset. The purpose of these schemes is to remove noisy and redundant features from the
original feature space [13]. Therefore, due to preserving the initial meaning of features,
feature selection approaches are of greater interest [8], [9].
Feature selection methods can be broadly divided into two categories: filter and wrapper
approaches [9]. Filter approaches choose features from the original feature space according to
pre-specified evaluation criteria that are independent of any specific learning algorithm.
Conversely, wrapper approaches select the features with the highest prediction performance as
estimated by a specific learning algorithm; thus wrappers can achieve better performance than
filters. However, wrapper approaches are less common than filter ones because they require
more computational resources and are often intractable for large-scale problems [9]. Due to their
computational efficiency and independence of any particular learning algorithm, filter approaches
are more popular and common for high dimensional data sets [9].
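The contrast between the two families can be sketched as follows. This is an illustrative skeleton, not any method from the paper: the variance criterion and the greedy wrapper loop are assumptions chosen for brevity, and `score_model` stands in for any learner-specific quality measure (e.g. clustering performance).

```python
import numpy as np

def filter_select(X, k):
    """Filter approach: rank features by a learner-independent criterion
    (variance here, as a stand-in) and keep the top k. One cheap pass."""
    scores = X.var(axis=0)
    return list(np.argsort(scores)[::-1][:k])

def wrapper_select(X, k, score_model):
    """Wrapper approach: greedily grow a subset, re-scoring a specific
    model at every step. Tuned to that model, but far more expensive."""
    selected, remaining = [], set(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda f: score_model(X[:, selected + [f]]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The filter runs in one pass over the data, while the wrapper re-evaluates the model once per candidate feature per step, which is why wrappers become intractable for large feature spaces.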
As stated above, feature selection has been studied intensively [4], [5], [6], [8], [9], but its
importance in resolving the class imbalance problem has only recently been noted by researchers [10].
The class imbalance problem refers to the situation in which one or more classes of a data
set have significantly more instances (the majority classes) than the other classes (the minority
classes) [10]. In this type of data set, the minority class is of higher importance than the
majority class. Since imbalanced data sets are nowadays pervasive in real world applications
such as biological data analysis, text classification, web categorization, risk management, image
classification and fraud detection, it is important to propose a feature selection method that is
appropriate for imbalanced data sets.
Therefore, in this study, we present a new unsupervised filter feature selection algorithm which
has the benefits of filter approaches and is designed to perform well on imbalanced
data sets. The proposed approach chooses the more informative features, considering the importance
of the minority class, according to the relationship between the distributions of the features,
which are approximated by probability density functions (PDFs). The main idea of the proposed
scheme is to first approximate the PDF of each feature independently, in an unsupervised manner,
and then remove those features whose PDFs have large overlapping areas with the PDFs of other
features; such features are considered redundant.
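The core idea of discarding features whose estimated densities largely overlap can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' exact algorithm: the Gaussian kernel, its bandwidth, the evaluation grid, and the overlap threshold are all assumptions.

```python
import numpy as np

def estimate_pdf(values, grid, bandwidth=0.05):
    """Gaussian kernel density estimate of one feature, evaluated on a grid."""
    diffs = (grid[:, None] - values[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernel.mean(axis=1)

def overlap_area(pdf_a, pdf_b, grid):
    """Area under the pointwise minimum of two PDFs (close to 1 = near identical)."""
    return np.trapz(np.minimum(pdf_a, pdf_b), grid)

def select_features(X, threshold=0.8):
    """Keep each feature only if its PDF does not heavily overlap the PDF
    of a feature already kept; heavily overlapping ones are redundant."""
    grid = np.linspace(0.0, 1.0, 200)   # features assumed scaled to [0, 1]
    pdfs = [estimate_pdf(X[:, j], grid) for j in range(X.shape[1])]
    kept = []
    for j in range(X.shape[1]):
        if all(overlap_area(pdfs[j], pdfs[k], grid) < threshold for k in kept):
            kept.append(j)
    return kept
```

With this sketch, a duplicated feature produces a nearly identical PDF (overlap close to 1) and is dropped, while a feature with a clearly different distribution is retained.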
The rest of this paper is organized as follows. Section 2 discusses related research on
unsupervised feature selection. Section 3 explains the proposed method for unsupervised feature
selection. Our experimental results are given in Section 4, and Section 5 concludes the paper.
2. RELATED WORK
Conventional feature selection methods evaluate various subsets of features and select the best
subset among all with the best evaluation according to an effective criterion related to the
application. These methods often suffer from high computational complexity through their
searching process when applied to large data sets. The complexity of an exhaustive search is
exponential in terms of the number of features of the data set. To overcome these shortcomings,
several heuristic schemas have been proposed such as Branch and Bound (B&B) method which
guarantees to find the optimal subset of features with computational time expectedly less than the
exponential under the monotonicity assumption [12]. B&B starts from the full set of features and
removes features in a depth-first search, backtracking whenever removing a feature cannot
improve the evaluation of the remaining subset [12]. Another popular approach is Sequential
Forward Selection (SFS), which searches for the best subset of features iteratively, starting from
the empty set. In each step, SFS adds to the current subset the feature that maximizes the
evaluation criterion for the newly selected subset [13]. Although heuristic approaches are simple
and fast, with quadratic complexity, they often lack backtracking and therefore perform poorly
with non-monotonic criteria. In [24], another heuristic method called Sequential Floating Forward
Selection (SFFS) was proposed, which performs sequential forward selection with backtracking
capability at the cost of higher computational complexity.
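As a rough illustration of the SFS loop described above, the greedy add-one-feature-at-a-time search might be sketched as follows; the `evaluate` criterion and the stopping size `k` are hypothetical placeholders, not from the paper:

```python
def sequential_forward_selection(features, evaluate, k):
    """Greedy SFS: start from the empty set and repeatedly add the
    feature whose inclusion maximizes the evaluation criterion."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # pick the candidate that scores best when added to the current subset
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# toy criterion: prefer subsets whose index sum is large (illustration only)
chosen = sequential_forward_selection([0, 1, 2, 3], evaluate=sum, k=2)
```

Because the search is greedy and never revisits a choice, a feature added early stays in the subset even if a later combination would score higher without it, which is exactly the lack of backtracking noted above.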
The former methods can be applied in both supervised and unsupervised schemas according to
their evaluation criteria. Since the interest of this paper is developing an unsupervised feature
selection method, here we investigate only unsupervised methods. These can be generally
divided into two categories: filter and wrapper approaches [4], [8], [13]. Wrapper approaches
select a subset of features with respect to a specified clustering algorithm: they search for the
subset that, when used to train that clustering algorithm, achieves the highest performance.
Some examples of these approaches are [14], [15], [16].
Conversely, filter methods select features according to an evaluation criterion that is independent
of any specific clustering algorithm. The goal of these methods is to find irrelevant and redundant
features and remove them from the original feature space. In order to find irrelevant and
redundant features, various dependency measures have been suggested such as correlation
coefficient [6], linear dependency [18] and consistency measures [19].
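To illustrate how a dependency measure such as the correlation coefficient can flag redundant features, the following sketch computes absolute pairwise feature correlations on toy data; the 0.95 threshold and the synthetic features are illustrative assumptions, not values from the cited works:

```python
import numpy as np

# toy data: three features, where f2 is a nearly linear copy of f0
rng = np.random.default_rng(0)
f0 = rng.normal(size=200)
f1 = rng.normal(size=200)
f2 = 2.0 * f0 + 0.01 * rng.normal(size=200)
X = np.column_stack([f0, f1, f2])

# absolute pairwise correlation between features; values near 1 flag redundancy
corr = np.abs(np.corrcoef(X, rowvar=False))
redundant_pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)
                   if corr[i, j] > 0.95]
```

Note that correlation only captures linear dependence; this is the limitation that motivates the distribution-based measure proposed below.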
In this paper, we propose a feature subset selection method based on the distribution of features.
It can handle nonlinear dependencies between features in an unsupervised framework and
performs well on imbalanced data sets, because it gives higher importance to the minority class,
which is the most important class in an imbalanced data set. The following section explains the
proposed method in detail.
3. THE PROPOSED UNSUPERVISED FEATURE SELECTION METHOD
ATTRIBUTED TO IMBALANCED DATA SETS
The proposed unsupervised feature selection method, a filter approach attributed to imbalanced
data sets, consists of four steps. In the first step, features are scaled into the range [0, 1]. Then,
the probability density function (PDF) of each feature is estimated, which gives a good overview
of the distribution of instances for that feature. The third step computes, for each feature, the
number of times its PDF is similar to the PDFs of the remaining features. Finally, features with a
higher count of similarity to other features are removed. Each step is described in detail below.
The proposed method determines whether each pair of features is similar according to their PDFs
and removes, as redundant, those features that are similar to many other features, because all or
most of their information is repeated in other features.
As explained above, the first step of the proposed approach is scaling feature values into the
range [0, 1]. Afterwards, a PDF is estimated for each feature. Methods for estimating probability
density functions can be broadly categorized into parametric and non-parametric approaches
[21]. Parametric methods assume a particular form for the density,
such as Gaussian, so that only the parameters (mean and variance) need to be estimated. In
contrast, non-parametric methods assume no knowledge about the density of the data and
compute it directly from the instances; for this reason they are of greater interest here. The
general form of non-parametric probability density estimation is given by the following formula:
p(x) ≅ k / (N · V)    (1)
where, p(x) is the value of the estimated probability density function for instance x, V is the
volume surrounding x, N is the total number of instances and k is the number of instances inside
V. Two basic approaches lead to practical non-parametric density estimation methods, depending
on how k and V are determined. Fixing k and finding the volume V that contains exactly k
instances leads to the methods commonly referred to as K Nearest Neighbor (KNN) methods.
Conversely, fixing the volume V and determining k yields Kernel Density Estimation (KDE).
Probability densities estimated via KNN approaches are generally not very satisfactory: KNN
PDF estimation is prone to local noise, and the resulting estimate is not a true probability density,
since its integral over the instance space diverges [25]. For these reasons, in this study we
estimate probability density functions with the KDE method using a Gaussian kernel. Note that
the proposed feature selection algorithm is not tied to any particular estimation method; however,
more accurate estimation methods allow the algorithm to perform more effectively.
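As a sketch of KDE with a Gaussian kernel, each estimate averages kernel masses centered on the samples; the bandwidth `h` and the sample values below are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def gaussian_kde_1d(samples, h):
    """Kernel density estimate with a Gaussian kernel and bandwidth h.
    Returns a function p(x) evaluating the estimated density at x."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    def p(x):
        u = (x - samples) / h                            # scaled distance to each sample
        k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel values
        return k.sum() / (n * h)                         # analogue of k/(N*V) in formula (1)
    return p

# density of feature values already scaled to [0, 1], evaluated on a grid
values = np.array([0.1, 0.15, 0.2, 0.8, 0.85])
pdf = gaussian_kde_1d(values, h=0.1)
grid = np.linspace(0.0, 1.0, 101)
density = np.array([pdf(x) for x in grid])
```

Evaluating every feature's PDF on the same fixed grid, as done here, is what makes the pointwise MSE comparison in the next step well defined.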
Because the ranges of different features may differ, all feature values are scaled into the [0, 1]
interval before their PDFs are compared. Afterwards, the probability density function of each
feature is computed using KDE.
Having estimated the probability density function of each feature, the similarity between each pair
of features is calculated. Two features are considered similar if the Mean Square Error (MSE)
between their PDFs is less than a user-specified threshold. Similar features contain nearly the
same information, because their PDFs are sufficiently alike; thus, one of them can be removed
without considerable loss of information. Among similar features, those that are similar to more of
the other features in the feature space are removed. By removing the feature with the higher
frequency of being similar to other features, the loss of information is minimized. Also, since
instances of all classes contribute equally to estimating the PDF of each feature, instances of the
minority classes receive relatively higher importance in the PDF estimation process. Thus,
features that are more informative with respect to the minority classes are given a higher chance
of being selected. Algorithm 1 presents the steps of the proposed feature selection approach.
4. EXPERIMENTAL RESULTS AND DISCUSSION
The comparisons were carried out on three data sets from the UCI Machine Learning Repository,
Ecoli, Ionosphere and Sonar, all of which are imbalanced. Table 1 summarizes the characteristics
of the data sets used to assess the performance of the proposed method. The first column of
Table 1 shows the name of the data set; the number of features and the number of classes are
shown in the second and third columns, respectively. The last column of each row lists the
number of instances per class.
To evaluate a feature selection method, the performance of classifiers trained on the features it
selects is compared to the performance of classifiers trained on the full set of features, referred to
as the baseline performance.
There are many classifiers in machine learning domains with different biases. The most well-
known classifiers for evaluating a feature selection method are Naive Bayes (NB) [11] classifier
and K- Nearest Neighbor (KNN) classifier [13]. Naïve Bayes is a simple probabilistic classifier
based on the assumption of class conditional independence of features [25]. K-Nearest Neighbor
is a lazy learning algorithm which classifies each new test instance based on its K nearest
training instances [25].
On imbalanced data sets, classifiers have difficulty classifying instances of the minority class,
since simply labeling every instance with the majority class already achieves high accuracy.
Accuracy is therefore not a good performance measure on such data sets. Several other
statistics exist, such as AUC (Area Under the receiver operating characteristic Curve) and
F-measures [26]. AUC and the F1-measure are two statistics commonly used to evaluate
classifiers with a focus on the importance of the minority class. In this paper, we evaluate the
performance of the different feature selection methods using the AUC and F1-measure statistics.
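For concreteness, both measures can be computed from first principles as follows; the toy labels and the 0.5 score threshold are illustrative, not the paper's experimental setup, and AUC is computed via its Mann-Whitney rank formulation:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# imbalanced toy labels: 2 positives among 8 instances
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
preds = (scores >= 0.5).astype(int)
```

Unlike plain accuracy, both measures are sensitive to how the two minority-class positives are ranked and labeled, which is why they are used throughout the experiments.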
Algorithm 1: The steps of the proposed unsupervised feature selection method.

Unsupervised Feature Selection Based on the Distribution of Features
Input: D = {d1, d2, ..., dN}   // N is the number of instances
Input: F = {f1, f2, ..., fn}   // n is the number of features
Output: F(S)   // the selected subset of features; initially F(S) = F
Begin
  Step 1. Scale each feature into the range [0, 1]
  Step 2. Estimate the probability density function (PDF) of each feature
  Step 3. For i = 1 to n-1
            For j = i+1 to n
              Calculate MSE(density of feature i, density of feature j)
              If MSE <= epsilon
                Consider features i and j to be similar
                Increment the similarity counter of both features i and j
  Step 4. Within each pair of similar features, remove the feature with the higher similarity counter
  Step 5. Return the list of remaining features as the selected features F(S)
End
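A minimal sketch of Algorithm 1 might look as follows, assuming a Gaussian-kernel density evaluated on a fixed grid with a hypothetical bandwidth `h`; ties in the similarity counters are broken by removing the lower-indexed feature, a detail the algorithm leaves unspecified:

```python
import numpy as np

def select_features(X, epsilon, grid_size=50, h=0.05):
    """Sketch of Algorithm 1: drop features whose PDFs duplicate others.
    X is (instances x features); epsilon is the MSE similarity threshold."""
    n = X.shape[1]
    # Step 1: scale each feature into [0, 1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xs = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    # Step 2: Gaussian-kernel density of each feature on a shared grid
    grid = np.linspace(0.0, 1.0, grid_size)
    dens = np.empty((n, grid_size))
    for i in range(n):
        u = (grid[:, None] - Xs[:, i][None, :]) / h
        dens[i] = np.exp(-0.5 * u ** 2).sum(axis=1) / (len(Xs) * h * np.sqrt(2 * np.pi))
    # Step 3: count, per feature, how many other PDFs it resembles (MSE <= epsilon)
    counter = np.zeros(n, dtype=int)
    similar = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            if np.mean((dens[i] - dens[j]) ** 2) <= epsilon:
                similar.append((i, j))
                counter[i] += 1
                counter[j] += 1
    # Step 4: within each similar pair, remove the feature with the higher counter
    removed = set()
    for i, j in similar:
        if i in removed or j in removed:
            continue
        removed.add(i if counter[i] >= counter[j] else j)
    # Step 5: return the remaining feature indices
    return [i for i in range(n) if i not in removed]
```

Because the PDFs ignore class labels entirely, the same code applies unchanged to imbalanced data; the grid size, bandwidth and epsilon would all need tuning in practice.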
TABLE 1: Characteristics of data sets used in this study for experimental evaluations.

Name         # Features   # Classes   # Instances Per Class
Sonar            60           2        97, 111
Ionosphere       34           2        126, 225
Ecoli             7           8        143, 77, 52, 35, 20, 5, 2, 2

Comparisons are done in the Weka framework [22]. To show the effectiveness of the proposed
method, we compared it with two commonly used supervised approaches, Correlation-based
Feature Subset Evaluation (Hall [4]) and Consistency-based Feature Subset Evaluation (Liu and
Setiono [20]), abbreviated in the results as CfsSubsetEval and ConsistencySubsetEval,
respectively. We also compared the proposed method with an unsupervised Sequential Forward
Selection (SFS) scheme that uses entropy as its evaluation criterion, denoted SFS with Entropy
in the experiments. The entropy criterion for this method is defined in formula (2).
Entropy = − Σ_{p=1}^{l} Σ_{q=1}^{l} [ Sim(p,q) · log(Sim(p,q)) + (1 − Sim(p,q)) · log(1 − Sim(p,q)) ]

Sim(p,q) = e^(−α · D_pq)    (2)

D_pq = [ Σ_{j=1}^{M} ( (x_{p,j} − x_{q,j}) / (max_j − min_j) )^2 ]^(1/2)

where D_pq is the distance between two instances p and q, x_{p,j} denotes the j-th feature value
of instance p, max_j and min_j are respectively the maximum and minimum values of the j-th
feature, and M denotes the number of features. In (2), α is a positive constant set as

α = − ln(0.5) / D̄

where D̄ is the average distance between all pairs of instances.
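A sketch of this entropy criterion follows, with two stated assumptions: each unordered pair of instances is counted once (the double sum in (2) counts each pair twice, so this differs only by a constant factor), and the similarity values are clipped away from 0 and 1 to keep the logarithms finite:

```python
import numpy as np

def entropy_criterion(X):
    """Entropy evaluation in the spirit of formula (2) on the instances in X
    (rows); lower entropy indicates more ordered, better-separated data."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xs = (X - lo) / np.where(hi > lo, hi - lo, 1.0)    # per-feature scaling to [0, 1]
    diff = Xs[:, None, :] - Xs[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))              # pairwise distances D_pq
    iu = np.triu_indices(n, k=1)                       # each unordered pair once
    d = D[iu]
    alpha = -np.log(0.5) / d.mean()                    # alpha = -ln(0.5) / mean distance
    s = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)  # Sim(p, q), clipped for the logs
    return -np.sum(s * np.log(s) + (1 - s) * np.log(1 - s))
```

The choice of α makes a pair at exactly the average distance have similarity 0.5, the point where its contribution to the entropy is largest.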
Tables 2-4 present the experimental results on each of the data sets. The first column of each
table gives the name of the feature selection method, and the second the number of features it
selected. The AUC and F1 performance of the Naïve Bayes (NB) classifier are shown in the third
and fourth columns, respectively, and the AUC and F1 performance of the K-Nearest Neighbor
(KNN) classifier in the last two columns.
As Tables 2-4 show, the AUC and F1 performance of the proposed method is fairly comparable
to that of the Baseline, while the proposed method removes redundant features (about half of the
original features; see Figure 1), which leads to lower computational complexity. This confirms
that feature selection is a key solution for classifiers on high-dimensional imbalanced data sets.
The proposed feature selection method also yields higher 1-NN classifier performance than the
CfsSubsetEval and ConsistencySubsetEval schemes in terms of both the AUC and F1 measures,
and is comparable to both in terms of the AUC and F1 performance of the NB classifier. This is
notable because the proposed method is unsupervised and thus has access to less information
than CfsSubsetEval and ConsistencySubsetEval, which are supervised and have access to the
class labels. Furthermore, our method outperforms the rival unsupervised scheme, SFS with
Entropy, in terms of both the AUC and F1 performance of the 1-NN and NB classifiers. In
general, it can be concluded that the proposed approach is more efficient than the rival
unsupervised feature selection method and comparable to the commonly used supervised
feature selection schemes considered in the experiments.
Figure 1 compares the rival feature selection methods in terms of the percentage of selected
features on each data set. Feature selection methods that select a small percentage of features
while maintaining suitable performance are preferable. As can be seen, the proposed method
has this property, which makes it a good choice for feature selection on imbalanced data sets.
Feature Selection Method # Selected Features NB F1 NB AUC 1-NN F1 1-NN AUC
Baseline 34 0.829 0.935 0.857 0.822
CfsSubsetEval 14 0.92 0.958 0.885 0.852
ConsistencySubsetEval 7 0.872 0.926 0.875 0.849
SFS with Entropy 14 0.778 0.82 0.791 0.747
The Proposed Method 12 0.92 0.958 0.91 0.889
TABLE 2: Experimental results on Ionosphere data set in terms of the number of selected features and the
AUC and F1 evaluation performances for NB and 1-NN classifiers.
Feature Selection Method # Selected Features NB F1 NB AUC 1-NN F1 1-NN AUC
Baseline 7 0.854 0.96 0.801 0.875
CfsSubsetEval 6 0.854 0.96 0.799 0.873
ConsistencySubsetEval 6 0.854 0.96 0.799 0.873
SFS with Entropy 6 0.791 0.947 0.766 0.854
The Proposed Method 6 0.854 0.96 0.799 0.873
TABLE 3: Experimental results on Ecoli data set in terms of the number of selected features and the AUC
and F1 evaluation performances for NB and 1-NN classifiers.
Feature Selection Method # Selected Features NB F1 NB AUC 1-NN F1 1-NN AUC
Baseline 60 0.673 0.8 0.865 0.862
CfsSubsetEval 19 0.675 0.812 0.836 0.834
ConsistencySubsetEval 14 0.666 0.811 0.85 0.847
SFS with Entropy 14 0.557 0.658 0.659 0.657
The Proposed Method 14 0.652 0.769 0.88 0.878
TABLE 4: Experimental results on Sonar data set in terms of the number of selected features and the AUC
and F1 evaluation performances for NB and 1-NN classifiers.
FIGURE 1: The percentage of selected features on each data set, and on average over all data sets, for the
proposed feature selection method and the rival methods.
5. CONCLUSION AND FUTURE WORK
Feature selection techniques play a key role when dealing with high-dimensional data sets.
Filter-based feature selection methods have recently attracted more interest because of their
independence from any particular learning algorithm and their efficiency on high-dimensional
data sets. Since most current feature selection methods perform poorly when fed imbalanced
data sets, designing a feature selection method that can handle imbalanced data has recently
drawn researchers' attention.
Therefore, in this study we proposed a new unsupervised filter feature selection scheme
attributed to imbalanced data sets, which selects features based on the relation between the
probability density estimates of the features. The main idea is that a feature whose distribution is
similar to the distributions of other features is redundant, because all or most of its information is
repeated in those similar features; such a feature can be removed from the original feature space
with the least loss of information. Experimental results on a set of
imbalanced data sets show that, compared to the rival unsupervised feature selection method,
the proposed approach finds a more informative subset of features that is more useful for
classifying instances of the minority classes. Also, the performance of the
proposed method is comparable to the performance of two commonly used supervised feature
selection frameworks in terms of both AUC and F1 evaluation measures.
For future work, it might be useful to apply this idea to supervised feature selection: estimate
per-class probability density functions for each feature and measure the similarity between
features by comparing their per-class densities.
6. ACKNOWLEDGEMENT
This work is supported by the Iran Telecommunication Research Center.
7. REFERENCES
1. U. Fayyad, G. Piatetsky-Shapiro and P. Smyth. “From data mining to knowledge discovery in
databases”, AI Magazine, vol. 17, pp. 37–54, 1996
2. M. Lindenbaum, S. Markovitch and D. Rusakov. “Selective sampling for nearest neighbor
classifiers”, Machine learning, vol. 54, pp. 125–152, 2004
3. A.I. Schein and L.H. Ungar, “Active learning for logistic regression: an evaluation”, Machine
Learning, vol. 68, pp. 235–265, 2007
4. M.A. Hall. “Correlation-based feature subset selection for machine learning”, Ph.D.
Dissertation, Department of Computer Science, University of Waikato, Hamilton, New
Zealand, 1999
5. I.K. Fodor. “A survey of dimension reduction techniques”, Technical Report UCRL- ID-
148494, Lawrence Livermore National Laboratory, US Department of Energy, 2002
6. M.A. Hall. “Correlation-based feature selection for discrete and numeric class machine
learning”, Department of Computer Science, University of Waikato, Hamilton, New Zealand,
2000
7. R. Bellman. “Adaptive Control Processes: A Guided Tour”, Princeton University Press,
Princeton, 1961
8. H. Liu, J. Sun, L. Liu and H. Zhang, “Feature selection with dynamic mutual information”,
Pattern Recognition, vol. 42, pp. 1330 – 1339, 2009
9. N. Pradhananga. “Effective Linear-Time Feature Selection”, Department of Computer
Science, University of Waikato, Hamilton, New Zealand, 2007
10. M. Wasikowski and X. Chen. “Combating the small sample class imbalance problem using
feature selection”, IEEE Transactions on knowledge and data engineering, 2009
11. G.H. John and P. Langley. “Estimating Continuous Distributions in Bayesian Classifiers”. In:
Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, pp. 338-345, 1995
12. P.M. Narendra and K. Fukunaga. “A branch and bound algorithm for feature subset
selection”, IEEE Trans. Comput., vol. 26, pp. 917–922, 1977
13. P.A. Devijver and J. Kittler. “Pattern Recognition: A Statistical Approach”, Englewood Cliffs:
Prentice Hall, 1982
14. M. Dash and H. Liu. “Unsupervised Feature Selection”, Proc. Pacific Asia conf. Knowledge
Discovery and Data Mining, pp. 110-121, 2000
15. J. Dy and C. Brodley. “Feature Subset Selection and Order Identification for Unsupervised
Learning”, Proc. 17th Int’l. Conf. Machine Learning, 2000
16. S. Basu, C.A. Micchelli and P. Olsen. “Maximum Entropy and Maximum Likelihood Criteria for
Feature Selection from Multi-variate Data”, Proc. IEEE Int’l. Symp. Circuits and Systems, pp.
267-270, 2000
17. S.K. Pal, R.K. De and J. Basak. “Unsupervised Feature Evaluation: A Neuro-Fuzzy
Approach”, IEEE Trans. Neural Network, vol. 11, pp. 366-376, 2000
18. S.K. Das. “Feature Selection with a Linear Dependence Measure”, IEEE Trans. Computers,
pp. 1106-1109, 1971
19. G.T. Toussaint and T.R. Vilmansen. “Comments on Feature Selection with a Linear
Dependence Measure”, IEEE Trans. Computers, 408, 1972
20. H. Liu and R. Setiono. “A probabilistic approach to feature selection - A filter solution”. In:
13th International Conference on Machine Learning, pp. 319-327, 1996
21. K. Fukunaga. “Introduction to Statistical Pattern Recognition”, Academic Press, 2nd Ed. 1990
22. E. Frank, M.A. Hall, G. Holmes, R. Kirkby and B. Pfahringer. “Weka - a machine learning
workbench for data mining”, In The Data Mining and Knowledge Discovery Handbook, pp.
1305-1314, 2005
23. M. Dash and H. Liu. “Unsupervised Feature Selection”, Proc. Pacific Asia Conf. Knowledge
Discovery and Data Mining, pp. 110-121, 2000
24. P. Pudil, J. Novovicova and J. Kittler. “Floating Search Methods in Feature Selection”, Pattern
Recognition Letters, vol. 15, pp. 1119-1125, 1994
25. R.O. Duda, P.E. Hart and D.G. Stork. “Pattern Classification”, Second Edition, Wiley, 2001
26. G. Forman. "An extensive empirical study of feature selection metrics for text classification",
Journal of Machine Learning Research, vol. 3, pp. 1289-1305, 2003