A Fast Clustering-Based Feature Subset Selection Algorithm for
High-Dimensional Data
ABSTRACT:
Feature selection involves identifying a subset of the most useful features that produces results comparable to
those of the original, entire set of features. A feature selection algorithm may be evaluated from both the
efficiency and effectiveness points of view. Efficiency concerns the time required to find a subset of features,
while effectiveness is related to the quality of that subset. Based on these criteria, a fast clustering-based
feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper.
The FAST algorithm works in two steps. In the first step, features are divided into clusters by using
graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to
the target classes is selected from each cluster to form a subset of features. Because features in different clusters
are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of
useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST)
clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical
study.
Extensive experiments are carried out to compare FAST with several representative feature selection algorithms,
namely FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four well-known types of classifiers,
namely the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based
RIPPER, before and after feature selection. The results, on 35 publicly available real-world high-dimensional
image, microarray, and text data sets, demonstrate that FAST not only produces smaller subsets of features but
also improves the performance of the four types of classifiers.
GLOBALSOFT TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
IEEE FINAL YEAR PROJECTS|IEEE ENGINEERING PROJECTS|IEEE STUDENTS PROJECTS|IEEE
BULK PROJECTS|BE/BTECH/ME/MTECH/MS/MCA PROJECTS|CSE/IT/ECE/EEE PROJECTS
CELL: +91 98495 39085, +91 99662 35788, +91 98495 57908, +91 97014 40401
Visit: www.finalyearprojects.org Mail to:ieeefinalsemprojects@gmail.com
EXISTING SYSTEM:
Feature selection methods fall into four broad categories: embedded, wrapper, filter, and hybrid. The embedded
methods incorporate feature selection as part of the training process and are usually specific to given learning
algorithms, so they may be more efficient than the other three categories. Traditional machine learning
algorithms such as decision trees or artificial neural networks are examples of embedded approaches. The wrapper
methods use the predictive accuracy of a predetermined learning algorithm to determine the goodness of the
selected subsets, so the accuracy of the learning algorithms is usually high. However, the generality of the
selected features is limited and the computational complexity is large. The filter methods are independent of
learning algorithms and have good generality. Their computational complexity is low, but the accuracy of the
learning algorithms is not guaranteed. The hybrid methods combine filter and wrapper methods, using a filter
method to reduce the search space that will be considered by the subsequent wrapper. They mainly aim to achieve
the best possible performance with a particular learning algorithm at a time complexity similar to that of the
filter methods.
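The hybrid strategy described above can be sketched as follows. A filter score prunes the candidate set, and a wrapper-style greedy forward search then evaluates subsets of the survivors. This is a minimal illustration, not code from the paper. The names `hybrid_select`, `filter_score`, `evaluate`, and `keep` are hypothetical, and `evaluate` stands in for the predictive accuracy of whatever learning algorithm the wrapper would use.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def hybrid_select(
    features: Sequence[str],
    filter_score: Dict[str, float],
    evaluate: Callable[[FrozenSet[str]], float],
    keep: int,
) -> List[str]:
    """Hybrid feature selection sketch: a filter stage, then a greedy wrapper stage.

    filter_score: learner-independent relevance score per feature (filter stage).
    evaluate:     stand-in for a learner's accuracy on a candidate subset (wrapper stage).
    keep:         number of top-ranked features that survive the filter stage.
    """
    # Filter stage: keep only the top `keep` features, shrinking the wrapper's search space.
    candidates = sorted(features, key=lambda f: filter_score[f], reverse=True)[:keep]

    # Wrapper stage: greedy forward selection, adding the feature that helps most each round.
    selected: List[str] = []
    best = evaluate(frozenset())
    while True:
        best_feature = None
        for f in candidates:
            if f in selected:
                continue
            score = evaluate(frozenset(selected + [f]))
            if score > best:
                best, best_feature = score, f
        if best_feature is None:
            return selected  # no remaining candidate improves the evaluation
        selected.append(best_feature)


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}

    def toy_accuracy(subset: FrozenSet[str]) -> float:
        # Toy learner: "a" and "b" help, anything else slightly hurts.
        return 0.5 + 0.2 * len(subset & {"a", "b"}) - 0.05 * len(subset - {"a", "b"})

    print(hybrid_select(["a", "b", "c", "d"], scores, toy_accuracy, keep=3))  # prints ['a', 'b']
```

The wrapper stage calls `evaluate` once per candidate per round. This is why pruning with the filter first keeps the overall cost closer to that of a filter method.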
DISADVANTAGES:
1. Wrapper methods: the generality of the selected features is limited and the computational complexity is large.
2. Filter methods: the computational complexity is low, but the accuracy of the learning algorithms is not guaranteed.
3. Hybrid methods: the filter only reduces the search space for the subsequent wrapper, so the result is still tuned
to a particular learning algorithm.
PROPOSED SYSTEM:
Feature subset selection can be viewed as the process of identifying and removing as many irrelevant and
redundant features as possible. This is because irrelevant features do not contribute to predictive accuracy,
and redundant features do not help in obtaining a better predictor, since they mostly provide information that
is already present in other features. Of the many feature subset selection algorithms, some can effectively
eliminate irrelevant features but fail to handle redundant ones, while others can eliminate irrelevant features
and also take care of redundant ones.
Our proposed FAST algorithm falls into the second group. Traditionally, feature subset selection research has
focused on searching for relevant features. A well-known example is Relief, which weights each feature
according to its ability to discriminate between instances of different target classes using a distance-based
criterion function. However, Relief is ineffective at removing redundant features, because two predictive but
highly correlated features are both likely to receive high weights. ReliefF extends Relief, enabling the method
to work with noisy and incomplete data sets and to deal with multiclass problems, but it still cannot identify
redundant features.
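The following minimal Relief sketch makes the redundancy problem concrete. It handles two-class data with numeric features scaled to [0, 1]. It loops over every instance instead of drawing a random sample of instances, a simplification so that the result is deterministic. It uses Manhattan distance to find each instance's nearest hit and nearest miss, then adds to each weight the feature's difference on the miss minus its difference on the hit. The function name and toy data are hypothetical. Duplicating a predictive feature shows the weakness noted above: the copy receives exactly the same weight as the original.

```python
from typing import List, Sequence


def relief(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Relief weights for a two-class problem.

    Assumes features are scaled to [0, 1] and each class has at least two instances.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        # Nearest hit: the closest other instance with the same class label.
        hit = min((j for j in range(n) if j != i and y[j] == y[i]), key=lambda j: dist(X[i], X[j]))
        # Nearest miss: the closest instance with a different class label.
        miss = min((j for j in range(n) if y[j] != y[i]), key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward separating the classes, penalize varying within a class.
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights


if __name__ == "__main__":
    # Column 0 predicts the class, column 1 is an exact copy of it, column 2 is noise.
    X = [[0.0, 0.0, 0.3], [0.1, 0.1, 0.9], [0.9, 0.9, 0.2], [1.0, 1.0, 0.8]]
    y = [0, 0, 1, 1]
    print(relief(X, y))  # columns 0 and 1 get the same high weight; column 2 is negative
```

Because Relief scores each feature in isolation against hits and misses, nothing in the update penalizes the copy for being redundant. Removing such redundancy is the gap that FAST targets.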
ADVANTAGES:
• Good feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated
with (not predictive of) each other.
• FAST deals efficiently and effectively with both irrelevant and redundant features and obtains a good
feature subset.
• In general, all six algorithms achieve a significant reduction of dimensionality by selecting only a small
portion of the original features.
• Runtimes are compared with the Friedman test, whose null hypothesis is that all the feature selection
algorithms are equivalent in terms of runtime.
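The Friedman test mentioned above ranks the algorithms separately on each data set. It then checks whether the average ranks differ more than chance would allow, using the statistic χ²_F = 12N / (k(k+1)) · [Σ_j R_j² − k(k+1)²/4]. Here N is the number of data sets, k is the number of algorithms, and R_j is the average rank of algorithm j. The sketch below assumes a lower runtime is better (rank 1) and gives tied algorithms the average of the ranks they span. The function names and toy runtimes are hypothetical, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank one data set's values (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(runtimes: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic; rows are data sets, columns are algorithms."""
    n, k = len(runtimes), len(runtimes[0])
    rank_rows = [average_ranks(row) for row in runtimes]
    # R_j: average rank of algorithm j across the n data sets.
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
```

For example, if three algorithms keep the same runtime order on two data sets (`[[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]`), the statistic is 4.0. It is then compared against a chi-square distribution with k − 1 degrees of freedom to decide whether to reject the null hypothesis.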
HARDWARE & SOFTWARE REQUIREMENTS:
HARDWARE REQUIREMENTS:
• Processor: Pentium IV
• Speed: 1.1 GHz
• RAM: 256 MB (min)
• Hard Disk: 20 GB
• Floppy Drive: 1.44 MB
• Keyboard: Standard Windows keyboard
• Mouse: Two- or three-button mouse
• Monitor: SVGA
SOFTWARE REQUIREMENTS:
• Operating System: Windows XP
• Front End: Java JDK 1.7
• Scripts: JavaScript
• Tools: NetBeans
• Database: SQL Server or MS-Access
• Database Connectivity: JDBC
FLOW CHART:
1. Data set
2. Irrelevant feature removal
3. Minimum spanning tree construction
4. Tree partition and representative feature selection
MAIN MODULES:
1. DISTRIBUTIONAL CLUSTERING
2. SUBSET SELECTION ALGORITHM
3. TIME COMPLEXITY
4. MICROARRAY DATA
5. DATA RESOURCE
6. IRRELEVANT FEATURE
MODULE DESCRIPTION:
DISTRIBUTIONAL CLUSTERING:
Distributional clustering has been used to cluster words into groups based either on their participation in
particular grammatical relations with other words (Pereira et al.) or on the distribution of class labels
associated with each word (Baker and McCallum [4]). Because distributional clustering of words is agglomerative
in nature, it results in suboptimal word clusters and high computational cost; a new information-theoretic
divisive algorithm for word clustering was therefore proposed and applied to text classification. Another
approach clusters features using a special distance metric and then uses the resulting cluster hierarchy to
choose the most relevant attributes. Unfortunately, this distance-based cluster evaluation measure does not
identify a feature subset that allows the classifiers to improve their original performance accuracy.
Furthermore, the accuracy obtained is lower than that of other feature selection methods.
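A small agglomerative sketch illustrates the idea behind clustering words by class labels. Each word is represented by its class-conditional distribution P(C | w), and the two clusters whose distributions are closest are merged repeatedly. This sketch uses Jensen-Shannon divergence as the closeness measure and represents a merged cluster by the count-weighted average of its members' distributions. Both choices, the function names, and the toy counts are assumptions for illustration, not the exact formulation of Baker and McCallum.

```python
import math
from typing import Dict, List, Optional, Tuple


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: List[float], b: List[float]) -> float:
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 because b is the mixture.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster_words(counts: Dict[str, List[int]], n_clusters: int) -> List[List[str]]:
    """Greedy agglomerative clustering of words by class-label distribution.

    counts[w][c] is the (hypothetical) number of occurrences of word w in class c.
    n_clusters must be at least 1.
    """
    # Each cluster holds its member words and its summed class counts; summing the counts
    # gives the count-weighted average of the members' P(C | w).
    clusters: List[Tuple[List[str], List[int]]] = [([w], list(c)) for w, c in counts.items()]

    def to_dist(c: List[int]) -> List[float]:
        total = sum(c)
        return [x / total for x in c]

    while len(clusters) > n_clusters:
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = js_divergence(to_dist(clusters[i][1]), to_dist(clusters[j][1]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best  # the closest pair of clusters
        merged = (clusters[i][0] + clusters[j][0],
                  [a + b for a, b in zip(clusters[i][1], clusters[j][1])])
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return [sorted(words) for words, _ in clusters]
```

With toy counts such as `{"goal": [9, 1], "match": [8, 2], "vote": [1, 9], "party": [2, 8]}` and `n_clusters=2`, the sports-like words and the politics-like words end up in separate clusters. Each pass compares every pair of clusters, which reflects the computational cost of the agglomerative approach mentioned above.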
SUBSET SELECTION ALGORITHM:
Irrelevant features, along with redundant features, severely affect the accuracy of learning machines.
Thus, feature subset selection should be able to identify and remove as much of the irrelevant and redundant
information as possible. Moreover, “good feature subsets contain features highly correlated with (predictive of)
the class, yet uncorrelated with (not predictive of) each other.” Keeping these in mind, we develop a novel
algorithm that can efficiently and effectively deal with both irrelevant and redundant features and obtain a
good feature subset.
TIME COMPLEXITY:
The major amount of work for Algorithm 1 involves the computation of symmetric uncertainty (SU) values for
T-Relevance (between each feature and the target class) and F-Correlation (between pairs of features), which has
linear complexity in terms of the number of instances in a given data set. The first part of the algorithm has
linear time complexity in terms of the number of features m. Assuming k (1 ≤ k ≤ m) features are selected as
relevant in the first part, when k = 1 only one feature is selected, so the remaining parts of the algorithm are
unnecessary and the overall complexity is O(m).
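The SU values mentioned above are symmetric uncertainty, defined as SU(X, Y) = 2 · Gain(X | Y) / (H(X) + H(Y)). Here Gain(X | Y) = H(X) − H(X | Y) is the information gain, which equals H(X) + H(Y) − H(X, Y). SU is used both for the relevance between a feature and the target class and for the correlation between two features. The sketch below computes SU for discrete (already discretized) values. Building the count tables takes time linear in the number of instances, consistent with the cost stated above. The function names are hypothetical, and the value returned when both variables are constant is a convention of this sketch.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(counts: Counter, n: int) -> float:
    """Shannon entropy (base 2) from a table of value counts over n instances."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * Gain / (H(X) + H(Y)), in [0, 1], for discrete values."""
    n = len(x)
    # Marginal and joint count tables, each built in time linear in n.
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx, hy, hxy = entropy(cx, n), entropy(cy, n), entropy(cxy, n)
    if hx + hy == 0:
        return 0.0  # both variables are constant; SU is set to 0 by convention here
    # Information gain H(X) - H(X | Y) equals H(X) + H(Y) - H(X, Y).
    return 2 * (hx + hy - hxy) / (hx + hy)
```

A feature identical to the class gives SU = 1, and a feature statistically independent of it gives SU = 0. Unlike raw information gain, this normalization does not favor features with many distinct values.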
MICROARRAY DATA:
Each of the six algorithms reduces the proportion of selected features relative to the original feature sets of the
given data sets, which indicates that the six algorithms work well with microarray data. FAST again ranks first,
with a proportion of selected features of 0.71 percent. Of the six algorithms, only CFS cannot choose features for
two data sets, whose dimensionalities are 19,994 and 49,152, respectively.
DATA RESOURCE:
For the purposes of evaluating the performance and effectiveness of our proposed FAST algorithm, verifying
whether the method is potentially useful in practice, and allowing other researchers to confirm our
results, 35 publicly available data sets were used. The numbers of features of the 35 data sets vary from 37 to
49,152, with a mean of 7,874. Of the data sets, 54.3 percent have more than 5,000 features, and 28.6 percent have
more than 10,000 features. The 35 data sets cover a range of application domains such as text, image, and bio
microarray data classification. For the data sets with continuous-valued features, the well-known off-the-shelf
MDL (minimum description length) method was used to discretize the continuous values.
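The MDL discretization step can be sketched with the entropy-based method of Fayyad and Irani. This sketch assumes that is the off-the-shelf MDL method meant here. The sorted values are split recursively at the boundary that minimizes the weighted class entropy. A split is accepted only if its information gain exceeds (log2(N − 1) + Δ) / N, where Δ = log2(3^k − 2) − [k·Ent(S) − k1·Ent(S1) − k2·Ent(S2)] and k, k1, k2 count the classes present in S and in the two halves. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def class_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mdl_cut_points(values: Sequence[float], labels: Sequence[Hashable]) -> List[float]:
    """Fayyad-Irani style MDL discretization of one continuous feature (ascending cut points)."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts: List[float] = []

    def split(lo: int, hi: int) -> None:
        seg = [lab for _, lab in pairs[lo:hi]]  # labels of the sorted slice
        n = hi - lo
        ent = class_entropy(seg)
        best: Optional[Tuple[float, int]] = None
        for i in range(lo + 1, hi):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # a cut can only fall between distinct values
            left, right = seg[: i - lo], seg[i - lo:]
            e = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
            if best is None or e < best[0]:
                best = (e, i)
        if best is None:
            return
        e, i = best
        left, right = seg[: i - lo], seg[i - lo:]
        gain = ent - e
        k, k1, k2 = len(set(seg)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (k * ent - k1 * class_entropy(left) - k2 * class_entropy(right))
        # MDL acceptance criterion: Gain > (log2(n - 1) + delta) / n.
        if gain <= (math.log2(n - 1) + delta) / n:
            return
        cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)  # midpoint between neighbors
        split(lo, i)
        split(i, hi)

    split(0, len(pairs))
    return sorted(cuts)
```

For values 1 to 10 whose first five instances belong to one class and last five to another, the only accepted cut is 5.5. Interleaved labels such as `[1, 2, 3, 4]` with classes `a, b, a, b` yield no cut, because the small gain does not pay for the description length.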
IRRELEVANT FEATURE:
Irrelevant feature removal is straightforward once the right relevance measure is defined or selected, whereas
redundant feature elimination is more sophisticated. In our proposed FAST algorithm, it involves 1) the
construction of a minimum spanning tree (MST) from a weighted complete graph; 2) the partitioning of the MST
into a forest, with each tree representing a cluster; and 3) the selection of representative features from the
clusters.
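The steps above can be sketched as follows, taking the SU values as already computed. The sketch works in four stages:

1. Features whose relevance to the class does not exceed a threshold θ are dropped.
2. Prim's algorithm builds an MST over the complete graph of the remaining features.
3. An MST edge between F_i and F_j is removed when SU(F_i, F_j) is smaller than both SU(F_i, C) and SU(F_j, C).
4. The most class-relevant feature of each remaining tree is kept.

The threshold and the edge-removal rule follow our reading of the FAST paper rather than anything spelled out in this summary. The edge weight 1 − SU(F_i, F_j) is an assumption of this sketch, chosen so that strongly correlated features end up adjacent in the tree, and is not necessarily the weighting used by the authors. The names `fast_select`, `t_rel`, `f_corr`, and `theta` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def fast_select(t_rel: Sequence[float], f_corr: Sequence[Sequence[float]], theta: float) -> List[int]:
    """Return indices of selected features, one representative per cluster.

    t_rel[i]:     SU(F_i, C), relevance of feature i to the class
    f_corr[i][j]: SU(F_i, F_j), correlation between features i and j (symmetric)
    theta:        relevance threshold for irrelevant-feature removal
    """
    # Step 0: irrelevant feature removal.
    rel = [i for i in range(len(t_rel)) if t_rel[i] > theta]
    if not rel:
        return []

    # Step 1: Prim's MST over the complete graph of relevant features.
    # Assumed edge weight: distance 1 - SU, so strongly correlated features are close.
    root = rel[0]
    best: Dict[int, Tuple[float, int]] = {v: (1 - f_corr[root][v], root) for v in rel[1:]}
    edges: List[Tuple[int, int]] = []
    while best:
        v = min(best, key=lambda u: best[u][0])  # cheapest vertex to attach
        _, parent = best.pop(v)
        edges.append((parent, v))
        for u in best:
            d = 1 - f_corr[v][u]
            if d < best[u][0]:
                best[u] = (d, v)

    # Step 2: drop edge (a, b) when SU(a, b) is below both SU(a, C) and SU(b, C);
    # each remaining connected tree is one cluster.
    kept = [(a, b) for a, b in edges
            if not (f_corr[a][b] < t_rel[a] and f_corr[a][b] < t_rel[b])]

    link = {v: v for v in rel}  # union-find parent pointers

    def find(v: int) -> int:
        while link[v] != v:
            link[v] = link[link[v]]  # path halving
            v = link[v]
        return v

    for a, b in kept:
        link[find(a)] = find(b)

    # Step 3: keep the most class-relevant feature of each cluster.
    reps: Dict[int, int] = {}
    for v in rel:
        r = find(v)
        if r not in reps or t_rel[v] > t_rel[reps[r]]:
            reps[r] = v
    return sorted(reps.values())


if __name__ == "__main__":
    t_rel = [0.9, 0.8, 0.7, 0.6, 0.05]
    f_corr = [[1.0, 0.95, 0.1, 0.1, 0.0],
              [0.95, 1.0, 0.1, 0.1, 0.0],
              [0.1, 0.1, 1.0, 0.75, 0.0],
              [0.1, 0.1, 0.75, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 1.0]]
    print(fast_select(t_rel, f_corr, theta=0.1))  # prints [0, 2]
```

In the toy example, feature 4 is removed as irrelevant. Features {0, 1} and {2, 3} form two clusters, and features 0 and 2 represent them. Because only one feature survives per cluster, the redundant features 1 and 3 are discarded.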
USER MODULE:
In this module, users are authenticated before they can access the details presented in the ontology system.
Before accessing or searching the details, a user must have an account; otherwise, the user must register first.
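The account check described in this module could be sketched as an in-memory registry. It stores salted PBKDF2 password hashes and verifies logins with a constant-time comparison. The class name, the iteration count, and the in-memory storage are assumptions for illustration. A real deployment would persist the accounts in a database rather than in a dictionary.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple


class UserRegistry:
    """Minimal register/login sketch using salted PBKDF2-HMAC-SHA256 hashes."""

    ITERATIONS = 100_000  # assumed work factor; tune for the target hardware

    def __init__(self) -> None:
        self._users: Dict[str, Tuple[bytes, bytes]] = {}  # username -> (salt, hash)

    def _hash(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, self.ITERATIONS)

    def register(self, username: str, password: str) -> bool:
        """Create an account; returns False if the username is already taken."""
        if username in self._users:
            return False
        salt = os.urandom(16)  # a fresh random salt per account
        self._users[username] = (salt, self._hash(password, salt))
        return True

    def login(self, username: str, password: str) -> bool:
        """Return True only for a registered user with the correct password."""
        record = self._users.get(username)
        if record is None:
            return False  # unregistered users must register first
        salt, stored = record
        return hmac.compare_digest(stored, self._hash(password, salt))
```

Storing only a salt and a slow hash means that a leaked account table does not directly reveal passwords. `hmac.compare_digest` avoids leaking how many bytes of the hash matched.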
CONCLUSION:
In this paper, we have presented a novel clustering-based feature subset selection algorithm for
high-dimensional data. The algorithm involves 1) removing irrelevant features, 2) constructing a minimum spanning
tree from the relevant ones, and 3) partitioning the MST and selecting representative features. In the proposed
algorithm, a cluster consists of features. Each cluster is treated as a single feature, and thus dimensionality is
drastically reduced.
We have compared the performance of the proposed algorithm with those of five well-known feature selection
algorithms, FCBF, ReliefF, CFS, Consist, and FOCUS-SF, on publicly available image, microarray, and text data
from four different aspects: the proportion of selected features, runtime, classification accuracy of a given
classifier, and the Win/Draw/Loss record.
Generally, the proposed algorithm obtained the best proportion of selected features, the best runtime, and the
best classification accuracy for Naive Bayes and RIPPER, and the second best classification accuracy for IB1. The
Win/Draw/Loss records confirmed these conclusions. We also found that FAST obtains the rank of 1 for microarray
data, the rank of 2 for text data, and the rank of 3 for image data in terms of classification accuracy of the four
different types of classifiers, and that CFS is a good alternative. At the same time, FCBF is a good alternative for
image and text data. Moreover, Consist and FOCUS-SF are alternatives for text data. In future work, we plan to
explore different types of correlation measures and study some formal properties of feature space.
REFERENCES:
[1] H. Almuallim and T.G. Dietterich, “Algorithms for Identifying Relevant Features,” Proc. Ninth Canadian
Conf. Artificial Intelligence, pp. 38-45, 1992.
[2] H. Almuallim and T.G. Dietterich, “Learning Boolean Concepts in the Presence of Many Irrelevant
Features,” Artificial Intelligence, vol. 69, nos. 1/2, pp. 279-305, 1994.
[3] A. Arauzo-Azofra, J.M. Benitez, and J.L. Castro, “A Feature Set Measure Based on Relief,” Proc. Fifth Int’l
Conf. Recent Advances in Soft Computing, pp. 104-109, 2004.
[4] L.D. Baker and A.K. McCallum, “Distributional Clustering of Words for Text Classification,” Proc. 21st
Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 96-103, 1998.
[5] R. Battiti, “Using Mutual Information for Selecting Features in Supervised Neural Net Learning,” IEEE
Trans. Neural Networks, vol. 5, no. 4, pp. 537-550, July 1994.
[6] D.A. Bell and H. Wang, “A Formalism for Relevance and Its Application in Feature Subset Selection,”
Machine Learning, vol. 41, no. 2, pp. 175-195, 2000.
[7] J. Biesiada and W. Duch, “Feature Selection for High-Dimensional Data: A Pearson Redundancy Based
Filter,” Advances in Soft Computing, vol. 45, pp. 242-249, 2008.
[8] R. Butterworth, G. Piatetsky-Shapiro, and D.A. Simovici, “On Feature Selection through Clustering,” Proc.
IEEE Fifth Int’l Conf. Data Mining, pp. 581-584, 2005.
[9] C. Cardie, “Using Decision Trees to Improve Case-Based Learning,” Proc. 10th Int’l Conf. Machine
Learning, pp. 25-32, 1993.
[10] P. Chanda, Y. Cho, A. Zhang, and M. Ramanathan, “Mining of Attribute Interactions Using Information
Theoretic Metrics,” Proc. IEEE Int’l Conf. Data Mining Workshops, pp. 350-355, 2009.
[11] S. Chikhi and S. Benhammada, “ReliefMSS: A Variation on a Feature Ranking ReliefF Algorithm,” Int’l J.
Business Intelligence and Data Mining, vol. 4, nos. 3/4, pp. 375-390, 2009.
[12] W. Cohen, “Fast Effective Rule Induction,” Proc. 12th Int’l Conf. Machine Learning (ICML ’95),
pp. 115-123, 1995.
[13] M. Dash and H. Liu, “Feature Selection for Classification,” Intelligent Data Analysis, vol. 1, no. 3, pp.
131-156, 1997.
[14] M. Dash, H. Liu, and H. Motoda, “Consistency Based Feature Selection,” Proc. Fourth Pacific Asia Conf.
Knowledge Discovery and Data Mining, pp. 98-109, 2000.
[15] S. Das, “Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection,” Proc. 18th Int’l Conf.
Machine Learning, pp. 74-81, 2001.

 
D0931621
D0931621D0931621
D0931621
 
Supervised Machine Learning: A Review of Classification ...
Supervised Machine Learning: A Review of Classification ...Supervised Machine Learning: A Review of Classification ...
Supervised Machine Learning: A Review of Classification ...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Optimal feature selection from v mware esxi 5.1 feature set
Optimal feature selection from v mware esxi 5.1 feature setOptimal feature selection from v mware esxi 5.1 feature set
Optimal feature selection from v mware esxi 5.1 feature set
 

More from IEEEFINALYEARPROJECTS

Scalable face image retrieval using attribute enhanced sparse codewords
Scalable face image retrieval using attribute enhanced sparse codewordsScalable face image retrieval using attribute enhanced sparse codewords
Scalable face image retrieval using attribute enhanced sparse codewordsIEEEFINALYEARPROJECTS
 
Scalable face image retrieval using attribute enhanced sparse codewords
Scalable face image retrieval using attribute enhanced sparse codewordsScalable face image retrieval using attribute enhanced sparse codewords
Scalable face image retrieval using attribute enhanced sparse codewordsIEEEFINALYEARPROJECTS
 
Reversible watermarking based on invariant image classification and dynamic h...
Reversible watermarking based on invariant image classification and dynamic h...Reversible watermarking based on invariant image classification and dynamic h...
Reversible watermarking based on invariant image classification and dynamic h...IEEEFINALYEARPROJECTS
 
Reversible data hiding with optimal value transfer
Reversible data hiding with optimal value transferReversible data hiding with optimal value transfer
Reversible data hiding with optimal value transferIEEEFINALYEARPROJECTS
 
Query adaptive image search with hash codes
Query adaptive image search with hash codesQuery adaptive image search with hash codes
Query adaptive image search with hash codesIEEEFINALYEARPROJECTS
 
Noise reduction based on partial reference, dual-tree complex wavelet transfo...
Noise reduction based on partial reference, dual-tree complex wavelet transfo...Noise reduction based on partial reference, dual-tree complex wavelet transfo...
Noise reduction based on partial reference, dual-tree complex wavelet transfo...IEEEFINALYEARPROJECTS
 
Local directional number pattern for face analysis face and expression recogn...
Local directional number pattern for face analysis face and expression recogn...Local directional number pattern for face analysis face and expression recogn...
Local directional number pattern for face analysis face and expression recogn...IEEEFINALYEARPROJECTS
 
An access point based fec mechanism for video transmission over wireless la ns
An access point based fec mechanism for video transmission over wireless la nsAn access point based fec mechanism for video transmission over wireless la ns
An access point based fec mechanism for video transmission over wireless la nsIEEEFINALYEARPROJECTS
 
Towards differential query services in cost efficient clouds
Towards differential query services in cost efficient cloudsTowards differential query services in cost efficient clouds
Towards differential query services in cost efficient cloudsIEEEFINALYEARPROJECTS
 
Spoc a secure and privacy preserving opportunistic computing framework for mo...
Spoc a secure and privacy preserving opportunistic computing framework for mo...Spoc a secure and privacy preserving opportunistic computing framework for mo...
Spoc a secure and privacy preserving opportunistic computing framework for mo...IEEEFINALYEARPROJECTS
 
Secure and efficient data transmission for cluster based wireless sensor netw...
Secure and efficient data transmission for cluster based wireless sensor netw...Secure and efficient data transmission for cluster based wireless sensor netw...
Secure and efficient data transmission for cluster based wireless sensor netw...IEEEFINALYEARPROJECTS
 
Privacy preserving back propagation neural network learning over arbitrarily ...
Privacy preserving back propagation neural network learning over arbitrarily ...Privacy preserving back propagation neural network learning over arbitrarily ...
Privacy preserving back propagation neural network learning over arbitrarily ...IEEEFINALYEARPROJECTS
 
Harnessing the cloud for securely outsourcing large
Harnessing the cloud for securely outsourcing largeHarnessing the cloud for securely outsourcing large
Harnessing the cloud for securely outsourcing largeIEEEFINALYEARPROJECTS
 
Geo community-based broadcasting for data dissemination in mobile social netw...
Geo community-based broadcasting for data dissemination in mobile social netw...Geo community-based broadcasting for data dissemination in mobile social netw...
Geo community-based broadcasting for data dissemination in mobile social netw...IEEEFINALYEARPROJECTS
 
Enabling data dynamic and indirect mutual trust for cloud computing storage s...
Enabling data dynamic and indirect mutual trust for cloud computing storage s...Enabling data dynamic and indirect mutual trust for cloud computing storage s...
Enabling data dynamic and indirect mutual trust for cloud computing storage s...IEEEFINALYEARPROJECTS
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...IEEEFINALYEARPROJECTS
 
A secure protocol for spontaneous wireless ad hoc networks creation
A secure protocol for spontaneous wireless ad hoc networks creationA secure protocol for spontaneous wireless ad hoc networks creation
A secure protocol for spontaneous wireless ad hoc networks creationIEEEFINALYEARPROJECTS
 
Utility privacy tradeoff in databases an information-theoretic approach
Utility privacy tradeoff in databases an information-theoretic approachUtility privacy tradeoff in databases an information-theoretic approach
Utility privacy tradeoff in databases an information-theoretic approachIEEEFINALYEARPROJECTS
 
Two tales of privacy in online social networks
Two tales of privacy in online social networksTwo tales of privacy in online social networks
Two tales of privacy in online social networksIEEEFINALYEARPROJECTS
 

More from IEEEFINALYEARPROJECTS (20)

Scalable face image retrieval using attribute enhanced sparse codewords
Scalable face image retrieval using attribute enhanced sparse codewordsScalable face image retrieval using attribute enhanced sparse codewords
Scalable face image retrieval using attribute enhanced sparse codewords
 
Scalable face image retrieval using attribute enhanced sparse codewords
Scalable face image retrieval using attribute enhanced sparse codewordsScalable face image retrieval using attribute enhanced sparse codewords
Scalable face image retrieval using attribute enhanced sparse codewords
 
Reversible watermarking based on invariant image classification and dynamic h...
Reversible watermarking based on invariant image classification and dynamic h...Reversible watermarking based on invariant image classification and dynamic h...
Reversible watermarking based on invariant image classification and dynamic h...
 
Reversible data hiding with optimal value transfer
Reversible data hiding with optimal value transferReversible data hiding with optimal value transfer
Reversible data hiding with optimal value transfer
 
Query adaptive image search with hash codes
Query adaptive image search with hash codesQuery adaptive image search with hash codes
Query adaptive image search with hash codes
 
Noise reduction based on partial reference, dual-tree complex wavelet transfo...
Noise reduction based on partial reference, dual-tree complex wavelet transfo...Noise reduction based on partial reference, dual-tree complex wavelet transfo...
Noise reduction based on partial reference, dual-tree complex wavelet transfo...
 
Local directional number pattern for face analysis face and expression recogn...
Local directional number pattern for face analysis face and expression recogn...Local directional number pattern for face analysis face and expression recogn...
Local directional number pattern for face analysis face and expression recogn...
 
An access point based fec mechanism for video transmission over wireless la ns
An access point based fec mechanism for video transmission over wireless la nsAn access point based fec mechanism for video transmission over wireless la ns
An access point based fec mechanism for video transmission over wireless la ns
 
Towards differential query services in cost efficient clouds
Towards differential query services in cost efficient cloudsTowards differential query services in cost efficient clouds
Towards differential query services in cost efficient clouds
 
Spoc a secure and privacy preserving opportunistic computing framework for mo...
Spoc a secure and privacy preserving opportunistic computing framework for mo...Spoc a secure and privacy preserving opportunistic computing framework for mo...
Spoc a secure and privacy preserving opportunistic computing framework for mo...
 
Secure and efficient data transmission for cluster based wireless sensor netw...
Secure and efficient data transmission for cluster based wireless sensor netw...Secure and efficient data transmission for cluster based wireless sensor netw...
Secure and efficient data transmission for cluster based wireless sensor netw...
 
Privacy preserving back propagation neural network learning over arbitrarily ...
Privacy preserving back propagation neural network learning over arbitrarily ...Privacy preserving back propagation neural network learning over arbitrarily ...
Privacy preserving back propagation neural network learning over arbitrarily ...
 
Non cooperative location privacy
Non cooperative location privacyNon cooperative location privacy
Non cooperative location privacy
 
Harnessing the cloud for securely outsourcing large
Harnessing the cloud for securely outsourcing largeHarnessing the cloud for securely outsourcing large
Harnessing the cloud for securely outsourcing large
 
Geo community-based broadcasting for data dissemination in mobile social netw...
Geo community-based broadcasting for data dissemination in mobile social netw...Geo community-based broadcasting for data dissemination in mobile social netw...
Geo community-based broadcasting for data dissemination in mobile social netw...
 
Enabling data dynamic and indirect mutual trust for cloud computing storage s...
Enabling data dynamic and indirect mutual trust for cloud computing storage s...Enabling data dynamic and indirect mutual trust for cloud computing storage s...
Enabling data dynamic and indirect mutual trust for cloud computing storage s...
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...
 
A secure protocol for spontaneous wireless ad hoc networks creation
A secure protocol for spontaneous wireless ad hoc networks creationA secure protocol for spontaneous wireless ad hoc networks creation
A secure protocol for spontaneous wireless ad hoc networks creation
 
Utility privacy tradeoff in databases an information-theoretic approach
Utility privacy tradeoff in databases an information-theoretic approachUtility privacy tradeoff in databases an information-theoretic approach
Utility privacy tradeoff in databases an information-theoretic approach
 
Two tales of privacy in online social networks
Two tales of privacy in online social networksTwo tales of privacy in online social networks
Two tales of privacy in online social networks
 

Recently uploaded

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Recently uploaded (20)

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

A fast clustering based feature subset selection algorithm for high-dimensional data

A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data

ABSTRACT:
Feature selection involves identifying a subset of the most useful features that produces results comparable to those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. Efficiency concerns the time required to find a subset of features, while effectiveness is related to the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper.
The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST) clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.
Extensive experiments are carried out to compare FAST with several representative feature selection algorithms, namely FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, namely the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER, before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text data sets, demonstrate that FAST not only produces smaller subsets of features but also improves the performance of the four types of classifiers.

EXISTING SYSTEM:
Embedded methods incorporate feature selection as part of the training process and are usually specific to given learning algorithms; therefore, they may be more efficient than the other three categories. Traditional machine learning algorithms such as decision trees or artificial neural networks are examples of embedded approaches. Wrapper methods use the predictive accuracy of a predetermined learning algorithm to determine the goodness of the selected subsets, so the accuracy of the learning algorithms is usually high. However, the generality of the selected features is limited and the computational complexity is large. Filter methods are independent of learning algorithms and have good generality. Their computational complexity is low, but the accuracy of the learning algorithms is not guaranteed. Hybrid methods combine filter and wrapper methods by using a filter method to reduce the search space that the subsequent wrapper considers. They mainly focus on combining filter and wrapper methods to achieve the best possible performance with a particular learning algorithm at a time complexity similar to that of filter methods.

DISADVANTAGES:
1. Wrapper methods: the generality of the selected features is limited and the computational complexity is large.
2. Filter methods: their computational complexity is low, but the accuracy of the learning algorithms is not guaranteed.
3. Hybrid methods: they combine filter and wrapper methods, using a filter method to reduce the search space that the subsequent wrapper considers.

PROPOSED SYSTEM:
Feature subset selection can be viewed as the process of identifying and removing as many irrelevant and redundant features as possible. This is because irrelevant features do not contribute to predictive accuracy, and redundant features do not help produce a better predictor because they mostly provide information that is already present in other features. Of the many feature subset selection algorithms, some effectively eliminate irrelevant features but fail to handle redundant ones, whereas others eliminate irrelevant features while also taking care of redundant ones.
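Deciding whether a feature is irrelevant or redundant requires a correlation measure. The time-complexity notes for FAST refer to SU (symmetric uncertainty) values, so the following is a minimal Java sketch of SU for discrete data, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)). It assumes that features and class labels have already been discretized into integer codes. The class name `SymmetricUncertainty` and its methods are hypothetical and are not taken from the project's code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: symmetric uncertainty between two discretized variables,
 * SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), with I(X; Y) = H(X) + H(Y) - H(X, Y).
 * Inputs are assumed to be equal-length arrays of integer-coded values.
 */
final class SymmetricUncertainty {

    private SymmetricUncertainty() {}

    /** Shannon entropy in bits of a frequency table built from n samples. */
    private static double entropyOfCounts(Map<?, Integer> counts, int n) {
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / n;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    /** H(X): entropy of a single discrete variable. */
    static double entropy(int[] x) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : x) {
            Integer c = counts.get(v);
            counts.put(v, c == null ? 1 : c + 1);
        }
        return entropyOfCounts(counts, x.length);
    }

    /** H(X, Y): entropy of the paired variable; each (x, y) pair is packed into one long key. */
    static double jointEntropy(int[] x, int[] y) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            long key = ((long) x[i] << 32) | (y[i] & 0xffffffffL);
            Integer c = counts.get(key);
            counts.put(key, c == null ? 1 : c + 1);
        }
        return entropyOfCounts(counts, x.length);
    }

    /** SU in [0, 1]: 0 means independent, 1 means each variable fully determines the other. */
    static double su(int[] x, int[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("inputs must be non-empty and of equal length");
        }
        double hx = entropy(x);
        double hy = entropy(y);
        if (hx + hy == 0.0) {
            return 0.0; // both variables constant: treated as SU = 0 (an assumption of this sketch)
        }
        double mutualInfo = hx + hy - jointEntropy(x, y);
        return 2.0 * mutualInfo / (hx + hy);
    }
}
```

Called with a feature column and the class labels, `su` gives a T-Relevance value. Called with two feature columns, it gives an F-Correlation value.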
Our proposed FAST algorithm falls into the second group. Traditionally, feature subset selection research has focused on searching for relevant features. A well-known example is Relief, which weighs each feature according to its ability to discriminate between instances of different target classes using a distance-based criterion function. However, Relief is ineffective at removing redundant features, because two predictive but highly correlated features are likely to both be highly weighted. Relief-F extends Relief, enabling it to work with noisy and incomplete data sets and to deal with multiclass problems, but it still cannot identify redundant features.

ADVANTAGES:
Good feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other. FAST efficiently and effectively deals with both irrelevant and redundant features and obtains a good feature subset. Generally, all six algorithms achieve a significant reduction of dimensionality by selecting only a small portion of the original features. The null hypothesis of the Friedman test is that all the feature selection algorithms are equivalent in terms of runtime.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENTS:
- Processor - Pentium IV
- Speed - 1.1 GHz
- RAM - 256 MB (min)
- Hard Disk - 20 GB
- Floppy Drive - 1.44 MB
- Keyboard - Standard Windows Keyboard
- Mouse - Two or Three Button Mouse
- Monitor - SVGA

SOFTWARE REQUIREMENTS:
- Operating System : Windows XP
- Front End : Java JDK 1.7
- Scripts : JavaScript
- Tools : NetBeans
- Database : SQL Server or MS-Access
- Database Connectivity : JDBC

FLOW CHART:
Data set → Irrelevant feature removal → Minimum spanning tree construction → Tree partition & representative feature selection
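The last two boxes of the flow chart can be sketched as follows. This is a hypothetical illustration, not the project's implementation. It assumes precomputed values `suC[i]` = SU(F_i, C) for the features that survived relevance filtering, plus a symmetric matrix `suF[i][j]` = SU(F_i, F_j) that supplies the edge weights of the complete graph. The edge-cutting rule is also an assumption, because this text does not state it: an MST edge is removed when SU(F_i, F_j) is smaller than both SU(F_i, C) and SU(F_j, C), following the published FAST description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: MST construction, tree partition, and representative selection. */
final class MstFeatureClustering {

    private MstFeatureClustering() {}

    /** Prim's algorithm on the complete graph weighted by w; returns parent[v] (-1 for the root). */
    static int[] primMst(double[][] w) {
        int n = w.length;
        int[] parent = new int[n];
        double[] best = new double[n];
        boolean[] inTree = new boolean[n];
        Arrays.fill(parent, -1);
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        if (n == 0) {
            return parent;
        }
        best[0] = 0.0;
        for (int iter = 0; iter < n; iter++) {
            // Pick the cheapest vertex not yet in the tree.
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && (u == -1 || best[v] < best[u])) {
                    u = v;
                }
            }
            inTree[u] = true;
            // Relax the edges from u to every vertex still outside the tree.
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && w[u][v] < best[v]) {
                    best[v] = w[u][v];
                    parent[v] = u;
                }
            }
        }
        return parent;
    }

    /** Cuts weak MST edges, then keeps the most class-relevant feature of each remaining tree. */
    static List<Integer> selectRepresentatives(double[] suC, double[][] suF) {
        int n = suC.length;
        int[] parent = primMst(suF);
        // Union-find over the kept edges recovers the trees of the forest.
        int[] root = new int[n];
        for (int i = 0; i < n; i++) {
            root[i] = i;
        }
        for (int i = 0; i < n; i++) {
            int p = parent[i];
            if (p < 0) {
                continue;
            }
            boolean cut = suF[i][p] < suC[i] && suF[i][p] < suC[p];
            if (!cut) {
                root[find(root, i)] = find(root, p);
            }
        }
        // For each tree, remember the feature with the largest SU(F, C).
        int[] bestInTree = new int[n];
        Arrays.fill(bestInTree, -1);
        for (int i = 0; i < n; i++) {
            int r = find(root, i);
            if (bestInTree[r] == -1 || suC[i] > suC[bestInTree[r]]) {
                bestInTree[r] = i;
            }
        }
        List<Integer> selected = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            if (bestInTree[r] != -1) {
                selected.add(bestInTree[r]);
            }
        }
        return selected;
    }

    /** Union-find lookup with path halving. */
    private static int find(int[] root, int x) {
        while (root[x] != x) {
            root[x] = root[root[x]];
            x = root[x];
        }
        return x;
    }
}
```

The array-based form of Prim's algorithm runs in O(k^2) time for k features. That cost is appropriate for a dense, complete graph, where every pair of features is connected.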
MAIN MODULES:
- DISTRIBUTED CLUSTERING
- SUBSET SELECTION ALGORITHM
- TIME COMPLEXITY
- MICROARRAY DATA
- DATA RESOURCE
- IRRELEVANT FEATURE

MODULE DESCRIPTION:

DISTRIBUTED CLUSTERING:
Distributional clustering has been used to cluster words into groups based either on their participation in particular grammatical relations with other words (Pereira et al.) or on the distribution of class labels associated with each word (Baker and McCallum). Because distributional clustering of words is agglomerative in nature and results in suboptimal word clusters and high computational cost, a new information-theoretic divisive algorithm for word clustering was proposed and applied to text classification. Another approach clusters features using a special distance metric and then uses the resulting cluster hierarchy to choose the most relevant attributes. Unfortunately, this distance-based cluster evaluation measure does not identify a feature subset that allows the classifiers to improve on their original accuracy. Furthermore, even compared with other feature selection methods, the accuracy obtained is lower.
  • 6. SUBSET SELECTION ALGORITHM: The Irrelevant features, along with redundant features, severely affect the accuracy of the learning machines. Thus, feature subset selection should be able to identify and remove as much of the irrelevant and redundant information as possible. Moreover, “good feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other. Keeping these in mind, we develop a novel algorithm which can efficiently and effectively deal with both irrelevant and redundant features, and obtain a good feature subset. TIME COMPLEXITY: The major amount of work for Algorithm 1 involves the computation of SU values for TR relevance and F- Correlation, which has linear complexity in terms of the number of instances in a given data set. The first part of the algorithm has a linear time complexity in terms of the number of features m. Assuming features are selected as relevant ones in the first part, when k ¼ only one feature is selected. MICROARRAY DATA: The proportion of selected features has been improved by each of the six algorithms compared with that on the given data sets. This indicates that the six algorithms work well with microarray data. FAST ranks 1 again with the proportion of selected features of 0.71 percent. Of the six algorithms, only CFS cannot choose features for two data sets whose dimensionalities are 19,994 and 49,152, respectively. DATA RESOURCE: The purposes of evaluating the performance and effectiveness of our proposed FAST algorithm, verifying whether or not the method is potentially useful in practice, and allowing other researchers to confirm our results, 35 publicly available data sets1 were used. The numbers of features of the 35 data sets vary from 37 to 49, 52 with a mean of 7,874. The dimensionalities of the 54.3 percent data sets exceed 5,000, of which 28.6 percent data sets have more than 10,000 features. 
The 35 data sets cover a range of application domains, such as text, image, and bio microarray data classification. For the data sets with continuous-valued features, the well-known off-the-shelf MDL method was used to discretize the continuous values.
IRRELEVANT FEATURE:
Irrelevant feature removal is straightforward once the right relevance measure is defined or selected, while redundant feature elimination is somewhat more sophisticated. In our proposed FAST algorithm, redundant feature elimination involves (1) the construction of the minimum spanning tree (MST) from a weighted complete graph; (2) the partitioning of the MST into a forest, with each tree representing a cluster; and (3) the selection of representative features from the clusters.
USER MODULE:
In this module, users must be authenticated before they can securely access the details presented in the ontology system. Before accessing or searching the details, a user must have an account; otherwise, the user must register first.
CONCLUSION:
In this paper, we have presented a novel clustering-based feature subset selection algorithm for high-dimensional data. The algorithm involves 1) removing irrelevant features, 2) constructing a minimum spanning tree from the relevant ones, and 3) partitioning the MST and selecting representative features. In the proposed algorithm, a cluster consists of features. Each cluster is treated as a single feature, and thus dimensionality is drastically reduced. We have compared the performance of the proposed algorithm with that of five well-known feature selection algorithms, FCBF, ReliefF, CFS, Consist, and FOCUS-SF, on publicly available image, microarray, and text data from four different aspects: the proportion of selected features, runtime, the classification accuracy of a given classifier, and the Win/Draw/Loss record.
Generally, the proposed algorithm obtained the best proportion of selected features, the best runtime, and the best classification accuracy for Naive Bayes and RIPPER, and the second-best classification accuracy for IB1. The Win/Draw/Loss records confirmed these conclusions. We also found that, in terms of the classification accuracy of the four different types of classifiers, FAST ranks first for microarray data, second for text data, and third for image data, and that CFS is a good alternative. At the same time, FCBF is a good alternative for image and text data. Moreover, Consist and FOCUS-SF are alternatives for text data. For future work, we plan to explore different types of correlation measures and to study some formal properties of the feature space.
REFERENCES:
[1] H. Almuallim and T.G. Dietterich, “Algorithms for Identifying Relevant Features,” Proc. Ninth Canadian Conf. Artificial Intelligence, pp. 38-45, 1992.
[2] H. Almuallim and T.G. Dietterich, “Learning Boolean Concepts in the Presence of Many Irrelevant Features,” Artificial Intelligence, vol. 69, nos. 1/2, pp. 279-305, 1994.
[3] A. Arauzo-Azofra, J.M. Benitez, and J.L. Castro, “A Feature Set Measure Based on Relief,” Proc. Fifth Int’l Conf. Recent Advances in Soft Computing, pp. 104-109, 2004.
[4] L.D. Baker and A.K. McCallum, “Distributional Clustering of Words for Text Classification,” Proc. 21st Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 96-103, 1998.
[5] R. Battiti, “Using Mutual Information for Selecting Features in Supervised Neural Net Learning,” IEEE Trans. Neural Networks, vol. 5, no. 4, pp. 537-550, July 1994.
[6] D.A. Bell and H. Wang, “A Formalism for Relevance and Its Application in Feature Subset Selection,” Machine Learning, vol. 41, no. 2, pp. 175-195, 2000.
[7] J. Biesiada and W. Duch, “Feature Selection for High-Dimensional Data: A Pearson Redundancy Based Filter,” Advances in Soft Computing, vol. 45, pp. 242-249, 2008.
[8] R. Butterworth, G. Piatetsky-Shapiro, and D.A. Simovici, “On Feature Selection through Clustering,” Proc. IEEE Fifth Int’l Conf. Data Mining, pp. 581-584, 2005.
[9] C. Cardie, “Using Decision Trees to Improve Case-Based Learning,” Proc. 10th Int’l Conf. Machine Learning, pp. 25-32, 1993.
[10] P. Chanda, Y. Cho, A. Zhang, and M. Ramanathan, “Mining of Attribute Interactions Using Information Theoretic Metrics,” Proc. IEEE Int’l Conf. Data Mining Workshops, pp. 350-355, 2009.
[11] S. Chikhi and S. Benhammada, “ReliefMSS: A Variation on a Feature Ranking ReliefF Algorithm,” Int’l J. Business Intelligence and Data Mining, vol. 4, nos. 3/4, pp. 375-390, 2009.
[12] W. Cohen, “Fast Effective Rule Induction,” Proc. 12th Int’l Conf. Machine Learning (ICML ’95), pp. 115-123, 1995.
[13] M. Dash and H. Liu, “Feature Selection for Classification,” Intelligent Data Analysis, vol. 1, no. 3, pp. 131-156, 1997.
[14] M. Dash, H. Liu, and H. Motoda, “Consistency Based Feature Selection,” Proc. Fourth Pacific Asia Conf. Knowledge Discovery and Data Mining, pp. 98-109, 2000.
[15] S. Das, “Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection,” Proc. 18th Int’l Conf. Machine Learning, pp. 74-81, 2001.