A Fast Clustering-Based Feature Subset Selection Algorithm
for High-Dimensional Data
ABSTRACT:
Feature selection involves identifying a subset of the most useful features that
produces results compatible with those of the original, entire set of features. A feature
selection algorithm may be evaluated from both the efficiency and effectiveness
points of view. While the efficiency concerns the time required to find a subset of
features, the effectiveness is related to the quality of the subset of features. Based
on these criteria, a fast clustering-based feature selection algorithm (FAST) is
proposed and experimentally evaluated in this paper. The FAST algorithm works
in two steps. In the first step, features are divided into clusters by using graph-
theoretic clustering methods. In the second step, the most representative feature
that is strongly related to target classes is selected from each cluster to form a
subset of features. Because features in different clusters are relatively independent, the
clustering-based strategy of FAST has a high probability of producing a subset of
useful and independent features. To ensure the efficiency of FAST, we adopt the
efficient minimum-spanning tree (MST) clustering method. The efficiency and
effectiveness of the FAST algorithm are evaluated through an empirical study.
Extensive experiments are carried out to compare FAST and several representative
feature selection algorithms, namely, FCBF, ReliefF, CFS, Consist, and FOCUS-
SF, with respect to four types of well-known classifiers, namely, the
probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the
rule-based RIPPER, before and after feature selection. The results, on 35 publicly
available real-world high-dimensional image, microarray, and text data sets,
demonstrate that FAST not only produces smaller subsets of features but also
improves the performance of the four types of classifiers.
EXISTING SYSTEM:
The embedded methods incorporate feature selection as part of the training
process and are usually specific to given learning algorithms, and therefore may be
more efficient than the other three categories. Traditional machine learning
algorithms like decision trees or artificial neural networks are examples of
embedded approaches. The wrapper methods use the predictive accuracy of a
predetermined learning algorithm to determine the goodness of the selected
subsets, so the accuracy of the learning algorithms is usually high. However, the
generality of the selected features is limited and the computational complexity is
large. The filter methods are independent of learning algorithms and have good
generality. Their computational complexity is low, but the accuracy of the learning
algorithms is not guaranteed. The hybrid methods combine filter and
wrapper methods, using a filter method to reduce the search space that will be
considered by the subsequent wrapper. They mainly focus on combining filter and
wrapper methods to achieve the best possible performance with a particular
learning algorithm, with time complexity similar to that of the filter methods.
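To make the contrast concrete, the following minimal Java sketch scores a feature subset in the two styles: the filter score is computed from the data alone (here, mean absolute feature-class Pearson correlation, an illustrative choice), while the wrapper score trains a predetermined learner on just those features and measures its accuracy. The Learner interface and all helper names are hypothetical placeholders, not the paper's implementation.

import java.util.Set;

interface Learner {
    void train(double[][] x, int[] y);
    double accuracy(double[][] x, int[] y); // fraction correct on held-out data
}

final class SubsetEvaluation {

    // Filter-style: score the subset from the data alone, independent of any
    // learner (illustrative proxy: mean absolute feature-class correlation).
    static double filterScore(double[][] x, int[] y, Set<Integer> subset) {
        double sum = 0.0;
        for (int f : subset) sum += Math.abs(pearson(column(x, f), toDouble(y)));
        return sum / subset.size();
    }

    // Wrapper-style: score the subset by the accuracy of a predetermined
    // learner trained on just those features -- slower, but learner-specific.
    static double wrapperScore(double[][] xTr, int[] yTr, double[][] xTe,
                               int[] yTe, Set<Integer> subset, Learner learner) {
        learner.train(project(xTr, subset), yTr);
        return learner.accuracy(project(xTe, subset), yTe);
    }

    // Copy only the selected feature columns.
    static double[][] project(double[][] x, Set<Integer> subset) {
        int[] cols = subset.stream().mapToInt(Integer::intValue).sorted().toArray();
        double[][] out = new double[x.length][cols.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < cols.length; j++) out[i][j] = x[i][cols[j]];
        return out;
    }

    static double[] column(double[][] x, int f) {
        double[] c = new double[x.length];
        for (int i = 0; i < x.length; i++) c[i] = x[i][f];
        return c;
    }

    static double[] toDouble(int[] y) {
        double[] d = new double[y.length];
        for (int i = 0; i < y.length; i++) d[i] = y[i];
        return d;
    }

    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double ma = 0, mb = 0;
        for (int i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
        ma /= n; mb /= n;
        double cov = 0, va = 0, vb = 0;
        for (int i = 0; i < n; i++) {
            cov += (a[i] - ma) * (b[i] - mb);
            va += (a[i] - ma) * (a[i] - ma);
            vb += (b[i] - mb) * (b[i] - mb);
        }
        return cov / (Math.sqrt(va * vb) + 1e-12);
    }
}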
DISADVANTAGES OF EXISTING SYSTEM:
 The wrapper methods: the generality of the selected features is limited and the
computational complexity is large.
 The filter methods: the computational complexity is low, but the accuracy of the
learning algorithms is not guaranteed.
 The hybrid methods combine filter and wrapper methods, using a filter method to
reduce the search space considered by the subsequent wrapper, so the selected
subset is still tied to a particular learning algorithm.
PROPOSED SYSTEM:
Feature subset selection can be viewed as the process of identifying and removing
as many irrelevant and redundant features as possible. This is because irrelevant
features do not contribute to predictive accuracy, and redundant features do not
help to obtain a better predictor because they provide mostly information
that is already present in other feature(s). Of the many feature subset selection
algorithms, some can effectively eliminate irrelevant features but fail to handle
redundant ones, while others can eliminate the irrelevant features while also taking care
of the redundant ones. Our proposed FAST algorithm falls into the second
group. Traditionally, feature subset selection research has focused on searching for
relevant features. A well-known example is Relief, which weighs each feature
according to its ability to discriminate instances under different targets based on a
distance-based criterion function. However, Relief is ineffective at removing
redundant features, as two predictive but highly correlated features are likely both
to be highly weighted. Relief-F extends Relief, enabling this method to work with
noisy and incomplete data sets and to deal with multiclass problems, but it still
cannot identify redundant features.
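To illustrate why Relief leaves redundancy in place, the sketch below implements the classic Relief weight update for a binary-class, numeric data set (feature values assumed scaled to [0, 1], an assumption of this sketch): each feature is rewarded for separating an instance from its nearest miss and penalized for varying against its nearest hit, and no feature is ever compared with another feature, so two highly correlated predictive features both end up highly weighted. This is a simplified illustration, not the paper's code.

import java.util.Random;

final class Relief {
    // x: instances x features, values assumed scaled to [0, 1]; y: 0/1 labels.
    static double[] weights(double[][] x, int[] y, int iterations, long seed) {
        int n = x.length, d = x[0].length;
        double[] w = new double[d];
        Random rnd = new Random(seed);
        for (int t = 0; t < iterations; t++) {
            int i = rnd.nextInt(n);
            int hit = nearest(x, y, i, true);    // closest same-class instance
            int miss = nearest(x, y, i, false);  // closest other-class instance
            for (int f = 0; f < d; f++) {
                // Features separating the classes (large miss difference) gain
                // weight; features varying within a class (large hit
                // difference) lose it. Note: no feature-feature comparison.
                w[f] += (Math.abs(x[i][f] - x[miss][f])
                        - Math.abs(x[i][f] - x[hit][f])) / iterations;
            }
        }
        return w;
    }

    // Nearest neighbor of instance i within the same or the other class.
    static int nearest(double[][] x, int[] y, int i, boolean sameClass) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int j = 0; j < x.length; j++) {
            if (j == i || (y[j] == y[i]) != sameClass) continue;
            double dist = 0;
            for (int f = 0; f < x[0].length; f++) {
                double diff = x[i][f] - x[j][f];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        return best; // assumes both classes are present in the data
    }
}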
ADVANTAGES OF PROPOSED SYSTEM:
 Good feature subsets contain features highly correlated with (predictive of)
the class, yet uncorrelated with (not predictive of) each other.
 The proposed algorithm efficiently and effectively deals with both irrelevant
and redundant features and obtains a good feature subset.
 Generally, all six algorithms achieve a significant reduction of
dimensionality by selecting only a small portion of the original features.
 The null hypothesis of the Friedman test is that all the feature selection
algorithms are equivalent in terms of runtime.
MODULES:
 Distributed clustering
 Subset Selection Algorithm
 Time complexity
 Microarray data
 Data Resource
 Irrelevant feature
MODULE DESCRIPTION
1. Distributed clustering
Distributional clustering has been used to cluster words into groups based
either on their participation in particular grammatical relations with other words
(Pereira et al.) or on the distribution of class labels associated with each word
(Baker and McCallum). Because distributional clustering of words is agglomerative in
nature, and results in suboptimal word clusters at a high computational cost, a
new information-theoretic divisive algorithm for word clustering was later proposed
and applied to text classification. Another line of work clusters features using a
special distance metric and then makes use of the resulting cluster hierarchy to choose
the most relevant attributes. Unfortunately, the cluster evaluation measure based on
distance does not identify a feature subset that allows the classifiers to improve
their original performance accuracy. Furthermore, even compared with other
feature selection methods, the obtained accuracy is lower.
2. Subset Selection Algorithm
Irrelevant features, along with redundant features, severely affect the accuracy
of learning machines. Thus, feature subset selection should be able to identify
and remove as much of the irrelevant and redundant information as possible.
Moreover, “good feature subsets contain features highly correlated with (predictive
of) the class, yet uncorrelated with (not predictive of) each other.” Keeping these in
mind, we develop a novel algorithm which can efficiently and effectively deal with
both irrelevant and redundant features, and obtain a good feature subset.
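The measure behind both requirements in FAST is symmetric uncertainty (SU), a normalized information gain: SU(X, Y) = 2 * Gain(X|Y) / (H(X) + H(Y)), which is 1 when two discrete variables fully determine each other and 0 when they are independent. Below is a minimal Java sketch of SU for already-discretized variables; it is a generic illustration of the measure, not the authors' code.

import java.util.HashMap;
import java.util.Map;

final class SymmetricUncertainty {
    // SU(X, Y) = 2 * Gain(X|Y) / (H(X) + H(Y)), a value in [0, 1].
    static double su(int[] x, int[] y) {
        double hx = entropy(x), hy = entropy(y);
        double gain = hx + hy - jointEntropy(x, y); // information gain Gain(X|Y)
        return (hx + hy == 0) ? 0.0 : 2.0 * gain / (hx + hy);
    }

    // Shannon entropy (base 2) of a discrete variable.
    static double entropy(int[] v) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : v) counts.merge(value, 1, Integer::sum);
        double h = 0.0, n = v.length;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Joint entropy H(X, Y), counting co-occurring value pairs.
    static double jointEntropy(int[] x, int[] y) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 0; i < x.length; i++)
            counts.merge(((long) x[i] << 32) | (y[i] & 0xffffffffL), 1, Integer::sum);
        double h = 0.0, n = x.length;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }
}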
3. Time complexity
The major part of the work for Algorithm 1 involves the computation of SU values
for T-Relevance and F-Correlation, which has linear complexity in terms of the
number of instances in a given data set. The first part of the algorithm has a linear
time complexity in terms of the number of features m. Assuming k (k ≤ m) features
are selected as relevant ones in the first part, when k = 1, only one feature is selected.
4. Microarray data
For the microarray data, each of the six algorithms improves its proportion of
selected features compared with that on the full collection of data sets. This
indicates that the six algorithms work well with microarray data. FAST again ranks
first, with a proportion of selected features of 0.71 percent. Of the six algorithms,
only CFS cannot choose features for the two data sets whose dimensionalities are
19,994 and 49,152, respectively.
5. Data Resource
For the purposes of evaluating the performance and effectiveness of our proposed
FAST algorithm, verifying whether or not the method is potentially useful in
practice, and allowing other researchers to confirm our results, 35 publicly
available data sets were used. The numbers of features of the 35 data sets vary
from 37 to 49,152, with a mean of 7,874. The dimensionalities of 54.3 percent of
the data sets exceed 5,000, and 28.6 percent of the data sets have more than 10,000
features. The 35 data sets cover a range of application domains such as text, image,
and microarray data classification; their detailed statistical information is given in
the reference paper. Note that for the data sets with continuous-valued features, the
well-known off-the-shelf MDL method was used to discretize the continuous values.
6. Irrelevant feature
Irrelevant feature removal is straightforward once the right relevance measure
is defined or selected, while redundant feature elimination is somewhat more
sophisticated. In our proposed FAST algorithm, it involves 1) the construction of
a minimum spanning tree from a weighted complete graph; 2) the partitioning of
the MST into a forest, with each tree representing a cluster; and 3) the selection of
representative features from the clusters.
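A simplified Java sketch of these three steps follows. It assumes the T-Relevance values SU(Fi, C) and the F-Correlations SU(Fi, Fj) have already been computed (for instance, with the SU sketch above), uses Kruskal's algorithm with union-find where the paper uses Prim's (both yield a minimum spanning tree), and leaves tie handling unspecified; treat it as an illustration rather than the authors' implementation.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FastMstClustering {

    static final class Edge {
        final int u, v;
        final double w;
        Edge(int u, int v, double w) { this.u = u; this.v = v; this.w = w; }
    }

    // fCorr[i][j] = SU(Fi, Fj); tRel[i] = SU(Fi, C). Returns the indices of
    // the representative feature selected from each cluster.
    static List<Integer> selectFeatures(double[][] fCorr, double[] tRel) {
        int m = tRel.length;

        // Step 1: minimum spanning tree of the weighted complete graph.
        List<Edge> all = new ArrayList<>();
        for (int i = 0; i < m; i++)
            for (int j = i + 1; j < m; j++) all.add(new Edge(i, j, fCorr[i][j]));
        all.sort((a, b) -> Double.compare(a.w, b.w));
        int[] parent = new int[m];
        for (int i = 0; i < m; i++) parent[i] = i;
        List<Edge> mst = new ArrayList<>();
        for (Edge e : all)
            if (union(parent, e.u, e.v)) mst.add(e);

        // Step 2: partition the MST into a forest by deleting every edge whose
        // F-Correlation is smaller than both endpoints' T-Relevance; the
        // surviving edges define the clusters.
        int[] forest = new int[m];
        for (int i = 0; i < m; i++) forest[i] = i;
        for (Edge e : mst)
            if (!(e.w < tRel[e.u] && e.w < tRel[e.v])) union(forest, e.u, e.v);

        // Step 3: from each tree (cluster), keep the feature most strongly
        // related to the target class.
        int[] best = new int[m];
        Arrays.fill(best, -1);
        for (int i = 0; i < m; i++) {
            int root = find(forest, i);
            if (best[root] < 0 || tRel[i] > tRel[best[root]]) best[root] = i;
        }
        List<Integer> selected = new ArrayList<>();
        for (int r = 0; r < m; r++)
            if (find(forest, r) == r) selected.add(best[r]);
        return selected;
    }

    static int find(int[] p, int x) {
        while (p[x] != x) { p[x] = p[p[x]]; x = p[x]; } // path halving
        return x;
    }

    static boolean union(int[] p, int a, int b) {
        int ra = find(p, a), rb = find(p, b);
        if (ra == rb) return false;
        p[ra] = rb;
        return true;
    }
}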
SYSTEM FLOW:
Data set → Irrelevant feature removal → Minimum spanning tree construction →
Tree partitioning & representative feature selection → Selected features
SYSTEM CONFIGURATION:-
HARDWARE CONFIGURATION:-
 Processor - Pentium IV
 Speed - 1.1 GHz
 RAM - 256 MB (min)
 Hard Disk - 20 GB
 Key Board - Standard Windows Keyboard
 Mouse - Two or Three Button Mouse
 Monitor - SVGA
SOFTWARE CONFIGURATION:-
 Operating System : Windows XP
 Programming Language : JAVA
 Java Version : JDK 1.6 & above.
REFERENCE:
Qinbao Song, Jingjie Ni, and Guangtao Wang, “A Fast Clustering-Based Feature
Subset Selection Algorithm for High-Dimensional Data,” IEEE Transactions on
Knowledge and Data Engineering, vol. 25, no. 1, January 2013.