SlideShare a Scribd company logo
1 of 5
Download to read offline
Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71
www.ijera.com 67 | P a g e
A Survey on Constellation Based Attribute Selection Method for
High Dimensional Data
Ms.Sonam R. Yadav*, Prof. Ravi Patki**
*(Department of Computer Engineering, Savitribai Phule Pune University, Pune)
** (Department of Information Technology, Savitribai Phule Pune University, Pune)
ABSTRACT
Attribute Selection is an important topic in Data Mining, because it is the effective way for reducing
dimensionality, removing irrelevant data, removing redundant data, & increasing accuracy of the data. It is the
process of identifying a subset of the most useful attributes that produces compatible results as the original entire
set of attribute. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in
the same group called a cluster are more similar in some sense or another to each other than to those in other
groups (Clusters).
There are various approaches & techniques for attribute subset selection namely Wrapper approach, Filter
Approach, Relief Algorithm, Distributional clustering etc. But each of one having some disadvantages like
unable to handle large volumes of data, computational complexity, accuracy is not guaranteed, difficult to
evaluate and redundancy detection etc.
To get the upper hand on some of these issues in attribute selection this paper proposes a technique that aims
to design an effective clustering based attribute selection method for high dimensional data. Initially, attributes
are divided into clusters by using graph-based clustering method like minimum spanning tree (MST). In the
second step, the most representative attribute that is strongly related to target classes is selected from each cluster
to form a subset of attributes. The purpose is to increase the level of accuracy, reduce dimensionality; shorter
training time and improves generalization by reducing over fitting.
Keywords – Attribute Selection, Clustering, Data Mining, Graph-based Clustering, Minimum Spanning Tree.
I. INTRODUCTION
Attribute selection is the process of identifying a
subset of the most useful features that produces
compatible results as the original entire set of
features. An attribute selection algorithm may be
evaluated from both the efficiency and effectiveness
points of view. While the efficiency concerns the
time required to find a subset of attributes, the
effectiveness is related to the quality of the subset of
attributes [1]. It is an important topic in Data
Mining, because it is the effective way for reducing
dimensionality, removing irrelevant data, removing
redundant data, & increasing accuracy of the data.
Cluster analysis or clustering is the task of
grouping a set of objects in such a way that objects
in the same cluster are more similar to each other
than to those in other clusters. It is a main task of
exploratory data mining, and a common technique
for statistical data analysis, used in many fields,
including machine learning, pattern recognition,
image analysis, information retrieval, and
bioinformatics [2].
Attribute selection, also known as feature
selection or variable subset selection. The process of
selecting a subset of relevant attributes is used in
model construction. The assumption when using an
attribute selection technique is that the data contains
many redundant or irrelevant (noisy) attributes.
Redundant attributes and irrelevant attributes
provide no useful information in any context and it
diminishes the quality of the selected attributes [5].
Attribute selection techniques are a subset of the
more general field of attribute extraction in high
dimensional data. Attribute extraction creates new
data from original attributes of dataset, whereas
attribute selection returns a subset of the attributes.
Attribute selection techniques are frequently used in
domains where there are many attributes and
comparatively few samples.
An attribute selection algorithm proposed in this
paper can be seen as the combination of a search
technique for proposing new feature subsets along
with an efficiency and effectiveness. The simplest
step is to test each possible subset of attributes
finding the one which increases the accuracy.
The proposed algorithm consists of three parts:
(i) removing irrelevant attributes
(ii) constructing a MST from relative one
(iii) Partitioning the MST and selecting
representative attributes.
RESEARCH ARTICLE OPEN ACCESS
Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71
www.ijera.com 68 | P a g e
II. BASIC CONCEPTS
In this section, a brief introduction about the
basic concepts of the Data Mining, Clustering, and
Minimum Spanning tree is provided.
2.1 Data Mining:
Data mining is the way of discovering the interesting
knowledge from large amounts of information
sources or data warehouses. When there is huge
amount of data and certain information is to be
found out from that data, then different stages of
data mining is applied to it to gain information from
it. Data mining tasks classified into two forms: 1.
Descriptive mining tasks: Represent the general
properties of the data. 2. Predictive mining tasks:
Perform the implication on the current data.
Different Data mining Functionalities are:
Characterization and Discrimination, Mining
Frequent Patterns, Association and Correlations,
Classification and Prediction, Cluster Analysis,
Outlier Analysis, Evolution Analysis. Out of these
functionalities this paper focuses on the cluster
Analysis.
2.2 Cluster Analysis:
Clustering is the grouping similar objects into one
class. A cluster is an association of data objects that
are similar to one another within the same cluster
and are dissimilar to the objects in different clusters.
Document clustering (Text clustering) is closely
related to the concept of data clustering. Document
clustering is a more exact technique for
unsupervised document organization, automatic
topic extraction and fast information retrieval or
filtering. Clustering helps to reduce the dimension
and it simplifies the task as number of dataset is
minimized to form a cluster.
2.3 Minimum Spanning Tree:
A minimum spanning tree (MST) is an undirected,
connected, acyclic weighted graph with minimum
weight. The idea is to start with an empty graph and
try to add edges one at a time, the resulting graph is
a subset of some minimum Spanning tree. Each
graph has several spanning trees. This method is
mainly used to make the appropriate attribute subset
clustering but it take time to construct the cluster.
Various Applications:
 Design of computer networks and
Telecommunications networks
 Transportation networks, water supply
networks, and electrical grids.
 Cluster analysis
 Constructing trees for broadcasting in
computer networks
 Image registration and segmentation
III. BACKGROUND AND COMPARATIVE
ANALYSIS
Many algorithms & techniques have been
proposed up till now for clustering based feature/
attribute selection. Some of them have focused on
minimizing redundant data set and to improve the
accuracy whereas some other features subset
selection algorithm focuses on searching for relevant
features
Fuzzy Logic used for improving the accuracy.
In this, after removal of redundant data, clustering is
done based on Prim’s Algorithm. It results in MST
(Clusters) which guarantees redundancy. Once we
get the minimized redundant data sets, accuracy
automatically gets increases [3].
An algorithm for filtering information based on
the Pearson test approach has been implemented and
tested on feature selection. This test is frequently
used in biomedical data analysis and should be used
only for nominal (discretized) features. FCBF (Fast
Correlation Based Feature Selection) algorithm is
the modified algorithm, FCBF#, has a different
search strategy than the original FCBF and it can
produce more accurate classifiers for the size k
subset selection problem [4].
Relief is well known and good feature set
estimator. Feature set estimators evaluate features
individually. The fundamental idea of Relief
algorithm is estimate the quality of subset of features
by comparing the nearest features with the selected
features. With nearest hit (H) from the same class
and nearest miss (M) from the different class
perform the evaluation function to estimate the
quality of features. This method used to guiding the
search process as well as selecting the most
appropriate feature set.
Hierarchical clustering is a procedure of
grouping data objects into a tree of clusters. It has
two types: 1) Agglomerative approach is a bottom
up approach; the clustering processes starts with
each object forming a separate group and then merge
these atomic group into larger clusters or group,
until all the objects are in a single cluster. 2)
Divisive approach is reverse process of
agglomerative; it is top down approach, starts with
all of objects in the same cluster. In each iteration, a
clusters split up into smaller clusters. And finally
from the individual clusters subset is taken out.
Affinity Propagation Algorithm which is used to
solve wide range of clustering problems. Algorithm
proceeds in the same way i.e. removes irrelevant
data, then construct minimum spanning tree and
partition the same and select the most effective
features. This algorithm aims to reduce the
dimensions of the data. In this, data is clustered
depending upon their similarities with each other
using pair wise constrain. A cluster consists of
Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71
www.ijera.com 69 | P a g e
features. And then each cluster is treated as a single feature and thus dimensionality is reduced.
Table 3.1 Comparison of Previous Techniques
S.NO Techniques
(or)Algorithms
Advantages Disadvantages
1. Consistency Measure Fast, Remove noisy and
irrelevant data
Unable to handle large
volumes of data
2. Wrapper Approach Accuracy is high Computational
complexity is large
3. Filter Approach Suitable for very large
features
Accuracy is not
guaranteed
4. Agglomerative linkage
algorithm
Reduce Complexity Decrease the Quality
when dimensionality
become high
5. INTERACT Algorithm Improve Accuracy Only deal with
irrelevant data
6. Distributional
clustering
Higher classification
accuracy
Difficult to evaluation
7. Relief Algorithm Improve efficiency and
Reduce Cost
Powerless to detect
Redundant features
Comparison of various algorithms and techniques are discussed as follows. Some of them are having
low computational complexity, higher accuracy or some are having redundancy problem while some are good at
classification level but difficult to evaluate.
IV. PROPOSED SYSTEM
The proposed clustering-based attribute selection algorithm is based on minimum spanning tree (MST). The
proposed algorithm works in two steps. In the first step, attributes are divided into clusters by using graph-
theoretic clustering methods. In the second step, the most representative attribute that is strongly related to target
classes is selected from each cluster to form a subset of attributes. Attributes in different clusters are relatively
independent; the clustering-based strategy of proposed algorithm has a high probability of producing a subset of
useful and independent attributes.
Proposed algorithm is a straight-forward method in which main basic concept used is clustering. The
need for clustering is to find the closest attribute selection among large volume of data i.e. over the high
dimensional data. In business analysis, in hospitals for patient record maintenance, in banking system and for
various marketing purposes high dimensional data is been used. Extraction of one particular file related to
specific thing or person, from such huge amount of high dimensional data is very difficult and time consuming.
High dimensional data refer to multiple attributes to each of the record i.e. suppose in a banking system each
account holder is having various attributes related to account. For example Account details including Account
number, account type, customer number, amount, last date of transactions, last time of transaction, type of
transaction etc. So, from such huge amount of data retrieval is bit complex and time required is more. And in
applications like banking system, patients record maintenance; time is the most important factor. And another
important factor is accuracy.
Proposed algorithm can efficiently and effectively deal with both irrelevant and redundant attributes,
and obtain a good attribute subset. The irrelevant attribute removal is straightforward once the right relevance
measure is defined or selected, while the redundant attribute elimination is a bit of sophisticated.
Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71
www.ijera.com 70 | P a g e
Figure 4.1 System Architecture
In proposed algorithm, it involves
1. The construction of the minimum spanning tree from a weighted complete graph;
2. The partitioning of the MST into a forest with each tree representing a cluster; and
3. The selection of representative attributes from the clusters.
System architecture consists of different phases of the processing. Overall procedure includes construction of
minimum spanning tree followed by partitioning of the same and finally the process of selection of attributes
from the clusters.
V. UTILITY
 Proposed system may be used across all domains over unlimited data set to search for the effective and
best result.
 Proposed system would be very useful in various hospitals for patient record maintenance which is
nothing but high dimensional data and from that irrelevant and redundant features have to be removed
and finally selected features have to be produced.
 The same is applicable in banking system where the data is high dimensional and for each transaction
or process, access time and accuracy must be very high.
 Proposed system may be used to for various marketing purposes where there is a lot of data and the
only relevant and suitable data have to be published each time.
 Proposed system is useful when there is high dimensional data, and a pattern or a result is to be finding
out from that large amount of data in less time with greater accuracy.
VI. CONCLUSION
Proposed algorithm describes a systematic workflow for the attribute selection over large amount of data
sets. It uses minimum spanning tree to do so. In this paper, we have presented a novel clustering-based attribute
subset selection algorithm for high dimensional data. The algorithm involves (i) removing irrelevant attributes,
(ii) constructing a minimum spanning tree from relative ones, and (iii) partitioning the MST and selecting
representative attributes. Each cluster is treated as a single attribute and thus dimensionality is drastically
reduced.
REFERENCES
[1] N.Magendiran and J.Jayaranjani, An Efficient Fast Clustering-Based Feature Subset Selection Algorithm
for High-Dimensional Data - (ICETS’14)
[2] Mr. M. Senthil Kumar and Ms. V. Latha Jothi, A Fast Clustering Based Feature Subset Selection Using
Affinity Propagation Algorithm - (ICGICT’14)
Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71
www.ijera.com 71 | P a g e
[3] T.Jaga Priya Vathana, C. Saravanabhavan, and Dr.J. Vellingiri, A Survey On Feature Selection Algorithm
For high Dimensional Data Using Fuzzy Logic - (IJES)
[4] Lei Yu and Huan Liu,Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter
Solution, International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
[5] S.Swetha and A.Harpika, A Novel Feature Subset Algorithm For High Dimensional Data – (IJRRECS)
[6] A. Arauzo-Azofra, J.M. Benitez, and J.L. Castro, A Feature Set Measure Based on Relief, Proc. Fifth Int‟l
Conf. Recent Advances in Soft Computing, pp. 104-109, 2004s
[7] R. Butterworth, G. Piatetsky-Shapiro, and D.A. Simovici, “On Feature Selection through Clustering,”
Proc. IEEE Fifth Int’l Conf.Data Mining, pp. 581-584, 2005
[8] Saurabh Soni & Pratik Patel, ”IFSS – An Improved Filter-Wrapper Algorithm for Feature Subset
Selection”, International Journal of Computer Application (0975-8887), Volume 95-No. 14, June 2014.

More Related Content

What's hot

Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...IRJET Journal
 
Feature Selection Algorithm for Supervised and Semisupervised Clustering
Feature Selection Algorithm for Supervised and Semisupervised ClusteringFeature Selection Algorithm for Supervised and Semisupervised Clustering
Feature Selection Algorithm for Supervised and Semisupervised ClusteringEditor IJCATR
 
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGPATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGIJDKP
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection methodIJSRD
 
Classification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmClassification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmeSAT Publishing House
 
11.software modules clustering an effective approach for reusability
11.software modules clustering an effective approach for  reusability11.software modules clustering an effective approach for  reusability
11.software modules clustering an effective approach for reusabilityAlexander Decker
 
A Review of Various Clustering Techniques
A Review of Various Clustering TechniquesA Review of Various Clustering Techniques
A Review of Various Clustering TechniquesIJEACS
 
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for ClusteringAutomatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for Clusteringidescitation
 
automatic classification in information retrieval
automatic classification in information retrievalautomatic classification in information retrieval
automatic classification in information retrievalBasma Gamal
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
An integrated mechanism for feature selection
An integrated mechanism for feature selectionAn integrated mechanism for feature selection
An integrated mechanism for feature selectionsai kumar
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesFeature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesIRJET Journal
 
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETSURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
 

What's hot (20)

Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...
 
F04463437
F04463437F04463437
F04463437
 
Feature Selection Algorithm for Supervised and Semisupervised Clustering
Feature Selection Algorithm for Supervised and Semisupervised ClusteringFeature Selection Algorithm for Supervised and Semisupervised Clustering
Feature Selection Algorithm for Supervised and Semisupervised Clustering
 
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGPATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
 
G046024851
G046024851G046024851
G046024851
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
 
Classification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmClassification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithm
 
11.software modules clustering an effective approach for reusability
11.software modules clustering an effective approach for  reusability11.software modules clustering an effective approach for  reusability
11.software modules clustering an effective approach for reusability
 
A Review of Various Clustering Techniques
A Review of Various Clustering TechniquesA Review of Various Clustering Techniques
A Review of Various Clustering Techniques
 
Dp33701704
Dp33701704Dp33701704
Dp33701704
 
Du35687693
Du35687693Du35687693
Du35687693
 
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for ClusteringAutomatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
 
automatic classification in information retrieval
automatic classification in information retrievalautomatic classification in information retrieval
automatic classification in information retrieval
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
An integrated mechanism for feature selection
An integrated mechanism for feature selectionAn integrated mechanism for feature selection
An integrated mechanism for feature selection
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesFeature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
 
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETSURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
 

Viewers also liked

A Synchronizing Devicefor Power Electronic Converters
A Synchronizing Devicefor Power Electronic ConvertersA Synchronizing Devicefor Power Electronic Converters
A Synchronizing Devicefor Power Electronic ConvertersIJERA Editor
 
Indexing Building Evaluation Criteria
Indexing Building Evaluation CriteriaIndexing Building Evaluation Criteria
Indexing Building Evaluation CriteriaIJERA Editor
 
NFC based parking payment system
NFC based parking payment systemNFC based parking payment system
NFC based parking payment systemIJERA Editor
 
Baltimore SPUG - Worst Practices and Blunders
Baltimore SPUG - Worst Practices and BlundersBaltimore SPUG - Worst Practices and Blunders
Baltimore SPUG - Worst Practices and BlundersScott Hoag
 
Game Design & Business Models
Game Design & Business ModelsGame Design & Business Models
Game Design & Business ModelsHeroEngine
 
The Minimum Total Heating Lander By The Maximum Principle Pontryagin
The Minimum Total Heating Lander By The Maximum Principle PontryaginThe Minimum Total Heating Lander By The Maximum Principle Pontryagin
The Minimum Total Heating Lander By The Maximum Principle PontryaginIJERA Editor
 
Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...IJERA Editor
 
Ibm app security assessment_ds
Ibm app security assessment_dsIbm app security assessment_ds
Ibm app security assessment_dsArun Gopinath
 
Gisand Remote Sensing Applied To Land Use ChangeOf The Prefecture Of CASABLAN...
Gisand Remote Sensing Applied To Land Use ChangeOf The Prefecture Of CASABLAN...Gisand Remote Sensing Applied To Land Use ChangeOf The Prefecture Of CASABLAN...
Gisand Remote Sensing Applied To Land Use ChangeOf The Prefecture Of CASABLAN...IJERA Editor
 

Viewers also liked (20)

B046010815
B046010815B046010815
B046010815
 
A Synchronizing Devicefor Power Electronic Converters
A Synchronizing Devicefor Power Electronic ConvertersA Synchronizing Devicefor Power Electronic Converters
A Synchronizing Devicefor Power Electronic Converters
 
Indexing Building Evaluation Criteria
Indexing Building Evaluation CriteriaIndexing Building Evaluation Criteria
Indexing Building Evaluation Criteria
 
CLAT Brochure
CLAT BrochureCLAT Brochure
CLAT Brochure
 
NFC based parking payment system
NFC based parking payment systemNFC based parking payment system
NFC based parking payment system
 
U04405117121
U04405117121U04405117121
U04405117121
 
F44083035
F44083035F44083035
F44083035
 
Thomas__Smith1
Thomas__Smith1Thomas__Smith1
Thomas__Smith1
 
Baltimore SPUG - Worst Practices and Blunders
Baltimore SPUG - Worst Practices and BlundersBaltimore SPUG - Worst Practices and Blunders
Baltimore SPUG - Worst Practices and Blunders
 
Game Design & Business Models
Game Design & Business ModelsGame Design & Business Models
Game Design & Business Models
 
The Minimum Total Heating Lander By The Maximum Principle Pontryagin
The Minimum Total Heating Lander By The Maximum Principle PontryaginThe Minimum Total Heating Lander By The Maximum Principle Pontryagin
The Minimum Total Heating Lander By The Maximum Principle Pontryagin
 
Aj04605254258
Aj04605254258Aj04605254258
Aj04605254258
 
Dy4301752755
Dy4301752755Dy4301752755
Dy4301752755
 
Fa4301925930
Fa4301925930Fa4301925930
Fa4301925930
 
Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...
 
A43060105
A43060105A43060105
A43060105
 
Ibm app security assessment_ds
Ibm app security assessment_dsIbm app security assessment_ds
Ibm app security assessment_ds
 
F43012934
F43012934F43012934
F43012934
 
W4502140150
W4502140150W4502140150
W4502140150
 
Gisand Remote Sensing Applied To Land Use ChangeOf The Prefecture Of CASABLAN...
Gisand Remote Sensing Applied To Land Use ChangeOf The Prefecture Of CASABLAN...Gisand Remote Sensing Applied To Land Use ChangeOf The Prefecture Of CASABLAN...
Gisand Remote Sensing Applied To Land Use ChangeOf The Prefecture Of CASABLAN...
 

Similar to A Survey on Constellation Based Attribute Selection Method for High Dimensional Data

Literature Survey: Clustering Technique
Literature Survey: Clustering TechniqueLiterature Survey: Clustering Technique
Literature Survey: Clustering TechniqueEditor IJCATR
 
High dimesional data (FAST clustering ALG) PPT
High dimesional data (FAST clustering ALG) PPTHigh dimesional data (FAST clustering ALG) PPT
High dimesional data (FAST clustering ALG) PPTdeepan v
 
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...ijaia
 
Network Based Intrusion Detection System using Filter Based Feature Selection...
Network Based Intrusion Detection System using Filter Based Feature Selection...Network Based Intrusion Detection System using Filter Based Feature Selection...
Network Based Intrusion Detection System using Filter Based Feature Selection...IRJET Journal
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataIOSR Journals
 
Ijartes v1-i2-006
Ijartes v1-i2-006Ijartes v1-i2-006
Ijartes v1-i2-006IJARTES
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data miningeSAT Journals
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data miningeSAT Publishing House
 
Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Jayanti Pande
 
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...csandit
 
Unsupervised Feature Selection Based on the Distribution of Features Attribut...
Unsupervised Feature Selection Based on the Distribution of Features Attribut...Unsupervised Feature Selection Based on the Distribution of Features Attribut...
Unsupervised Feature Selection Based on the Distribution of Features Attribut...Waqas Tariq
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesIRJET Journal
 
Building a Classifier Employing Prism Algorithm with Fuzzy Logic
Building a Classifier Employing Prism Algorithm with Fuzzy LogicBuilding a Classifier Employing Prism Algorithm with Fuzzy Logic
Building a Classifier Employing Prism Algorithm with Fuzzy LogicIJDKP
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection incsandit
 
Data mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisData mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisIOSR Journals
 

Similar to A Survey on Constellation Based Attribute Selection Method for High Dimensional Data (20)

Literature Survey: Clustering Technique
Literature Survey: Clustering TechniqueLiterature Survey: Clustering Technique
Literature Survey: Clustering Technique
 
High dimesional data (FAST clustering ALG) PPT
High dimesional data (FAST clustering ALG) PPTHigh dimesional data (FAST clustering ALG) PPT
High dimesional data (FAST clustering ALG) PPT
 
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...
 
Network Based Intrusion Detection System using Filter Based Feature Selection...
Network Based Intrusion Detection System using Filter Based Feature Selection...Network Based Intrusion Detection System using Filter Based Feature Selection...
Network Based Intrusion Detection System using Filter Based Feature Selection...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online Data
 
Ijartes v1-i2-006
Ijartes v1-i2-006Ijartes v1-i2-006
Ijartes v1-i2-006
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
 
Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.
 
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
 
Ir3116271633
Ir3116271633Ir3116271633
Ir3116271633
 
Unsupervised Feature Selection Based on the Distribution of Features Attribut...
Unsupervised Feature Selection Based on the Distribution of Features Attribut...Unsupervised Feature Selection Based on the Distribution of Features Attribut...
Unsupervised Feature Selection Based on the Distribution of Features Attribut...
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
 
Building a Classifier Employing Prism Algorithm with Fuzzy Logic
Building a Classifier Employing Prism Algorithm with Fuzzy LogicBuilding a Classifier Employing Prism Algorithm with Fuzzy Logic
Building a Classifier Employing Prism Algorithm with Fuzzy Logic
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection in
 
Data mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisData mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant Analysis
 

Recently uploaded

Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 

Recently uploaded (20)

Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 

A Survey on Constellation Based Attribute Selection Method for High Dimensional Data

  • 1. Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71 www.ijera.com 67 | P a g e A Survey on Constellation Based Attribute Selection Method for High Dimensional Data Ms.Sonam R. Yadav*, Prof. Ravi Patki** *(Department of Computer Engineering, Savitribai Phule Pune University, Pune) ** (Department of Information Technology, Savitribai Phule Pune University, Pune) ABSTRACT Attribute Selection is an important topic in Data Mining, because it is the effective way for reducing dimensionality, removing irrelevant data, removing redundant data, & increasing accuracy of the data. It is the process of identifying a subset of the most useful attributes that produces compatible results as the original entire set of attribute. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group called a cluster are more similar in some sense or another to each other than to those in other groups (Clusters). There are various approaches & techniques for attribute subset selection namely Wrapper approach, Filter Approach, Relief Algorithm, Distributional clustering etc. But each of one having some disadvantages like unable to handle large volumes of data, computational complexity, accuracy is not guaranteed, difficult to evaluate and redundancy detection etc. To get the upper hand on some of these issues in attribute selection this paper proposes a technique that aims to design an effective clustering based attribute selection method for high dimensional data. Initially, attributes are divided into clusters by using graph-based clustering method like minimum spanning tree (MST). In the second step, the most representative attribute that is strongly related to target classes is selected from each cluster to form a subset of attributes. The purpose is to increase the level of accuracy, reduce dimensionality; shorter training time and improves generalization by reducing over fitting. Keywords – Attribute Selection, Clustering, Data Mining, Graph-based Clustering, Minimum Spanning Tree. I. INTRODUCTION Attribute selection is the process of identifying a subset of the most useful features that produces compatible results as the original entire set of features. An attribute selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While the efficiency concerns the time required to find a subset of attributes, the effectiveness is related to the quality of the subset of attributes [1]. It is an important topic in Data Mining, because it is the effective way for reducing dimensionality, removing irrelevant data, removing redundant data, & increasing accuracy of the data. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same cluster are more similar to each other than to those in other clusters. It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics [2]. Attribute selection, also known as feature selection or variable subset selection. The process of selecting a subset of relevant attributes is used in model construction. The assumption when using an attribute selection technique is that the data contains many redundant or irrelevant (noisy) attributes. Redundant attributes and irrelevant attributes provide no useful information in any context and it diminishes the quality of the selected attributes [5]. Attribute selection techniques are a subset of the more general field of attribute extraction in high dimensional data. Attribute extraction creates new data from original attributes of dataset, whereas attribute selection returns a subset of the attributes. Attribute selection techniques are frequently used in domains where there are many attributes and comparatively few samples. An attribute selection algorithm proposed in this paper can be seen as the combination of a search technique for proposing new feature subsets along with an efficiency and effectiveness. The simplest step is to test each possible subset of attributes finding the one which increases the accuracy. The proposed algorithm consists of three parts: (i) removing irrelevant attributes (ii) constructing a MST from relative one (iii) Partitioning the MST and selecting representative attributes. RESEARCH ARTICLE OPEN ACCESS
  • 2. Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71 www.ijera.com 68 | P a g e II. BASIC CONCEPTS In this section, a brief introduction about the basic concepts of the Data Mining, Clustering, and Minimum Spanning tree is provided. 2.1 Data Mining: Data mining is the way of discovering the interesting knowledge from large amounts of information sources or data warehouses. When there is huge amount of data and certain information is to be found out from that data, then different stages of data mining is applied to it to gain information from it. Data mining tasks classified into two forms: 1. Descriptive mining tasks: Represent the general properties of the data. 2. Predictive mining tasks: Perform the implication on the current data. Different Data mining Functionalities are: Characterization and Discrimination, Mining Frequent Patterns, Association and Correlations, Classification and Prediction, Cluster Analysis, Outlier Analysis, Evolution Analysis. Out of these functionalities this paper focuses on the cluster Analysis. 2.2 Cluster Analysis: Clustering is the grouping similar objects into one class. A cluster is an association of data objects that are similar to one another within the same cluster and are dissimilar to the objects in different clusters. Document clustering (Text clustering) is closely related to the concept of data clustering. Document clustering is a more exact technique for unsupervised document organization, automatic topic extraction and fast information retrieval or filtering. Clustering helps to reduce the dimension and it simplifies the task as number of dataset is minimized to form a cluster. 2.3 Minimum Spanning Tree: A minimum spanning tree (MST) is an undirected, connected, acyclic weighted graph with minimum weight. The idea is to start with an empty graph and try to add edges one at a time, the resulting graph is a subset of some minimum Spanning tree. Each graph has several spanning trees. This method is mainly used to make the appropriate attribute subset clustering but it take time to construct the cluster. Various Applications:  Design of computer networks and Telecommunications networks  Transportation networks, water supply networks, and electrical grids.  Cluster analysis  Constructing trees for broadcasting in computer networks  Image registration and segmentation III. BACKGROUND AND COMPARATIVE ANALYSIS Many algorithms & techniques have been proposed up till now for clustering based feature/ attribute selection. Some of them have focused on minimizing redundant data set and to improve the accuracy whereas some other features subset selection algorithm focuses on searching for relevant features Fuzzy Logic used for improving the accuracy. In this, after removal of redundant data, clustering is done based on Prim’s Algorithm. It results in MST (Clusters) which guarantees redundancy. Once we get the minimized redundant data sets, accuracy automatically gets increases [3]. An algorithm for filtering information based on the Pearson test approach has been implemented and tested on feature selection. This test is frequently used in biomedical data analysis and should be used only for nominal (discretized) features. FCBF (Fast Correlation Based Feature Selection) algorithm is the modified algorithm, FCBF#, has a different search strategy than the original FCBF and it can produce more accurate classifiers for the size k subset selection problem [4]. Relief is well known and good feature set estimator. Feature set estimators evaluate features individually. The fundamental idea of Relief algorithm is estimate the quality of subset of features by comparing the nearest features with the selected features. With nearest hit (H) from the same class and nearest miss (M) from the different class perform the evaluation function to estimate the quality of features. This method used to guiding the search process as well as selecting the most appropriate feature set. Hierarchical clustering is a procedure of grouping data objects into a tree of clusters. It has two types: 1) Agglomerative approach is a bottom up approach; the clustering processes starts with each object forming a separate group and then merge these atomic group into larger clusters or group, until all the objects are in a single cluster. 2) Divisive approach is reverse process of agglomerative; it is top down approach, starts with all of objects in the same cluster. In each iteration, a clusters split up into smaller clusters. And finally from the individual clusters subset is taken out. Affinity Propagation Algorithm which is used to solve wide range of clustering problems. Algorithm proceeds in the same way i.e. removes irrelevant data, then construct minimum spanning tree and partition the same and select the most effective features. This algorithm aims to reduce the dimensions of the data. In this, data is clustered depending upon their similarities with each other using pair wise constrain. A cluster consists of
  • 3. Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71 www.ijera.com 69 | P a g e features. And then each cluster is treated as a single feature and thus dimensionality is reduced. Table 3.1 Comparison of Previous Techniques S.NO Techniques (or)Algorithms Advantages Disadvantages 1. Consistency Measure Fast, Remove noisy and irrelevant data Unable to handle large volumes of data 2. Wrapper Approach Accuracy is high Computational complexity is large 3. Filter Approach Suitable for very large features Accuracy is not guaranteed 4. Agglomerative linkage algorithm Reduce Complexity Decrease the Quality when dimensionality become high 5. INTERACT Algorithm Improve Accuracy Only deal with irrelevant data 6. Distributional clustering Higher classification accuracy Difficult to evaluation 7. Relief Algorithm Improve efficiency and Reduce Cost Powerless to detect Redundant features Comparison of various algorithms and techniques are discussed as follows. Some of them are having low computational complexity, higher accuracy or some are having redundancy problem while some are good at classification level but difficult to evaluate. IV. PROPOSED SYSTEM The proposed clustering-based attribute selection algorithm is based on minimum spanning tree (MST). The proposed algorithm works in two steps. In the first step, attributes are divided into clusters by using graph- theoretic clustering methods. In the second step, the most representative attribute that is strongly related to target classes is selected from each cluster to form a subset of attributes. Attributes in different clusters are relatively independent; the clustering-based strategy of proposed algorithm has a high probability of producing a subset of useful and independent attributes. Proposed algorithm is a straight-forward method in which main basic concept used is clustering. The need for clustering is to find the closest attribute selection among large volume of data i.e. over the high dimensional data. In business analysis, in hospitals for patient record maintenance, in banking system and for various marketing purposes high dimensional data is been used. Extraction of one particular file related to specific thing or person, from such huge amount of high dimensional data is very difficult and time consuming. High dimensional data refer to multiple attributes to each of the record i.e. suppose in a banking system each account holder is having various attributes related to account. For example Account details including Account number, account type, customer number, amount, last date of transactions, last time of transaction, type of transaction etc. So, from such huge amount of data retrieval is bit complex and time required is more. And in applications like banking system, patients record maintenance; time is the most important factor. And another important factor is accuracy. Proposed algorithm can efficiently and effectively deal with both irrelevant and redundant attributes, and obtain a good attribute subset. The irrelevant attribute removal is straightforward once the right relevance measure is defined or selected, while the redundant attribute elimination is a bit of sophisticated.
  • 4. Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71 www.ijera.com 70 | P a g e Figure 4.1 System Architecture In proposed algorithm, it involves 1. The construction of the minimum spanning tree from a weighted complete graph; 2. The partitioning of the MST into a forest with each tree representing a cluster; and 3. The selection of representative attributes from the clusters. System architecture consists of different phases of the processing. Overall procedure includes construction of minimum spanning tree followed by partitioning of the same and finally the process of selection of attributes from the clusters. V. UTILITY  Proposed system may be used across all domains over unlimited data set to search for the effective and best result.  Proposed system would be very useful in various hospitals for patient record maintenance which is nothing but high dimensional data and from that irrelevant and redundant features have to be removed and finally selected features have to be produced.  The same is applicable in banking system where the data is high dimensional and for each transaction or process, access time and accuracy must be very high.  Proposed system may be used to for various marketing purposes where there is a lot of data and the only relevant and suitable data have to be published each time.  Proposed system is useful when there is high dimensional data, and a pattern or a result is to be finding out from that large amount of data in less time with greater accuracy. VI. CONCLUSION Proposed algorithm describes a systematic workflow for the attribute selection over large amount of data sets. It uses minimum spanning tree to do so. In this paper, we have presented a novel clustering-based attribute subset selection algorithm for high dimensional data. The algorithm involves (i) removing irrelevant attributes, (ii) constructing a minimum spanning tree from relative ones, and (iii) partitioning the MST and selecting representative attributes. Each cluster is treated as a single attribute and thus dimensionality is drastically reduced. REFERENCES [1] N.Magendiran and J.Jayaranjani, An Efficient Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data - (ICETS’14) [2] Mr. M. Senthil Kumar and Ms. V. Latha Jothi, A Fast Clustering Based Feature Subset Selection Using Affinity Propagation Algorithm - (ICGICT’14)
  • 5. Ms.Sonam R. Yadav Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622,Vol. 5, Issue 6, ( Part -2) June 2015, pp.67-71 www.ijera.com 71 | P a g e [3] T.Jaga Priya Vathana, C. Saravanabhavan, and Dr.J. Vellingiri, A Survey On Feature Selection Algorithm For high Dimensional Data Using Fuzzy Logic - (IJES) [4] Lei Yu and Huan Liu,Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, International Conference on Machine Learning (ICML-2003), Washington DC, 2003. [5] S.Swetha and A.Harpika, A Novel Feature Subset Algorithm For High Dimensional Data – (IJRRECS) [6] A. Arauzo-Azofra, J.M. Benitez, and J.L. Castro, A Feature Set Measure Based on Relief, Proc. Fifth Int‟l Conf. Recent Advances in Soft Computing, pp. 104-109, 2004s [7] R. Butterworth, G. Piatetsky-Shapiro, and D.A. Simovici, “On Feature Selection through Clustering,” Proc. IEEE Fifth Int’l Conf.Data Mining, pp. 581-584, 2005 [8] Saurabh Soni & Pratik Patel, ”IFSS – An Improved Filter-Wrapper Algorithm for Feature Subset Selection”, International Journal of Computer Application (0975-8887), Volume 95-No. 14, June 2014.