SlideShare a Scribd company logo
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 1 Issue 3, December 2012
www.ijsr.net
A New Link Based Approach for Categorical Data
Clustering
Chiranth B.O1
, Panduranga Rao M.V2
, Basavaraj Patil S3
1
Dept. of Computer science and Engineering,
B.T.L Institute of Technology, Bangalore, India
bo.chiranth@gmail.com
2
Dept. of Computer Science and Engineering,
B.T.L Institute of Technology, Bangalore, India
raomvp@yahoo.com
3
Dept of Computer Science and Engineering,
B.T.L Institute of Technology, Bangalore, India
csehodbtlit@gmail.com
Abstract:The data generated by conventional categorical data clustering is incomplete because the information provided is also
incomplete. This project presents a new link-based approach, which improves the categorical clustering by discovering unknown entries
through similarity between clusters in an ensemble. A graph partitioning technique is applied to a weighted bipartite graph to obtain the
final clustering result. So the link-based approach outperforms both conventional clustering algorithms for categorical data and well-
known cluster ensemble technique.Data clustering is one of the fundamental tools we have for understanding the structure of a data set.
It plays a crucial, foundation role in machine learning, data mining, information retrieval and pattern recognition. The experimental
results on multiple real data sets suggest that the proposed link-based method almost always outperforms both conventional clustering
algorithms for categorical data and well-known cluster ensemble technique. This paper proposes an Algorithm called Weighted Triple-
Quality (WTQ), which also uses k-means algorithm for basic clustering. Once using does the basic clustering consensus functions we
can get cluster ensembles of categorical data. This categorical data is converted to refined matrix.
Keywords: Clustering, categorical data, cluster ensembles, link-based similarity, data mining
1. Introduction
Dataclustering is one of the fundamental tools we have for
understanding the structure of a data set. It plays a crucial,
foundational role in machine learning, data mining,
information retrieval, and pattern recognition. Clustering
aims to categorize data into groups or clusters such that the
data in the same cluster are more similar to each other than to
those in different clusters. Many well established clustering
algorithms, such as k-means [1] and PAM [2], have been
designed for numerical data, whose inherent properties can
be naturally employed to measure a distance (e.g.,
Euclidean) between feature vectors [3], [4]. However, these
cannot be directly applied for clustering of categorical data,
where domain values are discrete and have no ordering
defined. An example of categorical attribute is Sex={female,
male} or Shape={circle, rectangle}.
As a result, many categorical data clustering algorithms
have been introduced in recent years, with applications to
interesting domains such as protein interaction data [5]. The
initial method was developedin [6] by making use of
Gower’s similarity coefficient [7]. Following that, the k-
modes algorithm in [8] extended the conventional K-means
with a simple matching dissimilarity measure and a
frequency-based method to update centroids (i.e., clusters’
representative). As a single-pass algorithm, Squeezer [9]
makes use of a prespecified similarity threshold to determine
which of the existing clusters (or a new cluster) to which a
data point under examination is assigned. LIMBO [10] is a
hierarchical clustering algorithm that uses the Information
Bottleneck (IB) framework to define a distance measure for
Categorical tuples. The concepts of evolutionary computing
and genetic algorithm have also been adopted by a
partitioning method for categorical data, i.e., GAClust [11].
Cobweb [12] is a model-based method primarily exploited
for categorical data sets. Different graph models have also
been investigated by the STIRR [13], ROCK [14], and
CLICK [15] techniques. In addition, several density-based
algorithms have also been devised for such purpose, for
instance, CACTUS [16], COOLCAT [17], and CLOPE [18].
Although, a large number of algorithms have been
introduced for clustering categorical data, the No Free Lunch
theorem [19] suggeststhere is no single clustering algorithm
that performs best for all data sets [20] and can discover all
types of cluster shapes and structures presented in data. Each
algorithm has its own strengths and weaknesses. For a
particular data set, different algorithms, or even the same
algorithm with different parameters, usually provide distinct
solutions. Therefore, it is difficult for users to decide which
algorithm would be the proper alternative for a given set of
data. Recently, cluster ensembles have emerged as an
effective solution that is ableto overcome these limitations,
and improve the robustness as well as the quality of
clustering results. The main objective of cluster ensembles is
to combine different clustering decisions in such a way as to
achieve accuracy superior to that of any individual
clustering. Examples of well-known ensemble methods are:
1. The feature-based approach that transforms the problem
of cluster ensembles to clustering categorical data (i.e.,
cluster labels) [11],
8
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 1 Issue 3, December 2012
www.ijsr.net
2. The direct approach that finds the final partition through
relabeling the base clustering results [15], [16],
3. Graph-based algorithms that employ graph partitioning
methodology [17], [18], [19], and
4. The pairwise-similarity approach that makes use of co-
occurrence relations between data points [20], [11], [12].
2. Cluster Ensemble Methods
Figure 1. The Basic Process of Cluster Ensembles
It has been shown that ensembles are most effective when
constructed from a set of predictors whose errors are
dissimilar [17]. To a great extent, diversity among ensemble
members is introduced to enhance the result of an ensemble
[18]. Particularly for data clustering, the results obtained
with any single algorithm over much iteration are usually
very similar. In such a circumstance where all ensemble
members agree on how a data set should be partitioned,
aggregating the base clustering results will show no
improvement over any of the constituent members. As a
result, several heuristics have been proposed to introduce
artificial instabilities in clustering algorithms, giving
diversity within a cluster ensemble. The following ensemble
generation methods yield different clustering of the same
data, by exploiting different cluster models and different data
partitions.
1. Homogeneous ensembles. Base clustering are
created using repeated runs of a single clustering
algorithm, with several sets of parameter
initializations, such as cluster centers of the k-
means clustering technique [11], [20], [19].
2. Homogeneous ensembles. Base clustering are
created using repeated runs of a single clustering
algorithm, with several sets of parameter
initializations, such as cluster centers of the k-
means clustering technique [11], [20], [19].
3. Data subspace/sampling. A cluster ensemble can
also be achieved by generating base clustering from
different subsets of initial data. It is intuitively
assumed that each clustering algorithm will provide
different levels of performance for different
partitions of a data set [17].Data subspace/sampling.
A cluster ensemble can also be achieved by
generating base clustering from different subsets of
initial data. It is intuitively assumed that each
clustering algorithm will provide different levels of
performance for different partitions of a data set
[17].
4. Data subspace/sampling. A cluster ensemble can
also be achieved by generating base clustering from
different subsets of initial data. It is intuitively
assumed that each clustering algorithm will provide
different levels of performance for different
partitions of a data set [17].
5. Mixed heuristics. In addition to using one of the
aforementioned methods, any combination of them
can be applied as well [17], [11], [3], [8], [20], [2],
[19].
Figure 2. Examples of (a) cluster ensemble and the
corresponding (b) label assignment matrix, (c) pair wise
similarity matrix, and (d) binary cluster association matrix
respectively.
3. A New Link Based Approach
Existing cluster ensemble methods to categorical data
analysis rely on the typical pairwise-similarity and binary
cluster-association matrices [8], [9], which summarize the
underlying ensemble information at a rather coarse level.
Many matrix entries are left “unknown” and simply recorded
as “0.” Regardless of a consensus function, the quality of the
final clustering result may be degraded. As a result, a link-
based method has been established with the ability to
discover unknown values and, hence, improve the accuracy
of the ultimate data partition [13].
In spite of promising findings, this initial framework is
based on data point- data point pairwise-similarity matrix,
which is highly expensive to obtain. The link-based
similarity technique, SimRank [2] that is employed to
estimate the similarity among data points is inapplicable to a
large data set. To overcome these problems, a new link-
based cluster ensemble (LCE) approach is introduced herein.
It is more efficient than the former model, where a BM-like
matrix is used to represent the ensemble information.
Figure 3. A Link Based Cluster Framework
9
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 1 Issue 3, December 2012
www.ijsr.net
4. Weighted Triple Quality (WTQ): A New
Link Based Algorithm
Figure 4. An Example of Cluster Network
Algorithm: WAQ (G, Cx, Cy):
5. Experimental Design
The experiments set out to investigate the performance of
LCE compared to a number of clustering algorithms, both
specifically developed for categorical data analysis and those
state-of-the-art cluster ensemble techniques found in
literature. Baseline model is also included in the assessment,
which simply applies SPEC, as a consensus function, to the
conventional BM. For comparison, as in [8], each clustering
method divides data points into a partition of K (the number
of true classes for each data set) clusters, which is then
evaluated against the corresponding true partition using the
following set of label-based evaluation indices:
Classification Accuracy (CA) [3], Normalized Mutual
Information (NMI) [9] and Adjusted Rand (AR) Index [9].
Further details of these quality measures are provided in
Section I of the online supplementaryNote that, true classes
are known for all data sets but are explicitly not used by the
cluster ensemble process. They are only used to evaluate the
quality of the clustering results.as follows:
Insert/Break/Continuous.
6. Conclusion
This paper presents a novel, highly effective link-based
cluster ensemble approach to categorical data clustering. It
transforms the original categorical data matrix to an
information-preserving numerical variation (RM), to which
an effective graph partitioning technique can be directly
applied. The problem of constructing the RM is efficiently
resolved by the similarity among categorical labels (or
clusters), using the Weighted Triple-Quality similarity
algorithm. The empirical study, with different ensemble
types, validity measures, and data sets, suggests that the
proposed link-based method usually achieves superior
clustering results compared to those of the traditional
categorical data algorithms and benchmark cluster ensemble
techniques. The prominent future work includes an extensive
study regarding the behavior of other link-based similarity
measures within this problem context. Also, the new method
will be applied to specific domains, including tourism and
medical data sets.
References
[1] D.S. Hochbaum and D.B. Shmoys, “A Best Possible
Heuristic for the K-Center Problem,” Math. of
Operational Research, vol. 10, no. 2, pp. 180-184,
1985. R. Caves, Multinational Enterprise and Economic
Analysis, Cambridge University Press, Cambridge,
1982. (book style)
[2] D.S. Hochbaum and D.B. Shmoys, “A Best Possible Heuristic
for the K-Center Problem,” Math. of Operational Research,
vol. 10, no. 2, pp. 180-184, 1985.
[3] A.K. Jain and R.C. Dubes, Algorithms for Clustering.
Prentice-Hall, 1998.
[4] P. Zhang, X. Wang, and P.X. Song, “Clustering
Categorical Data Based on Distance Vectors,” The J.
Am. Statistical Assoc., vol. 101, no. 473, pp. 355-367,
2006.
[5] J. Grambeier and A. Rudolph, “Techniques of Cluster
Algo- rithms in Data Mining,” Data Mining and
Knowledge Discovery, vol. 6, pp. 303-360, 2002.
[6] K.C. Gowda and E. Diday, “Symbolic Clustering Using
a New Dissimilarity Measure,” Pattern Recognition,
vol. 24, no. 6, pp. 567- 578, 1991.
[7] J.C. Gower, “A General Coefficient of Similarity and
Some of Its Properties,” Biometrics, vol. 27, pp. 857-
871, 1971.
[8] Z. Huang, “Extensions to the K-Means Algorithm for
Clustering Large Data Sets with Categorical Values,”
Data Mining and Knowledge Discovery, vol. 2, pp.
283-304, 1998.
[9] Z. He, X. Xu, and S. Deng, “Squeezer: An Efficient
Algorithm for Clustering Categorical Data,” J.
Computer Science and Technology, vol. 17, no. 5, pp.
611-624, 2002.
[10] P. Andritsos and V. Tzerpos, “Information-Theoretic
Software Clustering,” IEEE Trans. Software Eng., vol.
31, no. 2, pp. 150-165, Feb. 2005.
[11] D. Cristofor and D. Simovici, “Finding Median
Partitions Using Information-Theoretical-Based
Genetic Algorithms,” J. Universal Computer Science,
vol. 8, no. 2, pp. 153-172, 2002.
[12] D.H. Fisher, “Knowledge Acquisition via Incremental
Conceptual Clustering,” Machine Learning, vol. 2, pp.
139-172, 1987.
[13] D. Gibson, J. Kleinberg, and P. Raghavan, “Clustering
Catego- rical Data: An Approach Based on Dynamical
Systems,” VLDB J., vol. 8, nos. 3-4, pp. 222-236, 2000.
[14] S. Guha, R. Rastogi, and K. Shim, “ROCK: A Robust
Clustering Algorithm for Categorical Attributes,”
Information Systems, vol. 25, no. 5, pp. 345-366, 2000.
[15] M.J. Zaki and M. Peters, “Clicks: Mining Subspace
Clusters in Categorical Data via Kpartite Maximal
10
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 1 Issue 3, December 2012
www.ijsr.net
Cliques,” Proc. Int’l Conf. Data Eng. (ICDE), pp. 355-
356, 2005.
[16] V. Ganti, J. Gehrke, and R. Ramakrishnan, “CACTUS:
Clustering Categorical Data Using Summaries,” Proc.
ACM SIGKDD Int’l Conf. Knowledge Discovery and
Data Mining (KDD), pp. 73-83, 1999.
[17] D. Barbara, Y. Li, and J. Couto, “COOLCAT: An
Entropy-Based Algorithm for Categorical Clustering,”
Proc. Int’l Conf. Information and Knowledge
Management (CIKM), pp. 582-589, 2002.
[18] Y. Yang, S. Guan, and J. You, “CLOPE: A Fast and
Effective Clustering Algorithm for Transactional Data,”
Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery
and Data Mining (KDD), pp. 682- 687, 2002.
[19] D.H. Wolpert and W.G. Macready, “No Free Lunch
Theorems for Search,” Technical Report SFI-TR-95-
02-010, Santa Fe Inst., 1995.
[20] L.I. Kuncheva and S.T. Hadjitodorov, “Using Diversity
in Cluster Ensembles,” Proc. IEEE Int’l Conf. Systems,
Man and Cybernetics, p p. 1214-1219, 2004.
Chiranth B Oreceived the B.E. degree in
Information Science and Engineering from
Golden Valley Institute of Technology, KGF.
At present pursuing the Master of Technology
in Computer Science and Engineering
Department at BTL institute of Technology, Bangalore.
M.V.Panduranga Rao is a research scholar at National
Institute of Technology Karnataka,
Mangalore, India. His research interests are in
the field of Real time and Embedded systems
on Linux platform and Security. He has
published various research papers across
India and in IEEE international conference in
Okinawa, Japan. He has also authored two reference books
on Linux Internals. He is the Life member of Indian Society
for Technical Education and IAENG.
S. Basavaraj Patil Started career as Faculty Member in
Vijayanagar Engineering College, Bellary
(1993-1997). Then moved to Kuvempu
University BDT College of Engineering as
a Faculty member. During this period
(1997-2004), carried out Ph.D. Research
work on Neural Network based Techniques
for Pattern Recognition in the area Computer Science &
Engineering. Also consulted many software companies
(ArisGlobal, Manthan Systems, etc.,) in the area of data
mining and pattern recognition His areas of research interests
are Neural Networks and Genetic Algorithms. He is
presently mentoring two PhD and one MSc (Engg by
Research) Students. He is presently working as professor at
BTL institute of technology.
11

More Related Content

What's hot

PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGPATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
IJDKP
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
IJERD Editor
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
IOSR Journals
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
PRAWEEN KUMAR
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
Alexander Decker
 
Iaetsd a survey on one class clustering
Iaetsd a survey on one class clusteringIaetsd a survey on one class clustering
Iaetsd a survey on one class clustering
Iaetsd Iaetsd
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957
IJMER
 
Similarity distance measures
Similarity  distance measuresSimilarity  distance measures
Similarity distance measures
thilagasna
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduce
IJDKP
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
acijjournal
 
Juha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the somJuha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the somArchiLab 7
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
IJCSIS Research Publications
 
Recent Trends in Incremental Clustering: A Review
Recent Trends in Incremental Clustering: A ReviewRecent Trends in Incremental Clustering: A Review
Recent Trends in Incremental Clustering: A Review
IOSRjournaljce
 
Combined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex dataCombined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex data
csandit
 
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATACOMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
cscpconf
 
Query Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph DatabasesQuery Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph Databases
ijdms
 
Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Editor IJARCET
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce
IJECEIAES
 
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERINGAN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
International Journal of Technical Research & Application
 
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data Fusion
IRJET Journal
 

What's hot (20)

PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGPATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
Iaetsd a survey on one class clustering
Iaetsd a survey on one class clusteringIaetsd a survey on one class clustering
Iaetsd a survey on one class clustering
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957
 
Similarity distance measures
Similarity  distance measuresSimilarity  distance measures
Similarity distance measures
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduce
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
Juha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the somJuha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the som
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
 
Recent Trends in Incremental Clustering: A Review
Recent Trends in Incremental Clustering: A ReviewRecent Trends in Incremental Clustering: A Review
Recent Trends in Incremental Clustering: A Review
 
Combined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex dataCombined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex data
 
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATACOMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
 
Query Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph DatabasesQuery Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph Databases
 
Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce
 
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERINGAN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
 
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data Fusion
 

Similar to A new link based approach for categorical data clustering

F04463437
F04463437F04463437
F04463437
IOSR-JEN
 
Lx3520322036
Lx3520322036Lx3520322036
Lx3520322036
IJERA Editor
 
A Study in Employing Rough Set Based Approach for Clustering on Categorical ...
A Study in Employing Rough Set Based Approach for Clustering  on Categorical ...A Study in Employing Rough Set Based Approach for Clustering  on Categorical ...
A Study in Employing Rough Set Based Approach for Clustering on Categorical ...
IOSR Journals
 
G0354451
G0354451G0354451
G0354451
iosrjournals
 
Introduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering EnsembleIntroduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering Ensemble
IJSRD
 
A Kernel Approach for Semi-Supervised Clustering Framework for High Dimension...
A Kernel Approach for Semi-Supervised Clustering Framework for High Dimension...A Kernel Approach for Semi-Supervised Clustering Framework for High Dimension...
A Kernel Approach for Semi-Supervised Clustering Framework for High Dimension...
IJCSIS Research Publications
 
Web Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering AnalysisWeb Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering Analysis
inventy
 
A03202001005
A03202001005A03202001005
A03202001005
theijes
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
SaiPragnaKancheti
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
SaiPragnaKancheti
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks
IJECEIAES
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online Data
IOSR Journals
 
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
IJECEIAES
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
IOSR Journals
 
Applications Of Clustering Techniques In Data Mining A Comparative Study
Applications Of Clustering Techniques In Data Mining  A Comparative StudyApplications Of Clustering Techniques In Data Mining  A Comparative Study
Applications Of Clustering Techniques In Data Mining A Comparative Study
Fiona Phillips
 
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET Journal
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
IJORCS
 
A0360109
A0360109A0360109
A0360109
iosrjournals
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
IJECEIAES
 

Similar to A new link based approach for categorical data clustering (20)

F04463437
F04463437F04463437
F04463437
 
Lx3520322036
Lx3520322036Lx3520322036
Lx3520322036
 
A Study in Employing Rough Set Based Approach for Clustering on Categorical ...
A Study in Employing Rough Set Based Approach for Clustering  on Categorical ...A Study in Employing Rough Set Based Approach for Clustering  on Categorical ...
A Study in Employing Rough Set Based Approach for Clustering on Categorical ...
 
G0354451
G0354451G0354451
G0354451
 
Introduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering EnsembleIntroduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering Ensemble
 
A Kernel Approach for Semi-Supervised Clustering Framework for High Dimension...
A Kernel Approach for Semi-Supervised Clustering Framework for High Dimension...A Kernel Approach for Semi-Supervised Clustering Framework for High Dimension...
A Kernel Approach for Semi-Supervised Clustering Framework for High Dimension...
 
Web Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering AnalysisWeb Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering Analysis
 
A03202001005
A03202001005A03202001005
A03202001005
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online Data
 
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
 
Applications Of Clustering Techniques In Data Mining A Comparative Study
Applications Of Clustering Techniques In Data Mining  A Comparative StudyApplications Of Clustering Techniques In Data Mining  A Comparative Study
Applications Of Clustering Techniques In Data Mining A Comparative Study
 
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction Data
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 
A0360109
A0360109A0360109
A0360109
 
Spe165 t
Spe165 tSpe165 t
Spe165 t
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
 

More from International Journal of Science and Research (IJSR)

Innovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart FailureInnovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart Failure
International Journal of Science and Research (IJSR)
 
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverterDesign and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
International Journal of Science and Research (IJSR)
 
Polarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material systemPolarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material system
International Journal of Science and Research (IJSR)
 
Image resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fittingImage resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fitting
International Journal of Science and Research (IJSR)
 
Ad hoc networks technical issues on radio links security & qo s
Ad hoc networks technical issues on radio links security & qo sAd hoc networks technical issues on radio links security & qo s
Ad hoc networks technical issues on radio links security & qo s
International Journal of Science and Research (IJSR)
 
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
International Journal of Science and Research (IJSR)
 
Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...
International Journal of Science and Research (IJSR)
 
An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...
International Journal of Science and Research (IJSR)
 
Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...
International Journal of Science and Research (IJSR)
 
Comparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brainComparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brain
International Journal of Science and Research (IJSR)
 
T s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusionsT s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusions
International Journal of Science and Research (IJSR)
 
Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...
International Journal of Science and Research (IJSR)
 
Grid computing for load balancing strategies
Grid computing for load balancing strategiesGrid computing for load balancing strategies
Grid computing for load balancing strategies
International Journal of Science and Research (IJSR)
 
A new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidthA new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidth
International Journal of Science and Research (IJSR)
 
Main physical causes of climate change and global warming a general overview
Main physical causes of climate change and global warming   a general overviewMain physical causes of climate change and global warming   a general overview
Main physical causes of climate change and global warming a general overview
International Journal of Science and Research (IJSR)
 
Performance assessment of control loops
Performance assessment of control loopsPerformance assessment of control loops
Performance assessment of control loops
International Journal of Science and Research (IJSR)
 
Capital market in bangladesh an overview
Capital market in bangladesh an overviewCapital market in bangladesh an overview
Capital market in bangladesh an overview
International Journal of Science and Research (IJSR)
 
Faster and resourceful multi core web crawling
Faster and resourceful multi core web crawlingFaster and resourceful multi core web crawling
Faster and resourceful multi core web crawling
International Journal of Science and Research (IJSR)
 
Extended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy imagesExtended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy images
International Journal of Science and Research (IJSR)
 
Parallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errorsParallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errors
International Journal of Science and Research (IJSR)
 

More from International Journal of Science and Research (IJSR) (20)

Innovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart FailureInnovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart Failure
 
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverterDesign and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
 
Polarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material systemPolarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material system
 
Image resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fittingImage resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fitting
 
Ad hoc networks technical issues on radio links security & qo s
Ad hoc networks technical issues on radio links security & qo sAd hoc networks technical issues on radio links security & qo s
Ad hoc networks technical issues on radio links security & qo s
 
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
 
Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...
 
An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...
 
Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...
 
Comparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brainComparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brain
 
T s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusionsT s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusions
 
Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...
 
Grid computing for load balancing strategies
Grid computing for load balancing strategiesGrid computing for load balancing strategies
Grid computing for load balancing strategies
 
A new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidthA new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidth
 
Main physical causes of climate change and global warming a general overview
Main physical causes of climate change and global warming   a general overviewMain physical causes of climate change and global warming   a general overview
Main physical causes of climate change and global warming a general overview
 
Performance assessment of control loops
Performance assessment of control loopsPerformance assessment of control loops
Performance assessment of control loops
 
Capital market in bangladesh an overview
Capital market in bangladesh an overviewCapital market in bangladesh an overview
Capital market in bangladesh an overview
 
Faster and resourceful multi core web crawling
Faster and resourceful multi core web crawlingFaster and resourceful multi core web crawling
Faster and resourceful multi core web crawling
 
Extended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy imagesExtended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy images
 
Parallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errorsParallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errors
 

Recently uploaded

Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Po-Chuan Chen
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 

Recently uploaded (20)

Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 

A new link based approach for categorical data clustering

  • 1. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 1 Issue 3, December 2012 www.ijsr.net A New Link Based Approach for Categorical Data Clustering Chiranth B.O1 , Panduranga Rao M.V2 , Basavaraj Patil S3 1 Dept. of Computer science and Engineering, B.T.L Institute of Technology, Bangalore, India bo.chiranth@gmail.com 2 Dept. of Computer Science and Engineering, B.T.L Institute of Technology, Bangalore, India raomvp@yahoo.com 3 Dept of Computer Science and Engineering, B.T.L Institute of Technology, Bangalore, India csehodbtlit@gmail.com Abstract:The data generated by conventional categorical data clustering is incomplete because the information provided is also incomplete. This project presents a new link-based approach, which improves the categorical clustering by discovering unknown entries through similarity between clusters in an ensemble. A graph partitioning technique is applied to a weighted bipartite graph to obtain the final clustering result. So the link-based approach outperforms both conventional clustering algorithms for categorical data and well- known cluster ensemble technique.Data clustering is one of the fundamental tools we have for understanding the structure of a data set. It plays a crucial, foundation role in machine learning, data mining, information retrieval and pattern recognition. The experimental results on multiple real data sets suggest that the proposed link-based method almost always outperforms both conventional clustering algorithms for categorical data and well-known cluster ensemble technique. This paper proposes an Algorithm called Weighted Triple- Quality (WTQ), which also uses k-means algorithm for basic clustering. Once using does the basic clustering consensus functions we can get cluster ensembles of categorical data. This categorical data is converted to refined matrix. Keywords: Clustering, categorical data, cluster ensembles, link-based similarity, data mining 1. Introduction Dataclustering is one of the fundamental tools we have for understanding the structure of a data set. It plays a crucial, foundational role in machine learning, data mining, information retrieval, and pattern recognition. Clustering aims to categorize data into groups or clusters such that the data in the same cluster are more similar to each other than to those in different clusters. Many well established clustering algorithms, such as k-means [1] and PAM [2], have been designed for numerical data, whose inherent properties can be naturally employed to measure a distance (e.g., Euclidean) between feature vectors [3], [4]. However, these cannot be directly applied for clustering of categorical data, where domain values are discrete and have no ordering defined. An example of categorical attribute is Sex={female, male} or Shape={circle, rectangle}. As a result, many categorical data clustering algorithms have been introduced in recent years, with applications to interesting domains such as protein interaction data [5]. The initial method was developedin [6] by making use of Gower’s similarity coefficient [7]. Following that, the k- modes algorithm in [8] extended the conventional K-means with a simple matching dissimilarity measure and a frequency-based method to update centroids (i.e., clusters’ representative). As a single-pass algorithm, Squeezer [9] makes use of a prespecified similarity threshold to determine which of the existing clusters (or a new cluster) to which a data point under examination is assigned. LIMBO [10] is a hierarchical clustering algorithm that uses the Information Bottleneck (IB) framework to define a distance measure for Categorical tuples. The concepts of evolutionary computing and genetic algorithm have also been adopted by a partitioning method for categorical data, i.e., GAClust [11]. Cobweb [12] is a model-based method primarily exploited for categorical data sets. Different graph models have also been investigated by the STIRR [13], ROCK [14], and CLICK [15] techniques. In addition, several density-based algorithms have also been devised for such purpose, for instance, CACTUS [16], COOLCAT [17], and CLOPE [18]. Although, a large number of algorithms have been introduced for clustering categorical data, the No Free Lunch theorem [19] suggeststhere is no single clustering algorithm that performs best for all data sets [20] and can discover all types of cluster shapes and structures presented in data. Each algorithm has its own strengths and weaknesses. For a particular data set, different algorithms, or even the same algorithm with different parameters, usually provide distinct solutions. Therefore, it is difficult for users to decide which algorithm would be the proper alternative for a given set of data. Recently, cluster ensembles have emerged as an effective solution that is ableto overcome these limitations, and improve the robustness as well as the quality of clustering results. The main objective of cluster ensembles is to combine different clustering decisions in such a way as to achieve accuracy superior to that of any individual clustering. Examples of well-known ensemble methods are: 1. The feature-based approach that transforms the problem of cluster ensembles to clustering categorical data (i.e., cluster labels) [11], 8
  • 2. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 1 Issue 3, December 2012 www.ijsr.net 2. The direct approach that finds the final partition through relabeling the base clustering results [15], [16], 3. Graph-based algorithms that employ graph partitioning methodology [17], [18], [19], and 4. The pairwise-similarity approach that makes use of co- occurrence relations between data points [20], [11], [12]. 2. Cluster Ensemble Methods Figure 1. The Basic Process of Cluster Ensembles It has been shown that ensembles are most effective when constructed from a set of predictors whose errors are dissimilar [17]. To a great extent, diversity among ensemble members is introduced to enhance the result of an ensemble [18]. Particularly for data clustering, the results obtained with any single algorithm over much iteration are usually very similar. In such a circumstance where all ensemble members agree on how a data set should be partitioned, aggregating the base clustering results will show no improvement over any of the constituent members. As a result, several heuristics have been proposed to introduce artificial instabilities in clustering algorithms, giving diversity within a cluster ensemble. The following ensemble generation methods yield different clustering of the same data, by exploiting different cluster models and different data partitions. 1. Homogeneous ensembles. Base clustering are created using repeated runs of a single clustering algorithm, with several sets of parameter initializations, such as cluster centers of the k- means clustering technique [11], [20], [19]. 2. Homogeneous ensembles. Base clustering are created using repeated runs of a single clustering algorithm, with several sets of parameter initializations, such as cluster centers of the k- means clustering technique [11], [20], [19]. 3. Data subspace/sampling. A cluster ensemble can also be achieved by generating base clustering from different subsets of initial data. It is intuitively assumed that each clustering algorithm will provide different levels of performance for different partitions of a data set [17].Data subspace/sampling. A cluster ensemble can also be achieved by generating base clustering from different subsets of initial data. It is intuitively assumed that each clustering algorithm will provide different levels of performance for different partitions of a data set [17]. 4. Data subspace/sampling. A cluster ensemble can also be achieved by generating base clustering from different subsets of initial data. It is intuitively assumed that each clustering algorithm will provide different levels of performance for different partitions of a data set [17]. 5. Mixed heuristics. In addition to using one of the aforementioned methods, any combination of them can be applied as well [17], [11], [3], [8], [20], [2], [19]. Figure 2. Examples of (a) cluster ensemble and the corresponding (b) label assignment matrix, (c) pair wise similarity matrix, and (d) binary cluster association matrix respectively. 3. A New Link Based Approach Existing cluster ensemble methods to categorical data analysis rely on the typical pairwise-similarity and binary cluster-association matrices [8], [9], which summarize the underlying ensemble information at a rather coarse level. Many matrix entries are left “unknown” and simply recorded as “0.” Regardless of a consensus function, the quality of the final clustering result may be degraded. As a result, a link- based method has been established with the ability to discover unknown values and, hence, improve the accuracy of the ultimate data partition [13]. In spite of promising findings, this initial framework is based on data point- data point pairwise-similarity matrix, which is highly expensive to obtain. The link-based similarity technique, SimRank [2] that is employed to estimate the similarity among data points is inapplicable to a large data set. To overcome these problems, a new link- based cluster ensemble (LCE) approach is introduced herein. It is more efficient than the former model, where a BM-like matrix is used to represent the ensemble information. Figure 3. A Link Based Cluster Framework 9
  • 3. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 1 Issue 3, December 2012 www.ijsr.net 4. Weighted Triple Quality (WTQ): A New Link Based Algorithm Figure 4. An Example of Cluster Network Algorithm: WAQ (G, Cx, Cy): 5. Experimental Design The experiments set out to investigate the performance of LCE compared to a number of clustering algorithms, both specifically developed for categorical data analysis and those state-of-the-art cluster ensemble techniques found in literature. Baseline model is also included in the assessment, which simply applies SPEC, as a consensus function, to the conventional BM. For comparison, as in [8], each clustering method divides data points into a partition of K (the number of true classes for each data set) clusters, which is then evaluated against the corresponding true partition using the following set of label-based evaluation indices: Classification Accuracy (CA) [3], Normalized Mutual Information (NMI) [9] and Adjusted Rand (AR) Index [9]. Further details of these quality measures are provided in Section I of the online supplementaryNote that, true classes are known for all data sets but are explicitly not used by the cluster ensemble process. They are only used to evaluate the quality of the clustering results.as follows: Insert/Break/Continuous. 6. Conclusion This paper presents a novel, highly effective link-based cluster ensemble approach to categorical data clustering. It transforms the original categorical data matrix to an information-preserving numerical variation (RM), to which an effective graph partitioning technique can be directly applied. The problem of constructing the RM is efficiently resolved by the similarity among categorical labels (or clusters), using the Weighted Triple-Quality similarity algorithm. The empirical study, with different ensemble types, validity measures, and data sets, suggests that the proposed link-based method usually achieves superior clustering results compared to those of the traditional categorical data algorithms and benchmark cluster ensemble techniques. The prominent future work includes an extensive study regarding the behavior of other link-based similarity measures within this problem context. Also, the new method will be applied to specific domains, including tourism and medical data sets. References [1] D.S. Hochbaum and D.B. Shmoys, “A Best Possible Heuristic for the K-Center Problem,” Math. of Operational Research, vol. 10, no. 2, pp. 180-184, 1985. R. Caves, Multinational Enterprise and Economic Analysis, Cambridge University Press, Cambridge, 1982. (book style) [2] D.S. Hochbaum and D.B. Shmoys, “A Best Possible Heuristic for the K-Center Problem,” Math. of Operational Research, vol. 10, no. 2, pp. 180-184, 1985. [3] A.K. Jain and R.C. Dubes, Algorithms for Clustering. Prentice-Hall, 1998. [4] P. Zhang, X. Wang, and P.X. Song, “Clustering Categorical Data Based on Distance Vectors,” The J. Am. Statistical Assoc., vol. 101, no. 473, pp. 355-367, 2006. [5] J. Grambeier and A. Rudolph, “Techniques of Cluster Algo- rithms in Data Mining,” Data Mining and Knowledge Discovery, vol. 6, pp. 303-360, 2002. [6] K.C. Gowda and E. Diday, “Symbolic Clustering Using a New Dissimilarity Measure,” Pattern Recognition, vol. 24, no. 6, pp. 567- 578, 1991. [7] J.C. Gower, “A General Coefficient of Similarity and Some of Its Properties,” Biometrics, vol. 27, pp. 857- 871, 1971. [8] Z. Huang, “Extensions to the K-Means Algorithm for Clustering Large Data Sets with Categorical Values,” Data Mining and Knowledge Discovery, vol. 2, pp. 283-304, 1998. [9] Z. He, X. Xu, and S. Deng, “Squeezer: An Efficient Algorithm for Clustering Categorical Data,” J. Computer Science and Technology, vol. 17, no. 5, pp. 611-624, 2002. [10] P. Andritsos and V. Tzerpos, “Information-Theoretic Software Clustering,” IEEE Trans. Software Eng., vol. 31, no. 2, pp. 150-165, Feb. 2005. [11] D. Cristofor and D. Simovici, “Finding Median Partitions Using Information-Theoretical-Based Genetic Algorithms,” J. Universal Computer Science, vol. 8, no. 2, pp. 153-172, 2002. [12] D.H. Fisher, “Knowledge Acquisition via Incremental Conceptual Clustering,” Machine Learning, vol. 2, pp. 139-172, 1987. [13] D. Gibson, J. Kleinberg, and P. Raghavan, “Clustering Catego- rical Data: An Approach Based on Dynamical Systems,” VLDB J., vol. 8, nos. 3-4, pp. 222-236, 2000. [14] S. Guha, R. Rastogi, and K. Shim, “ROCK: A Robust Clustering Algorithm for Categorical Attributes,” Information Systems, vol. 25, no. 5, pp. 345-366, 2000. [15] M.J. Zaki and M. Peters, “Clicks: Mining Subspace Clusters in Categorical Data via Kpartite Maximal 10
  • 4. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 1 Issue 3, December 2012 www.ijsr.net Cliques,” Proc. Int’l Conf. Data Eng. (ICDE), pp. 355- 356, 2005. [16] V. Ganti, J. Gehrke, and R. Ramakrishnan, “CACTUS: Clustering Categorical Data Using Summaries,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), pp. 73-83, 1999. [17] D. Barbara, Y. Li, and J. Couto, “COOLCAT: An Entropy-Based Algorithm for Categorical Clustering,” Proc. Int’l Conf. Information and Knowledge Management (CIKM), pp. 582-589, 2002. [18] Y. Yang, S. Guan, and J. You, “CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), pp. 682- 687, 2002. [19] D.H. Wolpert and W.G. Macready, “No Free Lunch Theorems for Search,” Technical Report SFI-TR-95- 02-010, Santa Fe Inst., 1995. [20] L.I. Kuncheva and S.T. Hadjitodorov, “Using Diversity in Cluster Ensembles,” Proc. IEEE Int’l Conf. Systems, Man and Cybernetics, p p. 1214-1219, 2004. Chiranth B Oreceived the B.E. degree in Information Science and Engineering from Golden Valley Institute of Technology, KGF. At present pursuing the Master of Technology in Computer Science and Engineering Department at BTL institute of Technology, Bangalore. M.V.Panduranga Rao is a research scholar at National Institute of Technology Karnataka, Mangalore, India. His research interests are in the field of Real time and Embedded systems on Linux platform and Security. He has published various research papers across India and in IEEE international conference in Okinawa, Japan. He has also authored two reference books on Linux Internals. He is the Life member of Indian Society for Technical Education and IAENG. S. Basavaraj Patil Started career as Faculty Member in Vijayanagar Engineering College, Bellary (1993-1997). Then moved to Kuvempu University BDT College of Engineering as a Faculty member. During this period (1997-2004), carried out Ph.D. Research work on Neural Network based Techniques for Pattern Recognition in the area Computer Science & Engineering. Also consulted many software companies (ArisGlobal, Manthan Systems, etc.,) in the area of data mining and pattern recognition His areas of research interests are Neural Networks and Genetic Algorithms. He is presently mentoring two PhD and one MSc (Engg by Research) Students. He is presently working as professor at BTL institute of technology. 11