International Journal of Information Technology Convergence and Services (IJITCS) Vol.4, No.5/6, December 2014
DOI:10.5121/ijitcs.2014.4601
A H-K CLUSTERING ALGORITHM FOR HIGH
DIMENSIONAL DATA USING ENSEMBLE LEARNING
Rashmi Paithankar¹ and Bharat Tidke²
¹Department of Computer Engg, Flora Institute of Technology, Pune, Maharashtra, India
²Assistant Professor, Department of Computer Engg, Flora Institute of Technology, Pune, Maharashtra, India
ABSTRACT
Advances to traditional clustering algorithms address problems such as the curse of dimensionality and the sparsity of data across multiple attributes. The traditional H-K clustering algorithm resolves the randomness and a priori choice of the initial centers in K-means clustering, but when applied to high dimensional data it suffers from the dimensional disaster problem owing to its high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve the performance of clustering high dimensional datasets from different aspects and to different extents, yet each improves performance from only a single perspective. The objective of the proposed model is to improve the performance of traditional H-K clustering and overcome its limitations for high dimensional data, namely high computational complexity and poor accuracy, by combining three approaches: subspace clustering, ensemble clustering, and H-K clustering.
KEYWORDS
H-K clustering, ensemble, subspace
1. INTRODUCTION
As an important technique in data mining, clustering analysis groups observations having similar properties; it can be viewed as unsupervised classification [1] and helps to extract relevant information from high dimensional data. Hierarchical clustering and partition clustering are the basic types of clustering algorithms. Hierarchical clustering seeks to build a hierarchy of clusters, for example with single-link or complete-link clustering, and does not require the number of clusters to be specified in advance. Examples of such algorithms are BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) and CURE (Clustering Using REpresentatives). Partition clustering, in contrast, obtains a single partition of the data instead of a clustering structure, using criterion-function optimization to create clusters locally or globally [1]. Partition clustering has an advantage in large applications, but the number of desired output clusters must be specified in advance. The K-means algorithm is the most typical partition algorithm; it is popular because it is easy to implement and does not require the user to specify many parameters.
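For concreteness, the partitioning idea behind K-means can be sketched in a few lines of Python (Lloyd's algorithm on a small hypothetical 2-D dataset; for determinism this sketch seeds the centers with the first k points, whereas a real implementation would use random or k-means++ seeding):

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    centers = points[:k]  # deterministic seeding for this illustration
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = []
        for j, cl in enumerate(clusters):
            if cl:
                dim = len(cl[0])
                new_centers.append(tuple(sum(p[d] for p in cl) / len(cl)
                                         for d in range(dim)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters

# Two well-separated 2-D blobs; K-means recovers them as two clusters.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
centers, clusters = kmeans(data, k=2)
```

Note that the only parameter the user must supply is k itself, which is exactly the choice the H-K approach discussed later tries to automate.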
Applying traditional clustering algorithms to high dimensional datasets presents a great challenge for traditional data mining techniques in terms of both effectiveness and efficiency. The increasing sparsity of the data and the increasing difficulty of distinguishing distances between data points, due to the so-called 'dimensionality disaster', make clustering difficult, so adaptations of existing algorithms are required to maintain cluster quality and speed. Research in clustering has introduced new concepts such as subspace clustering, ensemble clustering, and the H-K clustering algorithm. The traditional H-K clustering algorithm resolves the randomness and a priori choice of the initial centers of K-means clustering; however, when applied to high dimensional datasets it leads to a dimensional disaster problem because of its high computational complexity, and it yields clusters with poor accuracy.
Subspace clustering, an extension of traditional clustering, finds clusters in various datasets [8]; it provides scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, insensitivity to the order of input records, accuracy, and speed, and it also removes redundancy and finds overlapping clusters in the subspaces [4,5,6,7,9,10]. Ensemble clustering, the 'knowledge reuse framework' first proposed by Strehl and Ghosh [11], uses two mechanisms: a generation mechanism that produces clusterings using different criteria, and a consensus function that chooses the most appropriate solution from the set of solutions. It overcomes the challenges created by high dimensional data and gives high performance on real-world datasets in applications such as Internet applications and medical diagnostics [2,3,12,13,19,20]. The proposed model combines three techniques, subspace clustering, H-K clustering, and ensemble clustering, and exploits their advantages to improve the performance of clustering on high dimensional data while simultaneously overcoming the limitations of the H-K clustering algorithm for high dimensional data (high computational complexity and poor accuracy).
2. MOTIVATION
Traditional clustering algorithms do not give effective and efficient results on high dimensional data because of disadvantages such as the "curse of dimensionality" and the "empty space phenomenon". In high dimensional spaces the data are inherently sparse, and the distance between each pair of points is almost the same for a wide variety of data distributions and distance functions [4]; meanwhile, the notion of density is even more troublesome than that of distance. These problems are collectively referred to as the "curse of dimensionality". To overcome irrelevant and noisy features and the sparsity of the data, an advanced clustering algorithm is needed that solves these problems and clusters the data efficiently. The proposed model combines advanced clustering algorithms to improve cluster quality and speed.
3. RELATED WORK
A lot of work has been done in the area of clustering. Based on the research to date, the general categorization of high dimensional data clustering includes: 1) dimension reduction, 2) subspace clustering, 3) ensemble clustering, and 4) H-K clustering [1][11][14]. The following sections give an overview and some limitations of these techniques.
3.1. Dimension reduction
Feature selection and feature transformation are the most popular dimension reduction techniques [5]. Feature transformation techniques create combinations of multiple attributes and summarize them [5]; they include principal component analysis and singular value decomposition. Feature selection methods reveal groups of objects having similar attributes by picking the most relevant attributes from the dataset [5]. Yanchang et al. [16] proposed a transformation-based method that breaks high dimensional clustering into several one- or two-dimensional clustering phases and applies common clustering algorithms to them. Experiments with different datasets showed that the time complexity of clustering can be linear in the dimensionality of the dataset. This framework can easily process hybrid datasets but may face problems on datasets containing overlapping clusters. Chen et al. [17] proposed IMSND (Initialization Method based on Shared Neighborhood Density), a local density based method that uses the probability density of a point to search for initial cluster centers in high dimensional data; they implemented it on the spherical K-means algorithm, and an experimental evaluation shows improved K-means performance. In both approaches (feature selection and feature transformation), however, information is lost, which naturally affects accuracy [16,17], and feature selection algorithms have difficulty when clusters lie in different subspaces. This type of data motivated the evolution of subspace clustering algorithms.
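The feature-transformation idea can be illustrated with a minimal PCA sketch in pure Python (power iteration on the covariance matrix; the 3-D data here are hypothetical, and this illustrates the general technique rather than the specific methods of [16] or [17]):

```python
def pca_project(data, n_components=1, iters=200):
    """Project data onto its top principal component(s): a linear combination
    of the original attributes that summarises them in fewer dimensions."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        # Power iteration converges to the dominant eigenvector of cov.
        v = [1.0] * d
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        components.append(v)
        # Deflate so the next pass finds the next component.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Project each centered row onto the retained components.
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]

# 3-D points that vary mostly along one direction collapse to 1-D cleanly.
data = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 2.9, 3.1], [4.0, 4.1, 3.9]]
reduced = pca_project(data, n_components=1)
```

The projection preserves the ordering along the dominant direction, but, as noted above, whatever variance lies outside the retained components is lost.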
3.2. Subspace clustering
Subspace clustering is an extension of traditional clustering that finds clusters existing in multiple, possibly overlapping, subspaces [8]. Based on search strategy, subspace clustering algorithms fall into two major kinds: bottom-up and top-down [8]. Bottom-up approaches first find dense regions in low dimensional subspaces and combine them to form clusters in higher dimensional subspaces; example algorithms are CLIQUE, ENCLUS, and MAFIA. Top-down approaches start with an initial clustering over the full set of dimensions and iteratively refine the subspace associated with each cluster; example algorithms are PROCLUS, ORCLUS, and PreDeCon.
Agrawal et al. [10] proposed the clustering algorithm CLIQUE, which identifies dense clusters in subspaces of maximum dimensionality and satisfies requirements of data mining applications such as scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records, but does not evaluate the quality of clustering in different subspaces. Chen et al. [6] presented a technique for selecting the k representative clusters by examining the relationship between low dimensional subspace clusters and high dimensional ones using an approximate method, PCoC. Muller et al. [18] presented a novel model, RESCU, which extracts the most interesting, non-redundant clusters using global optimization, and proved that this problem is NP-hard. Kriegel et al. [7] proposed finding overlapping clusters in the subspaces using a filter-refinement architecture, which speeds up the subspace search and scales at most quadratically with the data and subspace dimensionality. This approach overcomes the exponential scaling of earlier algorithms with the data or subspace dimensionality, as well as the problems caused by using a global density threshold for clustering. The input data is preprocessed with algorithms such as DBSCAN, K-means, and SNN, which find base clusters; the base clusters are then merged into a maximal-dimensional cluster approximation, to which postprocessing (pruning, refinement, etc.) is applied. Ali et al. [5] proposed a two-step clustering method based on a divide-and-conquer technique: it first selects subspaces based on size/level and then clusters within those subspaces based on similarity using the K-means algorithm. This method improves the accuracy and efficiency of the original K-means algorithm.
3.3. Ensemble Clustering
Ensemble clustering combines the solutions of multiple clustering algorithms using a consensus function to form a more relevant solution. The general procedure is shown in Fig. 1.

Fig. 1 Ensemble clustering process
Tidke et al. [3] proposed a clustering ensemble method based on a two-stage clustering algorithm to overcome the challenges created by high dimensional data, such as the curse of dimensionality and, in certain cases, the problem of visualizing high dimensional data. PROCLUS is used for initial subspace clustering; the K-means partitioning algorithm is applied to the generated subspaces, followed by split and merge techniques in which a threshold value, a distance function, and mean square error conditions are considered, respectively. Zhizhou Kong et al. [12] proposed an integrated mechanism of testing and information rolling to decrease the error probability of matching cluster members, together with a category-weight method for ensembling the clusterings; cluster members are generated using Ward's method, the k-means method, and the median method. P. Viswanath et al. [13] presented an ensemble of leaders clustering methods in which the entire ensemble requires only a single scan of the dataset; a 'deferring buffer scheme', an improvement over the 'blocked access scheme', is used for accessing data from the dataset, and a consensus function based on the co-association matrix is used to ensemble the individual partitions. Weiwei Zhuang [19] and Derek Greene [20] applied ensemble clustering to real-world application data such as Internet applications and medical diagnostics, respectively.
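The co-association consensus function mentioned above can be sketched as follows (a simplified illustration of the general idea, not the exact scheme of [13]; the base partitions and the 0.5 threshold are hypothetical):

```python
def co_association(partitions, n):
    """Co-association matrix: entry [i][j] is the fraction of base
    partitions in which objects i and j share a cluster."""
    m = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(partitions)
    return m

def consensus_clusters(partitions, n, threshold=0.5):
    """Single-link style consensus: objects whose co-association exceeds
    the threshold end up in the same final cluster (via union-find)."""
    m = co_association(partitions, n)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)  # union the two groups
    return [find(i) for i in range(n)]

# Three base partitions of 5 objects; objects 0-2 usually co-cluster.
parts = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [2, 2, 2, 3, 3]]
labels = consensus_clusters(parts, 5)
```

Note that the base partitions may use entirely different label sets, as above; only co-membership matters, which is what makes this consensus function a convenient 'knowledge reuse' mechanism.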
3.4. H-K clustering
The H-K clustering algorithm was proposed to decide the k clusters for the k-means algorithm. It is implemented as divisive H-K and agglomerative H-K clustering. The divisive H-K algorithm implements a top-down approach that splits the whole dataset into small clusters, dividing K clusters into K+1 clusters using the K-means method. Agglomerative H-K clustering works by merging small clusters together, merging K clusters into K-1 clusters.

In 2005, Tung-Shou Chen et al. [14] proposed the H-K (hierarchical K-means) clustering algorithm, which organically combines a hierarchical clustering method and a partition clustering method for data clustering. Compared with either algorithm alone, H-K clustering can solve the problem of randomness and a priori selection of initial centers in the k-means clustering process and obtain a better clustering result, but it still has high computational complexity.
Ying He et al. [2] applied ensemble learning to high dimensional data clustering and proposed a new algorithm, EPCAHK (Ensemble Principal Component Analysis Hierarchical K-means), which improves the performance of traditional H-K clustering on high dimensional datasets. First, the high dimensional dataset is reduced to a low dimensional one using the PCA data reduction technique. Subsequently, the clustering results of the hierarchical stage, used to obtain initial information (e.g., the cluster number or the initial cluster centers), are integrated using the min-transitive closure method. Finally, the clustering result is obtained by running the K-means algorithm on the ensemble clustering results. The authors also raise issues to be addressed in the future, such as the relationship between ensemble size and ensemble clustering performance, and between the distribution of the dataset and clustering performance.
Table I. Comparative analysis of techniques for high dimensional data clustering

| Author | Clustering technique | Method | Observation |
|---|---|---|---|
| Yanchang et al. [16] | Dimension reduction | Converts high dimensional data to low dimension; common clustering algorithms | Improves performance; loses information; difficult to find clusters in different subspaces |
| Chen et al. [17] | Dimension reduction | IMSND; spherical K-means algorithm | Improves K-means performance; loses information |
| Agrawal et al. [10] | Subspace clustering | CLIQUE: identifies dense clusters in subspaces of maximum dimensionality | Provides scalability, end-user comprehensibility of the results, non-presumption, insensitivity to the order of input records; improves accuracy and speed |
| Chen et al. [6] | Subspace clustering | Technique for selecting the k representative clusters | |
| Muller et al. [18] | Subspace clustering | RESCU: extracts the most interesting, non-redundant clusters | Removes redundancy |
| Kriegel et al. [7] | Subspace clustering | Filter-refinement architecture | Finds overlapping clusters in the subspaces; speeds up the subspace search |
| Tidke et al. [3] | Ensemble clustering | Two-stage clustering algorithm; PROCLUS | Overcomes the challenges created by high dimensional data |
| Zhizhou Kong et al. [12] | Ensemble clustering | Category-weight method | Decreases the error probability of matching cluster members |
| Weiwei Zhuang [19], Derek Greene [20] | Ensemble clustering | Ensemble clustering on real-world data | Internet applications and medical diagnostics, respectively |
| Tung-Shou Chen et al. [14] | H-K clustering | Combines hierarchical and partition clustering methods | Removes randomness and a priori selection of initial centers in k-means; high computational complexity |
| Ying He et al. [2] | H-K clustering | EPCAHK: Ensemble Principal Component Analysis Hierarchical K-means | Improves clustering performance |
4. PROPOSED MODEL
The flow chart of the proposed model is shown below.
Fig. 2 Block diagram of H-K clustering algorithm based on ensemble learning
Based on the above flowchart of the proposed model, the following sections describe its five sub-stages in detail.
Stage1. Dataset preprocessing
Import the dataset to be clustered and obtain its basic information: the number of clusters k, the number of samples N, and the threshold value. If the dataset contains any missing values, replace them with zero to obtain the preprocessed dataset D. (Real-world datasets are preferred for more accurate results.)
Stage2. The subspace clustering process
Apply the subspace clustering algorithm ORCLUS to the preprocessed dataset D; its output is a set of subspace clusters of the dataset. The ORCLUS algorithm is divided into three steps: cluster assignment, subspace determination, and merging. The assignment phase selects k centers and iteratively assigns points to the nearest centers. The subspace determination phase then finds the subspace Ei of dimensionality d for each cluster Ci by computing the covariance matrix of Ci and selecting the d orthogonal eigenvectors with the smallest eigenvalues (i.e., the directions of least spread). Finally, the merge phase reduces the number of clusters by combining clusters that are close and similar to each other and have the least spread.
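The subspace determination step can be illustrated for the 2-D case (a simplified sketch of the ORCLUS idea, not the full algorithm; in 2-D the least-spread eigenvector is simply orthogonal to the dominant one found by power iteration, and the cluster data below are hypothetical):

```python
def least_spread_direction(cluster):
    """For a 2-D cluster, find the direction of least spread, i.e. the
    eigenvector of the covariance matrix with the smallest eigenvalue:
    the direction along which the cluster is most compact."""
    n = len(cluster)
    mx = sum(p[0] for p in cluster) / n
    my = sum(p[1] for p in cluster) / n
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum((p[0] - mx) ** 2 for p in cluster) / (n - 1)
    cyy = sum((p[1] - my) ** 2 for p in cluster) / (n - 1)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in cluster) / (n - 1)
    # Power iteration for the largest eigenvector of [[cxx,cxy],[cxy,cyy]].
    vx, vy = 1.0, 1.0
    for _ in range(100):
        wx, wy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = (wx * wx + wy * wy) ** 0.5
        vx, vy = wx / norm, wy / norm
    # In 2-D the least-spread direction is orthogonal to the dominant one.
    return (-vy, vx)

# Points stretched along the x-axis: least spread is roughly the y-axis.
cluster = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.05), (4.0, -0.05)]
d = least_spread_direction(cluster)
```

In the full algorithm this computation is performed per cluster and for d eigenvectors at once, but the principle of ranking directions by spread is the same.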
Stage3. The H-K clustering process
Apply H-K clustering to the k subspaces output by stage 2. Divisive H-K clustering divides the k-cluster dataset into k+1 clusters using the k-means method: it picks the two elements that are furthest from each other in a cluster and divides the distance between them into three equal parts to produce one more new cluster. Repeat this process over the range [2, k+10] and, by random selection, choose L clusterings as the output, stored as H(1), H(2), H(3), ..., H(L).
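The furthest-pair rule can be sketched as follows (one interpretation of the 'three equal parts' description above, using the two interior division points as seeds for the split; the data are hypothetical):

```python
def furthest_pair(cluster):
    """Find the two points in a cluster that are furthest apart: the pair
    divisive H-K uses to seed the split into one more cluster."""
    best, pair = -1.0, None
    for i in range(len(cluster)):
        for j in range(i + 1, len(cluster)):
            d = sum((a - b) ** 2 for a, b in zip(cluster[i], cluster[j]))
            if d > best:
                best, pair = d, (cluster[i], cluster[j])
    return pair

def split_seeds(cluster):
    """Divide the segment between the furthest pair into three equal parts;
    the two interior points serve as seeds for the sub-clusters."""
    p, q = furthest_pair(cluster)
    third = [a + (b - a) / 3.0 for a, b in zip(p, q)]
    two_thirds = [a + 2.0 * (b - a) / 3.0 for a, b in zip(p, q)]
    return third, two_thirds

# The furthest pair is (0,0) and (7,0); seeds land at x = 7/3 and 14/3.
cluster = [(0.0, 0.0), (1.0, 0.0), (6.0, 0.0), (7.0, 0.0)]
s1, s2 = split_seeds(cluster)
```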
Stage4. Split clustering process
This stage follows a split strategy based on a predefined threshold value. If the size of the l-th cluster is greater than the threshold, split that cluster into two new clusters, assigning each point according to its distance to the two centroids of the new clusters (a centroid is the mean of a cluster). The distance between a point and a cluster is calculated using the Euclidean distance. The procedure is repeated on the newly created clusters until no cluster exceeds the predefined threshold. These steps are given below as pseudo code.
Algorithm 1:
Input: k clusters and threshold T
Output: n clusters
1. Start with k clusters.
2. Check the density of each cluster against the given threshold T.
3. If the density exceeds the threshold, split the cluster into two, assigning each point to its closest centroid according to the criterion

J = Σ_{j=1..k} Σ_{i=1..n} || x_i^(j) - c_j ||^2    (1)

where || x_i^(j) - c_j ||^2 is a chosen distance measure between a data point x_i^(j) and the cluster centre c_j, so that J indicates the total distance of the n data points from their respective cluster centers.
4. Repeat for each cluster until it reaches the threshold value. Once the splitting phase has formed a hierarchy of clusters of similar size, merging is required to find the closest clusters to be merged.
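A minimal Python sketch of Algorithm 1 (using cluster size as the density measure and the cluster's furthest pair as the two provisional centroids, both of which are assumptions the pseudo code leaves open):

```python
def split_large_clusters(clusters, threshold):
    """Repeatedly split any cluster whose size exceeds the threshold into
    two, assigning each point to the nearer of two provisional centroids."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    out, work = [], list(clusters)
    while work:
        c = work.pop()
        if len(c) <= threshold:
            out.append(c)  # small enough: keep as-is
            continue
        # Use the furthest pair as the two provisional centroids.
        p, q = max(((p, q) for i, p in enumerate(c) for q in c[i + 1:]),
                   key=lambda pq: dist2(*pq))
        a = [x for x in c if dist2(x, p) <= dist2(x, q)]
        b = [x for x in c if dist2(x, p) > dist2(x, q)]
        work.extend([a, b])  # re-check the halves against the threshold
    return out

# One oversized cluster (two sub-groups) plus one singleton.
clusters = [[(0.0, 0.0), (0.2, 0.0), (5.0, 0.0), (5.2, 0.0)], [(9.0, 9.0)]]
result = split_large_clusters(clusters, threshold=2)
```

Each split produces two clusters that are pushed back onto the work list, so the loop terminates only when every cluster respects the threshold, matching step 4 of the pseudo code.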
Stage5. Ensemble clustering process
After the splitting step, merge the closest clusters based on an objective function; the proposed model uses a distance function as the objective function to merge nearby clusters. In the proposed method, child clusters from any parent cluster can be merged if their distance is smaller than that to any other cluster in the hierarchy. Also check the mean square error (MSE) of each merged cluster against the parent cluster: if it is larger, that cluster must be unmerged and made available to merge with some other cluster in the hierarchy. This process repeats until the MSE of every possible combination of merged clusters has been checked against its parent cluster. Finally, the merged and remaining clusters are the output clusters. The algorithm steps are given below:
Algorithm 2:
Input: hierarchy of clusters
Output: partition C1, ..., Cn
1. Start with n node clusters.
2. Find the closest two clusters in the hierarchy using the Euclidean distance and merge them.
3. Calculate the MSE of the root cluster and the newly merged cluster:

MSE = (1/n) Σ_{j=1..k} Σ_{x_i ∈ C_j} || x_i - μ_j ||^2    (2)

where μ_j is the mean of cluster C_j and x_i is a data object belonging to cluster C_j. The formula to compute μ_j is shown in equation (3):

μ_j = (1/n_j) Σ_{x_i ∈ C_j} x_i    (3)

In this sum-of-squared-error formula, the distance from each data object to its cluster centroid is squared, and these distances are minimized for each data object. The main objective of this formula is to generate clusters that are as compact and well separated as possible.
4. If the MSE of the newly merged cluster is smaller than that of the clusters after splitting, keep the merge; otherwise unmerge them.
5. Repeat until all possible clusters have been merged according to step 4.
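A minimal Python sketch of one merge-and-check pass of Algorithm 2 (here the MSE acceptance test is simplified to an explicit bound, an assumption of this sketch, rather than the parent-cluster comparison described above):

```python
def mse(cluster):
    """Mean squared distance of points to their cluster centroid (eqs. 2-3)."""
    n, dim = len(cluster), len(cluster[0])
    mu = [sum(p[j] for p in cluster) / n for j in range(dim)]
    return sum(sum((p[j] - mu[j]) ** 2 for j in range(dim))
               for p in cluster) / n

def merge_step(clusters, max_mse):
    """Merge the two closest clusters (by centroid distance) and keep the
    merge only if the merged cluster's MSE stays below the bound."""
    def centroid(c):
        return [sum(p[j] for p in c) / len(c) for j in range(len(c[0]))]
    best, pair = float("inf"), None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            a, b = centroid(clusters[i]), centroid(clusters[j])
            d = sum((x - y) ** 2 for x, y in zip(a, b))
            if d < best:
                best, pair = d, (i, j)
    i, j = pair
    candidate = clusters[i] + clusters[j]
    if mse(candidate) <= max_mse:  # accept the merge
        return [c for k, c in enumerate(clusters) if k not in (i, j)] + [candidate]
    return clusters  # reject: leave the hierarchy unchanged

# The two nearby singletons merge; the far cluster stays separate.
clusters = [[(0.0, 0.0)], [(0.5, 0.0)], [(9.0, 0.0)]]
result = merge_step(clusters, max_mse=1.0)
```

Repeating this step until no acceptable merge remains yields the final output partition of step 5.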
The above steps can be applied to high dimensional datasets such as:
1. The Breast Cancer dataset, with 9 attributes: Age, Menopause, Tumor-size, Inv-nodes, Node-caps, Deg-malig, Breast, Breast-quad, Irradiat.
2. The Wdbc dataset, with 9 attributes: Clump thickness, Uniformity of cell size, Uniformity of cell shape, Amount of marginal adhesion, Frequency of bare nuclei, Single epithelial cell size, Bland chromatin, Normal nucleoli, Mitoses.
5. CONCLUSION
High dimensional dataset processing faces many problems, such as the 'curse of dimensionality' and the sparsity of data in the high dimensional space. The proposed model provides an algorithm for processing high dimensional datasets that combines three approaches, exploiting the advantages of ensemble and subspace clustering while simultaneously overcoming the limitations of traditional H-K clustering, namely high computational complexity and poor accuracy, through a three-stage clustering process. First, the dataset D is converted into subspaces using the subspace clustering algorithm ORCLUS; each resulting subspace reveals different characteristics of the original dataset. Treating each subspace as a separate dataset, hierarchical clustering is applied. Subsequently, the clustering results of the hierarchical stage are passed to the split stage, in which clusters whose size exceeds the threshold are split into new clusters, and finally the clusters are integrated using the objective function (MSE). Applying these clustering approaches together improves the performance of the clustering process and provides stability for the H-K clustering algorithm on high dimensional data.
REFERENCES
[1] A.K. Jain, M.N. Murty, and P.J. Flynn 1999, “Data Clustering: A Review,” ACM Computing
Surveys, vol. 31, no. 3, pp. 264-323.
[2] Ying He, Jian Wang, Liang-Xi, Lin Mei, Yan-Feng Shang, Wen-Fei Wang 2013, "A H-K clustering algorithm based on ensemble learning", ICSSC.
[3] B.A Tidke, R.G Mehta, D.P Rana 2012, “A novel approach for high dimensional data clustering”,
ISSN: 2250–3676, [IJESAT] international journal of engineering science & advanced technology
Volume-2, Issue-3, 645 – 651.
[4] Emmanuel Muller et al. 2009, "Evaluating Clustering in Subspace Projections of High Dimensional Data", VLDB '09, August 24-28, Lyon, France, copyright 2009 VLDB Endowment.
[5] Alijamaat, M. Khalilian, and N. Mustapha 2010, “A Novel Approach for High Dimensional Data
Clustering,” 2010 Third International Conference on Knowledge Discovery and Data Mining, pp.
264-267.
[6] Guanhua Chen, Xiuli Ma et al. 2009, “Mining Representative Subspace Clusters in High-Dimensional
Data”, Sixth International Conference on Fuzzy Systems and Knowledge Discovery.
[7] Hans-Peter Kriegel, Peer Kroger, Matthias Renz, Sebastian Wurst 2005,” A Generic Framework for
Efficient Subspace Clustering of High-Dimensional Data”, In Proc. 5th IEEE International
Conference on Data Mining (ICDM), Houston, TX.
[8] Lance Parsons, Ehtesham Haque and Huan Liu 2004,” Subspace Clustering for High Dimensional
Data: A Review” Supported in part by grants from Prop 301 (No. ECR A601) and CEINT.
[9] Christian Baumgartner, Claudia Plant 2004,”Subspace Selection for Clustering High-Dimensional
Data”, Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM’04) 0-7695-
2142-8/04 $ 20.00 IEEE.
[10] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan 1998, “Automatic subspace clustering of high
dimensional data for data mining applications”, In Proceedings of the 1998 ACM SIGMOD
international conference on Management of data, pages 94-105. ACM Press.
[11] A. Strehl and J. Ghosh 2002 “Cluster ensembles – A knowledge reuse framework for combining
multiple partitions,” Journal of Machine Learning Research, pp.583-617.
[12] Zhizhou KONG et al. 2008,”A Novel Clustering-Ensemble Approach”, 978-1-4244-1748-3/08/ IEEE
[13] P. Viswanath 2006,“A Fast and Efficient Ensemble Clustering Method”, The 18th International
Conference on Pattern Recognition (ICPR'06) 0-7695-2521-0/06 IEEE
[14] Tung-Shou Chen et al.2005,” A combined k-means and hierarchical clustering method for improving
the clustering efficiency of microarray”, Proceedings of 2005 International Symposium on Intelligent
Signal Processing and Communication Systems.
[15] Kohei Arai and Ali Ridho Barakbah,2007 ,” Hierarchical K-means: an algorithm for centroids
initialization for K-means ”, Reports of the Faculty of Science and Engineering, Vol. 36, No.1, 36-1
(2007),25-31
[16] Zhao Yanchang et al. 2003, "A general framework for clustering high-dimension datasets", CCECE 2003 - CCGEI 2003, Montreal, May 2003, 0-7803-7781-8/03 IEEE.
[17] Luying Chen et al. 2009,”An Initialization Method for Clustering High-Dimensional Data” ,First
International Workshop on Database Technology and Applications
[18] Emmanuel Muller et al. 2009, “Relevant Subspace Clustering: Mining the Most Interesting Non-
Redundant Concepts in High Dimensional Data” Ninth IEEE International Conference on Data
Mining.
[19] Weiwei Zhuang et al. 2012, "Ensemble Clustering for Internet Security Applications", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 6.
[20] Derek Greene et al. 2004 “Ensemble Clustering in Medical Diagnostics”, Proceedings of the 17th
IEEE Symposium on Computer-Based Medical Systems (CBMS’04) 1063-7125/04
[21] Charu C. Aggarwal, Philip S. Yu,” Finding Generalized Projected Clusters in High Dimensional
Spaces” IBM T. J. Watson Research Center Yorktown Heights, NY 10598 { charu, psyu
}@watson.ibm.com
[22] Reza Ghaemi 2009,”A Survey: Clustering Ensembles Techniques”, proceedings of world academy of
science, engineering and technology volume 38 february 2009 issn: 2070-3740

GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
 
50120130406035
5012013040603550120130406035
50120130406035
 
J41046368
J41046368J41046368
J41046368
 
Critical Paths Identification on Fuzzy Network Project
Critical Paths Identification on Fuzzy Network ProjectCritical Paths Identification on Fuzzy Network Project
Critical Paths Identification on Fuzzy Network Project
 
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary AlgorithmAutomatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster Results
 
7. 10083 12464-1-pb
7. 10083 12464-1-pb7. 10083 12464-1-pb
7. 10083 12464-1-pb
 
Clustering Approach Recommendation System using Agglomerative Algorithm
Clustering Approach Recommendation System using Agglomerative AlgorithmClustering Approach Recommendation System using Agglomerative Algorithm
Clustering Approach Recommendation System using Agglomerative Algorithm
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 

Viewers also liked

PROPOSAL OF AN HYBRID METHODOLOGY FOR ONTOLOGY DEVELOPMENT BY EXTENDING THE P...
PROPOSAL OF AN HYBRID METHODOLOGY FOR ONTOLOGY DEVELOPMENT BY EXTENDING THE P...PROPOSAL OF AN HYBRID METHODOLOGY FOR ONTOLOGY DEVELOPMENT BY EXTENDING THE P...
PROPOSAL OF AN HYBRID METHODOLOGY FOR ONTOLOGY DEVELOPMENT BY EXTENDING THE P...ijitcs
 
MOBILE TELEVISION: UNDERSTANDING THE TECHNOLOGY AND OPPORTUNITIES15ijitcs01
MOBILE TELEVISION: UNDERSTANDING THE TECHNOLOGY AND OPPORTUNITIES15ijitcs01MOBILE TELEVISION: UNDERSTANDING THE TECHNOLOGY AND OPPORTUNITIES15ijitcs01
MOBILE TELEVISION: UNDERSTANDING THE TECHNOLOGY AND OPPORTUNITIES15ijitcs01ijitcs
 
ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING KNOWLEDGE MANAGEMENT ...
ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING KNOWLEDGE MANAGEMENT ...ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING KNOWLEDGE MANAGEMENT ...
ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING KNOWLEDGE MANAGEMENT ...ijitcs
 
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORKijitcs
 
RESEARCH REVIEW FOR POSSIBLE RELATION BETWEEN MOBILE PHONE REDIATION AND BRAI...
RESEARCH REVIEW FOR POSSIBLE RELATION BETWEEN MOBILE PHONE REDIATION AND BRAI...RESEARCH REVIEW FOR POSSIBLE RELATION BETWEEN MOBILE PHONE REDIATION AND BRAI...
RESEARCH REVIEW FOR POSSIBLE RELATION BETWEEN MOBILE PHONE REDIATION AND BRAI...ijitcs
 
EFFECTS OF HUMAN FACTOR ON THE SUCCESS OF INFORMATION TECHNOLOGY OUTSOURCING
EFFECTS OF HUMAN FACTOR ON THE SUCCESS OF INFORMATION TECHNOLOGY OUTSOURCINGEFFECTS OF HUMAN FACTOR ON THE SUCCESS OF INFORMATION TECHNOLOGY OUTSOURCING
EFFECTS OF HUMAN FACTOR ON THE SUCCESS OF INFORMATION TECHNOLOGY OUTSOURCINGijitcs
 
CRITICAL SUCCESS FACTORS FOR M-COMMERCE IN SAUDI ARABIA’S PRIVATE SECTOR: A M...
CRITICAL SUCCESS FACTORS FOR M-COMMERCE IN SAUDI ARABIA’S PRIVATE SECTOR: A M...CRITICAL SUCCESS FACTORS FOR M-COMMERCE IN SAUDI ARABIA’S PRIVATE SECTOR: A M...
CRITICAL SUCCESS FACTORS FOR M-COMMERCE IN SAUDI ARABIA’S PRIVATE SECTOR: A M...ijitcs
 
ADMINISTRATION SECURITY ISSUES IN CLOUD COMPUTING
ADMINISTRATION SECURITY ISSUES IN CLOUD COMPUTINGADMINISTRATION SECURITY ISSUES IN CLOUD COMPUTING
ADMINISTRATION SECURITY ISSUES IN CLOUD COMPUTINGijitcs
 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourseijitcs
 
3-D WAVELET CODEC (COMPRESSION/DECOMPRESSION) FOR 3-D MEDICAL IMAGES
3-D WAVELET CODEC (COMPRESSION/DECOMPRESSION) FOR 3-D MEDICAL IMAGES3-D WAVELET CODEC (COMPRESSION/DECOMPRESSION) FOR 3-D MEDICAL IMAGES
3-D WAVELET CODEC (COMPRESSION/DECOMPRESSION) FOR 3-D MEDICAL IMAGESijitcs
 
A LOW COST EEG BASED BCI PROSTHETIC USING MOTOR IMAGERY
A LOW COST EEG BASED BCI PROSTHETIC USING MOTOR IMAGERY A LOW COST EEG BASED BCI PROSTHETIC USING MOTOR IMAGERY
A LOW COST EEG BASED BCI PROSTHETIC USING MOTOR IMAGERY ijitcs
 
Zalp webinar-Raising your employee referral program results to 50% of all hires
Zalp webinar-Raising your employee referral program results to 50% of all hiresZalp webinar-Raising your employee referral program results to 50% of all hires
Zalp webinar-Raising your employee referral program results to 50% of all hiresSavio Vadakkan
 
INFORMATION SECURITY IN CLOUD COMPUTING
INFORMATION SECURITY IN CLOUD COMPUTINGINFORMATION SECURITY IN CLOUD COMPUTING
INFORMATION SECURITY IN CLOUD COMPUTINGijitcs
 
ANALYSIS OF MANUFACTURING OF VOLTAGE RESTORE TO INCREASE DENSITY OF ELEMENTS ...
ANALYSIS OF MANUFACTURING OF VOLTAGE RESTORE TO INCREASE DENSITY OF ELEMENTS ...ANALYSIS OF MANUFACTURING OF VOLTAGE RESTORE TO INCREASE DENSITY OF ELEMENTS ...
ANALYSIS OF MANUFACTURING OF VOLTAGE RESTORE TO INCREASE DENSITY OF ELEMENTS ...ijoejournal
 

Viewers also liked (18)

PROPOSAL OF AN HYBRID METHODOLOGY FOR ONTOLOGY DEVELOPMENT BY EXTENDING THE P...
PROPOSAL OF AN HYBRID METHODOLOGY FOR ONTOLOGY DEVELOPMENT BY EXTENDING THE P...PROPOSAL OF AN HYBRID METHODOLOGY FOR ONTOLOGY DEVELOPMENT BY EXTENDING THE P...
PROPOSAL OF AN HYBRID METHODOLOGY FOR ONTOLOGY DEVELOPMENT BY EXTENDING THE P...
 
MOBILE TELEVISION: UNDERSTANDING THE TECHNOLOGY AND OPPORTUNITIES15ijitcs01
MOBILE TELEVISION: UNDERSTANDING THE TECHNOLOGY AND OPPORTUNITIES15ijitcs01MOBILE TELEVISION: UNDERSTANDING THE TECHNOLOGY AND OPPORTUNITIES15ijitcs01
MOBILE TELEVISION: UNDERSTANDING THE TECHNOLOGY AND OPPORTUNITIES15ijitcs01
 
ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING KNOWLEDGE MANAGEMENT ...
ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING KNOWLEDGE MANAGEMENT ...ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING KNOWLEDGE MANAGEMENT ...
ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING KNOWLEDGE MANAGEMENT ...
 
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 
RESEARCH REVIEW FOR POSSIBLE RELATION BETWEEN MOBILE PHONE REDIATION AND BRAI...
RESEARCH REVIEW FOR POSSIBLE RELATION BETWEEN MOBILE PHONE REDIATION AND BRAI...RESEARCH REVIEW FOR POSSIBLE RELATION BETWEEN MOBILE PHONE REDIATION AND BRAI...
RESEARCH REVIEW FOR POSSIBLE RELATION BETWEEN MOBILE PHONE REDIATION AND BRAI...
 
EFFECTS OF HUMAN FACTOR ON THE SUCCESS OF INFORMATION TECHNOLOGY OUTSOURCING
EFFECTS OF HUMAN FACTOR ON THE SUCCESS OF INFORMATION TECHNOLOGY OUTSOURCINGEFFECTS OF HUMAN FACTOR ON THE SUCCESS OF INFORMATION TECHNOLOGY OUTSOURCING
EFFECTS OF HUMAN FACTOR ON THE SUCCESS OF INFORMATION TECHNOLOGY OUTSOURCING
 
CRITICAL SUCCESS FACTORS FOR M-COMMERCE IN SAUDI ARABIA’S PRIVATE SECTOR: A M...
CRITICAL SUCCESS FACTORS FOR M-COMMERCE IN SAUDI ARABIA’S PRIVATE SECTOR: A M...CRITICAL SUCCESS FACTORS FOR M-COMMERCE IN SAUDI ARABIA’S PRIVATE SECTOR: A M...
CRITICAL SUCCESS FACTORS FOR M-COMMERCE IN SAUDI ARABIA’S PRIVATE SECTOR: A M...
 
ZALP Brochure
ZALP BrochureZALP Brochure
ZALP Brochure
 
ADMINISTRATION SECURITY ISSUES IN CLOUD COMPUTING
ADMINISTRATION SECURITY ISSUES IN CLOUD COMPUTINGADMINISTRATION SECURITY ISSUES IN CLOUD COMPUTING
ADMINISTRATION SECURITY ISSUES IN CLOUD COMPUTING
 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourse
 
3-D WAVELET CODEC (COMPRESSION/DECOMPRESSION) FOR 3-D MEDICAL IMAGES
3-D WAVELET CODEC (COMPRESSION/DECOMPRESSION) FOR 3-D MEDICAL IMAGES3-D WAVELET CODEC (COMPRESSION/DECOMPRESSION) FOR 3-D MEDICAL IMAGES
3-D WAVELET CODEC (COMPRESSION/DECOMPRESSION) FOR 3-D MEDICAL IMAGES
 
A LOW COST EEG BASED BCI PROSTHETIC USING MOTOR IMAGERY
A LOW COST EEG BASED BCI PROSTHETIC USING MOTOR IMAGERY A LOW COST EEG BASED BCI PROSTHETIC USING MOTOR IMAGERY
A LOW COST EEG BASED BCI PROSTHETIC USING MOTOR IMAGERY
 
Marketing Plan
Marketing PlanMarketing Plan
Marketing Plan
 
Zalp webinar-Raising your employee referral program results to 50% of all hires
Zalp webinar-Raising your employee referral program results to 50% of all hiresZalp webinar-Raising your employee referral program results to 50% of all hires
Zalp webinar-Raising your employee referral program results to 50% of all hires
 
Zalpbrochure
ZalpbrochureZalpbrochure
Zalpbrochure
 
INFORMATION SECURITY IN CLOUD COMPUTING
INFORMATION SECURITY IN CLOUD COMPUTINGINFORMATION SECURITY IN CLOUD COMPUTING
INFORMATION SECURITY IN CLOUD COMPUTING
 
Shape, form, and space
Shape, form, and spaceShape, form, and space
Shape, form, and space
 
ANALYSIS OF MANUFACTURING OF VOLTAGE RESTORE TO INCREASE DENSITY OF ELEMENTS ...
ANALYSIS OF MANUFACTURING OF VOLTAGE RESTORE TO INCREASE DENSITY OF ELEMENTS ...ANALYSIS OF MANUFACTURING OF VOLTAGE RESTORE TO INCREASE DENSITY OF ELEMENTS ...
ANALYSIS OF MANUFACTURING OF VOLTAGE RESTORE TO INCREASE DENSITY OF ELEMENTS ...
 

Similar to H-K Clustering Algorithm Using Ensemble Learning

Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
 
Estimating project development effort using clustered regression approach
Estimating project development effort using clustered regression approachEstimating project development effort using clustered regression approach
Estimating project development effort using clustered regression approachcsandit
 
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACHESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACHcscpconf
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
 
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
Clustering heterogeneous categorical data using enhanced mini  batch K-means ...Clustering heterogeneous categorical data using enhanced mini  batch K-means ...
Clustering heterogeneous categorical data using enhanced mini batch K-means ...IJECEIAES
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimizationAlexander Decker
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsIJMER
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016ijcsbi
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...IOSR Journals
 
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION cscpconf
 
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSA MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSZac Darcy
 

Similar to H-K Clustering Algorithm Using Ensemble Learning (20)

Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
 
Estimating project development effort using clustered regression approach
Estimating project development effort using clustered regression approachEstimating project development effort using clustered regression approach
Estimating project development effort using clustered regression approach
 
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACHESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
I017235662
I017235662I017235662
I017235662
 
A046010107
A046010107A046010107
A046010107
 
Ijetr021251
Ijetr021251Ijetr021251
Ijetr021251
 
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
Clustering heterogeneous categorical data using enhanced mini  batch K-means ...Clustering heterogeneous categorical data using enhanced mini  batch K-means ...
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimization
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
F04463437
F04463437F04463437
F04463437
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
 
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
 
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSA MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
 

Recently uploaded

power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 

Recently uploaded (20)

power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 

H-K Clustering Algorithm Using Ensemble Learning

hierarchy of clusters, which can be formed using single-link and complete-link clustering algorithms, and it does not require the number of clusters to be specified in advance. Examples of such algorithms are BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) and CURE (Clustering Using REpresentatives). The other important type is partition clustering, which obtains a single partition of the data instead of a clustering structure; it optimizes a criterion function to create clusters locally or globally [1]. Partition clustering has an advantage in large applications, but the number of desired output clusters must be specified in advance. The K-means algorithm is the most typical partition algorithm; it is popular because it is easy to implement and does not require the user to specify many parameters. Applying traditional clustering algorithms to high dimensional datasets has regularly presented a great challenge for traditional data mining techniques, both in terms of effectiveness
and efficiency. Clustering is made difficult by the increasing sparsity of the data and the increasing difficulty of distinguishing distances between data points, the so-called 'dimensionality disaster'. Adaptations to existing algorithms are therefore required to maintain cluster quality and speed. Research in the area of clustering has introduced many new concepts, such as subspace clustering, ensemble clustering, and the H-K clustering algorithm. The traditional H-K clustering algorithm resolves the randomness and a-priori choice of the initial centers of the K-means clustering algorithm. However, it leads to a dimensional disaster problem when applied to high dimensional dataset clustering because of its high computational complexity, and it produces clusters with poor accuracy. Subspace clustering, an extension of traditional clustering, finds clusters in various subspaces of a dataset [8]; it provides scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, insensitivity to the order of input records, accuracy, and speed, and it also removes redundancy and finds overlapping clusters in the subspaces [4,5,6,7,9,10]. Ensemble clustering, the 'knowledge reuse' framework first proposed by Strehl and Ghosh [11], uses two mechanisms: a generation mechanism that produces clusterings using different criteria, and a consensus function that chooses the most appropriate solution from the set of solutions. It overcomes the challenges created by high dimensional data and gives high performance on real-world datasets for applications such as Internet applications and medical diagnostics [2,3,12,13,19,20].
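The generation/consensus pipeline described above can be sketched in a few lines. This is a minimal illustration of the co-association idea, not any cited author's implementation: the K-means base learner, the ten random seeds, and the 0.5 threshold are all illustrative assumptions.

```python
import numpy as np

def kmeans_labels(X, k, seed):
    """Plain Lloyd's K-means; returns one base clustering (a label vector)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(50):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def coassociation_consensus(X, k, n_members=10, threshold=0.5):
    """Generation: K-means with different seeds. Consensus: a co-association
    matrix, then grouping points whose co-association exceeds the threshold."""
    n = len(X)
    co = np.zeros((n, n))
    for seed in range(n_members):
        labels = kmeans_labels(X, k, seed)
        co += (labels[:, None] == labels[None, :])
    co /= n_members
    consensus = -np.ones(n, dtype=int)  # -1 = not yet assigned
    next_id = 0
    for i in range(n):
        if consensus[i] == -1:          # grow a connected component from i
            stack = [i]
            consensus[i] = next_id
            while stack:
                p = stack.pop()
                for q in np.flatnonzero((co[p] >= threshold) & (consensus == -1)):
                    consensus[q] = next_id
                    stack.append(q)
            next_id += 1
    return consensus

# Two well-separated blobs: the consensus recovers them whatever the seeds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels = coassociation_consensus(X, k=2)
```

The point of the consensus step is that even if individual K-means runs disagree because of their random initializations, pairs of points that usually land together end up in the same final cluster.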
The proposed model combines three techniques, subspace clustering, H-K clustering and ensemble clustering, and exploits their advantages to improve the clustering result on high dimensional data while simultaneously overcoming the limitations of the H-K clustering algorithm for such data (high computational complexity and poor accuracy).

2. MOTIVATION

Traditional clustering algorithms do not give effective and efficient results on high dimensional data because of disadvantages such as the "curse of dimensionality" and the "empty space phenomenon". In high dimensional spaces the data are inherently sparse, and the distance between each pair of points is almost the same for a wide variety of data distributions and distance functions [4]. Meanwhile, the notion of density is even more troublesome than that of distance. These problems are referred to as the "curse of dimensionality". To overcome irrelevant and noisy features and the sparsity of the data, an advanced clustering algorithm is needed that solves these problems and clusters the data efficiently. The proposed model combines advanced clustering algorithms to improve cluster quality and speed.

3. RELATED WORK

A lot of work has been done in the area of clustering. Based on the research to date, the general categorization of high dimensional dataset clustering includes: 1. dimension reduction, 2. subspace clustering, 3. ensemble clustering, and 4. H-K clustering [1][11][14]. The following sections give an overview and some of the limitations of these techniques.

3.1. Dimension reduction

Feature selection and feature transformation are the most popular techniques of dimension reduction [5]. Feature transformation techniques create combinations of multiple attributes and summarize them [5].
These methods include techniques such as principal component analysis and singular value decomposition. Feature selection methods reveal groups of objects having similar attributes by picking the most relevant attributes from the dataset [5]. Yanchang et al. [16] proposed a method using a transformation technique that breaks high dimensional clustering into several one- or two-dimensional clustering phases and applies common clustering algorithms to them. Experiments with different datasets showed that the time
complexity of clustering can be linear in the dimensionality of the dataset. This framework can easily process hybrid datasets but may face problems for datasets containing overlapping clusters. Chen et al. [17] proposed IMSND (Initialization Method based on Shared Neighborhood Density), a local density based method that uses the probability density of a point to search for initial cluster centers in high dimensional data. The authors implemented this method on the spherical K-means algorithm, and an experimental evaluation shows improved K-means performance. In both methods (feature selection and feature transformation), however, information is lost, which naturally affects accuracy [16, 17], and feature selection algorithms have difficulty when clusters lie in different subspaces. This type of data motivated the evolution of subspace clustering algorithms.

3.2. Subspace clustering

Subspace clustering is an extension of traditional clustering that finds clusters which exist in multiple, possibly overlapping, subspaces [8]. The bottom-up approach and the top-down approach are the two major kinds of subspace clustering, classified by search strategy. Bottom-up approaches start from dense units in low dimensional subspaces and combine them to form clusters [8]; example algorithms are CLIQUE, ENCLUS, and MAFIA. Top-down algorithms start from an initial set of subspaces over the full set of dimensions and refine the subspaces iteratively [8]; example algorithms are PROCLUS, ORCLUS, and PreDeCon. Agrawal et al.
[10] proposed the clustering algorithm CLIQUE, which identifies dense clusters in subspaces of maximum dimensionality and satisfies the requirements of data mining applications, namely scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records, but it does not evaluate the quality of clustering in different subspaces. Chen et al. [6] presented a technique for solving the problem of selecting the k representative clusters by examining the relationship between low dimensional subspace clusters and high dimensional ones using an approximate method, PCoC. Muller et al. [18] presented a novel model called RESCU, which extracts the most interesting, non-redundant clusters using global optimization, and proved this problem to be NP-hard. Kriegel et al. [7] proposed finding overlapping clusters in the subspaces using a filter-refinement architecture, which speeds up the subspace search and scales at most quadratically with the data dimensionality and subspace dimensionality. Their approach overcomes the exponential scaling of earlier algorithms with the data or subspace dimensionality and the problems caused by a global density threshold for clustering. The input data is preprocessed using algorithms such as DBSCAN, K-means, and SNN, which find the base clusters. Afterwards, the base clusters are merged to find a maximal-dimensional cluster approximation, on which post-processing (pruning, refinement, etc.) is applied. Ali et al. [5] proposed a two-step clustering method based on the divide and conquer technique: it first selects subspaces based on size/level and then performs clustering on those subspaces based on similarity using the K-means algorithm. This method improves the accuracy and efficiency of the original K-means algorithm.

3.3.
Ensemble Clustering Ensemble clustering combines the solutions of multiple clustering algorithms using the consensus function and form the more relevant solution. General procedure for it is shown in fig. 1
Fig. 1 Ensemble clustering process

Tidke et al. [3] proposed a clustering ensemble method based on a two-stage clustering algorithm to overcome challenges created by high dimensional data such as the curse of dimensionality and, in certain cases, the problem of visualizing high dimensional data. PROCLUS is used for initial subspace clustering; the K-means partitioning algorithm is applied on the generated subspaces, followed by split and merge techniques in which a threshold value, a distance function, and mean square error conditions are considered, respectively. Zhizhou Kong et al. [12] proposed an integrated mechanism of testing and an information rolling method to decrease the error probability of matching cluster members, and used a category-weight method for ensemble clustering; the cluster members are generated using Ward's method, the K-means method, and the median method. P. Viswanath et al. [13] presented an ensemble of leaders clustering methods in which the entire ensemble requires only a single scan of the dataset. A 'deferring buffer scheme', an improvement over the 'blocked access scheme', is used for accessing data from the dataset, and a consensus function called the co-association matrix is used to ensemble the individual partitions. Weiwei Zhuang [19] and Derek Greene [20] applied ensemble clustering algorithms to real-world application data such as Internet applications and medical diagnostics, respectively.

3.4. H-K clustering

The H-K clustering algorithm was proposed to decide the k clusters for the K-means algorithm. It is implemented as divisive H-K and agglomerative H-K clustering. The divisive H-K algorithm implements a top-down approach that splits the whole dataset into small clusters: it divides K clusters into K+1 clusters using the K-means method. Agglomerative H-K clustering works by merging small clusters together.
It merges K clusters into K-1 clusters. In 2005, Tung-Shou Chen et al. [14] proposed the H-K (hierarchical K-means) clustering algorithm, organically combining a hierarchical clustering method and a partition clustering method for data clustering. Compared with either single algorithm, H-K clustering can solve the problem of the randomness and a-priori choice of initial center selection in the K-means clustering process and obtain a better clustering result; however, it still has high computational complexity. Ying He et al. [2] applied ensemble learning to high dimension data clustering and proposed a new clustering algorithm named EPCAHK (Ensemble Principal Component Analysis Hierarchical K-means), which improves the performance of the traditional H-K clustering algorithm on high dimensional datasets. First, the high dimensional dataset is converted to a low dimensional one using the PCA data reduction technique. Subsequently, the clustering results of the hierarchical stage, which provide the initial information (e.g., the cluster number or the initial clustering centers), are integrated using the min-transitive closure method. Finally, the clustering result is obtained with the K-means algorithm based on the ensemble clustering results. The authors also raise issues to be addressed in the future, such as the relationship between ensemble size and ensemble clustering performance, and between the distribution of the dataset and the clustering performance.
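The hierarchical-then-K-means idea behind H-K clustering can be sketched as follows. This is only an illustration of the principle, not Chen et al.'s exact procedure: the single-link hierarchical stage, the function names, and the sample data are illustrative assumptions.

```python
import numpy as np

def hk_initial_centers(X, k):
    """Hierarchical stage: single-link agglomerative merging down to k groups;
    the group means then serve as deterministic initial centers for K-means."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)   # merge the closest pair
    return np.array([X[c].mean(axis=0) for c in clusters])

def hk_means(X, k, iters=20):
    """K-means stage seeded by the hierarchical centers (no random restarts)."""
    centers = hk_initial_centers(X, k)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

# Two small blobs: the result is identical on every run, unlike plain K-means
# with randomly chosen initial centers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (8, 2)), rng.normal(4, 0.2, (8, 2))])
labels, centers = hk_means(X, k=2)
```

The pairwise-distance search in the hierarchical stage is the source of the high computational complexity noted above: it is cubic in the number of points as written here, which is exactly what makes plain H-K expensive on large high dimensional datasets.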
Table I. Comparative analysis of techniques for high dimensional data clustering

| Author | Clustering technique | Method | Observation |
| ------ | -------------------- | ------ | ----------- |
| Yanchang et al. [16] | Dimension reduction | Converts high dimensional data to low dimensions; common clustering algorithms | Improves performance; loses information; difficult to find clusters in different subspaces |
| Chen et al. [17] | Dimension reduction | IMSND; spherical K-means algorithm | Improves K-means performance; loses information |
| Agrawal et al. [10] | Subspace clustering | CLIQUE: identifies dense clusters in subspaces of maximum dimensionality | Scalability, end-user comprehensibility of results, non-presumption, insensitivity to input order; improves accuracy and speed |
| Chen et al. [6] | Subspace clustering | Technique for selecting the k representative clusters | |
| Muller et al. [18] | Subspace clustering | Extracts the most interesting, non-redundant clusters | Removes redundancy |
| Kriegel et al. [7] | Subspace clustering | Filter-refinement architecture | Finds overlapping clusters in subspaces; speeds up the subspace search |
| Tidke et al. [3] | Ensemble clustering | Two-stage clustering algorithm; PROCLUS | Overcomes challenges created by high dimensional data |
| Zhizhou Kong et al. [12] | Ensemble clustering | Category-weight method | Decreases the error probability of matching cluster members |
| Weiwei Zhuang [19], Derek Greene [20] | Ensemble clustering | Applied to real-world data: Internet applications and medical diagnostics | |
| Tung-Shou Chen et al. [14] | H-K clustering | Combines hierarchical and partition clustering methods | Removes randomness and apriority of initial center selection in K-means; high computational complexity |
| Ying He et al. [2] | H-K clustering | EPCAHK (Ensemble Principal Component Analysis Hierarchical K-means) | Improves clustering performance |

4. PROPOSED MODEL

The flow chart of the proposed model is shown below.
Fig. 2 Block diagram of the H-K clustering algorithm based on ensemble learning

Based on the above flow chart of the proposed model, the following content describes the five sub-stages in detail.

Stage 1. Dataset preprocessing

Import the dataset for clustering and obtain its information: the number of clusters k, the sample number N, and the threshold value. If the dataset contains any missing values, replace them with zero to obtain the preprocessed dataset D. (Real-world datasets are preferred for more accurate results.)

Stage 2. The subspace clustering process

Apply the subspace clustering algorithm ORCLUS to the preprocessed dataset D; its output is a set of subsets (subspace clusters) of the dataset. The ORCLUS algorithm is mainly divided into three steps: assign clusters, subspace determination, and merge. The assign phase selects k centers and iteratively assigns the points to the nearest centers. The subspace determination phase finds the subspace Ei of dimensionality d for each cluster Ci by computing the covariance matrix of Ci and selecting the d orthogonal eigenvectors with the smallest eigenvalues (i.e., the least spread). Finally, the merge phase reduces the number of clusters by combining clusters that are close and similar to each other and have the least spread.

Stage 3. The H-K clustering process

Apply H-K clustering to the k subspaces output by Stage 2. Divisive H-K means clustering divides the k-cluster dataset into k+1 clusters using the K-means method: it picks the two elements that are farthest from each other in a cluster and divides the distance between them into three equal parts to produce one more new cluster.
Repeat this process over the range [2, k+10], select L clusterings using random selection, and store them as H(1), H(2), H(3), ..., H(L).

Stage 4. The split clustering process

This stage follows a split strategy based on a predefined threshold value. If the size of the l-th cluster is greater than the threshold, split the cluster into two new clusters by assigning each point to the nearer of the two centroids of the two new clusters (the centroid is the
mean of each cluster). The distance between a point and a cluster is calculated using the Euclidean distance. The procedure is repeated on the newly created clusters until no cluster exceeds the predefined threshold. The steps are given below in the form of pseudo code.

Algorithm 1:
Input: k clusters and threshold T
Output: n clusters
1. Start with k clusters.
2. Check the density (size) of each cluster against the given threshold T.
3. If the density is more than the threshold, split the cluster into two; each point is assigned to its closest centroid:

J = \sum_{j=1}^{k} \sum_{i=1}^{n} \| x_i^{(j)} - c_j \|^2    (1)

where \| x_i^{(j)} - c_j \|^2 is a chosen distance measure between a data point x_i^{(j)} and the cluster centre c_j, and J is an indicator of the distance of the n data points from their respective cluster centers.
4. Repeat this for each cluster until it reaches the threshold value.

Once the splitting phase has formed a hierarchy of clusters of similar size, merging is required to find the closest clusters to merge.

Stage 5. The ensemble clustering process

After the splitting step, merge the closest clusters based on an objective function; the proposed model uses a distance function as the objective function to merge nearby clusters. In the proposed method, child clusters from any parent cluster can be merged if their distance is smaller than that of the other clusters in the hierarchy. Also check the mean square error (MSE) of each merged cluster against the parent cluster: if it is found to be larger, that cluster must be unmerged and made available to merge with some other cluster in the hierarchy. This process repeats until the MSE of every possible combination of merged clusters has been checked against its parent cluster. Finally, the merged and remaining clusters are the output clusters. The algorithm steps are given below.

Algorithm 2:
Input: hierarchy of clusters
Output: partition C1, ..., Cn
1.
Start with n node clusters.
2. Find the closest two clusters in the hierarchy using the Euclidean distance and merge them.
3. Calculate the MSE of the root cluster and the newly merged cluster:

MSE = \sum_{j=1}^{k} \sum_{x_i \in C_j} \| x_i - \mu_j \|^2    (2)

where \mu_j is the mean of cluster C_j and x_i is a data object belonging to cluster C_j. The formula to compute \mu_j is shown in equation (3):

\mu_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i    (3)

In the sum of squared error formula, the distance from each data object to its cluster centroid is squared, and these distances are minimized for each data object. The main objective of this formula is to generate clusters that are as compact and separate as possible.
4. If the MSE of the newly merged cluster is smaller than that of the clusters after splitting, keep it; otherwise unmerge them.
5. Repeat until all possible clusters are merged according to step 4.
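Algorithms 1 and 2 might be sketched as follows. This is a simplified illustration under stated assumptions: the merge acceptance test (merged MSE below a caller-supplied bound) is a stand-in for the parent-cluster MSE comparison of step 4, and all function names and data are illustrative.

```python
import numpy as np

def sse(points):
    """Sum of squared errors of a cluster around its centroid (cf. Eq. (2))."""
    return ((points - points.mean(axis=0)) ** 2).sum()

def split_cluster(points):
    """Algorithm 1, step 3: cut one cluster in two by assigning every point
    to the nearer of the two mutually farthest members."""
    d = ((points[:, None] - points[None]) ** 2).sum(-1)
    i, j = np.unravel_index(d.argmax(), d.shape)
    near_i = d[:, i] <= d[:, j]
    return points[near_i], points[~near_i]

def split_phase(clusters, threshold):
    """Algorithm 1: keep splitting any cluster larger than the size threshold."""
    done, todo = [], list(clusters)
    while todo:
        c = todo.pop()
        if len(c) > threshold:
            todo.extend(split_cluster(c))
        else:
            done.append(c)
    return done

def merge_phase(clusters, mse_bound):
    """Algorithm 2 (simplified): repeatedly merge the pair of clusters with the
    closest centroids, keeping a merge only while its MSE stays acceptable."""
    clusters = list(clusters)
    while len(clusters) > 1:
        cents = [c.mean(axis=0) for c in clusters]
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(cents[a] - cents[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        cand = np.vstack([clusters[a], clusters[b]])
        if sse(cand) / len(cand) <= mse_bound:   # keep the merge (step 4)
            clusters[a] = cand
            clusters.pop(b)
        else:                                    # "unmerge": reject and stop
            break
    return clusters

# Split two mixed blobs apart, then verify the merge test keeps them separate.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (8, 2)), rng.normal(5, 0.3, (8, 2))])
final = merge_phase(split_phase([X], threshold=8), mse_bound=1.0)
```

The MSE guard is what prevents the merge phase from collapsing everything back into one cluster: merging the two well-separated blobs would roughly square their separation into the error, so the candidate merge is rejected.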
The above steps can be applied to high dimensional datasets such as: 1. the Breast Cancer dataset, with 9 attributes (Age, Menopause, Tumor size, Inv-nodes, Node-caps, Deg-malig, Breast, Breast-quad, Irradiat), and 2. the WDBC dataset, with 9 attributes (Clump thickness, Uniformity of cell size, Uniformity of cell shape, Amount of marginal adhesion, Frequency of bare nuclei, Single epithelial cell size, Bland chromatin, Normal nucleoli, Mitoses).

5. CONCLUSION

High dimensional dataset processing faces many problems, such as the 'curse of dimensionality' and the sparsity of data in the high dimensional space. The proposed model provides a clustering algorithm for high dimensional datasets that combines three approaches, exploiting the advantages of ensemble and subspace clustering while simultaneously overcoming the limitations of traditional H-K clustering (high computational complexity and poor accuracy). It does so through a multi-stage clustering process: first, the dataset D is converted into subspaces using the subspace clustering algorithm ORCLUS, each subspace revealing different characteristics of the original dataset; then, treating each subspace as a separate dataset, hierarchical clustering is applied; subsequently, the clustering results of the hierarchical stage are passed to the split stage, in which clusters above the size threshold are split into new clusters; and finally the clusters are integrated using the objective function (MSE). Applying these clustering approaches together improves the performance of the clustering process and provides stability for the H-K clustering algorithm on high dimensional data.

REFERENCES

[1] A.K. Jain, M.N. Murty, and P.J. Flynn 1999, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323.
[2] Ying He, Jian Wang, Liang-xi, Lin Mei, Yan-feng Shang, Wen-fei Wang 2013, "A H-K clustering algorithm based on ensemble learning", ICSSC.
[3] B.A. Tidke, R.G. Mehta, D.P. Rana 2012, "A novel approach for high dimensional data clustering", International Journal of Engineering Science & Advanced Technology (IJESAT), ISSN: 2250-3676, Volume-2, Issue-3, pp. 645-651.
[4] Emmanuel Muller et al. 2009, "Evaluating Clustering in Subspace Projections of High Dimensional Data", VLDB '09, August 24-28, Lyon, France, VLDB Endowment.
[5] Alijamaat, M. Khalilian, and N. Mustapha 2010, "A Novel Approach for High Dimensional Data Clustering," Third International Conference on Knowledge Discovery and Data Mining, pp. 264-267.
[6] Guanhua Chen, Xiuli Ma et al. 2009, "Mining Representative Subspace Clusters in High-Dimensional Data", Sixth International Conference on Fuzzy Systems and Knowledge Discovery.
[7] Hans-Peter Kriegel, Peer Kroger, Matthias Renz, Sebastian Wurst 2005, "A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data", Proc. 5th IEEE International Conference on Data Mining (ICDM), Houston, TX.
[8] Lance Parsons, Ehtesham Haque and Huan Liu 2004, "Subspace Clustering for High Dimensional Data: A Review", supported in part by grants from Prop 301 (No. ECR A601) and CEINT.
[9] Christian Baumgartner, Claudia Plant 2004, "Subspace Selection for Clustering High-Dimensional Data", Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04).
[10] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan 1998, "Automatic subspace clustering of high dimensional data for data mining applications", Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pp. 94-105, ACM Press.
[11] A. Strehl and J. Ghosh 2002, "Cluster ensembles – A knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, pp. 583-617.
[12] Zhizhou Kong et al. 2008, "A Novel Clustering-Ensemble Approach", IEEE.
[13] P. Viswanath 2006, "A Fast and Efficient Ensemble Clustering Method", The 18th International Conference on Pattern Recognition (ICPR'06), IEEE.
[14] Tung-Shou Chen et al. 2005, "A combined k-means and hierarchical clustering method for improving the clustering efficiency of microarray", Proceedings of the 2005 International Symposium on Intelligent Signal Processing and Communication Systems.
[15] Kohei Arai and Ali Ridho Barakbah 2007, "Hierarchical K-means: an algorithm for centroids initialization for K-means", Reports of the Faculty of Science and Engineering, Vol. 36, No. 1, pp. 25-31.
[16] Zhao Yanchang et al. 2003, "A general framework for clustering high-dimension datasets", CCECE 2003 - CCGEI 2003, Montreal, May 2003, IEEE.
[17] Luying Chen et al. 2009, "An Initialization Method for Clustering High-Dimensional Data", First International Workshop on Database Technology and Applications.
[18] Emmanuel Muller et al. 2009, "Relevant Subspace Clustering: Mining the Most Interesting Non-Redundant Concepts in High Dimensional Data", Ninth IEEE International Conference on Data Mining.
[19] Weiwei Zhuang et al. 2012, "Ensemble Clustering for Internet Security Applications", IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 42, no. 6.
[20] Derek Greene et al. 2004, "Ensemble Clustering in Medical Diagnostics", Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems (CBMS'04).
[21] Charu C. Aggarwal, Philip S. Yu, "Finding Generalized Projected Clusters in High Dimensional Spaces", IBM T. J. Watson Research Center, Yorktown Heights, NY 10598.
[22] Reza Ghaemi 2009, "A Survey: Clustering Ensembles Techniques", Proceedings of World Academy of Science, Engineering and Technology, Volume 38, February 2009, ISSN: 2070-3740.