SlideShare a Scribd company logo
1 of 3
Download to read offline
Pankaj Jadwal, Mrs.Ruchi Dave / International Journal of Engineering Research and
               Applications       (IJERA)     ISSN: 2248-9622     www.ijera.com
                      Vol. 2, Issue 5, September-October 2012, pp.322-324
  An Improved and Customized I-K Means For Avoiding Similar
                      Distance Problem
                Pankaj Jadwal[1]                                     Mrs.Ruchi Dave[2]
                  M.Tech Scholar(SGVU Jaipur)                                   SGVU ,Jaipur


ABSTRACT
         In k-means clustering algorithm, we have         am working is the point which I want to clusterize,
given n as number of points, k as number of               having same distance from 2 or more than
clusters and t as number of iterations for having         centriods.so this is the point where the existing K
optimized clusters. But there are some problems           Means fails. So I have proposed an improved and
associated with k means algorithm. A research             customized K Means so that this type of problem can
paper [4] develops an incremental algorithm for           be handled.
solving sum-of-squares clustering problems in
gene expression data sets. Clustering in gene              II.     Problem Formulation
expression data sets is a typical problem. A recent                 There are so many problems associated with
approach [5] to scaling the K means algorithm is          K Means algorithm. A lot of research has been done
based on discovering three kinds of regions .The          on K Means clustering. So many research papers has
problem which I am trying to solve is the similar         been published in order to solve the problems
distance problem. When we start clustering of             associated with K Means Clustering. Problem on
elements, problem can be arise that point x1              which I am working is similar distance problem in K
which I want to clusterize can be at the same             Means Clustering algorithm. As we know that in K
distance from more than one cluster. At that time         means algorithm, parameter on which we decide that
we cannot decide that which cluster I should              a particular point will come in which cluster, is the
choose for that element. So I proposed a approach         distance of that point from the centroid of that
towards solving similar distance problem using            cluster. We can say that a particular point will go into
improved k means clustering approach.                     that cluster whose distance from centriod of any
                                                          cluster is minimum. But if distance of any point p1 to
Keywords: k-means, clustering, optimized cluster.         2 or more than 2 cluster’s centroid is same than we
                                                          are not in the position of finding that point p1 should
  I.     INTRODUCTION                                     go into which cluster. Than I am just improving K
          K Means is one of the simplest clustering       Means clustering algorithm so that Improved K
algorithms that solve the famous clustering problem.      Means will be able to face such type of problem.
The algorithm follows a simple and easy way to            Now I am doing slight modification in K Means
clusterize a given data set through a fixed number of     Clustering algorithm so that we can solve such type
clusters (assume k clusters). The main thought is to      of problem
define k centroids, one for each cluster. These
centroids should be placed in such a way because of       I-KMeans algorithm:
different location causes different result. So, the       NOP: Number of points in a cluster
better choice is to place them as much as possible far    D (ki): Density of cluster ki
away from each other. The next step is to take each       1. Place K points into the space represented by the
point belonging to a given data set and associate it to        objects that are being clustered. These points
the closest centroid.There are so many cons with K             represent initial group centroids.
Means and many researchers have published papers          2. Assign each object to the group that has the
on that problems. A research paper named Selection             closest centroid.
of K in K-means clustering presented [1]. This paper      3. If more than one point has same and smallest
first reviews existing methods for selecting the               distance from the centroid then calculate NOP
number of clusters for the algorithm. Parameters that          (Number of Points) for each cluster and find Min
affect this selection are then discussed and a new             NOP. Object is assigned to the min NOP.
criterion to assist the selection is proposed. A          4. If NOP is same for these clusters than we calculate
research paper has been published named An                density for those clusters. D (ki) is the density of
Modified K Means algorithm for removing the               cluster ki where i is from 1 to number of clusters
problem of empty cluster[2]. In this paper, they have     having same NOP problem.
presented a improved k-means algorithm which              D (ki) = (Min distance from centriod of the ki
removes the problem of generation of empty clusters       cluster+ Max distance from centriod of the ki
(with some exceptions).But the problem on which I         Cluster)/2



                                                                                                  322 | P a g e
Pankaj Jadwal, Mrs.Ruchi Dave / International Journal of Engineering Research and
               Applications           (IJERA)       ISSN: 2248-9622  www.ijera.com
                        Vol. 2, Issue 5, September-October 2012, pp.322-324
   Element will be inserted to that cluster whose D      V.    RESULTS AND DISCUSSION:
(ki) will be minimum.                                               We have compared results in figure 1(a) and
5. When all objects have been assigned, recalculate       1(b).Now to understand, which clustering is better,
the positions of the K centroids.                         there should be equal distribution of points in each
6. Repeat Steps 2, 3 and 4 until the centroids no         cluster as much as possible and quality factor should
longer move. This produces a separation of the            be maximum. Quality factor can be defined in terms
objects into groups from which the metric to be           of maximizing the inter class similarity and
minimized can be calculated.                              minimizing the intra class similarity. Quality factor is
                                                          ratio of inter class similarity by inter class similarity.
III.     EXPERIMENTS AND RESULTS:                         As from figure 1(a) and 1 (b) we came to know that
          To evaluate the proposed algorithm, We          in I K Means equal distribution of data compare to K
have taken an example and apply K means and I             Means and quality factor in I K Means is better than
Kmeans.I have compare the result in terms of Quality      K Means.
factor which is the ratio of inter class similarity by
intra class similarity. I take dataset from uci machine   Algorithm                    Quality Factor
repository [6] for I K-Means Clustering .Data is of
the Banking sector.
                                                          K-Means                      .28
IV.      Data Set Information:
         The data is related with direct marketing        I K Means                    .50
campaigns of a Portuguese banking institution. The
marketing campaigns were based on phone calls.
Often, more than one contact to the same client was
required, in order to access if the product (bank term
deposit) would be (or not) subscribed.
K Means Clustering algorithm




                                                          Figure 2(Quality factor of Algorithms)

                                                          VI.      CONCLUSION
                                                                     We have proposed an improved approach
                                                          through which same distance problem can be solved
                                                          and better result can be obtained. Better result depend
I K Means clustering algorithm                            on two factors, First
                                                                     One is equal distribution of data in each
                                                          cluster and quality factor. In both factors I K Means
                                                          is better than K Means .Quality Factor of I K Means
                                                          is better than K Means.

                                                          References:
                                                            1.     Selection of K in K-means clustering, D T
                                                                   Pham, S S Dimov, and C D Nguyen,
                                                                   Manufacturing Engineering Centre, Cardiff
                                                                   University, Cardiff, UK.site name
                                                            2.     A Modified k-means Algorithm to Avoid
                                                                   Empty Clusters, Malay K. Pakhira ,Kalyani
                                                                   Government Engineering College Kalyani,
                                                                   West Bengal, INDIA.
                                                            3.     Improved K-means Algorithm Based on
FIGURE 1 Comparing results on (a) K Means (b) I                    Fuzzy Feature Selection, Xiuyun Li, Jie
K Means                                                            Yang, Qing Wang, Jinjin Fan, Peng Liu

                                                                                                    323 | P a g e
Pankaj Jadwal, Mrs.Ruchi Dave / International Journal of Engineering Research and
             Applications       (IJERA)     ISSN: 2248-9622     www.ijera.com
                    Vol. 2, Issue 5, September-October 2012, pp.322-324
       School of Information Management and
       Engineering, Shanghai University of
       Finance & Economics.
4.     Modified global k-means algorithm for
       clustering in gene         expression Data
       sets, Adil M. Bagirov and Karim Mardaneh.
5.     Jiawei Han and Micheline Kamber ,Data
       Mining:       Concepts and Techniques, 2nd
       edition, The Morgan Kaufmann Series in
       Data Management Systems, Jim Gray,
       Series Editor Morgan Kaufmann Publishers,
       March 2006. ISBN 1-55860-901-6
6.     UCI Repository of machine learning
       databases,    University    of    California.
       archive.ics.uci.edu/ml/datasets/Bank
       Marketing(accessed on line)




                                                                               324 | P a g e

More Related Content

What's hot

Performance analysis of KNN & K-Means using internet advertisements data
Performance analysis of KNN & K-Means using internet advertisements dataPerformance analysis of KNN & K-Means using internet advertisements data
Performance analysis of KNN & K-Means using internet advertisements dataMuhammad GulRaj
 
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problem
Penalty Function Method For Solving Fuzzy Nonlinear Programming ProblemPenalty Function Method For Solving Fuzzy Nonlinear Programming Problem
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problempaperpublications3
 
Dh31504508
Dh31504508Dh31504508
Dh31504508IJMER
 
Machine learning Algorithms with a Sagemaker demo
Machine learning Algorithms with a Sagemaker demoMachine learning Algorithms with a Sagemaker demo
Machine learning Algorithms with a Sagemaker demoHridyesh Bisht
 
Algorithms Design Patterns
Algorithms Design PatternsAlgorithms Design Patterns
Algorithms Design PatternsAshwin Shiv
 
Pert 05 aplikasi clustering
Pert 05 aplikasi clusteringPert 05 aplikasi clustering
Pert 05 aplikasi clusteringaiiniR
 
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLINGLOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLINGijaia
 
Bt0080 fundamentals of algorithms2
Bt0080 fundamentals of algorithms2Bt0080 fundamentals of algorithms2
Bt0080 fundamentals of algorithms2Techglyphs
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster AnalysisSSA KPI
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysisAcad
 
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Daniele Di Mitri
 
Supplementary material for my following paper: Infinite Latent Process Decomp...
Supplementary material for my following paper: Infinite Latent Process Decomp...Supplementary material for my following paper: Infinite Latent Process Decomp...
Supplementary material for my following paper: Infinite Latent Process Decomp...Tomonari Masada
 
집합모델 확장불린모델
집합모델  확장불린모델집합모델  확장불린모델
집합모델 확장불린모델guesta34d441
 

What's hot (19)

Performance analysis of KNN & K-Means using internet advertisements data
Performance analysis of KNN & K-Means using internet advertisements dataPerformance analysis of KNN & K-Means using internet advertisements data
Performance analysis of KNN & K-Means using internet advertisements data
 
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problem
Penalty Function Method For Solving Fuzzy Nonlinear Programming ProblemPenalty Function Method For Solving Fuzzy Nonlinear Programming Problem
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problem
 
Use CNN for Sequence Modeling
Use CNN for Sequence ModelingUse CNN for Sequence Modeling
Use CNN for Sequence Modeling
 
Dh31504508
Dh31504508Dh31504508
Dh31504508
 
Machine learning Algorithms with a Sagemaker demo
Machine learning Algorithms with a Sagemaker demoMachine learning Algorithms with a Sagemaker demo
Machine learning Algorithms with a Sagemaker demo
 
Algorithms Design Patterns
Algorithms Design PatternsAlgorithms Design Patterns
Algorithms Design Patterns
 
Pert 05 aplikasi clustering
Pert 05 aplikasi clusteringPert 05 aplikasi clustering
Pert 05 aplikasi clustering
 
KMAP PAPER (1)
KMAP PAPER (1)KMAP PAPER (1)
KMAP PAPER (1)
 
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLINGLOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
 
Bt0080 fundamentals of algorithms2
Bt0080 fundamentals of algorithms2Bt0080 fundamentals of algorithms2
Bt0080 fundamentals of algorithms2
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
PPID3 AICCSA08
PPID3 AICCSA08PPID3 AICCSA08
PPID3 AICCSA08
 
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation
 
Supplementary material for my following paper: Infinite Latent Process Decomp...
Supplementary material for my following paper: Infinite Latent Process Decomp...Supplementary material for my following paper: Infinite Latent Process Decomp...
Supplementary material for my following paper: Infinite Latent Process Decomp...
 
313 318
313 318313 318
313 318
 
집합모델 확장불린모델
집합모델  확장불린모델집합모델  확장불린모델
집합모델 확장불린모델
 

Similar to Bb25322324

New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmIRJET Journal
 
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemEnhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemAnders Viken
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningNatasha Grant
 
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...journalBEEI
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning ProjectAdeyemi Fowe
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Salah Amean
 
Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...IJERA Editor
 

Similar to Bb25322324 (20)

Bj24390398
Bj24390398Bj24390398
Bj24390398
 
Clustering
ClusteringClustering
Clustering
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
 
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemEnhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
 
ClusetrigBasic.ppt
ClusetrigBasic.pptClusetrigBasic.ppt
ClusetrigBasic.ppt
 
Data clustering
Data clustering Data clustering
Data clustering
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
Noura2
Noura2Noura2
Noura2
 
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning Project
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...
 

Bb25322324

  • 1. Pankaj Jadwal, Mrs.Ruchi Dave / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September-October 2012, pp.322-324 An Improved and Customized I-K Means For Avoiding Similar Distance Problem Pankaj Jadwal[1] Mrs.Ruchi Dave[2] M.Tech Scholar(SGVU Jaipur) SGVU ,Jaipur ABSTRACT In k-means clustering algorithm, we have am working is the point which I want to clusterize, given n as number of points, k as number of having same distance from 2 or more than clusters and t as number of iterations for having centriods.so this is the point where the existing K optimized clusters. But there are some problems Means fails. So I have proposed an improved and associated with k means algorithm. A research customized K Means so that this type of problem can paper [4] develops an incremental algorithm for be handled. solving sum-of-squares clustering problems in gene expression data sets. Clustering in gene II. Problem Formulation expression data sets is a typical problem. A recent There are so many problems associated with approach [5] to scaling the K means algorithm is K Means algorithm. A lot of research has been done based on discovering three kinds of regions .The on K Means clustering. So many research papers has problem which I am trying to solve is the similar been published in order to solve the problems distance problem. When we start clustering of associated with K Means Clustering. Problem on elements, problem can be arise that point x1 which I am working is similar distance problem in K which I want to clusterize can be at the same Means Clustering algorithm. As we know that in K distance from more than one cluster. At that time means algorithm, parameter on which we decide that we cannot decide that which cluster I should a particular point will come in which cluster, is the choose for that element. So I proposed a approach distance of that point from the centroid of that towards solving similar distance problem using cluster. We can say that a particular point will go into improved k means clustering approach. that cluster whose distance from centriod of any cluster is minimum. But if distance of any point p1 to Keywords: k-means, clustering, optimized cluster. 2 or more than 2 cluster’s centroid is same than we are not in the position of finding that point p1 should I. INTRODUCTION go into which cluster. Than I am just improving K K Means is one of the simplest clustering Means clustering algorithm so that Improved K algorithms that solve the famous clustering problem. Means will be able to face such type of problem. The algorithm follows a simple and easy way to Now I am doing slight modification in K Means clusterize a given data set through a fixed number of Clustering algorithm so that we can solve such type clusters (assume k clusters). The main thought is to of problem define k centroids, one for each cluster. These centroids should be placed in such a way because of I-KMeans algorithm: different location causes different result. So, the NOP: Number of points in a cluster better choice is to place them as much as possible far D (ki): Density of cluster ki away from each other. The next step is to take each 1. Place K points into the space represented by the point belonging to a given data set and associate it to objects that are being clustered. These points the closest centroid.There are so many cons with K represent initial group centroids. Means and many researchers have published papers 2. Assign each object to the group that has the on that problems. A research paper named Selection closest centroid. of K in K-means clustering presented [1]. This paper 3. If more than one point has same and smallest first reviews existing methods for selecting the distance from the centroid then calculate NOP number of clusters for the algorithm. Parameters that (Number of Points) for each cluster and find Min affect this selection are then discussed and a new NOP. Object is assigned to the min NOP. criterion to assist the selection is proposed. A 4. If NOP is same for these clusters than we calculate research paper has been published named An density for those clusters. D (ki) is the density of Modified K Means algorithm for removing the cluster ki where i is from 1 to number of clusters problem of empty cluster[2]. In this paper, they have having same NOP problem. presented a improved k-means algorithm which D (ki) = (Min distance from centriod of the ki removes the problem of generation of empty clusters cluster+ Max distance from centriod of the ki (with some exceptions).But the problem on which I Cluster)/2 322 | P a g e
  • 2. Pankaj Jadwal, Mrs.Ruchi Dave / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September-October 2012, pp.322-324 Element will be inserted to that cluster whose D V. RESULTS AND DISCUSSION: (ki) will be minimum. We have compared results in figure 1(a) and 5. When all objects have been assigned, recalculate 1(b).Now to understand, which clustering is better, the positions of the K centroids. there should be equal distribution of points in each 6. Repeat Steps 2, 3 and 4 until the centroids no cluster as much as possible and quality factor should longer move. This produces a separation of the be maximum. Quality factor can be defined in terms objects into groups from which the metric to be of maximizing the inter class similarity and minimized can be calculated. minimizing the intra class similarity. Quality factor is ratio of inter class similarity by inter class similarity. III. EXPERIMENTS AND RESULTS: As from figure 1(a) and 1 (b) we came to know that To evaluate the proposed algorithm, We in I K Means equal distribution of data compare to K have taken an example and apply K means and I Means and quality factor in I K Means is better than Kmeans.I have compare the result in terms of Quality K Means. factor which is the ratio of inter class similarity by intra class similarity. I take dataset from uci machine Algorithm Quality Factor repository [6] for I K-Means Clustering .Data is of the Banking sector. K-Means .28 IV. Data Set Information: The data is related with direct marketing I K Means .50 campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be (or not) subscribed. K Means Clustering algorithm Figure 2(Quality factor of Algorithms) VI. CONCLUSION We have proposed an improved approach through which same distance problem can be solved and better result can be obtained. Better result depend I K Means clustering algorithm on two factors, First One is equal distribution of data in each cluster and quality factor. In both factors I K Means is better than K Means .Quality Factor of I K Means is better than K Means. References: 1. Selection of K in K-means clustering, D T Pham, S S Dimov, and C D Nguyen, Manufacturing Engineering Centre, Cardiff University, Cardiff, UK.site name 2. A Modified k-means Algorithm to Avoid Empty Clusters, Malay K. Pakhira ,Kalyani Government Engineering College Kalyani, West Bengal, INDIA. 3. Improved K-means Algorithm Based on FIGURE 1 Comparing results on (a) K Means (b) I Fuzzy Feature Selection, Xiuyun Li, Jie K Means Yang, Qing Wang, Jinjin Fan, Peng Liu 323 | P a g e
  • 3. Pankaj Jadwal, Mrs.Ruchi Dave / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September-October 2012, pp.322-324 School of Information Management and Engineering, Shanghai University of Finance & Economics. 4. Modified global k-means algorithm for clustering in gene expression Data sets, Adil M. Bagirov and Karim Mardaneh. 5. Jiawei Han and Micheline Kamber ,Data Mining: Concepts and Techniques, 2nd edition, The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6 6. UCI Repository of machine learning databases, University of California. archive.ics.uci.edu/ml/datasets/Bank Marketing(accessed on line) 324 | P a g e