Published on

IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Ms.Suchita S.Mesakar, Prof.M.S.Chaudhari / International Journal of Engineering Research andApplications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.701-704701 | P a g eA Design of Fuzzy Approach for Data ClusteringMs.Suchita S.Mesakar*, Prof.M.S.Chaudhari***(IV Sem M.Tech, Department of Computer Science and Engg., BCCE, Nagpur)** (Asstt.Professor, Department of Computer Science and Engg., BCCE, Nagpur)ABSTRACTData mining is the process ofdiscovering patterns in large datasets. Theobjective of data mining is to extractinformation from a data set and transform itinto an understandable structure for furtheruse. Clustering in data mining is used todiscover distribution patterns in the underlyingdata. Clustering aims to group data into clustersbased on similarity/dissimilarity measures.Clustering algorithms uses the distance metricbased similarity measure in order to partitionthe database such that data points in the samepartition are more similar than points indifferent partitions. The clustering approachesdiffer in various aspects like flat or hierarchicalstructure, crisp or soft (fuzzy) clusterassignments. In this paper, the fuzzy basedclustering approach is used to cluster the data.Keywords:-Clustering, Data Mining, FuzzyClustering, Hard clustering, Soft clustering,1. IntroductionData mining is the process of extraction ofknowledge or meaningful information from largedataset or huge databases. Data clustering isfundamental technique in data mining. Clusteringaims to group objects into subsets in such a fashionthat similar objects are grouped together, whiledifferent objects belong to different groups.Clustering is used in many areas which includemachine learning, pattern recognition togetherwith data mining, document retrieval, imagesegmentation. The learning methods are classifiedas unsupervised and supervised. In unsupervisedlearning for given set of patterns, a collection ofclusters is to be discovered and additional patternsare assigned to correct cluster. In supervisedlearning set of classes (clusters) are given, newpattern (point) are assigned to proper cluster, andare labeled with label of its cluster. Clustering isunsupervised learning method because it deals withfinding a structure in a collection of unlabeled dataand no class values are given which can decide thepriori grouping of data. Clustering uses thedistance measures to find the similarity ordissimilarity between the objects. The commondistance measures which are used for clustering areEuclidean, Manhattan, Minkowski andMahalanobis distances. Many clusteringalgorithms are present, but selection of particularclustering algorithm depends upon the type of data.Clustering can be applied to numerical data andcategorical data. The numerical data consist ofnumeric attributes, such as age, cost such attributevalues can be ordered in specific manner and theproperties can be used to apply the distancemeasures. The categorical data has differentstructure than the numerical data. The categoricaldata consists of non numeric attributes. Theexample of categorical attributes arecolor={orange,blue,white}.The categorical data donot have specific ordering as numeric data, sodistance measures cannot be directly applied tocategorical attributes. Clustering strategies can beclassified as hard clustering and fuzzy clustering.In the hard clustering approach each object of thedataset belongs to only one cluster. In fuzzyclustering each object can belong to more than oneclusters depending upon the degree of membershipassociated with it.The major clustering methods are classified as Hierarchical Algorithms Partitional Algorithms Density Based Algorithms Grid Based AlgorithmsHierarchical clustering algorithmsorganize data into hierarchical structure. It startswith each case in a separate cluster and thencombines the clusters step by step, reducing thenumber of clusters at each step until only onecluster is left. Hierarchical algorithms areclassified into agglomerative methods and divisivemethods. Agglomerative method is step-by-stepclustering of objects and groups to larger groupsand divisive method is step-by-step splitting of thewhole set of objects into the smaller subsets andindividual objects.Partitional Clustering is simply division ofthe set of data objects into disjoint clusters.Density-based algorithms identify clustersas dense regions of objects in the data spaceseparated by regions of low density.Grid based methods first divide space intogrids, and then performs clustering on the grids.The main advantage of Grid based method is its
  2. 2. Ms.Suchita S.Mesakar, Prof.M.S.Chaudhari / International Journal of Engineering Research andApplications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.701-704702 | P a g efast processing time which depends on number ofgrids in each dimension in quantized space. Thedensity-based partitioning methods work best withnumerical attributes, and grid-based methods workwith attributes of different types.2. Review of Different Clustering AlgorithmsIn clustering, the most widely usedclustering algorithm is fuzzy clustering. Zadeh in1965 proposed the fuzzy set theory & it gave anidea of uncertainty of belonging which wasdescribed by a membership function. The use offuzzy set provides imprecise class membershipfunction .The basic idea of fuzzy clustering is non-unique partition of the data into clusters. Theobjects are assigned membership values for each ofthe clusters and fuzzy clustering algorithm allowthe clusters to grow accordingly.Many fuzzy clustering algorithms arepresent in literature. In many situations fuzzyclustering is more natural the hard clustering.Objects are not forced to belong to one cluster as inhard clustering; rather they are assignedmembership degrees between 0 and 1 indicatingtheir partial membership. Fuzzy clusteringalgorithms are broadly classified into two groups:i) Classical and ii) Shape-based [6]. There existmany classical fuzzy clustering algorithms in theliterature, among the most popular and widely usedbeing: i) Fuzzy cmeans (FCM) [2], ii) Suppressedfuzzy c-means (SFCM) [7], iii) Possibilistic c-means (PCM) [4], and (iv) Gustafson-Kessel (GK)[8], while the shape-based fuzzy clusteringalgorithms include: i) Circular shape- based [9], ii)Elliptical shape-based [10], and (iii) Genericshape-based techniques [11].The detailed review of classical fuzzyclustering algorithms is as below.Fuzzy c means clustering method wasdeveloped by Dunn in 1973[1] and improved byBezdek in 1981[2].The FCM employs fuzzypartitioning such that a data point can belong to allgroups with different membership grades between0 and 1. This algorithm works by assigningmembership to each data point corresponding toeach cluster center on the basis of distance betweenthe cluster center and the data point. More the datais near to the cluster center more is its membershiptowards the particular cluster center. Clearly,summation of membership of each data pointshould be equal to one. After each iterationmembership and cluster centers are updatedaccording to the formula [3].Krishnapuram and Keller [4] proposed anew clustering model named Possibilistic c-Means(PCM). The possibilistic C-means algorithm(PCM) was proposed to address the drawbacksassociated with the memberships used inalgorithms such as the fuzzy C-means (FCM). Theapproach differs from the existing clusteringmethods. In this the resulting partition of the datacan be interpreted as a possibilistic partition, andthe membership values can be interpreted asdegrees of possibility of the points belonging to theclasses. An appropriate objective function whoseminimum will characterize a good possibilisticpartition of the data is constructed, and themembership and prototype update equations arederived from necessary conditions forminimization of the criterion function.To overcome difficulties of the PCM, Pal[5] defines a clustering technique that combinesthe features of both Fuzzy a Possibilistic c-meanscalled Fuzzy Possibilistic c-Means (FPCM).Membership and Typicality’s are very significantfor the accurate characteristic of data substructurein clustering difficulty [3].An objective function inthe FPCM depends on both membership andtypicality. FPCM generates memberships andpossibilities at the same time, together with theusual point prototypes or cluster center for eachcluster.The Gustafson-Kessel (GK) algorithm [8]is a powerful clustering technique that has beenused in various image processing, classificationand system identification applications. Gustafsonand Kessel extended the standard fuzzy cmeansalgorithm by employing an adaptive distancenorm, in order to detect clusters of differentgeometrical shapes in one data set.3. Proposed WorkIn this paper, a novel fuzzy clusteringapproach is used for clustering of data.The input to the algorithm will be thedataset containing numerical data. The algorithmbegins by converting each value into fuzzy setdepending upon the membership function. Afterthe fuzzy regions are decided the maximum countvalue for each region is found. The fuzzy itemsetwill be decided from the maximum count values.The candidate set is decided according to whichthe clusters are decided. The term cluster matrixand data cluster matrix is build and finally thevalues are added to the best cluster.The proposed approach can be given infollowing steps
  3. 3. Ms.Suchita S.Mesakar, Prof.M.S.Chaudhari / International Journal of Engineering Research andApplications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.701-704703 | P a g eAlgorithmInput:2 1 1 0 0 01 1 0 0 0 01 0 2 0 0 00 0 0 3 0 20 0 0 11 1 10 1 0 4 0 00 0 0 8 1 23 0 1 0 0 00 1 0 3 0 00 0 0 8 2 1Fig.1. Input valuesStep 1. Transform each term frequency into fuzzyset.Step 2. For all fuzzy regions, calculate the value ofcount.L M H L M H L M H L M H L M H L M H1.67 1.33 1 2 1 1 2 1 1 0 0 0 0 0 0 0 0 02 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 02 1 1 0 0 0 1.67 1.33 1 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 1.33 1.67 1 0 0 0 1.67 1.33 10 0 0 0 0 0 0 0 0 1 1 2 2 1 1 2 1 10 0 0 2 1 1 0 0 0 1 2 1 0 0 0 0 0 00 0 0 0 0 0 0 0 0 1 1.43 1.57 2 1 1 1.67 1.33 11.33 1.67 1.67 0 0 0 2 1 1 0 0 0 0 0 0 0 0 00 0 0 2 1 1 0 0 0 1.33 1.67 1 0 0 0 0 0 00 0 0 0..00 0 0 0 0 0 1 1.43 1.57 1.67 1.33 1 2 1 1Count 7 5 4 8 4 4 5.67 3.33 3 6.66 9.2 8.14 5.67 3.33 3 7.34 4.66 4Fig.2. Calculation of Fuzzy RegionsStep3. Find the region of each term withmaximum count.Step 4. Find fuzzy frequent itemset L1Count Support Values (Count/TotalNo. )7 7.00/10 = 70 %8 8.00/10 = 80 %5.67 5.67/10 = 57 %9.2 9.20/10 = 92 %5.67 5.67/10 = 57 %7.34 7.34/10 = 73 %Fig.3. Fuzzy Frequent ItemsetStep 5. Generate the candidate set.Step 6. Candidate cluster is generated based on thefuzzy frequent itemsets.Step 7. Build p x k term cluster matrix G = [gmax-R]Fig.4.Term Cluster MatrixStep 8. Build n x k data cluster matrix V.Fig.5.Data Cluster MatrixStep 9. Assign a data to a best target cluster.Fig.6.Targeted Clusters4. ConclusionFuzzy clustering is a powerfulunsupervised method for the analysis of data andconstruction of models. The overview of the mostfrequently used fuzzy clustering algorithms isdiscussed in this paper and a novel fuzzy basedclustering approach is proposed to cluster the data.The approach starts by deciding the fuzzy regionsfor given data and fuzzy frequent itemset from thefuzzy region.Finally, the term cluster matrix anddata cluster matrix is build and the values areadded to the best cluster.
  4. 4. Ms.Suchita S.Mesakar, Prof.M.S.Chaudhari / International Journal of Engineering Research andApplications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.701-704704 | P a g eReferences[1] J. C. Dunn (1973): "A Fuzzy Relative ofthe ISODATA Process and Its Use inDetecting Compact Well-SeparatedClusters", Journal of Cybernetics 3: 32-57[2] J. C. Bezdek (1981): "Pattern Recognitionwith Fuzzy Objective FunctionAlgoritms", Plenum Press, New York.[3] R.Suganya, R.Shanthi,” Fuzzy C-MeansAlgorithm-A Review”, InternationalJournal of Scientific and ResearchPublications, Volume 2, Issue 11,November 2012.[4] R. Krishnapuram and J. M. Keller, “Apossibilistic approach to clustering,” IEEETransactions on Fuzzy Systems, vol. 1, no.2, pp. 98–110, 1993.[5] Pal N.R, Pal K, Keller J.M. and BezdekJ.C, “A Possibilistic Fuzzy c-MeansClustering Algorithm”, IEEETransactions on Fuzzy Systems, Vol. 13,No. 4, Pp. 517–530, 2005.[6] Hoppner, F., et al., Fuzzy ClusterAnalysis: methods for classification, dataanalysis, and image recognition. 1999,New York: John Wiley & Sons, Ltd.[7] Fan, J.L., Zhen, W.Z., and Xie,W.X.,Suppressed fuzzy c-meansclusteringalgorithm. Pattern RecognitionLetters, 2003.24: pp. 1607- 1612.[8] Gustafson, D.E. and Kessel, W.C.,Fuzzyclustering with a fuzzy covariance matrix.In Proceedings of IEEE Conference onDecision Control. 1979. pp. 761-766.[9] Man, Y. and Gath, I., Detection andseparationof ring shaped clusters usingfuzzy clustering,IEEE Transaction onPattern Analysis and MachineIntelligence, 1994.16(8): pp. 855-861.[10] Gath, I. and Hoory, D., Fuzzy clusteringof elliptic ring-shaped clusters.PatternRecognition Letters, 1995.16(7):pp. 727-741.[11] Ameer Ali, M., Dooley, L.S., andKarmakar,G.C.,Object basedsegmentation using fuzzy clustering, inIEEE International Conference onAcoustics, Speech, and SignalProcessing.2006.