Published in: Education
  1. CLIQUE<br />09mx<br />Crew members:<br />K. Kanagaraj 14<br />S. Karthikeyan 17<br />S. Kathiresan 19<br />N. PadmaShree 28<br />M. RamKumar 33<br />S. Sowmya 45<br />
  2. GRID-BASED CLUSTERING METHOD<br />Uses a multi-resolution grid data structure.<br />Clustering complexity depends on the number of populated grid cells, not on the number of objects in the dataset.<br />Divides the space into a finite number of cells that form a grid structure, on which all clustering operations are performed.<br />(e.g.) Suppose we have a set of records that we want to cluster with respect to two attributes. We divide the related space (a plane) into a grid structure and then find the clusters.<br />
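A minimal Python sketch of the idea, assuming hypothetical records with two attributes, age and salary (in units of 10,000), and hypothetical cell widths of 10 years and 1 salary unit. Each record is mapped to the integer index of its cell, and only populated cells are stored, so later clustering steps can work on the cell counts rather than on the individual records.

```python
from collections import Counter


def grid_cell(point, mins, widths):
    """Return the integer index of the grid cell containing `point` along each attribute."""
    return tuple(int((x - lo) // w) for x, lo, w in zip(point, mins, widths))


def populated_cells(points, mins, widths):
    """Count records per cell; empty cells are never created or stored."""
    return Counter(grid_cell(p, mins, widths) for p in points)


# Hypothetical records: (age, salary in units of 10,000)
records = [(23, 1.5), (25, 1.8), (27, 1.2), (41, 5.5), (44, 5.9), (58, 7.1)]
cells = populated_cells(records, mins=(20, 0), widths=(10, 1))
print(len(cells), "populated cells:", dict(cells))
```

Here the six records fall into three populated cells; the grid never materializes the empty ones.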
  3. [Figure: "Space" is this plane: the Age-Salary plane divided into a grid of cells (x-axis: Age, 20 to 60; y-axis: Salary in units of 10,000, 0 to 8).]<br />
  4. Advantages of Grid-based Clustering<br />Fast<br />No distance computations<br />Complexity usually depends on the number of populated grid cells, not on the number of objects<br />Easy to determine which clusters are neighboring<br />Limitation: cluster shapes are restricted to unions of grid cells<br />
  5. Techniques for Grid-Based Clustering<br />The following techniques are used to perform grid-based clustering:<br />CLIQUE (CLustering In QUEst)<br />STING (STatistical INformation Grid)<br />WaveCluster<br />
  6. CLIQUE<br />CLustering In QUEst, by Agrawal, Gehrke, Gunopulos and Raghavan, published at SIGMOD '98 (ACM Special Interest Group on Management of Data)<br />Clustering: grouping similar things according to their characteristics or behavior.<br />Quest: the IBM Almaden data mining research project in which CLIQUE was developed.<br />Automatic subspace clustering of high-dimensional data<br />
  7. Looking at CLIQUE as an Example<br />CLIQUE is used to cluster high-dimensional data stored in large tables.<br />By high-dimensional data we mean records that have many attributes.<br />CLIQUE identifies the dense units in the subspaces of the high-dimensional data space and uses these subspaces to make clustering more efficient.<br />
  8. Definitions That Need to Be Known<br />Unit: after forming a grid structure on the space, each rectangular cell is called a unit.<br />Dense: a unit is dense if the fraction of all data points contained in the unit exceeds the input model parameter (the density threshold).<br />Cluster: a maximal set of connected dense units, where two units are connected if they share a common face or are both connected to a third unit.<br />
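The density test can be sketched in Python as follows. The threshold name `tau` and the representation of units as hashable cell identifiers are assumptions made for illustration, not notation from the slides.

```python
from collections import Counter


def dense_units(unit_of_point, tau):
    """Return the units holding more than a fraction `tau` of all points.

    unit_of_point: the unit (any hashable cell id) of each data point.
    tau: the density threshold, a fraction between 0 and 1.
    """
    n = len(unit_of_point)
    counts = Counter(unit_of_point)
    return {u for u, c in counts.items() if c / n > tau}


# Hypothetical unit ids for six points
units = [(0, 1)] * 3 + [(2, 5)] * 2 + [(3, 7)]
print(dense_units(units, tau=0.3))
```

With six points and `tau = 0.3`, a unit needs at least two points to be dense, so `(3, 7)` is left out.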
  9. How Does CLIQUE Work?<br />Suppose we have a set of records that we would like to cluster in terms of n attributes.<br />So we are dealing with an n-dimensional space.<br />MAJOR STEPS:<br />CLIQUE partitions every 1-dimensional subspace (each attribute) into the same number of equal-length intervals.<br />Using this as a basis, it partitions the n-dimensional data space into non-overlapping rectangular units.<br />
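A sketch of these two steps, assuming numeric attributes and a hypothetical parameter `xi` for the number of intervals per dimension. Each point receives exactly one interval index per attribute, and the resulting tuple names the single n-dimensional unit that contains it, so the units do not overlap.

```python
def make_intervals(data, xi):
    """Split each attribute's range [min, max] into xi equal-length intervals."""
    dims = len(data[0])
    lows = [min(row[d] for row in data) for d in range(dims)]
    highs = [max(row[d] for row in data) for d in range(dims)]
    # Guard against a constant attribute, whose range would have zero width
    widths = [(hi - lo) / xi if hi > lo else 1.0 for lo, hi in zip(lows, highs)]
    return lows, widths


def unit_of(row, lows, widths, xi):
    """Return one interval index per attribute; the tuple names an n-D unit."""
    # min(..., xi - 1) puts each attribute's maximum into the last interval
    return tuple(min(int((x - lo) / w), xi - 1) for x, lo, w in zip(row, lows, widths))


data = [(20, 1.0), (40, 5.0), (60, 8.0)]  # hypothetical (age, salary) records
lows, widths = make_intervals(data, xi=4)
print([unit_of(row, lows, widths, xi=4) for row in data])
```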
  10. CLIQUE: Major Steps (Cont.)<br />Now CLIQUE's goal is to identify the dense n-dimensional units.<br />It does this as follows:<br />CLIQUE finds dense units of higher dimensionality by first finding the dense units in lower-dimensional subspaces.<br />For example, in a 3-dimensional space, CLIQUE finds the dense units in the 3 related planes (2-dimensional subspaces).<br />It then intersects the extensions of the subspaces representing the dense units to form a candidate search space in which dense units of higher dimensionality may exist.<br />
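A sketch of how candidate units of the next dimensionality can be generated, in the spirit of Apriori's candidate generation. The representation (a unit as a tuple of `(dimension, interval)` pairs sorted by dimension) and the function name are hypothetical.

```python
from itertools import combinations


def join_candidates(dense_prev):
    """Apriori-style join: build k-D candidate units from dense (k-1)-D units.

    Each unit is a tuple of (dimension, interval) pairs sorted by dimension.
    Two units are joined when they agree on all but their last pair and
    their last pairs lie in different dimensions.
    """
    candidates = set()
    for a, b in combinations(sorted(dense_prev), 2):
        if a[:-1] == b[:-1] and a[-1][0] != b[-1][0]:
            # Order the two differing pairs by dimension to keep the tuple sorted
            last = tuple(sorted((a[-1], b[-1])))
            candidates.add(a[:-1] + last)
    return candidates


# Hypothetical dense 1-D units: interval 2 and 5 of dim 0, 3 of dim 1, 1 of dim 2
dense_1d = {((0, 2),), ((0, 5),), ((1, 3),), ((2, 1),)}
print(sorted(join_candidates(dense_1d)))
```

Pairs of intervals from the same dimension are never joined, since a unit holds one interval per dimension. The resulting candidates still have to be counted to see which are actually dense.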
  11. CLIQUE: Major Steps (Cont.)<br />Each maximal set of connected dense units is considered a cluster.<br />Using this definition, the dense units in the subspaces are examined to find clusters in the subspaces.<br />The information from the subspaces is then used to find clusters in the n-dimensional space.<br />Note that all cluster boundaries are axis-parallel (horizontal or vertical in a 2-D plane), because the grid cells are rectangular.<br />
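A sketch of grouping dense units into clusters, assuming (as in the CLIQUE paper) that two units are connected when they share a common face, i.e. they differ by one interval in exactly one dimension. Units are hypothetical tuples of interval indices within one subspace; breadth-first search collects each maximal connected set.

```python
from collections import deque


def neighbors(unit):
    """Units sharing a face with `unit`: one interval index shifted by +/-1."""
    for d in range(len(unit)):
        for step in (-1, 1):
            yield unit[:d] + (unit[d] + step,) + unit[d + 1:]


def clusters(dense):
    """Group dense units into maximal sets of connected units via BFS."""
    remaining = set(dense)
    result = []
    while remaining:
        start = remaining.pop()
        component, queue = {start}, deque([start])
        while queue:
            for nb in neighbors(queue.popleft()):
                if nb in remaining:  # only dense, not-yet-visited units
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        result.append(component)
    return result


dense = {(0, 0), (0, 1), (1, 1), (5, 5)}  # hypothetical dense 2-D units
print(clusters(dense))
```

Units that touch only at a corner, such as `(1, 1)` and `(2, 2)`, end up in different clusters.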
  12. Example for CLIQUE<br />Suppose we want to cluster a set of records that have three attributes, namely salary, vacation and age.<br />The data space for this data is 3-dimensional.<br />[Figure: 3-D data space with axes vacation, age and salary]<br />
  13. Example (Cont.)<br />After plotting the data objects, each dimension (i.e., salary, vacation and age) is split into intervals of equal length.<br />Then we form a 3-dimensional grid on the space, each unit of which is a 3-D rectangular box.<br />Now our goal is to find the dense 3-D rectangular units.<br />
  14. Example (Cont.)<br />To do this, we find the dense units of the subspaces of this 3-D space.<br />First, we look at the salary-age plane and find all the 2-D rectangular units that are dense.<br />We also find the dense 2-D rectangular units of the vacation-age plane.<br />
  15. Example<br />
  16. Example (Cont.)<br />Now let us visualize the dense units of the two planes in the following 3-D figure:<br />
  17. Example (Cont.)<br />We can extend the dense areas of the vacation-age plane inwards (along the salary axis).<br />We can extend the dense areas of the salary-age plane upwards (along the vacation axis).<br />The intersection of these two extensions gives a candidate search space in which 3-dimensional dense units may exist.<br />We then find the dense units of the salary-vacation plane and form an extension of the subspace that represents these dense units.<br />
  18. Example (Cont.)<br />Now we intersect the candidate search space with the extension of the dense units of the salary-vacation plane to obtain the candidate 3-D units; counting the points in each candidate then yields the 3-D dense units.<br />So, what was the main idea?<br />We used the dense units in subspaces to find the dense units in the 3-dimensional space.<br />Once the dense units are found, the clusters follow directly as maximal sets of connected dense units.<br />
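The salary-vacation-age walkthrough can be sketched as follows, assuming hypothetical sets of dense 2-D units for each plane, with each unit written as a pair of interval indices. A 3-D candidate `(age, salary, vacation)` lies in the intersection of the extensions exactly when all three of its 2-D projections are dense; the final count decides whether it is actually dense.

```python
from collections import Counter


def candidate_3d(dense_age_salary, dense_age_vacation, dense_salary_vacation):
    """Intersect the extensions of dense 2-D units into 3-D candidate units.

    Each argument is a set of (interval, interval) pairs for one plane.
    A 3-D unit (age, salary, vacation) survives only if all three of its
    2-D projections are dense.
    """
    candidates = set()
    for a, s in dense_age_salary:
        for a2, v in dense_age_vacation:
            # The two extensions intersect only where the age interval agrees
            if a == a2 and (s, v) in dense_salary_vacation:
                candidates.add((a, s, v))
    return candidates


def dense_3d(candidates, units_of_points, tau):
    """Count the points in each candidate and keep those above threshold tau."""
    counts = Counter(units_of_points)
    n = len(units_of_points)
    return {c for c in candidates if counts[c] / n > tau}


dense_as = {(2, 3), (2, 4)}  # hypothetical dense (age, salary) units
dense_av = {(2, 1)}          # hypothetical dense (age, vacation) units
dense_sv = {(3, 1)}          # hypothetical dense (salary, vacation) units
cands = candidate_3d(dense_as, dense_av, dense_sv)
points = [(2, 3, 1)] * 4 + [(0, 0, 0)] * 6  # 3-D unit of each hypothetical point
print(cands, dense_3d(cands, points, tau=0.3))
```

The unit `(2, 4, 1)` is dropped because its salary-vacation projection `(4, 1)` is not dense, so it is never counted.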
  19. Reflecting upon CLIQUE<br />Why does CLIQUE confine its search for dense units in high dimensions to the intersection of dense units in subspaces?<br />Because of the Apriori property, which uses prior knowledge about the search space so that portions of it can be pruned.<br />For CLIQUE, the property says that if a k-dimensional unit is dense, then so are its projections in (k-1)-dimensional space. Equivalently, a k-dimensional unit with any non-dense (k-1)-dimensional projection cannot be dense and need not be examined.<br />
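The property follows from a short counting argument. Here $N$ is the total number of points, $\tau$ the density threshold, $|u|$ the number of points in unit $u$, and $u^{(j)}$ the projection of a $k$-dimensional unit $u$ that drops its $j$-th interval; this notation is introduced here for illustration.

```latex
\begin{aligned}
u &= I_1 \times \cdots \times I_k, \qquad
u^{(j)} = I_1 \times \cdots \times I_{j-1} \times I_{j+1} \times \cdots \times I_k \\
p \in u &\;\Rightarrow\; p \in u^{(j)}
  \quad\text{so}\quad |u| \le |u^{(j)}| \\
\frac{|u|}{N} > \tau &\;\Rightarrow\; \frac{|u^{(j)}|}{N} \ge \frac{|u|}{N} > \tau
\end{aligned}
```

Read in reverse: if any projection $u^{(j)}$ is not dense, then $u$ is not dense, which is what lets CLIQUE prune candidates without counting them.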
  20. Strengths and Weaknesses of CLIQUE<br />Strengths<br />It automatically finds subspaces of the highest dimensionality such that high-density clusters exist in those subspaces.<br />It is quite efficient.<br />It is insensitive to the order of records in the input and does not presume any canonical data distribution.<br />It scales linearly with the size of the input and has good scalability as the number of dimensions in the data increases.<br />Weakness<br />The accuracy of the clustering result may be degraded in exchange for the simplicity of the method.<br />
  21. Although the study of complete subgraphs goes back at least to the graph-theoretic reformulation of Ramsey theory by Erdős & Szekeres (1935),[1] the term "clique" comes from Luce & Perry (1949), who used complete subgraphs in social networks to model cliques of people; that is, groups of people all of whom know each other. Cliques have many other applications in the sciences and particularly in bioinformatics.<br />
  22. A maximal clique is a clique that cannot be extended by including one more adjacent vertex, that is, a clique that is not a subset of a larger clique.<br />A maximum clique is a clique of the largest possible size in a given graph. The clique number ω(G) of a graph G is the number of vertices in a maximum clique in G.<br />
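To make the maximal/maximum distinction concrete, here is a sketch of the basic Bron-Kerbosch algorithm (not part of the slides) on a graph stored as a hypothetical adjacency dictionary. It lists every maximal clique; the clique number ω(G) is then the size of the largest one.

```python
def bron_kerbosch(r, p, x, graph, out):
    """Basic Bron-Kerbosch: append every maximal clique extending r to out.

    r: current clique; p: vertices that can still extend r;
    x: vertices already handled (extending r with them repeats a clique).
    """
    if not p and not x:
        out.append(r)  # nothing can extend r, so r is maximal
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & graph[v], x & graph[v], graph, out)
        p = p - {v}
        x = x | {v}


def maximal_cliques(graph):
    out = []
    bron_kerbosch(set(), set(graph), set(), graph, out)
    return out


def clique_number(graph):
    """omega(G): size of a maximum clique, i.e. the largest maximal clique."""
    return max((len(c) for c in maximal_cliques(graph)), default=0)


# Hypothetical graph: triangle 1-2-3 plus the edge 3-4
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(maximal_cliques(g))
print(clique_number(g))  # 3
```

In this graph {3, 4} is a maximal clique (no vertex can be added to it) but not a maximum clique, since {1, 2, 3} is larger.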