Clustering and Analysis in Data Mining<br />
What is Clustering?<br />The process of grouping a set of physical or abstract objects into classes of similar objects is ...
Why Clustering?<br />Scalability<br />Ability to deal with different types of attributes<br />Discovery of clusters with a...
 Data types in Cluster Analysis<br />Data matrix (or object-by-variable structure)<br />Interval-Scaled Variables<br />Bin...
Methods used in clustering:<br />Partitioning method.<br />Hierarchical method.<br />Data Density based method.<br />Grid ...
Hierarchical methods in clustering<br />   There are two types of hierarchical clustering methods:<br />Agglomerative hier...
Agglomerative hierarchical clustering<br />This bottom-up strategy starts by placing each object in its own cluster and th...
Divisive hierarchical clustering<br />This top-down strategy does the reverse of agglomerative hierarchical clustering by ...
Density-Based methods in clustering<br />DBSCAN: A Density-Based Clustering Method Based on Connected Regions withSufficie...
Grid-Based methods in clustering<br />STING: Statistical information gridSTING is a grid-based multi resolution clustering...
Model-Based Clustering Methods<br />Expectation-Maximization<br />Conceptual Clustering<br />Neural Network Approach<br />
Methods of Clustering High-Dimensional Data<br />CLIQUE: A Dimension-Growth Subspace Clustering MethodCLIQUE (CLustering I...
Constraint-Based Cluster Analysis<br />    Constraint-based clustering finds clusters that satisfy user-specified preferen...
Visit more self help tutorials<br />Pick a tutorial of your choice and browse through it at your own pace.<br />The tutori...
Upcoming SlideShare
Loading in...5
×

Data Mining: clustering and analysis

6,522

Published on

Data Mining: clustering and analysis

Published in: Technology
2 Comments
6 Likes
Statistics
Notes
No Downloads
Views
Total Views
6,522
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
2
Likes
6
Embeds 0
No embeds

No notes for slide

Data Mining: clustering and analysis

  1. 1. Clustering and Analysis in Data Mining<br />
  2. 2. What is Clustering?<br />The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.<br />
  3. 3. Why Clustering?<br />Scalability<br />Ability to deal with different types of attributes<br />Discovery of clusters with arbitrary shape<br />Minimal requirements for domain knowledge to determine input parameters<br />Ability to deal with noisy data<br />Incremental clustering and insensitivity to the order of input records:<br />High dimensionality<br />Constraint-based clustering<br />Interpretability and usability<br />
  4. 4.  Data types in Cluster Analysis<br />Data matrix (or object-by-variable structure)<br />Interval-Scaled Variables<br />Binary Variables<br />A categorical variable<br />A discrete ordinal variable<br />A ratio-scaled variable<br />
  5. 5. Methods used in clustering:<br />Partitioning method.<br />Hierarchical method.<br />Data Density based method.<br />Grid based method.<br />Model Based method.<br />
  6. 6. Hierarchical methods in clustering<br /> There are two types of hierarchical clustering methods:<br />Agglomerative hierarchical clustering<br />Divisive hierarchical clustering<br />
  7. 7. Agglomerative hierarchical clustering<br />This bottom-up strategy starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters, until all of the objects are in a single cluster or until certain termination conditions are satisfied.<br />
  8. 8. Divisive hierarchical clustering<br />This top-down strategy does the reverse of agglomerative hierarchical clustering by starting with all objects in one cluster. It subdivides the cluster into smaller and smaller pieces, until each object forms a cluster on its own or until it satisfies certain termination conditions, such as a desired number of clusters is obtained or the diameter of each cluster is within a certain threshold.<br />
  9. 9. Density-Based methods in clustering<br />DBSCAN: A Density-Based Clustering Method Based on Connected Regions withSufficiently High Density<br />OPTICS: Ordering Points to Identify the Clustering Structure<br />DENCLUE: Clustering Based on Density Distribution Functions<br />
  10. 10. Grid-Based methods in clustering<br />STING: Statistical information gridSTING is a grid-based multi resolution clustering technique in which the spatial area is divided into rectangular cells.<br />Wave Cluster: Clustering Using Wavelet TransformationWave Cluster is a multi resolution clustering algorithm that first summarizes the data by imposing a multidimensional grid structure onto the data space. It then uses a wavelet transformation to transform the original feature space, finding dense regions in the transformed space<br />
  11. 11. Model-Based Clustering Methods<br />Expectation-Maximization<br />Conceptual Clustering<br />Neural Network Approach<br />
  12. 12. Methods of Clustering High-Dimensional Data<br />CLIQUE: A Dimension-Growth Subspace Clustering MethodCLIQUE (CLustering In QUEst) was the first algorithm proposed for dimension-growth subspace clustering in high-dimensional space.<br />PROCLUS: A Dimension-Reduction Subspace Clustering MethodPROCLUS (PROjected CLUStering) is a typical dimension-reduction subspace clustering method. That is, instead of starting from single-dimensional spaces, it starts by finding an initial approximation of the clusters in the high-dimensional attribute space. Each dimension is then assigned a weight for each cluster, and the updated weights are used in the next iteration to regenerate the clusters.<br />
  13. 13. Constraint-Based Cluster Analysis<br /> Constraint-based clustering finds clusters that satisfy user-specified preferences or constraints, few categories of constraints are :<br />Constraints on individual objects<br />Constraints on the selection of clustering parameters<br />Constraints on distance or similarity functions<br />User-specified constraints on the properties of individual clusters<br />Semi-supervised clustering based on “partial” supervision<br />
  14. 14. Visit more self help tutorials<br />Pick a tutorial of your choice and browse through it at your own pace.<br />The tutorials section is free, self-guiding and will not involve any additional support.<br />Visit us at www.dataminingtools.net<br />

×