Data Mining: clustering and analysis

  1. Clustering and Analysis in Data Mining
  2. What is Clustering?
     Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects.
  3. Why Clustering?
     Typical requirements that a clustering method should meet in data mining:
     • Scalability
     • Ability to deal with different types of attributes
     • Discovery of clusters with arbitrary shape
     • Minimal requirements for domain knowledge to determine input parameters
     • Ability to deal with noisy data
     • Incremental clustering and insensitivity to the order of input records
     • High dimensionality
     • Constraint-based clustering
     • Interpretability and usability
  4. Data Types in Cluster Analysis
     • Data matrix (or object-by-variable structure)
     • Interval-scaled variables
     • Binary variables
     • Categorical variables
     • Discrete ordinal variables
     • Ratio-scaled variables
  5. Methods Used in Clustering
     • Partitioning methods
     • Hierarchical methods
     • Density-based methods
     • Grid-based methods
     • Model-based methods
  6. Hierarchical Methods in Clustering
     There are two types of hierarchical clustering methods:
     • Agglomerative hierarchical clustering
     • Divisive hierarchical clustering
  7. Agglomerative Hierarchical Clustering
     This bottom-up strategy starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters, until all of the objects are in a single cluster or until certain termination conditions are satisfied.
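The bottom-up merging described above can be sketched in a few lines of Python. This is a minimal illustration, not the slide author's implementation: the function name `agglomerative`, the use of single linkage (closest pair of members) as the cluster distance, and the `num_clusters` termination condition are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def agglomerative(points, num_clusters=1):
    """Merge the two closest clusters until num_clusters remain (hypothetical helper)."""
    # Start with every object in its own atomic cluster.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Single linkage (an assumption): the distance between two clusters
        # is the distance between their closest pair of members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        # Merge cluster j into cluster i; i < j, so removing j keeps i valid.
        clusters[i].extend(clusters.pop(j))
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(points, num_clusters=3)
print(result)
```

With these sample points, the two nearby pairs are merged first and the isolated point `(10.0, 0.0)` stays in a cluster of its own. Leaving `num_clusters` at 1 runs the merging all the way to a single cluster.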
  8. Divisive Hierarchical Clustering
     This top-down strategy does the reverse of agglomerative hierarchical clustering by starting with all objects in one cluster. It subdivides the cluster into smaller and smaller pieces, until each object forms a cluster on its own or until certain termination conditions are satisfied, such as reaching a desired number of clusters or bringing the diameter of each cluster within a certain threshold.
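A top-down counterpart can be sketched the same way. The slide does not say how a cluster is split, so the split rule here is an assumption chosen for brevity: the two farthest-apart members act as seeds and every member joins the nearer seed. The names `divisive`, `diameter`, `num_clusters`, and `max_diameter` are likewise hypothetical.

```python
from itertools import combinations
from math import dist


def diameter(cluster):
    """Largest pairwise distance inside a cluster (0.0 for a single object)."""
    return max((dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)


def divisive(points, num_clusters, max_diameter=0.0):
    """Split the widest cluster until a termination condition holds."""
    # Start with all objects in one cluster.
    clusters = [list(points)]
    while len(clusters) < num_clusters:
        widest = max(clusters, key=diameter)
        # Second termination condition: every cluster is already tight enough.
        if diameter(widest) <= max_diameter:
            break
        # Split rule (an assumption): seed with the two farthest-apart members,
        # then send each member to the nearer seed.
        seed_a, seed_b = max(combinations(widest, 2), key=lambda pair: dist(*pair))
        left = [p for p in widest if dist(p, seed_a) <= dist(p, seed_b)]
        right = [p for p in widest if dist(p, seed_a) > dist(p, seed_b)]
        clusters.remove(widest)
        clusters.extend([left, right])
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = divisive(points, num_clusters=3)
print(result)
```

Both termination conditions named on the slide appear here: the loop stops when the desired number of clusters is reached, or earlier when no cluster has a diameter above `max_diameter`. With the default threshold of 0.0, asking for more clusters than there are objects simply yields one cluster per object.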
  9. Density-Based Methods in Clustering
     • DBSCAN: A Density-Based Clustering Method Based on Connected Regions with Sufficiently High Density
     • OPTICS: Ordering Points to Identify the Clustering Structure
     • DENCLUE: Clustering Based on Density Distribution Functions
  10. Grid-Based Methods in Clustering
      • STING: Statistical Information Grid
        STING is a grid-based multiresolution clustering technique in which the spatial area is divided into rectangular cells.
      • WaveCluster: Clustering Using Wavelet Transformation
        WaveCluster is a multiresolution clustering algorithm that first summarizes the data by imposing a multidimensional grid structure onto the data space. It then uses a wavelet transformation to transform the original feature space, finding dense regions in the transformed space.
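The shared idea behind grid-based methods, summarizing points by the rectangular cell they fall into and then working on cells instead of points, can be illustrated with a small sketch. It is deliberately simplified: it uses a single resolution and plain counts, so it is neither STING (which keeps a hierarchy of cells with statistical summaries) nor WaveCluster (which applies a wavelet transform). The function name `grid_clusters` and the parameters `cell_size` and `min_points` are assumptions for the example.

```python
from collections import Counter, deque


def grid_clusters(points, cell_size, min_points):
    """Group adjacent dense grid cells into clusters (simplified, single resolution)."""
    # Summarize the data: count how many points fall into each rectangular cell.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    dense = {cell for cell, n in counts.items() if n >= min_points}

    # Connect dense cells that share an edge, using a breadth-first search.
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            cx, cy = queue.popleft()
            component.append((cx, cy))
            for neighbor in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if neighbor in dense and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(component))
    return clusters


points = [
    (0.1, 0.1), (0.5, 0.9), (1.2, 0.3), (1.8, 0.7),  # two adjacent dense cells
    (7.5, 7.5), (7.9, 7.1),                          # one isolated dense cell
    (4.0, 4.0),                                      # sparse cell, treated as noise
]
result = grid_clusters(points, cell_size=1.0, min_points=2)
print(result)
```

The result is a list of clusters expressed as cell coordinates rather than points. Once the counts are built, the remaining work depends on the number of occupied cells, not on the number of objects, which is the usual argument for grid-based methods on large data sets.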
  11. Model-Based Clustering Methods
      • Expectation-Maximization
      • Conceptual Clustering
      • Neural Network Approach
  12. Methods of Clustering High-Dimensional Data
      • CLIQUE: A Dimension-Growth Subspace Clustering Method
        CLIQUE (CLustering In QUEst) was the first algorithm proposed for dimension-growth subspace clustering in high-dimensional space.
      • PROCLUS: A Dimension-Reduction Subspace Clustering Method
        PROCLUS (PROjected CLUStering) is a typical dimension-reduction subspace clustering method. That is, instead of starting from single-dimensional spaces, it starts by finding an initial approximation of the clusters in the high-dimensional attribute space. Each dimension is then assigned a weight for each cluster, and the updated weights are used in the next iteration to regenerate the clusters.
  13. Constraint-Based Cluster Analysis
      Constraint-based clustering finds clusters that satisfy user-specified preferences or constraints. A few categories of constraints are:
      • Constraints on individual objects
      • Constraints on the selection of clustering parameters
      • Constraints on distance or similarity functions
      • User-specified constraints on the properties of individual clusters
      • Semi-supervised clustering based on “partial” supervision