Cluster analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. It is also known as classification analysis or numerical taxonomy.
Example: clustering consumers on variables such as quality consciousness (var1) and price sensitivity (var2).
It requires no prior information about the group membership of the sample.
Uses of Cluster Analysis
• Segmenting the market (e.g., by benefits sought)
• Understanding buyer behavior
• Assessing new product opportunities (brands or markets)
• Selecting test markets (grouping comparable cities)
• Reducing data (summarizing many cases as a few clusters)
Steps
• Formulate the problem: select relevant variables measured on an interval scale.
• Select a distance measure to determine how similar or different objects are (e.g., Euclidean distance).
• Select a clustering procedure.
• Interpret and profile the clusters.
• Assess the reliability of the clustering.
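Euclidean distance is the square root of the sum of squared differences between two cases across all clustering variables. The following minimal sketch computes it along with its squared form. The function names and the two respondents' ratings are hypothetical and serve only as an illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two cases rated on the same interval-scaled variables."""
    if len(a) != len(b):
        raise ValueError("both cases must have the same number of variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Sum of squared differences: same ordering of distances, without the square root."""
    if len(a) != len(b):
        raise ValueError("both cases must have the same number of variables")
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two hypothetical respondents rated on six 7-point attitude items (V1..V6)
r1 = [6, 4, 7, 3, 2, 3]
r2 = [2, 3, 1, 4, 5, 4]
print(euclidean(r1, r2))          # 8.0
print(squared_euclidean(r1, r2))  # 64
```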
Steps in SPSS
1. Select ANALYZE from the SPSS menu.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.
3. Move the clustering variables into the VARIABLE(S) box.
4. Under CLUSTER, check CASES. In the DISPLAY box, check STATISTICS and PLOTS.
5. Click STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In CLUSTER MEMBERSHIP
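Agglomeration Schedule
• “Stage”: each row records one merge; after stage 1, 19 clusters remain.
• “Clusters Combined”: at stage 1, respondents 14 and 16 are combined.
• “Coefficients”: the Euclidean distance between the two respondents (or clusters) combined at that stage.
• “Stage Cluster First Appears”: the stage at which each of the clusters being combined was first formed. For example, the entry of 1 at stage 6 shows that the cluster containing respondent 14 was first formed at stage 1.
• “Next Stage”: the stage at which the cluster formed in this row is combined with another cluster. The entry for stage 1 is 6, so at stage 6, cluster 10 and cluster 14 (which already contains respondent 16) are combined into a single cluster.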
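The following minimal sketch reproduces the columns of an agglomeration schedule (clusters combined, coefficient, stage cluster first appears, next stage) on hypothetical toy data. Two choices here are assumptions: the linkage rule (average linkage, with Euclidean distances) and the convention of labelling each cluster by its lowest case number. SPSS offers several linkage methods, and the coefficients depend on the method chosen.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomeration_schedule(cases):
    """Average-linkage agglomerative clustering, reported as an agglomeration schedule.

    Clusters are labelled by their lowest case number (1-based), so a merged
    cluster keeps the smaller of the two labels.
    """
    clusters = {i + 1: [i + 1] for i in range(len(cases))}  # label -> member case numbers
    formed_at = {label: 0 for label in clusters}  # 0 = still a single case
    rows = []
    for stage in range(1, len(cases)):
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Average linkage: mean distance over all cross-cluster pairs of cases
            pair_dists = [euclidean(cases[i - 1], cases[j - 1])
                          for i in clusters[a] for j in clusters[b]]
            d = sum(pair_dists) / len(pair_dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best  # a < b because labels are visited in sorted order
        rows.append({"stage": stage, "combined": (a, b), "coefficient": d,
                     "first_appears": (formed_at[a], formed_at[b]), "next_stage": 0})
        # Back-fill "next stage" on the earlier rows that created a and b
        for earlier in (formed_at[a], formed_at[b]):
            if earlier:
                rows[earlier - 1]["next_stage"] = stage
        clusters[a].extend(clusters.pop(b))
        del formed_at[b]
        formed_at[a] = stage
    return rows


# Hypothetical toy data: five cases measured on two variables
cases = [[1, 1], [1, 2], [6, 6], [6, 7], [1, 4]]
for r in agglomeration_schedule(cases):
    print(r["stage"], r["combined"], round(r["coefficient"], 3),
          r["first_appears"], r["next_stage"])
```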
Icicle Plot
• Columns correspond to the objects being clustered, respondents 1 through 20.
• Rows correspond to the number of clusters.
• The figure is read from bottom to top.
• At the start, every case is a separate cluster: the last row shows the 20 initial clusters.
• In the first step, the two closest objects, 14 and 16, are combined, leaving 19 clusters; the merge is shown by the X's linking their columns.
• Row 18 corresponds to 18 clusters, formed when 6 and 7 are combined. At this point 16 clusters are single respondents and two clusters contain two respondents each.
• Each step combines two clusters into a new one, reducing the number of clusters by one.
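Each row of an icicle plot shows which cases share a cluster at a given number of clusters. The following minimal sketch recovers those rows by replaying a merge sequence from the bottom row (every case alone) upward. The function name and the five-case merge sequence are hypothetical. As in the schedule, a merged cluster keeps its first label.

```python
def memberships_by_k(n_cases, merges):
    """Replay a merge sequence and record every case's cluster label at each number of clusters.

    merges[s - 1] is the pair (a, b) combined at stage s; the merged cluster keeps label a.
    Returns {number_of_clusters: [label of case 1, ..., label of case n]}.
    """
    labels = list(range(1, n_cases + 1))  # before any merge, each case is its own cluster
    rows = {n_cases: labels}
    for stage, (a, b) in enumerate(merges, start=1):
        # Build a new list so earlier rows are left unchanged
        labels = [a if lab == b else lab for lab in labels]
        rows[n_cases - stage] = labels
    return rows


# Hypothetical merge sequence for five cases (not the 20-respondent example)
rows = memberships_by_k(5, [(1, 2), (3, 4), (1, 5), (1, 3)])
for k in sorted(rows, reverse=True):  # bottom of the icicle (most clusters) first
    print(f"{k} clusters:", rows[k])
```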
Dendrogram
• Read from left to right.
• Vertical lines represent clusters that are joined together.
• The position of each line on the scale shows the distance at which the clusters were joined.
• At first the joins occur at small distances and the groups are hard to tell apart; as the distances increase, the cluster structure becomes clear.
Deciding the Number of Clusters
• Practical, theoretical or conceptual considerations can guide the choice of the number of clusters.
• In hierarchical clustering, the distances at which clusters are combined can serve as a criterion. In the “Coefficients” column, the value suddenly more than doubles between stage 17 (three clusters) and stage 18 (two clusters). The same jump is visible in the last two stages of the dendrogram, which points to a three-cluster solution.
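The "sudden jump in the coefficients" check can be automated. The following sketch uses the largest ratio between consecutive coefficients as the jump measure. That criterion is an assumption: the slide describes a visual check, not a formula. The function name and the coefficients are hypothetical. After stage s, n − s clusters remain, so a jump from stage 17 to stage 18 with 20 cases points to 20 − 17 = 3 clusters.

```python
def suggest_clusters(coefficients, n_cases):
    """Find the largest relative jump between consecutive agglomeration coefficients.

    coefficients[s - 1] is the coefficient at stage s, and n_cases - s clusters remain
    after stage s. A large jump from stage s to stage s + 1 means the next merge joins
    clusters that are far apart, so stopping after stage s is suggested.
    Returns (stage, ratio, number_of_clusters).
    """
    best = None
    for s in range(1, len(coefficients)):
        prev, nxt = coefficients[s - 1], coefficients[s]
        if prev <= 0:
            continue  # a ratio is undefined for a zero coefficient
        ratio = nxt / prev
        if best is None or ratio > best[1]:
            best = (s, ratio, n_cases - s)
    return best


# Hypothetical coefficients for seven cases (six stages)
stage, ratio, k = suggest_clusters([1.0, 1.1, 1.4, 1.8, 4.5, 5.0], n_cases=7)
print(f"largest jump after stage {stage} (x{ratio:.2f}): {k} clusters")
```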
Interpreting and Profiling the Clusters
• Cluster 1 has high values on V1 (shopping is fun) and V3 (I combine shopping with eating out) and a low value on V5 (I don’t care about shopping). It can be labeled “fun-loving and concerned shoppers”. It consists of respondents (cases) 1, 3, 6, 7, 8, 12, 15 and 17.
• Cluster 2 is just the opposite, with low values on V1 and V3 and a high value on V5, so it can be labeled “apathetic shoppers”. It consists of cases 2, 5, 9, 11, 13 and 20.
• Cluster 3 has high values on V2 (shopping upsets the budget), V4 (I try to get the best buys) and V6 (comparing prices saves money), so it can be labeled “economical shoppers”. It consists of cases 4, 10, 14, 16, 18 and 19.
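Cluster labels like these come from comparing each cluster's mean (centroid) on every variable. The following minimal sketch computes those centroids. The function name, the six respondents and their ratings are hypothetical. They were chosen so the three profiles mirror the high/low patterns described above.

```python
def cluster_profiles(cases, membership):
    """Mean of every clustering variable within each cluster (the cluster centroids)."""
    sums, counts = {}, {}
    for row, c in zip(cases, membership):
        if c not in sums:
            sums[c], counts[c] = [0.0] * len(row), 0
        sums[c] = [s + x for s, x in zip(sums[c], row)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sorted(sums)}


# Hypothetical ratings on V1..V6 for six respondents already assigned to three clusters
cases = [
    [6, 4, 7, 3, 2, 3], [7, 3, 6, 4, 1, 3],  # cluster 1: high V1, V3; low V5
    [2, 3, 1, 4, 6, 4], [1, 3, 2, 2, 6, 4],  # cluster 2: low V1, V3; high V5
    [4, 6, 4, 6, 3, 6], [3, 7, 3, 6, 4, 7],  # cluster 3: high V2, V4, V6
]
membership = [1, 1, 2, 2, 3, 3]
for c, means in cluster_profiles(cases, membership).items():
    print(c, {f"V{i + 1}": m for i, m in enumerate(means)})
```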
• In the non-hierarchical (k-means) procedure, the initial cluster centers are the values of three randomly selected cases. Each case is then assigned to the nearest classification cluster center.
• The results also display each case’s cluster membership and its distance from its classification cluster center.
• Cluster 1 of the hierarchical solution is the same as cluster 3 of the non-hierarchical solution.
• Cluster 3 of the hierarchical solution is the same as cluster 1 of the non-hierarchical solution.
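The procedure described above works as follows: start from selected cases as centers, assign each case to the nearest center, and recompute the centers. The following minimal, generic k-means sketch shows that loop. It is not SPSS's exact K-Means implementation, whose initialization and update rules may differ. The function name, its parameters and the toy data are hypothetical. Clusters are numbered from 0 here.

```python
import math
import random


def kmeans(cases, k, initial=None, seed=None, max_iter=100):
    """Minimal k-means sketch.

    initial: indices of the cases whose values serve as the initial cluster centers;
    if omitted, k cases are drawn at random. Returns (membership, centers, distances),
    where distances[i] is case i's distance to its final cluster center.
    """
    if initial is None:
        initial = random.Random(seed).sample(range(len(cases)), k)
    centers = [list(cases[i]) for i in initial]
    membership = None
    for _ in range(max_iter):
        # Assignment step: each case joins the nearest center (Euclidean distance)
        new = [min(range(k), key=lambda c: math.dist(row, centers[c])) for row in cases]
        if new == membership:
            break  # no case changed cluster, so the solution has converged
        membership = new
        # Update step: move each center to the mean of the cases assigned to it
        for c in range(k):
            members = [row for row, m in zip(cases, membership) if m == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    distances = [math.dist(row, centers[m]) for row, m in zip(cases, membership)]
    return membership, centers, distances


# Hypothetical two-variable data with three well-separated groups
cases = [[1, 1], [1.5, 1], [8, 8], [8, 8.5], [1, 9], [1.5, 9]]
membership, centers, distances = kmeans(cases, 3, initial=[0, 2, 4])
print(membership, centers, distances)
```

Passing `initial` explicitly makes the run reproducible. With random starts, k-means can settle in a poorer local solution, which is one reason to compare it against the hierarchical result.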
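• The distances between the final cluster centers indicate that the pairs of clusters are well separated.
• A univariate F test is presented for each clustering variable. These F tests are only descriptive: because cases were assigned to clusters so as to maximize the differences between clusters, the observed significance levels cannot be interpreted as tests of the hypothesis that the cluster means are equal.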
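The following minimal sketch computes both quantities: pairwise Euclidean distances between final cluster centers, and a one-way ANOVA F (between-cluster mean square over within-cluster mean square) for a single variable. The function names and data are hypothetical. Because the clusters were formed to make these differences large, the F value describes separation but should not be read as a significance test.

```python
import math
from itertools import combinations


def center_distances(centers):
    """Euclidean distance between every pair of final cluster centers (clusters numbered from 1)."""
    return {(i + 1, j + 1): math.dist(centers[i], centers[j])
            for i, j in combinations(range(len(centers)), 2)}


def univariate_f(values, membership):
    """One-way ANOVA F for one clustering variable: between-cluster vs within-cluster mean square."""
    groups = {}
    for v, m in zip(values, membership):
        groups.setdefault(m, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {m: sum(g) / len(g) for m, g in groups.items()}
    ss_between = sum(len(g) * (means[m] - grand_mean) ** 2 for m, g in groups.items())
    ss_within = sum((v - means[m]) ** 2 for m, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical values of one variable for six cases in three clusters, and three centers
print(univariate_f([6, 7, 2, 1, 4, 3], [1, 1, 2, 2, 3, 3]))
print(center_distances([[0, 0], [3, 4], [6, 8]]))
```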
• In the two-step procedure, the AIC is at its minimum (97.594) for a three-cluster solution. A comparison of cluster centroids shows that cluster 1 (two-step) corresponds to cluster 2 (hierarchical) and cluster 2 (two-step) corresponds to cluster 3 (hierarchical).
• Because the different procedures produce the same clusters, the results support the validity of the clustering.
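Correspondences between two solutions (two-step vs hierarchical, or hierarchical vs k-means) can be read off a cross-tabulation of the two membership vectors. The following minimal sketch maps each cluster of one solution to the cluster of the other that shares the most cases and reports the share of cases consistent with that mapping. The function name and memberships are hypothetical. The greedy mapping is a simplification, since two clusters could map to the same partner.

```python
from collections import Counter


def match_clusters(labels_a, labels_b):
    """Map each cluster of solution A to the cluster of solution B sharing the most cases.

    Greedy simplification: two A clusters could map to the same B cluster.
    Returns (mapping, agreement), where agreement is the share of cases consistent with the mapping.
    """
    table = Counter(zip(labels_a, labels_b))  # cross-tabulation of the two memberships
    b_labels = sorted(set(labels_b))
    mapping = {a: max(b_labels, key=lambda b: table[(a, b)]) for a in sorted(set(labels_a))}
    agreement = sum(mapping[a] == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return mapping, agreement


# Hypothetical memberships for six cases under two procedures
two_step = [1, 1, 2, 2, 3, 3]
hierarchical = [2, 2, 3, 3, 1, 1]
print(match_clusters(two_step, hierarchical))
```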