2. Data Exploration
• Category 1, Fresh Foods, has the highest share of sales (38%).
• Category 3, Health and Beauty, has the lowest share of sales (13%).
• The average sale is around 210 Rs/sqft.
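As a rough illustration of how figures like these can be derived, the sketch below computes each category's share of total sales and the average sales per sqft. The record layout, the field names (`area_sqft`, `sales`) and all numbers are made up for illustration and are not the actual store data; averaging the per-store Rs/sqft values (rather than dividing total sales by total area) is also an assumption.

```python
# Hypothetical store records: sales in Rs per category and floor area in sqft.
# Category names follow the slides; the values are invented for this example.
stores = [
    {"area_sqft": 1000,
     "sales": {"Fresh Foods": 80000, "Frozen Food": 50000,
               "Health and Beauty": 30000, "Tobacco and Alcohol": 40000}},
    {"area_sqft": 500,
     "sales": {"Fresh Foods": 40000, "Frozen Food": 25000,
               "Health and Beauty": 15000, "Tobacco and Alcohol": 30000}},
]


def category_shares(stores):
    """Return each category's percentage share of total sales."""
    totals = {}
    for store in stores:
        for category, amount in store["sales"].items():
            totals[category] = totals.get(category, 0) + amount
    grand_total = sum(totals.values())
    return {cat: 100.0 * amt / grand_total for cat, amt in totals.items()}


def average_sales_per_sqft(stores):
    """Average of the per-store Rs/sqft values (an assumed definition)."""
    per_store = [sum(s["sales"].values()) / s["area_sqft"] for s in stores]
    return sum(per_store) / len(per_store)


shares = category_shares(stores)
for category, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {pct:.1f}%")
print(f"Average sale: {average_sales_per_sqft(stores):.0f} Rs/sqft")
```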
3. Clustering Methodology used
• K-means clustering was used to group the stores.
• Initial runs with 4, 5 and 6 clusters were compared.
• The 4-cluster solution works best: the cluster z-scores differ
clearly from those of the population, and the cluster strengths are
higher too. So 4 looks to be the optimal number of clusters.
• One of the clusters (cluster 4) was then clustered again into 3
sub-clusters.
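The slides do not say which tool was used for the clustering, so the following is only a minimal sketch of the idea: a plain K-means (Lloyd's algorithm) written in standard-library Python, run with 4, 5 and 6 clusters on standardized features. The feature set (fresh food %, tobacco and alcohol %, Rs/sqft), the synthetic data and the helper names `standardize`, `sq_dist` and `kmeans` are all assumptions made for this example.

```python
import random


def standardize(rows):
    """Convert each column to z-scores so no feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Synthetic stores: (fresh food %, tobacco and alcohol %, Rs/sqft). Made-up data.
rng = random.Random(42)
made_up_centers = [(38, 20, 290), (32, 27, 165), (45, 21, 185), (38, 23, 230)]
raw = [(rng.gauss(f, 1.5), rng.gauss(t, 1.5), rng.gauss(s, 15))
       for f, t, s in made_up_centers for _ in range(30)]
features = standardize(raw)

for k in (4, 5, 6):
    labels, _, inertia = kmeans(features, k)
    sizes = [labels.count(c) for c in range(k)]
    print(f"k={k}: cluster sizes={sizes}, within-cluster SS={inertia:.1f}")
```

The within-cluster sum of squares usually shrinks as k grows, so it cannot pick k on its own; the choice of 4 in the slides rests on the z-scores and cluster strengths instead.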
4. Cluster Profiling
• Cluster 1
• No of stores in the cluster : 91
• Cluster Strength : 4.71
• The significant difference between the cluster and the population is the
average sale amount: around 291 Rs/sqft compared to 210 Rs/sqft for the
population.
• Frozen food share is higher than in the population (28% > 25%).
• Tobacco sales share is lower than in the population (20% < 23%).
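The profiles on these slides compare each cluster's averages with those of the whole population. The slides do not give the exact z-score formula, so the sketch below assumes one common choice: (cluster mean - population mean) / population standard deviation, computed per feature. The function name `profile_cluster`, the field names and the four toy stores are hypothetical.

```python
from statistics import mean, pstdev


def profile_cluster(population, cluster, features):
    """Compare cluster means with population means, feature by feature.

    z is the gap between the two means in units of the population standard
    deviation. This definition is an assumption, not taken from the slides.
    """
    profile = {}
    for feature in features:
        pop_values = [row[feature] for row in population]
        cluster_values = [row[feature] for row in cluster]
        pop_mean = mean(pop_values)
        pop_sd = pstdev(pop_values)
        cluster_mean = mean(cluster_values)
        z = (cluster_mean - pop_mean) / pop_sd if pop_sd > 0 else 0.0
        profile[feature] = {
            "cluster_mean": cluster_mean,
            "population_mean": pop_mean,
            "z": z,
        }
    return profile


# Toy population of four stores; the values are invented for illustration.
population = [
    {"rs_per_sqft": 160, "frozen_pct": 22},
    {"rs_per_sqft": 160, "frozen_pct": 24},
    {"rs_per_sqft": 260, "frozen_pct": 26},
    {"rs_per_sqft": 260, "frozen_pct": 28},
]
cluster = population[2:]  # pretend the last two stores form one cluster

result = profile_cluster(population, cluster, ["rs_per_sqft", "frozen_pct"])
for feature, stats in result.items():
    print(f"{feature}: cluster {stats['cluster_mean']} vs "
          f"population {stats['population_mean']} (z = {stats['z']:.2f})")
```

Features with a large absolute z are the ones worth calling out as differentiators in a cluster profile.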
5. Cluster Profiling
• Cluster 2
• No of stores in the cluster : 151 , Cluster Strength : 3.10
• The biggest differentiator here is the Tobacco and Alcohol section (27.5% >
23% in the population).
• Average sale is significantly lower than that of the population (163 Rs/sqft <<
210 Rs/sqft).
• Fresh Food sales are also very low compared to the overall population
(32% << 38%).
• Health and Beauty sales are slightly higher than in the population
(16.2% > 13.7%).
6. Cluster Profiling
• Cluster 3
• No of stores in the cluster : 3 , Cluster Strength : 8.33
• Average sales in these stores are far higher than in the population
(471 Rs/sqft > 210 Rs/sqft).
• Tobacco and Alcohol forms a considerably lower share of sales than in the
population (18% < 23%).
• Fresh Foods has a higher share of sales in these stores than in the overall
population.
• Frozen food share is also a little higher than in the population (26.4% >
24.9%).
• Looks to be an outlier group, with very few stores and characteristics
significantly different from the population.
7. Cluster Profiling
• Cluster 4
• No of stores in the cluster : 270. Much higher than the expected no of
stores (100-140) per cluster.
• Cluster Strength : 3.12
• No very significant differences from the characteristics of the whole
population.
• The biggest differentiator is the Fresh Food section (41.4% > 38.4%).
• The Tobacco and Alcohol section accounts for a slightly smaller share of
sales than in the population (21.4% < 22.8%).
• Another clustering iteration is run on the stores of this cluster to get more
detail.
8. Cluster Profiling
• Cluster 4_1
• No of stores in the cluster : 101 , Cluster Strength : 3.26
• Fresh Food share of sales is significantly higher in this cluster than in the
population (45% > 38%).
• Average sales are around 184 Rs/sqft compared to 210 Rs/sqft for the
population.
• Tobacco & Alcohol and Health & Beauty section sales are a tad lower than in
the overall population.
9. Cluster Profiling
• Cluster 4_2
• No of stores in the cluster : 85 , Cluster Strength : 3.42
• It has very similar characteristics to the overall population, both for all
the store sections and for average sales.
10. Cluster Profiling
• Cluster 4_3
• No of stores in the cluster : 84 , Cluster Strength : 3.55
• Frozen Food section sales are a little lower than in the overall population
(20% < 22%).
• Sales/sqft is a little higher than the population for the stores in this
cluster (233 Rs/sqft > 210 Rs/sqft).
11. Recommendations
• Cluster 3, with its 3 stores, is an outlier. On further checking
(as mentioned in the class videos), the cause is found to be an open
area outside the store, used to sell fresh foods, that is not accounted for.
• Hence, the other stores could try selling fresh food items in any
extra open area outside the store.
• We would need more data to analyze why certain stores and sections
have greater sales in different clusters.