Product Cluster Analysis
Presented by: AKSHITHA RAI
Leveraging Data for Competitive Edge
PROJECT DOMAIN
Retail and Wholesale Distribution
 Overview of the Industry: The retail and wholesale
distribution sector plays a pivotal role in the economy,
connecting manufacturers with consumers, encompassing
businesses that sell and distribute goods directly to end-users.
 Challenges and Opportunities: Businesses face diverse
challenges, including inventory management, supply chain
complexities, changing consumer preferences, and market
competition. This project specifically addresses inventory
management challenges such as stockouts and overstocking.
What is Inventory Management?
There are three types of inventory:
 Raw Materials are stock used to make an end product.
 Work in Process consists of the raw materials that are being made into
finished goods.
 Finished Goods are the final products that get produced for sale to
consumers.
Inventory management tracks how much physical inventory you have
in your organization. It monitors stock at other locations, such as
distributors or subcontractors. When you have clear visibility into
your inventory, you know when to order, where to store it, and when
you need to stop selling.
PROJECT OVERVIEW
Primary Objective
Group similar products based on sales patterns
to optimize inventory management and tailor
offerings to better align with customer
preferences.
Benefits
 Streamlined Inventory
Management: Clustering similar products helps
set optimal stock levels by revealing the
specific needs of each product cluster.
 Tailored Product Offerings: Develop
targeted promotional strategies and dynamic
pricing models for each product category.
 Reduced Costs: Minimizing stock-outs or
overstocking can lead to significant cost
savings in warehousing, transportation, and
handling.
DATA DEFINITION
Warehouse and Retail Sales dataset:
Data Columns/Features given:
 YEAR: Calendar Year
 MONTH: Month
 SUPPLIER: Supplier Name
 ITEM CODE: Item code
 ITEM DESCRIPTION: Item Description
 ITEM TYPE: Item Type
 RETAIL SALES: Cases of product sold from DLC
dispensaries
 RETAIL TRANSFERS: Cases of product transferred to
DLC dispensaries
 WAREHOUSE SALES: Cases of product sold to MC
licensees
Total Entries: 307,645
Columns: 9
"Cases" represent a standard packaging unit, such as a case of 12
bottles or a case of 24 units, depending on the specific products and
Alcohol Beverage Services, previously known as the Department of Liquor
Control (DLC), is a government agency within Montgomery County, Maryland,
and is the wholesaler of beer, wine and other alcoholic beverages.
Methodology Overview
01 - Data Cleaning
02 - Data Exploration
03 - Data Pre-processing
04 - Feature Scaling
05 - Apply ML Algorithms
06 - Evaluate the cluster
07 - Clustering Analysis
Data Cleaning Summary
 Missing Values:
 The "SUPPLIER" column has 167 missing values, only 0.0543% of
its entries, so these rows can be dropped.
 Rows with missing values in the "ITEM TYPE" column (1 value) will
also be dropped.
 Duplicates:
 Total duplicates found: 0.
 After Data Cleaning: After handling missing values and removing
duplicates, the "df_clean" DataFrame contains 307,477 entries with 9
columns.
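The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration on a made-up four-row frame, not the project's actual notebook; only the column names and the `df_clean` name come from the slides.

```python
import pandas as pd

# Tiny stand-in for the Warehouse and Retail Sales data (values are made up).
df = pd.DataFrame({
    "SUPPLIER": ["A", None, "B", "C"],
    "ITEM TYPE": ["WINE", "BEER", None, "LIQUOR"],
    "RETAIL SALES": [1.0, 2.0, 0.5, 3.0],
})

# Count missing values per column and express them as a percentage of rows.
missing = df.isna().sum()
missing_pct = (missing / len(df) * 100).round(4)

# Count exact duplicate rows before removing anything.
n_duplicates = int(df.duplicated().sum())

# Drop rows with a missing SUPPLIER or ITEM TYPE, then drop duplicate rows.
df_clean = df.dropna(subset=["SUPPLIER", "ITEM TYPE"]).drop_duplicates()

print(missing_pct)
print(f"duplicates: {n_duplicates}, rows after cleaning: {len(df_clean)}")
```

Reporting the missing share before dropping makes it easy to confirm that the removed rows are a negligible fraction of the data.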
Data Exploration
Item types: 9
Suppliers: 396
Total unique products: 34,039
Unique Item Types and
Frequencies:
WINE: 187,640
LIQUOR: 64,910
BEER: 42,413
KEGS: 10,146
NON-ALCOHOL: 1,899
STR_SUPPLIES: 318
REF: 79
DUNNAGE: 72
 The dataset contains a variety of item types, with wine being the most prevalent,
followed by liquor and beer. Each unique item code corresponds to a distinct
product, showcasing the wide variety of products available in the dataset.
 The dataset comprises a diverse range
of suppliers, reflecting a broad network
of sources for the products included.
 Among all the item types we have,
the 'REF' category remains somewhat
ambiguous in its definition. Let's delve
deeper into its contents to shed light on
the nature of items it encompasses.
Data Exploration
 The "Store Special Wine," "Store
Special Beer Quart," and "Store Special
Liquor" items within the REF category
likely represent specific promotional or
discounted offerings for these beverage
types.
 Moving these items from the REF category
to their respective broader categories
(WINE, LIQUOR, and BEER) could provide
a clearer representation of sales within
each beverage type.
 "Corkscrew" and "Wine Aerator-in
Bottle" are accessories or tools related to
wine consumption rather than actual
alcoholic beverages. So let's rename
'REF' to "Wine Tools".
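One possible way to apply this recategorization in pandas is sketched below. The item descriptions and the keyword mapping are assumptions made for illustration; the exact strings in the dataset may differ.

```python
import pandas as pd

# Made-up REF rows; the real descriptions may be spelled differently.
df = pd.DataFrame({
    "ITEM DESCRIPTION": ["STORE SPECIAL WINE", "STORE SPECIAL BEER QUART",
                         "STORE SPECIAL LIQUOR", "CORKSCREW"],
    "ITEM TYPE": ["REF", "REF", "REF", "REF"],
})

# Hypothetical keyword-to-category mapping for the "Store Special" items.
keyword_to_type = {"WINE": "WINE", "BEER": "BEER", "LIQUOR": "LIQUOR"}

description = df["ITEM DESCRIPTION"].str.upper()
is_ref = df["ITEM TYPE"] == "REF"
is_special = description.str.startswith("STORE SPECIAL")

# Move each "Store Special" item into its broader beverage category.
for keyword, item_type in keyword_to_type.items():
    has_keyword = description.str.contains(keyword, regex=False)
    df.loc[is_ref & is_special & has_keyword, "ITEM TYPE"] = item_type

# Whatever is still labelled REF (corkscrews, aerators) becomes WINE TOOLS.
df["ITEM TYPE"] = df["ITEM TYPE"].replace({"REF": "WINE TOOLS"})

print(df)
```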
Data Exploration  Wine emerges as the
most prevalent item type
in the dataset, with
187,679 occurrences,
indicating its significant
presence in sales data.
 Liquor follows closely
behind wine, with 64,911
occurrences, suggesting
its substantial contribution
to overall sales volume
and consumer preference.
 Beer ranks third in
frequency, with 42,422
occurrences, highlighting
a notable consumer
preference for beer
products.
Data Exploration
 The top suppliers
predominantly belong to
the beer industry, including
renowned brands such
as Miller Brewing Company,
Anheuser Busch Inc, and
Heineken USA. This
suggests that beer products
play a significant role in
driving sales for the business.
 There is also representation
from other beverage
categories, such as wine and
spirits. E & J Gallo Winery
and Diageo North America
Inc are notable suppliers of
wine and spirits, contributing
to the diversity of product
offerings.
Data Exploration
 The correlation coefficient
between retail sales and
warehouse sales is
approximately 0.501. This
indicates a moderate positive
correlation between the two
variables but the relationship is
not extremely strong.
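A correlation coefficient like this one can be computed with pandas as sketched below. The five rows are made up, so the result will not match the 0.501 reported above; the sketch also assumes the Pearson coefficient, which is the pandas default.

```python
import pandas as pd

# Made-up values standing in for the two sales columns.
df = pd.DataFrame({
    "RETAIL SALES": [0.0, 1.0, 2.0, 3.0, 4.0],
    "WAREHOUSE SALES": [1.0, 0.0, 4.0, 2.0, 6.0],
})

# Series.corr uses the Pearson correlation coefficient by default.
r = df["RETAIL SALES"].corr(df["WAREHOUSE SALES"])

print(f"correlation: {r:.3f}")
```

Values near 0.5 mean the two columns tend to rise together, but one explains only part of the variation in the other.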
Data Exploration - RETAIL SALES
 Liquor products (whiskey, brandy,
vodka, rum, gin and tequila)
demonstrate the highest retail sales,
suggesting a considerable demand,
potentially driven by factors like taste
preferences or social trends.
 Wine and beer closely trail liquor in
retail sales, indicating their enduring
popularity and consumer appeal within
the market.
 Non-alcoholic beverages make a
modest contribution to retail sales,
reflecting a segment of the market that
caters to consumers seeking
alternatives to alcoholic beverages,
perhaps driven by health-conscious
choices or personal preferences.
"RETAIL SALES" typically refer to the cases of products sold directly to customers
from DLC (Department of Liquor Control) dispensaries or retail outlets.
Data Exploration - RETAIL TRANSFERS
 The analysis highlights the varying
degrees of demand across different
beverage categories, with alcoholic
beverages such as liquor, wine, and
beer leading in terms of both retail
transfers and sales.
 Non-alcoholic beverages, although
contributing less to total retail transfers
and sales compared to alcoholic
beverages, still show notable
demand.
Let's examine the items under the
STR_SUPPLIES category to
understand why retail transfers
surge compared to retail sales
for STR supplies.
Data Exploration
These observations suggest a diverse range of
products being transferred, with a notable emphasis
on packaging materials like paper bags and thermal
register paper.
 Paper bags in various sizes (12LB, 20LB, quarts,
pints, 1/6 barrel) are the top-selling items in terms of
retail transfers, indicating a significant demand for
packaging materials. Using paper bags with branded
logos or designs can contribute to a store's branding
and image.
 Single bottle wine gift totes are also popular,
suggesting a preference for gifting wine.
 Thermal register paper is also among the top-selling
items; it plays a crucial role in retail operations by
enabling efficient and reliable receipt printing at the
point of sale.
 Other items like shot glasses, plastic bags, wine
Data Exploration - WAREHOUSE SALES
 Beer emerges as the top-performing
item type in terms of warehouse
sales. This dominance may be
attributed to several factors,
including its popularity among
consumers, diverse product
offerings, and its relatively longer
shelf life compared to some other
alcoholic beverages.
 Despite being second to beer, wine
still makes a substantial contribution
to warehouse sales.
 KEGS demonstrate considerable
demand and play a crucial role in the
distribution of draft beer to bars,
restaurants, and other
establishments.
"WAREHOUSE SALES" refer to cases of products sold to MC (Montgomery County)
licensees, which are establishments authorized to sell alcohol for consumption on their
premises. These sales are typically made in bulk to businesses such as bars, restaurants,
clubs, and hotels for resale to their customers.
Data Exploration
 These item descriptions for 'DUNNAGE' suggest that they
represent various sizes of empty kegs used for storing and
dispensing beverages. In inventory management, these items
would be categorized as dunnage, which refers to materials
used for packaging, storing, or transporting goods to prevent
damage.
 The negative values in the "WAREHOUSE SALES" column for
dunnage items indicate a decrease in inventory, likely due to
returns, exchanges, or damage to the empty kegs.
 Dunnage items have zero retail sales and retail transfers and
primarily represent inventory management for kegs. It may
therefore be appropriate to remove them from the ITEM TYPE
values before applying clustering algorithms, as they do not
contribute to the primary objectives of analyzing sales patterns
and optimizing inventory management.
Data Preprocessing
 Let's drop the ITEM DESCRIPTION,
SUPPLIER, and ITEM CODE columns as
they are categorical and may not directly
contribute to the clustering process.
 Let's also drop the 'YEAR' and 'MONTH'
columns as the given data is not evenly
distributed across 2017, 2018, 2019, and
2020. Uneven distribution of data across
years and months can introduce noise and
bias into the clustering process, potentially
leading to misleading insights.
 To prepare the data for clustering analysis,
we performed dummy encoding on the
categorical 'ITEM TYPE' column using the
pd.get_dummies() function. The dummy
encoded columns allow the inclusion of
categorical information (item types) in the
numerical data, ensuring that this
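The preprocessing steps on this slide, together with the earlier suggestion to remove dunnage rows, can be sketched as below. The three-row frame is made up, and the `features` variable name is an assumption; only the column names and `pd.get_dummies()` come from the slides.

```python
import pandas as pd

# Made-up rows with the nine columns described in the data definition.
df = pd.DataFrame({
    "YEAR": [2019, 2019, 2020],
    "MONTH": [1, 2, 1],
    "SUPPLIER": ["A", "B", "C"],
    "ITEM CODE": ["100", "200", "300"],
    "ITEM DESCRIPTION": ["X", "Y", "Z"],
    "ITEM TYPE": ["WINE", "BEER", "DUNNAGE"],
    "RETAIL SALES": [1.0, 2.0, 0.0],
    "RETAIL TRANSFERS": [1.0, 3.0, 0.0],
    "WAREHOUSE SALES": [4.0, 10.0, -2.0],
})

# Remove dunnage rows (empty-keg bookkeeping) before clustering.
df = df[df["ITEM TYPE"] != "DUNNAGE"]

# Drop identifier and date columns that are not used as clustering features.
features = df.drop(
    columns=["YEAR", "MONTH", "SUPPLIER", "ITEM CODE", "ITEM DESCRIPTION"]
)

# Dummy-encode ITEM TYPE: one 0/1 indicator column per item type.
features = pd.get_dummies(features, columns=["ITEM TYPE"], dtype=int)

print(features)
```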
Feature Scaling
 Since clustering algorithms are sensitive to the scale of the
features, it's essential to scale the numerical features before
applying clustering algorithms.
 We applied MinMax scaling to the numerical features 'RETAIL SALES',
'RETAIL TRANSFERS', and 'WAREHOUSE SALES' so that each lies on a
consistent scale between 0 and 1. This scaling preserves the
relative ordering and spacing of values within each feature and
keeps any single sales metric from dominating the distance
calculations, which allows meaningful comparisons across the
different sales metrics.
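MinMax scaling maps each value x in a column to (x - min) / (max - min). A minimal sketch using scikit-learn's `MinMaxScaler` is shown below; the slides do not say which implementation was used, so the library choice and the sample values are assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up rows: RETAIL SALES, RETAIL TRANSFERS, WAREHOUSE SALES.
X = np.array([
    [0.0, 10.0, -5.0],
    [5.0, 20.0, 0.0],
    [10.0, 40.0, 15.0],
])

# The default feature_range is (0, 1); each column is scaled independently.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled)
```

Because the minimum and maximum are taken per column, a few extreme values in one column can compress the rest of that column toward 0, which is worth checking on heavily skewed sales data.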
CLUSTERING: K-Means Algorithm
K-Means is a centroid-based (distance-based)
algorithm: each point is assigned to the
cluster whose centroid is nearest. Each
cluster is associated with a centroid, and
the main objective of the K-Means algorithm
is to minimize the sum of squared distances
between the points and their respective
cluster centroids.
To perform K-Means clustering, we must
first specify the desired number of clusters
K. Two common ways to choose it are the
Elbow Method and the Silhouette Method.
Using the elbow plot, we obtained K value
= 4. Let's verify using the Silhouette score.
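The elbow method fits K-Means for a range of K values and looks for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The sketch below uses scikit-learn on synthetic data with four well-separated groups; the data, the K range, and the parameter choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data with four well-separated groups (not the sales data).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

# Fit K-Means for K = 1..8 and record the inertia of each fit.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting `inertias` against K gives the elbow plot; the bend is read off by eye, which is one reason to cross-check it with a second criterion.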
CLUSTERING: K-Means Algorithm
The Silhouette score is calculated for each
data point and then averaged across all
data points.
It measures how similar an object is to its
own cluster (cohesion) compared to other
clusters (separation).
It ranges from -1 to 1. A silhouette score
close to 1 indicates dense, well-separated
clusters, while a score close to -1 indicates
that the data point may have been assigned
to the wrong cluster.
Peak at 6 clusters: The highest silhouette score is at 6
clusters (0.9926370991740481), indicating that this
configuration is the best in terms of clustering quality
according to the silhouette score.
The silhouette score slightly decreases when moving to 7
clusters (0.9882040308746644), suggesting that adding
the seventh cluster doesn't improve the clustering.
Let’s choose K value=6 for clustering.
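The selection of K by silhouette score can be sketched as below, again on synthetic data with four groups rather than the sales data, so the best K found here is 4 and not the 6 reported above. The K range and parameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with four well-separated groups (not the sales data).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

# Average silhouette score for each candidate K (needs at least 2 clusters).
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best K:", best_k)
```

On a dataset of several hundred thousand rows the full silhouette computation is expensive; `silhouette_score` accepts a `sample_size` argument to score a random subset instead.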
CLUSTERING: K-Means Algorithm
Clustering Evaluation Metrics:
After building the K-Means model and assigning the
cluster labels, we assess the clustering results with
the Davies-Bouldin Index (DBI) and the Calinski-Harabasz
Index in addition to the silhouette score.
Lower DBI values indicate better clustering quality,
as they suggest that clusters are compact and
well-separated.
Higher values of the Calinski-Harabasz Index
indicate better-defined, more compact clusters.
Both metrics strongly suggest that the K-
means clustering on the dataset is highly
effective, resulting in distinct and
cohesive clusters. This indicates that
the choice of the number of clusters and
the algorithm’s performance on this
dataset are very good.
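All three evaluation metrics are available in scikit-learn and take the feature matrix plus the cluster labels. The sketch below runs them on synthetic data; the numbers it prints are unrelated to the project's results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic 2-D data with four well-separated groups (not the sales data).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

metrics = {
    "silhouette": silhouette_score(X, labels),                # higher is better
    "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
}
print(metrics)
```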
CLUSTERING: Mini Batch K-Means
Mini Batch K-Means clustering is a variation of
the traditional K-Means algorithm that processes
small batches of data at a time, making it
computationally efficient for large datasets.
The results suggest that for the given dataset, the
standard K-Means algorithm was able to identify
more well-defined, distinct, and compact clusters
compared to Mini Batch K-Means, as evidenced by
the higher Silhouette Score, lower Davies-Bouldin
Index, and higher Calinski-Harabasz Index.
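A side-by-side comparison of the two algorithms might look like the sketch below. It uses synthetic data and an arbitrary `batch_size`, both of which are assumptions; on easy data like this the two variants often score almost identically, so it illustrates the procedure rather than the reported outcome.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with four well-separated groups (not the sales data).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

models = {
    "KMeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    "MiniBatchKMeans": MiniBatchKMeans(
        n_clusters=4, batch_size=64, n_init=3, random_state=0
    ),
}

# Fit each model on the same data and score its labels the same way.
results = {
    name: silhouette_score(X, model.fit_predict(X))
    for name, model in models.items()
}
print(results)
```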
CLUSTERING: Mean Shift Algorithm
Mean-shift clustering is a density-based clustering method that
focuses on finding the regions of high density and iteratively
shifting data points towards the highest density of points. The
algorithm does not require any prior information about the
number of clusters present in the data.
Mean Shift splits the LIQUOR and BEER
clusters into smaller sub-clusters, which might
be less desirable considering our objective to
group similar items.
Silhouette Score and Davies-Bouldin Index favor
K-Means, suggesting it creates more well-
defined and better-separated clusters. Calinski-
Harabasz Index slightly favors Mean Shift.
Overall, K-Means appears to be the better
clustering method based on these metrics and
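Mean Shift can be run with scikit-learn as sketched below. The bandwidth is estimated from the data, and the `quantile` value is an assumption chosen for this synthetic example; a different bandwidth changes how many clusters are found, which is one way sub-clusters like those described above can appear.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Synthetic 2-D data with four well-separated groups (not the sales data).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

# No cluster count is supplied; the kernel bandwidth controls the granularity.
bandwidth = estimate_bandwidth(X, quantile=0.2, random_state=0)
labels = MeanShift(bandwidth=bandwidth).fit_predict(X)

n_found = len(np.unique(labels))
print("bandwidth:", round(float(bandwidth), 3), "clusters found:", n_found)
```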
Cluster Analysis
Based on the evaluation metrics, K-means demonstrated the best performance
among the clustering algorithms tested. Therefore, we will conduct our cluster
analysis using the results from the K-means algorithm.
Cluster 0 (WINE): Cluster 0 primarily consists of wine
products. This cluster likely represents customers with a
preference for wine products.
Cluster 1 (LIQUOR): Cluster 1 is characterized by liquor
products. This cluster may indicate a distinct group of
customers who prefer liquor items.
Cluster 2 (BEER): Cluster 2 primarily includes beer products.
This cluster likely represents customers who have a
preference for beer.
Cluster 3 (KEGS): Cluster 3 is associated with kegs. Kegs
are designed for larger quantities of beverages and are
typically purchased by commercial establishments like bars,
restaurants, and event organizers, rather than individual
customers directly.
Cluster Analysis
Cluster 4 (NON-ALCOHOL): Cluster 4 comprises non-alcoholic products.
This cluster may indicate customers who prefer non-alcoholic beverages
to avoid alcohol for health, religious, or personal reasons. This niche
market may have a smaller customer base compared to the overall
population.
Cluster 5 (STR_SUPPLIES, WINE TOOLS): The STR_SUPPLIES
category includes a variety of storage supplies, such as paper bags,
thermal register paper, and plastic bags, used for packaging and
transporting products. Given how these items are used, the category is
more likely to be purchased by retailers or establishments than by
individual customers. The presence of wine tools alongside storage
supplies in Cluster 5 also suggests that these purchases are made by
commercial entities, since businesses often require wine tools for
serving and preparing wine for their customers.
Cluster Analysis
 Beer has the highest total sales across retail sales, retail transfers, and warehouse sales, at
7,101,470.19 units. This suggests beer is the top-selling product category.
 Wine has the second-highest total sales at 1,903,693.02 units, with a significant
portion (1,156,984.91 units) coming from warehouse sales. This implies wine is a
popular product category for wholesale distribution.
 Liquor has the third-highest total sales at 897,599.52 units, but the majority of
these sales (802,693.25 units) are from retail stores rather than warehouses. This
indicates liquor is more popular in retail locations.
 Kegs have 118,430 units in warehouse sales but 0 retail sales, indicating they are
only sold wholesale and not directly to consumers.
 Non-alcoholic beverages have relatively low total sales at 53,299.9 units, suggesting
they are a minor product category compared to beer, liquor and wine.
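Totals like these can be produced by grouping the labelled rows by cluster and summing the three sales columns. The sketch below uses made-up rows and a hypothetical `cluster` column holding the K-Means labels.

```python
import pandas as pd

sales_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# Made-up rows; "cluster" is a hypothetical column of K-Means labels.
df = pd.DataFrame({
    "cluster": [0, 0, 1, 2, 2, 3],
    "RETAIL SALES": [2.0, 1.0, 4.0, 3.0, 5.0, 0.0],
    "RETAIL TRANSFERS": [2.0, 1.0, 4.0, 3.0, 4.0, 0.0],
    "WAREHOUSE SALES": [6.0, 3.0, 1.0, 20.0, 30.0, 7.0],
})

# Sum each sales channel per cluster, then add an overall total.
totals = df.groupby("cluster")[sales_cols].sum()
totals["TOTAL"] = totals.sum(axis=1)

# Share of each cluster's total that comes from warehouse sales.
totals["WAREHOUSE SHARE"] = totals["WAREHOUSE SALES"] / totals["TOTAL"]

ranked = totals.sort_values("TOTAL", ascending=False)
print(ranked)
```

A warehouse share of 1.0 marks a cluster that is sold only through the wholesale channel, as described for kegs.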
Cluster Analysis
Clusters where the total sales are among the
top 25%, include cluster 0 [WINE] and cluster
2 [BEER] indicating high demand.
This insight is valuable for inventory
management, marketing strategies, and product
planning, as businesses can focus on stocking
and promoting beer and wine products to
capitalize on their popularity among consumers.
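The "top 25%" cut can be expressed as the 75th percentile of the per-cluster totals. The values below are placeholders chosen only to illustrate the calculation; they are not the project's figures.

```python
import pandas as pd

# Placeholder per-cluster totals (illustrative values, not the real figures).
totals = pd.Series({
    "WINE": 19.0,
    "LIQUOR": 9.0,
    "BEER": 71.0,
    "KEGS": 1.2,
    "NON-ALCOHOL": 0.5,
    "STR_SUPPLIES / WINE TOOLS": 0.1,
})

# Clusters at or above the 75th percentile of total sales.
threshold = totals.quantile(0.75)
top_clusters = totals[totals >= threshold].index.tolist()

print("threshold:", threshold)
print("top clusters:", top_clusters)
```

With six clusters, the 75th percentile falls between the second and third largest totals, so two clusters pass the cut.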
Let’s analyze seasonal decomposition plots for
RETAIL SALES, RETAIL TRANSFERS, and
WAREHOUSE SALES to understand seasonal
effects and overall trends in each cluster. These
insights will help us grasp the nuances driving our
sales patterns, aiding better decision-making and
strategy.
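The idea behind a seasonal decomposition can be sketched with pandas alone: estimate the trend with a centred 12-month moving average, subtract it, and average what remains by calendar month. This is a simplified stand-in on a synthetic monthly series, assuming an additive model; the slides do not say which tool produced the plots (a library routine such as statsmodels' `seasonal_decompose` is a common choice).

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a December peak (made-up values).
idx = pd.date_range("2017-01-01", periods=48, freq="MS")
values = np.linspace(100, 147, 48) + np.where(idx.month == 12, 30.0, 0.0)
sales = pd.Series(values, index=idx)

# Trend: centred 2x12 moving average, the classical choice for monthly data.
trend = sales.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal component: mean detrended value per calendar month,
# centred so the twelve monthly effects sum to zero.
detrended = sales - trend
seasonal_by_month = detrended.groupby(detrended.index.month).mean()
seasonal_by_month -= seasonal_by_month.mean()

peak_month = int(seasonal_by_month.idxmax())
print(seasonal_by_month.round(2))
print("peak month:", peak_month)
```

Running this per cluster and per sales column gives the trend and seasonality readings summarized on the next slide.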
Cluster Analysis
Cluster 0 (WINE):
o Overall Trend: Stable retail sales and transfers; slight upward trend in warehouse sales.
o Seasonality: Peak sales in November and December, lower sales in January and February. Holiday
shopping drives increased demand.
Cluster 1 (LIQUOR):
o Overall Trend: Upward trends in retail, transfers, and warehouse sales.
o Seasonality: Fluctuations throughout the year; significant peaks in November and December during
holiday seasons, indicating higher demand.
Cluster 2 (BEER):
o Overall Trend: Mixed trends in retail, transfers, and warehouse sales.
o Seasonality: Peaks during specific months, possibly tied to seasonal events or consumer behaviors.
Cluster 3 (KEGS):
o Overall Trend: The overall trend suggests a focus on warehouse distribution for KEGS within this cluster,
highlighting a specific market approach for these products.
o Seasonality: Warehouse sales for KEGS show fluctuations but generally remain at moderate levels.
Power BI Dashboard
Recommendations For Strategy
Optimization
Focus on Beer and Wine Products:
Beer and wine products consistently show high demand across retail, transfers, and warehouse sales.
Allocate more resources towards stocking and promoting these items to capitalize on their popularity
among consumers.
Seasonal Planning:
Adjust inventory levels based on seasonal trends observed in each cluster. For example, increase stock
of wine and liquor products leading up to the holiday season when demand peaks, while optimizing
inventory for non-alcoholic beverages during warmer months or alongside health-conscious trends.
Retail Strategy Alignment:
Tailor retail strategies to align with consumer behavior and preferences observed in each cluster. For
instance, offer promotions or themed events around wine and liquor during peak demand periods, while
emphasizing convenience and variety for non-alcoholic beverages.
Wholesale Distribution Optimization:
Optimize warehouse distribution channels based on product categories. Since kegs predominantly sell
through wholesale channels, streamline distribution processes to ensure efficient supply to commercial
establishments while monitoring trends to anticipate demand fluctuations.
Diversification:
Consider diversifying product offerings within clusters or specializing in certain categories based on
market demand and profitability. For instance, within the wine cluster, explore niche or premium wine
Recommendations For Strategy
Optimization
Data-Driven Decision Making:
Continuously analyze sales data and consumer trends to make informed inventory management
decisions. Utilize advanced analytics tools to forecast demand, identify emerging trends, and optimize
inventory levels to minimize stockouts and overstock situations.
Enhanced Marketing and Promotions:
Develop targeted marketing campaigns and promotions to drive sales for specific product categories
within each cluster. Leverage consumer insights to tailor messaging and offers that resonate with target
audiences, increasing engagement and conversion rates.
Supplier Collaboration:
Work closely with suppliers to establish flexible and responsive supply chains.
This can include sharing demand forecasts and collaborating on inventory planning to ensure timely
delivery without overstocking.
Regular Inventory Audits:
Conduct regular audits to identify slow-moving or obsolete inventory and take
proactive measures to liquidate or reduce it. This prevents tying up capital in products that are unlikely to
be sold.
Efficient Storage and Warehouse Management:
Optimize warehouse layout and storage practices to
maximize space utilization and minimize holding costs.
CONCLUSION
Benefits
Recap of project objective:
Throughout this project, our primary objective was to leverage data analysis to
optimize inventory management processes and tailor our product offerings to
better align with customer preferences.
Achievements:
 Identified high-demand product categories (beer and wine).
 Determined seasonal trends and their impact on sales.
 Developed actionable insights for inventory management and retail strategies.

Product Cluster Analysis: Unveiling Hidden Customer Preferences

  • 1.
    Product Cluster Analysis Presented by: AKSHITHA RAI Leveraging Data for Competitive Edge
  • 2.
    PROJECT DOMAIN Retail andWholesale Distribution  Overview of the Industry: The retail and wholesale distribution sector plays a pivotal role in the economy, connecting manufacturers with consumers, encompassing businesses that sell and distribute goods directly to end-users.  Challenges and Opportunities: Businesses face diverse challenges including inventory management, including supply chain complexities, changing consumer preferences, and market competition. However, this project is specifically geared towards addressing inventory management challenges, such as stockouts and overstocking.
  • 3.
    What is InventoryManagement? There are three types of inventory:  Raw Materials are stock used to make an end product.  Work in Process consists of the raw materials that are being made into finished goods.  Finished Goods are the final products that get produced for sale to consumers. Inventory management tracks how much physical inventory you have in your organization. It monitors stock at other locations, such as distributors or subcontractors. When you have clear visibility into your inventory, you know when to order, where to store it, and when you need to stop selling.
  • 4.
    PROJECT OVERVIEW Primary Objective Benefits Groupsimilar products based on sales patterns to optimize inventory management and tailor offerings to better align with customer preferences.  Streamlined Inventory Management: Clustering similar products for optimal stock levels. By understanding the specific needs of each product cluster.  Tailored Product Offerings: Develop targeted promotional strategies and dynamic pricing models for each product category.  Reduced Costs: Minimizing stock-outs or overstocking can lead to significant cost savings in warehousing, transportation, and handling.
  • 5.
    DATA DEFINITION Warehouse andRetail Sales dataset: Data Columns/Features given:  YEAR: Calendar Year  MONTH: Month  SUPPLIER: Supplier Name  ITEM CODE: Item code  ITEM DESCRIPTION: Item Description  ITEM TYPE: Item Type  RETAIL SALES: Cases of product sold from DLC dispensaries  RETAIL TRANSFERS: Cases of product transferred to DLC dispensaries  WAREHOUSE SALES: Cases of product sold to MC licensees Total Entries: 307,645 Columns: 9 “Cases" represent a standard packaging unit, such as a case of 12 bottles or a case of 24 units, depending on the specific products and Alcohol Beverage Services, previously known as the Department of Liquor Control (DLC) is a government agency within the County of Montgomery, Maryland and is the wholesaler of beer, wine and alcoholic beverages.
  • 6.
    01 - DataCleaning 02 - Data Exploration Methodology Overview 04 - Feature Scaling 06 - Evaluate the cluster 07 - 07- Clustering Analysis 03 - Data Pre-processing 05 - Apply ML Algorithms
  • 7.
    Data Cleaning Summary Missing Values:  There are missing values in the "SUPPLIER" column (167 missing values), which account for only 0.0543% of the values in the "Supplier" column, hence can be dropped.  Rows with missing values in the "ITEM TYPE" column (1 value) will also be dropped.  Duplicates:  Total duplicates found: 0.  After Data Cleaning: After handling missing values and removing duplicates, the "df_clean" DataFrame contains 307,477 entries with 9 columns.
  • 8.
    Data Exploration 9 ITEM TYPES 396 SUPPLIERS 34039 TOTAL UNIQUE PRODUCTS UniqueItem Types and Frequencies: WINE: 187,640 LIQUOR: 64,910 BEER: 42,413 KEGS: 10,146 NON-ALCOHOL: 1,899 STR_SUPPLIES: 318 REF: 79 DUNNAGE: 72  The dataset contains a variety of item types, with wine being the most prevalent, followed by liquor and beer. Each unique item code corresponds to a distinct product, showcasing the wide variety of products available in the dataset.  The dataset comprises a diverse range of suppliers, reflecting a broad network of sources for the products included.  Among all the item types we have, the 'REF' category remains somewhat ambiguous in its definition. Let's delve deeper into its contents to shed light on the nature of items it encompasses.
  • 9.
    Data Exploration  The"Store Special Wine," "Store Special Beer Quart," and "Store Special Liquor" items within the REF category likely represent specific promotional or discounted offerings for these beverage types.  Moving these items from the REF category to their respective broader categories (WINE, LIQUOR, and BEER) could provide a clearer representation of sales within each beverage type.  "Corkscrew" and "Wine Aerator-in Bottle" are accessories or tools related to wine consumption rather than actual alcoholic beverages. So let's rename 'REF' to "Wine Tools".
  • 10.
    Data Exploration Wine emerges as the most prevalent item type in the dataset, with 187,679 occurrences, indicating its significant presence in sales data.  Liquor follows closely behind wine, with 64,911 occurrences, suggesting its substantial contribution to overall sales volume and consumer preference.  Beer ranks third in frequency, with 42,422 occurrences, highlighting a notable consumer preference for beer products.
  • 11.
    Data Exploration The top suppliers predominantly belong to the beer industry, including renowned brands such as Miller Brewing Company, Anheuser Busch Inc, and Heineken USA. This suggests that beer products play a significant role in driving sales for the business.  There is also representation from other beverage categories, such as wine and spirits. E & J Gallo Winery and Diageo North America Inc are notable suppliers of wine and spirits, contributing to the diversity of product offerings.
  • 12.
    Data Exploration  Thecorrelation coefficient between retail sales and warehouse sales is approximately 0.501. This indicates a moderate positive correlation between the two variables but the relationship is not extremely strong.
  • 13.
    Data Exploration-RETAIL SALES  Liquorproducts (whiskey, brandy, vodka, rum, gin and tequila) demonstrate the highest retail sales, suggesting a considerable demand, potentially driven by factors like taste preferences or social trends.  Wine and beer closely trail liquor in retail sales, indicating their enduring popularity and consumer appeal within the market.  Non-alcoholic beverages make a modest contribution to retail sales, reflecting a segment of the market that caters to consumers seeking alternatives to alcoholic beverages, perhaps driven by health-conscious choices or personal preferences. "RETAIL SALES" typically refer to the cases of products sold directly to customers from DLC (Department of Liquor Control) dispensaries or retail outlets.
  • 14.
    Data Exploration- RETAIL TRANSFER The analysis highlights the varying degrees of demand across different beverage categories, with alcoholic beverages such as liquor, wine, and beer leading in terms of both retail transfers and sales.  Non-alcoholic beverages, although contributing less to total retail transfers and sales compared to alcoholic beverages, still show notable demand. Let's examine the items under STR_SUPPLIES category to understand why there is a surge in retail transfers compared to retail sales for STR supplies .
  • 15.
    Data Exploration These observationssuggest a diverse range of products being transferred, with a notable emphasis on packaging materials like paper bags and thermal register paper.  Paper bags in various sizes (12LB, 20LB, quarts, pints, 1/6 barrel) are the top-selling items in terms of retail transfers, indicating a significant demand for packaging materials. Using paper bags with branded logos or designs can contribute to a store's branding and image.  Single bottle wine gift totes are also popular, suggesting a preference for gifting wine.  Thermal register paper is also among the top- selling items, it plays a crucial role in retail operations by facilitating efficient and reliable receipt printing at the point of sale.  Other items like shot glasses, plastic bags, wine
  • 16.
    Data Exploration-Warehouse SALES  Beeremerges as the top-performing item type in terms of warehouse sales. This dominance may be attributed to several factors, including its popularity among consumers, diverse product offerings, and its relatively longer shelf life compared to some other alcoholic beverages.  Despite being second to beer, wine still makes a substantial contribution to warehouse sales.  KEGS demonstrate considerable demand and play a crucial role in the distribution of draft beer to bars, restaurants, and other establishments. "WAREHOUSE SALES" refer to cases of products sold to MC (Montgomery County) licensees, which are establishments authorized to sell alcohol for consumption on their premises. These sales are typically made in bulk to businesses such as bars, restaurants, clubs, and hotels for resale to their customers.
  • 17.
    Data Exploration  Theseitem descriptions for 'DUNNAGE' suggest that they represent various sizes of empty kegs used for storing and dispensing beverages. In inventory management, these items would be categorized as dunnage, which refers to materials used for packaging, storing, or transporting goods to prevent damage.  The negative values in the "WAREHOUSE SALES" column for dunnage items indicate a decrease in inventory, likely due to returns, exchanges, or damage to the empty kegs.  Since dunnage items have zero retail sales and retail transfers, they primarily represent inventory management for kegs, it may be appropriate to remove them from the ITEM TYPES before applying clustering algorithms, as they do not contribute to the primary objectives of analyzing sales patterns and optimizing inventory management.
  • 18.
    Data Preprocessing  Let'sdrop the ITEM DESCRIPTION, SUPPLIER, and ITEM CODE columns as they are categorical and may not directly contribute to the clustering process.  Let's also drop the 'YEAR' and 'MONTH' columns as the given data is not evenly distributed across 2017, 2018, 2019, and 2020. Uneven distribution of data across years and months can introduce noise and bias into the clustering process, potentially leading to misleading insights.  To prepare the data for clustering analysis, we performed dummy encoding on the categorical 'ITEM TYPE' column using the pd.get_dummies() function. The dummy encoded columns allow the inclusion of categorical information (item types) in the numerical data, ensuring that this
  • 19.
    Feature Scaling  Distance-based clustering algorithms are sensitive to the scale of the features, so the numerical features must be scaled before clustering. We applied MinMax scaling to the numerical features 'RETAIL SALES', 'RETAIL TRANSFERS', and 'WAREHOUSE SALES', so that each of these variables lies on a consistent scale between 0 and 1. This scaling preserves the relative ordering and spacing of values within each feature and prevents any single sales metric from dominating the distance calculations, which supports meaningful comparisons across the different sales metrics.
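A minimal sketch of the MinMax scaling step, assuming scikit-learn's MinMaxScaler was the tool used (the deck names the technique but not the library). The sample values are invented; each column is mapped to [0, 1] via (x - min) / (max - min).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

num_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# Illustrative values only; note the three columns have very different ranges.
df = pd.DataFrame({
    "RETAIL SALES": [0.0, 5.0, 10.0],
    "RETAIL TRANSFERS": [2.0, 4.0, 6.0],
    "WAREHOUSE SALES": [-4.0, 96.0, 196.0],
})

# Fit on the numeric columns and overwrite them with their scaled versions.
scaler = MinMaxScaler()
df[num_cols] = scaler.fit_transform(df[num_cols])

print(df)
```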
  • 20.
    CLUSTERING: K-Means Algorithm K-means is a centroid-based (distance-based) algorithm in which each point is assigned to the cluster whose centroid is nearest. Each cluster is associated with one centroid, and the objective of the algorithm is to minimize the sum of squared distances between the points and their respective cluster centroids. To perform K-means clustering, we must first specify the desired number of clusters K. We used two methods to choose it: the Elbow Method and the Silhouette Method. Using the elbow plot, we obtained K = 4. Let's verify this using the Silhouette score.
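The elbow method can be sketched as below: fit K-means for a range of K and record the inertia (the within-cluster sum of squared distances), then look for the K where the curve flattens. The data here is a synthetic stand-in with four blobs, not the project's scaled feature matrix, and the range of K is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled feature matrix: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

# Plotting these values against k gives the elbow plot.
for k, value in inertias.items():
    print(k, round(value, 2))
```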
  • 21.
    CLUSTERING: K-Means Algorithm The Silhouette score is calculated for each data point and then averaged across all data points. It measures how similar a point is to its own cluster (cohesion) compared to other clusters (separation). It ranges from -1 to 1. An average score close to 1 indicates dense, well-separated clusters, while a score close to -1 for a data point indicates that it may have been assigned to the wrong cluster. Peak at 6 clusters: The highest silhouette score is at 6 clusters (0.9926370991740481), indicating that this configuration is the best in terms of clustering quality according to the silhouette score. The silhouette score decreases slightly when moving to 7 clusters (0.9882040308746644), suggesting that adding a seventh cluster doesn't improve the clustering. Although the elbow plot suggested K = 4, we choose K = 6 for clustering based on the silhouette score.
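A hedged sketch of the silhouette check: compute the average silhouette score for each candidate K and keep the K with the highest score. The data is synthetic (four blobs), so the best K here is 4 by construction and does not reproduce the K = 6 result reported for the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

scores = {}
for k in range(2, 9):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean over all points

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```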
  • 22.
    CLUSTERING: K-Means Algorithm Clustering Evaluation Metrics: After building the K-means model and assigning the cluster labels, we use the Davies-Bouldin Index (DBI) and the Calinski-Harabasz Index, along with the silhouette score, to assess the clustering results. Lower DBI values indicate better clustering quality, as they suggest that clusters are compact and well-separated. Higher values of the Calinski-Harabasz index indicate better-defined, more compact clusters. Both metrics strongly suggest that the K-means clustering on the dataset is highly effective, resulting in distinct and cohesive clusters. This indicates that the choice of the number of clusters and the algorithm's performance on this dataset are very good.
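Both indices are available in scikit-learn. The sketch below, on synthetic four-blob data, compares a deliberately poor choice (K = 2) with the natural choice (K = 4) to show the direction of each metric; the numbers it prints are unrelated to the project's results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

# Synthetic stand-in data: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

metrics = {}
for k in (2, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    metrics[k] = {
        "dbi": davies_bouldin_score(X, labels),    # lower is better
        "ch": calinski_harabasz_score(X, labels),  # higher is better
    }

print(metrics)
```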
  • 23.
    CLUSTERING: Mini Batch K-Means Mini Batch K-Means clustering is a variation of the traditional K-Means algorithm that processes small batches of data at a time, making it computationally efficient for large datasets. The results suggest that for the given dataset, the standard K-Means algorithm was able to identify more well-defined, distinct, and compact clusters compared to Mini Batch K-Means, as evidenced by the higher Silhouette Score, lower Davies-Bouldin Index, and higher Calinski-Harabasz Index.
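A side-by-side comparison like the one described could be set up as follows. This is only a sketch on synthetic data: the cluster count, batch size, and random seed are assumptions chosen for illustration, and on such easy data the two algorithms may score almost identically.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in data: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

models = {
    "KMeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    # batch_size=64 is an arbitrary illustrative value.
    "MiniBatchKMeans": MiniBatchKMeans(n_clusters=4, batch_size=64, n_init=3, random_state=0),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = {
        "silhouette": silhouette_score(X, labels),
        "dbi": davies_bouldin_score(X, labels),
        "ch": calinski_harabasz_score(X, labels),
    }

for name, scores in results.items():
    print(name, scores)
```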
  • 24.
    CLUSTERING: Mean Shift Algorithm Mean-shift clustering is a density-based clustering method that focuses on finding the regions of high density and iteratively shifting data points towards the highest density of points. The algorithm does not require any prior information about the number of clusters present in the data. Mean Shift splits the LIQUOR and BEER clusters into smaller sub-clusters, which might be less desirable considering our objective to group similar items. Silhouette Score and Davies-Bouldin Index favor K-Means, suggesting it creates more well-defined and better-separated clusters. Calinski-Harabasz Index slightly favors Mean Shift. Overall, K-Means appears to be the better clustering method based on these metrics and
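For reference, a minimal Mean Shift run with scikit-learn might look like this. The bandwidth is estimated from the data with estimate_bandwidth; the quantile value is an assumption, and the number of clusters the algorithm finds depends strongly on it. The data is synthetic, not the project's.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Synthetic stand-in data: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

# The kernel bandwidth controls how many density peaks are found;
# quantile=0.2 is an illustrative choice, not a value from the deck.
bandwidth = estimate_bandwidth(X, quantile=0.2, random_state=0)

ms = MeanShift(bandwidth=bandwidth).fit(X)
n_clusters = len(np.unique(ms.labels_))  # discovered, not specified up front

print(n_clusters)
```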
  • 25.
    Cluster Analysis Based on the evaluation metrics, K-means demonstrated the best performance among the clustering algorithms tested. Therefore, we will conduct our cluster analysis using the results from the K-means algorithm. Cluster 0 (WINE): Cluster 0 primarily consists of wine products. This cluster likely represents customers with a preference for wine products. Cluster 1 (LIQUOR): Cluster 1 is characterized by liquor products. This cluster may indicate a distinct group of customers who prefer liquor items. Cluster 2 (BEER): Cluster 2 primarily includes beer products. This cluster likely represents customers who have a preference for beer. Cluster 3 (KEGS): Cluster 3 is associated with kegs. Kegs are designed for larger quantities of beverages and are typically purchased by commercial establishments like bars, restaurants, and event organizers, rather than individual customers directly.
  • 26.
    Cluster Analysis Cluster 4 (NON-ALCOHOL): Cluster 4 comprises non-alcoholic products. This cluster may indicate customers who prefer non-alcoholic beverages to avoid alcohol for health, religious, or personal reasons. This niche market may have a smaller customer base compared to the overall population. Cluster 5 (STR_SUPPLIES, WINE TOOLS): The STR_SUPPLIES category includes a variety of storage supplies, such as paper bags, thermal register paper, and plastic bags, used for packaging and transporting various products. Given the nature of these products, the STR_SUPPLIES category is more likely to be used by retailers or establishments than by individual customers. The presence of wine tools alongside storage supplies in Cluster 5 also suggests that these purchases are made by commercial entities, since businesses often require wine tools for serving and preparing wine for their customers.
  • 27.
    Cluster Analysis  Beer has the highest total sales across retail, transfers, and warehouse at 7,101,470.19 units. This suggests beer is the top-selling product category. Wine has the second-highest total sales at 1,903,693.02 units, with a significant portion (1,156,984.91 units) coming from warehouse sales. This implies wine is a popular product category for wholesale distribution. Liquor has the third-highest total sales at 897,599.52 units, but the majority of these sales (802,693.25 units) are from retail stores rather than warehouses. This indicates liquor is more popular in retail locations. Kegs have 118,430 units in warehouse sales but 0 retail sales, indicating they are only sold wholesale and not directly to consumers. Non-alcoholic beverages have relatively low total sales at 53,299.90 units, suggesting they are a minor product category compared to beer, liquor, and wine.
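Per-cluster totals like the ones above can be computed with a groupby over the cluster labels. The sketch below assumes a hypothetical `cluster` column holding the K-means label for each row; the sales values are toy numbers, not the project's figures.

```python
import pandas as pd

sales_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# Toy rows; "cluster" is a hypothetical column holding each row's cluster label.
df = pd.DataFrame({
    "cluster": [0, 0, 1, 2, 2, 3],
    "RETAIL SALES": [5.0, 3.0, 8.0, 20.0, 15.0, 0.0],
    "RETAIL TRANSFERS": [4.0, 3.0, 7.0, 18.0, 16.0, 0.0],
    "WAREHOUSE SALES": [10.0, 12.0, 1.0, 60.0, 70.0, 9.0],
})

# Sum each sales channel within a cluster, then add a combined total.
totals = df.groupby("cluster")[sales_cols].sum()
totals["TOTAL SALES"] = totals[sales_cols].sum(axis=1)

# Rank clusters from highest to lowest combined sales.
ranked = totals.sort_values("TOTAL SALES", ascending=False)
print(ranked)
```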
  • 28.
    Cluster Analysis The clusters whose total sales fall in the top 25% are cluster 0 [WINE] and cluster 2 [BEER], indicating high demand. This insight is valuable for inventory management, marketing strategies, and product planning, as businesses can focus on stocking and promoting beer and wine products to capitalize on their popularity among consumers. Let's analyze seasonal decomposition plots for RETAIL SALES, RETAIL TRANSFERS, and WAREHOUSE SALES to understand seasonal effects and overall trends in each cluster. These insights will help us understand what drives our sales patterns and support better decision-making and strategy.
  • 29.
    Cluster Analysis Cluster 0 (WINE): Overall Trend: Stable retail sales and transfers; slight upward trend in warehouse sales. Seasonality: Peak sales in November and December, lower sales in January and February. Holiday shopping drives increased demand. Cluster 1 (LIQUOR): Overall Trend: Upward trends in retail, transfers, and warehouse sales. Seasonality: Fluctuations throughout the year; significant peaks in November and December during holiday seasons, indicating higher demand. Cluster 2 (BEER): Overall Trend: Mixed trends in retail, transfers, and warehouse sales. Seasonality: Peaks during specific months, possibly tied to seasonal events or consumer behaviors. Cluster 3 (KEGS): Overall Trend: The overall trend suggests a focus on warehouse distribution for kegs within this cluster, highlighting a specific market approach for these products. Seasonality: Warehouse sales for kegs show fluctuations but generally remain at moderate levels.
  • 30.
  • 31.
    Recommendations For Strategy Optimization Focus on Beer and Wine Products: Beer and wine products consistently show high demand across retail, transfers, and warehouse sales. Allocate more resources towards stocking and promoting these items to capitalize on their popularity among consumers. Seasonal Planning: Adjust inventory levels based on seasonal trends observed in each cluster. For example, increase stock of wine and liquor products leading up to the holiday season when demand peaks, while optimizing inventory for non-alcoholic beverages during warmer months or alongside health-conscious trends. Retail Strategy Alignment: Tailor retail strategies to align with consumer behavior and preferences observed in each cluster. For instance, offer promotions or themed events around wine and liquor during peak demand periods, while emphasizing convenience and variety for non-alcoholic beverages. Wholesale Distribution Optimization: Optimize warehouse distribution channels based on product categories. Since kegs predominantly sell through wholesale channels, streamline distribution processes to ensure efficient supply to commercial establishments while monitoring trends to anticipate demand fluctuations. Diversification: Consider diversifying product offerings within clusters or specializing in certain categories based on market demand and profitability. For instance, within the wine cluster, explore niche or premium wine
  • 32.
    Recommendations For Strategy Optimization Data-Driven Decision Making: Continuously analyze sales data and consumer trends to make informed inventory management decisions. Utilize advanced analytics tools to forecast demand, identify emerging trends, and optimize inventory levels to minimize stockouts and overstock situations. Enhanced Marketing and Promotions: Develop targeted marketing campaigns and promotions to drive sales for specific product categories within each cluster. Leverage consumer insights to tailor messaging and offers that resonate with target audiences, increasing engagement and conversion rates. Supplier Collaboration: Work closely with suppliers to establish flexible and responsive supply chains. This can include sharing demand forecasts and collaborating on inventory planning to ensure timely delivery without overstocking. Regular Inventory Audits: Conduct regular audits to identify slow-moving or obsolete inventory and take proactive measures to liquidate or reduce it. This prevents tying up capital in products that are unlikely to be sold. Efficient Storage and Warehouse Management: Optimize warehouse layout and storage practices to maximize space utilization and minimize holding costs.
  • 33.
    CONCLUSION Benefits Recap of project objective: Throughout this project, our primary objective was to leverage data analysis to optimize inventory management processes and tailor our product offerings to better align with customer preferences. Achievements: Identified high-demand product categories (beer and wine). Determined seasonal trends and their impact on sales. Developed actionable insights for inventory management and retail strategies.