Market Basket Analysis Project
Milestone 1
R.Vinothkumar
Problem Statement
A grocery store has shared its transactional data with you. Your job is to identify the most
popular combos that can be suggested to the grocery store chain after a thorough analysis of
the most commonly occurring sets of items in customer orders. The store doesn't currently
have any combo offers. Can you suggest the best combos and offers?
We aim to analyze association rules to suggest the best combos and offers for the
grocery store chain using Market Basket Analysis.
Data Statistics
Data Information
• The data contains 20,642 rows and 3 variables.
• The variables are Date, Order id, and Product.
• 37 products are available for the analysis.
• Order id and Product are stored in denormalized form (one row per product in an order),
which helps in finding associations among the products ordered together.
Statistical Talk
• The largest number of orders was placed on 8-Feb-2019.
• Poultry is the most sold product, ordered 480 times; hand soaps are the least sold,
ordered 394 times.
• Order id 1071 contains the largest number of products.
Product Grouping
Product grouping is essential when placing a product in a store.
Product Category
Sales by product category
We grouped the products as Meat, Household, Groceries,
Packaging, Milk products, Vegetables & Fruits, Drinks & Breads.
• Grocery products sell more than any other category.
• Vegetables sell the least.
Periodical Analysis of Sales
Year                Total Orders   Top Month in Sales
2018                533            May
2019                507            May
2020 (up to Feb)    99             Jan
• The periodic trend shows that sales are not consistent across the period, and there
is no clear trend.
Sales across Calendar
• Most orders are placed on Sundays.
Most Sold Products by Day
• Sunday - Waffles
• Monday - Poultry
• Tuesday - Bagels
• Wednesday - Soda
• Thursday - Mixes
• Friday - Poultry
• Saturday - Poultry
Market basket analysis
• Market Basket Analysis is a technique that identifies the strength of association
between products purchased together and finds patterns of co-occurrence.
• The technique determines which products were purchased together with which other products.
• An association rule (antecedent -> consequent) has three measures that express its strength:
• Support - the share of all orders that contain the items in the rule, i.e. how popular the combination is. For a single item such as Poultry, it is the share of orders that include it.
• Confidence - the chance that the consequent is purchased when the antecedent is in the order, e.g. the share of orders containing Poultry that also contain the other product.
• Lift - the confidence divided by the support of the consequent. A lift above 1 means the other product is bought with Poultry more often than would be expected if the two were independent.
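As a minimal sketch of how the three measures are computed, the snippet below uses a handful of made-up orders. The orders and the function names `support`, `confidence`, and `lift` are assumptions for illustration; they are not taken from the store's data or from any tool used in this project.

```python
# Toy orders for illustration only; these are NOT taken from the store's dataset.
orders = [
    {"poultry", "bagels", "cheese"},
    {"poultry", "cheese"},
    {"bagels", "soda"},
    {"poultry", "soda"},
]

def support(itemset, orders):
    """Fraction of orders that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= order for order in orders) / len(orders)

def confidence(antecedent, consequent, orders):
    """Share of orders with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, orders) / support(antecedent, orders)

def lift(antecedent, consequent, orders):
    """Confidence relative to the consequent's baseline support.

    A value above 1 indicates a positive association.
    """
    return confidence(antecedent, consequent, orders) / support(consequent, orders)

print("support(poultry)            =", support({"poultry"}, orders))
print("confidence(poultry->cheese) =", confidence({"poultry"}, {"cheese"}, orders))
print("lift(poultry->cheese)       =", lift({"poultry"}, {"cheese"}, orders))
```

Representing each order as a set keeps the "contains every item" check to a single subset comparison.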
Implementation of Market Basket Analysis
• KNIME is a multi-purpose tool that provides both ETL and model-building components.
The following nodes are used to extract association rules from the product sales:
• CSV Reader - reads the data from the CSV file.
• GroupBy - because the data is in denormalized form, we group the products by order.
• Cell Splitter - because all the products of an order are merged into a single column, we split them on the delimiter.
• Association Rule Learner - we configure the minimum support and minimum confidence.
• Split Collection Column - splits a column that holds a collection of cells into separate columns.
• Column Aggregator - aggregates the cells of the selected columns.
• Column Filter - keeps only the required columns from the input.
• CSV Writer - writes the output to a CSV file.
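For readers without KNIME, the grouping step of this workflow can be sketched in plain Python. This is an illustration under assumptions, not the KNIME workflow itself: the column names `Date`, `Order_id`, and `Product`, the helper `load_baskets`, and the inline sample rows are hypothetical stand-ins for the real CSV.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in long format (one row per order/product pair).
# These rows are made up; they are not from the store's file.
SAMPLE_CSV = """Date,Order_id,Product
01-01-2018,1,poultry
01-01-2018,1,bagels
01-01-2018,2,soda
02-01-2018,3,poultry
02-01-2018,3,soda
"""

def load_baskets(csv_file):
    """Group product rows by order id, returning {order_id: set_of_products}."""
    baskets = defaultdict(set)
    for row in csv.DictReader(csv_file):
        baskets[row["Order_id"]].add(row["Product"])
    return dict(baskets)

baskets = load_baskets(io.StringIO(SAMPLE_CSV))
for order_id, products in baskets.items():
    print(order_id, sorted(products))
```

To run it on a real file, the `io.StringIO(...)` argument would be replaced by an opened CSV file with matching column names.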
Workflow Configuration on Association Rules
• The Association Rule Learner identifies relationships in product
purchase patterns based on support and confidence.
• We set the minimum support for an itemset to 5%.
• The minimum confidence threshold is 55%.
• Based on the above threshold configuration, the rule learner generated 83
sets of rules.
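The effect of the two thresholds can be sketched with a small brute-force search. This is not the algorithm KNIME uses internally; it simply enumerates itemsets and keeps the rules that pass both thresholds. The function `mine_rules`, the `max_size` limit, the single-item consequent, and the toy baskets are all assumptions for illustration.

```python
from itertools import combinations

MIN_SUPPORT = 0.05      # 5% minimum support, as configured in the workflow
MIN_CONFIDENCE = 0.55   # 55% minimum confidence, as configured in the workflow

def mine_rules(baskets, min_support=MIN_SUPPORT,
               min_confidence=MIN_CONFIDENCE, max_size=3):
    """Brute-force rule search over itemsets of 2..max_size items.

    Keeps rules with a single-item consequent that pass both thresholds.
    Returns tuples of (antecedent, consequent, support, confidence, lift).
    """
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            itemset_support = support(itemset)
            # Skip itemsets that never occur or fall below the support threshold.
            if itemset_support == 0 or itemset_support < min_support:
                continue
            for consequent in combo:
                antecedent = itemset - {consequent}
                conf = itemset_support / support(antecedent)
                if conf >= min_confidence:
                    lift = conf / support(frozenset([consequent]))
                    rules.append((tuple(sorted(antecedent)), consequent,
                                  itemset_support, conf, lift))
    return rules

# Toy baskets for illustration only; not the store's data.
toy_baskets = [
    {"poultry", "cheese"},
    {"poultry", "cheese", "bagels"},
    {"poultry", "soda"},
    {"bagels", "soda"},
]
for antecedent, consequent, sup, conf, lift in mine_rules(toy_baskets):
    print(antecedent, "->", consequent,
          "support", round(sup, 2), "confidence", round(conf, 2), "lift", round(lift, 2))
```

Brute force is only practical for a small catalogue; with 37 products a dedicated frequent-itemset algorithm, such as the one behind the KNIME node, scales far better.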
Association Rule Outcome
• A total of 82 rules were generated for the thresholds we set.
Association Rule Outcome
• These rule sets are actionable items the store manager can use to attract
customers.
• Every generated rule meets the minimum support: its items are purchased together in at least
5% of all purchases.
• Every generated rule also has a confidence of at least 55%.
Top Associated Products
• When the antecedent is bagels, cereals, and sandwich bags, the consequent is cheese;
this rule has the highest lift among all the relationships.
• Likewise, for each consequent we can bundle the items with the highest lift, which
should improve sales.
• The association rules give many options for bundling based on the lift values.
• The store manager can try different combinations to settle on the best among them.
• The most sold product, poultry, has the greatest number of combinations.
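One way to shortlist bundles by lift is to keep the highest-lift rule for each consequent. The sketch below shows the idea; the item names, lift values, and the helper `best_rule_per_consequent` are hypothetical placeholders, not results from the project's rule table.

```python
# Hypothetical rules as (antecedent, consequent, lift).
# Item names and lift values are placeholders, not project results.
rules = [
    (("item_a", "item_b"), "item_x", 1.9),
    (("item_c",), "item_x", 1.4),
    (("item_d",), "item_y", 1.2),
    (("item_a", "item_d"), "item_y", 1.5),
]

def best_rule_per_consequent(rules):
    """Return {consequent: rule}, keeping only the highest-lift rule per consequent."""
    best = {}
    for rule in rules:
        _, consequent, lift = rule
        if consequent not in best or lift > best[consequent][2]:
            best[consequent] = rule
    return best

for consequent, (antecedent, _, lift) in best_rule_per_consequent(rules).items():
    print(" + ".join(antecedent), "->", consequent, "| lift", lift)
```

The store manager could then review the shortlisted bundle for each consequent rather than the full rule list.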
Recommendation
• From the association rules listed, we can pick the higher-value combinations to
recommend products.
• In some rules, the consequent may belong to a different product category, for example
washing liquids or dinner table alongside dairy products. If the grocery shop has an
online option, it is fine to list such items among the recommended products.
• If the grocery shop has only walk-in stores, we need to be careful about placing
the consequent alongside the antecedents. In that case, the following placement options can
be considered:
1. Place the consequent near the billing counter.
2. Always place the recommended products next to each product section.
3. Recruit a store designer to come up with innovative product placement.
Higher Consequents
• Poultry has the most combinations.
• Dinner rolls are frequent
partners of other products,
including dairy and household
items.
Thank You.
