Enterprise Systems Engineering Project
Data Mining in Market Basket Analysis
Tanmayee Mandala, Lubna Tarawneh, Nikhil Mohkhedkar &
Prathamesh Kulkarni
State University of New York at Binghamton
December 4th, 2019
Binghamton University | December, 2019
Agenda
• Introduction
• Problem Description
• Objectives
• Data Collection
• Methods
• Results
• Decision Making
• Implementing DSS
• Conclusion
Market Basket Analysis
• Fundamental technique used by retailers to uncover associations between items
• Analyzes customer purchasing behavior from point-of-sale transaction data, which helps increase sales and manage inventory
• Provides insight into which products tend to be purchased together and which are most amenable to promotion
Data Mining in Market Basket Analysis
• Data mining: the process of extracting patterns (knowledge) from data
• Helps in predicting future trends
• Association rules: if a customer buys product 1, the customer is expected to buy product 2 as well
  ◦ Trivial rules: people who buy chalk also buy a duster
  ◦ Inexplicable rules: people who buy a mobile phone also buy a bag
  ◦ Actionable rules: must satisfy minimum support and confidence thresholds to be considered
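The support and confidence thresholds mentioned above can be illustrated with a minimal sketch. The baskets and the two threshold values below are made up for illustration and are not taken from the project data; the function names are likewise hypothetical.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Hypothetical baskets (assumption, not project data)
baskets = [
    {"sugar", "coffee"},
    {"sugar", "coffee", "milk"},
    {"sugar", "bread"},
    {"coffee", "milk"},
    {"bread"},
]

# Illustrative thresholds (assumption)
MIN_SUPPORT = 0.3
MIN_CONFIDENCE = 0.6

rule_support = support({"sugar", "coffee"}, baskets)          # 2 of 5 baskets
rule_confidence = confidence({"sugar"}, {"coffee"}, baskets)  # 2 of the 3 sugar baskets
is_actionable = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, rule_confidence, is_actionable)
```

Under these assumed thresholds the rule sugar => coffee would count as actionable; raising either threshold is how a weak rule gets filtered out.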
Problem Description
• A large number of people visit retail stores each day
• A huge amount of data is collected and stored in large transactional databases
• Transactions can be used to help managers make important decisions that increase sales:
  ◦ Cross-selling of items
  ◦ Proper placement of frequently purchased and everyday-low-price (EDLP) items
  ◦ Changes to the store layout
Objectives
• Understand customer behavior
• Gain insight into large transactional data by applying data mining techniques
• Find frequently purchased itemsets and patterns in the transactions
• Place frequently purchased and EDLP items properly
• Visualize the results dynamically
• Support the manager with an effective decision support system (DSS) that helps with the decision-making process
Data Collection
• Online data for a small retail store
• One week of data
• 3 data sets collected (7,000+ customer orders)
• 60 different products bought
Basket  Product 1          Product 2     Product 3       Product 4
1       Frozen Vegetables  Spaghetti     Green Tea       -
2       Burgers            French Fries  Low Fat Yogurt  Green Tea
3       Asparagus          Salad         -               -
4       Ground Beef        Pepper        Mineral Water   Chocolate
5       Butter             Light Mayo    Fresh Bread     -
6       Pancakes           Eggs          -               -
7       Frozen Vegetables  Chicken       Pancakes        -
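Before mining, rows like the ones in the sample table need to become item sets. The sketch below shows one possible way to do that, assuming the "-" cells are empty-slot placeholders and that item names can be lowercased; the function name `to_transactions` is hypothetical and not from the project.

```python
from collections import Counter

# The seven sample baskets from the table above
rows = [
    ["Frozen Vegetables", "Spaghetti", "Green Tea", "-"],
    ["Burgers", "French Fries", "Low Fat Yogurt", "Green Tea"],
    ["Asparagus", "Salad", "-", "-"],
    ["Ground Beef", "Pepper", "Mineral Water", "Chocolate"],
    ["Butter", "Light Mayo", "Fresh Bread", "-"],
    ["Pancakes", "Eggs", "-", "-"],
    ["Frozen Vegetables", "Chicken", "Pancakes", "-"],
]


def to_transactions(rows, placeholder="-"):
    """Turn fixed-width basket rows into sets of normalized item names."""
    transactions = []
    for row in rows:
        items = (cell.strip().lower() for cell in row)
        # Drop empty cells and the placeholder used for unused product slots
        transactions.append(frozenset(i for i in items if i and i != placeholder))
    return transactions


transactions = to_transactions(rows)

# How often each product appears across the sample baskets
item_counts = Counter(item for basket in transactions for item in basket)
print(item_counts.most_common(3))
```

Using sets rather than lists makes the later "does this basket contain this itemset" check a simple subset test.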
Assumptions
• Only the grocery section of a small retail store is considered when implementing the DSS
• Different product sections are not considered when placing the products in aisles
• Sales data was not available and is randomly generated
• Results are based on a small data set; a larger data set would give more accurate results and better insights
Methodology
Apriori Algorithm: most established algorithm for finding frequent item sets
STEP 1: Scan the transaction database to get the support S of each 1-itemset, compare S with min_sup, and get the set of frequent 1-itemsets, L1.
STEP 2: Use L(k-1) join L(k-1) to generate a set of candidate k-itemsets, and use the Apriori property to prune the infrequent k-itemsets from this set.
STEP 3: Scan the transaction database to get the support S of each candidate k-itemset in the final set, compare S with min_sup, and get the set of frequent k-itemsets, L(k).
STEP 4: Is the candidate set empty (Null)?
  NO: return to STEP 2
  YES: continue to STEP 5
STEP 5: For each frequent itemset l, generate all nonempty subsets of l.
STEP 6: For every nonempty subset s of l, output the rule "s => (l - s)" if the confidence C of the rule (= support of l / support of s) >= min_conf.
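The six steps can be sketched in plain Python as follows. This is a minimal illustration written for this summary, not the project's actual implementation; the function names, the toy transactions, and the threshold values are all assumptions. Rules are generated only from proper subsets s, so the consequent l - s is never empty.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Steps 1 and 3: scan the database, keep candidates meeting min_sup
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_sup}

    level = frequent({frozenset([i]) for t in transactions for i in t})  # L1
    all_frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Step 2: join L(k-1) with itself, keeping unions of exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        if not candidates:  # Step 4: empty candidate set ends the loop
            break
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


def generate_rules(frequent_itemsets, min_conf):
    """Steps 5 and 6: emit (s, l - s, support, confidence) for strong rules."""
    rules = []
    for itemset, sup in frequent_itemsets.items():
        for size in range(1, len(itemset)):
            for subset in combinations(itemset, size):
                s = frozenset(subset)
                conf = sup / frequent_itemsets[s]
                if conf >= min_conf:
                    rules.append((s, itemset - s, sup, conf))
    return rules


# Hypothetical toy transactions and thresholds (assumption, not project data)
transactions = [
    {"sugar", "coffee"},
    {"sugar", "coffee", "milk"},
    {"sugar", "bread"},
    {"coffee", "milk"},
    {"bread"},
]
frequent_itemsets = apriori(transactions, min_sup=0.4)
rules = generate_rules(frequent_itemsets, min_conf=0.7)
for s, rest, sup, conf in rules:
    print(sorted(s), "=>", sorted(rest), f"support={sup:.2f} confidence={conf:.2f}")
```

The lookup `frequent_itemsets[s]` is safe because, by the Apriori property, every subset of a frequent itemset is itself frequent and therefore already in the dictionary.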
Results of Apriori Algorithm
• Association rules obtained
• A larger data set can give more accurate rules
• Knowledge is extracted
• Rules obtained:
  ◦ ['pasta'] => ['Almonds']
  ◦ ['Sugar'] => ['Coffee']
  ◦ ['mushroom cream sauce'] => ['escalope']
  ◦ ['herb & pepper'] => ['noodles']
  ◦ ['tomato sauce'] => ['noodles']
  ◦ ['whole wheat pasta'] => ['olive oil']
Decision Making
• Put products that are purchased together close to one another
  ◦ Helps customers notice them
  ◦ Increases the revenue from these products
  ◦ Improves the customers' shopping experience
• Put EDLP items near frequently purchased items
  ◦ EDLP items account for a high percentage of sales
  ◦ Making them more visible to customers can increase their sales
• Identify the best-selling item to decide where to place advertisements
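One way to turn the obtained rules into placement groups is to treat each rule as a link between two products and collect the linked products together. The sketch below is an assumption about how such grouping could work, not the logic of the project's aisle demo; it uses the six rules from the results slide with item names lowercased, and the function name `placement_groups` is hypothetical.

```python
from collections import defaultdict

# Rules from the results slide, written as (antecedent, consequent) pairs
rules = [
    ("pasta", "almonds"),
    ("sugar", "coffee"),
    ("mushroom cream sauce", "escalope"),
    ("herb & pepper", "noodles"),
    ("tomato sauce", "noodles"),
    ("whole wheat pasta", "olive oil"),
]


def placement_groups(rules):
    """Group products that are linked, directly or indirectly, by a rule."""
    neighbours = defaultdict(set)
    for left, right in rules:
        neighbours[left].add(right)
        neighbours[right].add(left)

    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from `start`
        stack, group = [start], set()
        while stack:
            item = stack.pop()
            if item in seen:
                continue
            seen.add(item)
            group.add(item)
            stack.extend(neighbours[item] - seen)
        groups.append(sorted(group))
    return groups


groups = placement_groups(rules)
for shelf_number, group in enumerate(groups, start=1):
    print(f"Shelf group {shelf_number}: {', '.join(group)}")
```

Here noodles ends up in one group with both herb & pepper and tomato sauce because two separate rules point at it, which is the kind of cluster a manager might shelve together.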
Implementing DSS: Dynamic Aisle Demo in Python
Dashboard Using Power BI (1/3)
Dashboard Using Power BI (2/3)
Dashboard Using Power BI (3/3)
Conclusion and Future Work
• Beyond its popularity as a retail technique, market basket analysis is applicable in many other areas:
  ◦ Manufacturing: predictive analysis of equipment failure
  ◦ Pharmaceuticals: discovery of co-occurrence relationships among diagnoses and the active pharmaceutical ingredients prescribed to different patient groups
  ◦ Banking: fraud detection based on credit card usage data
  ◦ Analyzing customer behavior by associating purchases with demographic and socio-economic data
• More and more organizations are discovering ways of using market basket analysis to gain useful insights into associations and hidden relationships
