2. Contents
• Data Science Objective
• About data
• EDA
• Data Preparation
• ML Approach
• Results
• Conclusion
3. Data Science Objective
• To understand customer behaviour based on previous purchases.
• To identify customers' interest in specific products across the range of product
categories.
• To construct a set of rules using Market Basket Analysis that will help increase
sales.
• To identify a set of rules for the decision-making process that reveal the frequent buyers of
products and the product combinations they buy. This will help in recommending products to a
customer based on his or her previous purchases.
4. About Data
• The data has 44 columns and 29929 observations, which represent order details,
customer details, shipping details, pricing details, item details, business unit, territory and
financial details.
• Columns are available in both numeric and character formats.
• All columns were converted to the required formats, namely dates, binary responses,
nominal to numeric, etc.
• A few columns with more than 50% missing values were excluded.
• Identifier columns (unique IDs) were ignored in the analysis.
• For market basket analysis, the relevant columns are customer number, order creation date,
item description and item number.
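The missing-value filter described above can be sketched as follows. This is a minimal illustration in plain Python, assuming rows are held as dictionaries; the function name drop_sparse_columns, the column names and the sample rows are hypothetical and are not taken from the project data.

```python
def drop_sparse_columns(rows, max_missing=0.5):
    """Return rows without the columns whose share of missing values exceeds max_missing."""
    if not rows:
        return []
    n = len(rows)
    keep = []
    for col in rows[0].keys():
        # Treat None and empty strings as missing (an assumption for this sketch).
        missing = sum(1 for r in rows if r.get(col) in (None, ""))
        if missing / n <= max_missing:
            keep.append(col)
    return [{c: r.get(c) for c in keep} for r in rows]

# Hypothetical sample rows for illustration only.
rows = [
    {"customer_number": "C1", "item_number": "I10", "promo_code": None},
    {"customer_number": "C2", "item_number": "I11", "promo_code": None},
    {"customer_number": "C3", "item_number": "I12", "promo_code": "P5"},
]
cleaned = drop_sparse_columns(rows)
print(list(cleaned[0].keys()))
```

Here promo_code is missing in two of three rows (about 67%), so it is dropped, while the other two columns are kept.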
5. Exploratory Data Analysis
Products Purchased Together   Count    Products Purchased Together   Count
One                           9835     Seventeen                     95
Two                           7676     Eighteen                      66
Three                         6033     Nineteen                      52
Four                          4734     Twenty                        38
Five                          3729     Twenty One                    29
Six                           2874     Twenty Two                    18
Seven                         2229     Twenty Three                  14
Eight                         1684     Twenty Four                   8
Nine                          1246     Twenty Five                   7
Ten                           896      Twenty Six                    7
Eleven                        650      Twenty Seven                  6
Twelve                        468      Twenty Eight                  5
Thirteen                      351      Twenty Nine                   4
Fourteen                      273      Thirty                        1
Fifteen                       196      Thirty One                    1
Sixteen                       141      Thirty Two                    1
• After the necessary cleaning, a basic EDA was performed.
• The maximum number of products purchased per transaction in the given data is 32.
• The minimum is one product per transaction.
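The minimum and maximum basket sizes mentioned above could be obtained with a count like the one below. This is only a sketch in plain Python: the function name basket_size_summary and the sample baskets are hypothetical, and it counts baskets of exactly each size, which is not necessarily how the counts in the table were produced.

```python
from collections import Counter

def basket_size_summary(baskets):
    """Count baskets by number of distinct products; also return min and max size."""
    sizes = [len(set(items)) for items in baskets]
    return Counter(sizes), min(sizes), max(sizes)

# Hypothetical baskets for illustration only.
baskets = [
    ["whole milk"],
    ["sausage", "frankfurter"],
    ["citrus fruit", "pip fruit", "tropical fruit"],
    ["whole milk", "curd"],
]
size_counts, smallest, largest = basket_size_summary(baskets)
for size in sorted(size_counts):
    print(size, size_counts[size])
print("min:", smallest, "max:", largest)
```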
6. Data Preparation
• Only the last 3 years of data are considered, since some older products may have been
discontinued.
• Entries are then selected from a specific start date onward.
• The data is grouped by customer and order creation date.
• The data is arranged so that each group lists the distinct products purchased by that customer.
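The preparation steps above can be sketched as follows. This is a minimal illustration in plain Python, assuming each order line is a (customer number, order creation date, item description) tuple; the function name build_baskets, the cut-off date and the sample lines are hypothetical.

```python
from collections import defaultdict
from datetime import date

def build_baskets(order_lines, start_date):
    """Group order lines by (customer, order date), keeping distinct items only."""
    baskets = defaultdict(set)
    for customer, created, item in order_lines:
        if created >= start_date:  # keep only entries from the start date onward
            baskets[(customer, created)].add(item)  # a set drops repeated items
    return dict(baskets)

# Hypothetical order lines for illustration only.
order_lines = [
    ("C1", date(2019, 3, 1), "whole milk"),
    ("C1", date(2019, 3, 1), "curd"),
    ("C1", date(2019, 3, 1), "whole milk"),  # repeated line, kept once
    ("C2", date(2015, 6, 9), "soda"),        # before the cut-off, dropped
    ("C2", date(2020, 1, 5), "sausage"),
]
baskets = build_baskets(order_lines, start_date=date(2018, 1, 1))
for key, items in baskets.items():
    print(key, sorted(items))
```

Each resulting basket is one transaction for the rule-mining step: the set of distinct products a customer ordered on a given date.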
7. ML Approach
The Apriori association rule technique was implemented with a minimum support of 0.001
for each rule and a minimum confidence of 0.5 (50% or more) that the customer buys the
recommended product. The output item sets were requested as rules, with a minimum of
ten rules as output.
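To make the two thresholds concrete, here is a minimal sketch of level-wise Apriori with rule generation in plain Python. It is an illustration under stated assumptions, not the tool used to produce the results: the function names apriori and generate_rules and the toy transactions are hypothetical, and the toy run uses a support of 0.4 because 0.001 is not meaningful for five transactions.

```python
from itertools import combinations

SLIDE_MIN_SUPPORT = 0.001    # value stated in the slides
SLIDE_MIN_CONFIDENCE = 0.5   # value stated in the slides

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each size-k candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset.
        k += 1
        current = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent

def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples, best first."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return sorted(rules, key=lambda rule: -rule[3])

# Hypothetical toy transactions for illustration only.
transactions = [
    {"sausage", "frankfurter"},
    {"sausage", "frankfurter", "soda"},
    {"liquor", "bottled beer"},
    {"liquor"},
    {"whole milk"},
]
frequent = apriori(transactions, min_support=0.4)
rules = generate_rules(frequent, min_confidence=SLIDE_MIN_CONFIDENCE)
for antecedent, consequent, support, confidence in rules[:10]:  # top ten rules
    print(sorted(antecedent), "==>", sorted(consequent), support, confidence)
```

On real data the call would use SLIDE_MIN_SUPPORT instead of 0.4. Such a low support threshold keeps rare item combinations in play, which is why the confidence threshold does most of the filtering.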
8. Model Output & Evaluation
1. X2_sausage=t 99 ==> X1_frankfurter=t 99 <conf:(1)>
2. X1_citrus fruit=t X3_pip fruit=t 31 ==> X2_tropical fruit=t 31 <conf:(1)>
3. X2_sausage=t X3_tropical fruit=t 10 ==> X1_frankfurter=t 10 <conf:(1)>
4. X1_other vegetables=t X3_curd=t 14 ==> X2_whole milk=t 13 <conf:(0.93)>
5. X1_root vegetables=t X3_whole milk=t 41 ==> X2_other vegetables=t 32 <conf:(0.78)>
6. X1_other vegetables=t X3_yogurt=t 33 ==> X2_whole milk=t 24 <conf:(0.73)>
7. X2_liquor=t 31 ==> X1_bottled beer=t 22 <conf:(0.71)>
8. X2_ham=t 64 ==> X1_sausage=t 43 <conf:(0.67)>
9. X1_sausage=t X3_soda=t 24 ==> X2_rolls/buns=t 16 <conf:(0.67)>
10. X3_liquor=t 27 ==> X2_bottled beer=t 17 <conf:(0.63)>
• Association rules are evaluated using the measures available in the rule set.
• Each rule has the form antecedent ==> consequent.
• Each rule also reports its confidence, <conf:(x)>: the fraction of transactions containing
the antecedent that also contain the consequent. Confidence is distinct from support. The
lowest confidence among the 10 rules above is 0.63, so these rules correspond to an effective
confidence cut-off of 63%.
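The reported confidence values can be checked against the counts printed in each rule. The sketch below assumes that the number after the antecedent is the count of transactions containing it and the number after the consequent is the count containing both sides; this reading is an assumption, but it is consistent with every confidence value listed. The function name confidence is hypothetical.

```python
def confidence(antecedent_count, both_count):
    """Confidence = count(antecedent and consequent) / count(antecedent)."""
    return both_count / antecedent_count

# (rule number, antecedent count, antecedent-and-consequent count, reported confidence),
# copied from the rule list above.
listed = [
    (1, 99, 99, 1.0),
    (4, 14, 13, 0.93),
    (5, 41, 32, 0.78),
    (7, 31, 22, 0.71),
    (10, 27, 17, 0.63),
]
for number, antecedent_count, both_count, reported in listed:
    computed = round(confidence(antecedent_count, both_count), 2)
    print(number, computed, reported)
```

For example, rule 4 gives 13 / 14, which rounds to 0.93, and rule 10 gives 17 / 27, which rounds to 0.63.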
9. Conclusion
• Association rules provide an efficient set of rules that recommend the next best buy for
customers visiting the store.
• Our analysis found several such associations between purchased items; for example,
customers who buy other vegetables and curd are more likely to buy whole milk.
• Similarly, customers who buy citrus fruit and pip fruit are more likely to buy tropical fruit.
• Thus, the generated rule sets help the retailer recommend the best possible buys to
customers without any additional external marketing effort.