3. • An observant Wal-Mart store manager discovered a strong association between a
brand of baby nappies (diapers) and a brand of beer. Analysis of purchases revealed
that they were made by men, on Friday evenings mainly between 6pm and 7pm. The
supermarket figured out the following rationale:
• Because packs of diapers are very large, the wife, who in most cases made the household
purchases, left the diaper purchase to her husband.
• Being the end of the working week, the husband and father also wanted to get some beer in for
the weekend.
Beer & Nappies
4. • They put the premium beer display next to the diapers
• The result was that fathers buying diapers who usually also bought beer now
bought the premium beer (the up-sell), as it was so conveniently placed next to the
diapers
• Significantly, men who had not bought beer before began to purchase it because it was so
visible and handy, just next to the nappies (the cross-sell).
• Beer sales skyrocketed
What did the supermarket do with this knowledge?
5. What is consumer behavior?
• The activities a consumer undertakes in deciding to
purchase, use, and consume
goods and services.
6. What is Market Basket Analysis?
• Identifies customers' purchasing habits.
• It provides insight into the combination of products within a customer's
'basket'.
• The term 'basket' normally applies to a single order.
• However, the analysis can be applied to other variations.
• We often compare all orders associated with a single customer.
• Ultimately, the purchasing insights provide the potential to create cross-sell
propositions:
• Which product combinations are bought;
• When they are purchased; and
• In what sequence
7. What is Association Rule Mining?
• Recent research has positioned association mining as one of the most
popular tools in retail analytics.
• A data mining technique which generates rules in the form of X ⇒ Y ,
where X and Y are two non-overlapping discrete sets.
• A rule is considered as significant if it is satisfied by at least a certain
percentage of cases (minimum support) and its confidence is above a
certain threshold (minimum confidence).
• Conventional association mining considers “positive” relations in the form
of X ⇒ Y .
• However, negative associations in the form of X ⇒ ¬Y , where ¬Y represents
the negation (absence) of Y , can also be discovered through association
mining.
12. How does it help?
• Developing this understanding enables businesses to promote their
most profitable products.
• It can also encourage customers to buy items that might have
otherwise been overlooked or missed.
• Market basket analysis delivers the "Amazon effect" to your business.
13. How does it help?
• Changing the store layout according to trends
• Customer behavior analysis
• Catalogue design
• Cross marketing on online stores
• Identifying the trending items customers buy
• Customized emails with add-on sales
15. The Association rule
ANTECEDENT => CONSEQUENT [Support, Confidence]
A => B [Support, Confidence]
{DIAPER} => {BEER} [Support = 60%, Confidence = 75%]
In 60% of transactions, diapers and beer are sold together. In 75% of the transactions in which someone
purchases diapers, beer is also purchased.
18. • A very useful practical application
• Rule – If a basket contains apples and cheese, then it is likely to contain
bread
• It is important to talk of a measure at this point – Support
• Support – Given an itemset L, the support of L is the percentage of
transactions that contain L.
• Example – Support of {eggs} is 25%, support of {apples, bread} is 15%, support of
{eggs, grapes} is 10%
• Point to ponder – if the support of an itemset with one item, let us say X is p%, then
the support for an itemset which has X as one of the items will always be less than or
equal to p%.
Discovering Rules in a Market Basket
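The support definition above can be sketched in a few lines of Python. This is a minimal illustration only: the baskets are invented toy data (not figures from the slides) and `support` is a hypothetical helper name.

```python
# Toy baskets, invented purely for illustration (not data from the slides).
transactions = [
    {"apples", "bread", "cheese"},
    {"apples", "bread"},
    {"eggs", "grapes"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

print(support({"apples"}, transactions))            # 0.5
print(support({"apples", "bread"}, transactions))   # 0.5
print(support({"apples", "cheese"}, transactions))  # 0.25
```

Adding an item to an itemset can only keep its support the same or lower it, which is the "point to ponder": here {apples, cheese} has lower support than {apples}.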
19. • Rule – If a basket contains X, it is likely to contain Y.
• X – apples, Y – bread. So, the rule that is being mentioned is that if a
customer buys apples he is likely to buy bread.
• The rule is depicted as {apples} -> {bread}
• X could be {apples, bread}, Y – {dates}. Therefore, the rule is, if a
customer buys apples and bread he is likely to buy dates.
• It is depicted as {apples, bread} -> {dates}
Discovering Rules
20. Measures of Rules - Confidence
•Confidence of a rule, {X} -> {Y}
• When the if part is true, how often is the then part true? Accuracy if you will.
• Confidence is the support of X and Y together divided by the support of X. If {bread, eggs, milk} has a
support of 0.2 and {bread, eggs} also has a support of 0.2, the confidence of the rule {bread, eggs} -> {milk} is
0.2 / 0.2 = 1, i.e., 100% of the time a customer buys bread and eggs, milk is bought as well. The rule is
therefore correct for 100% of the transactions that contain bread and eggs.
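As a small sketch, the bread, eggs and milk example can be checked with the standard definition, confidence(X -> Y) = support(X and Y together) / support(X). The function name `confidence` is chosen here for illustration.

```python
def confidence(support_xy, support_x):
    """Confidence of X -> Y: support(X and Y together) / support(X)."""
    if support_x == 0:
        raise ValueError("support of the antecedent must be positive")
    return support_xy / support_x

# Supports taken from the slide: {bread, eggs, milk} = 0.2, {bread, eggs} = 0.2
conf = confidence(support_xy=0.2, support_x=0.2)
print(conf)  # 1.0
```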
21. Measures of Rules - Lift
•Lift of a rule, {X} -> {Y}
• How many times more often X and Y occur together than expected (if they are statistically independent of
each other)? Really related or coincidental?
• Lift is 1 if {X} and {Y} are statistically independent of one another. A lift of {X} -> {Y} greater than 1 indicates
that there is some usefulness to the rule.
• A larger value of lift suggests a greater strength of the association between X and Y
• 1000 transactions, {milk, eggs} - 300, {milk} - 500, {eggs} - 400, Lift (milk->eggs) = 0.3 / (0.5 * 0.4) = 1.5
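The milk and eggs figures can be verified with a short sketch. It works from raw counts rather than decimal supports, an implementation choice made here to avoid floating-point rounding; the function name `lift` is illustrative.

```python
def lift(n_xy, n_x, n_y, n_total):
    """Lift of X -> Y from raw transaction counts.

    Algebraically the same as support(X and Y) / (support(X) * support(Y)).
    """
    return (n_xy * n_total) / (n_x * n_y)

# Counts from the slide: 1000 transactions, {milk, eggs} 300, {milk} 500, {eggs} 400
print(lift(n_xy=300, n_x=500, n_y=400, n_total=1000))  # 1.5
```

A value of exactly 1 would mean milk and eggs co-occur no more often than independence predicts.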
22. Measures of Rules - Leverage
•Leverage of a rule, {X} -> {Y}
• How much more often do X and Y occur together than expected (if they were statistically independent of
each other), measured as a difference rather than a ratio? Really related or coincidental?
Leverage (X -> Y) = Support (X and Y) - Support (X) x Support (Y)
• Leverage is 0 if {X} and {Y} are statistically independent of one another.
• A leverage of {X} -> {Y} greater than 0 indicates that there is some usefulness to the rule.
• A larger value of leverage indicates a stronger relationship between X and Y
• 1000 transactions, {milk, eggs} - 300, {milk}- 500, {eggs} - 400, Leverage (milk->eggs) = 0.3 - 0.5*0.4 = 0.1
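A minimal sketch of the same leverage calculation, using exact fractions so the result is not blurred by floating-point rounding. The function name `leverage` is illustrative.

```python
from fractions import Fraction

def leverage(n_xy, n_x, n_y, n_total):
    """Leverage of X -> Y: support(X and Y) - support(X) * support(Y)."""
    s_xy = Fraction(n_xy, n_total)
    s_x = Fraction(n_x, n_total)
    s_y = Fraction(n_y, n_total)
    return s_xy - s_x * s_y

# Counts from the slide: 1000 transactions, {milk, eggs} 300, {milk} 500, {eggs} 400
lev = leverage(n_xy=300, n_x=500, n_y=400, n_total=1000)
print(float(lev))  # 0.1
```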
23. • Confidence, Lift, Leverage – why so many?
• Confidence can identify trustworthy rules
• It cannot tell you if the rule is coincidental
• Lift and Leverage help identify interesting rules and filter out the
ones which are coincidental.
Measures of Rules
24. • Apriori is generally invoked with a minimum support threshold – let us say, 0.1
Apriori Algorithm (Itemset Lattice)
ϕ
A B C D
AB AC AD BC BD CD
ABC ABD ACD BCD
ABCD
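The lattice above lists every candidate itemset over the items A to D. A minimal sketch of a level-wise Apriori search over it might look like the following; the transactions are invented toy data, the function name is hypothetical, and this is an illustration of the idea rather than a tuned implementation.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]  # level 1 of the lattice
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates at this level that meet the threshold.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Toy data (invented): A appears in only 1 of 12 baskets, below the 0.1 threshold.
transactions = ([{"B", "C", "D"}] * 4 + [{"B", "C"}] * 3 + [{"C", "D"}] * 2
                + [{"B", "D"}] * 2 + [{"A", "B"}])
frequent = apriori_frequent_itemsets(transactions, min_support=0.1)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), round(frequent[itemset], 3))
```

Because A is infrequent, no itemset containing A is ever counted beyond level 1; that pruning is what keeps the search from visiting the whole lattice.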
25. • Apriori is generally invoked with a minimum support threshold – let
us say, 0.1
• Rules based on the frequent itemsets:
• {B} -> {C}
• {C} -> {B}
• {B} -> {D}
• {D} -> {B}
• {C} -> {D}
• {D} -> {C}
• {B} -> {CD}
• {C} -> {BD}
• {D} -> {BC}
Apriori Algorithm - Rules
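Candidate rules like those listed above can be enumerated mechanically by splitting each frequent itemset into an antecedent and a consequent. The sketch below assumes the frequent itemsets with two or more items are BC, BD, CD and BCD, as the slide's rules imply; `candidate_rules` is a hypothetical helper. It enumerates every split, so it also yields the three rules with two-item antecedents (such as {B, C} -> {D}) that the slide does not list.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of a frequent itemset."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent

# Assumed frequent itemsets with at least two items (inferred from the slide).
frequent_itemsets = [("B", "C"), ("B", "D"), ("C", "D"), ("B", "C", "D")]
rules = [rule for s in frequent_itemsets for rule in candidate_rules(s)]
for antecedent, consequent in rules:
    print("{" + ", ".join(antecedent) + "} -> {" + ", ".join(consequent) + "}")
```

In practice each candidate would then be kept or discarded by comparing its confidence with the minimum confidence threshold.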
26. Association rule mining example using Excel : Step 1
Total transactions: 2928
Items        Support
vegetables   0.606557377
baby         0.271174863
fruit        0.296789617
milk         0.303620219
dvds         0.211065574
meat         0.24931694
Support formula: COUNTIF(INDIRECT(vegetables),1)/total
27. Association rule mining example using Excel : Step 2
Rule: if vegetables then milk (Vegetables --> Milk), with X = vegetables and Y = milk
Measure                       Value
Support (Vegetables ^ Milk)   0.184084699
Support (Vegetables)          0.606557377
Confidence                    0.303490991
Support (Milk)                0.303620219
Lift                          0.999574378
Formula for Support (Vegetables ^ Milk): COUNTIFS(INDIRECT(vegetables),1,INDIRECT(milk),1)/total
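As a quick cross-check, the confidence and lift in this step follow from the three support values by the standard definitions. The sketch below only re-does that arithmetic with the spreadsheet's numbers; the variable names are illustrative.

```python
# Support values as given in the spreadsheet example.
support_veg_milk = 0.184084699
support_veg = 0.606557377
support_milk = 0.303620219

# Confidence(Vegetables -> Milk) = Support(Vegetables ^ Milk) / Support(Vegetables)
confidence = support_veg_milk / support_veg
# Lift = Confidence / Support(Milk)
lift = confidence / support_milk

print(round(confidence, 6))  # 0.303491
print(round(lift, 6))        # 0.999574
```

A lift this close to 1 suggests that buying vegetables tells us almost nothing about whether milk is bought: the rule is essentially coincidental despite its fairly high support.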
29. Association rule mining example using R
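library(arules)  # provides the "transactions" class, apriori() and inspect()
# Read the raw data; as() expects every column to be a factor or a logical.
transdata <- read.csv("Transactions.csv")
MyTransaction <- as(transdata, "transactions")
# Mine frequent single-item itemsets with a support of at least 0.05.
itemsets <- apriori(MyTransaction,
                    parameter = list(minlen = 1, maxlen = 1, support = 0.05,
                                     target = "frequent itemsets"))
# Show the top itemsets by support.
inspect(head(sort(itemsets, by = "support")))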
38. Apriori Example
The interpretation is straightforward:
• 100% of customers who bought “WOBBLY CHICKEN” also bought “DECORATION”.
• 100% of customers who bought “DECOUPAGE” also bought “GREETING CARD”.
41. Apriori Example
[Figure: Parallel coordinates plot for 20 rules. Horizontal axis: Position (3, 2, 1, rhs). Vertical axis: item labels, as extracted: HOOK; WOBBLYCHICKEN; METAL; PARTYPIZZA DISH GREEN POLKADOT; CHILDS GARDEN SPADE BLUE; CHOCOLATE SPOTS; ART LIGHTS; WRAP; PARTYPIZZA DISH PINK POLKADOT.]
The RHS is the consequent, or the item we propose the customer will buy; the positions are in the LHS, where 2 is the most recent addition to our basket and 1 is the item we previously had.