2. What is Association Rule Mining?
• Affinity analysis and association rule
learning encompass a broad set of analytics
techniques aimed at uncovering the
associations and connections between specific
objects. These might be:
• visitors to your website (customers or audience),
• products in your store,
• content items on your media site
• product recommendations on an e-commerce website.
3. Market Basket Analysis?
• Market basket analysis is one of the best-known
applications of association rule mining.
• In a market basket analysis, you look to see if
there are combinations of products that
frequently co-occur in transactions.
– For example, people who buy flour and caster sugar may
also tend to buy eggs (because a high proportion of them are
planning to bake a cake).
4. Importance of performing MBA
• A retailer can use this information to inform:
– Store layout (put products that co-occur together close to one
another, to improve the customer shopping experience)
– Marketing (e.g. target customers who buy flour with offers on
eggs, to encourage them to spend more on their shopping
basket)
• Online retailers and publishers can use this type
of analysis to:
– Inform the placement of content items on their media sites, or
products in their catalogue
– Drive recommendation engines (like Amazon’s “customers who
bought this product also bought these products…”)
– Deliver targeted marketing (e.g. emailing customers who bought
specific products with other products, and offers on those
products, that are likely to interest them)
5. Some Mathematical terminology
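• Support: the fraction of transactions in our
dataset that contain the item set. For a rule
X -> Y, it is the fraction of transactions that
contain every item in X and in Y:
support(X -> Y) = Pr(X ∪ Y)
• Confidence: the probability that a new
transaction containing the items on the left (X)
also contains the items on the right (Y):
confidence(X -> Y) = Pr(Y | X) = support(X -> Y) / support(X)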
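To make the two definitions concrete, here is a minimal base-R sketch that computes support and confidence by hand. The five transactions and the helper names `support()` and `confidence()` are invented for illustration; they are not part of the arules package.

```r
# Toy transactions (hypothetical data, invented for this illustration)
transactions <- list(
  c("flour", "sugar", "eggs"),
  c("flour", "eggs"),
  c("flour", "sugar"),
  c("milk", "eggs"),
  c("flour", "sugar", "eggs", "milk")
)

# Support: fraction of transactions that contain every item in `items`
support <- function(items, transactions) {
  mean(vapply(transactions, function(tr) all(items %in% tr), logical(1)))
}

# Confidence of lhs -> rhs: support of lhs and rhs together,
# divided by the support of lhs alone
confidence <- function(lhs, rhs, transactions) {
  support(c(lhs, rhs), transactions) / support(lhs, transactions)
}

x <- c("flour", "sugar")
y <- "eggs"

rule_support <- support(c(x, y), transactions)
rule_confidence <- confidence(x, y, transactions)

print(rule_support)
print(rule_confidence)
```

In this toy data, two of the five transactions contain flour, sugar and eggs together, so the support is 2/5. Three transactions contain flour and sugar, so the confidence is (2/5) / (3/5) = 2/3.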
6. Implementation using R
• It is assumed that the reader has basic prior
knowledge of association rule mining, the Apriori
algorithm, etc.
• Packages we will use:
• arules
• arulesViz
7. Know your dataset
• Dataset used:
– Groceries (built into the arules package)
– It can also be downloaded from:
(Source: http://www.salemmarafi.com/wp-content/uploads/2014/03/groceries.csv)
Transactions in input data: 9835
Columns in input data: 32
Items in input data: 169
8. Let's start by loading packages & datasets
Figure: top 25 most frequent products
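The code for this slide did not survive in the text, so the following is a sketch of what loading the package and data might look like, assuming the `Groceries` data set bundled with arules. The choice of `type = "absolute"` (raw counts rather than relative frequency) is an assumption.

```r
library(arules)        # transaction data structures and the apriori() miner
# library(arulesViz)   # only needed later, for plotting mined rules

data("Groceries")      # transactions object that ships with arules

# Overview: number of transactions, number of items, most frequent items
summary(Groceries)

# Bar chart of the 25 most frequent items, by absolute count
itemFrequencyPlot(Groceries, topN = 25, type = "absolute")
```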
9. Mine some rules!!
• We need to provide a minimum support and a minimum confidence.
Output:
If someone buys yogurt
and cereals, they are 81%
likely to buy whole milk
too.
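The thresholds used on the slide are not shown in the text. As a sketch, the call below assumes a minimum support of 0.001 and a minimum confidence of 0.8; both values are illustrative and should be tuned to the data.

```r
library(arules)
data("Groceries")

# Assumed thresholds: a rule must appear in at least 0.1% of transactions
# and hold in at least 80% of the transactions matching its left-hand side
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8))

# Show the first five rules with their support, confidence and lift
inspect(rules[1:5])
```

Lowering either threshold produces more rules; raising them keeps only the more frequent or more reliable ones.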
10. Cont.
• We can get a summary report by using:
• summary(rules)
11. Sorting Stuff out
The first issue we see here is that the rules are not sorted. Often we will want the
most relevant rules first. Let's say we want the most likely rules at the top. We can
easily sort the rules by confidence, highest first, with the sort() function from arules.
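The sorting step could be sketched as follows. The mining thresholds repeat the assumed values (support 0.001, confidence 0.8), and `rules_sorted` is an illustrative name.

```r
library(arules)
data("Groceries")

# Mine rules with assumed thresholds (verbose output switched off)
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

# Order the rules by confidence, highest first
rules_sorted <- sort(rules, by = "confidence", decreasing = TRUE)

# The most confident rules now come first
inspect(rules_sorted[1:5])
```

The same call with `by = "lift"` or `by = "support"` orders the rules by those measures instead.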
Figures: the rule list before sorting and after sorting
12. Redundancies!!
• Sometimes rules will repeat. Redundancy indicates that one item might be a
given. As an analyst, you can elect to drop that item from the dataset. Alternatively,
you can remove the redundant rules that were generated.
• We can remove the redundant rules with the following code:
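# Pairwise subset matrix; sparse = FALSE asks for a dense logical matrix
# (recent arules versions return a sparse matrix by default)
subset.matrix <- is.subset(rules, rules, sparse = FALSE)
# Keep only the upper triangle, so each rule is compared with the rules before it
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
# A rule is redundant if at least one earlier rule is a subset of it
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
rules.pruned <- rules[!redundant]
rules <- rules.pruned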
13. Targeting Items
• There are two kinds of target we might be interested in:
• What are customers likely to buy before buying whole milk?
• What are customers likely to buy if they purchase whole milk?
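Both questions can be answered by restricting where "whole milk" may appear in a rule, using the `appearance` argument of `apriori()`. The sketch below is one way to do it. The thresholds and the names `rules_to_milk` and `rules_from_milk` are assumptions for illustration.

```r
library(arules)
data("Groceries")

# 1) What leads to whole milk? Fix the right-hand side to "whole milk"
#    and allow every other item on the left-hand side.
rules_to_milk <- apriori(Groceries,
                         parameter = list(supp = 0.001, conf = 0.8),
                         appearance = list(default = "lhs", rhs = "whole milk"),
                         control = list(verbose = FALSE))
rules_to_milk <- sort(rules_to_milk, by = "confidence", decreasing = TRUE)
inspect(rules_to_milk[1:5])

# 2) What follows from whole milk? Fix the left-hand side to "whole milk".
#    minlen = 2 excludes rules with an empty left-hand side; the lower
#    confidence threshold is an assumed value for this broader question.
rules_from_milk <- apriori(Groceries,
                           parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                           appearance = list(lhs = "whole milk", default = "rhs"),
                           control = list(verbose = FALSE))
rules_from_milk <- sort(rules_from_milk, by = "confidence", decreasing = TRUE)
inspect(rules_from_milk)
```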