Market Basket Analysis
1. MARKET BASKET ANALYSIS
LEARNING OBJECTIVES:
• EXPLAIN WHAT ASSOCIATION RULES AND ITEM SETS ARE.
• DESCRIBE THE BASIC PROCESS FOR MARKET BASKET ANALYSIS.
• KNOW WHEN TO APPLY MARKET BASKET ANALYSIS.
• UNDERSTAND THE STRENGTHS AND WEAKNESSES OF MARKET BASKET ANALYSIS.
By Obakeng Brian Pheelwane & Marc Berman – Group 14
2. ASSOCIATION RULES
• The goal of association discovery is to find items whose presence implies the presence of other items in the same transaction.
• Association rules are in the form:
If <Left-Hand Side (LHS)> then <Right-Hand Side (RHS)>
• To indicate the validity and importance of a rule, each rule has two parameters:
• Support Factor
• Confidence Factor
• RHS usually has one item; LHS has one or more items.
3. EXAMPLES OF ASSOCIATION RULES
In a database of transactions at a departmental store, an example association rule involving two items, X and Y, is:
If X then Y
Thus, in 70% of the cases in which a customer buys X, he/she will also buy Y, and purchases containing both X and Y make up 20% of all purchases made at the departmental store.
Therefore, the rule has a 70% confidence factor and a 20% support factor.
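As a worked check of this example, write T for the total number of transactions, X for the number of transactions containing item X, and XY for the number containing both X and Y (XY is a single count, not a product; this is the same notation used for the confidence and support factors). Reading 70% as the confidence factor and 20% as the support factor, the two percentages together also fix how common X itself is:

```latex
\begin{aligned}
\text{Support} &= \frac{XY}{T} = 0.20, \qquad \text{Confidence} = \frac{XY}{X} = 0.70,\\
\frac{X}{T} &= \frac{XY/T}{XY/X} = \frac{0.20}{0.70} \approx 0.286.
\end{aligned}
```

So, under these figures, X appears in roughly 28.6% of all transactions, and the 70% is the share of those X transactions that also contain Y.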
4. CONFIDENCE AND SUPPORT FACTORS
• Assume we have an association rule indicated as LHS -> RHS.
• T is the total number of cases in the database.
• X is the number of cases covered by the LHS.
• Y is the number of cases covered by the RHS.
• XY is the number of cases covered by both the LHS and the RHS,
indicated by the overlapping area in Figure 1.
Figure 1: Confidence and support factors visualised
Figure 2: Confidence and support factor formula
• The confidence factor is the number of cases covered by both the LHS and the RHS (XY),
divided by the number of cases covered by the LHS (X): Confidence = XY / X.
• The support factor is the number of cases covered by both the LHS and the RHS (XY),
divided by the total number of cases in the database (T): Support = XY / T.
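The two factors can be computed directly from a list of transactions. The sketch below is a minimal illustration, assuming each transaction is a Python set of item names; the function name `support_and_confidence` and the basket data are hypothetical, invented for this example.

```python
from typing import List, Set, Tuple


def support_and_confidence(
    transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]
) -> Tuple[float, float]:
    """Return (support, confidence) for the rule LHS -> RHS."""
    t = len(transactions)                                     # T: all cases
    x = sum(1 for tx in transactions if lhs <= tx)            # X: cases covered by the LHS
    xy = sum(1 for tx in transactions if (lhs | rhs) <= tx)   # XY: cases covered by LHS and RHS
    support = xy / t if t else 0.0
    confidence = xy / x if x else 0.0
    return support, confidence


# Hypothetical transactions, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

s, c = support_and_confidence(baskets, {"bread"}, {"milk"})
print(f"support={s:.2f}, confidence={c:.2f}")  # → support=0.60, confidence=0.75
```

Here bread appears in 4 of the 5 baskets and bread with milk in 3, so the rule "If bread then milk" has support 3/5 and confidence 3/4.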
5. THE BASIC PROCESS OF MARKET BASKET ANALYSIS
1. Choosing the right item set.
• The objective is to define a set of items such that, when association rules are formed among these items,
some of the rules have a meaningful interpretation and may prove useful.
• There are several methods for generating the right item sets:
• Use a taxonomy to choose the right level of generality, ranging from general items to specific items (see Figure 3)
• Use virtual items (see Figures 4 & 5)
• A combination of both
• The taxonomy and virtual items (prepared by the users or a domain expert) help users choose the right
item set while exploring the data for useful rules.
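The combination of both methods can be sketched as a data-preparation step applied to each transaction before rule generation. This is a minimal sketch under stated assumptions: the taxonomy entries, the virtual items (payment method and a weekend flag), and the function names `generalise` and `add_virtual_items` are all hypothetical, chosen only to show the idea of rolling items up to a more general level and appending items that describe the transaction rather than a product.

```python
from typing import Dict, Set

# Hypothetical taxonomy mapping specific items to more general categories.
TAXONOMY: Dict[str, str] = {
    "cheddar": "cheese",
    "brie": "cheese",
    "white bread": "bread",
    "rye bread": "bread",
    "orange juice": "juice",
}


def generalise(basket: Set[str], taxonomy: Dict[str, str]) -> Set[str]:
    """Replace each specific item with its parent category, if it has one."""
    return {taxonomy.get(item, item) for item in basket}


def add_virtual_items(basket: Set[str], payment: str, weekend: bool) -> Set[str]:
    """Append hypothetical virtual items describing the transaction itself."""
    extra = {f"paid:{payment}"}
    if weekend:
        extra.add("day:weekend")
    return basket | extra


raw = {"cheddar", "rye bread", "orange juice"}
prepared = add_virtual_items(generalise(raw, TAXONOMY), payment="card", weekend=True)
print(sorted(prepared))  # → ['bread', 'cheese', 'day:weekend', 'juice', 'paid:card']
```

Items missing from the taxonomy pass through unchanged, so the choice of level stays under the control of whoever maintains the mapping.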
7. Figure 4: This is an example of a poor choice of virtual items, since the rules are likely to be
redundant.
The problem with this example is that the resulting rules merely repeat the definition of the virtual items.
Figure 5: This is an example of a good choice of virtual items, though one must be careful
not to let the virtual items totally encompass the items used for analysis, as this would create redundancy
again.
8. BASIC PROCESS CONTINUED
2. Generating rules:
• Rule generation involves building a co-occurrence matrix that counts how often each
combination of n items in the item set occurs together.
• To generate a rule of n items of the form:
If X1, X2, …, X(n-1) then Xn
an n-item co-occurrence matrix is required.
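The co-occurrence counts can be sketched as follows. As a simplifying assumption, the "matrix" is stored as a dictionary keyed by item combinations rather than as a literal n-dimensional array; for n = 2 this holds the same numbers as the upper triangle of a pairwise matrix. The function name `co_occurrence` and the basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def co_occurrence(transactions: Iterable[Set[str]], n: int) -> Counter:
    """Count how often each n-item combination appears together in a transaction."""
    counts: Counter = Counter()
    for tx in transactions:
        # sorted() gives a deterministic order; frozenset makes {a, b} and {b, a} one key
        for combo in combinations(sorted(tx), n):
            counts[frozenset(combo)] += 1
    return counts


# Hypothetical transactions, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

pairs = co_occurrence(baskets, 2)            # 2-item co-occurrence counts
print(pairs[frozenset({"bread", "milk"})])  # → 3
```

Calling `co_occurrence(baskets, 3)` gives the three-item counts needed for rules with two items on the LHS.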
Number of items on LHS    Total number of combinations (from 100 distinct items)
1                         100
2                         4,950
3                         161,700
4                         3,921,225
5                         75,287,520
6                         1,192,052,400
7                         16,007,560,800
8                         186,087,894,300
Figure 5: Rule generation is a computationally expensive process, especially for large
data sets.
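The figures in the table coincide with the binomial coefficient C(100, k), the number of ways to choose k items from 100 distinct items. A short sketch reproduces them with Python's standard `math.comb`; the helper name `combination_counts` is hypothetical.

```python
from math import comb
from typing import List, Tuple


def combination_counts(n_items: int, max_k: int) -> List[Tuple[int, int]]:
    """Number of distinct k-item combinations of n_items, for k = 1..max_k."""
    return [(k, comb(n_items, k)) for k in range(1, max_k + 1)]


for k, count in combination_counts(100, 8):
    print(f"{k}\t{count:,}")  # e.g. "2\t4,950"
```

Each additional item on the LHS multiplies the count by roughly (100 - k) / (k + 1), which is why the numbers explode so quickly.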
9. BASIC PROCESS CONTINUED
3. Identifying useful rules that are previously unknown, valid and actionable.
• First, specify threshold values for the confidence and support factors so that the rule generation
algorithm automatically filters out rules that are not supported by the data.
• Second, human judgement is required to assess the interestingness, validity and actionability of the
rules that have passed through the automatic filter.
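The automatic filtering step can be sketched as below. For brevity this assumes rules with exactly one item on each side; the threshold values, the function name `pairwise_rules` and the basket data are hypothetical. The second, human-judgement step is deliberately not automated: the function only returns the candidate rules that survive the thresholds.

```python
from collections import Counter
from itertools import combinations, permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (lhs, rhs, support, confidence)


def pairwise_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> List[Rule]:
    """Generate one-item -> one-item rules and keep those above both thresholds."""
    t = len(transactions)
    item_counts = Counter(item for tx in transactions for item in tx)
    pair_counts = Counter(
        frozenset(p) for tx in transactions for p in combinations(sorted(tx), 2)
    )
    rules: List[Rule] = []
    for pair, xy in pair_counts.items():
        support = xy / t
        if support < min_support:          # support filter applies to both directions
            continue
        for lhs, rhs in permutations(pair, 2):
            confidence = xy / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical transactions, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

for lhs, rhs, sup, conf in pairwise_rules(baskets, min_support=0.4, min_confidence=0.6):
    print(f"If {lhs} then {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

With these thresholds, "If milk then eggs" and "If bread then butter" are dropped (confidence 0.50), leaving four candidate rules for a person to review.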
10. WHEN TO APPLY MARKET BASKET ANALYSIS
• Problems that consist of well-defined items that group together in potentially interesting ways.
• Time-series problems that can be adapted for market basket analysis by relatively simple data
transformations.
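One simple transformation of this kind is to cut each customer's event history into fixed time windows and treat each window as a basket. The sketch below assumes an event log of (customer, day number, item) tuples and a seven-day window; the log, the window length and the function name `events_to_baskets` are hypothetical, and other transformations are equally possible.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Event = Tuple[str, int, str]  # (customer_id, day_number, item)


def events_to_baskets(events: List[Event], window_days: int) -> List[Set[str]]:
    """Group each customer's events into one basket per fixed window of days."""
    baskets: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for customer, day, item in events:
        baskets[(customer, day // window_days)].add(item)
    # Sort by (customer, window index) so the output order is deterministic.
    return [baskets[key] for key in sorted(baskets)]


# Hypothetical event log, invented for illustration only.
log = [
    ("c1", 0, "phone"), ("c1", 3, "case"), ("c1", 10, "charger"),
    ("c2", 1, "laptop"), ("c2", 2, "mouse"),
]
print(events_to_baskets(log, window_days=7))
```

The resulting baskets can then be fed to the same support, confidence and co-occurrence calculations used for ordinary transactions.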
11. STRENGTHS AND WEAKNESSES
Strengths:
• Clear and understandable results
• Supports undirected data mining
• Works on variable-length data
• Simple computational process
Weaknesses:
• Computation increases exponentially as the problem size grows.
• Limited support for attributes on the data.
• Difficult to determine the right number of items.
• It discounts rare items.
12. DISSOCIATION RULES
• Similar to association rules, except that the negation “NOT” is applied to an item. An example of a
dissociation rule is:
• If X and not Y then Z.
Problems with dissociation rules:
1. Adding an inverted item for every item doubles the number of items, which significantly slows down the runtime.
2. Transactions grow in size because they include the inverted items.
3. They tend to produce rules in which all items are inverted, because inverted items are usually much
more frequent than the original items.
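The first two problems can be seen directly by adding inverted items to a small data set. This is a minimal sketch, assuming an inverted item is represented as the string "not <item>" and that the catalogue is every item seen in the data; the function name `add_inverted_items` and the baskets are hypothetical.

```python
from typing import List, Set


def add_inverted_items(transactions: List[Set[str]]) -> List[Set[str]]:
    """Add a 'not <item>' entry for every catalogue item missing from a transaction."""
    catalogue: Set[str] = set().union(*transactions)
    return [tx | {f"not {item}" for item in catalogue - tx} for tx in transactions]


# Hypothetical transactions, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

expanded = add_inverted_items(baskets)
print([len(tx) for tx in baskets], [len(tx) for tx in expanded])
# → [2, 3, 2, 2, 3] [4, 4, 4, 4, 4]
```

Every transaction now has one entry per catalogue item, the number of distinct items doubles from 4 to 8, and "not eggs" (3 baskets) is already more frequent than "eggs" (2 baskets).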
13. WHAT WE HAVE LEARNED:
• We have learned about association and dissociation rules.
• How to generate more specialised items using taxonomy and virtual items.
• When to apply Market Basket Analysis
• Finally, the strengths and weaknesses of Market Basket Analysis
14. REVIEW QUESTIONS
1. Discuss the similarities and differences between a decision rule and an association rule in terms of rule structure and
how each is used.
Decision rule (Separate-and-conquer)
Decision rules are closely related to decision trees. The terminal nodes of a tree can be grouped into rules. It attempts to find a
partial solution for one part of a problem at a time, looking for the optimal solution to the problem.
How it is used:
- One partial solution in each step
Association rule
An association rule does not have a target. It finds all rules that exist in the data. It attempts to find a full set of solutions to a problem.
Looking for the optimal solution to the problem.
How it is used:
- Multiple combinations in each step
15. 2. Discuss the caution one should exercise when applying association rules. Relate your explanation to
the definition of data mining: Data mining is a process of extracting previously unknown, valid, and
actionable information from large databases and then using the information to make crucial decisions.
16. 3. Compare the model selection process in predictive modelling with the similar process in market basket
analysis. Answer the following questions in your comparison:
i. What is a model?
A model in predictive modelling tasks is one built to make predictions for unseen data.
E.g. the trained model is used to make a positive or negative diagnosis about a disease for a new patient.
A model in market basket analysis takes the form of a set of rules that describe the associations between
attributes; these rules are not meant for prediction.
ii. How do the model selection processes differ?
The model selection process in predictive modelling is guided by maximising a measure determined
during the problem definition step; this process can therefore be carried out objectively.
The model selection process in market basket analysis is more subjective, although a few measures
can be used to reduce the set of candidate rules.