An association rule X -> Y is strong if
Support(X -> Y) >= Minimum Support and Confidence(X -> Y) >= Minimum Confidence.
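The strength test above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `is_strong_rule` and its parameters are hypothetical, thresholds are assumed to be percentages, and the counts in the usage lines are taken from the All Electronics problem later in this section.

```python
def is_strong_rule(count_xy, count_x, n, min_support, min_confidence):
    """Return (support %, confidence %, is_strong) for the rule X -> Y.

    count_xy: number of transactions containing both X and Y
    count_x:  number of transactions containing X
    n:        total number of transactions
    Thresholds are percentages (an assumption of this sketch).
    """
    support = count_xy / n * 100
    confidence = count_xy / count_x * 100
    strong = support >= min_support and confidence >= min_confidence
    return support, confidence, strong


# Counts from the All Electronics table: 4000 transactions contain both
# items, 6000 contain a computer game, and N = 10000.
support, confidence, strong = is_strong_rule(4000, 6000, 10000, 30, 60)
print(f"support={support:.1f}%, confidence={confidence:.1f}%, strong={strong}")
```

Both thresholds must be met at once; a rule with high confidence but too little support is not strong.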
Correlation Analysis
Correlation analysis measures how closely two or more variables are related, i.e., the degree
to which the variables are associated with each other.
Methods of Correlation Analysis
1. Lift
2. χ² (Chi-square)
3. All Confidence
4. Cosine
Correlation Analysis using Lift
The lift between the occurrences of A and B is computed as
$$\mathrm{Lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$
where P(A ∪ B) is the probability that a transaction contains both A and B.
• If Lift < 1, then the occurrence of A is negatively correlated with the occurrence of B,
meaning that the occurrence of one likely leads to the absence of the other.
• If Lift > 1, then the occurrence of A is positively correlated with the occurrence of B,
meaning that the occurrence of one implies the occurrence of the other.
• If Lift = 1, then A and B are independent and there is no correlation between them.
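The lift rule can be sketched as a small Python helper. The names `lift` and `interpret_lift` are hypothetical, the tolerance used to test for equality with 1 is an assumption of this sketch, and the counts come from the All Electronics table in this section.

```python
def lift(count_ab, count_a, count_b, n):
    """Lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    p_ab = count_ab / n
    p_a = count_a / n
    p_b = count_b / n
    return p_ab / (p_a * p_b)


def interpret_lift(value, tol=1e-9):
    """Map a lift value to the three cases listed above."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "negatively correlated" if value < 1.0 else "positively correlated"


# 4000 transactions contain both items, 6000 contain a computer game,
# 7500 contain a video, and N = 10000.
lift_value = lift(4000, 6000, 7500, 10000)
print(round(lift_value, 3), interpret_lift(lift_value))
```

For these counts the lift is about 0.89, which falls in the "negatively correlated" case.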
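Correlation Analysis using χ²
$$\chi^2 = \sum \frac{(\mathrm{Observed} - \mathrm{Expected})^2}{\mathrm{Expected}}$$
with
$$\mathrm{Expected}(i, j) = \frac{\mathrm{Sup}(i) \times \mathrm{Sup}(j)}{N}$$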
• χ² is never negative. If χ² = 0, the observed counts equal the expected counts, so A and B
are independent and there is no correlation between them.
• The larger χ² is, the stronger the evidence that A and B are correlated. The value of χ²
alone does not give the direction of the correlation.
• To find the direction, compare the observed count of A and B occurring together with its
expected count. If Observed < Expected, the occurrence of A is negatively correlated with the
occurrence of B; if Observed > Expected, it is positively correlated.
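The χ² computation can be sketched in plain Python for a contingency table given as a list of rows. The function name `chi_square` is hypothetical, and the counts are those of the All Electronics table in this section. The sketch also compares the observed joint count with its expected value, which is one common way to read the direction of a dependence, since χ² itself is never negative.

```python
def chi_square(observed):
    """Return (chi-square statistic, expected counts) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, obs in enumerate(row):
            # Expected(i, j) = Sup(i) * Sup(j) / N
            exp = row_totals[i] * col_totals[j] / n
            expected_row.append(exp)
            chi2 += (obs - exp) ** 2 / exp
        expected.append(expected_row)
    return chi2, expected


# Rows: Video, not Video. Columns: Computer Game, not Computer Game.
observed = [[4000, 3500], [2000, 500]]
chi2, expected = chi_square(observed)

# Direction: compare the observed (Video, Computer Game) count with its expectation.
if observed[0][0] < expected[0][0]:
    direction = "negative"
elif observed[0][0] > expected[0][0]:
    direction = "positive"
else:
    direction = "none"
print(round(chi2, 1), expected[0][0], direction)
```

Here the observed joint count (4000) is below its expected value (4500), so the dependence signalled by the large χ² is a negative one.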
Correlation Analysis using All-Confidence
$$\text{All-Confidence}(A, B) = \frac{\mathrm{Support}(A \cup B)}{\max\{\mathrm{Support}(A),\ \mathrm{Support}(B)\}}$$
• If All-Confidence < 0.5, then the occurrence of A is negatively correlated with the
occurrence of B, meaning that the occurrence of one likely leads to the absence of the other.
• If All-Confidence > 0.5, then the occurrence of A is positively correlated with the
occurrence of B, meaning that the occurrence of one implies the occurrence of the other.
• If All-Confidence = 0.5, then the measure is neutral and indicates no correlation between
A and B.
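A minimal Python sketch of the all-confidence measure follows. The function name `all_confidence` is hypothetical, the denominator is taken as the larger of the two item supports (the usual definition, assumed here), and the counts are from the All Electronics table in this section.

```python
def all_confidence(count_ab, count_a, count_b):
    """All-Confidence(A, B) = Support(A and B) / max(Support(A), Support(B))."""
    return count_ab / max(count_a, count_b)


# 4000 transactions contain both items, 6000 contain a computer game,
# and 7500 contain a video.
all_conf_value = all_confidence(4000, 6000, 7500)
print(round(all_conf_value, 3))
```

The total number of transactions N does not appear in the formula, so transactions containing neither item have no effect on the result.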
Correlation Analysis using Cosine
$$\mathrm{Cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A) \times P(B)}}$$
• If Cosine < 0.5, then the occurrence of A is negatively correlated with the occurrence of B,
meaning that the occurrence of one likely leads to the absence of the other.
• If Cosine > 0.5, then the occurrence of A is positively correlated with the occurrence of B,
meaning that the occurrence of one implies the occurrence of the other.
• If Cosine = 0.5, then the measure is neutral and indicates no correlation between A and B.
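The cosine measure can be sketched the same way. The function name `cosine` is hypothetical and the counts are from the All Electronics table in this section. The second computation illustrates that N cancels out of the formula, so the measure can also be computed directly from raw counts.

```python
import math


def cosine(count_ab, count_a, count_b, n):
    """Cosine(A, B) = P(A and B together) / sqrt(P(A) * P(B))."""
    p_ab = count_ab / n
    p_a = count_a / n
    p_b = count_b / n
    return p_ab / math.sqrt(p_a * p_b)


# 4000 transactions contain both items, 6000 contain a computer game,
# 7500 contain a video, and N = 10000.
cosine_value = cosine(4000, 6000, 7500, 10000)

# Same quantity from raw counts: N cancels in numerator and denominator.
cosine_from_counts = 4000 / math.sqrt(6000 * 7500)
print(round(cosine_value, 3), round(cosine_from_counts, 3))
```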
Problem:
Mr. Jamal Hossain, manager of All Electronics, is interested in finding the correlation between
his two best-selling items, computer games and videos. He analyzes his database and finds the
following statistics about the two items.
                Computer Game    ¬Computer Game    Σ Row
Video                4000              3500          7500
¬Video               2000               500          2500
Σ Col                6000              4000         10000 = N
i. Suppose that the association rule buys (X, “Computer Game”) -> buys (Y, “Video”) is
mined. Given a minimum support threshold of 30% and a minimum confidence threshold of 60%,
is this rule strong?
ii. Analyze the correlation between these items using the Lift, Chi-square, All-Confidence,
and Cosine measures.
Solution:
i. We know that,
$$\mathrm{Support}(X \to Y) = \frac{\mathrm{Support}(X \cup Y)}{N} \times 100$$
$$\mathrm{Confidence}(X \to Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \times 100$$
Therefore,
$$\mathrm{Support}(G \to V) = \frac{\mathrm{Support}(G \cup V)}{N} \times 100 = \frac{4000}{10000} \times 100 = 40\%$$
which is greater than the minimum support of 30%.