Association Rules and Frequent Pattern Analysis
Dr. Iqbal H. Sarker
Dept of CSE, CUET
Research LAB Web: Sarker DataLAB (http://sarkerdatalab.com/)
Today’s Agenda
 Introduction to Association Rules
 Motivation with Examples
 Algorithms
 How It Works
 Real-Life Application Areas
 Summary
Introduction to AR
 Ideas come from market basket analysis (MBA)
◼ Let’s go shopping!
Customer 1: milk, eggs, sugar, bread
Customer 2: eggs, sugar
Customer 3: milk, eggs, cereal, bread
◼ What do my customers buy? Which products are bought together?
◼ Aim: Find associations and correlations between the different
items that customers place in their shopping basket
Association rule learning is
a rule-based machine
learning method for discovering
interesting relations between
variables in large databases.
Real-Life Applications
Used in many recommender systems
Introduction to AR
 Formalizing the problem a little bit
◼ Transaction database T: a set of transactions T = {t1, t2, …, tn}
◼ Each transaction ti is a set of items drawn from the item set I
◼ An itemset is a collection of items, I = {i1, i2, …, im}
 General aim:
◼ Find frequent/interesting patterns, associations, correlations, or
causal structures among sets of items or elements in
databases or other information repositories.
◼ Put these relationships in terms of association rules
➢ X ⇒ Y
What’s an Interesting Rule?
 An association rule is an implication between two itemsets
◼ X ⇒ Y

TID  Items
T1   bread, jelly, peanut-butter
T2   bread, peanut-butter
T3   bread, milk, peanut-butter
T4   beer, bread
T5   beer, milk

 Many measures of interest. The two most used are:
◼ Support (s)
➢ The occurring frequency of the rule, i.e., the fraction of transactions that contain both X and Y:
s = σ(X ∪ Y) / (total no. of transactions)
◼ Confidence (c)
➢ The strength of the association, i.e., how often items in Y appear in transactions that contain X:
c = σ(X ∪ Y) / σ(X)
(Here σ(Z) denotes the number of transactions that contain the itemset Z.)
Mining Association Rules — an Example

Transaction-id  Items bought
10              A, B, C
20              A, C
30              A, D
40              B, E, F

Min. support 50%, Min. confidence 50%

Frequent pattern  Support
{A}               75%
{B}               50%
{C}               50%
{A, C}            50%

For rule A ⇒ C:
support = support({A} ∪ {C}) = 50%
confidence = support({A} ∪ {C}) / support({A}) = 66.6%
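These numbers are easy to verify directly. A quick Python sketch (our own illustration, not part of the slides; it assumes transactions encoded as sets of item labels):

# The four transactions from the table above
transactions = [{'A', 'B', 'C'}, {'A', 'C'}, {'A', 'D'}, {'B', 'E', 'F'}]

def support(itemset):
    # Fraction of transactions that contain every item in `itemset`
    return sum(itemset <= t for t in transactions) / len(transactions)

s = support({'A', 'C'})                    # 2 of 4 transactions -> 50%
c = support({'A', 'C'}) / support({'A'})   # 0.50 / 0.75 -> 66.6%
print(f"support = {s:.0%}, confidence = {c:.1%}")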
The Apriori Algorithm: Basics
 The name, Apriori, is based on the fact that the algorithm
uses prior knowledge of frequent itemset properties
 It consists of two steps
1. Generate all frequent itemsets whose support ≥
minsup
2. Use frequent itemsets to generate association rules
 So, let’s pay attention to the first step
Apriori

[Figure: the full itemset lattice over the items {A, B, C, D, E}, from the empty set (null) through all 1-, 2-, 3- and 4-itemsets up to ABCDE.]

Given n items, we have 2^n possible itemsets.
◼ Do we have to generate them all?
Apriori
 Let’s avoid expanding the whole lattice
 Key idea:
◼ Use the Apriori property: every subset of a frequent itemset is also frequent; equivalently, any superset of an infrequent itemset must be infrequent
 Therefore, the algorithm iteratively does:
◼ Create candidate itemsets
◼ Only continue exploring those whose support ≥ minsup
Apriori: Pseudo-code

Join Step: Ck is generated by joining Lk-1 with itself
Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset

Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
Illustration of the Apriori Principle

[Figure: two copies of the itemset lattice over {A, B, C, D, E}. In the first, one itemset is found to be infrequent; in the second, all of its supersets are pruned from the search without ever being counted.]
Another Example

[Figure: itemset lattice over {A, B, C, D, E} with an infrequent itemset marked and all of its supersets pruned.]
Example of Apriori Run

Database TDB (min_sup = 2):
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan → C1: {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
L1 (support ≥ 2): {A}: 2, {B}: 3, {C}: 3, {E}: 3

C2 (candidates generated from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan → C2 with counts: {A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2
L2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2

C3 (candidates generated from L2): {B, C, E}
3rd scan → L3: {B, C, E}: 2
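As a cross-check, running the apriori sketch from the pseudo-code slide on this database (same assumptions as before) reproduces L1, L2 and L3:

transactions = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]

for itemset, sup in sorted(apriori(transactions, min_sup=2).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
# ['A'] 2  ['B'] 3  ['C'] 3  ['E'] 3                   -> L1
# ['A','C'] 2  ['B','C'] 2  ['B','E'] 3  ['C','E'] 2   -> L2
# ['B','C','E'] 2                                      -> L3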
Apriori
 Remember that Apriori consists of two steps
1. Generate all frequent itemsets whose support ≥ minsup
2. Use frequent itemsets to generate association rules
 We accomplished step 1. So we have all frequent
itemsets
 So, let’s pay attention to the second step
Rule Generation in Apriori
 Given a frequent itemset L
◼ Find all non-empty proper subsets F of L such that the association rule F ⇒ (L − F) satisfies the minimum confidence
◼ Create the rule F ⇒ (L − F)
 If L = {A, B, C}
◼ The candidate rules are: AB ⇒ C, AC ⇒ B, BC ⇒ A, A ⇒ BC, B ⇒ AC, C ⇒ AB
◼ In general, there are 2^k − 2 candidate rules, where k is the length of the itemset L
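A minimal sketch of this enumeration in Python (our own illustration; it assumes the {frozenset: support count} dictionary returned by the apriori sketch above, and the name generate_rules is ours):

from itertools import combinations

def generate_rules(frequent, min_conf):
    # Emit every rule F => L - F over the frequent itemsets whose
    # confidence = sup(L) / sup(F) meets min_conf.
    rules = []
    for L, sup_L in frequent.items():
        if len(L) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        # Enumerate all 2^k - 2 non-empty proper subsets F of L
        for r in range(1, len(L)):
            for F in map(frozenset, combinations(L, r)):
                # F is frequent by the Apriori property, so its count is known
                conf = sup_L / frequent[F]
                if conf >= min_conf:
                    rules.append((set(F), set(L - F), conf))
    return rules

On the TDB example above with min_conf = 0.7, this keeps exactly A ⇒ C, B ⇒ E, E ⇒ B, BC ⇒ E and CE ⇒ B, each with confidence 100%.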
Example of Efficient Rule Generation

[Figure: the lattice of candidate rules for the frequent itemset {A, B, C, D}. The top level holds rules with 3-item antecedents (ABC ⇒ D, ABD ⇒ C, ACD ⇒ B, BCD ⇒ A), the middle level 2-item antecedents (AB ⇒ CD, AC ⇒ BD, AD ⇒ BC, BC ⇒ AD, BD ⇒ AC, CD ⇒ AB), and the bottom level 1-item antecedents (A ⇒ BCD, B ⇒ ACD, C ⇒ ABD, D ⇒ ABC). One rule is marked as low confidence, and all rules below it are pruned.]

Why pruning is safe: moving an item from the antecedent to the consequent can only lower confidence, since shrinking the antecedent F can only increase σ(F) in c = σ(L) / σ(F). So if a rule has low confidence, every rule beneath it in the lattice does too, and the whole subtree can be pruned.
Relevant Algorithms
1. Apriori
2. FP-Growth
3. ECLAT
4. ABC-RuleMiner (Sarker et al., Elsevier)
[ABC-RuleMiner: User behavioral rule-based machine learning method for
context-aware intelligent services, Journal of Network and Computer
Applications, Elsevier, 2020]
5. Others…
Possible Application Areas
➢Market Basket Analysis
➢Context-Aware Intelligent Systems
➢Medical Diagnosis
➢Mobile Applications
➢Smart Cities
➢Cybersecurity
➢Protein Sequence Analysis
➢Web Usage Mining
➢Census Data
➢And so on…
Questions?
Thank You!!!
Sarker DataLAB
(http://sarkerdatalab.com/)
Email: iqbal.sarker.cse@gmail.com