This document discusses association rule learning and frequent pattern mining. It begins with an introduction to association rule mining using a grocery store example, then describes the Apriori algorithm for finding frequent itemsets and generating association rules. The algorithm works in two steps: first find all frequent itemsets whose support is above a minimum threshold, then generate association rules from those itemsets whose confidence is above a minimum threshold. An example run of the Apriori algorithm on a transactional database is shown, and some potential application areas for association rule mining are discussed.
1. Association Rules and Frequent Pattern Analysis
Dr. Iqbal H. Sarker
Dept of CSE, CUET
Research Lab: Sarker DataLAB (http://sarkerdatalab.com/)
Machine Learning Slide 1
Iqbal H. Sarker
2. Today’s Agenda
Introduction to Association Rules
Motivation with Examples
Algorithms
How does it work?
Real-life Application Areas
Summary
3. Introduction to AR
Ideas come from market basket analysis (MBA)
◼ Let’s go shopping!
Customer 1: milk, eggs, sugar, bread
Customer 2: eggs, sugar
Customer 3: milk, eggs, cereal, bread
◼ What do my customers buy? Which products are bought together?
◼ Aim: Find associations and correlations between the different items that customers place in their shopping baskets
4. Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases.
6. Introduction to AR
Formalizing the problem a little bit:
◼ Transaction database T: a set of transactions T = {t1, t2, …, tn}
◼ Each transaction contains a set of items I (an itemset)
◼ An itemset is a collection of items I = {i1, i2, …, im}
General aim:
◼ Find frequent/interesting patterns, associations, correlations, or causal structures among sets of items or elements in databases or other information repositories.
◼ Express these relationships as association rules:
➢ X ⇒ Y
7. What’s an Interesting Rule?
An association rule is an implication between two itemsets:
◼ X ⇒ Y

TID  Items
T1   bread, jelly, peanut-butter
T2   bread, peanut-butter
T3   bread, milk, peanut-butter
T4   beer, bread
T5   beer, milk

There are many measures of interest. The two most used are:
◼ Support (s)
➢ The occurring frequency of the rule, i.e., the fraction of transactions that contain both X and Y:
s = σ(X ∪ Y) / (no. of transactions)
◼ Confidence (c)
➢ The strength of the association, i.e., how often items in Y appear in transactions that contain X:
c = σ(X ∪ Y) / σ(X)
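As a quick sketch of the two formulas above, here is how support and confidence can be computed for the T1-T5 table in plain Python (the helper names are illustrative, not from the slides):

```python
# Support and confidence over the T1-T5 basket table.
transactions = [
    {"bread", "jelly", "peanut-butter"},  # T1
    {"bread", "peanut-butter"},           # T2
    {"bread", "milk", "peanut-butter"},   # T3
    {"beer", "bread"},                    # T4
    {"beer", "milk"},                     # T5
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(X, Y, transactions):
    """c = sigma(X union Y) / sigma(X)."""
    return support(X | Y, transactions) / support(X, transactions)

# For the rule {bread} => {peanut-butter}:
print(support({"bread", "peanut-butter"}, transactions))      # 3 of 5 transactions
print(confidence({"bread"}, {"peanut-butter"}, transactions)) # 3 of the 4 bread transactions
```

For this rule the support is 60% and the confidence 75%: peanut-butter appears in three of the four baskets that contain bread.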
8. Mining Association Rules: an Example
Min. support 50%
Min. confidence 50%

Transaction-id  Items bought
10              A, B, C
20              A, C
30              A, D
40              B, E, F

Frequent pattern  Support
{A}               75%
{B}               50%
{C}               50%
{A, C}            50%

For the rule A ⇒ C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%
9. The Apriori Algorithm: Basics
The name Apriori comes from the fact that the algorithm uses prior knowledge of frequent itemset properties.
It consists of two steps:
1. Generate all frequent itemsets whose support ≥ minsup
2. Use the frequent itemsets to generate association rules
So, let’s pay attention to the first step.
10. Apriori
(Figure: the itemset lattice over {A, B, C, D, E}, from the null set at the top through all singletons, pairs, and larger subsets down to ABCDE.)
Given n items, we have 2^n possible itemsets.
◼ Do we have to generate them all?
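The 2^n count can be checked directly by enumerating the lattice; a small illustrative Python sketch:

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]
# Every node of the lattice: all subsets, from the null set up to ABCDE.
lattice = [c for k in range(len(items) + 1) for c in combinations(items, k)]
print(len(lattice))  # 2^5 = 32 itemsets (31 of them non-empty)
```

Even for modest n this grows fast, which is exactly why Apriori prunes rather than enumerating everything.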
11. Apriori
Let’s avoid expanding the whole lattice.
Key idea:
◼ Use the Apriori property: every subset of a frequent itemset is also frequent
Therefore, the algorithm iteratively does:
◼ Create itemsets
◼ Only continue exploring those whose support ≥ minsup
12. Apriori: Pseudo-code
Join step: Ck is generated by joining Lk-1 with itself
Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk ≠ ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with support ≥ min_support;
end
return ∪k Lk;
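The pseudo-code above can be sketched as a small self-contained Python function. This is an illustrative implementation (the function and variable names are my own, not from the slides), run on the slide-8 transactions:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Frequent-itemset mining following the pseudo-code: join L(k-1)
    with itself, prune candidates with an infrequent (k-1)-subset,
    then count support in a pass over the transactions."""
    n = len(transactions)

    def frequent(candidates):
        return {c for c in candidates
                if sum(1 for t in transactions if c <= t) / n >= minsup}

    # L1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    L = frequent({frozenset([i]) for i in items})
    result = set(L)
    k = 2
    while L:
        # Join step: unions of pairs from L(k-1) that form a k-itemset
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        L = frequent(candidates)
        result |= L
        k += 1
    return result

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
frequent_itemsets = apriori(transactions, minsup=0.5)
# With minsup = 50% this recovers {A}, {B}, {C} and {A, C}, as on slide 8.
```

The set-based join here is a simple (not the most efficient) way to realize the join step; production implementations sort items and join prefixes instead.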
13. Illustration of the Apriori Principle
(Figure: two views of the itemset lattice over {A, B, C, D, E}. In the first, one itemset is found to be infrequent; in the second, that itemset and all of its supersets are pruned, since a superset of an infrequent itemset cannot be frequent.)
14. Another Example
(Figure: the itemset lattice over {A, B, C, D, E} again, with an infrequent itemset marked and its supersets pruned.)
16. Apriori
Remember that Apriori consists of two steps:
1. Generate all frequent itemsets whose support ≥ minsup
2. Use the frequent itemsets to generate association rules
We accomplished step 1, so we have all the frequent itemsets.
So, let’s pay attention to the second step.
17. Rule Generation in Apriori
Given a frequent itemset L
◼ Find all non-empty subsets F of L such that the association rule F ⇒ {L - F} satisfies the minimum confidence
◼ Create the rule F ⇒ {L - F}
If L = {A, B, C}
◼ The candidate rules are: AB ⇒ C, AC ⇒ B, BC ⇒ A, A ⇒ BC, B ⇒ AC, C ⇒ AB
◼ In general, there are 2^k - 2 candidate rules, where k is the size of the itemset L
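The rule-generation step can be sketched in Python as well. This is an illustrative helper (its name and structure are assumptions, not from the slides), applied to the frequent itemset {A, C} from the slide-8 example:

```python
from itertools import combinations

def rules_from_itemset(L, transactions, minconf):
    """Enumerate the 2^k - 2 candidate rules F => (L - F) for a
    frequent itemset L, keeping those meeting minimum confidence."""
    n = len(transactions)

    def support(s):
        return sum(1 for t in transactions if s <= t) / n

    kept = []
    for r in range(1, len(L)):  # all non-empty proper subsets F of L
        for F in map(frozenset, combinations(L, r)):
            conf = support(L) / support(F)  # c = sigma(L) / sigma(F)
            if conf >= minconf:
                kept.append((set(F), set(L - F), conf))
    return kept

# Frequent itemset {A, C} with min. confidence 50%:
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
for F, rest, conf in rules_from_itemset(frozenset({"A", "C"}), transactions, 0.5):
    print(F, "=>", rest, round(conf, 3))
```

For a 2-itemset there are 2^2 - 2 = 2 candidates, A ⇒ C (confidence 2/3) and C ⇒ A (confidence 1), and both pass the 50% threshold here.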