SlideShare a Scribd company logo
1
Mining Association Rules
Mohamed G. Elfeky
2
Introduction
Data mining is the discovery of knowledge
and useful information from the large
amounts of data stored in databases.
Association Rules: describing association
relationships among the attributes in the
set of relevant data.
3
Rules
Body ==> Consequent [ Support , Confidence ]
Body: represents the examined data.
Consequent: represents a discovered property
for the examined data.
Support: represents the percentage of the
records satisfying the body or the consequent.
Confidence: represents the percentage of the
records satisfying both the body and the
consequent to those satisfying only the body.
4
Association Rules Examples
Basket Data
Tea ^ Milk ==> Sugar [0.3 , 0.9]
Relational Data
x.diagnosis = Heart ^ x.sex = Male ==> x.age > 50 [0.4 , 0.7]
Object-Oriented Data
s.hobbies = { sport , art } ==> s.age() = Young [0.5 , 0.8]
5
Topics of Discussion
Formal Statement of the Problem
Different Algorithms
 AIS
 SETM
 Apriori
 AprioriTid
 AprioriHybrid
Performance Analysis
6
Formal Statement of the Problem
I = { i1 , i2 , … , im } is a set of items
D is a set of transactions T
Each transaction T is a set of items (subset of I)
TID is a unique identifier that is associated with
each transaction
The problem is to generate all association rules
that have support and confidence greater than
the user-specified minimum support and
minimum confidence
7
Problem Decomposition
The problem can be decomposed into two subproblems:
1. Find all sets of items (itemsets) that have
support (number of transactions) greater than
the minimum support (large itemsets).
2. Use the large itemsets to generate the desired
rules.
For each large itemset l, find all non-empty subsets,
and for each subset a generate a rule a ==> (l-a) if
its confidence is greater than the minimum
confidence.
8
General Algorithm
1. In the first pass, the support of each individual
item is counted, and the large ones are
determined
2. In each subsequent pass, the large itemsets
determined in the previous pass is used to
generate new itemsets called candidate
itemsets.
3. The support of each candidate itemset is
counted, and the large ones are determined.
4. This process continues until no new large
itemsets are found.
9
AIS Algorithm
Candidate itemsets are generated and counted on-the-
fly as the database is scanned.
1. For each transaction, it is determined which of the
large itemsets of the previous pass are contained in
this transaction.
2. New candidate itemsets are generated by extending
these large itemsets with other items in this
transaction.
The disadvantage is that this results in unnecessarily
generating and counting too many candidate itemsets
that turn out to be small.
10
Example
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database
Itemset Support
{1} 2
{2} 3
{3} 3
{5} 3
L1
Itemset Support
{1 3}* 2
{1 4} 1
{3 4} 1
{2 3}* 2
{2 5}* 3
{3 5}* 2
{1 2} 1
{1 5} 1
C2
Itemset Support
{1 3 4} 1
{2 3 5}* 2
{1 3 5} 1
C3
11
SETM Algorithm
Candidate itemsets are generated on-the-fly as the
database is scanned, but counted at the end of the pass.
1. New candidate itemsets are generated the same way as
in AIS algorithm, but the TID of the generating
transaction is saved with the candidate itemset in a
sequential structure.
2. At the end of the pass, the support count of candidate
itemsets is determined by aggregating this sequential
structure
It has the same disadvantage of the AIS algorithm.
Another disadvantage is that for each candidate itemset,
there are as many entries as its support value.
12
Example
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database
Itemset Support
{1} 2
{2} 3
{3} 3
{5} 3
L1
Itemset TID
{1 3} 100
{1 4} 100
{3 4} 100
{2 3} 200
{2 5} 200
{3 5} 200
{1 2} 300
{1 3} 300
{1 5} 300
{2 3} 300
{2 5} 300
{3 5} 300
{2 5} 400
C2
Itemset TID
{1 3 4} 100
{2 3 5} 200
{1 3 5} 300
{2 3 5} 300
C3
13
Apriori Algorithm
Candidate itemsets are generated using only the
large itemsets of the previous pass without
considering the transactions in the database.
1.The large itemset of the previous pass is joined
with itself to generate all itemsets whose size is
higher by 1.
2.Each generated itemset, that has a subset which
is not large, is deleted. The remaining itemsets
are the candidate ones.
14
Example
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database
Itemset Support
{1} 2
{2} 3
{3} 3
{5} 3
L1
Itemset Support
{1 2} 1
{1 3}* 2
{1 5} 1
{2 3}* 2
{2 5}* 3
{3 5}* 2
C2
Itemset Support
{2 3 5}* 2
C3
{1 2 3}
{1 3 5}
{2 3 5}
15
AprioriTid Algorithm
The database is not used at all for counting the support
of candidate itemsets after the first pass.
1. The candidate itemsets are generated the same way as
in Apriori algorithm.
2. Another set C’ is generated of which each member has
the TID of each transaction and the large itemsets
present in this transaction. This set is used to count the
support of each candidate itemset.
The advantage is that the number of entries in C’ may
be smaller than the number of transactions in the
database, especially in the later passes.
16
Example
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database
Itemset Support
{1} 2
{2} 3
{3} 3
{5} 3
L1
Itemset Support
{1 2} 1
{1 3}* 2
{1 5} 1
{2 3}* 2
{2 5}* 3
{3 5}* 2
C2
Itemset Support
{2 3 5}* 2
C3
100 {1 3}
200 {2 3}, {2 5}, {3 5}
300 {1 2}, {1 3}, {1 5},
{2 3}, {2 5}, {3 5}
400 {2 5}
C’2
200 {2 3 5}
300 {2 3 5}
C’3
17
Performance Analysis
18
AprioriHybrid Algorithm
Performance Analysis shows that:
1. Apriori does better than AprioriTid in the
earlier passes.
2. AprioriTid does better than Apriori in the
later passes.
Hence, a hybrid algorithm can be designed
that uses Apriori in the initial passes and
switches to AprioriTid when it expects that
the set C’ will fit in memory.

More Related Content

Similar to MiningAssociationbestRulespresentation.ppt

Discovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining ProcedureDiscovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining Procedure
IOSR Journals
 
Efficient Temporal Association Rule Mining
Efficient Temporal Association Rule MiningEfficient Temporal Association Rule Mining
A04010105
A04010105A04010105
Efficient Temporal Association Rule Mining
Efficient Temporal Association Rule MiningEfficient Temporal Association Rule Mining
Efficient Temporal Association Rule Mining
IJMER
 
Hiding Sensitive Association Rules
Hiding Sensitive Association Rules Hiding Sensitive Association Rules
Hiding Sensitive Association Rules
Vinayreddy Polati
 
Ej36829834
Ej36829834Ej36829834
Ej36829834
IJERA Editor
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.ppt
SowmyaJyothi3
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.ppt
SowmyaJyothi3
 
J0945761
J0945761J0945761
J0945761
IOSR Journals
 
Intelligent Supermarket using Apriori
Intelligent Supermarket using AprioriIntelligent Supermarket using Apriori
Intelligent Supermarket using Apriori
IRJET Journal
 
B0950814
B0950814B0950814
B0950814
IOSR Journals
 
ifip2008albashiri.pdf
ifip2008albashiri.pdfifip2008albashiri.pdf
ifip2008albashiri.pdf
KamalAlbashiri
 
Hiding slides
Hiding slidesHiding slides
Hiding slides
sameeksha15
 
Datamining.pptx
Datamining.pptxDatamining.pptx
Datamining.pptx
MedinaBedru
 
Apriori Algorithm.pptx
Apriori Algorithm.pptxApriori Algorithm.pptx
Apriori Algorithm.pptx
Rashi Agarwal
 
CS583-association-rules.ppt
CS583-association-rules.pptCS583-association-rules.ppt
CS583-association-rules.ppt
YbralemBugusa
 
CS583-association-rules presentation.ppt
CS583-association-rules presentation.pptCS583-association-rules presentation.ppt
CS583-association-rules presentation.ppt
l228296
 
CS583-association-rules.ppt
CS583-association-rules.pptCS583-association-rules.ppt
CS583-association-rules.ppt
ZAFmedia
 
unit II Mining Association Rule.pdf
unit II Mining   Association    Rule.pdfunit II Mining   Association    Rule.pdf
unit II Mining Association Rule.pdf
logeswarisaravanan
 
Association rule mining used in data mining
Association rule mining used in data miningAssociation rule mining used in data mining
Association rule mining used in data mining
vayumani25
 

Similar to MiningAssociationbestRulespresentation.ppt (20)

Discovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining ProcedureDiscovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining Procedure
 
Efficient Temporal Association Rule Mining
Efficient Temporal Association Rule MiningEfficient Temporal Association Rule Mining
Efficient Temporal Association Rule Mining
 
A04010105
A04010105A04010105
A04010105
 
Efficient Temporal Association Rule Mining
Efficient Temporal Association Rule MiningEfficient Temporal Association Rule Mining
Efficient Temporal Association Rule Mining
 
Hiding Sensitive Association Rules
Hiding Sensitive Association Rules Hiding Sensitive Association Rules
Hiding Sensitive Association Rules
 
Ej36829834
Ej36829834Ej36829834
Ej36829834
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.ppt
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.ppt
 
J0945761
J0945761J0945761
J0945761
 
Intelligent Supermarket using Apriori
Intelligent Supermarket using AprioriIntelligent Supermarket using Apriori
Intelligent Supermarket using Apriori
 
B0950814
B0950814B0950814
B0950814
 
ifip2008albashiri.pdf
ifip2008albashiri.pdfifip2008albashiri.pdf
ifip2008albashiri.pdf
 
Hiding slides
Hiding slidesHiding slides
Hiding slides
 
Datamining.pptx
Datamining.pptxDatamining.pptx
Datamining.pptx
 
Apriori Algorithm.pptx
Apriori Algorithm.pptxApriori Algorithm.pptx
Apriori Algorithm.pptx
 
CS583-association-rules.ppt
CS583-association-rules.pptCS583-association-rules.ppt
CS583-association-rules.ppt
 
CS583-association-rules presentation.ppt
CS583-association-rules presentation.pptCS583-association-rules presentation.ppt
CS583-association-rules presentation.ppt
 
CS583-association-rules.ppt
CS583-association-rules.pptCS583-association-rules.ppt
CS583-association-rules.ppt
 
unit II Mining Association Rule.pdf
unit II Mining   Association    Rule.pdfunit II Mining   Association    Rule.pdf
unit II Mining Association Rule.pdf
 
Association rule mining used in data mining
Association rule mining used in data miningAssociation rule mining used in data mining
Association rule mining used in data mining
 

Recently uploaded

一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
uevausa
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
ytypuem
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
Vineet
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
eudsoh
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
Timothy Spann
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
yuvarajkumar334
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
mukulupadhayay1
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
perranet1
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
GeorgiiSteshenko
 

Recently uploaded (20)

一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 

MiningAssociationbestRulespresentation.ppt

  • 2. 2 Introduction Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases. Association Rules: describing association relationships among the attributes in the set of relevant data.
  • 3. 3 Rules Body ==> Consequent [ Support , Confidence ] Body: represents the examined data. Consequent: represents a discovered property for the examined data. Support: represents the percentage of the records satisfying the body or the consequent. Confidence: represents the percentage of the records satisfying both the body and the consequent to those satisfying only the body.
  • 4. 4 Association Rules Examples Basket Data Tea ^ Milk ==> Sugar [0.3 , 0.9] Relational Data x.diagnosis = Heart ^ x.sex = Male ==> x.age > 50 [0.4 , 0.7] Object-Oriented Data s.hobbies = { sport , art } ==> s.age() = Young [0.5 , 0.8]
  • 5. 5 Topics of Discussion Formal Statement of the Problem Different Algorithms  AIS  SETM  Apriori  AprioriTid  AprioriHybrid Performance Analysis
  • 6. 6 Formal Statement of the Problem I = { i1 , i2 , … , im } is a set of items D is a set of transactions T Each transaction T is a set of items (subset of I) TID is a unique identifier that is associated with each transaction The problem is to generate all association rules that have support and confidence greater than the user-specified minimum support and minimum confidence
  • 7. 7 Problem Decomposition The problem can be decomposed into two subproblems: 1. Find all sets of items (itemsets) that have support (number of transactions) greater than the minimum support (large itemsets). 2. Use the large itemsets to generate the desired rules. For each large itemset l, find all non-empty subsets, and for each subset a generate a rule a ==> (l-a) if its confidence is greater than the minimum confidence.
  • 8. 8 General Algorithm 1. In the first pass, the support of each individual item is counted, and the large ones are determined 2. In each subsequent pass, the large itemsets determined in the previous pass is used to generate new itemsets called candidate itemsets. 3. The support of each candidate itemset is counted, and the large ones are determined. 4. This process continues until no new large itemsets are found.
  • 9. 9 AIS Algorithm Candidate itemsets are generated and counted on-the- fly as the database is scanned. 1. For each transaction, it is determined which of the large itemsets of the previous pass are contained in this transaction. 2. New candidate itemsets are generated by extending these large itemsets with other items in this transaction. The disadvantage is that this results in unnecessarily generating and counting too many candidate itemsets that turn out to be small.
  • 10. 10 Example TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database Itemset Support {1} 2 {2} 3 {3} 3 {5} 3 L1 Itemset Support {1 3}* 2 {1 4} 1 {3 4} 1 {2 3}* 2 {2 5}* 3 {3 5}* 2 {1 2} 1 {1 5} 1 C2 Itemset Support {1 3 4} 1 {2 3 5}* 2 {1 3 5} 1 C3
  • 11. 11 SETM Algorithm Candidate itemsets are generated on-the-fly as the database is scanned, but counted at the end of the pass. 1. New candidate itemsets are generated the same way as in AIS algorithm, but the TID of the generating transaction is saved with the candidate itemset in a sequential structure. 2. At the end of the pass, the support count of candidate itemsets is determined by aggregating this sequential structure It has the same disadvantage of the AIS algorithm. Another disadvantage is that for each candidate itemset, there are as many entries as its support value.
  • 12. 12 Example TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database Itemset Support {1} 2 {2} 3 {3} 3 {5} 3 L1 Itemset TID {1 3} 100 {1 4} 100 {3 4} 100 {2 3} 200 {2 5} 200 {3 5} 200 {1 2} 300 {1 3} 300 {1 5} 300 {2 3} 300 {2 5} 300 {3 5} 300 {2 5} 400 C2 Itemset TID {1 3 4} 100 {2 3 5} 200 {1 3 5} 300 {2 3 5} 300 C3
  • 13. 13 Apriori Algorithm Candidate itemsets are generated using only the large itemsets of the previous pass without considering the transactions in the database. 1.The large itemset of the previous pass is joined with itself to generate all itemsets whose size is higher by 1. 2.Each generated itemset, that has a subset which is not large, is deleted. The remaining itemsets are the candidate ones.
  • 14. 14 Example TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database Itemset Support {1} 2 {2} 3 {3} 3 {5} 3 L1 Itemset Support {1 2} 1 {1 3}* 2 {1 5} 1 {2 3}* 2 {2 5}* 3 {3 5}* 2 C2 Itemset Support {2 3 5}* 2 C3 {1 2 3} {1 3 5} {2 3 5}
  • 15. 15 AprioriTid Algorithm The database is not used at all for counting the support of candidate itemsets after the first pass. 1. The candidate itemsets are generated the same way as in Apriori algorithm. 2. Another set C’ is generated of which each member has the TID of each transaction and the large itemsets present in this transaction. This set is used to count the support of each candidate itemset. The advantage is that the number of entries in C’ may be smaller than the number of transactions in the database, especially in the later passes.
  • 16. 16 Example TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database Itemset Support {1} 2 {2} 3 {3} 3 {5} 3 L1 Itemset Support {1 2} 1 {1 3}* 2 {1 5} 1 {2 3}* 2 {2 5}* 3 {3 5}* 2 C2 Itemset Support {2 3 5}* 2 C3 100 {1 3} 200 {2 3}, {2 5}, {3 5} 300 {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5} 400 {2 5} C’2 200 {2 3 5} 300 {2 3 5} C’3
  • 18. 18 AprioriHybrid Algorithm Performance Analysis shows that: 1. Apriori does better than AprioriTid in the earlier passes. 2. AprioriTid does better than Apriori in the later passes. Hence, a hybrid algorithm can be designed that uses Apriori in the initial passes and switches to AprioriTid when it expects that the set C’ will fit in memory.