APRIORI Algorithm
Professor Anita Wasilewska
Lecture Notes
The Apriori Algorithm: Basics
The Apriori Algorithm is an influential algorithm for
mining frequent itemsets for boolean association rules.
Key Concepts:
• Frequent Itemsets: The sets of items that have minimum
support (denoted by Lk for the frequent k-itemsets).
• Apriori Property: Any subset of a frequent itemset must
itself be frequent.
• Join Operation: To find Lk, a set of candidate k-itemsets
is generated by joining Lk-1 with itself.
The Apriori Algorithm in a Nutshell
• Find the frequent itemsets: the sets of items that have
minimum support
– A subset of a frequent itemset must also be a
frequent itemset
• i.e., if {A, B} is a frequent itemset, both {A} and {B}
must also be frequent itemsets
– Iteratively find frequent itemsets with cardinality
from 1 to k (k-itemsets)
• Use the frequent itemsets to generate association rules.
The Apriori Algorithm: Pseudo-code
• Join Step: Ck is generated by joining Lk-1 with itself
• Prune Step: Any (k-1)-itemset that is not frequent cannot be a
subset of a frequent k-itemset
• Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
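
For concreteness, here is a minimal, runnable Python sketch of the same level-wise loop (names such as apriori and min_sup_count are mine, not part of the original notes); itemsets are kept as sorted tuples so the join step can compare their first k-1 items:

from collections import defaultdict
from itertools import combinations

def apriori(transactions, min_sup_count):
    # First scan: count individual items to build L1.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[(item,)] += 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup_count}
    frequent = dict(Lk)
    k = 1
    while Lk:
        # Join step: two k-itemsets sharing their first k-1 items
        # yield one (k+1)-candidate.
        prev = sorted(Lk)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                if prev[i][:k - 1] == prev[j][:k - 1]:
                    cand = tuple(sorted(set(prev[i]) | set(prev[j])))
                    # Prune step: every k-subset must itself be frequent.
                    if all(sub in Lk for sub in combinations(cand, k)):
                        candidates.add(cand)
        # One scan of the database: count candidates contained in each t.
        counts = defaultdict(int)
        for t in transactions:
            items = set(t)
            for cand in candidates:
                if items.issuperset(cand):
                    counts[cand] += 1
        Lk = {s: c for s, c in counts.items() if c >= min_sup_count}
        frequent.update(Lk)
        k += 1
    return frequent

# The 9-transaction database from the example below, min support count 2:
D = [{'I1','I2','I5'}, {'I2','I4'}, {'I2','I3'}, {'I1','I2','I4'},
     {'I1','I3'}, {'I2','I3'}, {'I1','I3'}, {'I1','I2','I3','I5'},
     {'I1','I2','I3'}]
print(apriori(D, 2))   # includes ('I1', 'I2', 'I3'): 2 and ('I1', 'I2', 'I5'): 2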
The Apriori Algorithm: Example
• Consider a database, D,
consisting of 9 transactions.
• Suppose the minimum support count
required is 2 (i.e. min_sup = 2/9 ≈ 22%).
• Let the minimum confidence required
be 70%.
• We first have to find the
frequent itemsets using the Apriori
algorithm.
• Then, association rules will be
generated using min. support &
min. confidence.
TID    List of Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
Step 1: Generating 1-itemset Frequent Pattern
C1 (after scanning D for the count of each candidate):

Itemset    Sup. Count
{I1}       6
{I2}       7
{I3}       6
{I4}       2
{I5}       2

• In the first iteration of the algorithm, each item is a member of the set of
candidate 1-itemsets, C1.
• The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets
satisfying minimum support. Here every candidate meets the minimum support
count of 2, so L1 is identical to C1.
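
This first pass is easy to check in code, reusing the D list from the sketch above:

from collections import Counter

C1 = Counter(item for t in D for item in t)          # one scan of D
L1 = {item: c for item, c in C1.items() if c >= 2}   # keep items with min support
print(L1)   # all five items survive, with counts I1:6, I2:7, I3:6, I4:2, I5:2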
Step 2: Generating 2-itemset Frequent Pattern
C2 (candidates generated from L1):

Itemset
{I1, I2}
{I1, I3}
{I1, I4}
{I1, I5}
{I2, I3}
{I2, I4}
{I2, I5}
{I3, I4}
{I3, I5}
{I4, I5}

C2 (after scanning D for the count of each candidate):

Itemset     Sup. Count
{I1, I2}    4
{I1, I3}    4
{I1, I4}    1
{I1, I5}    2
{I2, I3}    4
{I2, I4}    2
{I2, I5}    2
{I3, I4}    0
{I3, I5}    1
{I4, I5}    0

L2 (after comparing candidate support counts with the minimum support count):

Itemset     Sup. Count
{I1, I2}    4
{I1, I3}    4
{I1, I5}    2
{I2, I3}    4
{I2, I4}    2
{I2, I5}    2
• To discover the set of frequent 2-itemsets, L2, the
algorithm uses L1 Join L1 to generate a candidate set of
2-itemsets, C2.
• Next, the transactions in D are scanned and the support
count for each candidate itemset in C2 is accumulated
(as shown in the middle table).
• The set of frequent 2-itemsets, L2, is then determined,
consisting of those candidate 2-itemsets in C2 having
minimum support.
• Note: We haven’t used the Apriori Property yet.
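
For k = 1 the join condition is vacuous (there is no prefix to match), so C2 is simply every pair of frequent items; a quick check (L1_items is my name):

from itertools import combinations

L1_items = ['I1', 'I2', 'I3', 'I4', 'I5']   # the frequent items from Step 1
C2 = list(combinations(L1_items, 2))        # L1 Join L1: all pairs
print(len(C2))                              # 10 candidates, as in the table above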
Step 3: Generating 3-itemset Frequent Pattern
C3 (candidates, after join and prune):

Itemset
{I1, I2, I3}
{I1, I2, I5}

C3 (after scanning D for the count of each candidate):

Itemset         Sup. Count
{I1, I2, I3}    2
{I1, I2, I5}    2

L3 (after comparing candidate support counts with the minimum support count):

Itemset         Sup. Count
{I1, I2, I3}    2
{I1, I2, I5}    2
• The generation of the set of candidate 3-itemsets, C3 , involves use of
the Apriori Property.
• In order to find C3, we compute L2 Join L2.
• C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3,
I5}, {I2, I4, I5}}.
• Now the Join step is complete, and the Prune step will be used to reduce the
size of C3. The Prune step helps avoid heavy computation due to large Ck.
• Based on the Apriori property that all subsets of a frequent itemset must
also be frequent, we can determine that the latter four candidates cannot
possibly be frequent. How?
• For example, let's take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1,
I3} & {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we
keep {I1, I2, I3} in C3.
• Let's take another example, {I2, I3, I5}, which shows how the pruning is
performed. Its 2-item subsets are {I2, I3}, {I2, I5} & {I3, I5}.
• BUT {I3, I5} is not a member of L2, hence it is not frequent, violating the
Apriori Property. Thus we have to remove {I2, I3, I5} from C3.
• Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all members of
the join result during pruning.
• Now, the transactions in D are scanned in order to determine L3, consisting
of those candidate 3-itemsets in C3 having minimum support.
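
The prune test is the only new ingredient in this step; a small sketch of it over the example's L2 (survives_prune is my name for the standard has-infrequent-subset check):

from itertools import combinations

L2 = {('I1','I2'), ('I1','I3'), ('I1','I5'),
      ('I2','I3'), ('I2','I4'), ('I2','I5')}

def survives_prune(candidate, L2):
    # Keep a 3-itemset only if all of its 2-item subsets are in L2.
    return all(pair in L2 for pair in combinations(candidate, 2))

joined = [('I1','I2','I3'), ('I1','I2','I5'), ('I1','I3','I5'),
          ('I2','I3','I4'), ('I2','I3','I5'), ('I2','I4','I5')]
print([c for c in joined if survives_prune(c, L2)])
# [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]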
Step 4: Generating 4-itemset Frequent Pattern
• The algorithm uses L3 Join L3 to generate a candidate
set of 4-itemsets, C4. Although the join results in {{I1, I2,
I3, I5}}, this itemset is pruned since its subset {I2, I3, I5}
is not frequent.
• Thus C4 = ∅, and the algorithm terminates, having found
all of the frequent itemsets. This completes our Apriori
Algorithm.
• What's Next?
These frequent itemsets will be used to generate strong
association rules (where strong association rules satisfy
both minimum support & minimum confidence).
Step 5: Generating Association Rules from Frequent Itemsets
• Procedure:
• For each frequent itemset l, generate all nonempty subsets
of l.
• For every nonempty subset s of l, output the rule “s ⇒ (l − s)” if
support_count(l) / support_count(s) >= min_conf, where
min_conf is the minimum confidence threshold.
• Back To Example:
We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3},
{I2,I4}, {I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
– Let's take l = {I1,I2,I5}.
– Its nonempty proper subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.
• Let the minimum confidence threshold be, say, 70%.
• The resulting association rules are shown below,
each listed with its confidence.
– R1: I1 ∧ I2 ⇒ I5
• Confidence = sc{I1,I2,I5}/sc{I1,I2} = 2/4 = 50%
• R1 is rejected.
– R2: I1 ∧ I5 ⇒ I2
• Confidence = sc{I1,I2,I5}/sc{I1,I5} = 2/2 = 100%
• R2 is selected.
– R3: I2 ∧ I5 ⇒ I1
• Confidence = sc{I1,I2,I5}/sc{I2,I5} = 2/2 = 100%
• R3 is selected.
– R4: I1 ⇒ I2 ∧ I5
• Confidence = sc{I1,I2,I5}/sc{I1} = 2/6 = 33%
• R4 is rejected.
– R5: I2 ⇒ I1 ∧ I5
• Confidence = sc{I1,I2,I5}/sc{I2} = 2/7 = 29%
• R5 is rejected.
– R6: I5 ⇒ I1 ∧ I2
• Confidence = sc{I1,I2,I5}/sc{I5} = 2/2 = 100%
• R6 is selected.
In this way, we have found three strong
association rules.
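
A sketch that mechanizes this procedure and reproduces exactly the three selected rules (gen_rules and the support-count dict sc are my names; sc holds the counts found earlier):

from itertools import combinations

def gen_rules(itemset, sc, min_conf):
    # Yield (antecedent, consequent, confidence) for one frequent itemset;
    # sc maps sorted item tuples to their support counts.
    items = tuple(sorted(itemset))
    for r in range(1, len(items)):
        for s in combinations(items, r):            # nonempty proper subsets
            conf = sc[items] / sc[s]
            if conf >= min_conf:
                rest = tuple(x for x in items if x not in s)
                yield s, rest, conf

sc = {('I1',): 6, ('I2',): 7, ('I5',): 2,
      ('I1','I2'): 4, ('I1','I5'): 2, ('I2','I5'): 2,
      ('I1','I2','I5'): 2}
for s, rest, conf in gen_rules(('I1','I2','I5'), sc, 0.7):
    print(s, '=>', rest, format(conf, '.0%'))
# ('I5',) => ('I1', 'I2') 100%
# ('I1', 'I5') => ('I2',) 100%
# ('I2', 'I5') => ('I1',) 100%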
Methods to Improve Apriori’s Efficiency
• Hash-based itemset counting: A k-itemset whose corresponding
hashing bucket count is below the threshold cannot be frequent.
• Transaction reduction: A transaction that does not contain any
frequent k-itemset is useless in subsequent scans.
• Partitioning: Any itemset that is potentially frequent in DB must be
frequent in at least one of the partitions of DB.
• Sampling: mine on a subset of the given data with a lower support
threshold, plus a method to determine the completeness.
• Dynamic itemset counting: add new candidate itemsets only when
all of their subsets are estimated to be frequent.
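
As a flavour of how light-weight some of these are, a sketch of the transaction-reduction idea (reduce_transactions and frequent_k are my names; frequent_k is assumed to be a set of sorted k-tuples):

from itertools import combinations

def reduce_transactions(transactions, frequent_k, k):
    # A transaction with no frequent k-itemset cannot contain a frequent
    # (k+1)-itemset, so it can be dropped from all subsequent scans.
    return [t for t in transactions
            if any(s in frequent_k for s in combinations(sorted(t), k))]

On the 9-transaction example every transaction happens to contain a frequent 2-itemset, so nothing is dropped there; the technique pays off on larger, sparser databases.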
Mining Frequent Patterns Without Candidate Generation
• Compress a large database into a compact Frequent-Pattern
tree (FP-tree) structure
– highly condensed, but complete for frequent pattern
mining
– avoid costly database scans
• Develop an efficient, FP-tree-based frequent pattern
mining method
– A divide-and-conquer methodology: decompose
mining tasks into smaller ones
– Avoid candidate generation: sub-database test
only!
FP-Growth Method : An Example
• Consider the same previous
example of a database, D,
consisting of 9 transactions.
• Suppose the minimum support count
required is 2 (i.e. min_sup =
2/9 ≈ 22%).
• The first scan of the database is
the same as in Apriori; it derives
the set of 1-itemsets and their
support counts.
• The set of frequent items is
sorted in the order of
descending support count.
• The resulting set is denoted as
L = {I2:7, I1:6, I3:6, I4:2, I5:2}
TID    List of Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
FP-Growth Method: Construction of FP-Tree
• First, create the root of the tree, labeled with “null”.
• Scan the database D a second time. (The first scan produced the
1-itemsets and then L.)
• The items in each transaction are processed in L order (i.e. sorted
by descending support count).
• A branch is created for each transaction, with each item carrying its
support count separated by a colon.
• Whenever the same node is encountered in another transaction, we
just increment the support count of the common node or prefix.
• To facilitate tree traversal, an item header table is built so that each
item points to its occurrences in the tree via a chain of node-links.
• Now, the problem of mining frequent patterns in the database is
transformed to that of mining the FP-Tree.
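
A minimal construction sketch under the assumptions above (FPNode, build_fp_tree and the header layout are my names; transactions are given already restricted to frequent items and rewritten in L order):

class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}                    # item label -> FPNode

def build_fp_tree(sorted_transactions):
    root = FPNode(None, None)                 # the "null" root
    header = {}                               # item -> chain of node-links
    for t in sorted_transactions:
        node = root
        for item in t:
            if item in node.children:         # common prefix: reuse the node
                node = node.children[item]
            else:                             # new branch for this transaction
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
                node = child
            node.count += 1                   # increment the shared node's count
    return root, header

# The example database in L order (L = {I2, I1, I3, I4, I5}):
D_ordered = [['I2','I1','I5'], ['I2','I4'], ['I2','I3'], ['I2','I1','I4'],
             ['I1','I3'], ['I2','I3'], ['I1','I3'], ['I2','I1','I3','I5'],
             ['I2','I1','I3']]
root, header = build_fp_tree(D_ordered)
print(root.children['I2'].count)              # 7, matching the I2:7 node in the figure below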
FP-Growth Method: Construction of FP-Tree
An FP-Tree that registers compressed, frequent pattern information.

Item header table (each node-link chains to that item's occurrences in the tree):

Item Id    Sup. Count
I2         7
I1         6
I3         6
I4         2
I5         2

FP-Tree (children indented under their parents):

null {}
├── I2:7
│   ├── I1:4
│   │   ├── I5:1
│   │   ├── I4:1
│   │   └── I3:2
│   │       └── I5:1
│   ├── I3:2
│   └── I4:1
└── I1:2
    └── I3:2
Mining the FP-Tree by Creating Conditional (sub) Pattern Bases
Steps:
1. Start from each frequent length-1 pattern (as an initial suffix
pattern).
2. Construct its conditional pattern base, which consists of the
set of prefix paths in the FP-Tree co-occurring with the suffix
pattern.
3. Then construct its conditional FP-Tree and perform mining
on that tree.
4. The pattern growth is achieved by concatenating the
suffix pattern with the frequent patterns generated from a
conditional FP-Tree.
5. The union of all frequent patterns (generated by step 4)
gives the required frequent itemsets.
FP-Tree Example Continued
Now, following the above-mentioned steps:
• Let's start from I5, which is involved in 2 branches, namely {I2 I1 I5: 1}
and {I2 I1 I3 I5: 1}.
• Therefore, considering I5 as the suffix, its 2 corresponding prefix paths are
{I2 I1: 1} and {I2 I1 I3: 1}, which form its conditional pattern base.
Item   Conditional pattern base          Conditional FP-Tree       Frequent patterns generated
I5     {(I2 I1: 1), (I2 I1 I3: 1)}       <I2: 2, I1: 2>            I2 I5: 2, I1 I5: 2, I2 I1 I5: 2
I4     {(I2 I1: 1), (I2: 1)}             <I2: 2>                   I2 I4: 2
I3     {(I2 I1: 2), (I2: 2), (I1: 2)}    <I2: 4, I1: 2>, <I1: 2>   I2 I3: 4, I1 I3: 4, I2 I1 I3: 2
I1     {(I2: 4)}                         <I2: 4>                   I2 I1: 4

Mining the FP-Tree by creating conditional (sub) pattern bases.
• Out of these, only I1 and I2 are selected for the conditional FP-Tree,
because I3 does not satisfy the minimum support count:
For I1, support count in the conditional pattern base = 1 + 1 = 2
For I2, support count in the conditional pattern base = 1 + 1 = 2
For I3, support count in the conditional pattern base = 1
Thus the support count for I3 is less than the required min_sup, which is 2
here.
• Now we have the conditional FP-Tree, <I2: 2, I1: 2>.
• All frequent patterns corresponding to suffix I5 are generated by
considering all possible combinations of I5 with the conditional FP-Tree.
• The same procedure is applied to suffixes I4, I3 and I1.
• Note: I2 is not taken into consideration as a suffix because it doesn't
have any prefix at all.
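
Step 2 of the mining procedure (collecting prefix paths) can be read straight off the header table's node-links; a sketch reusing the FPNode/header structures from the construction sketch above (conditional_pattern_base is my name):

def conditional_pattern_base(item, header):
    # For each node carrying `item`, walk the parent pointers up to the
    # root and record the prefix path together with that node's count.
    base = []
    for node in header[item]:
        path, cur = [], node.parent
        while cur is not None and cur.item is not None:
            path.append(cur.item)
            cur = cur.parent
        if path:
            base.append((path[::-1], node.count))
    return base

print(conditional_pattern_base('I5', header))
# [(['I2', 'I1'], 1), (['I2', 'I1', 'I3'], 1)]  -- exactly I5's base above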
Why Is Frequent Pattern Growth Fast?
• Performance studies show
– FP-growth is an order of magnitude faster than Apriori,
and is also faster than tree-projection
• Reasoning
– No candidate generation, no candidate test
– Uses a compact data structure
– Eliminates repeated database scans
– The basic operations are counting and FP-tree building