Apriori Algorithm
Ashis Kumar Chanda
Department of Computer Science and Engineering
University of Dhaka
Key concepts
o Introduction
o Frequent Itemsets
o Apriori Property
o Join operation
o Prune operation
o Drawback
o Improving mechanism
Introduction
• Extracting hidden knowledge or patterns from huge data is known as Data Mining
• It finds frequent itemsets, closed itemsets, periodic patterns, and association rules
• Apriori is the first and most fundamental data-mining algorithm for finding frequent itemsets
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
There are two steps:
1. The join step: To find L_k, a set of candidate k-itemsets, denoted C_k, is generated by joining L_{k-1} with itself.
2. The prune step: C_k is a superset of L_k; that is, its members may or may not be frequent, but all of the frequent k-itemsets are included in C_k. A scan of the database to determine the count of each candidate in C_k then yields L_k.
Algorithm
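The pseudocode on this slide was an image that did not survive extraction. In its place, here is a minimal Python sketch of the level-wise join/prune loop described above; the function name apriori and all variable names are illustrative, not taken from the deck.

    from itertools import combinations

    def apriori(transactions, min_support):
        # First scan: count single items to build L1
        counts = {}
        for t in transactions:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        L = {s for s, c in counts.items() if c >= min_support}
        frequent = set(L)
        k = 2
        while L:
            # Join step: unite (k-1)-itemsets that differ in a single item
            C = {a | b for a in L for b in L if len(a | b) == k}
            # Prune step (Apriori property): drop any candidate that has
            # an infrequent (k-1)-subset
            C = {c for c in C
                 if all(frozenset(s) in L for s in combinations(c, k - 1))}
            # Scan the database to count the surviving candidates
            L = {c for c in C
                 if sum(1 for t in transactions if c <= set(t)) >= min_support}
            frequent |= L
            k += 1
        return frequent

With the assumed data introduced below, apriori(transactions, 3) reproduces the L1, L2, and L3 results of the walkthrough that follows.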
Original dataset
Customized dataset
Assuming:
Mango = M, Onion = O, Nintendo = N, Key-chain = K,
Eggs = E, Yo-yo = Y, Doll = D, Apple = A,
Umbrella = U, Corn = C, Ice-cream = I
Representing each item by a unique character gives the compact view of the database shown below.
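The transaction table itself was an image in the original deck. The list below is an assumed reconstruction: the widely circulated version of this example, whose five transactions spell MONKEY, DONKEY, MAKE, MUCKY, and COOKIE. The deck's actual table and minimum support may differ.

    # Assumed transaction database (the original slide was an image)
    transactions = [
        {'M', 'O', 'N', 'K', 'E', 'Y'},  # Mango, Onion, Nintendo, Key-chain, Eggs, Yo-yo
        {'D', 'O', 'N', 'K', 'E', 'Y'},
        {'M', 'A', 'K', 'E'},
        {'M', 'U', 'C', 'K', 'Y'},
        {'C', 'O', 'O', 'K', 'I', 'E'},  # the duplicate 'O' collapses in a set
    ]
    MIN_SUPPORT = 3  # assumed minimum support count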
Finding support count
Fig: Result after scanning the database for the first time
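Continuing with the assumed data above, the first database scan simply counts every item; this is the C1 table the figure showed.

    from collections import Counter

    # First scan: support count of each single item (C1)
    c1 = Counter(item for t in transactions for item in t)
    # Assumed data: K:5, E:4, M:3, O:3, Y:3, C:2, N:2, D:1, A:1, U:1, I:1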
Finding L1
Fig: Result after applying the minimum support threshold
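L1 keeps only the items whose count reaches the minimum support; with the assumed data and MIN_SUPPORT = 3, that leaves M, O, K, E, Y.

    # L1: frequent 1-itemsets
    l1 = {item for item, count in c1.items() if count >= MIN_SUPPORT}
    # Assumed data: l1 == {'M', 'O', 'K', 'E', 'Y'}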
Finding C2
Fig: Result after L1*L1 join step
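The L1*L1 join simply pairs up the frequent items; five items give C(5, 2) = 10 candidate 2-itemsets.

    from itertools import combinations

    # Join step: every pair of frequent items is a candidate 2-itemset (C2)
    c2 = [frozenset(pair) for pair in combinations(sorted(l1), 2)]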
Finding L2
Fig: Result after the pruning step on the C2 candidates
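A second scan counts each candidate in C2 and discards those below minimum support. With the assumed data, L2 = {M,K}, {O,K}, {O,E}, {K,E}, {K,Y}.

    # Second scan + threshold: L2
    l2 = {c for c in c2
          if sum(1 for t in transactions if c <= t) >= MIN_SUPPORT}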
Finding C3
Fig: Result after L2*L2 join step
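Joining L2 with itself produces 3-itemsets, and the Apriori property prunes any candidate with an infrequent 2-subset before the database is touched again. With the assumed data, only {O, K, E} survives.

    # Join step: unions of L2 pairs that form 3-itemsets (C3)
    c3 = {a | b for a in l2 for b in l2 if len(a | b) == 3}
    # Prune step: every 2-subset of a candidate must itself be in L2
    c3 = {c for c in c3
          if all(frozenset(s) in l2 for s in combinations(c, 2))}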
Finding L3
Fig: Result after the pruning step on the C3 candidates
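A final scan counts the remaining candidate; with the assumed data, {O, K, E} appears in three transactions, so it is frequent and the algorithm stops (C4 would be empty).

    # Third scan + threshold: L3
    l3 = {c for c in c3
          if sum(1 for t in transactions if c <= t) >= MIN_SUPPORT}
    # Assumed data: l3 == {frozenset({'O', 'K', 'E'})}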
Uses
GSP (Generalized Sequential Patterns)
SPADE (Sequential PAttern Discovery using Equivalence classes)
Drawback
 Huge candidate-set generation
Every item joins with every other item: if there are e frequent items at step i, the join generates on the order of e*e candidate sets (e(e-1)/2 pairs; 1,000 frequent items already yield 499,500 candidate 2-itemsets).
 Repeated database scans
At every step, the process must scan the whole database to count the frequency of each candidate.
Improving mechanism
 Hash-based technique
 Transaction reduction
 Partitioning
 Sampling
 Dynamic itemset counting
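As one concrete illustration of the first idea, here is a hedged sketch of a DHP-style hash filter on the running example: during the first scan, every 2-itemset of each transaction is also hashed into a small bucket table, and a candidate pair can be frequent only if its bucket total reaches minimum support. The bucket count and hash choice are illustrative, not from the deck.

    # Hash-based pruning sketch (DHP-style); NUM_BUCKETS is illustrative
    NUM_BUCKETS = 7
    buckets = [0] * NUM_BUCKETS
    for t in transactions:                      # done during the first scan
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % NUM_BUCKETS] += 1
    # A candidate 2-itemset whose bucket count is below MIN_SUPPORT
    # cannot be frequent, so it is dropped before the second scan
    c2_filtered = [c for c in c2
                   if buckets[hash(tuple(sorted(c))) % NUM_BUCKETS] >= MIN_SUPPORT]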
References
- Data Mining: Concepts and Techniques, by J. Han & M. Kamber
- Database System Concepts, by A. Silberschatz, H. F. Korth & S. Sudarshan
- Lecture by Dr. S. Srinath, Indian Institute of Technology Madras, India
