Apriori Algorithm
Ashis Kumar Chanda
Department of Computer Science and Engineering
University of Dhaka
Key concepts
o Introduction
o Frequent Itemsets
o Apriori Property
o Join operation
o Prune operation
o Drawback
o Improving mechanism
Introduction
• Extracting hidden knowledge or patterns from huge data is known as Data Mining
• It finds frequent itemsets, closed itemsets, periodic patterns, and association rules
• Apriori is the first and most fundamental data-mining algorithm for finding frequent itemsets
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
There are two steps:
1. The join step: To find L_k, a set of candidate k-itemsets, denoted C_k, is generated by joining L_{k-1} with itself.
2. The prune step: C_k is a superset of L_k; that is, its members may or may not be frequent, but all of the frequent k-itemsets are included in C_k. A scan of the database to determine the count of each candidate in C_k then yields L_k.
Algorithm
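The pseudocode on this slide was an image that did not survive extraction. In its place, here is a minimal Python sketch of the level-wise join/prune loop described above; the function name apriori and all variable names are illustrative, not taken from the deck.

    from itertools import combinations

    def apriori(transactions, min_support):
        # First scan: count single items to build L1
        counts = {}
        for t in transactions:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        L = {s for s, c in counts.items() if c >= min_support}
        frequent = set(L)
        k = 2
        while L:
            # Join step: unite (k-1)-itemsets that differ in a single item
            C = {a | b for a in L for b in L if len(a | b) == k}
            # Prune step (Apriori property): drop any candidate that has
            # an infrequent (k-1)-subset
            C = {c for c in C
                 if all(frozenset(s) in L for s in combinations(c, k - 1))}
            # Scan the database to count the surviving candidates
            L = {c for c in C
                 if sum(1 for t in transactions if c <= set(t)) >= min_support}
            frequent |= L
            k += 1
        return frequent

With the assumed data introduced below, apriori(transactions, 3) reproduces the L1, L2, and L3 results of the walkthrough that follows.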
Original dataset
Customized dataset
Assuming:
Mango = M, Onion = O, Nintendo = N, Key-chain = K,
Eggs = E, Yo-yo = Y, Doll = D, Apple = A,
Umbrella = U, Corn = C, Ice-cream = I
Representing each item by a unique character gives the compact view of the database shown below.
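The transaction table itself was an image in the original deck. The list below is an assumed reconstruction: the widely circulated version of this example, whose five transactions spell MONKEY, DONKEY, MAKE, MUCKY, and COOKIE. The deck's actual table and minimum support may differ.

    # Assumed transaction database (the original slide was an image)
    transactions = [
        {'M', 'O', 'N', 'K', 'E', 'Y'},  # Mango, Onion, Nintendo, Key-chain, Eggs, Yo-yo
        {'D', 'O', 'N', 'K', 'E', 'Y'},
        {'M', 'A', 'K', 'E'},
        {'M', 'U', 'C', 'K', 'Y'},
        {'C', 'O', 'O', 'K', 'I', 'E'},  # the duplicate 'O' collapses in a set
    ]
    MIN_SUPPORT = 3  # assumed minimum support count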
Finding support count
Fig: Result after scanning the database for the first time
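Continuing with the assumed data above, the first database scan simply counts every item; this is the C1 table the figure showed.

    from collections import Counter

    # First scan: support count of each single item (C1)
    c1 = Counter(item for t in transactions for item in t)
    # Assumed data: K:5, E:4, M:3, O:3, Y:3, C:2, N:2, D:1, A:1, U:1, I:1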
Finding L1
Fig: Result after applying the minimum support threshold
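L1 keeps only the items whose count reaches the minimum support; with the assumed data and MIN_SUPPORT = 3, that leaves M, O, K, E, Y.

    # L1: frequent 1-itemsets
    l1 = {item for item, count in c1.items() if count >= MIN_SUPPORT}
    # Assumed data: l1 == {'M', 'O', 'K', 'E', 'Y'}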
Finding C2
Fig: Result after L1*L1 join step
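The L1*L1 join simply pairs up the frequent items; five items give C(5, 2) = 10 candidate 2-itemsets.

    from itertools import combinations

    # Join step: every pair of frequent items is a candidate 2-itemset (C2)
    c2 = [frozenset(pair) for pair in combinations(sorted(l1), 2)]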
Finding L2
Fig: Result after the pruning step on the C2 candidates
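A second scan counts each candidate in C2 and discards those below minimum support. With the assumed data, L2 = {M,K}, {O,K}, {O,E}, {K,E}, {K,Y}.

    # Second scan + threshold: L2
    l2 = {c for c in c2
          if sum(1 for t in transactions if c <= t) >= MIN_SUPPORT}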
Finding C3
Fig: Result after L2*L2 join step
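Joining L2 with itself produces 3-itemsets, and the Apriori property prunes any candidate with an infrequent 2-subset before the database is touched again. With the assumed data, only {O, K, E} survives.

    # Join step: unions of L2 pairs that form 3-itemsets (C3)
    c3 = {a | b for a in l2 for b in l2 if len(a | b) == 3}
    # Prune step: every 2-subset of a candidate must itself be in L2
    c3 = {c for c in c3
          if all(frozenset(s) in l2 for s in combinations(c, 2))}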
Finding L3
Fig: Result after the pruning step on the C3 candidates
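A final scan counts the remaining candidate; with the assumed data, {O, K, E} appears in three transactions, so it is frequent and the algorithm stops (C4 would be empty).

    # Third scan + threshold: L3
    l3 = {c for c in c3
          if sum(1 for t in transactions if c <= t) >= MIN_SUPPORT}
    # Assumed data: l3 == {frozenset({'O', 'K', 'E'})}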
Uses
GSP (Generalized Sequential Patterns)
SPADE (Sequential PAttern Discovery using Equivalence classes)
Drawback
 Huge candidate-set generation
Every item joins with every other item: if there are e frequent items at step i, the join generates on the order of e*e candidate sets (e(e-1)/2 pairs; 1,000 frequent items already yield 499,500 candidate 2-itemsets).
 Repeated database scans
At every step, the process must scan the whole database to count the frequency of each candidate.
Improving mechanism
 Hash-based technique
 Transaction reduction
 Partitioning
 Sampling
 Dynamic itemset counting
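As one concrete illustration of the first idea, here is a hedged sketch of a DHP-style hash filter on the running example: during the first scan, every 2-itemset of each transaction is also hashed into a small bucket table, and a candidate pair can be frequent only if its bucket total reaches minimum support. The bucket count and hash choice are illustrative, not from the deck.

    # Hash-based pruning sketch (DHP-style); NUM_BUCKETS is illustrative
    NUM_BUCKETS = 7
    buckets = [0] * NUM_BUCKETS
    for t in transactions:                      # done during the first scan
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % NUM_BUCKETS] += 1
    # A candidate 2-itemset whose bucket count is below MIN_SUPPORT
    # cannot be frequent, so it is dropped before the second scan
    c2_filtered = [c for c in c2
                   if buckets[hash(tuple(sorted(c))) % NUM_BUCKETS] >= MIN_SUPPORT]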
References
- Data Mining: Concepts and Techniques, by J. Han & M. Kamber
- Database System Concepts, by A. Silberschatz, H. F. Korth & S. Sudarshan
- Lecture by Dr. S. Srinath, Indian Institute of Technology Madras, India
