Direct Hashing and Pruning Algorithm in Data MIning.pdf

Direct Hashing and Pruning (DHP)
Consider the following database containing five transactions with min_sup=50%. Find the frequent
itemsets using Direct Hashing and Pruning.
TID Item
T1 Bread, Cheese, Eggs, Juice
T2 Bread, Cheese, Juice
T3 Bread, Milk, Yogurt
T4 Bread, Juice, Milk
T5 Cheese, Juice, Milk
Solution
Let Bread=B, Cheese=C, Eggs=E, Juice=J, Milk=M, Yogurt=Y. Listing every 2-item subset of each
transaction, the database becomes
TID   Item          Itemset
T1    B, C, E, J    (B, C), (B, E), (B, J), (C, E), (C, J), (E, J)
T2    B, C, J       (B, C), (B, J), (C, J)
T3    B, M, Y       (B, M), (B, Y), (M, Y)
T4    B, J, M       (B, J), (B, M), (J, M)
T5    C, J, M       (C, J), (C, M), (J, M)
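The Itemset column above can be reproduced mechanically. The following is a minimal sketch, assuming the single-letter item codes from this example; the names `transactions`, `pairs_of`, and `pair_table` are illustrative and not part of the original solution.

```python
from itertools import combinations

# Transactions from the example, using the single-letter item codes.
transactions = {
    "T1": ["B", "C", "E", "J"],
    "T2": ["B", "C", "J"],
    "T3": ["B", "M", "Y"],
    "T4": ["B", "J", "M"],
    "T5": ["C", "J", "M"],
}

def pairs_of(items):
    """Return all 2-item subsets of one transaction.

    Items are sorted alphabetically first, which for these codes
    matches the order used in the table (B, C, E, J, M, Y).
    """
    return list(combinations(sorted(items), 2))

pair_table = {tid: pairs_of(items) for tid, items in transactions.items()}

for tid, pairs in pair_table.items():
    print(tid, pairs)
```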
Given that min_sup=50% and the database contains 5 transactions,
Sup_count = (50 × 5)/100 = 2.5, which is rounded up to 3.
C1:
Item Sup_count
B 4
C 3
E 1
J 4
M 3
Y 1
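The support count and the C1 table can be checked with a few lines of code. This is a sketch under the assumptions of this example (five transactions, min_sup of 50%, rounding up); the variable names are illustrative.

```python
import math
from collections import Counter

transactions = [
    ["B", "C", "E", "J"],  # T1
    ["B", "C", "J"],       # T2
    ["B", "M", "Y"],       # T3
    ["B", "J", "M"],       # T4
    ["C", "J", "M"],       # T5
]

min_sup = 0.5
# 0.5 * 5 = 2.5, rounded up to the next whole transaction.
min_count = math.ceil(min_sup * len(transactions))

# C1: how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in t)

# Items whose count reaches the support count.
frequent_items = sorted(i for i, c in item_counts.items() if c >= min_count)

print(min_count)
print(dict(item_counts))
print(frequent_items)
```

With these counts, E and Y fall below the support count, so only B, C, J, and M remain as frequent items.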
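Now we assign a serial number to each item:
B=1, C=2, E=3, J=4, M=5, Y=6
For each pair, the two serial numbers are written side by side as a two-digit number (for example, BC becomes 12), and the hash value H(x) is that number modulo 8, the number of buckets.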
For BC: H(x)=12%8=4
For BE: H(x)=13%8=5
For BJ: H(x)=14%8=6
For CE: H(x)=23%8=7
For CJ: H(x)=24%8=0
For EJ: H(x)=34%8=2
For BM: H(x)=15%8=7
For BY: H(x)=16%8=0
For MY: H(x)=56%8=0
For JM: H(x)=45%8=5
For CM: H(x)=25%8=1
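The hash values above follow a simple pattern that can be written as a function. This sketch assumes the hash is (10 × serial number of the first item + serial number of the second item) mod 8, which is what the worked values imply; `ORDER`, `NUM_BUCKETS`, and `bucket_of` are illustrative names.

```python
# Serial numbers assigned to the items in this example.
ORDER = {"B": 1, "C": 2, "E": 3, "J": 4, "M": 5, "Y": 6}
NUM_BUCKETS = 8

def bucket_of(pair):
    """Hash a 2-itemset to a bucket number in the range 0..7.

    The two serial numbers are combined into a two-digit number
    (e.g. B, C -> 12) and reduced modulo the number of buckets.
    """
    first, second = pair
    return (ORDER[first] * 10 + ORDER[second]) % NUM_BUCKETS

for pair in [("B", "C"), ("C", "J"), ("M", "Y"), ("J", "M")]:
    print(pair, bucket_of(pair))
```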
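Bit Vector   Bucket No   Count      Pair                      C2
1            0           3+1+1=5    (C, J), (B, Y), (M, Y)    (C, J)
0            1           1          (C, M)                    -
0            2           1          (E, J)                    -
0            3           0          -                         -
0            4           2          (B, C)                    -
1            5           1+2=3      (B, E), (J, M)            (J, M)
1            6           3          (B, J)                    (B, J)
1            7           1+2=3      (C, E), (B, M)            (B, M)
The bit of a bucket is 1 when its count reaches the Sup_count of 3. A pair enters C2 only if it hashes to a bucket with bit 1 and both of its items are frequent in C1 (B, C, J, M). This excludes (B, Y), (M, Y), (B, E), and (C, E), because E and Y each have a Sup_count of 1.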
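The bucket counts, the bit vector, and C2 can be derived together in one pass. The following is a hedged sketch of that step for this example only; the hash function is the one implied by the worked values, the frequent items are taken from C1, and all names are illustrative.

```python
from itertools import combinations

transactions = [
    ["B", "C", "E", "J"],  # T1
    ["B", "C", "J"],       # T2
    ["B", "M", "Y"],       # T3
    ["B", "J", "M"],       # T4
    ["C", "J", "M"],       # T5
]
ORDER = {"B": 1, "C": 2, "E": 3, "J": 4, "M": 5, "Y": 6}
NUM_BUCKETS = 8
MIN_COUNT = 3
FREQUENT_ITEMS = ["B", "C", "J", "M"]  # items with Sup_count >= 3 in C1

def bucket_of(pair):
    """Two-digit serial number of the pair, modulo the bucket count."""
    first, second = pair
    return (ORDER[first] * 10 + ORDER[second]) % NUM_BUCKETS

# Hash every 2-item subset of every transaction into its bucket.
bucket_counts = [0] * NUM_BUCKETS
for t in transactions:
    for pair in combinations(sorted(t, key=ORDER.get), 2):
        bucket_counts[bucket_of(pair)] += 1

# A bucket's bit is set when its count reaches the support count.
bit_vector = [1 if c >= MIN_COUNT else 0 for c in bucket_counts]

# C2: pairs of frequent items that fall into a bucket whose bit is set.
candidates = [
    pair
    for pair in combinations(FREQUENT_ITEMS, 2)
    if bit_vector[bucket_of(pair)]
]

print(bucket_counts)
print(bit_vector)
print(candidates)
```

Note that (B, C) is dropped here even though both items are frequent, because bucket 4 holds a count of only 2. This is the pruning that the hash table adds on top of the usual pairing of frequent items.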
Keeping only the pairs that belong to C2, the database becomes
TID   Item          Itemset
T1    B, C, E, J    (B, J), (C, J)
T2    B, C, J       (B, J), (C, J)
T3    B, M, Y       (B, M)
T4    B, J, M       (B, J), (B, M), (J, M)
T5    C, J, M       (C, J), (J, M)
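C2        Sup_count   L2
(B, J)    3           (B, J)
(C, J)    3           (C, J)
(B, M)    2           -
(J, M)    2           -
Only (B, J) and (C, J) reach the Sup_count of 3, so the frequent 2-itemsets are L2 = {(B, J), (C, J)}.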
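The final counting step can be verified directly against the transactions. This is a minimal sketch, assuming the four candidate pairs of C2 and the support count of 3 from this example; `support` and `frequent_pairs` are illustrative names.

```python
transactions = [
    {"B", "C", "E", "J"},  # T1
    {"B", "C", "J"},       # T2
    {"B", "M", "Y"},       # T3
    {"B", "J", "M"},       # T4
    {"C", "J", "M"},       # T5
]
candidates = [("B", "J"), ("C", "J"), ("B", "M"), ("J", "M")]  # C2
MIN_COUNT = 3

# Count the transactions that contain both items of each candidate pair.
support = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in candidates
}

# L2: candidates whose count reaches the support count.
frequent_pairs = [pair for pair, count in support.items() if count >= MIN_COUNT]

print(support)
print(frequent_pairs)
```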
