Direct Hashing and Pruning Algorithm in Data Mining
Direct Hashing and Pruning (DHP)
Consider the following database of five transactions with min_sup = 50%. Find the frequent itemsets using Direct Hashing and Pruning.
TID Items
T1 Bread, Cheese, Eggs, Juice
T2 Bread, Cheese, Juice
T3 Bread, Milk, Yogurt
T4 Bread, Juice, Milk
T5 Cheese, Juice, Milk
Solution
Let Bread=B, Cheese=C, Eggs=E, Juice=J, Milk=M, Yogurt=Y. Then the database, with the 2-itemsets of each transaction listed, becomes:
TID Items 2-itemsets
T1 B, C, E, J (B, C), (B, E), (B, J), (C, E), (C, J), (E, J)
T2 B, C, J (B, C), (B, J), (C, J)
T3 B, M, Y (B, M), (B, Y), (M, Y)
T4 B, J, M (B, J), (B, M), (J, M)
T5 C, J, M (C, J), (C, M), (J, M)
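As a rough sketch (Python, assuming the single-letter item codes above), the 2-itemsets of each transaction can be generated with `itertools.combinations`. The names `DB` and `pairs_of` are hypothetical, introduced only for illustration.

```python
from itertools import combinations

# Hypothetical names (DB, pairs_of), for illustration only.
# Example database using B=Bread, C=Cheese, E=Eggs, J=Juice, M=Milk, Y=Yogurt.
DB = {
    "T1": ["B", "C", "E", "J"],
    "T2": ["B", "C", "J"],
    "T3": ["B", "M", "Y"],
    "T4": ["B", "J", "M"],
    "T5": ["C", "J", "M"],
}


def pairs_of(items):
    """Return every 2-itemset of a transaction, with items in sorted order."""
    return list(combinations(sorted(items), 2))


for tid, items in DB.items():
    print(tid, pairs_of(items))
```

Sorting the items first means each pair is produced exactly once, in the same order as the table.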
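Given min_sup = 50%, the minimum support count is
Sup_count = (50 × 5)/100 = 2.5, rounded up to 3
(an itemset must appear in at least 50% of the five transactions, so 2.5 is rounded up).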
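The same threshold as a small hedged sketch; `min_support_count` is a hypothetical helper, and rounding up reflects the assumption that support must be at least min_sup:

```python
import math


def min_support_count(min_sup_percent, n_transactions):
    """Hypothetical helper: smallest whole count that meets min_sup_percent.

    A count must be at least min_sup% of the transactions, so a
    fractional value such as 2.5 is rounded up.
    """
    return math.ceil(min_sup_percent * n_transactions / 100)


print(min_support_count(50, 5))  # 3
```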
C1:
Item Sup_count
B 4
C 3
E 1
J 4
M 3
Y 1
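A minimal sketch of this first counting pass, assuming the same single-letter codes; `DB`, `MIN_COUNT`, `c1` and `l1` are hypothetical names, and the threshold 3 is the value computed above:

```python
from collections import Counter

# Hypothetical names (DB, MIN_COUNT, c1, l1), for illustration only.
DB = {
    "T1": ["B", "C", "E", "J"],
    "T2": ["B", "C", "J"],
    "T3": ["B", "M", "Y"],
    "T4": ["B", "J", "M"],
    "T5": ["C", "J", "M"],
}
MIN_COUNT = 3  # minimum support count for this example

# One scan of the database counts every single item: this is C1.
c1 = Counter(item for items in DB.values() for item in items)

# Keep only the items whose count reaches the minimum support count.
l1 = sorted(item for item, count in c1.items() if count >= MIN_COUNT)

print(dict(sorted(c1.items())))  # {'B': 4, 'C': 3, 'E': 1, 'J': 4, 'M': 3, 'Y': 1}
print(l1)                        # ['B', 'C', 'J', 'M']
```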
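Items with Sup_count ≥ 3 form L1 = {B, C, J, M}; E and Y are infrequent.
During this same scan, DHP also hashes every 2-itemset of each transaction into a hash table. First, assign a serial number to each item:
B=1, C=2, E=3, J=4, M=5, Y=6
For a pair x, the hash function writes the two serial numbers as a two-digit number and takes it modulo 8 (the number of buckets):
H(x) = (10 × serial number of the first item + serial number of the second item) % 8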
For BC: H(x) = 12 % 8 = 4
For BE: H(x) = 13 % 8 = 5
For BJ: H(x) = 14 % 8 = 6
For CE: H(x) = 23 % 8 = 7
For CJ: H(x) = 24 % 8 = 0
For EJ: H(x) = 34 % 8 = 2
For BM: H(x) = 15 % 8 = 7
For BY: H(x) = 16 % 8 = 0
For MY: H(x) = 56 % 8 = 0
For JM: H(x) = 45 % 8 = 5
For CM: H(x) = 25 % 8 = 1
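The hash values above can be checked with a small sketch. `ORDER`, `NUM_BUCKETS` and `bucket_of` are hypothetical names; the function assumes, as in this example, that every serial number is a single digit, so that "10 × first + second" matches writing the two numbers side by side.

```python
# Hypothetical names (ORDER, NUM_BUCKETS, bucket_of), for illustration only.
# Serial numbers assigned in the example.
ORDER = {"B": 1, "C": 2, "E": 3, "J": 4, "M": 5, "Y": 6}
NUM_BUCKETS = 8


def bucket_of(x, y):
    """Write the two serial numbers as a two-digit number, then take it mod 8.

    Assumes every serial number is a single digit (true for this example).
    """
    a, b = sorted((ORDER[x], ORDER[y]))
    return (10 * a + b) % NUM_BUCKETS


for pair in [("B", "C"), ("C", "J"), ("M", "Y"), ("J", "M")]:
    print(pair, bucket_of(*pair))
```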
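Each bucket's count is the number of 2-itemset occurrences hashed to it. A bucket's bit is set to 1 when its count is at least 3. A pair enters C2 only if both of its items are frequent (B, C, J or M) and its bucket bit is 1; for example, (B, C) is dropped because bucket 4 holds only 2, and (B, Y) is dropped because Y is not frequent.
Bit vector  Bucket no.  Count      Pairs in bucket           C2
1           0           3+1+1=5    (C, J), (B, Y), (M, Y)    (C, J)
0           1           1          (C, M)
0           2           1          (E, J)
0           3           0
0           4           2          (B, C)
1           5           1+2=3      (B, E), (J, M)            (J, M)
1           6           3          (B, J)                    (B, J)
1           7           1+2=3      (C, E), (B, M)            (B, M)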
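Putting the first pass together, here is a hedged Python sketch that counts items and bucket hits in one scan, builds the bit vector, and derives C2. It assumes the example's hash function and a minimum support count of 3; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical names throughout, for illustration only.
DB = {
    "T1": ["B", "C", "E", "J"],
    "T2": ["B", "C", "J"],
    "T3": ["B", "M", "Y"],
    "T4": ["B", "J", "M"],
    "T5": ["C", "J", "M"],
}
ORDER = {"B": 1, "C": 2, "E": 3, "J": 4, "M": 5, "Y": 6}
NUM_BUCKETS = 8
MIN_COUNT = 3


def bucket_of(x, y):
    """Example hash: the two serial numbers as a two-digit number, mod 8."""
    a, b = sorted((ORDER[x], ORDER[y]))
    return (10 * a + b) % NUM_BUCKETS


# First pass: count single items and, in the same scan,
# hash every 2-itemset of each transaction into a bucket.
item_count = Counter()
bucket_count = [0] * NUM_BUCKETS
for items in DB.values():
    item_count.update(items)
    for x, y in combinations(sorted(items, key=ORDER.get), 2):
        bucket_count[bucket_of(x, y)] += 1

# A bucket's bit is 1 when its count reaches the minimum support count.
bit_vector = [1 if count >= MIN_COUNT else 0 for count in bucket_count]

# C2: pairs of frequent items whose bucket bit is 1.
l1 = sorted((i for i, c in item_count.items() if c >= MIN_COUNT), key=ORDER.get)
c2 = [(x, y) for x, y in combinations(l1, 2) if bit_vector[bucket_of(x, y)]]

print(bucket_count)  # [5, 1, 1, 0, 2, 3, 3, 3]
print(bit_vector)    # [1, 0, 0, 0, 0, 1, 1, 1]
print(c2)            # [('B', 'J'), ('B', 'M'), ('C', 'J'), ('J', 'M')]
```

The sketch stores only bucket counts, not the pairs themselves, and forms C2 by testing each pair of frequent items against the bit vector; this reproduces the C2 column of the table.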
Next, each transaction keeps only the 2-itemsets that are in C2, and the database becomes:
TID Items 2-itemsets in C2
T1 B, C, E, J (B, J), (C, J)
T2 B, C, J (B, J), (C, J)
T3 B, M, Y (B, M)
T4 B, J, M (B, J), (B, M), (J, M)
T5 C, J, M (C, J), (J, M)
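A short sketch of this trimming step, assuming C2 as read from the bucket table; `DB`, `C2`, `trim` and `trimmed` are hypothetical names:

```python
from itertools import combinations

# Hypothetical names (DB, C2, trim, trimmed), for illustration only.
DB = {
    "T1": ["B", "C", "E", "J"],
    "T2": ["B", "C", "J"],
    "T3": ["B", "M", "Y"],
    "T4": ["B", "J", "M"],
    "T5": ["C", "J", "M"],
}
# C2 as read from the bucket table.
C2 = {("B", "J"), ("B", "M"), ("C", "J"), ("J", "M")}


def trim(items, candidates):
    """Keep only the 2-itemsets of a transaction that are candidates in C2."""
    return [pair for pair in combinations(sorted(items), 2) if pair in candidates]


trimmed = {tid: trim(items, C2) for tid, items in DB.items()}
for tid, pairs in trimmed.items():
    print(tid, pairs)
```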
C2 Sup_count L2
(B, J) 3 (B, J)
(C, J) 3 (C, J)
(B, M) 2
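(J, M) 2
Only (B, J) and (C, J) reach the minimum support count of 3, so L2 = {(B, J), (C, J)}: {Bread, Juice} and {Cheese, Juice} are the frequent 2-itemsets. No 3-itemset can be frequent, because the only 3-itemset that could be built from them, {B, C, J}, contains (B, C), which is not frequent.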
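Finally, the second counting pass over the trimmed transactions, as a minimal sketch with hypothetical names (`TRIMMED`, `MIN_COUNT`, `c2_count`, `l2`); it reproduces the Sup_count column and L2:

```python
from collections import Counter

# Hypothetical names (TRIMMED, MIN_COUNT, c2_count, l2), for illustration only.
# Trimmed database: the 2-itemsets of each transaction that are in C2.
TRIMMED = {
    "T1": [("B", "J"), ("C", "J")],
    "T2": [("B", "J"), ("C", "J")],
    "T3": [("B", "M")],
    "T4": [("B", "J"), ("B", "M"), ("J", "M")],
    "T5": [("C", "J"), ("J", "M")],
}
MIN_COUNT = 3

# Second pass: count each C2 candidate over the trimmed transactions.
c2_count = Counter(pair for pairs in TRIMMED.values() for pair in pairs)

# L2 keeps the candidates that reach the minimum support count.
l2 = sorted(pair for pair, count in c2_count.items() if count >= MIN_COUNT)

print(l2)  # [('B', 'J'), ('C', 'J')]
```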