Direct Hashing and Pruning (DHP)
Consider the following database containing five transactions, with min_sup=50%. Find the frequent
itemsets using Direct Hashing and Pruning.
TID Item
T1 Bread, Cheese, Eggs, Juice
T2 Bread, Cheese, Juice
T3 Bread, Milk, Yogurt
T4 Bread, Juice, Milk
T5 Cheese, Juice, Milk
Solution
Let Bread=B, Cheese=C, Eggs=E, Juice=J, Milk=M, Yogurt=Y. Then the database becomes
TID   Items        2-itemsets
T1    B, C, E, J   (B, C), (B, E), (B, J), (C, E), (C, J), (E, J)
T2    B, C, J      (B, C), (B, J), (C, J)
T3    B, M, Y      (B, M), (B, Y), (M, Y)
T4    B, J, M      (B, J), (B, M), (J, M)
T5    C, J, M      (C, J), (C, M), (J, M)
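The 2-itemset column can be reproduced mechanically. The following is a minimal sketch, assuming the transactions are held in a dictionary; the names `transactions`, `two_itemsets` and `pairs_by_tid` are illustrative and not part of the original solution.

```python
from itertools import combinations

# Transactions from the example, using the one-letter item codes.
transactions = {
    "T1": ["B", "C", "E", "J"],
    "T2": ["B", "C", "J"],
    "T3": ["B", "M", "Y"],
    "T4": ["B", "J", "M"],
    "T5": ["C", "J", "M"],
}

def two_itemsets(items):
    """Return every unordered pair of items, with items in sorted order."""
    return list(combinations(sorted(items), 2))

# Map each transaction ID to its list of 2-itemsets.
pairs_by_tid = {tid: two_itemsets(items) for tid, items in transactions.items()}

for tid, pairs in pairs_by_tid.items():
    print(tid, pairs)
```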
Given that min_sup=50%,
minimum Sup_count = (50 x 5)/100 = 2.5, which is rounded up to 3.
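The rounding step matters: a fractional support count is rounded up, because an itemset must appear in a whole number of transactions. A small sketch of the calculation, where the function name `min_support_count` is an assumption made for illustration:

```python
import math

def min_support_count(min_sup_percent, num_transactions):
    """Smallest whole number of transactions that satisfies min_sup."""
    return math.ceil(min_sup_percent * num_transactions / 100)

print(min_support_count(50, 5))  # 2.5 is rounded up to 3
```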
C1:
Item Sup_count
B 4
C 3
E 1
J 4
M 3
Y 1
Items with Sup_count of at least 3 are frequent, so L1 = {B, C, J, M}.
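The C1 counts can be checked with a single pass over the database. This is a sketch only; `transactions`, `MIN_COUNT`, `c1` and `l1` are illustrative names, and the threshold of 3 is the minimum support count derived for this example.

```python
from collections import Counter

transactions = [
    ["B", "C", "E", "J"],
    ["B", "C", "J"],
    ["B", "M", "Y"],
    ["B", "J", "M"],
    ["C", "J", "M"],
]
MIN_COUNT = 3  # minimum support count for this example

# Count each item once per transaction that contains it.
c1 = Counter(item for t in transactions for item in set(t))

# Frequent 1-itemsets: items that meet the minimum support count.
l1 = sorted(item for item, n in c1.items() if n >= MIN_COUNT)

print(dict(sorted(c1.items())))
print(l1)
```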
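Now, we assign a serial number to each item.
B=1, C=2, E=3, J=4, M=5, Y=6
Each 2-itemset is hashed into one of 8 buckets: write the serial numbers of its two items side by side to form a two-digit number x, then take H(x) = x % 8.
For BC: H(x) = 12 % 8 = 4
For BE: H(x) = 13 % 8 = 5
For BJ: H(x) = 14 % 8 = 6
For CE: H(x) = 23 % 8 = 7
For CJ: H(x) = 24 % 8 = 0
For EJ: H(x) = 34 % 8 = 2
For BM: H(x) = 15 % 8 = 7
For BY: H(x) = 16 % 8 = 0
For MY: H(x) = 56 % 8 = 0
For JM: H(x) = 45 % 8 = 5
For CM: H(x) = 25 % 8 = 1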
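The hash values above can be reproduced as follows. The sketch assumes that the two-digit number is formed as 10 x (serial of the first item) + (serial of the second item), which matches every value listed; the names `SERIAL`, `NUM_BUCKETS` and `bucket` are illustrative.

```python
SERIAL = {"B": 1, "C": 2, "E": 3, "J": 4, "M": 5, "Y": 6}
NUM_BUCKETS = 8

def bucket(pair):
    """Hash a 2-itemset to a bucket number in 0..7."""
    # Order the pair by serial number so (J, M) and (M, J) hash alike.
    a, b = sorted(pair, key=SERIAL.get)
    # Assumption: the serials are concatenated into a two-digit number.
    return (10 * SERIAL[a] + SERIAL[b]) % NUM_BUCKETS

for pair in [("B", "C"), ("C", "J"), ("J", "M"), ("M", "Y")]:
    print(pair, bucket(pair))
```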
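Each bucket count is the sum of the occurrences of the pairs that hash to that bucket. The bit vector entry is 1 when the bucket count is at least 3, otherwise 0. C2 keeps only the pairs that lie in a bucket with bit 1 and whose two items each have Sup_count of at least 3 in C1.
Bit Vector   Bucket No   Count      Pairs                    C2
1            0           3+1+1=5    (C, J), (B, Y), (M, Y)   (C, J)
0            1           1          (C, M)
0            2           1          (E, J)
0            3           0
0            4           2          (B, C)
1            5           1+2=3      (B, E), (J, M)           (J, M)
1            6           3          (B, J)                   (B, J)
1            7           1+2=3      (C, E), (B, M)           (B, M)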
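The bucket counts, the bit vector and C2 can be rebuilt in one scan of the database. This is a minimal sketch under the same hashing assumption (serial numbers written side by side, modulo 8); all variable names are illustrative, and L1 is taken as the items whose C1 count is at least 3.

```python
from itertools import combinations

transactions = [
    ["B", "C", "E", "J"],
    ["B", "C", "J"],
    ["B", "M", "Y"],
    ["B", "J", "M"],
    ["C", "J", "M"],
]
SERIAL = {"B": 1, "C": 2, "E": 3, "J": 4, "M": 5, "Y": 6}
NUM_BUCKETS = 8
MIN_COUNT = 3
L1 = {"B", "C", "J", "M"}  # items with a C1 count of at least 3

def bucket(pair):
    a, b = sorted(pair, key=SERIAL.get)
    return (10 * SERIAL[a] + SERIAL[b]) % NUM_BUCKETS

# Hash every 2-itemset of every transaction into its bucket.
bucket_counts = [0] * NUM_BUCKETS
for t in transactions:
    for pair in combinations(sorted(t, key=SERIAL.get), 2):
        bucket_counts[bucket(pair)] += 1

# A bucket's bit is set when its count reaches the minimum support count.
bit_vector = [1 if n >= MIN_COUNT else 0 for n in bucket_counts]

# Candidate pairs: both items frequent and the pair's bucket bit set.
c2 = [p for p in combinations(sorted(L1, key=SERIAL.get), 2)
      if bit_vector[bucket(p)]]

print(bucket_counts)
print(bit_vector)
print(c2)
```

Note that a set bit does not prove a pair is frequent: (J, M) and (B, M) pass the filter only because they share a bucket with other pairs, so their actual counts still have to be checked.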
Now, keeping only the 2-itemsets that belong to C2, the database becomes
TID   Items        2-itemsets
T1    B, C, E, J   (B, J), (C, J)
T2    B, C, J      (B, J), (C, J)
T3    B, M, Y      (B, M)
T4    B, J, M      (B, J), (B, M), (J, M)
T5    C, J, M      (C, J), (J, M)
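The reduced 2-itemset lists can be obtained by filtering each transaction's pairs against C2. A brief sketch, with `transactions`, `C2` and `pruned` as illustrative names; it only filters the pairs, as the table above does, and leaves the item lists unchanged.

```python
from itertools import combinations

transactions = {
    "T1": ["B", "C", "E", "J"],
    "T2": ["B", "C", "J"],
    "T3": ["B", "M", "Y"],
    "T4": ["B", "J", "M"],
    "T5": ["C", "J", "M"],
}
# Candidate 2-itemsets that survived the hash filter.
C2 = {("B", "J"), ("B", "M"), ("C", "J"), ("J", "M")}

# Keep only the pairs of each transaction that are candidates.
pruned = {
    tid: [p for p in combinations(sorted(items), 2) if p in C2]
    for tid, items in transactions.items()
}

for tid, pairs in pruned.items():
    print(tid, pairs)
```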
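C2       Sup_count   L2
(B, J)   3           (B, J)
(C, J)   3           (C, J)
(B, M)   2
(J, M)   2
Only (B, J) and (C, J) reach the minimum Sup_count of 3, so the frequent 2-itemsets are L2 = {(B, J), (C, J)}.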
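The final counting step can be sketched as below, assuming the reduced database is stored as a dictionary of candidate pairs per transaction; `pruned`, `c2_counts` and `l2` are illustrative names.

```python
from collections import Counter

# Reduced database: candidate 2-itemsets remaining in each transaction.
pruned = {
    "T1": [("B", "J"), ("C", "J")],
    "T2": [("B", "J"), ("C", "J")],
    "T3": [("B", "M")],
    "T4": [("B", "J"), ("B", "M"), ("J", "M")],
    "T5": [("C", "J"), ("J", "M")],
}
MIN_COUNT = 3  # minimum support count for this example

# Actual support count of every candidate pair.
c2_counts = Counter(p for pairs in pruned.values() for p in pairs)

# Frequent 2-itemsets: candidates that meet the minimum support count.
l2 = sorted(p for p, n in c2_counts.items() if n >= MIN_COUNT)

print(dict(c2_counts))
print(l2)
```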
