Dynamic itemset counting

Presented by: Atefeh Rahimi, Bahareh Hajihashemi
Adviser: Dr. Vahidipour
December 2017

The Problem
• The “market-basket” problem:
• Given a set of items and a large collection of transactions, which are subsets (baskets) of these items, what are the relationships between the presence of various items within those baskets?
TID Items
1 Milk, Bread
2 Milk, Bread, Eggs
3 Milk, Beer
4 Milk, Eggs, Beer

Mining association rules
• Frequent itemset generation
  • Apriori
  • Dynamic Itemset Counting (DIC)
• Implication rule generation by a “threshold”
  • Confidence
  • Conviction
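The two rule measures named above can be computed directly on the market-basket table from the previous slide. This is a minimal sketch, not the presenters' code. It assumes the usual definitions: confidence(X ⇒ Y) = supp(X ∪ Y) / supp(X), and conviction(X ⇒ Y) = P(X)P(¬Y) / P(X ∧ ¬Y), which equals (1 − supp(Y)) / (1 − confidence). The function names are illustrative.

```python
import math

# Baskets from the "market-basket" table (TIDs 1-4)
BASKETS = [
    {"Milk", "Bread"},
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Beer"},
    {"Milk", "Eggs", "Beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    return sum(1 for b in baskets if set(itemset) <= b) / len(baskets)


def confidence(x, y, baskets):
    """Estimate of P(Y | X): supp(X ∪ Y) / supp(X)."""
    return support(set(x) | set(y), baskets) / support(x, baskets)


def conviction(x, y, baskets):
    """(1 - supp(Y)) / (1 - conf(X => Y)); infinite when the rule never fails."""
    conf = confidence(x, y, baskets)
    if conf == 1.0:
        return math.inf
    return (1 - support(y, baskets)) / (1 - conf)


print(confidence({"Milk"}, {"Bread"}, BASKETS))  # 0.5
print(conviction({"Milk"}, {"Bread"}, BASKETS))  # 1.0
print(conviction({"Bread"}, {"Milk"}, BASKETS))  # inf
```

A conviction of 1.0 means X and Y behave as if independent in this data. Milk is in every basket, so Bread ⇒ Milk never fails and its conviction is infinite.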

DIC Algorithm
• Why do we have to wait until the end of a pass (as Apriori does) before counting new itemsets?
• DIC allows us to start counting an itemset as soon as we suspect it may be necessary to count it.

The Apriori Algorithm: Example (minimum support count = 2)
Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
Scan D, giving C1:
itemset  sup.
{1}      2
{2}      3
{3}      3
{4}      1
{5}      3
L1:
itemset  sup.
{1}      2
{2}      3
{3}      3
{5}      3
C2 (generated from L1):
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}
Scan D, giving C2 with counts:
itemset  sup.
{1 2}    1
{1 3}    2
{1 5}    1
{2 3}    2
{2 5}    3
{3 5}    2
L2:
itemset  sup.
{1 3}    2
{2 3}    2
{2 5}    3
{3 5}    2
C3 (generated from L2):
itemset
{2 3 5}
Scan D, giving L3:
itemset  sup.
{2 3 5}  2
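The passes in this example can be reproduced with a short level-wise Apriori sketch. It assumes a minimum support count of 2, as the example implies. The candidate join is a simple, unoptimized one: take the union of two large k-itemsets whenever that union has k+1 items, then apply the usual pruning of candidates that have a small k-subset. Treat it as an illustration, not a reference implementation.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support count} for every itemset with count >= minsup."""
    txns = [frozenset(t) for t in transactions]
    # C1: every single item is a candidate
    candidates = {frozenset([i]) for t in txns for i in t}
    large = {}
    k = 1
    while candidates:
        # One scan of D counts all candidates of size k
        counts = {c: sum(1 for t in txns if c <= t) for c in candidates}
        lk = {c: n for c, n in counts.items() if n >= minsup}
        large.update(lk)
        # Join L_k with itself, then prune candidates that have a small k-subset
        k += 1
        joined = {a | b for a in lk for b in lk if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in lk for s in combinations(c, k - 1))}
    return large


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, sup in sorted(apriori(D, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

On database D this yields the same C2, C3, L1, L2 and L3 as the tables above. The candidate {1 2 3} is pruned because {1 2} is small, and the same happens to {1 3 5} because {1 5} is small.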

DIC Algorithm
Itemsets are marked in four different ways:
• Solid box (square): confirmed large itemsets
• Solid circle: confirmed small itemsets
• Dashed box (square): suspected large itemsets
• Dashed circle: suspected small itemsets
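One way to encode the four markings, together with the moves between them that the algorithm's steps make, is a small enumeration. The names `Mark` and `TRANSITIONS` are hypothetical, and a box and a square are treated as the same marking.

```python
from enum import Enum


class Mark(Enum):
    SOLID_BOX = "confirmed large"
    SOLID_CIRCLE = "confirmed small"
    DASHED_BOX = "suspected large"
    DASHED_CIRCLE = "suspected small"

    @property
    def is_dashed(self):
        """Dashed itemsets are still being counted."""
        return self in (Mark.DASHED_BOX, Mark.DASHED_CIRCLE)

    @property
    def is_box(self):
        """Boxes (squares) are itemsets whose count has reached minsupp."""
        return self in (Mark.SOLID_BOX, Mark.DASHED_BOX)


# Allowed changes of marking: a dashed circle becomes a dashed box when its
# count reaches minsupp, and any dashed itemset becomes solid once it has
# been counted over all transactions. Solid markings are final.
TRANSITIONS = {
    Mark.DASHED_CIRCLE: {Mark.DASHED_BOX, Mark.SOLID_CIRCLE},
    Mark.DASHED_BOX: {Mark.SOLID_BOX},
    Mark.SOLID_BOX: set(),
    Mark.SOLID_CIRCLE: set(),
}
```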

DIC Algorithm
• Mark the empty itemset with a solid square.
• Mark all the 1-itemsets with dashed circles.
• Leave all other itemsets unmarked.
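As a sketch, the initial state can be held in a dictionary that maps itemsets (as `frozenset`s) to their marking, with counters kept only for dashed itemsets. `init_marks` and the marking strings are hypothetical names. The items a to e mirror the letters used in the example counts that follow.

```python
def init_marks(items):
    """Initial DIC state for the given items.

    Returns (marks, counts): the empty itemset is a solid square, every
    1-itemset is a dashed circle with a fresh counter, and every other
    itemset is left unmarked, i.e. simply absent from `marks`.
    """
    marks = {frozenset(): "solid square"}
    counts = {}
    for item in items:
        single = frozenset([item])
        marks[single] = "dashed circle"
        counts[single] = 0  # only dashed itemsets need counters
    return marks, counts


marks, counts = init_marks("abcde")
print(len(marks), len(counts))  # 6 5
```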

DIC Algorithm
While any dashed itemsets remain:
1. Read M transactions. For each transaction, increment the counters of the itemsets that appear in the transaction and are marked with dashes.
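Step 1 could look like the sketch below. It assumes the transactions are an in-memory list of `frozenset`s. It also adds a hypothetical per-itemset `seen` counter, which records how many transactions each dashed itemset has been counted over so that step 3 knows when to stop. `count_block` is an illustrative name, and the example reuses database D from the Apriori slide with M = 2.

```python
def count_block(transactions, pos, m, marks, counts, seen):
    """Step 1: read M transactions starting at index `pos`.

    Only dashed itemsets contained in a transaction have their counters
    incremented. `seen[s]` stops an itemset after one full cycle over the
    data. Returns the index of the next transaction to read.
    """
    n = len(transactions)
    for _ in range(m):
        t = transactions[pos]
        for s, mark in marks.items():
            if mark.startswith("dashed") and seen[s] < n:
                seen[s] += 1
                if s <= t:
                    counts[s] += 1
        pos = (pos + 1) % n  # rewind at the end of the file (step 4)
    return pos


D = [frozenset(t) for t in ({1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5})]
marks = {frozenset(): "solid square"}
marks.update({frozenset([i]): "dashed circle" for i in range(1, 6)})
counts = {s: 0 for s in marks if s}
seen = dict(counts)
pos = count_block(D, 0, 2, marks, counts, seen)  # reads TIDs 100 and 200
```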

DIC Algorithm
2. If a dashed circle's count exceeds minsupp, turn it into a dashed square. If any immediate superset of it has all of its subsets as solid or dashed squares, add a new counter for it and make it a dashed circle.

a=3+2=5, b=3+3=6, c=3+2=5, d=5+4=9, e=4+2=6, ab=1, ac=1, ad=1, ae=1, bc=1, bd=2, be=1, cd=1, ce=0, de=2
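A minimal sketch of step 2 follows. It is an assumption that "exceeds minsupp" is read here as "reaches" (count >= minsup), which matches the Apriori example, where a count of 2 with minimum support 2 is large. The function name is hypothetical. A full implementation would also start the new itemset's `seen` counter used for step 3.

```python
def promote(marks, counts, items, minsup):
    """Step 2: dashed circles whose count reaches `minsup` become dashed squares.

    Each immediate superset whose immediate subsets are all (solid or
    dashed) squares gets a new counter and is marked as a dashed circle.
    """
    def is_square(s):
        return marks.get(s, "").endswith("square")

    ready = [s for s, mark in marks.items()
             if mark == "dashed circle" and counts[s] >= minsup]
    for s in ready:
        marks[s] = "dashed square"
        for item in items - s:
            sup = s | {item}
            if sup not in marks and all(is_square(sup - {x}) for x in sup):
                marks[sup] = "dashed circle"
                counts[sup] = 0


marks = {frozenset(): "solid square",
         frozenset("a"): "dashed circle",
         frozenset("b"): "dashed circle",
         frozenset("c"): "dashed circle"}
counts = {frozenset("a"): 2, frozenset("b"): 2, frozenset("c"): 0}
promote(marks, counts, frozenset("abc"), minsup=2)
```

Only {a, b} starts being counted here. It is added when the second of its subsets is promoted, while {a, c} and {b, c} wait because {c} is still a circle.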
DIC Algorithm
3. If a dashed itemset has been counted through all the transactions, make it solid and stop counting it.

ab=3, ac=2, ad=4, ae=4, bc=3, bd=5, be=4, cd=4, ce=2, de=6, adc=0, adb=0, abe=0, …, cde=0
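Step 3 only needs to know how many transactions each dashed itemset has been counted over. The sketch below assumes a hypothetical `seen` dictionary that holds that number, with `n` as the total number of transactions. Whether the result is a solid square or a solid circle depends only on whether the itemset was already a dashed square.

```python
def solidify(marks, seen, n):
    """Step 3: dashed itemsets counted over all n transactions become solid.

    A dashed square becomes a solid square (confirmed large) and a dashed
    circle becomes a solid circle (confirmed small). Because only dashed
    itemsets are counted, their counters stop changing from here on.
    """
    for s, mark in list(marks.items()):
        if mark.startswith("dashed") and seen[s] == n:
            marks[s] = mark.replace("dashed", "solid")


marks = {frozenset("a"): "dashed square",
         frozenset("b"): "dashed circle",
         frozenset("ab"): "dashed circle"}
seen = {frozenset("a"): 4, frozenset("b"): 4, frozenset("ab"): 2}
solidify(marks, seen, n=4)
```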
DIC Algorithm
4. If we are at the end of the transaction file, rewind to the beginning.
5. If any dashed itemsets remain, go to step 1.
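Putting steps 1 to 5 together gives the sketch below. It is a minimal in-memory version under several assumptions: transactions fit in a list, "exceeds minsupp" is read as count >= minsup, and markings are plain strings. The function name `dic` and the parameter `m` (the M of step 1) are illustrative. Every itemset is counted over exactly one full cycle of the data, so its final count is exact.

```python
def dic(transactions, minsup, m):
    """Return {itemset: count} for every itemset that ends as a solid square."""
    txns = [frozenset(t) for t in transactions]
    n = len(txns)
    items = frozenset().union(*txns)
    # Initial marking: {} solid square, all 1-itemsets dashed circles
    marks = {frozenset(): "solid square"}
    counts, seen = {}, {}
    for i in items:
        s = frozenset([i])
        marks[s], counts[s], seen[s] = "dashed circle", 0, 0

    def is_square(s):
        return marks.get(s, "").endswith("square")

    pos = 0
    # Step 5: keep going while any dashed itemsets remain
    while any(mk.startswith("dashed") for mk in marks.values()):
        # Step 1: count M transactions for the dashed itemsets
        for _ in range(m):
            t = txns[pos]
            for s, mk in marks.items():
                if mk.startswith("dashed") and seen[s] < n:
                    seen[s] += 1
                    if s <= t:
                        counts[s] += 1
            pos = (pos + 1) % n  # step 4: rewind at the end of the file
        # Step 2: promote circles that reach minsup, start counting new supersets
        ready = [s for s, mk in marks.items()
                 if mk == "dashed circle" and counts[s] >= minsup]
        for s in ready:
            marks[s] = "dashed square"
            for i in items - s:
                sup = s | {i}
                if sup not in marks and all(is_square(sup - {x}) for x in sup):
                    marks[sup], counts[sup], seen[sup] = "dashed circle", 0, 0
        # Step 3: itemsets counted over all n transactions become solid
        for s, mk in list(marks.items()):
            if mk.startswith("dashed") and seen[s] == n:
                marks[s] = mk.replace("dashed", "solid")
    return {s: counts[s] for s, mk in marks.items() if mk == "solid square" and s}


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
large = dic(D, minsup=2, m=2)
```

On database D from the Apriori example, this returns the same large itemsets and counts as Apriori for any M from 1 to 4. M only changes when the new counters start.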

DIC Algorithm
abc=1, abd=0, ade=1, acd=0, ace=0, ade=0, bcd=0, bce=0, bde=1, cde=0

DIC Algorithm
abc=1, abd=0, ade=0, acd=0, ace=0, ade=4, bcd=0, bce=0, bde=3, cde=0, adbe=0

Homogeneous data
• Solution: randomness.
• Randomize the order in which the transactions are read.
• Every pass must read the transactions in the same order.
• This may be expensive to do.
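A minimal sketch of this fix draws one random permutation of the transactions, seeded so that it can be reproduced, and reuses it for every pass. The function names are hypothetical. The sketch assumes the transactions can be indexed in memory. For data on disk, applying a permutation generally means rewriting the file or reading it out of order, which is presumably the expense the slide mentions.

```python
import random


def fixed_random_order(n, seed):
    """Shuffle the transaction indices once, using a fixed seed."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def read_passes(transactions, order, num_passes):
    """Yield the transactions pass after pass, always in the same shuffled order."""
    for _ in range(num_passes):
        for idx in order:
            yield transactions[idx]


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
order = fixed_random_order(len(D), seed=42)
stream = list(read_passes(D, order, num_passes=2))
```

The order must not change between passes because DIC counts an itemset from the point where it started until the same point one pass later. A different order on the second pass could count some transactions twice and miss others.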

Extension to DIC
• Parallelism
• Incremental updates

Parallelism
• Divide the database among the nodes and have each node count all the itemsets for its own data segment.
• Because DIC can dynamically incorporate new itemsets to be counted, a node does not have to wait for the other nodes before it starts counting new itemsets.
• Nodes can proceed to count the itemsets they suspect are candidates and make adjustments as they get more results from other nodes.
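The partitioning in the first bullet can be sketched as follows. Each worker counts the same itemsets over its own segment, and the per-segment counts are then added together. Threads stand in for nodes, and in CPython they illustrate the structure rather than speed up the counting. The function names are hypothetical. The dynamic part, where nodes start counting suspected candidates early and adjust as other nodes report, is not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_segment(segment, itemsets):
    """Work done by one node: count every itemset over its own data segment."""
    local = Counter()
    for t in segment:
        t = frozenset(t)
        for s in itemsets:
            if s <= t:
                local[s] += 1
    return local


def parallel_count(transactions, itemsets, num_nodes):
    """Split the database among `num_nodes` workers and add up their counts."""
    size = -(-len(transactions) // num_nodes)  # ceiling division
    segments = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial = list(pool.map(count_segment, segments, [itemsets] * len(segments)))
    return sum(partial, Counter())


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
itemsets = [frozenset({1, 3}), frozenset({2, 5}), frozenset({4})]
totals = parallel_count(D, itemsets, num_nodes=2)
```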

Incremental update
• Handling incremental updates involves two things: detecting when a large itemset becomes small and detecting when a small itemset becomes large.
• If a small itemset becomes large, we must count it over the entire data, not just the update. Therefore, when we determine that a new itemset must be counted, we must go back and count it over the prefix of the data that we missed.
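The two detections, and the back-counting over the missed prefix, can be sketched as below. The sketch rests on several assumptions. Counts over the old data are kept for every tracked itemset, for example the large itemsets plus the small itemsets whose immediate subsets are all large. All 1-itemsets are tracked. Minimum support is a fraction of the total number of transactions. `incremental_update` and `minsup_frac` are hypothetical names, and the sketch uses plain rescans rather than DIC's interleaved counting.

```python
def incremental_update(old, update, counts, minsup_frac):
    """Update support counts after `update` is appended to `old`.

    `counts` maps every tracked itemset (frozenset) to its count over `old`.
    Returns (new_counts, became_small, became_large).
    """
    old = [frozenset(t) for t in old]
    update = [frozenset(t) for t in update]
    data = old + update
    n_old, n = len(old), len(data)
    was_large = {s for s, c in counts.items() if c >= minsup_frac * n_old}
    # Tracked itemsets already have counts over the old data: scan only the update
    new_counts = {s: c + sum(1 for t in update if s <= t) for s, c in counts.items()}
    now_large = {s for s, c in new_counts.items() if c >= minsup_frac * n}
    # An untracked itemset whose immediate subsets are now all large must be
    # counted over the entire data: the old prefix it missed plus the update
    items = frozenset().union(*data)
    frontier = list(now_large)
    while frontier:
        s = frontier.pop()
        for item in items - s:
            sup = s | {item}
            if sup in new_counts or not all((sup - {x}) in now_large for x in sup):
                continue
            new_counts[sup] = sum(1 for t in data if sup <= t)
            if new_counts[sup] >= minsup_frac * n:
                now_large.add(sup)
                frontier.append(sup)
    return new_counts, was_large - now_large, now_large - was_large


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
tracked = [{1}, {2}, {3}, {4}, {5}, {1, 2}, {1, 3}, {1, 5},
           {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}]
counts = {frozenset(s): sum(1 for t in D if set(s) <= t) for s in tracked}
new_counts, became_small, became_large = incremental_update(D, [{1, 4}, {1, 4}], counts, 0.5)
```

With the minimum support at half the transactions, adding two baskets {1, 4} to database D makes {4} large. {1, 4} is never tracked over D, so it is back-counted over all six transactions (count 3) and also becomes large. Meanwhile {1 3}, {2 3}, {3 5} and {2 3 5} fall below the new threshold.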
