Mining Non-Redundant Recurrent Rules from a Sequence Database
Yoon SeungYong
Ministry of Science and ICT, Republic of Korea
forcom@forcom.kr
- Efficient Mining of Recurrent Rules from a Sequence Database(Lo et al., DASFAA 2008)
- Parallel Mining of Non-Redundant Recurrent Rules from a Sequence Database(Yoon and Seki, ISIS 2017)
· A Parallel Algorithm for Mining Non-Redundant Recurrent Rules from a Sequence Database(Yoon and Seki, JACIII 2019)
- Towards Efficient Mining of Non-Redundant Recurrent Rules from a Sequence Database(Yoon and Seki, IWCIA 2017)
· Mining Non-Redundant Recurrent Rules from a Sequence Database(Yoon and Seki, IJCISTUDIES 2018)
- Efficient Mining of Recurrent Rules from a Sequence Database Using Multi-Core Processors(Yoon and Seki, SCIS&ISIS 2018)
- Bidirectional Mining of Non-Redundant Recurrent Rules from a Sequence Database(Lo et al., IEEE ICDE 2011)
- A New Algorithm for Mining Recurrent Rules from a Sequence Database(Seki and Yoon, IEEE SMC 2019)
Table of Contents
1. Motivation
2. Mining Non-Redundant Recurrent Rules (NR3) – Lo et al.
3. Parallel Mining of Non-Redundant Recurrent Rules (pNR3)
4. Loop-Fused Mining of NR3 (LF-NR3)
5. Parallel Loop-Fused Mining of NR3 (pLF-NR3)
6. Bidirectional Mining of NR3 (BOB) – Lo et al.
7. Interleaved Bidirectional Mining of NR3 (iBiRM)
8. Conclusion
2019.11.18. 2
Motivation
Sequence Database & Sequential Rule
• Transaction histories
Customer | Movie Rental History
Alice | Star Wars 4, Star Wars 5, Star Wars 6, Star Wars 1
Bob | Shrek, Spirited Away, Your Name
Clara | Spirited Away, Howl’s Moving Castle, Princess Mononoke
David | Star Wars 1, Star Wars 2, Star Wars 3, Star Wars 4, Star Wars 5
Eve | Your Name
Example sequential rule: ⟨Star Wars 4⟩ → ⟨Star Wars 5⟩
• Program traces
Trace ID | Command
1 | check, lock, use, use, unlock, exit
2 | check, lock, use, check, lock, use, unlock, exit
3 | check, use, unlock, exit
4 | check, lock, use
5 | check, lock, use, unlock, check, lock, use, unlock, exit
Example sequential rule: ⟨lock⟩ → ⟨unlock⟩
What is a recurrent rule?
• Recurrent rule $R = R_{pre} \rightarrow R_{post}$
  • “Whenever a series of precedent events occurs, eventually another series of consequent events occurs”
  • e.g., $R$ = ⟨check, lock⟩ → ⟨use, unlock⟩: “Whenever ⟨check, lock⟩ occurs, eventually ⟨use, unlock⟩ occurs”
  • Captures temporal constraints that repeat a meaningful number of times, both within a sequence and across multiple sequences
  • cf. a sequential rule $R = R_{pre} \rightarrow R_{post}$ means “whenever a sequence is a super-sequence of $R_{pre}$, it will be a super-sequence of $R_{pre} +\!\!+ R_{post}$”
• Linear Temporal Logic (LTL)
  • One of the most widely used formalisms for program verification
  • Clarke, Edmund M., Orna Grumberg, and Doron Peled. Model Checking. MIT Press, 1999.
  • A recurrent rule can be expressed as an LTL formula
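As an illustration of the LTL reading, the example rule ⟨check, lock⟩ → ⟨use, unlock⟩ can be written roughly as follows. The operators are G (globally), F (eventually), and X (next); the nesting shown is the translation scheme recalled from Lo et al. and should be checked against the paper before being relied on.

$$
\langle \text{check}, \text{lock} \rangle \rightarrow \langle \text{use}, \text{unlock} \rangle
\;\;\equiv\;\;
G\bigl(\text{check} \rightarrow XG\bigl(\text{lock} \rightarrow XF(\text{use} \wedge XF\,\text{unlock})\bigr)\bigr)
$$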
- proposed by David LO
Mining Non-Redundant Recurrent Rules (NR3)
based on David LO, Siau-Cheng KHOO, NUS and Chao LIU, DASFAA, 2008
Preliminaries & Examples (1)
• a sequence database $SeqDB$, a set of sequences: $S_1, S_2, S_3, S_4, S_5$
• the set of events $I$ in $SeqDB$: {check, exit, lock, unlock, use}
• the size of $SeqDB$ is $|SeqDB|$: $|SeqDB| = 5$
• a sequence $S = \langle e_1, e_2, \ldots, e_n \rangle$: $S_1$ = ⟨check, lock, use, use, unlock, exit⟩
• the temporal point $j$ of $e_j$ in $S$: the event at temporal point 5 in $S_1$ is unlock
• the length of $S$ is $|S| = n$: $|S_1| = 6$
• the last event of $S$ is $last(S) = S[n]$: $last(S_1)$ = exit
• the $j$-prefix of $S$ is $S^j = \langle e_1, e_2, \ldots, e_j \rangle$: $S_1^2$ = ⟨check, lock⟩
An example sequence database $SeqDB$:
SID | Sequence
$S_1$ | ⟨check, lock, use, use, unlock, exit⟩
$S_2$ | ⟨check, lock, use, check, lock, use, unlock, exit⟩
$S_3$ | ⟨check, use, unlock, exit⟩
$S_4$ | ⟨check, lock, use⟩
$S_5$ | ⟨check, lock, use, unlock, check, lock, use, unlock, exit⟩
Preliminaries & Examples (2)
• Given sequences $S = \langle e_1, \ldots, e_n \rangle$ and $S' = \langle e'_1, \ldots, e'_m \rangle$
• the concatenation of $S$ and $S'$: $S +\!\!+ S' = \langle e_1, \ldots, e_n, e'_1, \ldots, e'_m \rangle$
• $S$ is a super-sequence of $S'$, written $S \sqsupseteq S'$, if $e_{i_1} = e'_1, \ldots, e_{i_m} = e'_m$ for some $1 \le i_1 < \cdots < i_m \le n$
  • e.g., $S_1 \sqsupseteq$ ⟨check, lock, unlock⟩, where $S_1$ = ⟨check, lock, use, use, unlock, exit⟩
• $S^j$ is an instance of $S'$ in $S$ if $S^j \sqsupseteq S'$ and $last(S') = last(S^j)$
• $S^j$ is the minimum instance of $S'$ in $S$ if $S^j$ is an instance of $S'$ and $\nexists k < j$ such that $S^k$ is an instance of $S'$
  • e.g., $S_1^3$ and $S_1^4$ are instances of ⟨check, lock, use⟩ in $S_1$, and $S_1^3$ is the minimum
  • $S_5^9$ is an instance of $S_1$ in $S_5$, and it is the minimum
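The super-sequence and instance definitions can be sketched in a few lines of Python. This is only an illustrative reading of the definitions, not the authors' implementation; the function names are hypothetical and patterns are assumed to be non-empty tuples of event names.

```python
def is_supersequence(s, p):
    """Return True if s is a super-sequence of p (events of p occur in s in order, gaps allowed)."""
    it = iter(s)
    return all(e in it for e in p)  # `e in it` advances the iterator past the match


def instance_points(s, p):
    """Temporal points j (1-based) such that the j-prefix of s is an instance of p."""
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == p[-1] and is_supersequence(s[:j], p)]


def minimum_instance_point(s, p):
    """Temporal point of the minimum instance of p in s, or None if p has no instance."""
    points = instance_points(s, p)
    return points[0] if points else None


S1 = ("check", "lock", "use", "use", "unlock", "exit")
S5 = ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit")

print(instance_points(S1, ("check", "lock", "use")))         # [3, 4]
print(minimum_instance_point(S1, ("check", "lock", "use")))  # 3
print(instance_points(S5, S1))                                # [9]
```

The printed values reproduce the examples on the slide: $S_1^3$ and $S_1^4$ are the instances of ⟨check, lock, use⟩ in $S_1$, and $S_5^9$ is the only instance of $S_1$ in $S_5$.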
Definitions & Examples (1)
• Consider a sequence database $SeqDB$ and a sequence $P$
• $SeqDB$ projected on $P$
  • $SeqDB_P = \{ (i, s_x) \mid S_i = p_x +\!\!+ s_x \in SeqDB,\ p_x \text{ is the minimum instance of } P \}$
  • the sequence support $sup(P, SeqDB) = |SeqDB_P|$
• $SeqDB$ all-projected on $P$
  • $SeqDB_P^{all} = \{ (i, s_x) \mid S_i = p_x +\!\!+ s_x \in SeqDB,\ p_x \text{ is an instance of } P \}$
  • the instance support $sup^{all}(P, SeqDB) = |SeqDB_P^{all}|$
• Example: $P$ = ⟨lock, use⟩ on the example sequence database $SeqDB$
$SeqDB_P$:
SID | Sequence
$S_1$ | ⟨use, unlock, exit⟩
$S_2$ | ⟨check, lock, use, unlock, exit⟩
$S_4$ | ⟨⟩
$S_5$ | ⟨unlock, check, lock, use, unlock, exit⟩
$sup(P, SeqDB) = 4$
$SeqDB_P^{all}$:
SID | Sequence
$S_1$ | ⟨use, unlock, exit⟩
$S_1$ | ⟨unlock, exit⟩
$S_2$ | ⟨check, lock, use, unlock, exit⟩
$S_2$ | ⟨unlock, exit⟩
$S_4$ | ⟨⟩
$S_5$ | ⟨unlock, check, lock, use, unlock, exit⟩
$S_5$ | ⟨unlock, exit⟩
$sup^{all}(P, SeqDB) = 7$
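The two projections can be sketched as follows. This is a naive illustration of the definitions on the example database, assuming non-empty patterns; `project`, `all_project`, and the dictionary layout are hypothetical names chosen for this sketch, not the data structures of the original implementation.

```python
SEQ_DB = {
    1: ("check", "lock", "use", "use", "unlock", "exit"),
    2: ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    3: ("check", "use", "unlock", "exit"),
    4: ("check", "lock", "use"),
    5: ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
}


def is_supersequence(s, p):
    it = iter(s)
    return all(e in it for e in p)


def instance_points(s, p):
    """1-based temporal points j such that the j-prefix of s is an instance of p."""
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == p[-1] and is_supersequence(s[:j], p)]


def project(db, p):
    """SeqDB projected on p: the suffix after the minimum instance, per sequence."""
    out = []
    for sid, s in db.items():
        points = instance_points(s, p)
        if points:
            out.append((sid, s[points[0]:]))
    return out


def all_project(db, p):
    """SeqDB all-projected on p: one suffix per instance of p."""
    return [(sid, s[j:]) for sid, s in db.items() for j in instance_points(s, p)]


P = ("lock", "use")
print(len(project(SEQ_DB, P)))      # sequence support: 4
print(len(all_project(SEQ_DB, P)))  # instance support: 7
```

The two counts match $sup(P, SeqDB) = 4$ and $sup^{all}(P, SeqDB) = 7$ from the example tables.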
Definitions & Examples (2)
• Consider a recurrent rule $R = R_{pre} \rightarrow R_{post}$ in a sequence database $SeqDB$
  • the pre-condition $R_{pre}$, the post-condition $R_{post}$
  • the sequence support $sup(R, SeqDB) = sup(R_{pre} +\!\!+ R_{post}, SeqDB)$
  • the instance support $sup^{all}(R, SeqDB) = sup^{all}(R_{pre} +\!\!+ R_{post}, SeqDB)$
  • the confidence $conf(R, SeqDB) = \dfrac{sup(R_{post}, SeqDB_{R_{pre}}^{all})}{sup^{all}(R_{pre}, SeqDB)} = \dfrac{|(SeqDB_{R_{pre}}^{all})_{R_{post}}|}{|SeqDB_{R_{pre}}^{all}|}$
  • $R$ is significant if $sup(R, SeqDB) \ge min\_sup$, $sup^{all}(R, SeqDB) \ge min\_sup^{all}$, and $conf(R, SeqDB) \ge min\_conf$
• Given a rule $R$ = ⟨lock, use⟩ → ⟨unlock⟩ and the example sequence database $SeqDB$
  • the sequence support $sup(R, SeqDB) = sup(\langle \text{lock}, \text{use}, \text{unlock} \rangle, SeqDB) = 3$
  • the instance support $sup^{all}(R, SeqDB) = sup^{all}(\langle \text{lock}, \text{use}, \text{unlock} \rangle, SeqDB) = 4$
  • the confidence $conf(R, SeqDB) = \dfrac{sup(\langle \text{unlock} \rangle, SeqDB_{\langle \text{lock}, \text{use} \rangle}^{all})}{sup^{all}(\langle \text{lock}, \text{use} \rangle, SeqDB)} = \dfrac{6}{7}$
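The three measures of a rule can be computed directly from the definitions. The following is a minimal sketch on the example database, assuming non-empty pre- and post-conditions and a pre-condition that occurs at least once; all function names are hypothetical.

```python
from fractions import Fraction

SEQ_DB = [
    ("check", "lock", "use", "use", "unlock", "exit"),
    ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    ("check", "use", "unlock", "exit"),
    ("check", "lock", "use"),
    ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
]


def is_supersequence(s, p):
    it = iter(s)
    return all(e in it for e in p)


def instance_points(s, p):
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == p[-1] and is_supersequence(s[:j], p)]


def sup(p, seqs):
    """Sequence support: number of sequences that are super-sequences of p."""
    return sum(1 for s in seqs if is_supersequence(s, p))


def sup_all(p, seqs):
    """Instance support: total number of instances of p over all sequences."""
    return sum(len(instance_points(s, p)) for s in seqs)


def all_project(seqs, p):
    return [s[j:] for s in seqs for j in instance_points(s, p)]


def confidence(pre, post, seqs):
    """Fraction of instances of pre that are eventually followed by post."""
    projected = all_project(seqs, pre)
    return Fraction(sup(post, projected), len(projected))


pre, post = ("lock", "use"), ("unlock",)
print(sup(pre + post, SEQ_DB))        # 3
print(sup_all(pre + post, SEQ_DB))    # 4
print(confidence(pre, post, SEQ_DB))  # 6/7
```

The output agrees with the worked example: the only instance of ⟨lock, use⟩ not followed by unlock is the one in $S_4$.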
Rule Redundancy
• Consider $R$ = ⟨check⟩ → ⟨lock, use, unlock⟩ and $R'$ = ⟨check⟩ → ⟨unlock⟩ with the same sequence/instance support and confidence
  • Do we really need both of these rules?
• Rule redundancy
  • A rule $R' = R'_{pre} \rightarrow R'_{post}$ is redundant if there is another rule $R = R_{pre} \rightarrow R_{post}$ such that
    1. $R$ and $R'$ have the same sequence support, instance support, and confidence
    2. $R_{pre} +\!\!+ R_{post} \sqsupseteq R'_{pre} +\!\!+ R'_{post}$ ($R$ is longer than $R'$)
• Mining non-redundant recurrent rules
  • Mine pruned pre-/post-conditions using a modified BIDE (LS-Set miner)
  • BIDE: a frequent closed sequence mining algorithm based on the pattern-growth strategy
  • Wang, Jianyong, and Jiawei Han. "BIDE: Efficient mining of frequent closed sequences." Proceedings of the 20th International Conference on Data Engineering. IEEE, 2004.
𝑆 = ⟨check, lock, use, unlock⟩
FS-Set, CS-Set, LS-Set
• The set of frequent sequential patterns (FS-Set)
  • $FS = \{ s \mid support(s) \ge min\_sup \}$
• The set of closed frequent sequential patterns (CS-Set)
  • $CS = \{ s \mid s \in FS \text{ and } \nexists s' \in FS \text{ such that } s \sqsubset s' \text{ and } support(s) = support(s') \}$
• The projected-database closed set (LS-Set)
  • $LS = \{ s \mid support(s) \ge min\_sup \text{ and } \nexists s' \text{ such that } s \sqsubset s' \text{ and } SeqDB_s = SeqDB_{s'} \}$
  • cf. for $s \sqsubseteq s'$, $SeqDB_s = SeqDB_{s'} \Leftrightarrow \|SeqDB_s\| = \|SeqDB_{s'}\|$, where $\|\cdot\|$ is the total number of events in a projected database
  • Xifeng Yan, Jiawei Han, Ramin Afshar, “CloSpan: Mining Closed Sequential Patterns in Large Datasets”, SIAM 2003
Pruning Redundant Pre-Conds
• In a sequence database $SeqDB$, consider a pre-condition candidate $R_{pre}$.
• If there is a pre-condition candidate $R'_{pre} \sqsupset R_{pre}$ such that
  • (i) $R'_{pre} = P_1 +\!\!+ e +\!\!+ P_2$ while $R_{pre} = P_1 +\!\!+ P_2$, for some event $e$ and nonempty $P_1$, $P_2$
  • (ii) $SeqDB_{R_{pre}} = SeqDB_{R'_{pre}}$
• then, for any post-condition candidate $post$ and any forward extension $R_{pre} +\!\!+ P$,
  • the rule $R_{pre} +\!\!+ P \rightarrow post$ is redundant
LS-Set BIDE
Backward-extension event checking is omitted from the original BIDE algorithm
• David Lo, Siau-Cheng KHOO, Chao LIU, “Mining Recurrent Rules from Sequence Database”, TR12/07 NUS
Non-Redundant Recurrent Rules Miner (NR3)
• Input: a sequence database $SeqDB$; thresholds $min\_sup$, $min\_sup^{all}$, $min\_conf$
• Output: significant and non-redundant recurrent rules $Rules$
• Procedure
1. $PreCond$ := a pruned set of pre-conditions from $SeqDB$ satisfying $min\_sup$
2. foreach $pre \in PreCond$ do
   2.1. $SeqDB_{pre}^{all}$ := $SeqDB$ all-projected on $pre$
   2.2. $bthd$ := $min\_conf \times |SeqDB_{pre}^{all}|$
   2.3. $PostCond$ := a pruned set of post-conditions from $SeqDB_{pre}^{all}$ satisfying $bthd$
   2.4. foreach $post \in PostCond$ do
        if $sup^{all}(pre +\!\!+ post, SeqDB) \ge min\_sup^{all}$ then
            $Rules$ := $Rules \cup \{ pre \rightarrow post \}$
3. Remove remaining redundancy in $Rules$
• Aliases for tasks
  • Procedure line 1: GenPre task
  • Procedure lines 2.1 to 2.4: GenRule task
  • Procedure line 3: RemRedun task
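The control flow of the procedure can be sketched as a brute-force miner. This is a simplified illustration, not NR3 itself: it enumerates all frequent patterns instead of using the pruned LS-Set miner, thresholds are absolute counts, and redundancy is removed by a quadratic comparison that only discards a rule when a rule with a strictly longer concatenation has identical measures. All names are hypothetical.

```python
from fractions import Fraction


def is_super(s, p):
    it = iter(s)
    return all(e in it for e in p)


def points(s, p):
    return [j for j in range(1, len(s) + 1) if s[j - 1] == p[-1] and is_super(s[:j], p)]


def sup(p, seqs):
    return sum(1 for s in seqs if is_super(s, p))


def sup_all(p, seqs):
    return sum(len(points(s, p)) for s in seqs)


def all_project(seqs, p):
    return [s[j:] for s in seqs for j in points(s, p)]


def frequent(seqs, threshold):
    """All non-empty patterns with sequence support >= threshold (no pruning)."""
    threshold = max(threshold, 1)  # guard: a zero threshold would never terminate
    events = sorted({e for s in seqs for e in s})
    found, stack = [], [()]
    while stack:
        p = stack.pop()
        for e in events:
            q = p + (e,)
            if sup(q, seqs) >= threshold:
                found.append(q)
                stack.append(q)
    return found


def nr3_naive(seqs, min_sup, min_sup_all, min_conf):
    rules = {}
    for pre in frequent(seqs, min_sup):              # step 1 (GenPre, unpruned)
        proj = all_project(seqs, pre)                # step 2.1
        bthd = min_conf * len(proj)                  # step 2.2
        for post in frequent(proj, bthd):            # steps 2.3 and 2.4
            s_all = sup_all(pre + post, seqs)
            if s_all >= min_sup_all:
                conf = Fraction(sup(post, proj), len(proj))
                rules[(pre, post)] = (sup(pre + post, seqs), s_all, conf)

    def redundant(rule):                             # step 3 (RemRedun, simplified)
        body = rule[0] + rule[1]
        return any(stats == rules[rule]
                   and len(other[0] + other[1]) > len(body)
                   and is_super(other[0] + other[1], body)
                   for other, stats in rules.items())

    return {r: st for r, st in rules.items() if not redundant(r)}


SEQ_DB = [
    ("check", "lock", "use", "use", "unlock", "exit"),
    ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    ("check", "use", "unlock", "exit"),
    ("check", "lock", "use"),
    ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
]
RULES = nr3_naive(SEQ_DB, min_sup=3, min_sup_all=1, min_conf=0.8)
```

On the example database with these thresholds, ⟨lock, use⟩ → ⟨unlock⟩ is generated but dropped as redundant, because ⟨check, lock, use⟩ → ⟨unlock⟩ has the same supports and confidence and a longer concatenation.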
[Figure: NR3 example showing the pre-condition search tree rooted at 𝜀, the generated rule candidates, and the hash table used to remove redundant rules]
Parallel Mining of Non-Redundant Recurrent Rules (pNR3)
Revisiting Non-Redundant Recurrent Rules Miner (NR3)
• Parallelization strategy
  1. the single-producer-multiple-consumer framework
  2. loop-level parallelization
Parallel Non-Redundant Recurrent Rules Miner (pNR3)
[Figure: pNR3 framework. The GenPre task places GenRule tasks (e.g., GenRule[a], GenRule[a,b]) on a task queue served by a thread pool of worker threads [1] to [N]; the rules they produce are passed to the RemRedun task, which uses a hash table.]
- pNR3 framework
- GenPre task
- GenRule task
Source code is available at https://bitbucket.org/sekilab/nr3
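The single-producer-multiple-consumer arrangement can be sketched with a thread pool. This is only a structural illustration in Python, not the Java implementation in the repository: `gen_pre` and `gen_rule` are hypothetical stand-ins with toy outputs, and in CPython the global interpreter lock would limit the speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def gen_pre():
    """Hypothetical stand-in for the GenPre task: yields pre-conditions one by one."""
    yield from [("a",), ("a", "b"), ("c", "b")]


def gen_rule(pre):
    """Hypothetical stand-in for one GenRule task: candidate rules for a pre-condition."""
    return [(pre, ("x",))]


def mine_parallel(n_workers):
    rules = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Single producer: a GenRule task is queued as soon as its pre-condition
        # is generated, so workers start before GenPre has finished.
        futures = [pool.submit(gen_rule, pre) for pre in gen_pre()]
        for future in futures:
            rules.extend(future.result())
    return rules  # the RemRedun task would run on this list afterwards


print(mine_parallel(4))
```

Collecting results in submission order keeps the output deterministic regardless of which worker finishes first.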
Parallelization Effects of pNR3
• Let $t_T$ be the runtime of a task $T$, and $N$ be the number of available threads
• NR3: $t_{GenPre} + t_{GenRule} + t_{RemRedun}$
• pNR3: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
  • GenPre concurrency: $\max(t_{GenPre},\ t_{GenRule}) + t_{RemRedun}$
  • GenRule parallelization: $t_{GenPre} + t_{GenRule}/N + t_{RemRedun}$
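The runtime model can be evaluated against the task timings reported for D10C10N10R0.5 at min_sup = 0.5% (GenPre 34 s, GenRule 206 s, RemRedun 0 s). The sketch below is plain arithmetic on those reported figures; the function names are hypothetical.

```python
def nr3_time(t_gen_pre, t_gen_rule, t_rem_redun):
    """Sequential NR3: the three tasks run one after another."""
    return t_gen_pre + t_gen_rule + t_rem_redun


def pnr3_time(t_gen_pre, t_gen_rule, t_rem_redun, n_threads):
    """Idealized pNR3: GenPre overlaps with GenRule, which is split over n_threads."""
    return max(t_gen_pre, t_gen_rule / n_threads) + t_rem_redun


T_GEN_PRE, T_GEN_RULE, T_REM_REDUN = 34, 206, 0  # seconds, from the reported breakdown

for n in (2, 4, 8):
    print(n, pnr3_time(T_GEN_PRE, T_GEN_RULE, T_REM_REDUN, n))
```

The model gives about 103 s, 51.5 s, and 34 s for 2, 4, and 8 threads, against measured runtimes of 118 s, 74 s, and 54 s. With 8 threads the bound is set by GenPre, and the machine used has only 4 physical cores, so the ideal division by N is not expected to be reached.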
Experiment Environment
• Dataset
  • D10C10N10R0.5 (IBM synthetic data generator): 9,678 sequences, average length 31.22
  • BMSWebView1 (a click stream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
• Experiment machine
  • Intel Core i7-3610QM 2.30GHz (4 physical and 8 logical cores)
  • 8GB RAM
  • Microsoft Windows 7 Professional x64
• Implementation
  • Java SE 8
  • Default JVM settings
D10C10N10R0.5
• Varying $min\_sup$ from 0.5% to 0.9% ($min\_conf$ = 50%, $min\_sup^{all}$ = 1)
Runtime (sec)
min_sup (%) | 0.5 | 0.6 | 0.7 | 0.8 | 0.9
NR3 | 241 | 152 | 99 | 69 | 54
2-pNR3 | 118 | 78 | 49 | 37 | 26
4-pNR3 | 74 | 47 | 31 | 22 | 17
8-pNR3 | 54 | 35 | 23 | 18 | 14
Runtime breakdown by task (sec)
min_sup (%) | 0.5 | 0.6 | 0.7 | 0.8 | 0.9
GenPre | 34 | 22 | 15 | 11 | 8
GenRule | 206 | 130 | 83 | 57 | 46
RemRedun | 0 | 0 | 0 | 0 | 0
Elapsed | 241 | 152 | 99 | 69 | 54
Size
min_sup (%) | 0.5 | 0.6 | 0.7 | 0.8 | 0.9
PreCond | 21563 | 15013 | 11105 | 8917 | 7262
RuleCand | 3965 | 2418 | 1622 | 1258 | 956
Rules | 3912 | 2414 | 1621 | 1258 | 956
• Varying $min\_conf$ from 50% to 90% ($min\_sup$ = 0.5%, $min\_sup^{all}$ = 1)
Runtime (sec)
min_conf (%) | 50 | 60 | 70 | 80 | 90
NR3 | 241 | 184 | 176 | 170 | 167
2-pNR3 | 119 | 92 | 88 | 85 | 83
4-pNR3 | 74 | 56 | 50 | 52 | 52
8-pNR3 | 54 | 47 | 46 | 45 | 45
Runtime breakdown by task (sec)
min_conf (%) | 50 | 60 | 70 | 80 | 90
GenPre | 34 | 34 | 34 | 34 | 34
GenRule | 206 | 149 | 140 | 135 | 132
RemRedun | 0 | 0 | 0 | 0 | 0
Elapsed | 241 | 184 | 176 | 170 | 167
Size
min_conf (%) | 50 | 60 | 70 | 80 | 90
PreCond | 21563 | 21563 | 21563 | 21563 | 21563
RuleCand | 3965 | 1392 | 527 | 374 | 297
Rules | 3912 | 1372 | 519 | 368 | 294
• pNR3 runtime model: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
BMSWebView1
• Varying $min\_sup$ from 0.080% to 0.100% ($min\_conf$ = 50%, $min\_sup^{all}$ = 1)
Runtime (sec)
min_sup (%) | 0.080 | 0.085 | 0.090 | 0.095 | 0.100
NR3 | 43357 | 23729 | 12049 | 5063 | 2212
2-pNR3 | 21440 | 11737 | 6100 | 2567 | 1034
4-pNR3 | 12937 | 6839 | 3566 | 1550 | 618
8-pNR3 | 9567 | 5261 | 2721 | 1118 | 450
Runtime breakdown by task (sec)
min_sup (%) | 0.080 | 0.085 | 0.090 | 0.095 | 0.100
GenPre | 16 | 11 | 9 | 8 | 7
GenRule | 43340 | 23718 | 12039 | 5055 | 2204
RemRedun | 0 | 0 | 0 | 0 | 0
Elapsed | 43357 | 23729 | 12049 | 5063 | 2212
Size
min_sup (%) | 0.080 | 0.085 | 0.090 | 0.095 | 0.100
PreCond | 9476 | 7222 | 5734 | 4725 | 3981
RuleCand | 6413 | 3638 | 2333 | 1605 | 1147
Rules | 5976 | 3498 | 2260 | 1570 | 1139
• Varying $min\_conf$ from 50% to 90% ($min\_sup$ = 0.090%, $min\_sup^{all}$ = 1)
Runtime (sec)
min_conf (%) | 50 | 60 | 70 | 80 | 90
NR3 | 12049 | 1778 | 304 | 145 | 104
2-pNR3 | 6100 | 932 | 157 | 72 | 50
4-pNR3 | 3566 | 580 | 90 | 42 | 32
8-pNR3 | 2721 | 400 | 69 | 32 | 22
Runtime breakdown by task (sec)
min_conf (%) | 50 | 60 | 70 | 80 | 90
GenPre | 9 | 9 | 9 | 9 | 10
GenRule | 12039 | 1768 | 294 | 135 | 93
RemRedun | 0 | 0 | 0 | 0 | 0
Elapsed | 12049 | 1778 | 304 | 145 | 104
Size
min_conf (%) | 50 | 60 | 70 | 80 | 90
PreCond | 5734 | 5734 | 5734 | 5734 | 5734
RuleCand | 2333 | 1703 | 1173 | 685 | 288
Rules | 2260 | 1648 | 1123 | 645 | 268
• pNR3 runtime model: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
Loop Fused Mining of NR3 (LF-NR3)
Simplifying the all-projection operation
• Given the projected database $SeqDB_{pre}$,
• the all-projected database $SeqDB_{pre}^{all}$ can be simplified:
  • $SeqDB_{pre}^{all} = SeqDB_{pre} \cup (SeqDB_{pre})_{\langle last(pre) \rangle}^{all}$
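The identity can be checked on the example database: once the minimum instance of pre is found, every later occurrence of last(pre) ends another instance. The sketch below is a naive check with hypothetical function names, assuming non-empty patterns; projected databases are kept as lists of (SID, suffix) pairs so they can be projected again.

```python
SEQ_DB = [
    (1, ("check", "lock", "use", "use", "unlock", "exit")),
    (2, ("check", "lock", "use", "check", "lock", "use", "unlock", "exit")),
    (3, ("check", "use", "unlock", "exit")),
    (4, ("check", "lock", "use")),
    (5, ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit")),
]


def is_supersequence(s, p):
    it = iter(s)
    return all(e in it for e in p)


def instance_points(s, p):
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == p[-1] and is_supersequence(s[:j], p)]


def project(db, p):
    """Suffix after the minimum instance of p, for each entry that contains p."""
    return [(sid, s[pts[0]:]) for sid, s in db
            for pts in [instance_points(s, p)] if pts]


def all_project(db, p):
    """One suffix per instance of p."""
    return [(sid, s[j:]) for sid, s in db for j in instance_points(s, p)]


def all_project_via_projection(db, pre):
    """Right-hand side of the identity: SeqDB_pre plus its all-projection on last(pre)."""
    base = project(db, pre)
    return base + all_project(base, (pre[-1],))


pre = ("lock", "use")
direct = sorted(all_project(SEQ_DB, pre))
derived = sorted(all_project_via_projection(SEQ_DB, pre))
print(direct == derived)  # True
```

The second term only needs to locate single events in already-projected suffixes, which is cheaper than re-matching the whole pre-condition against the original sequences.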
Non-Redundant Recurrent Rules Miner (NR3)
Loop-Fused NR3 (LF-NR3)
Data Structure Level Optimization for Projections
• For each sequence $S_i$ in $SeqDB$ and the set $I$ of events,
  • a hash map $Map_i : I \rightarrow 2^{\{1, \ldots, |S_i|\}}$
  • such that each key $e \in I$ is mapped to the set of temporal points at which event $e$ occurs in $S_i$
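A per-sequence position index of this kind can be sketched as follows. The sketch stores the temporal points of each event as a sorted list and uses binary search to find the next occurrence; the sorted-list representation and the function names are assumptions made for illustration, not details taken from the original implementation.

```python
from bisect import bisect_right
from collections import defaultdict


def build_map(seq):
    """Map each event to the sorted list of temporal points (1-based) where it occurs."""
    index = defaultdict(list)
    for j, event in enumerate(seq, start=1):
        index[event].append(j)
    return dict(index)


def first_point_after(index, event, j):
    """Smallest temporal point greater than j at which event occurs, or None."""
    pts = index.get(event, [])
    k = bisect_right(pts, j)
    return pts[k] if k < len(pts) else None


def minimum_instance_point(index, pattern):
    """End point of the minimum instance of pattern, found by greedy left-to-right matching."""
    j = 0
    for event in pattern:
        j = first_point_after(index, event, j)
        if j is None:
            return None
    return j


S5 = ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit")
MAP5 = build_map(S5)
print(MAP5["unlock"])                                            # [4, 8]
print(minimum_instance_point(MAP5, ("lock", "use", "unlock")))  # 4
```

Each matching step costs a hash lookup plus a binary search instead of a linear scan, which helps on long sequences but adds overhead when sequences are very short.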
Experiment Environment
• Same datasets, machine, and implementation as in the pNR3 experiments (D10C10N10R0.5 and BMSWebView1; Intel Core i7-3610QM, 8GB RAM, Windows 7 x64; Java SE 8 with default JVM settings)
D10C10N10R0.5
• (a)-(c) $min\_sup$ = 0.5 to 0.9%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
• (d)-(f) $min\_sup$ = 0.5%, $min\_conf$ = 50 to 90%, $min\_sup^{all}$ = 1
BMSWebView1
• (a)-(c) $min\_sup$ = 0.100 to 0.120%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
• (d)-(f) $min\_sup$ = 0.090%, $min\_conf$ = 50 to 90%, $min\_sup^{all}$ = 1
Discussion
• Computational complexity of the algorithms
  • $|I|^k \times |I|^k$ ($I$: the set of events, $k$: the length of the longest frequent pattern)
• The effects of fusing loops in NR3
  • The foreach loop in the GenRule step is eliminated
  • The use of the intermediate data $SeqDB_{pre}$ simplifies the computation of
    • $SeqDB_{pre}^{all} = SeqDB_{pre} \cup (SeqDB_{pre})_{\langle last(pre) \rangle}^{all}$
    • $sup^{all}(pre \rightarrow post, SeqDB) = sup^{all}(post, SeqDB_{pre})$
• The effect of the hash-based data structure
  • Efficient computation of (all-)projected databases
  • The hash-based data structure is not always efficient when the sequences are short
Parallel Loop Fused Mining of NR3 (pLF-NR3)
Loop-Fused NR3 (LF-NR3)
• The task parallelism underlying the LF-NR3 algorithm can be exploited
  • it can be handled within the single-producer-multiple-consumer framework
Parallel Loop Fused NR3 (pLF-NR3)
Experiment Environment
• Same datasets, machine, and implementation as in the pNR3 experiments (D10C10N10R0.5 and BMSWebView1; Intel Core i7-3610QM, 8GB RAM, Windows 7 x64; Java SE 8 with default JVM settings)
D10C10N10R0.5
• (a)-(c) $min\_sup$ = 0.5 to 0.9%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
• (d)-(f) $min\_sup$ = 0.5%, $min\_conf$ = 50 to 90%, $min\_sup^{all}$ = 1
BMSWebView1
• (a)-(c) $min\_sup$ = 0.092 to 0.108%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
• (d)-(f) $min\_sup$ = 0.092%, $min\_conf$ = 50 to 90%, $min\_sup^{all}$ = 1
Bidirectional Mining Non-Redundant Recurrent Rules (BOB)
based on David LO, Bolin DING, Lucia, Jiawei HAN, ICDE, 2011
Additional Definitions
• a sequence database $SeqDB$, a set of sequences
• a sequence $S = \langle e_1, e_2, \ldots, e_n \rangle$
• the $j$-suffix of $S$ is $\langle e_{n-j+1}, e_{n-j+2}, \ldots, e_n \rangle$
• $S'$ is the $j$-th minimum suffix of $S$ (with respect to a pattern $P$) if $S'$ is a suffix of $S$ and there is no suffix of $S$ starting with $first(P)$ that is shorter than $S'$ and longer than the $(j-1)$-th minimum suffix
• The $j$-th suf-projection of $SeqDB$ with respect to a pattern $P$
  • $SeqDB_P^{suf\text{-}j} = \{ (i, s_x) \mid S_i = p_x +\!\!+ s_x \in SeqDB,\ s_x \text{ is the } j\text{-th minimum suffix of } S_i \text{ with respect to } P \}$
• $SeqDB$ pre-projected on $P$
  • $SeqDB_P^{pre} = \{ (i, p_x) \mid S_i = p_x +\!\!+ s_x \in SeqDB,\ s_x \text{ is the minimum suffix of } P \}$
Anti-Monotonicity Property of Confidence
• Proposition 1
  • Consider a rule $R$ of the form $R_{pre} \rightarrow R_{post}$ and a sequence database $SeqDB$
  • $conf(R, SeqDB) = \dfrac{sup(R_{post}, SeqDB_{R_{pre}}^{all})}{sup^{all}(R_{pre}, SeqDB)} = \dfrac{sup^{all}(R_{pre}, SeqDB_{R_{post}}^{pre})}{sup^{all}(R_{pre}, SeqDB)}$
• Proposition 2
  • Consider two rules $R$ and $R'$ in a sequence database $SeqDB$ with $R'_{pre} = R_{pre}$ and $R'_{post} = e +\!\!+ R_{post}$ for some event $e \in I$
  • $conf(R) \ge conf(R')$
• Theorem (anti-monotonicity property of confidence)
  • Consider two rules $R$ and $R'$ in a sequence database $SeqDB$ with $R'_{pre} = R_{pre}$ and $R'_{post} = evs +\!\!+ R_{post}$, where $evs$ is an arbitrary series of events
  • $conf(R) \ge conf(R')$
  • If $R$ is not confident enough ($conf(R) < min\_conf$), then $R'$ is not either
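Proposition 2 can be checked empirically on the example trace database. The sketch below computes the confidence of ⟨lock, use⟩ → ⟨unlock⟩ and of every rule obtained by prepending one event to the post-condition; it is an illustration on one rule, not a proof, and the function names are hypothetical.

```python
from fractions import Fraction

SEQ_DB = [
    ("check", "lock", "use", "use", "unlock", "exit"),
    ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    ("check", "use", "unlock", "exit"),
    ("check", "lock", "use"),
    ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
]


def is_supersequence(s, p):
    it = iter(s)
    return all(e in it for e in p)


def instance_points(s, p):
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == p[-1] and is_supersequence(s[:j], p)]


def confidence(pre, post, seqs):
    """Share of the instances of pre whose remaining suffix contains post."""
    suffixes = [s[j:] for s in seqs for j in instance_points(s, pre)]
    hits = sum(1 for suffix in suffixes if is_supersequence(suffix, post))
    return Fraction(hits, len(suffixes))


pre, post = ("lock", "use"), ("unlock",)
events = sorted({e for s in SEQ_DB for e in s})
base = confidence(pre, post, SEQ_DB)
extended = {e: confidence(pre, (e,) + post, SEQ_DB) for e in events}

print(base)                                         # 6/7
print(all(c <= base for c in extended.values()))   # True
```

Because prepending events can only lower the confidence, a post-condition that already falls below min_conf does not need to be extended backward.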
Pruning Redundant Post-Conds
• In a sequence database $SeqDB$, consider a post-condition candidate $R_{post}$.
• Lemma 1
  • If there is a post-condition candidate $R'_{post} \sqsupset R_{post}$ such that
    • (i) $R'_{post} = P_1 +\!\!+ e +\!\!+ P_2$ while $R_{post} = P_1 +\!\!+ P_2$, for some event $e$, subsequences $P_1$, (nonempty) $P_2$
    • (ii) $SeqDB_{R_{post}}^{pre} = SeqDB_{R'_{post}}^{pre}$
  • then for any pre-condition candidate $pre$ and any backward extension $P +\!\!+ R_{post}$ of $R_{post}$, the rule $R = pre \rightarrow P +\!\!+ R_{post}$ is not confidence-closed
  • i.e., there exists another rule $R' \sqsupset R$ such that $conf(R) = conf(R')$
• Lemma 2
  • If there is a post-condition candidate $R'_{post} \sqsupset R_{post}$ such that
    • (i) $R'_{post} = P_1 +\!\!+ e +\!\!+ P_2$ while $R_{post} = P_1 +\!\!+ P_2$, for some event $e$, subsequences (nonempty) $P_1$, $P_2$
    • (iii) $\forall j : SeqDB_{R_{post}}^{suf\text{-}j} = SeqDB_{R'_{post}}^{suf\text{-}j}$, and
    • (iv) $\forall j : (SeqDB_{R_{post}}^{suf\text{-}j})_{R_{post}}^{all} = (SeqDB_{R'_{post}}^{suf\text{-}j})_{R'_{post}}^{all}$
  • then for any pre-condition candidate $pre$ and any backward extension $P +\!\!+ R_{post}$ of $R_{post}$, the rule $R = pre \rightarrow P +\!\!+ R_{post}$ is not support-closed
  • i.e., there exists another rule $R' \sqsupset R$ such that $sup(R) = sup(R')$ and $sup^{all}(R) = sup^{all}(R')$
• Theorem (pruning redundant post-conditions)
  • If the properties (i)-(iv) in Lemmas 1 and 2 are satisfied,
  • then for any pre-condition candidate $pre$ and any backward extension $P +\!\!+ R_{post}$ of $R_{post}$, the rule $R = pre \rightarrow P +\!\!+ R_{post}$ is redundant.
Bidirectional Pruning-based Recurrent Rule Mining(BOB)
Interleaved Bidirectional Mining of NR3 (iBiRM)
Optimizing Operations
• Given the sequence database $SeqDB$ and the rule $R = pre \rightarrow post$
  • $sup(R, SeqDB) = sup(post, SeqDB_{pre})$
  • $sup^{all}(R, SeqDB) = sup^{all}(post, SeqDB_{pre})$
• Pruning the search space of PRE early
  • for $R = pre \rightarrow post$ and $R' = pre +\!\!+ e \rightarrow post$,
  • if $sup(R, SeqDB) \le min\_sup$, then $sup(R', SeqDB) \le min\_sup$
  • if $sup^{all}(R, SeqDB) \le min\_sup^{all}$, then $sup^{all}(R', SeqDB) \le min\_sup^{all}$
• Reducing the number of database scans using a prefix tree
  • for each pre-condition $pre \in PRE$, suppose that a node $N_0 \in T_{POST}$ has children nodes $N_1, \ldots, N_k$
  • we can compute the instance supports of its children nodes $N_1, \ldots, N_k$ by scanning $SeqDB$ once
  • When $N_0$ corresponds to a post-condition $post \in POST$, each child node $N_i$ corresponds to a post-condition $post_i = e_i +\!\!+ post$ for some event $e_i$, so the post-conditions of the child nodes share the suffix $post$.
  • When scanning a sequence $s \in SeqDB$, we record the positions of each $e_i$ and those of the events appearing in $post$, from which we can compute the number of instances of $pre +\!\!+ post_i$ in $s$
Bidirectional Pruning-based Recurrent Rule Mining (BOB)
Interleaved Bidirectional Recurrent Rule Miner (iBiRM)
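The two identities that let the supports of a rule be computed from the projected database $SeqDB_{pre}$ can be checked on the example trace database. This is a naive sketch with hypothetical function names, assuming non-empty pre- and post-conditions.

```python
SEQ_DB = [
    ("check", "lock", "use", "use", "unlock", "exit"),
    ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    ("check", "use", "unlock", "exit"),
    ("check", "lock", "use"),
    ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
]


def is_supersequence(s, p):
    it = iter(s)
    return all(e in it for e in p)


def instance_points(s, p):
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == p[-1] and is_supersequence(s[:j], p)]


def sup(p, seqs):
    return sum(1 for s in seqs if is_supersequence(s, p))


def sup_all(p, seqs):
    return sum(len(instance_points(s, p)) for s in seqs)


def project(seqs, p):
    """Suffixes after the minimum instance of p (sequences without p are dropped)."""
    return [s[pts[0]:] for s in seqs for pts in [instance_points(s, p)] if pts]


pre, post = ("lock", "use"), ("unlock",)
projected = project(SEQ_DB, pre)

print(sup(pre + post, SEQ_DB), sup(post, projected))          # 3 3
print(sup_all(pre + post, SEQ_DB), sup_all(post, projected))  # 4 4
```

Working on $SeqDB_{pre}$ means the pre-condition is matched once per sequence, and each post-condition is then counted on the shorter projected suffixes.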
Experiment Environment
• Dataset
  • D5C20N10R0.5 (IBM synthetic data generator): 4,999 sequences, average length 64.39
  • BMSWebView1 (a click stream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
• Experiment machine
  • Intel Core i5 2.50GHz
  • 8GB RAM
  • Microsoft Windows 7 Professional x64
• Implementation
  • Java SE 8
  • Default JVM settings
D5C20N10R0.5
• (a)-(d) $min\_sup$ = 2.0 to 2.8%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
• (d)-(f) $min\_sup$ = 2.4%, $min\_conf$ = 50 to 90%, $min\_sup^{all}$ = 1
BMSWebView1
• (a)-(c) $min\_sup$ = 0.092 to 0.108%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
• (d)-(f) $min\_sup$ = 0.092%, $min\_conf$ = 50 to 90%, $min\_sup^{all}$ = 1
Conclusion
Conclusion & Future Work
• Conclusion
  • We have proposed the Parallel Non-Redundant Recurrent Rules Miner (pNR3)
  • We have proposed the Loop-Fused Non-Redundant Recurrent Rules Miner (LF-NR3)
  • We have proposed the Parallel Loop-Fused Non-Redundant Recurrent Rules Miner (pLF-NR3)
  • We have proposed the Interleaved Bidirectional Non-Redundant Recurrent Rules Miner (iBiRM)
• Future work
  • Improvement of the sequential recurrent rule mining algorithm
  • Improvement of the parallel algorithms
• Source code is available at https://bitbucket.org/sekilab/nr3

More Related Content

What's hot

Deadlocks in operating system
Deadlocks in operating systemDeadlocks in operating system
Deadlocks in operating system
lalithambiga kamaraj
 
Deadlock avoidance (Safe State, Resource Allocation Graph Algorithm)
Deadlock avoidance (Safe State, Resource Allocation Graph Algorithm)Deadlock avoidance (Safe State, Resource Allocation Graph Algorithm)
Deadlock avoidance (Safe State, Resource Allocation Graph Algorithm)
Shayek Parvez
 
7 Deadlocks
7 Deadlocks7 Deadlocks
7 Deadlocks
Dr. Loganathan R
 
Chapter 7 - Deadlocks
Chapter 7 - DeadlocksChapter 7 - Deadlocks
Chapter 7 - Deadlocks
Wayne Jones Jnr
 
Deadlock Detection in Distributed Systems
Deadlock Detection in Distributed SystemsDeadlock Detection in Distributed Systems
Deadlock Detection in Distributed Systems
DHIVYADEVAKI
 
Deadlock
DeadlockDeadlock
Deadlock
Rajandeep Gill
 
Operating System
Operating SystemOperating System
Operating System
Subhasis Dash
 
Bankers
BankersBankers
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321maclean liu
 
The implementation of Banker's algorithm, data structure and its parser
The implementation of Banker's algorithm, data structure and its parserThe implementation of Banker's algorithm, data structure and its parser
The implementation of Banker's algorithm, data structure and its parser
Matthew Chang
 
Mca ii os u-3 dead lock & io systems
Mca  ii  os u-3 dead lock & io systemsMca  ii  os u-3 dead lock & io systems
Mca ii os u-3 dead lock & io systems
Rai University
 
Ch8 OS
Ch8 OSCh8 OS
Ch8 OSC.U
 
Deadlocks in operating system
Deadlocks in operating systemDeadlocks in operating system
Deadlocks in operating system
Midhun Sankar
 
Chapter 4
Chapter 4Chapter 4
Chapter 4
ushabarad142
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
Sakshi Tiwari
 
Deadlock
DeadlockDeadlock
Deadlock
Farhat Shaikh
 
OOW13 JB KP ASH Deep Dive
OOW13 JB KP ASH Deep DiveOOW13 JB KP ASH Deep Dive
OOW13 JB KP ASH Deep Dive
Kellyn Pot'Vin-Gorman
 

What's hot (20)

Deadlocks in operating system
Deadlocks in operating systemDeadlocks in operating system
Deadlocks in operating system
 
Deadlock
DeadlockDeadlock
Deadlock
 
Deadlock avoidance (Safe State, Resource Allocation Graph Algorithm)
Deadlock avoidance (Safe State, Resource Allocation Graph Algorithm)Deadlock avoidance (Safe State, Resource Allocation Graph Algorithm)
Deadlock avoidance (Safe State, Resource Allocation Graph Algorithm)
 
7 Deadlocks
7 Deadlocks7 Deadlocks
7 Deadlocks
 
Chapter 7 - Deadlocks
Chapter 7 - DeadlocksChapter 7 - Deadlocks
Chapter 7 - Deadlocks
 
Deadlock Detection in Distributed Systems
Deadlock Detection in Distributed SystemsDeadlock Detection in Distributed Systems
Deadlock Detection in Distributed Systems
 
Deadlock
DeadlockDeadlock
Deadlock
 
Operating System
Operating SystemOperating System
Operating System
 
OS_Ch8
OS_Ch8OS_Ch8
OS_Ch8
 
Bankers
BankersBankers
Bankers
 
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
 
The implementation of Banker's algorithm, data structure and its parser
The implementation of Banker's algorithm, data structure and its parserThe implementation of Banker's algorithm, data structure and its parser
The implementation of Banker's algorithm, data structure and its parser
 
Mca ii os u-3 dead lock & io systems
Mca  ii  os u-3 dead lock & io systemsMca  ii  os u-3 dead lock & io systems
Mca ii os u-3 dead lock & io systems
 
OSCh8
OSCh8OSCh8
OSCh8
 
Ch8 OS
Ch8 OSCh8 OS
Ch8 OS
 
Deadlocks in operating system
Deadlocks in operating systemDeadlocks in operating system
Deadlocks in operating system
 
Chapter 4
Chapter 4Chapter 4
Chapter 4
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
Deadlock
DeadlockDeadlock
Deadlock
 
OOW13 JB KP ASH Deep Dive
OOW13 JB KP ASH Deep DiveOOW13 JB KP ASH Deep Dive
OOW13 JB KP ASH Deep Dive
 

Similar to Mining non-redundant recurrent rules from a sequence database

WWW 2008 Poster - Efficient mining of frequent sequence generators
WWW 2008 Poster - Efficient mining of frequent sequence generatorsWWW 2008 Poster - Efficient mining of frequent sequence generators
WWW 2008 Poster - Efficient mining of frequent sequence generators
Chuancong Gao
 
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법
NAVER Engineering
 
Lash
LashLash
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
AshishDPatel1
 
Fast Sequential Rule Mining
Fast Sequential Rule MiningFast Sequential Rule Mining
Fast Sequential Rule Mining
ijsrd.com
 
Mining Top-k Closed Sequential Patterns in Sequential Databases
Mining Top-k Closed Sequential Patterns in Sequential Databases Mining Top-k Closed Sequential Patterns in Sequential Databases
Mining Top-k Closed Sequential Patterns in Sequential Databases
IOSR Journals
 
lecture2.pdf
lecture2.pdflecture2.pdf
lecture2.pdf
Tigabu Yaya
 
Foundation and Synchronization of the Dynamic Output Dual Systems
Foundation and Synchronization of the Dynamic Output Dual SystemsFoundation and Synchronization of the Dynamic Output Dual Systems
Foundation and Synchronization of the Dynamic Output Dual Systems
ijtsrd
 
Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...
David Rosenblum
 
Learning from 6,000 projects mining specifications in the large
Learning from 6,000 projects   mining specifications in the largeLearning from 6,000 projects   mining specifications in the large
Learning from 6,000 projects mining specifications in the large
CISPA Helmholtz Center for Information Security
 
3 recursion
3 recursion3 recursion
3 recursion
Nguync91368
 
3-Recursion.ppt
3-Recursion.ppt3-Recursion.ppt
3-Recursion.ppt
TrnHuy921814
 
Comp7404 ai group_project_15apr2018_v2.1
Comp7404 ai group_project_15apr2018_v2.1Comp7404 ai group_project_15apr2018_v2.1
Comp7404 ai group_project_15apr2018_v2.1
paul0001
 
Introduction of Feature Hashing
Introduction of Feature HashingIntroduction of Feature Hashing
Introduction of Feature Hashing
Wush Wu
 
Introduction to Julia
Introduction to JuliaIntroduction to Julia
Introduction to Julia
岳華 杜
 
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Progra...
Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Progra...Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Progra...
Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Progra...
IFPRI-EPTD
 
Lock? We don't need no stinkin' locks!
Lock? We don't need no stinkin' locks!Lock? We don't need no stinkin' locks!
Lock? We don't need no stinkin' locks!Michael Barker
 
Locks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael BarkerLocks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael Barker
JAX London
 

Mining non-redundant recurrent rules from a sequence database

  • 1. Mining Non-Redundant Recurrent Rules from a Sequence Database
      Yoon SeungYong, Ministry of Science and ICT, Republic of Korea (forcom@forcom.kr)
      - Efficient Mining of Recurrent Rules from a Sequence Database (Lo et al., DASFAA 2008)
      - Parallel Mining of Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, ISIS 2017)
          · A Parallel Algorithm for Mining Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, JACIII 2019)
      - Towards Efficient Mining of Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, IWCIA 2017)
          · Mining Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, IJCISTUDIES 2018)
      - Efficient Mining of Recurrent Rules from a Sequence Database Using Multi-Core Processors (Yoon and Seki, SCIS&ISIS 2018)
      - Bidirectional Mining of Non-Redundant Recurrent Rules from a Sequence Database (Lo et al., IEEE ICDE 2011)
      - A New Algorithm for Mining Recurrent Rules from a Sequence Database (Seki and Yoon, IEEE SMC 2019)
  • 2. Table of Contents
      1. Motivation
      2. Mining Non-Redundant Recurrent Rules (NR3) – Lo et al.
      3. Parallel Mining of Non-Redundant Recurrent Rules (pNR3)
      4. Loop-Fused Mining of NR3 (LF-NR3)
      5. Parallel Loop-Fused Mining of NR3 (pLF-NR3)
      6. Bidirectional Mining of NR3 (BOB) – Lo et al.
      7. Interleaved Bidirectional Mining of NR3 (iBiRM)
      8. Conclusion
  • 4. Sequence Database & Sequential Rule
      - Transaction histories, e.g., a movie rental history (Customer: Movie Rental History):
          Alice: Star Wars 4, Star Wars 5, Star Wars 6, Star Wars 1
          Bob:   Shrek, Spirited Away, Your Name
          Clara: Spirited Away, Howl’s Moving Castle, Princess Mononoke
          David: Star Wars 1, Star Wars 2, Star Wars 3, Star Wars 4, Star Wars 5
          Eve:   Your Name
        Example sequential rule: ⟨Star Wars 4⟩ → ⟨Star Wars 5⟩
      - Program traces (Trace ID: Command):
          1: check, lock, use, use, unlock, exit
          2: check, lock, use, check, lock, use, unlock, exit
          3: check, use, unlock, exit
          4: check, lock, use
          5: check, lock, use, unlock, check, lock, use, unlock, exit
        Example sequential rule: ⟨lock⟩ → ⟨unlock⟩
  • 5. What is a recurrent rule? (proposed by David Lo)
      - A recurrent rule $R = R_{pre} \rightarrow R_{post}$ states: "Whenever a series of precedent events occurs, eventually another series of consequent events occurs."
      - e.g., $R$ = ⟨check, lock⟩ → ⟨use, unlock⟩: "Whenever ⟨check, lock⟩ occurs, eventually ⟨use, unlock⟩ occurs."
      - It captures temporal constraints that repeat a meaningful number of times both within a sequence and across multiple sequences.
      - A sequential rule $R = R_{pre} \rightarrow R_{post}$ means "whenever a sequence is a super-sequence of $R_{pre}$, it will be a super-sequence of $R_{pre} +\!+ R_{post}$".
      - Linear Temporal Logic (LTL)
          - One of the most widely used formalisms for program verification (Clarke, Edmund M., Orna Grumberg, and Doron Peled. Model Checking. MIT Press, 1999).
          - A recurrent rule can be expressed in LTL.
  • 6. Mining Non-Redundant Recurrent Rules (NR3), based on David LO, Siau-Cheng KHOO (NUS) and Chao LIU, DASFAA 2008
  • 7. Preliminaries & Examples (1)
      - a sequence database $\mathit{SeqDB}$, a set of sequences: $\{S_1, S_2, S_3, S_4, S_5\}$
      - the set of events $I$ in $\mathit{SeqDB}$: {check, exit, lock, unlock, use}
      - the size of $\mathit{SeqDB}$, written $|\mathit{SeqDB}|$: $|\mathit{SeqDB}| = 5$
      - a sequence $S = \langle e_1, e_2, \dots, e_n \rangle$: $S_1$ = ⟨check, lock, use, use, unlock, exit⟩
      - a temporal point $j$ of $e_j$ in $S$: the event at temporal point 5 in $S_1$ is unlock
      - the length of $S$, $|S| = n$: $|S_1| = 6$
      - the last event of $S$, $\mathit{last}(S) = S[n]$: $\mathit{last}(S_1)$ = exit
      - the $j$-prefix of $S$, $S^j = \langle e_1, e_2, \dots, e_j \rangle$: $S_1^2$ = ⟨check, lock⟩
      An example sequence database $\mathit{SeqDB}$ (SID: Sequence):
          $S_1$: ⟨check, lock, use, use, unlock, exit⟩
          $S_2$: ⟨check, lock, use, check, lock, use, unlock, exit⟩
          $S_3$: ⟨check, use, unlock, exit⟩
          $S_4$: ⟨check, lock, use⟩
          $S_5$: ⟨check, lock, use, unlock, check, lock, use, unlock, exit⟩
  • 8. Preliminaries & Examples (2)
      Given sequences $S = \langle e_1, \dots, e_n \rangle$ and $S' = \langle e'_1, \dots, e'_m \rangle$:
      - the concatenation of $S$ and $S'$: $S +\!+ S' = \langle e_1, \dots, e_n, e'_1, \dots, e'_m \rangle$
      - $S$ is a super-sequence of $S'$, written $S \sqsupseteq S'$, if $e_{i_1} = e'_1, \dots, e_{i_m} = e'_m$ for some $1 \le i_1 < \dots < i_m \le n$
          - e.g., $S_1$ = ⟨check, lock, use, use, unlock, exit⟩ $\sqsupseteq$ ⟨check, lock, unlock⟩
      - $S^j$ is an instance of $S'$ in $S$ if $S^j \sqsupseteq S'$ and $\mathit{last}(S') = \mathit{last}(S^j)$
      - $S^j$ is the minimum instance of $S'$ in $S$ if $S^j$ is an instance of $S'$ and there is no $k < j$ such that $S^k$ is an instance of $S'$
          - e.g., $S_1^3$ and $S_1^4$ are instances of ⟨check, lock, use⟩ in $S_1$, and $S_1^3$ is the minimum
          - $S_5^9$ is an instance of $S_1$ in $S_5$, and it is the minimum
      (Examples refer to the sequence database $\mathit{SeqDB}$ of slide 7.)
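The instance definitions on this slide can be checked mechanically. The following is a minimal sketch, assuming sequences are represented as Python tuples of event names; the helper names `is_super_sequence`, `instances`, and `minimum_instance` are hypothetical and not taken from the authors' implementation.

```python
# Sketch only: helper names are illustrative, not from the original NR3 code.
S1 = ("check", "lock", "use", "use", "unlock", "exit")
S5 = ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit")


def is_super_sequence(seq, pattern):
    """True if `pattern` can be embedded in `seq` while preserving order."""
    remaining = iter(seq)
    # `event in remaining` advances the iterator, so order is respected.
    return all(event in remaining for event in pattern)


def instances(seq, pattern):
    """Temporal points j (1-based) whose j-prefix is an instance of `pattern`.

    Assumes `pattern` is non-empty.
    """
    return [
        j
        for j in range(1, len(seq) + 1)
        if seq[j - 1] == pattern[-1] and is_super_sequence(seq[:j], pattern)
    ]


def minimum_instance(seq, pattern):
    """Smallest temporal point of an instance, or None if there is none."""
    found = instances(seq, pattern)
    return found[0] if found else None


print(instances(S1, ("check", "lock", "use")))  # temporal points 3 and 4
print(minimum_instance(S5, S1))                 # temporal point 9
```

The linear scan keeps the sketch close to the definitions; it is not meant to be efficient.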
  • 9. Definitions & Examples (1)
      Consider a sequence database $\mathit{SeqDB}$ and a sequence $P$.
      - $\mathit{SeqDB}$ projected on $P$:
          $\mathit{SeqDB}_P = \{(i, sx) \mid S_i = px +\!+ sx \in \mathit{SeqDB},\ px \text{ is the minimum instance of } P\}$
      - the sequence support: $\mathit{sup}(P, \mathit{SeqDB}) = |\mathit{SeqDB}_P|$
      - $\mathit{SeqDB}$ all-projected on $P$:
          $\mathit{SeqDB}_P^{all} = \{(i, sx) \mid S_i = px +\!+ sx \in \mathit{SeqDB},\ px \text{ is an instance of } P\}$
      - the instance support: $\mathit{sup}^{all}(P, \mathit{SeqDB}) = |\mathit{SeqDB}_P^{all}|$
      Example: $P$ = ⟨lock, use⟩ and the sequence database $\mathit{SeqDB}$ of slide 7.
      $\mathit{SeqDB}_P$ (SID: Sequence), $\mathit{sup}(P, \mathit{SeqDB}) = 4$:
          $S_1$: ⟨use, unlock, exit⟩
          $S_2$: ⟨check, lock, use, unlock, exit⟩
          $S_4$: ⟨⟩
          $S_5$: ⟨unlock, check, lock, use, unlock, exit⟩
      $\mathit{SeqDB}_P^{all}$ (SID: Sequence), $\mathit{sup}^{all}(P, \mathit{SeqDB}) = 7$:
          $S_1$: ⟨use, unlock, exit⟩
          $S_1$: ⟨unlock, exit⟩
          $S_2$: ⟨check, lock, use, unlock, exit⟩
          $S_2$: ⟨unlock, exit⟩
          $S_4$: ⟨⟩
          $S_5$: ⟨unlock, check, lock, use, unlock, exit⟩
          $S_5$: ⟨unlock, exit⟩
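The two projections can be reproduced on the example database. This is a sketch under the assumption that the database is a dictionary from sequence ID to a tuple of events; `project`, `all_project`, and `instance_points` are hypothetical names chosen for illustration.

```python
# Sketch only: a direct, unoptimized reading of the projection definitions.
SEQ_DB = {
    1: ("check", "lock", "use", "use", "unlock", "exit"),
    2: ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    3: ("check", "use", "unlock", "exit"),
    4: ("check", "lock", "use"),
    5: ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
}


def is_super_sequence(seq, pattern):
    remaining = iter(seq)
    return all(event in remaining for event in pattern)


def instance_points(seq, pattern):
    """1-based temporal points j such that the j-prefix is an instance of `pattern`."""
    return [
        j
        for j in range(1, len(seq) + 1)
        if seq[j - 1] == pattern[-1] and is_super_sequence(seq[:j], pattern)
    ]


def project(db, pattern):
    """SeqDB_P: the suffix after the minimum instance, one entry per sequence."""
    result = []
    for sid, seq in db.items():
        points = instance_points(seq, pattern)
        if points:
            result.append((sid, seq[points[0]:]))
    return result


def all_project(db, pattern):
    """SeqDB_P^all: the suffix after every instance."""
    return [
        (sid, seq[j:])
        for sid, seq in db.items()
        for j in instance_points(seq, pattern)
    ]


P = ("lock", "use")
print(len(project(SEQ_DB, P)))      # sequence support of P
print(len(all_project(SEQ_DB, P)))  # instance support of P
```

On the example database this yields a sequence support of 4 and an instance support of 7, matching the tables on the slide.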
  • 10. Definitions & Examples (2)
      Consider a recurrent rule $R = R_{pre} \rightarrow R_{post}$ in a sequence database $\mathit{SeqDB}$.
      - the pre-condition $R_{pre}$, the post-condition $R_{post}$
      - the sequence support: $\mathit{sup}(R, \mathit{SeqDB}) = \mathit{sup}(R_{pre} +\!+ R_{post}, \mathit{SeqDB})$
      - the instance support: $\mathit{sup}^{all}(R, \mathit{SeqDB}) = \mathit{sup}^{all}(R_{pre} +\!+ R_{post}, \mathit{SeqDB})$
      - the confidence:
          $\mathit{conf}(R, \mathit{SeqDB}) = \dfrac{\mathit{sup}(R_{post}, \mathit{SeqDB}_{R_{pre}}^{all})}{\mathit{sup}^{all}(R_{pre}, \mathit{SeqDB})} = \dfrac{|(\mathit{SeqDB}_{R_{pre}}^{all})_{R_{post}}|}{|\mathit{SeqDB}_{R_{pre}}^{all}|}$
      - $R$ is significant if $\mathit{sup}(R, \mathit{SeqDB}) \ge \mathit{min\_sup}$, $\mathit{sup}^{all}(R, \mathit{SeqDB}) \ge \mathit{min\_sup}^{all}$, and $\mathit{conf}(R, \mathit{SeqDB}) \ge \mathit{min\_conf}$
      Example: the rule $R$ = ⟨lock, use⟩ → ⟨unlock⟩ and the sequence database $\mathit{SeqDB}$ of slide 7.
      - the sequence support: $\mathit{sup}(R, \mathit{SeqDB}) = \mathit{sup}(\langle \text{lock, use, unlock} \rangle, \mathit{SeqDB}) = 3$
      - the instance support: $\mathit{sup}^{all}(R, \mathit{SeqDB}) = \mathit{sup}^{all}(\langle \text{lock, use, unlock} \rangle, \mathit{SeqDB}) = 4$
      - the confidence: $\mathit{conf}(R, \mathit{SeqDB}) = \dfrac{\mathit{sup}(\langle \text{unlock} \rangle, \mathit{SeqDB}_{\langle \text{lock, use} \rangle}^{all})}{\mathit{sup}^{all}(\langle \text{lock, use} \rangle, \mathit{SeqDB})} = \dfrac{6}{7}$
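The three measures can be recomputed for the example rule ⟨lock, use⟩ → ⟨unlock⟩. The sketch below assumes the database is a plain list of event tuples and uses `fractions.Fraction` so the confidence stays exact; the function names are hypothetical.

```python
from fractions import Fraction

# Sketch only: brute-force evaluation of sup, sup^all and conf for one rule.
SEQ_DB = [
    ("check", "lock", "use", "use", "unlock", "exit"),
    ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    ("check", "use", "unlock", "exit"),
    ("check", "lock", "use"),
    ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
]


def contains(seq, pattern):
    remaining = iter(seq)
    return all(event in remaining for event in pattern)


def instance_points(seq, pattern):
    return [
        j
        for j in range(1, len(seq) + 1)
        if seq[j - 1] == pattern[-1] and contains(seq[:j], pattern)
    ]


def sup(pattern, db):
    """Sequence support: number of sequences containing the pattern."""
    return sum(1 for seq in db if contains(seq, pattern))


def sup_all(pattern, db):
    """Instance support: number of instances over all sequences."""
    return sum(len(instance_points(seq, pattern)) for seq in db)


def conf(pre, post, db):
    """Fraction of all-projected suffixes of `pre` that contain `post`."""
    suffixes = [seq[j:] for seq in db for j in instance_points(seq, pre)]
    hits = sum(1 for suffix in suffixes if contains(suffix, post))
    return Fraction(hits, len(suffixes))


pre, post = ("lock", "use"), ("unlock",)
print(sup(pre + post, SEQ_DB), sup_all(pre + post, SEQ_DB), conf(pre, post, SEQ_DB))
```

The values agree with the slide: sequence support 3, instance support 4, confidence 6/7.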
  • 11. Rule Redundancy
      - Consider $R$ = ⟨check⟩ → ⟨lock, use, unlock⟩ and $R'$ = ⟨check⟩ → ⟨unlock⟩ with the same sequence/instance support and confidence (e.g., on the sequence $S$ = ⟨check, lock, use, unlock⟩). Do we really need both rules?
      - Rule redundancy: a rule $R' = R'_{pre} \rightarrow R'_{post}$ is redundant if there is another rule $R = R_{pre} \rightarrow R_{post}$ such that
          1. both rules have the same sequence/instance support and confidence, and
          2. $R_{pre} +\!+ R_{post} \sqsupseteq R'_{pre} +\!+ R'_{post}$ ($R$ is longer than $R'$)
      - Mining non-redundant recurrent rules
          - Mine pruned pre/post-conditions using a modified BIDE (LS-Set miner)
          - BIDE: a frequent closed sequence mining algorithm based on the pattern-growth strategy
          - Wang, Jianyong, and Jiawei Han. "BIDE: Efficient mining of frequent closed sequences." Proceedings of the 20th International Conference on Data Engineering. IEEE, 2004.
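The redundancy test can be written as a small predicate. This is a sketch, assuming each rule carries its already-computed statistics; the `Rule` type and `is_redundant` are hypothetical names. The statistics below were worked out by hand for a one-sequence database containing only $S$ = ⟨check, lock, use, unlock⟩, where both rules have support 1, instance support 1, and confidence 1. Requiring the other rule to be strictly longer is an assumption made here to handle ties; the slide does not spell out that case.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    """A rule with precomputed statistics (illustrative container)."""
    pre: Tuple[str, ...]
    post: Tuple[str, ...]
    sup: int
    sup_all: int
    conf: float


def is_super_sequence(seq, pattern):
    remaining = iter(seq)
    return all(event in remaining for event in pattern)


def is_redundant(candidate, rules):
    """True if some strictly longer rule has identical statistics and its
    concatenated pre ++ post is a super-sequence of the candidate's."""
    cand_seq = candidate.pre + candidate.post
    cand_stats = (candidate.sup, candidate.sup_all, candidate.conf)
    for other in rules:
        other_seq = other.pre + other.post
        if (
            (other.sup, other.sup_all, other.conf) == cand_stats
            and len(other_seq) > len(cand_seq)  # assumption: ties are not redundant
            and is_super_sequence(other_seq, cand_seq)
        ):
            return True
    return False


# Statistics for the database {<check, lock, use, unlock>} (computed by hand).
r = Rule(("check",), ("lock", "use", "unlock"), sup=1, sup_all=1, conf=1.0)
r_prime = Rule(("check",), ("unlock",), sup=1, sup_all=1, conf=1.0)

print(is_redundant(r_prime, [r, r_prime]))  # the shorter rule is subsumed
print(is_redundant(r, [r, r_prime]))        # the longer rule is kept
```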
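  • 12. FS-Set, CS-Set, LS-Set
      - The set of frequent sequential patterns (FS-Set):
          $FS = \{s \mid \mathit{support}(s) \ge \mathit{min\_sup}\}$
      - The set of closed frequent sequential patterns (CS-Set):
          $CS = \{s \mid s \in FS \text{ and } \nexists s' \in FS \text{ such that } s \sqsubset s' \text{ and } \mathit{support}(s) = \mathit{support}(s')\}$
      - The projected-database closed set (LS-Set):
          $LS = \{s \mid \mathit{support}(s) \ge \mathit{min\_sup} \text{ and } \nexists s' \text{ such that } s \sqsubset s' \text{ and } \mathit{SeqDB}_s = \mathit{SeqDB}_{s'}\}$
      - cf. $\mathit{SeqDB}_s = \mathit{SeqDB}_{s'} \Leftrightarrow \|\mathit{SeqDB}_s\| = \|\mathit{SeqDB}_{s'}\|$ (equal total size of the two projected databases, as in CloSpan)
      - Xifeng Yan, Jiawei Han, Ramin Afshar, "CloSpan: Mining Closed Sequential Patterns in Large Datasets", SIAM International Conference on Data Mining (SDM), 2003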
  • 13. Pruning Redundant Pre-Conds
      In a sequence database $\mathit{SeqDB}$, consider a pre-condition candidate $R_{pre}$.
      If there is a pre-condition candidate $R'_{pre} \sqsupset R_{pre}$ such that
          (i) $R'_{pre} = P_1 +\!+ e +\!+ P_2$ while $R_{pre} = P_1 +\!+ P_2$, for some event $e$ and nonempty $P_1$, $P_2$, and
          (ii) $\mathit{SeqDB}_{R_{pre}} = \mathit{SeqDB}_{R'_{pre}}$,
      then, for any post-condition candidate $\mathit{post}$ and any forward extension $R_{pre} +\!+ P$, the rule $R_{pre} +\!+ P \rightarrow \mathit{post}$ is redundant.
  • 14. LS-Set BIDE
      LS-Set BIDE is the original BIDE algorithm with the backward-extension event check omitted.
      - David Lo, Siau-Cheng Khoo, Chao Liu, "Mining Recurrent Rules from Sequence Database", Technical Report TR12/07, NUS
  • 15. Non-Redundant Recurrent Rules Miner (NR3)
      - Input: a sequence database $\mathit{SeqDB}$; thresholds $\mathit{min\_sup}$, $\mathit{min\_sup}^{all}$, $\mathit{min\_conf}$
      - Output: significant and non-redundant recurrent rules $\mathit{Rules}$
      - Procedure
          1. $\mathit{PreCond}$ := a pruned set of pre-conditions from $\mathit{SeqDB}$ satisfying $\mathit{min\_sup}$
          2. foreach $\mathit{pre} \in \mathit{PreCond}$ do
              2.1. $\mathit{SeqDB}_{pre}^{all}$ := $\mathit{SeqDB}$ all-projected on $\mathit{pre}$
              2.2. $\mathit{bthd}$ := $\mathit{min\_conf} \times |\mathit{SeqDB}_{pre}^{all}|$
              2.3. $\mathit{PostCond}$ := a pruned set of post-conditions from $\mathit{SeqDB}_{pre}^{all}$ satisfying $\mathit{bthd}$
              2.4. foreach $\mathit{post} \in \mathit{PostCond}$ do
                  2.4.1. if $\mathit{sup}^{all}(\mathit{pre} +\!+ \mathit{post}, \mathit{SeqDB}) \ge \mathit{min\_sup}^{all}$ then
                      $\mathit{Rules}$ := $\mathit{Rules} \cup \{\mathit{pre} \rightarrow \mathit{post}\}$
          3. Remove remaining redundancy in $\mathit{Rules}$
      - Aliases for tasks
          - Procedure line 1: GenPre task
          - Procedure lines 2.1 – 2.4: GenRule task
          - Procedure line 3: RemRedun task
      [Figure: pattern-growth tree over events a, b, c, the resulting Rules set, and the hash table used for redundancy removal]
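To make the control flow of the procedure concrete, here is a deliberately naive sketch. It is not NR3 itself: it enumerates every pre- and post-condition up to a small length instead of using the pruned LS-Set miner, it skips the final redundancy removal, and it treats `min_sup` as an absolute count. All function names and the `max_len` parameter are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sketch only: exhaustive enumeration in place of NR3's pruned search.
SEQ_DB = [
    ("check", "lock", "use", "use", "unlock", "exit"),
    ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    ("check", "use", "unlock", "exit"),
    ("check", "lock", "use"),
    ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
]


def contains(seq, pattern):
    remaining = iter(seq)
    return all(event in remaining for event in pattern)


def instance_points(seq, pattern):
    return [j for j in range(1, len(seq) + 1)
            if seq[j - 1] == pattern[-1] and contains(seq[:j], pattern)]


def sup(pattern, db):
    return sum(1 for seq in db if contains(seq, pattern))


def sup_all(pattern, db):
    return sum(len(instance_points(seq, pattern)) for seq in db)


def mine_rules_bruteforce(db, min_sup, min_sup_all, min_conf, max_len=2):
    """Return {(pre, post): confidence} for all significant rules."""
    events = sorted({event for seq in db for event in seq})
    patterns = [p for n in range(1, max_len + 1) for p in product(events, repeat=n)]
    rules = {}
    for pre in patterns:                      # stands in for the GenPre task
        if sup(pre, db) < min_sup:
            continue
        suffixes = [seq[j:] for seq in db for j in instance_points(seq, pre)]
        if not suffixes:
            continue
        bthd = min_conf * len(suffixes)       # procedure line 2.2
        for post in patterns:                 # stands in for the GenRule task
            hits = sum(1 for suffix in suffixes if contains(suffix, post))
            if hits < bthd:
                continue
            rule = pre + post
            if sup(rule, db) >= min_sup and sup_all(rule, db) >= min_sup_all:
                rules[(pre, post)] = Fraction(hits, len(suffixes))
    return rules


rules = mine_rules_bruteforce(SEQ_DB, min_sup=3, min_sup_all=4, min_conf=0.8)
print(rules[(("lock", "use"), ("unlock",))])
```

With `min_conf=0.8` the rule ⟨lock, use⟩ → ⟨unlock⟩ (confidence 6/7) is reported; raising `min_conf` to 0.9 drops it.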
  • 16. Parallel Mining of Non-Redundant Recurrent Rules (pNR3)
  • 17. Revisiting Non-Redundant Recurrent Rules Miner (NR3)
      - Input, output, and procedure are the same as on slide 15 (GenPre: line 1; GenRule: lines 2.1 – 2.4; RemRedun: line 3).
      - Parallelization strategy
          1. the single-producer-multiple-consumer framework
          2. the loop-level parallelization
      [Figure: the pattern-growth tree, Rules set, and hash table of slide 15, annotated with where strategies 1 and 2 apply]
  • 18. Parallel Non-Redundant Recurrent Rules Miner (pNR3)
      [Figure: pNR3 architecture. The GenPre task produces GenRule tasks (e.g., GenRule[a], GenRule[a,b], GenRule[c,b], GenRule[c,b,c]) into a task queue; a thread pool of N worker threads consumes them and adds to Rules; the RemRedun task then removes redundancy using a hash table.]
  • 19. Parallel Non-Redundant Recurrent Rules Miner (pNR3)
      Listings shown: pNR3 framework, GenPre task, GenRule task.
      Source code is available at https://bitbucket.org/sekilab/nr3
  • 20. Parallelization Effects of pNR3
      Let $t_T$ be the runtime of a task $T$ and $N$ be the number of available threads.
      - NR3: $t_{GenPre} + t_{GenRule} + t_{RemRedun}$
      - pNR3: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
          - GenPre concurrency alone: $\max(t_{GenPre},\ t_{GenRule}) + t_{RemRedun}$
          - GenRule parallelization alone: $t_{GenPre} + t_{GenRule}/N + t_{RemRedun}$
      [Figure: the NR3 pipeline annotated with GenRule parallelization (1/N), GenPre concurrency (max function), and RemRedun]
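The runtime model can be evaluated directly. The sketch below plugs in the D10C10N10R0.5 breakdown reported on slide 22 for min_sup = 0.5% (GenPre 34 s, GenRule 206 s, RemRedun 0 s); the function names are hypothetical, and the model is an idealized estimate that ignores scheduling overhead and the 4 physical cores of the test machine.

```python
# Sketch only: the idealized cost model from this slide, not a benchmark.

def nr3_time(t_genpre, t_genrule, t_remredun):
    """Sequential NR3: the three tasks run one after another."""
    return t_genpre + t_genrule + t_remredun


def pnr3_time(t_genpre, t_genrule, t_remredun, n_threads):
    """pNR3: GenPre overlaps with GenRule, which is split over n_threads."""
    return max(t_genpre, t_genrule / n_threads) + t_remredun


# Breakdown for D10C10N10R0.5 at min_sup = 0.5% (seconds, from slide 22).
T_GENPRE, T_GENRULE, T_REMREDUN = 34, 206, 0

print(nr3_time(T_GENPRE, T_GENRULE, T_REMREDUN))
for n in (2, 4, 8):
    print(n, pnr3_time(T_GENPRE, T_GENRULE, T_REMREDUN, n))
```

For 8 threads the model is bounded below by the GenPre time (34 s), while slide 22 reports 54 s for 8-pNR3, so the model should be read as an optimistic lower estimate.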
  • 21. Experiment Environment
      - Datasets
          - D10C10N10R0.5 (IBM synthetic data generator): 9,678 sequences, average length 31.22
          - BMSWebView1 (a click-stream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
      - Experiment machine
          - Intel Core i7-3610QM 2.30GHz (4 physical and 8 logical cores)
          - 8GB RAM
          - Microsoft Windows 7 Professional x64
      - Implementation
          - Java SE 8
          - Default JVM settings
  • 22. D10C10N10R0.5
      pNR3 runtime model: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
      [Figures: pattern-set sizes, runtime per thread count, and runtime share per task, plotted against min_sup and min_conf]

      Setting A: $\mathit{min\_sup}$ = 0.5 – 0.9%, $\mathit{min\_conf}$ = 50%, $\mathit{min\_sup}^{all}$ = 1

      Runtime (sec) by min_sup (%)
                     0.5    0.6    0.7    0.8    0.9
        NR3          241    152     99     69     54
        2-pNR3       118     78     49     37     26
        4-pNR3        74     47     31     22     17
        8-pNR3        54     35     23     18     14

      NR3 runtime breakdown (sec) by min_sup (%)
                     0.5    0.6    0.7    0.8    0.9
        GenPre        34     22     15     11      8
        GenRule      206    130     83     57     46
        RemRedun       0      0      0      0      0
        Elapsed      241    152     99     69     54

      Sizes by min_sup (%)
                     0.5    0.6    0.7    0.8    0.9
        PreCond    21563  15013  11105   8917   7262
        RuleCand    3965   2418   1622   1258    956
        Rules       3912   2414   1621   1258    956

      Setting B: $\mathit{min\_sup}$ = 0.5%, $\mathit{min\_conf}$ = 50 – 90%, $\mathit{min\_sup}^{all}$ = 1

      Runtime (sec) by min_conf (%)
                      50     60     70     80     90
        NR3          241    184    176    170    167
        2-pNR3       119     92     88     85     83
        4-pNR3        74     56     50     52     52
        8-pNR3        54     47     46     45     45

      NR3 runtime breakdown (sec) by min_conf (%)
                      50     60     70     80     90
        GenPre        34     34     34     34     34
        GenRule      206    149    140    135    132
        RemRedun       0      0      0      0      0
        Elapsed      241    184    176    170    167

      Sizes by min_conf (%)
                      50     60     70     80     90
        PreCond    21563  21563  21563  21563  21563
        RuleCand    3965   1392    527    374    297
        Rules       3912   1372    519    368    294
  • 23. BMSWebView1
      pNR3 runtime model: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
      [Figures: pattern-set sizes, runtime per thread count (log scale), and runtime share per task, plotted against min_sup and min_conf]

      Setting A: $\mathit{min\_sup}$ = 0.080 – 0.100%, $\mathit{min\_conf}$ = 50%, $\mathit{min\_sup}^{all}$ = 1

      Runtime (sec) by min_sup (%)
                   0.080  0.085  0.090  0.095  0.100
        NR3        43357  23729  12049   5063   2212
        2-pNR3     21440  11737   6100   2567   1034
        4-pNR3     12937   6839   3566   1550    618
        8-pNR3      9567   5261   2721   1118    450

      NR3 runtime breakdown (sec) by min_sup (%)
                   0.080  0.085  0.090  0.095  0.100
        GenPre        16     11      9      8      7
        GenRule    43340  23718  12039   5055   2204
        RemRedun       0      0      0      0      0
        Elapsed    43357  23729  12049   5063   2212

      Sizes by min_sup (%)
                   0.080  0.085  0.090  0.095  0.100
        PreCond     9476   7222   5734   4725   3981
        RuleCand    6413   3638   2333   1605   1147
        Rules       5976   3498   2260   1570   1139

      Setting B: $\mathit{min\_sup}$ = 0.090%, $\mathit{min\_conf}$ = 50 – 90%, $\mathit{min\_sup}^{all}$ = 1

      Runtime (sec) by min_conf (%)
                      50     60     70     80     90
        NR3        12049   1778    304    145    104
        2-pNR3      6100    932    157     72     50
        4-pNR3      3566    580     90     42     32
        8-pNR3      2721    400     69     32     22

      NR3 runtime breakdown (sec) by min_conf (%)
                      50     60     70     80     90
        GenPre         9      9      9      9     10
        GenRule    12039   1768    294    135     93
        RemRedun       0      0      0      0      0
        Elapsed    12049   1778    304    145    104

      Sizes by min_conf (%)
                      50     60     70     80     90
        PreCond     5734   5734   5734   5734   5734
        RuleCand    2333   1703   1173    685    288
        Rules       2260   1648   1123    645    268
  • 24. Loop-Fused Mining of NR3 (LF-NR3)
  • 25. Simplifying the all-projection operation
      Given the projected database $\mathit{SeqDB}_{pre}$, the all-projected database $\mathit{SeqDB}_{pre}^{all}$ can be computed as
          $\mathit{SeqDB}_{pre}^{all} = \mathit{SeqDB}_{pre} \cup (\mathit{SeqDB}_{pre})_{\langle \mathit{last}(pre) \rangle}^{all}$
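The identity can be sanity-checked on the example database: every later instance of `pre` ends at a later occurrence of `last(pre)` inside the suffix left by the minimum instance. The sketch below compares the direct all-projection with the simplified one; the data layout (dictionary of tuples, results as lists of `(sid, suffix)` pairs) and the function names are assumptions made for illustration.

```python
# Sketch only: checks the all-projection identity on the running example.
SEQ_DB = {
    1: ("check", "lock", "use", "use", "unlock", "exit"),
    2: ("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
    3: ("check", "use", "unlock", "exit"),
    4: ("check", "lock", "use"),
    5: ("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"),
}


def contains(seq, pattern):
    remaining = iter(seq)
    return all(event in remaining for event in pattern)


def instance_points(seq, pattern):
    return [j for j in range(1, len(seq) + 1)
            if seq[j - 1] == pattern[-1] and contains(seq[:j], pattern)]


def all_project_direct(db, pre):
    """All-projection computed straight from the definition."""
    return [(sid, seq[j:]) for sid, seq in db.items()
            for j in instance_points(seq, pre)]


def all_project_simplified(db, pre):
    """Projection on `pre`, plus the all-projection of that result on <last(pre)>."""
    projected = []
    for sid, seq in db.items():
        points = instance_points(seq, pre)
        if points:
            projected.append((sid, seq[points[0]:]))
    last_event = pre[-1]
    extra = [(sid, suffix[j + 1:])
             for sid, suffix in projected
             for j, event in enumerate(suffix) if event == last_event]
    return projected + extra


pre = ("lock", "use")
print(sorted(all_project_direct(SEQ_DB, pre)) == sorted(all_project_simplified(SEQ_DB, pre)))
```

Both routes produce the same seven entries for ⟨lock, use⟩; the simplified route only scans the already-projected suffixes for a single event.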
  • 26. Non-Redundant Recurrent Rules Miner (NR3)
  • 28. Data Structure Level Optimization for Projections
      For each sequence $S_i$ in $\mathit{SeqDB}$ and the set $I$ of events, build a hash map
          $\mathit{Map}_i : I \rightarrow 2^{\{1, \dots, |S_i|\}}$
      such that each key $e \in I$ is mapped to the set of temporal points at which event $e$ occurs in $S_i$.
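One way to realize such a map is sketched below. It stores the temporal points of each event as a sorted list rather than a set, which is an assumption made here so that the next occurrence after a given point can be found by binary search; `build_position_map` and `minimum_instance` are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict

# Sketch only: per-sequence index from event to its (sorted) temporal points.

def build_position_map(seq):
    """Map each event to the 1-based temporal points where it occurs."""
    positions = defaultdict(list)
    for j, event in enumerate(seq, start=1):
        positions[event].append(j)  # appended in increasing order
    return dict(positions)


def minimum_instance(position_map, pattern):
    """Temporal point of the minimum instance of `pattern`, or None."""
    j = 0
    for event in pattern:
        points = position_map.get(event, [])
        k = bisect_right(points, j)  # first occurrence strictly after j
        if k == len(points):
            return None
        j = points[k]
    return j


S1 = ("check", "lock", "use", "use", "unlock", "exit")
map_1 = build_position_map(S1)
print(map_1["use"])                              # temporal points of `use` in S1
print(minimum_instance(map_1, ("lock", "use")))  # end of the minimum instance
```

Each pattern event costs one lookup plus a binary search instead of a scan over the sequence, which helps mostly when sequences are long.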
  • 29. Experiment Environment
      Same as slide 21: datasets D10C10N10R0.5 and BMSWebView1; Intel Core i7-3610QM 2.30GHz, 8GB RAM, Microsoft Windows 7 Professional x64; Java SE 8 with default JVM settings.
  • 30. D10C10N10R0.5
      - (a)-(c): $\mathit{min\_sup}$ = 0.5 – 0.9%, $\mathit{min\_conf}$ = 50%, $\mathit{min\_sup}^{all}$ = 1
      - (d)-(f): $\mathit{min\_sup}$ = 0.5%, $\mathit{min\_conf}$ = 50 – 90%, $\mathit{min\_sup}^{all}$ = 1
  • 31. BMSWebView1
      - (a)-(c): $\mathit{min\_sup}$ = 0.100 – 0.120%, $\mathit{min\_conf}$ = 50%, $\mathit{min\_sup}^{all}$ = 1
      - (d)-(f): $\mathit{min\_sup}$ = 0.090%, $\mathit{min\_conf}$ = 50 – 90%, $\mathit{min\_sup}^{all}$ = 1
  • 32. Discussion
      - Computational complexity of the algorithms
          - on the order of $|I|^k \times |I|^k$ ($I$: the set of events, $k$: the length of the longest frequent pattern)
      - The effects of fusing loops in NR3
          - The foreach loop in the GenRule step is eliminated.
          - Using the intermediate data $\mathit{SeqDB}_{pre}$ simplifies the computation of
              $\mathit{SeqDB}_{pre}^{all} = \mathit{SeqDB}_{pre} \cup (\mathit{SeqDB}_{pre})_{\langle \mathit{last}(pre) \rangle}^{all}$
              $\mathit{sup}^{all}(\mathit{pre} \rightarrow \mathit{post}, \mathit{SeqDB}) = \mathit{sup}^{all}(\mathit{post}, \mathit{SeqDB}_{pre})$
      - The effect of the hash-based data structure
          - It allows efficient computation of (all-)projected databases.
          - It is not always efficient when the sequences are short.
  • 33. Parallel Loop-Fused Mining of NR3 (pLF-NR3)
  • 34. Loop-Fused NR3 (LF-NR3)
      The task parallelism inherent in the LF-NR3 algorithm can be exploited within the single-producer-multiple-consumer framework.
  • 35. Parallel Loop-Fused NR3 (pLF-NR3)
  • 36. Experiment Environment
      Same as slide 21: datasets D10C10N10R0.5 and BMSWebView1; Intel Core i7-3610QM 2.30GHz, 8GB RAM, Microsoft Windows 7 Professional x64; Java SE 8 with default JVM settings.
  • 37. D10C10N10R0.5
      - (a)-(c): $\mathit{min\_sup}$ = 0.5 – 0.9%, $\mathit{min\_conf}$ = 50%, $\mathit{min\_sup}^{all}$ = 1
      - (d)-(f): $\mathit{min\_sup}$ = 0.5%, $\mathit{min\_conf}$ = 50 – 90%, $\mathit{min\_sup}^{all}$ = 1
  • 38. BMSWebView1
      - (a)-(c): $\mathit{min\_sup}$ = 0.092 – 0.108%, $\mathit{min\_conf}$ = 50%, $\mathit{min\_sup}^{all}$ = 1
      - (d)-(f): $\mathit{min\_sup}$ = 0.092%, $\mathit{min\_conf}$ = 50 – 90%, $\mathit{min\_sup}^{all}$ = 1
  • 39. Bidirectional Mining of Non-Redundant Recurrent Rules (BOB), based on David Lo, Bolin Ding, Lucia, and Jiawei Han, ICDE 2011
  • 40. Additional Definitions
    - A sequence database $SeqDB$: a set of sequences
    - A sequence $S = \langle e_1, e_2, \ldots, e_n \rangle$
    - The $j$-suffix of $S$: $\langle e_{n-j+1}, e_{n-j+2}, \ldots, e_n \rangle$
    - $S'$ is the $j$-th minimum suffix of $S$ (for a pattern $P$) iff $S'$ is a suffix of $S$ that starts with $first(P)$ and is longer than the $(j-1)$-th minimum suffix, and no shorter suffix starting with $first(P)$ has this property
    - The $j$-th suf-projection of $SeqDB$ with respect to a pattern $P$
      - $SeqDB^{suf\text{-}j}_{P} = \{ (i, sx) \mid S_i = px \mathbin{+\!\!+} sx \in SeqDB,\ sx \text{ is the } j\text{-th minimum suffix of } S_i \text{ for } P \}$
    - $SeqDB$ pre-projected on $P$
      - $SeqDB^{pre}_{P} = \{ (i, px) \mid S_i = px \mathbin{+\!\!+} sx \in SeqDB,\ sx \text{ is the minimum suffix of } S_i \text{ for } P \}$
  • 41. Anti-Monotonicity Property of Confidence
    - Proposition 1
      - Consider a rule $R$ of the form $R_{pre} \rightarrow R_{post}$ and a sequence database $SeqDB$
      - $conf(R, SeqDB) = \dfrac{sup(R_{post}, SeqDB^{all}_{R_{pre}})}{sup^{all}(R_{pre}, SeqDB)} = \dfrac{sup^{all}(R_{pre}, SeqDB^{pre}_{R_{post}})}{sup^{all}(R_{pre}, SeqDB)}$
    - Proposition 2
      - Consider two rules $R$ and $R'$ in a sequence database $SeqDB$ with $R'_{pre} = R_{pre}$ and $R'_{post} = e \mathbin{+\!\!+} R_{post}$ for some event $e \in I$
      - $conf(R) \ge conf(R')$
    - Theorem (Anti-Monotonicity Property of Confidence)
      - Consider two rules $R$ and $R'$ in a sequence database $SeqDB$ with $R'_{pre} = R_{pre}$ and $R'_{post} = evs \mathbin{+\!\!+} R_{post}$, where $evs$ is an arbitrary series of events
      - $conf(R) \ge conf(R')$
      - If $R$ is not confident enough ($conf(R) < min\_conf$), then neither is $R'$
  • 42. Pruning Redundant Post-Conds
    - In a sequence database $SeqDB$, consider a post-condition candidate $R_{post}$.
    - Lemma 1
      - If there is a post-condition candidate $R'_{post} \sqsupset R_{post}$ such that
        - (i) $R'_{post} = P_1 \mathbin{+\!\!+} e \mathbin{+\!\!+} P_2$ while $R_{post} = P_1 \mathbin{+\!\!+} P_2$, for some event $e$ and subsequences $P_1$, (nonempty) $P_2$
        - (ii) $SeqDB^{pre}_{R_{post}} = SeqDB^{pre}_{R'_{post}}$
      - then for any pre-condition candidate $pre$ and any backward extension $P \mathbin{+\!\!+} R_{post}$ of $R_{post}$, the rule $R = pre \rightarrow P \mathbin{+\!\!+} R_{post}$ is not confidence-closed
      - i.e., there exists another rule $R' \sqsupset R$ such that $conf(R) = conf(R')$
    - Lemma 2
      - If there is a post-condition candidate $R'_{post} \sqsupset R_{post}$ such that
        - (i) $R'_{post} = P_1 \mathbin{+\!\!+} e \mathbin{+\!\!+} P_2$ while $R_{post} = P_1 \mathbin{+\!\!+} P_2$, for some event $e$ and subsequences (nonempty) $P_1$, $P_2$
        - (iii) $\forall j : SeqDB^{suf\text{-}j}_{R_{post}} = SeqDB^{suf\text{-}j}_{R'_{post}}$, and
        - (iv) $\forall j : (SeqDB^{suf\text{-}j}_{R_{post}})^{all}_{R_{post}} = (SeqDB^{suf\text{-}j}_{R'_{post}})^{all}_{R'_{post}}$
      - then for any pre-condition candidate $pre$ and any backward extension $P \mathbin{+\!\!+} R_{post}$ of $R_{post}$, the rule $R = pre \rightarrow P \mathbin{+\!\!+} R_{post}$ is not support-closed
      - i.e., there exists another rule $R' \sqsupset R$ such that $sup(R) = sup(R')$ and $sup^{all}(R) = sup^{all}(R')$
    - Theorem (Pruning Redundant Post-Conds)
      - If the properties (i)-(iv) in Lemmas 1 and 2 are satisfied,
      - then for any pre-condition candidate $pre$ and any backward extension $P \mathbin{+\!\!+} R_{post}$ of $R_{post}$, the rule $R = pre \rightarrow P \mathbin{+\!\!+} R_{post}$ is redundant.
  • 43. Bidirectional Pruning-based Recurrent Rule Mining (BOB)
  • 44. Interleaved Bidirectional Mining of NR3 (iBiRM)
  • 45. Optimizing Operations
    - Given the sequence database $SeqDB$ and the rule $R = pre \rightarrow post$
      - $sup(R, SeqDB) = sup(post, SeqDB_{pre})$
      - $sup^{all}(R, SeqDB) = sup^{all}(post, SeqDB_{pre})$
    - Pruning the search space of PRE early
      - For $R = pre \rightarrow post$ and $R' = pre \mathbin{+\!\!+} e \rightarrow post$,
      - if $sup(R, SeqDB) \le min\_sup$, then $sup(R', SeqDB) \le min\_sup$
      - if $sup^{all}(R, SeqDB) \le min\_sup^{all}$, then $sup^{all}(R', SeqDB) \le min\_sup^{all}$
    - Reducing the number of database scans using a prefix tree
      - For each pre-condition $pre \in PRE$, suppose that a node $N_0 \in T_{POST}$ has child nodes $N_1, \ldots, N_k$
      - We can compute the instance supports of the child nodes $N_1, \ldots, N_k$ by scanning $SeqDB$ once
      - When $N_0$ corresponds to a post-condition $post \in POST$, each child node $N_i$ corresponds to a post-condition $post_i = e_i \mathbin{+\!\!+} post$ for some event $e_i$, so the post-conditions of all child nodes share the suffix $post$
      - When scanning a sequence $s \in SeqDB$, we record the positions of each $e_i$ and of the events appearing in $post$, from which we can compute the number of instances of $pre \mathbin{+\!\!+} post_i$ in $s$
  • 46. Bidirectional Pruning-based Recurrent Rule Mining (BOB)
  • 47. Interleaved Bidirectional Recurrent Rule Miner (iBiRM)
  • 48. Experiment Environment
    - Datasets
      - D5C20N10R0.5 (IBM synthetic data generator): 4,999 sequences, average length 64.39
      - BMSWebView1 (a click-stream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
    - Experiment machine
      - Intel Core i5 2.50 GHz
      - 8 GB RAM
      - Microsoft Windows 7 Professional x64
    - Implementation
      - Java SE 8
      - Default JVM settings
  • 49. D5C20N10R0.5
    - (a)-(c) $min\_sup = 2.0$-$2.8\%$, $min\_conf = 50\%$, $min\_sup^{all} = 1$
    - (d)-(f) $min\_sup = 2.4\%$, $min\_conf = 50$-$90\%$, $min\_sup^{all} = 1$
  • 50. BMSWebView1
    - (a)-(c) $min\_sup = 0.092$-$0.108\%$, $min\_conf = 50\%$, $min\_sup^{all} = 1$
    - (d)-(f) $min\_sup = 0.092\%$, $min\_conf = 50$-$90\%$, $min\_sup^{all} = 1$
  • 52. Conclusion & Future Work
    - Conclusion
      - We have proposed the Parallel Non-Redundant Recurrent Rules Miner (pNR3)
      - We have proposed the Loop-Fused Non-Redundant Recurrent Rules Miner (LF-NR3)
      - We have proposed the Parallel Loop-Fused Non-Redundant Recurrent Rules Miner (pLF-NR3)
      - We have proposed the Interleaved Bidirectional Non-Redundant Recurrent Rules Miner (iBiRM)
    - Future work
      - Improvement of the sequential recurrent rule mining algorithm
      - Improvement of the parallel algorithms
    - Source code is available at https://bitbucket.org/sekilab/nr3
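
The confidence formula in Proposition 1 (slide 41) can be tried out on the program traces from the motivation slides. The following is a minimal sketch, not the authors' implementation: the class and method names (`ConfidenceSketch`, `allProject`, `confidence`) are hypothetical, patterns are assumed to be non-empty, and the all-projection is taken to mean "one remaining suffix per instance of pre", as described in the speaker notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConfidenceSketch {

    // Program traces from the motivating example (trace IDs 1 to 5).
    static final List<List<String>> TRACES = Arrays.asList(
        Arrays.asList("check", "lock", "use", "use", "unlock", "exit"),
        Arrays.asList("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
        Arrays.asList("check", "use", "unlock", "exit"),
        Arrays.asList("check", "lock", "use"),
        Arrays.asList("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"));

    /** True if pattern occurs in seq as a (not necessarily contiguous) subsequence. */
    static boolean contains(List<String> seq, List<String> pattern) {
        int matched = 0;
        for (String event : seq) {
            if (matched < pattern.size() && event.equals(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    /** All-projection of one sequence on a non-empty pre: the suffix after every instance of pre. */
    static List<List<String>> allProject(List<String> seq, List<String> pre) {
        List<List<String>> suffixes = new ArrayList<>();
        String last = pre.get(pre.size() - 1);
        int matched = 0; // events of pre matched greedily so far
        for (int j = 0; j < seq.size(); j++) {
            if (matched < pre.size() && seq.get(j).equals(pre.get(matched))) {
                matched++;
            }
            // The prefix ending at j is an instance if it contains pre and ends with last(pre).
            if (matched == pre.size() && seq.get(j).equals(last)) {
                suffixes.add(seq.subList(j + 1, seq.size()));
            }
        }
        return suffixes;
    }

    /** conf(pre -> post): share of the instances of pre whose remaining suffix contains post. */
    static double confidence(List<List<String>> db, List<String> pre, List<String> post) {
        int instances = 0;
        int followed = 0;
        for (List<String> seq : db) {
            for (List<String> suffix : allProject(seq, pre)) {
                instances++;
                if (contains(suffix, post)) {
                    followed++;
                }
            }
        }
        return instances == 0 ? 0.0 : (double) followed / instances;
    }

    public static void main(String[] args) {
        double c = confidence(TRACES, Arrays.asList("lock"), Arrays.asList("unlock"));
        System.out.printf("conf(<lock> -> <unlock>) = %.4f%n", c); // prints 0.8333
    }
}
```

In these traces `lock` occurs at six temporal points and five of them are later followed by `unlock` (trace 4 never unlocks), which gives 5/6. The scan over instance ends also mirrors the identity on slide 32: after the first instance of pre, every later occurrence of last(pre) closes another instance.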

Editor's Notes

  1. Good morning, everyone. I am Yoon SeungYong, a student at Nagoya Institute of Technology. My advisor, Seki Hirohisa, also took part in this research. I would now like to introduce my research, 'Parallel Mining of Non-Redundant Recurrent Rules from a Sequence Database'.
  2. First, I will talk about the motivation for this research and introduce recurrent rules and the NR3 algorithm, on which this research is based. Then I will present our algorithm for the parallel mining of recurrent rules, pNR3, and show its effectiveness based on experimental results.
  3. Our motivation for the research.
  4. First, I will talk about sequence databases and sequential rules. One example of a sequence database is transaction histories. For instance, Alice rented Star Wars 4, 5, and 6, and then Star Wars 1, following the release order. Another example is program traces. From these databases, we can infer rules such as <Star Wars 4> then <Star Wars 5>, and <lock> then <unlock>.
  5. But why recurrent rules? Because a recurrent rule captures temporal constraints both within a sequence and across multiple sequences. Recall the previous examples. In the transaction histories, we rarely care how many times a customer rents the same video. In the program traces, however, we have to consider how many times a series of commands has been executed. This is why recurrent rules were proposed. In addition, mined recurrent rules can be converted directly into Linear Temporal Logic, the most widely used formalism for program verification. For more details, refer to the textbook Model Checking.
  6. Now I will introduce recurrent rule mining and the NR3 algorithm.
  7. We first define some terminology. A sequence database is a set of sequences. A sequence is a series of events. We call the position of each event in a sequence a temporal point, and we refer to the first j events as the j-prefix of the sequence.
  8. Next, we define some operations on sequences. This is the concatenation of S and S'. We say S is a super-sequence of S' if S contains S'. A prefix that matches the pattern is called an instance, and the shortest one is the minimum instance.
  9. Next, we define operations on a database. When a database is projected on a sequence P, each sequence that contains P contributes its longest remaining part to the projected database; this is a well-known operation. When a database is all-projected on P, each sequence that contains P contributes all of its remaining parts to the all-projected database. We call the number of such sequences the support: the sequence support is defined by projection, and the instance support by all-projection.
  10. We define a recurrent rule R as pre then post. Its supports are almost the same as those we defined previously. The confidence has a special form: intuitively, it measures how many sequences in the database all-projected on pre contain post. We say a rule is significant if its supports and confidence are above the thresholds.
  11. We now define the notion of rule redundancy. Consider these two rules. R contains R', and they have the same support and confidence. This means that if a sequence contains R, then it also contains R'. We do not need to mine both rules, so we prune some of them. We define a rule as redundant if there is another, longer rule that has the same support and confidence. This is handled using BIDE, a well-known frequent closed sequence miner.
  12. Now I will introduce the Non-Redundant Recurrent Rules Miner, NR3, the work of David Lo and others. NR3 receives a sequence database and three thresholds and emits significant, non-redundant recurrent rules. It first generates the pre-condition candidates using BIDE, which is recursive. We call this step GenPre. Next, by looping over the candidate pre-conditions, it generates the post-condition candidates and then the rules. We call this step GenRule, and in this step we obtain the significant rules. Finally, we remove the remaining redundant rules using hash tables keyed by the supports and confidence. We call this step RemRedun.
  13. Now I will present our algorithm for the parallel mining of recurrent rules, pNR3.
  14. Let's review the previous work. First, as soon as the GenPre task finds a pre-condition candidate, we can start the corresponding GenRule task immediately. We call this strategy the single-producer-multiple-consumer framework, because GenRule tasks can be consumed as the GenPre task produces pre-conditions. Second, we can run the GenRule tasks concurrently. We call this strategy loop-level parallelization.
  15. This is our algorithm, the Parallel Non-Redundant Recurrent Rules Miner, pNR3. The pNR3 instance starts mining pre-conditions. GenPre then emits a GenRule task for each pre-condition it finds and pushes the task into the thread pool. The thread pool handles these GenRule tasks, and the tasks collect significant rules. Finally, the RemRedun instance removes redundant rules.
  16. This is our Java implementation. It works as I explained. The source code is available in our Bitbucket repository.
  17. I will now discuss the effect of parallelization. We used two strategies: GenPre concurrency, which is the single-producer-multiple-consumer framework, and GenRule parallelization, which is loop-level parallelization. GenPre concurrency acts as a maximum over GenPre and GenRule, because the longer task determines the total runtime. GenRule parallelization acts as a divisor, because the N available threads share the GenRule tasks. As a result, the runtime of pNR3 is the maximum of GenPre and GenRule divided by N, plus RemRedun. We will check this discussion against the experimental results.
  18. I will explain the experiment environment. We used two well-known datasets: one synthetic and one real. We implemented NR3 and pNR3 in Java 8 and ran them on an ordinary Core i7 machine with 4 physical cores.
  19. These are the experimental results on the synthetic dataset. The top row varies the minimum support, and the bottom row varies the minimum confidence. The first chart shows the runtime of NR3 and of pNR3 with 2, 4, and 8 threads, the second shows the share of each task in NR3, and the third shows the number of pre-condition candidates and rules. As we discussed before, the runtime of our parallel algorithm is the maximum of GenPre and GenRule divided by N, plus RemRedun. In NR3, GenPre takes about 20% of the runtime, and RemRedun is negligible on this dataset. So if the runtime of our parallel algorithm falls to about 20% of that of NR3 on this dataset, we can say our algorithm is effective. As the results show, the runtime of pNR3 with 8 threads is about 20% of that of NR3, so we can say our algorithm is very effective.
  20. These are the experimental results on the real-world dataset. The top row varies the minimum support, and the bottom row varies the minimum confidence. The first chart shows the runtime of NR3 and of pNR3 with 2, 4, and 8 threads, the second shows the share of each task in NR3, and the third shows the number of pre-condition candidates and rules. As we discussed before, the runtime of our parallel algorithm is the maximum of GenPre and GenRule divided by N, plus RemRedun. In NR3, GenRule takes almost 100% of the runtime, and GenPre and RemRedun are negligible on this dataset. So if the runtime of our parallel algorithm decreases as we increase the number of threads, we can say our algorithm is effective. As the results show, the runtime of pNR3 with 4 threads is about 30% of that of NR3, and with 8 threads about 20%, so we can say our algorithm is effective, even taking into account some overhead due to parallelization.
  36. Now, finally, I will conclude.
  37. We have proposed the Parallel Non-Redundant Recurrent Rules Miner, pNR3. It uses two strategies: the single-producer-multiple-consumer framework and loop-level parallelism. We showed the effectiveness of our algorithm through experiments on synthetic and real datasets. As future work, we will run experiments on program traces, which are the intended application of the rules. We will also run experiments on many-core processors to measure the effects more accurately. In addition, using a large-memory machine, we will compare our algorithm with BOB, the successor of NR3. We are now working on improving the sequential recurrent rule mining algorithms. You can find our implementation in this repository. That concludes my presentation. Thank you for listening. Do you have any questions?
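
The single-producer-multiple-consumer strategy described in notes 14 and 15 can be sketched with a fixed thread pool. This is only an illustration under stated assumptions, not the code in the authors' repository: `SpmcSketch`, `genRule`, and `mine` are hypothetical names, the producer loop stands in for GenPre by walking a ready-made list of pre-conditions, and the toy `genRule` simply pairs a pre-condition with every post-condition instead of checking supports and confidence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SpmcSketch {

    /** Hypothetical stand-in for GenRule: pairs one pre-condition with each post-condition. */
    static List<String> genRule(String pre, List<String> posts) {
        List<String> rules = new ArrayList<>();
        for (String post : posts) {
            rules.add(pre + " -> " + post);
        }
        return rules;
    }

    /** One producer hands out GenRule tasks; the pool's worker threads consume them. */
    static List<String> mine(List<String> pres, List<String> posts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Queue<String> found = new ConcurrentLinkedQueue<>(); // thread-safe rule collector
        try {
            // Producer: stands in for GenPre and submits a task as soon as a pre is available.
            for (String pre : pres) {
                pool.execute(() -> found.addAll(genRule(pre, posts)));
            }
            pool.shutdown(); // accept no new tasks, let queued ones finish
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("GenRule tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // harmless after normal termination
        }
        // A RemRedun step would run here, sequentially, once all consumers are done.
        List<String> rules = new ArrayList<>(found);
        Collections.sort(rules); // deterministic order regardless of thread scheduling
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(mine(Arrays.asList("<check>", "<lock>"),
                                Arrays.asList("<use>", "<unlock>"), 4));
    }
}
```

With this structure the producer and the consumers overlap in time and the GenRule work is spread over the pool's threads, which is the reasoning behind the runtime estimate in note 17: roughly the larger of the GenPre time and the GenRule time divided by the number of threads, plus the sequential RemRedun time.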