Mining Non-Redundant Recurrent Rules from a Sequence Database
Yoon SeungYong
Ministry of Science and ICT, Republic of Korea
forcom@forcom.kr
- Efficient Mining of Recurrent Rules from a Sequence Database (Lo et al., DASFAA 2008)
- Parallel Mining of Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, ISIS 2017)
  - A Parallel Algorithm for Mining Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, JACIII 2019)
- Towards Efficient Mining of Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, IWCIA 2017)
  - Mining Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, IJCISTUDIES 2018)
- Efficient Mining of Recurrent Rules from a Sequence Database Using Multi-Core Processors (Yoon and Seki, SCIS&ISIS 2018)
- Bidirectional Mining of Non-Redundant Recurrent Rules from a Sequence Database (Lo et al., IEEE ICDE 2011)
- A New Algorithm for Mining Recurrent Rules from a Sequence Database (Seki and Yoon, IEEE SMC 2019)
Table of Contents
1. Motivation
2. Mining Non-Redundant Recurrent Rules (NR3): Lo et al.
3. Parallel Mining of Non-Redundant Recurrent Rules (pNR3)
4. Loop-Fused Mining of NR3 (LF-NR3)
5. Parallel Loop-Fused Mining of NR3 (pLF-NR3)
6. Bidirectional Mining of NR3 (BOB): Lo et al.
7. Interleaved Bidirectional Mining of NR3 (iBiRM)
8. Conclusion
2019.11.18. 2
Motivation
Sequence Database & Sequential Rule
- Transaction Histories

  Customer  Movie Rental History
  Alice     Star Wars 4, Star Wars 5, Star Wars 6, Star Wars 1
  Bob       Shrek, Spirited Away, Your Name
  Clara     Spirited Away, Howl's Moving Castle, Princess Mononoke
  David     Star Wars 1, Star Wars 2, Star Wars 3, Star Wars 4, Star Wars 5
  Eve       Your Name

  Example rule: ⟨Star Wars 4⟩ → ⟨Star Wars 5⟩
- Program Traces

  Trace ID  Command
  1         check, lock, use, use, unlock, exit
  2         check, lock, use, check, lock, use, unlock, exit
  3         check, use, unlock, exit
  4         check, lock, use
  5         check, lock, use, unlock, check, lock, use, unlock, exit

  Example rule: ⟨lock⟩ → ⟨unlock⟩
What is a recurrent rule?
- Recurrent rule $R = R_{pre} \to R_{post}$
  - "Whenever a series of precedent events occurs, eventually another series of consequent events occurs"
  - e.g., $R = \langle check, lock \rangle \to \langle use, unlock \rangle$: "Whenever ⟨check, lock⟩ occurs, eventually ⟨use, unlock⟩ occurs"
  - Captures temporal constraints that repeat a meaningful number of times, both within a sequence and across multiple sequences
- By contrast, a sequential rule $R = R_{pre} \to R_{post}$ means "whenever a sequence is a super-sequence of $R_{pre}$, it will be a super-sequence of $R_{pre} ++ R_{post}$"
- Linear Temporal Logic (LTL)
  - One of the most widely used formalisms for program verification
    - Clarke, Edmund M., Orna Grumberg, and Doron Peled. Model Checking. MIT Press, 1999.
  - A recurrent rule can be expressed in LTL
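To make the LTL reading concrete, here is a sketch of how the example rule might be written. It assumes the nested translation usually attributed to Lo et al., with $G$ (globally), $F$ (eventually) and $X$ (next) as the temporal operators; the slides themselves do not show the formula, so treat it as an illustration rather than the authors' exact encoding.

```latex
\langle check, lock \rangle \to \langle use, unlock \rangle
\quad\text{read as}\quad
G\bigl(check \to XG\bigl(lock \to XF\bigl(use \land XF\,unlock\bigr)\bigr)\bigr)
```

Read from the outside in: whenever check occurs and lock occurs some time after it, then use eventually follows, and unlock eventually follows that use.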
- proposed by David LO
Mining Non-Redundant Recurrent Rules (NR3)
based on David LO, Siau-Cheng KHOO, NUS and Chao LIU, DASFAA, 2008
Preliminaries & Examples (1)
- A sequence database $SeqDB$ is a set of sequences: $S_1, S_2, S_3, S_4, S_5$
- The set of events $I$ in $SeqDB$: {check, exit, lock, unlock, use}
- The size of $SeqDB$ is $|SeqDB|$: $|SeqDB| = 5$
- A sequence $S = \langle e_1, e_2, \dots, e_n \rangle$: $S_1 = \langle check, lock, use, use, unlock, exit \rangle$
- The temporal point of $e_j$ in $S$ is $j$: the event at temporal point 5 in $S_1$ is unlock
- The length of $S$ is $|S| = n$: $|S_1| = 6$
- The last event of $S$ is $last(S) = S[n]$: $last(S_1) = exit$
- The $j$-prefix of $S$ is $S^j = \langle e_1, e_2, \dots, e_j \rangle$: $S_1^2 = \langle check, lock \rangle$

An example sequence database $SeqDB$:

  SID  Sequence
  S1   ⟨check, lock, use, use, unlock, exit⟩
  S2   ⟨check, lock, use, check, lock, use, unlock, exit⟩
  S3   ⟨check, use, unlock, exit⟩
  S4   ⟨check, lock, use⟩
  S5   ⟨check, lock, use, unlock, check, lock, use, unlock, exit⟩
Preliminaries & Examples (2)
- Given sequences $S = \langle e_1, \dots, e_n \rangle$ and $S' = \langle e'_1, \dots, e'_m \rangle$
  - the concatenation of $S$ and $S'$ is $S ++ S' = \langle e_1, \dots, e_n, e'_1, \dots, e'_m \rangle$
  - $S$ is a super-sequence of $S'$, written $S \sqsupseteq S'$, if $e_{i_1} = e'_1, \dots, e_{i_m} = e'_m$ for some $1 \le i_1 < \dots < i_m \le n$
    - e.g., $S_1 \sqsupseteq \langle check, lock, unlock \rangle$, where $S_1 = \langle check, lock, use, use, unlock, exit \rangle$
- $S^j$ is an instance of $S'$ in $S$ if $S^j \sqsupseteq S'$ and $last(S') = last(S^j)$
- $S^j$ is the minimum instance of $S'$ in $S$ if $S^j$ is an instance of $S'$ and there is no $k < j$ such that $S^k$ is an instance of $S'$
  - e.g., $S_1^3$ and $S_1^4$ are instances of ⟨check, lock, use⟩ in $S_1$, and $S_1^3$ is the minimum
  - $S_5^9$ is an instance of $S_1$ in $S_5$, and it is the minimum
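The super-sequence and instance definitions above can be sketched in a few lines of Python. The function names `is_supersequence` and `instance_points` are illustrative choices, not names from the slides or from the authors' Java implementation.

```python
from typing import List, Sequence


def is_supersequence(s: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` embeds into `s` at strictly increasing positions."""
    it = iter(s)
    # `event in it` advances the iterator, so matches are found left to right.
    return all(event in it for event in pattern)


def instance_points(s: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """1-based temporal points j such that the j-prefix of `s` is an instance of `pattern`."""
    if not pattern:
        return []
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == pattern[-1] and is_supersequence(s[:j], pattern)]


S1 = ["check", "lock", "use", "use", "unlock", "exit"]
S5 = ["check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"]

print(is_supersequence(S1, ["check", "lock", "unlock"]))  # True
print(instance_points(S1, ["check", "lock", "use"]))      # [3, 4]
print(instance_points(S5, S1))                            # [9]
```

The first element of the returned list, when the list is non-empty, is the temporal point of the minimum instance.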
Definitions & Examples (1)
- Consider a sequence database $SeqDB$ and a sequence $P$ (in the example, $P = \langle lock, use \rangle$)
- $SeqDB$ projected on $P$
  - $SeqDB_P = \{ (i, sx) \mid S_i = px ++ sx \in SeqDB,\ px \text{ is the minimum instance of } P \}$
  - the sequence support $sup(P, SeqDB) = |SeqDB_P|$
- $SeqDB$ all-projected on $P$
  - $SeqDB_P^{all} = \{ (i, sx) \mid S_i = px ++ sx \in SeqDB,\ px \text{ is an instance of } P \}$
  - the instance support $sup^{all}(P, SeqDB) = |SeqDB_P^{all}|$
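A minimal sketch of the two projections, run on the example database with $P = \langle lock, use \rangle$. The helper names (`project`, `project_all`, `instance_points`) are hypothetical, and the code is a direct brute-force reading of the definitions rather than the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

SEQ_DB: Dict[int, List[str]] = {
    1: ["check", "lock", "use", "use", "unlock", "exit"],
    2: ["check", "lock", "use", "check", "lock", "use", "unlock", "exit"],
    3: ["check", "use", "unlock", "exit"],
    4: ["check", "lock", "use"],
    5: ["check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"],
}


def is_supersequence(s: Sequence[str], pattern: Sequence[str]) -> bool:
    it = iter(s)
    return all(event in it for event in pattern)


def instance_points(s: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """1-based end points of all instances of `pattern` in `s`."""
    if not pattern:
        return []
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == pattern[-1] and is_supersequence(s[:j], pattern)]


def project(db: Dict[int, List[str]], p: Sequence[str]) -> List[Tuple[int, List[str]]]:
    """SeqDB_P: for each sequence, the suffix after the minimum instance of p."""
    out = []
    for sid, s in db.items():
        points = instance_points(s, p)
        if points:
            out.append((sid, s[points[0]:]))
    return out


def project_all(db: Dict[int, List[str]], p: Sequence[str]) -> List[Tuple[int, List[str]]]:
    """SeqDB_P^all: one suffix per instance of p."""
    return [(sid, s[j:]) for sid, s in db.items() for j in instance_points(s, p)]


P = ["lock", "use"]
print(len(project(SEQ_DB, P)))      # sequence support: 4
print(len(project_all(SEQ_DB, P)))  # instance support: 7
```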
$SeqDB_P$ for $P = \langle lock, use \rangle$, with $sup(P, SeqDB) = 4$:

  SID  Sequence
  S1   ⟨use, unlock, exit⟩
  S2   ⟨check, lock, use, unlock, exit⟩
  S4   ⟨⟩
  S5   ⟨unlock, check, lock, use, unlock, exit⟩

$SeqDB_P^{all}$ for $P = \langle lock, use \rangle$, with $sup^{all}(P, SeqDB) = 7$:

  SID  Sequence
  S1   ⟨use, unlock, exit⟩
  S1   ⟨unlock, exit⟩
  S2   ⟨check, lock, use, unlock, exit⟩
  S2   ⟨unlock, exit⟩
  S4   ⟨⟩
  S5   ⟨unlock, check, lock, use, unlock, exit⟩
  S5   ⟨unlock, exit⟩
Definitions & Examples (2)
- Consider a recurrent rule $R = R_{pre} \to R_{post}$ in a sequence database $SeqDB$
  - the pre-condition $R_{pre}$, the post-condition $R_{post}$
  - the sequence support $sup(R, SeqDB) = sup(R_{pre} ++ R_{post}, SeqDB)$
  - the instance support $sup^{all}(R, SeqDB) = sup^{all}(R_{pre} ++ R_{post}, SeqDB)$
  - the confidence $conf(R, SeqDB) = \dfrac{sup(R_{post}, SeqDB^{all}_{R_{pre}})}{sup^{all}(R_{pre}, SeqDB)} = \dfrac{|(SeqDB^{all}_{R_{pre}})_{R_{post}}|}{|SeqDB^{all}_{R_{pre}}|}$
  - $R$ is significant if $sup(R, SeqDB) \ge min\_sup$, $sup^{all}(R, SeqDB) \ge min\_sup^{all}$, and $conf(R, SeqDB) \ge min\_conf$
- Given a rule $R = \langle lock, use \rangle \to \langle unlock \rangle$ and the example sequence database $SeqDB$
  - the sequence support $sup(R, SeqDB) = sup(\langle lock, use, unlock \rangle, SeqDB) = 3$
  - the instance support $sup^{all}(R, SeqDB) = sup^{all}(\langle lock, use, unlock \rangle, SeqDB) = 4$
  - the confidence $conf(R, SeqDB) = \dfrac{sup(\langle unlock \rangle, SeqDB^{all}_{\langle lock, use \rangle})}{sup^{all}(\langle lock, use \rangle, SeqDB)} = \dfrac{6}{7}$
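The three measures can be checked on the example database with a short brute-force sketch. The function names are illustrative assumptions; the code follows the definitions on this slide literally and makes no attempt at efficiency.

```python
from fractions import Fraction
from typing import List, Sequence

SEQ_DB: List[List[str]] = [
    ["check", "lock", "use", "use", "unlock", "exit"],
    ["check", "lock", "use", "check", "lock", "use", "unlock", "exit"],
    ["check", "use", "unlock", "exit"],
    ["check", "lock", "use"],
    ["check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"],
]


def is_supersequence(s: Sequence[str], pattern: Sequence[str]) -> bool:
    it = iter(s)
    return all(event in it for event in pattern)


def instance_points(s: Sequence[str], pattern: Sequence[str]) -> List[int]:
    if not pattern:
        return []
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == pattern[-1] and is_supersequence(s[:j], pattern)]


def sup(db, pre, post) -> int:
    """Sequence support of pre -> post: sequences containing pre ++ post."""
    return sum(1 for s in db if is_supersequence(s, list(pre) + list(post)))


def sup_all(db, pre, post) -> int:
    """Instance support of pre -> post: instances of pre ++ post."""
    return sum(len(instance_points(s, list(pre) + list(post))) for s in db)


def conf(db, pre, post) -> Fraction:
    """Share of the suffixes in the all-projection on pre that contain post."""
    suffixes = [s[j:] for s in db for j in instance_points(s, pre)]
    if not suffixes:
        return Fraction(0)
    hits = sum(1 for suffix in suffixes if is_supersequence(suffix, post))
    return Fraction(hits, len(suffixes))


PRE, POST = ["lock", "use"], ["unlock"]
print(sup(SEQ_DB, PRE, POST))      # 3
print(sup_all(SEQ_DB, PRE, POST))  # 4
print(conf(SEQ_DB, PRE, POST))     # 6/7
```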
Rule Redundancy
- Consider $R = \langle check \rangle \to \langle lock, use, unlock \rangle$ and $R' = \langle check \rangle \to \langle unlock \rangle$ with the same sequence/instance support and confidence
  - Do we really need both of these rules?
- Rule Redundancy
  - A rule $R' = R'_{pre} \to R'_{post}$ is redundant if there is another rule $R = R_{pre} \to R_{post}$ such that
    1. $R$ and $R'$ have the same sequence/instance support and confidence
    2. $R_{pre} ++ R_{post} \sqsupseteq R'_{pre} ++ R'_{post}$ ($R$ is longer than $R'$)
- Mining Non-Redundant Recurrent Rules
  - Mine pruned pre/post-conditions using a modified BIDE (LS-Set miner)
  - BIDE: a frequent closed sequence mining algorithm based on the pattern-growth strategy
    - Wang, Jianyong, and Jiawei Han. "BIDE: Efficient mining of frequent closed sequences." Proceedings of the 20th International Conference on Data Engineering. IEEE, 2004.
FS-Set, CS-Set, LS-Set
- The set of frequent sequential patterns (FS-Set)
  - $FS = \{ s \mid support(s) \ge min\_sup \}$
- The set of closed frequent sequential patterns (CS-Set)
  - $CS = \{ s \mid s \in FS \text{ and } \nexists s' \in FS \text{ such that } s \sqsubset s' \text{ and } support(s) = support(s') \}$
- Projected Database Closed Set (LS-Set)
  - $LS = \{ s \mid support(s) \ge min\_sup \text{ and } \nexists s' \text{ such that } s \sqsubset s' \text{ and } SeqDB_s = SeqDB_{s'} \}$
  - cf. for $s \sqsubseteq s'$: $SeqDB_s = SeqDB_{s'} \Leftrightarrow |SeqDB_s| = |SeqDB_{s'}|$
    - Xifeng Yan, Jiawei Han, Ramin Afshar. "CloSpan: Mining Closed Sequential Patterns in Large Datasets." SIAM 2003.
Pruning Redundant Pre-Conds
- In a sequence database $SeqDB$, consider a pre-condition candidate $R_{pre}$.
- If there is a pre-condition candidate $R'_{pre} \sqsupset R_{pre}$ such that
  - (i) $R'_{pre} = P_1 ++ e ++ P_2$ while $R_{pre} = P_1 ++ P_2$, for some event $e$ and nonempty $P_1, P_2$
  - (ii) $SeqDB_{R_{pre}} = SeqDB_{R'_{pre}}$
- then, for any post-condition candidate $post$ and any forward extension $R_{pre} ++ P$,
  - the rule $R_{pre} ++ P \to post$ is redundant
LS-Set BIDE
Backward-extension event checking is omitted from the original BIDE algorithm
- David Lo, Siau-Cheng Khoo, Chao Liu. "Mining Recurrent Rules from Sequence Database." TR12/07, NUS.
Non-Redundant Recurrent Rules Miner (NR3)
- Input: a sequence database $SeqDB$; thresholds $min\_sup$, $min\_sup^{all}$, $min\_conf$
- Output: significant and non-redundant recurrent rules $Rules$
- Procedure
  1. $PreCond$ := a pruned set of pre-conditions from $SeqDB$ satisfying $min\_sup$
  2. foreach $pre \in PreCond$ do
     2.1. $SeqDB^{all}_{pre}$ := $SeqDB$ all-projected on $pre$
     2.2. $thd$ := $min\_conf \times |SeqDB^{all}_{pre}|$
     2.3. $PostCond$ := a pruned set of post-conditions from $SeqDB^{all}_{pre}$ satisfying $thd$
     2.4. foreach $post \in PostCond$ do
          if $sup^{all}(pre ++ post, SeqDB) \ge min\_sup^{all}$ then $Rules$ := $Rules \cup \{ pre \to post \}$
  3. Remove remaining redundancy in $Rules$
- Alias for Tasks
  - Procedure line 1: GenPre task
  - Procedure lines 2.1-2.4: GenRule task
  - Procedure line 3: RemRedun task

[Figure: a prefix tree of pre-conditions rooted at ε; the rules generated for each pre-condition (e.g. <a>→<c,a,d>, <a>→<c,b,b>, <a>→<b>, <a,b>→<c,d>, <a,b>→<c,a>) are collected in a hash table.]
Parallel Mining of Non-Redundant Recurrent Rules (pNR3)
Revisiting Non-Redundant Recurrent Rules Miner (NR3)
- Input, output, and procedure are unchanged from the NR3 miner: step 1 is the GenPre task, steps 2.1-2.4 are the GenRule task, and step 3 is the RemRedun task
- Parallelization Strategy
  1. the single-producer-multiple-consumer framework
  2. the loop-level parallelization

[Figure: the NR3 prefix tree and rule hash table, with markers 1 and 2 indicating where the two strategies apply.]
Parallel Non-Redundant Recurrent Rules Miner (pNR3)

[Figure: the GenPre task traverses the pre-condition tree and places GenRule tasks (GenRule[a], GenRule[a,b], GenRule[c,b], GenRule[c,b,c], ...) into the task queue of a thread pool; worker threads [1] to [N] run them and add the resulting rules to a hash table, after which the RemRedun task runs.]

[UML diagrams, not reproduced in this transcript:]
- pNR3 framework
- GenPre task
- GenRule task
Source code is available at https://bitbucket.org/sekilab/nr3
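The single-producer-multiple-consumer structure can be sketched as follows. This is not the authors' Java code: `gen_pre` and `gen_rule` are deliberately simplified stand-ins (single-event pre- and post-conditions, no pruning), and `mine` is a hypothetical driver. In CPython the global interpreter lock prevents real CPU parallelism between threads, so the sketch shows the task structure only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Set, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]

SEQ_DB: List[List[str]] = [
    ["check", "lock", "use", "use", "unlock", "exit"],
    ["check", "lock", "use", "check", "lock", "use", "unlock", "exit"],
    ["check", "use", "unlock", "exit"],
    ["check", "lock", "use"],
    ["check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"],
]


def gen_pre(db: List[List[str]]) -> Iterator[Tuple[str, ...]]:
    """Producer (stand-in for GenPre): yields one pre-condition at a time."""
    for event in sorted({e for s in db for e in s}):
        yield (event,)


def gen_rule(pre: Tuple[str, ...], db: List[List[str]], min_conf: float) -> Set[Rule]:
    """Consumer task (stand-in for GenRule): rules pre -> <e> meeting min_conf."""
    # All-projection on a single-event pre-condition: one suffix per occurrence.
    suffixes = [s[j + 1:] for s in db for j, e in enumerate(s) if e == pre[0]]
    rules: Set[Rule] = set()
    for event in sorted({e for s in db for e in s}):
        hits = sum(1 for suffix in suffixes if event in suffix)
        if suffixes and hits >= min_conf * len(suffixes):
            rules.add((pre, (event,)))
    return rules


def mine(db: List[List[str]], min_conf: float, n_workers: int) -> Set[Rule]:
    rules: Set[Rule] = set()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # The producer submits a task as soon as a pre-condition is generated;
        # idle workers pick tasks up from the pool's internal queue.
        futures = [pool.submit(gen_rule, pre, db, min_conf) for pre in gen_pre(db)]
        for future in futures:
            rules |= future.result()
    return rules


rules = mine(SEQ_DB, min_conf=0.8, n_workers=4)
print((("lock",), ("unlock",)) in rules)  # True
```

Collecting results through futures keeps the shared rule set on one thread, which avoids locking; a shared concurrent hash table, as drawn in the slide, is the alternative design.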
Parallelization Effects of pNR3
- Let $t_T$ be the runtime of a task $T$, and $N$ be the number of available threads
  - NR3: $t_{GenPre} + t_{GenRule} + t_{RemRedun}$
  - pNR3: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
    - GenPre concurrency alone: $\max(t_{GenPre},\ t_{GenRule}) + t_{RemRedun}$
    - GenRule parallelization alone: $t_{GenPre} + t_{GenRule}/N + t_{RemRedun}$

[Figure: the NR3 pipeline annotated with the two effects: GenPre concurrency (the max function) and GenRule parallelization (the 1/N factor), followed by RemRedun.]
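As a worked example of the runtime model, the sketch below plugs in the NR3 task times reported for D10C10N10R0.5 at min_sup = 0.5% (GenPre 34 s, GenRule 206 s, RemRedun 0 s). The function names are illustrative. The model is an idealized bound: the measured pNR3 runtimes for the same setting (118 s, 74 s, and 54 s with 2, 4, and 8 threads) are higher, and the machine has only 4 physical cores.

```python
def nr3_time(t_gen_pre: float, t_gen_rule: float, t_rem_redun: float) -> float:
    """Sequential model: the three tasks run one after another."""
    return t_gen_pre + t_gen_rule + t_rem_redun


def pnr3_time(t_gen_pre: float, t_gen_rule: float, t_rem_redun: float, n: int) -> float:
    """Idealized pNR3 model: GenPre overlaps with GenRule, which is split over n threads."""
    return max(t_gen_pre, t_gen_rule / n) + t_rem_redun


print(nr3_time(34, 206, 0))  # 240
for n in (2, 4, 8):
    print(n, pnr3_time(34, 206, 0, n))  # 103.0, 51.5, then 34 (GenPre becomes the bottleneck)
```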
Experiment Environment
- Dataset
  - D10C10N10R0.5 (IBM synthetic data generator)
    - 9,678 sequences, average length 31.22
  - BMSWebView1 (a click stream dataset (Gazelle) from KDD Cup 2000)
    - 59,601 sequences, average length 2.42
- Experiment Machine
  - Intel Core i7-3610QM 2.30GHz (4 physical and 8 logical cores)
  - 8GB RAM
  - Microsoft Windows 7 Professional x64
- Implementation
  - Java SE 8
  - Default JVM settings
D10C10N10R0.5
- Varying min_sup: $min\_sup$ = 0.5-0.9%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1

  [Figures: size of PreCond, RuleCand, and Rules vs. min_sup (%); runtime (s) of NR3 and 2-, 4-, 8-pNR3 vs. min_sup (%); runtime share (%) of GenPre, GenRule, and RemRedun vs. min_sup (%)]

  Runtime (sec) by min_sup (%)
              0.5    0.6    0.7    0.8    0.9
  NR3         241    152     99     69     54
  2-pNR3      118     78     49     37     26
  4-pNR3       74     47     31     22     17
  8-pNR3       54     35     23     18     14

  NR3 runtime by task (sec) by min_sup (%)
              0.5    0.6    0.7    0.8    0.9
  GenPre       34     22     15     11      8
  GenRule     206    130     83     57     46
  RemRedun      0      0      0      0      0
  Elapsed     241    152     99     69     54

  Size by min_sup (%)
              0.5    0.6    0.7    0.8    0.9
  PreCond   21563  15013  11105   8917   7262
  RuleCand   3965   2418   1622   1258    956
  Rules      3912   2414   1621   1258    956

- Varying min_conf: $min\_sup$ = 0.5%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1

  [Figures: size (log scale) of PreCond, RuleCand, and Rules vs. min_conf (%); runtime (s) of NR3 and 2-, 4-, 8-pNR3 vs. min_conf (%); runtime share (%) of GenPre, GenRule, and RemRedun vs. min_conf (%)]

  Runtime (sec) by min_conf (%)
               50     60     70     80     90
  NR3         241    184    176    170    167
  2-pNR3      119     92     88     85     83
  4-pNR3       74     56     50     52     52
  8-pNR3       54     47     46     45     45

  NR3 runtime by task (sec) by min_conf (%)
               50     60     70     80     90
  GenPre       34     34     34     34     34
  GenRule     206    149    140    135    132
  RemRedun      0      0      0      0      0
  Elapsed     241    184    176    170    167

  Size by min_conf (%)
               50     60     70     80     90
  PreCond   21563  21563  21563  21563  21563
  RuleCand   3965   1392    527    374    297
  Rules      3912   1372    519    368    294

- pNR3 runtime model: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
BMSWebView1
- Varying min_sup: $min\_sup$ = 0.080-0.100%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1

  [Figures: size of PreCond, RuleCand, and Rules vs. min_sup (%); runtime (s, log scale) of NR3 and 2-, 4-, 8-pNR3 vs. min_sup (%); runtime share (%) of GenPre, GenRule, and RemRedun vs. min_sup (%)]

  Runtime (sec) by min_sup (%)
             0.080  0.085  0.090  0.095  0.100
  NR3        43357  23729  12049   5063   2212
  2-pNR3     21440  11737   6100   2567   1034
  4-pNR3     12937   6839   3566   1550    618
  8-pNR3      9567   5261   2721   1118    450

  NR3 runtime by task (sec) by min_sup (%)
             0.080  0.085  0.090  0.095  0.100
  GenPre        16     11      9      8      7
  GenRule    43340  23718  12039   5055   2204
  RemRedun       0      0      0      0      0
  Elapsed    43357  23729  12049   5063   2212

  Size by min_sup (%)
             0.080  0.085  0.090  0.095  0.100
  PreCond     9476   7222   5734   4725   3981
  RuleCand    6413   3638   2333   1605   1147
  Rules       5976   3498   2260   1570   1139

- Varying min_conf: $min\_sup$ = 0.090%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1

  [Figures: size of PreCond, RuleCand, and Rules vs. min_conf (%); runtime (s, log scale) of NR3 and 2-, 4-, 8-pNR3 vs. min_conf (%); runtime share (%) of GenPre, GenRule, and RemRedun vs. min_conf (%)]

  Runtime (sec) by min_conf (%)
                50     60     70     80     90
  NR3        12049   1778    304    145    104
  2-pNR3      6100    932    157     72     50
  4-pNR3      3566    580     90     42     32
  8-pNR3      2721    400     69     32     22

  NR3 runtime by task (sec) by min_conf (%)
                50     60     70     80     90
  GenPre         9      9      9      9     10
  GenRule    12039   1768    294    135     93
  RemRedun       0      0      0      0      0
  Elapsed    12049   1778    304    145    104

  Size by min_conf (%)
                50     60     70     80     90
  PreCond     5734   5734   5734   5734   5734
  RuleCand    2333   1703   1173    685    288
  Rules       2260   1648   1123    645    268

- pNR3 runtime model: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
Loop-Fused Mining of NR3 (LF-NR3)
Simplifying the all-projection operation
- Given the projected database $SeqDB_{pre}$,
- the all-projected database $SeqDB^{all}_{pre}$ can be computed more simply:
  - $SeqDB^{all}_{pre} = SeqDB_{pre} \cup (SeqDB_{pre})^{all}_{last(pre)}$
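The identity can be checked on the example database. The sketch below compares a direct all-projection with the one obtained from the projected database plus its all-projection on the single event $last(pre)$; all function names are hypothetical, and the projections are treated as lists of (sequence index, suffix) pairs.

```python
from typing import List, Sequence, Tuple

SEQ_DB: List[List[str]] = [
    ["check", "lock", "use", "use", "unlock", "exit"],
    ["check", "lock", "use", "check", "lock", "use", "unlock", "exit"],
    ["check", "use", "unlock", "exit"],
    ["check", "lock", "use"],
    ["check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"],
]

Entry = Tuple[int, List[str]]


def is_supersequence(s: Sequence[str], pattern: Sequence[str]) -> bool:
    it = iter(s)
    return all(event in it for event in pattern)


def instance_points(s: Sequence[str], pattern: Sequence[str]) -> List[int]:
    if not pattern:
        return []
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == pattern[-1] and is_supersequence(s[:j], pattern)]


def project_all(db: List[List[str]], pre: Sequence[str]) -> List[Entry]:
    """Direct definition: one suffix per instance of pre."""
    return [(i, s[j:]) for i, s in enumerate(db) for j in instance_points(s, pre)]


def project(db: List[List[str]], pre: Sequence[str]) -> List[Entry]:
    """Suffix after the minimum instance of pre, where one exists."""
    out = []
    for i, s in enumerate(db):
        points = instance_points(s, pre)
        if points:
            out.append((i, s[points[0]:]))
    return out


def project_all_simplified(db: List[List[str]], pre: Sequence[str]) -> List[Entry]:
    """SeqDB_pre plus the all-projection of SeqDB_pre on the event last(pre)."""
    first = project(db, pre)
    last_event = pre[-1]
    rest = [(i, suffix[j + 1:])
            for i, suffix in first
            for j, event in enumerate(suffix) if event == last_event]
    return first + rest


PRE = ["lock", "use"]
print(sorted(project_all(SEQ_DB, PRE)) == sorted(project_all_simplified(SEQ_DB, PRE)))  # True
```

The simplified form only scans each projected suffix for one event, instead of re-matching the whole pre-condition against every prefix.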
Non-Redundant Recurrent Rules Miner (NR3)
Loop-Fused NR3 (LF-NR3)
Data Structure Level Optimization for Projections
- For each sequence $S_i$ in $SeqDB$ and the set $I$ of events,
  - a hash map $Map_i : I \to 2^{\{1, \dots, |S_i|\}}$
  - such that each key $e \in I$ is mapped to the set of temporal points at which the event $e$ occurs in $S_i$
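A minimal sketch of such a per-sequence index, assuming sorted position lists as the value type so that the next occurrence of an event after a given point can be found by binary search. `build_index` and `min_instance_end` are illustrative names, not taken from the authors' code.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def build_index(s: Sequence[str]) -> Dict[str, List[int]]:
    """Map each event to the sorted list of its 1-based temporal points in s."""
    index: Dict[str, List[int]] = defaultdict(list)
    for point, event in enumerate(s, start=1):
        index[event].append(point)
    return dict(index)


def min_instance_end(index: Dict[str, List[int]], pattern: Sequence[str]) -> Optional[int]:
    """Temporal point where the minimum instance of pattern ends, or None."""
    position = 0
    for event in pattern:
        points = index.get(event, ())
        k = bisect_right(points, position)  # first occurrence strictly after position
        if k == len(points):
            return None
        position = points[k]
    return position


S1 = ["check", "lock", "use", "use", "unlock", "exit"]
S3 = ["check", "use", "unlock", "exit"]

index1 = build_index(S1)
print(index1["use"])                              # [3, 4]
print(min_instance_end(index1, ["lock", "use"]))  # 3
print(min_instance_end(build_index(S3), ["lock", "use"]))  # None
```

Matching each pattern event at its earliest possible position yields the minimum instance, so the projection point is found without scanning the sequence itself.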
D10C10N10R0.5
- (a)-(c) $min\_sup$ = 0.5-0.9%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
- (d)-(f) $min\_sup$ = 0.5%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
BMSWebView1
- (a)-(c) $min\_sup$ = 0.100-0.120%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
- (d)-(f) $min\_sup$ = 0.090%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
Discussion
- Computational complexity of the algorithms
  - $|I|^k \times |I|^k$ ($I$: the set of events, $k$: the length of the longest frequent pattern)
- The effects of fusing loops in NR3
  - The foreach loop in the GenRule step is eliminated
  - The use of the intermediate data $SeqDB_{pre}$ simplifies the computation of
    - $SeqDB^{all}_{pre} = SeqDB_{pre} \cup (SeqDB_{pre})^{all}_{last(pre)}$
    - $sup^{all}(pre \to post, SeqDB) = sup^{all}(post, SeqDB_{pre})$
- The effect of the hash-based data structure
  - It allows efficient computation of (all-)projected databases
  - It is not always efficient when the sequences are short
Parallel Loop-Fused Mining of NR3 (pLF-NR3)
Loop-Fused NR3 (LF-NR3)
- The task parallelism inherent in the LF-NR3 algorithm can be exploited
  - it can be handled within the single-producer-multiple-consumer framework
Parallel Loop-Fused NR3 (pLF-NR3)
D10C10N10R0.5
- (a)-(c) $min\_sup$ = 0.5-0.9%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
- (d)-(f) $min\_sup$ = 0.5%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
BMSWebView1
- (a)-(c) $min\_sup$ = 0.092-0.108%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
- (d)-(f) $min\_sup$ = 0.092%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
Bidirectional Mining of Non-Redundant Recurrent Rules (BOB)
based on David Lo, Bolin Ding, Lucia, Jiawei Han, ICDE, 2011
Additional Definitions
- A sequence database $SeqDB$ is a set of sequences
- A sequence $S = \langle e_1, e_2, \dots, e_n \rangle$
- The $j$-suffix of $S$ is $\langle e_{n-j+1}, e_{n-j+2}, \dots, e_n \rangle$
- $S'$ is the $j$-th minimum suffix of $S$ with respect to a pattern $P$ if $S'$ is a suffix of $S$ starting with $first(P)$, and no suffix of $S$ starting with $first(P)$ is shorter than $S'$ and longer than the $(j-1)$-th minimum suffix
- The $j$-th suf-projection of $SeqDB$ with regard to a pattern $P$
  - $SeqDB^{suf\text{-}j}_P = \{ (i, sx) \mid S_i = px ++ sx \in SeqDB,\ sx \text{ is the } j\text{-th minimum suffix of } S_i \text{ with respect to } P \}$
- $SeqDB$ pre-projected on $P$
  - $SeqDB^{pre}_P = \{ (i, px) \mid S_i = px ++ sx \in SeqDB,\ sx \text{ is the minimum suffix of } P \}$
Anti-Monotonicity Property of Confidence
- Proposition 1
  - Consider a rule $R$, in the form of $R_{pre} \to R_{post}$, and a sequence database $SeqDB$
  - $conf(R, SeqDB) = \dfrac{sup(R_{post}, SeqDB^{all}_{R_{pre}})}{sup^{all}(R_{pre}, SeqDB)} = \dfrac{sup^{all}(R_{pre}, SeqDB^{pre}_{R_{post}})}{sup^{all}(R_{pre}, SeqDB)}$
- Proposition 2
  - Consider two rules $R$ and $R'$ in a sequence database $SeqDB$ with $R'_{pre} = R_{pre}$ and $R'_{post} = e ++ R_{post}$ for some event $e \in I$
  - $conf(R) \ge conf(R')$
- Theorem. Anti-Monotonicity Property of Confidence
  - Consider two rules $R$ and $R'$ in a sequence database $SeqDB$ with $R'_{pre} = R_{pre}$ and $R'_{post} = evs ++ R_{post}$, where $evs$ is an arbitrary series of events
  - $conf(R) \ge conf(R')$
  - If $R$ is not confident enough ($conf(R) < min\_conf$), $R'$ is not either
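Proposition 2 can be checked by brute force on the example sequence database from the Preliminaries slides. The sketch below fixes the rule ⟨lock⟩ → ⟨unlock⟩ and prepends every event to its post-condition; `conf` is an illustrative helper that follows the confidence definition directly.

```python
from fractions import Fraction
from typing import List, Sequence

SEQ_DB: List[List[str]] = [
    ["check", "lock", "use", "use", "unlock", "exit"],
    ["check", "lock", "use", "check", "lock", "use", "unlock", "exit"],
    ["check", "use", "unlock", "exit"],
    ["check", "lock", "use"],
    ["check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"],
]
EVENTS = sorted({e for s in SEQ_DB for e in s})


def is_supersequence(s: Sequence[str], pattern: Sequence[str]) -> bool:
    it = iter(s)
    return all(event in it for event in pattern)


def instance_points(s: Sequence[str], pattern: Sequence[str]) -> List[int]:
    if not pattern:
        return []
    return [j for j in range(1, len(s) + 1)
            if s[j - 1] == pattern[-1] and is_supersequence(s[:j], pattern)]


def conf(db: List[List[str]], pre: Sequence[str], post: Sequence[str]) -> Fraction:
    """Share of the suffixes in the all-projection on pre that contain post."""
    suffixes = [s[j:] for s in db for j in instance_points(s, pre)]
    if not suffixes:
        return Fraction(0)
    hits = sum(1 for suffix in suffixes if is_supersequence(suffix, post))
    return Fraction(hits, len(suffixes))


PRE, POST = ["lock"], ["unlock"]
base = conf(SEQ_DB, PRE, POST)
print(base)  # 5/6
for event in EVENTS:
    extended = conf(SEQ_DB, PRE, [event] + POST)
    # Prepending an event to the post-condition never raises the confidence.
    print(event, extended, extended <= base)
```

This is the property that lets a miner stop extending a post-condition backward once its confidence drops below min_conf.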
Pruning Redundant Post-Conds
- In a sequence database $SeqDB$, consider a post-condition candidate $R_{post}$.
- Lemma 1
  - If there is a post-condition candidate $R'_{post} \sqsupset R_{post}$ such that
    - (i) $R'_{post} = P_1 ++ e ++ P_2$ while $R_{post} = P_1 ++ P_2$, for some event $e$ and subsequences $P_1$, (nonempty) $P_2$
    - (ii) $SeqDB^{pre}_{R_{post}} = SeqDB^{pre}_{R'_{post}}$
  - then for any pre-condition candidate $pre$ and any backward extension $P ++ R_{post}$ of $R_{post}$, the rule $R = pre \to P ++ R_{post}$ is not confidence-closed
    - i.e., there exists another rule $R' \sqsupset R$ such that $conf(R) = conf(R')$
- Lemma 2
  - If there is a post-condition candidate $R'_{post} \sqsupset R_{post}$ such that
    - (i) $R'_{post} = P_1 ++ e ++ P_2$ while $R_{post} = P_1 ++ P_2$, for some event $e$ and (nonempty) subsequences $P_1, P_2$
    - (iii) $\forall j: SeqDB^{suf\text{-}j}_{R_{post}} = SeqDB^{suf\text{-}j}_{R'_{post}}$, and
    - (iv) $\forall j: (SeqDB^{suf\text{-}j}_{R_{post}})^{all}_{R_{post}} = (SeqDB^{suf\text{-}j}_{R'_{post}})^{all}_{R'_{post}}$
  - then for any pre-condition candidate $pre$ and any backward extension $P ++ R_{post}$ of $R_{post}$, the rule $R = pre \to P ++ R_{post}$ is not support-closed
    - i.e., there exists another rule $R' \sqsupset R$ such that $sup(R) = sup(R')$ and $sup^{all}(R) = sup^{all}(R')$
- Theorem. Pruning Redundant Post-Conds
  - If the properties (i)-(iv) in Lemmas 1 and 2 are satisfied,
  - then for any pre-condition candidate $pre$ and any backward extension $P ++ R_{post}$ of $R_{post}$, the rule $R = pre \to P ++ R_{post}$ is redundant.
Bidirectional Pruning-based Recurrent Rule Mining (BOB)
Interleaved Bidirectional Mining of NR3 (iBiRM)
Optimizing Operations
- Given the sequence database $SeqDB$ and the rule $R = pre \to post$
  - $sup(R, SeqDB) = sup(post, SeqDB_{pre})$
  - $sup^{all}(R, SeqDB) = sup^{all}(post, SeqDB_{pre})$
- Pruning the search space of PRE early
  - for $R = pre \to post$ and $R' = pre ++ e \to post$,
    - if $sup(R, SeqDB) \le min\_sup$, then $sup(R', SeqDB) \le min\_sup$
    - if $sup^{all}(R, SeqDB) \le min\_sup^{all}$, then $sup^{all}(R', SeqDB) \le min\_sup^{all}$
- Reducing the number of database scans using a prefix tree
  - for each pre-condition $pre \in PRE$, suppose that a node $N_0 \in T_{POST}$ has children nodes $N_1, \dots, N_k$
  - we can compute the instance supports of its children nodes $N_1, \dots, N_k$ by scanning $SeqDB$ once
    - When $N_0$ corresponds to a post-condition $post \in POST$, each child node $N_i$ corresponds to a post-condition $post_i = e_i ++ post$ for some event $e_i$, so the post-conditions of all child nodes share the suffix $post$.
    - When scanning a sequence $s \in SeqDB$, we record the positions of each $e_i$ and those of the events appearing in $post$, from which we can compute the number of instances of $pre ++ post_i$ in $s$
Bidirectional Pruning-based Recurrent Rule Mining (BOB)
Interleaved Bidirectional Recurrent Rule Miner (iBiRM)
Experiment Environment
- Dataset
  - D5C20N10R0.5 (IBM synthetic data generator)
    - 4,999 sequences, average length 64.39
  - BMSWebView1 (a click stream dataset (Gazelle) from KDD Cup 2000)
    - 59,601 sequences, average length 2.42
- Experiment Machine
  - Intel Core i5 2.50GHz
  - 8GB RAM
  - Microsoft Windows 7 Professional x64
- Implementation
  - Java SE 8
  - Default JVM settings
D5C20N10R0.5
๏‚ง (a)-(d) ๐‘š๐‘–๐‘› _๐‘ ๐‘ข๐‘ = 2.0 โˆ’ 2.8%, ๐‘š๐‘–๐‘› _๐‘๐‘œ๐‘›๐‘“ = 50%, ๐‘š๐‘–๐‘› _๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™
= 1
๏‚ง (d)-(f) ๐‘š๐‘–๐‘› _๐‘ ๐‘ข๐‘ = 2.4%, ๐‘š๐‘–๐‘› _๐‘๐‘œ๐‘›๐‘“ = 50 โˆ’ 90%, ๐‘š๐‘–๐‘› _๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ = 1
2019.11.18. 49
BMSWebView1
- (a)-(c) min_sup = 0.092–0.108%, min_conf = 50%, min_sup_all = 1
- (d)-(f) min_sup = 0.092%, min_conf = 50–90%, min_sup_all = 1
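
To give a sense of scale for these relative thresholds, the sketch below converts a min_sup percentage into an absolute number of sequences, using the dataset sizes from the Experiment Environment slide (4,999 sequences for D5C20N10R0.5 and 59,601 for BMSWebView1). The helper name `absolute_min_sup` is hypothetical, and rounding up is an assumption: the slides do not say how fractional thresholds are rounded.

```python
from fractions import Fraction
from math import ceil


def absolute_min_sup(percent: str, num_sequences: int) -> int:
    """Smallest whole number of sequences that reaches `percent` % of the
    database (assumed rounding rule: round up). The percentage is passed as
    a string so that it is parsed exactly, without floating-point error."""
    return ceil(Fraction(percent) / 100 * num_sequences)


# Threshold ranges taken from the two experiment slides.
for name, size, percents in [
    ("D5C20N10R0.5", 4999, ["2.0", "2.4", "2.8"]),
    ("BMSWebView1", 59601, ["0.092", "0.108"]),
]:
    for p in percents:
        print(name, f"min_sup = {p}%", "->", absolute_min_sup(p, size), "sequences")
```

Under this assumption, the much smaller percentages used for BMSWebView1 still correspond to dozens of supporting sequences, because that dataset has roughly twelve times as many sequences.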
Conclusion
Conclusion & Future Work
- Conclusion
  - We have proposed the Parallel Non-Redundant Recurrent Rules Miner (pNR3)
  - We have proposed the Loop-Fused Non-Redundant Recurrent Rules Miner (LF-NR3)
  - We have proposed the Parallel Loop-Fused Non-Redundant Recurrent Rules Miner (pLF-NR3)
  - We have proposed the Interleaved Bidirectional Non-Redundant Recurrent Rules Miner (iBiRM)
- Future work
  - Improvement of the sequential recurrent rule mining algorithm
  - Improvement of the parallel algorithms
- Source code is available at https://bitbucket.org/sekilab/nr3
Mining non-redundant recurrent rules from a sequence database

  • 1. Mining Non-Redundant Recurrent Rules from a Sequence Database Yoon SeungYong Ministry of Science and ICT, Republic of Korea forcom@forcom.kr - Efficient Mining of Recurrent Rules from a Sequence Database(Lo et al., DASFAA 2008) - Parallel Mining of Non-Redundant Recurrent Rules from a Sequence Database(Yoon and Seki, ISIS 2017) ยท A Parallel Algorithm for Mining Non-Redundant Recurrent Rules from a Sequence Database(Yoon and Seki, JACIII 2019) - Towards Efficient Mining of Non-Redundant Recurrent Rules from a Sequence Database(Yoon and Seki, IWCIA 2017) ยท Mining Non-Redundant Recurrent Rules from a Sequence Database(Yoon and Seki, IJCISTUDIES 2018) - Efficient Mining of Recurrent Rules from a Sequence Database Using Multi-Core Processors(Yoon and Seki, SCIS&ISIS 2018) - Bidirectional Mining of Non-Redundant Recurrent Rules from a Sequence Database(Lo et al., IEEE ICDE 2011) - A New Algorithm for Mining Recurrent Rules from a Sequence Database(Seki and Yoon, IEEE SMC 2019)
  • 2. Table of Contents 1. Motivation 2. Mining Non-Redundant Recurrent Rules (NR3) โ€“ Lo et al. 3. Parallel Mining of Non-Redundant Recurrent Rules (pNR3) 4. Loop-Fused Mining of NR3 (LF-NR3) 5. Parallel Loop-Fused Mining of NR3 (pLF-NR3) 6. Bidirectional Mining of NR3 (BOB) โ€“ Lo et al. 7. Interleaved Bidirectional Mining of NR3 (iBiRM) 8. Conclusion 2019.11.18. 2
  • 4. Sequence Database & Sequential Rule ๏‚ง Transaction Histories ๏‚ง Program Traces 2019.11.18. 4 Customer Movie Rental History Alice Star Wars 4, Star Wars 5, Star Wars 6, Star Wars 1 Bob Shrek, Spirited Away, Your Name Clara Spirited Away, Howlโ€™s Moving Castle, Princess Mononoke David Star Wars 1, Star Wars 2, Star Wars 3, Star Wars 4, Star Wars 5 Eve Your Name Trace ID Command 1 check, lock, use, use, unlock, exit 2 check, lock, use, check, lock, use, unlock, exit 3 check, use, unlock, exit 4 check, lock, use 5 check, lock, use, unlock, check, lock, use, unlock, exit ใ€ˆStar Wars 4ใ€‰โ†’ ใ€ˆStar Wars 5ใ€‰ ใ€ˆlockใ€‰โ†’ ใ€ˆunlockใ€‰
  • 5. What is a recurrent rule? ๏‚ง Recurrent Rule ๐‘… = ๐‘… ๐‘๐‘Ÿ๐‘’ โ†’ ๐‘… ๐‘๐‘œ๐‘ ๐‘ก ๏‚ง โ€œWhenever a series of precedent events occurs, eventually another series of consequent events occursโ€ ๏‚ง e.g., ๐‘… = โŸจcheck, lockโŸฉ โ†’ โŸจuse, unlockโŸฉ โ€œWhenever โŸจcheck, lockโŸฉ occurs, eventually โŸจuse, unlockโŸฉ occursโ€ ๏‚ง Captures temporal constraints that repeat a meaningful number of times both within a sequence and across multiple sequences ๏‚ง A sequential rule ๐‘… = ๐‘… ๐‘๐‘Ÿ๐‘’ โ†’ ๐‘… ๐‘๐‘œ๐‘ ๐‘ก means โ€œwhenever a sequence is a super-sequence of ๐‘… ๐‘๐‘Ÿ๐‘’, it will be a super-sequence of ๐‘… ๐‘๐‘Ÿ๐‘’ ++๐‘… ๐‘๐‘œ๐‘ ๐‘กโ€ ๏‚ง Linear Temporal Logic (LTL) ๏‚ง One of the most widely-used formalism for program verification ๏‚ง Clarke, Edmund M., Orna Grumberg, and Doron Peled. Model checking. MIT press, 1999. ๏‚ง Recurrent rule can be expressed in the form of LTL 2019.11.18. 5 - proposed by David LO
  • 6. Mining Non-Redundant Recurrent Rules (NR3) based on David LO, Siau-Cheng KHOO, NUS and Chao LIU, DASFAA, 2008 2019.11.18. 6
  • 7. Preliminaries & Examples (1) ๏‚ง a sequence database ๐‘†๐‘’๐‘ž๐ท๐ต โ€“ a set of sequences : ๐‘†1, ๐‘†2, ๐‘†3, ๐‘†4, ๐‘†5 ๏‚ง a set of events ๐ผ in ๐‘†๐‘’๐‘ž๐ท๐ต : {check, exit, lock, unlock, use} ๏‚ง a size of ๐‘†๐‘’๐‘ž๐ท๐ต = ๐‘†๐‘’๐‘ž๐ท๐ต : ๐‘†๐‘’๐‘ž๐ท๐ต = 5 ๏‚ง a sequence ๐‘† = ๐‘’1, ๐‘’2, โ€ฆ , ๐‘’ ๐‘› โˆถ ๐‘†1 = โŸจcheck, lock, use, use, unlock, exitโŸฉ ๏‚ง a temporal point ๐‘— of ๐‘’๐‘— in ๐‘† : an event of a temporal point 5 in ๐‘†1 is unlock ๏‚ง a length of ๐‘† = ๐‘† = ๐‘› : ๐‘†1 = 6 ๏‚ง the last event of ๐‘† = ๐‘™๐‘Ž๐‘ ๐‘ก ๐‘† = ๐‘†[๐‘›] : ๐‘™๐‘Ž๐‘ ๐‘ก ๐‘†1 = exit ๏‚ง the j-prefix of ๐‘† = ๐‘† ๐‘— = โŸจ๐‘’1, ๐‘’2, โ€ฆ , ๐‘’๐‘—โŸฉ : ๐‘†1 2 = โŸจcheck, lockโŸฉ 2019.11.18. 7 SID Sequence ๐‘†1 โŸจcheck, lock, use, use, unlock, exitโŸฉ ๐‘†2 โŸจcheck, lock, use, check, lock, use, unlock, exitโŸฉ ๐‘†3 โŸจcheck, use, unlock, exitโŸฉ ๐‘†4 โŸจcheck, lock, useโŸฉ ๐‘†5 โŸจcheck, lock, use, unlock, check, lock, use, unlock, exitโŸฉ an example sequence database ๐‘†๐‘’๐‘ž๐ท๐ต
  • 8. Preliminaries & Examples (2) ๏‚ง Given a sequence ๐‘† = โŸจ๐‘’1, โ€ฆ , ๐‘’ ๐‘›โŸฉ and ๐‘†โ€ฒ = โŸจ๐‘’1 โ€ฒ , โ€ฆ , ๐‘’ ๐‘š โ€ฒ โŸฉ ๏‚ง the concatenation of ๐‘† and ๐‘†โ€ฒ โ‰” ๐‘† ++๐‘†โ€ฒ = โŸจ๐‘’1, โ€ฆ , ๐‘’ ๐‘›, ๐‘’1 โ€ฒ , โ€ฆ , ๐‘’ ๐‘š โ€ฒ โŸฉ ๏‚ง ๐‘† is a super-sequence of ๐‘†โ€ฒ โ‰” ๐‘† โŠ’ ๐‘†โ€ฒ if ๐‘’๐‘–1 = ๐‘’1 โ€ฒ , โ€ฆ , ๐‘’๐‘– ๐‘š = ๐‘’ ๐‘š โ€ฒ (1 โ‰ค ๐‘–1 โ‰ค โ‹ฏ โ‰ค ๐‘– ๐‘š โ‰ค ๐‘›) ๏‚ง e.g., ๐‘†1 โŠ’ โŸจcheck, lock, unlockโŸฉ : ๏‚ง ๐‘† ๐‘— is an instance of ๐‘†โ€ฒ in ๐‘†, if ๐‘† ๐‘— โŠ’ ๐‘†โ€ฒ and ๐‘™๐‘Ž๐‘ ๐‘ก ๐‘†โ€ฒ = ๐‘† ๐‘— ๏‚ง ๐‘† ๐‘— is the minimum instance of ๐‘†โ€ฒ in ๐‘†, if ๐‘† ๐‘— is an instance of ๐‘†โ€ฒ and โˆ„๐‘˜ < ๐‘—, ๐‘ . ๐‘ก. , ๐‘† ๐‘˜ is an instance of ๐‘†โ€ฒ ๏‚ง e.g., ๐‘†1 3 , ๐‘†1 4 are instances of โŸจcheck, lock, useโŸฉ in ๐‘†1, and ๐‘†1 3 is the minimum ๏‚ง ๐‘†5 9 is an instance of ๐‘†1 in ๐‘†5, and it is the minimum 2019.11.18. 8 SID Sequence ๐‘†1 โŸจcheck, lock, use, use, unlock, exitโŸฉ ๐‘†2 โŸจcheck, lock, use, check, lock, use, unlock, exitโŸฉ ๐‘†3 โŸจcheck, use, unlock, exitโŸฉ ๐‘†4 โŸจcheck, lock, useโŸฉ ๐‘†5 โŸจcheck, lock, use, unlock, check, lock, use, unlock, exitโŸฉ ๐‘†1 = โŸจcheck, lock, use, use, unlock, exitโŸฉ an example sequence database ๐‘†๐‘’๐‘ž๐ท๐ต
  • 9. Definitions & Examples (1) ๏‚ง Given a sequence ๐‘ƒ = โŸจlock, useโŸฉ and a sequence database ๐‘†๐‘’๐‘ž๐ท๐ต ๏‚ง Consider a sequence database ๐‘†๐‘’๐‘ž๐ท๐ต and a sequence ๐‘ƒ ๏‚ง ๐‘†๐‘’๐‘ž๐ท๐ต projected on ๐‘ƒ ๏‚ง ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ƒ = ๐‘–, ๐‘ ๐‘ฅ ๐‘†๐‘– = ๐‘๐‘ฅ ++๐‘ ๐‘ฅ โˆˆ ๐‘†๐‘’๐‘ž๐ท๐ต, ๐‘๐‘ฅ is the minimum instance of ๐‘ƒ } ๏‚ง the sequence support ๐‘ ๐‘ข๐‘ ๐‘ƒ, ๐‘†๐‘’๐‘ž๐ท๐ต = ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ƒ ๏‚ง ๐‘†๐‘’๐‘ž๐ท๐ต all-projected on ๐‘ƒ ๏‚ง ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ƒ ๐‘Ž๐‘™๐‘™ = ๐‘–, ๐‘ ๐‘ฅ ๐‘†๐‘– = ๐‘๐‘ฅ ++๐‘ ๐‘ฅ โˆˆ ๐‘†๐‘’๐‘ž๐ท๐ต, ๐‘๐‘ฅ is ๐š๐ง ๐ข๐ง๐ฌ๐ญ๐š๐ง๐œ๐ž of ๐‘ƒ } ๏‚ง the instance support ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ ๐‘ƒ, ๐‘†๐‘’๐‘ž๐ท๐ต = |๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ƒ ๐‘Ž๐‘™๐‘™ | 2019.11.18. 9 SID Sequence ๐‘†1 โŸจcheck, lock, use, use, unlock, exitโŸฉ ๐‘†2 โŸจcheck, lock, use, check, lock, use, unlock, exitโŸฉ ๐‘†3 โŸจcheck, use, unlock, exitโŸฉ ๐‘†4 โŸจcheck, lock, useโŸฉ ๐‘†5 โŸจcheck, lock, use, unlock, check, lock, use, unlock, exitโŸฉ SIDSequence ๐‘†1 โŸจuse, unlock, exitโŸฉ ๐‘†2 โŸจcheck, lock, use, unlock, exitโŸฉ ๐‘†4 โŸจโŸฉ ๐‘†5 โŸจunlock, check, lock, use, unlock, exitโŸฉ ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ƒ ๐‘ ๐‘ข๐‘ ๐‘ƒ, ๐‘†๐‘’๐‘ž๐ท๐ต = 4 SIDSequence ๐‘†1 โŸจuse, unlock, exitโŸฉ ๐‘†1 โŸจunlock, exitโŸฉ ๐‘†2 โŸจcheck, lock, use, unlock, exitโŸฉ ๐‘†2 โŸจunlock, exitโŸฉ ๐‘†4 โŸจโŸฉ ๐‘†5 โŸจunlock, check, lock, use, unlock, exitโŸฉ ๐‘†5 โŸจunlock, exitโŸฉ ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ƒ ๐‘Ž๐‘™๐‘™ ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ ๐‘ƒ, ๐‘†๐‘’๐‘ž๐ท๐ต = 7
  • 10. Definitions & Examples (2) ๏‚ง Consider a recurrent rule ๐‘… = ๐‘… ๐‘๐‘Ÿ๐‘’ โ†’ ๐‘… ๐‘๐‘œ๐‘ ๐‘ก in a sequence database ๐‘†๐‘’๐‘ž๐ท๐ต ๏‚ง the pre-condition ๐‘… ๐‘๐‘Ÿ๐‘’, the post-condition ๐‘… ๐‘๐‘œ๐‘ ๐‘ก ๏‚ง the sequence support ๐‘ ๐‘ข๐‘ ๐‘…, ๐‘†๐‘’๐‘ž๐ท๐ต = ๐‘ ๐‘ข๐‘(๐‘… ๐‘๐‘Ÿ๐‘’ ++๐‘… ๐‘๐‘œ๐‘ ๐‘ก, ๐‘†๐‘’๐‘ž๐ท๐ต) ๏‚ง the instance support ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ ๐‘…, ๐‘†๐‘’๐‘ž๐ท๐ต = ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ (๐‘… ๐‘๐‘Ÿ๐‘’ ++๐‘… ๐‘๐‘œ๐‘ ๐‘ก, ๐‘†๐‘’๐‘ž๐ท๐ต) ๏‚ง the confidence ๐‘๐‘œ๐‘›๐‘“ ๐‘…, ๐‘†๐‘’๐‘ž๐ท๐ต = sup ๐‘… ๐‘๐‘œ๐‘ ๐‘ก, ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘… ๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ ๐‘… ๐‘๐‘Ÿ๐‘’, ๐‘†๐‘’๐‘ž๐ท๐ต = ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘… ๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ ๐‘… ๐‘๐‘œ๐‘ ๐‘ก ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘… ๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ ๏‚ง ๐‘… is significant if ๐‘ ๐‘ข๐‘ ๐‘…,๐‘†๐‘’๐‘ž๐ท๐ต โ‰ฅ ๐‘š๐‘– ๐‘›_๐‘ ๐‘ข๐‘, ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ ๐‘…,๐‘†๐‘’๐‘ž๐ท๐ต โ‰ฅ ๐‘š๐‘– ๐‘›_๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ , ๐‘๐‘œ๐‘›๐‘“ ๐‘…,๐‘†๐‘’๐‘ž๐ท๐ต โ‰ฅ ๐‘š๐‘– ๐‘›_๐‘๐‘œ๐‘›๐‘“ ๏‚ง Given a rule ๐‘… = โŸจlock, useโŸฉ โ†’ unlock and a sequence database ๐‘†๐‘’๐‘ž๐ท๐ต ๏‚ง the sequence support ๐‘ ๐‘ข๐‘ ๐‘…, ๐‘†๐‘’๐‘ž๐ท๐ต = ๐‘ ๐‘ข๐‘ โŸจlock, use, unlockโŸฉ, ๐‘†๐‘’๐‘ž๐ท๐ต = 3 ๏‚ง the instance support ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ ๐‘…, ๐‘†๐‘’๐‘ž๐ท๐ต = ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ โŸจlock, use, unlockโŸฉ, ๐‘†๐‘’๐‘ž๐ท๐ต = 4 ๏‚ง the confidence ๐‘๐‘œ๐‘›๐‘“ ๐‘…, ๐‘†๐‘’๐‘ž๐ท๐ต = sup โŸจunlockโŸฉ, ๐‘†๐‘’๐‘ž๐ท๐ตโŸจlock,useโŸฉ ๐‘Ž๐‘™๐‘™ ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ โŸจlock,useโŸฉ, ๐‘†๐‘’๐‘ž๐ท๐ต = 6 7 2019.11.18. 
10 SID Sequence ๐‘†1 โŸจcheck, lock, use, use, unlock, exitโŸฉ ๐‘†2 โŸจcheck, lock, use, check, lock, use, unlock, exitโŸฉ ๐‘†3 โŸจcheck, use, unlock, exitโŸฉ ๐‘†4 โŸจcheck, lock, useโŸฉ ๐‘†5 โŸจcheck, lock, use, unlock, check, lock, use, unlock, exitโŸฉ ๐‘†๐‘’๐‘ž๐ท๐ต โ†’
  • 11. Rule Redundancy ๏‚ง Consider ๐‘… = โŸจcheckโŸฉ โ†’ โŸจlock, use, unlockโŸฉ and ๐‘…โ€ฒ = โŸจcheckโŸฉ โ†’ โŸจunlockโŸฉ with the same sequence/instance support and confidence ๏‚ง Do we really need both these rules? ๏‚ง Rule Redundancy ๏‚ง A rule ๐‘…โ€ฒ = ๐‘… ๐‘๐‘Ÿ๐‘’ โ€ฒ โ†’ ๐‘… ๐‘๐‘œ๐‘ ๐‘ก โ€ฒ is redundant if there is another rule ๐‘… = ๐‘… ๐‘๐‘Ÿ๐‘’ โ†’ ๐‘… ๐‘๐‘œ๐‘ ๐‘ก 1. the same sequence/instance support and confidence 2. ๐‘… ๐‘๐‘Ÿ๐‘’ ++๐‘… ๐‘๐‘œ๐‘ ๐‘ก โŠ’ ๐‘… ๐‘๐‘Ÿ๐‘’ โ€ฒ ++๐‘… ๐‘๐‘œ๐‘ ๐‘ก โ€ฒ (R is longer than Rโ€™) ๏‚ง Mining Non-Redundant Recurrent Rules ๏‚ง Mine pruned pre/post-conditions using modified BIDE (LS-Set miner) ๏‚ง BIDE : frequent closed sequence mining algorithm based on pattern-growth strategy ๏‚ง Wang, Jianyong, and Jiawei Han. "BIDE: Efficient mining of frequent closed sequences." Data Engineering, 2004. Proceedings. 20th International Conference on. IEEE, 2004. 2019.11.18. 11 ๐‘† = โŸจcheck, lock, use, unlockโŸฉ
  • 12. FS-Set, CS-Set, LS-Set ๏‚ง The set of frequent sequential pattern (FS-Set) ๏‚ง ๐น๐‘† = {๐‘ | support ๐‘  โ‰ฅ min_sup} ๏‚ง The set of closed frequent sequential pattern (CS-Set) ๏‚ง ๐ถ๐‘† = {๐‘ |๐‘  โˆˆ ๐น๐‘† ๐‘Ž๐‘›๐‘‘ โˆ„๐‘ โ€ฒ โˆˆ ๐น๐‘†, ๐‘ ๐‘ข๐‘โ„Ž ๐‘กโ„Ž๐‘Ž๐‘ก ๐‘  โŠ‘ ๐‘ โ€ฒ ๐‘Ž๐‘›๐‘‘ support ๐‘  = support ๐‘ โ€ฒ } ๏‚ง Project Database Closed Set (LS-Set) ๏‚ง ๐ฟ๐‘† = {๐‘ | support ๐‘  โ‰ฅ min_sup ๐‘Ž๐‘›๐‘‘ โˆ„๐‘ โ€ฒ , ๐‘ ๐‘ข๐‘โ„Ž ๐‘กโ„Ž๐‘Ž๐‘ก ๐‘  โŠ‘ ๐‘ โ€ฒ ๐‘Ž๐‘›๐‘‘ ๐‘†๐‘’๐‘ž๐ท๐ต๐‘  = ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ โ€ฒ} ๏‚ง cf. ๐‘†๐‘’๐‘ž๐ท๐ต๐‘  = ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ โ€ฒ โ‡” ๐‘†๐‘’๐‘ž๐ท๐ต๐‘  = ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘ โ€ฒ ๏‚ง Xifeng Yan, Jiawei Han, Ramin Afshar, โ€œCloSpan: Mining Closed Sequential Patterns in Large Datasetsโ€œ, SIAM 2003 2019.11.18. 12
  • 13. Pruning Redundant Pre-Conds ๏‚ง In a sequence database ๐‘†๐‘’๐‘ž๐ท๐ต, consider a pre-condition candidate ๐‘… ๐‘๐‘Ÿ๐‘’. ๏‚ง If there is a pre-condition candidate ๐‘… ๐‘๐‘Ÿ๐‘’ โ€ฒ โŠ ๐‘… ๐‘๐‘Ÿ๐‘’ such that ๏‚ง (i) ๐‘… ๐‘๐‘Ÿ๐‘’ โ€ฒ = ๐‘ƒ1 ++๐‘’ ++๐‘ƒ2 while ๐‘… ๐‘๐‘Ÿ๐‘’ = ๐‘ƒ1 ++๐‘ƒ2, for some event ๐‘’ and nonempty ๐‘ƒ1, ๐‘ƒ2 ๏‚ง (ii) ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘… ๐‘๐‘Ÿ๐‘’ = ๐‘†๐‘’๐‘ž๐ท๐ต ๐‘… ๐‘๐‘Ÿ๐‘’ โ€ฒ ๏‚ง then, for any post-condition candidate ๐‘๐‘œ๐‘ ๐‘ก and any forward extension ๐‘… ๐‘๐‘Ÿ๐‘’ ++๐‘ƒ, ๏‚ง the rule ๐‘… ๐‘๐‘Ÿ๐‘’ ++๐‘ƒ โ†’ ๐‘๐‘œ๐‘ ๐‘ก is redundant 2019.11.18. 13
  • 14. LS-Set BIDE 2019.11.18. 14 Backward-extension event checking is omitted from the original BIDE algorithm โ€ข David Lo, Siau-Cheng KHOO, Chao LIU, โ€œMining Recurrent Rules from Sequence Databaseโ€, TR12/07 NUS
  • 15. Non-Redundant Recurrent Rules Miner (NR3) ๏‚ง Input: a sequence database ๐‘†๐‘’๐‘ž๐ท๐ต; thresholds min_sup, min_supall, min_conf ๏‚ง Output: Significant and non-redundant recurrent rules ๐‘…๐‘ข๐‘™๐‘’๐‘  ๏‚ง Procedure 1. ๐‘ƒ๐‘Ÿ๐‘’๐ถ๐‘œ๐‘›๐‘‘ โ‰” A pruned set of pre-conditions from ๐‘†๐‘’๐‘ž๐ท๐ต satisfying ๐‘š๐‘–๐‘› _๐‘ ๐‘ข๐‘ 2. foreach ๐‘๐‘Ÿ๐‘’ โˆˆ ๐‘ƒ๐‘Ÿ๐‘’๐ถ๐‘œ๐‘›๐‘‘ do 1. ๐‘†๐‘’๐‘ž๐ท๐ต๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ โ‰” ๐‘†๐‘’๐‘ž๐ท๐ต allโˆ’projected on ๐‘๐‘Ÿ๐‘’ 2. ๐‘๐‘กโ„Ž๐‘‘ โ‰” ๐‘š๐‘–๐‘› _๐‘๐‘œ๐‘›๐‘“ ร— ๐‘†๐‘’๐‘ž๐ท๐ต๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ 3. ๐‘ƒ๐‘œ๐‘ ๐‘ก๐ถ๐‘œ๐‘›๐‘‘ โ‰” A pruned set of post-conditions from ๐‘†๐‘’๐‘ž๐ท๐ต๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ satisfying ๐‘๐‘กโ„Ž๐‘‘ 4. foreach ๐‘๐‘œ๐‘ ๐‘ก โˆˆ ๐‘ƒ๐‘œ๐‘ ๐‘ก๐ถ๐‘œ๐‘›๐‘‘ do 1. if ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ ๐‘๐‘Ÿ๐‘’ ++๐‘๐‘œ๐‘ ๐‘ก, ๐‘†๐‘’๐‘ž๐ท๐ต โ‰ฅ ๐‘š๐‘–๐‘› _๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ then 1. ๐‘…๐‘ข๐‘™๐‘’๐‘  = ๐‘…๐‘ข๐‘™๐‘’๐‘  โˆช ๐‘๐‘Ÿ๐‘’ โ†’ ๐‘๐‘œ๐‘ ๐‘ก 3. Remove remaining redundancy in ๐‘…๐‘ข๐‘™๐‘’๐‘  ๏‚ง Alias for Tasks ๏‚ง Procedure line 1 : GenPre task ๏‚ง Procedure line 2.1 โ€“ 2.4 : GenRule task ๏‚ง Procedure line 3 : RemRedun task 2019.11.18. 15 a c b ac b a a b c ๐œ€ <a>โ†’<c,a,d> <a>โ†’<c,b,b> <a>โ†’<b> Rules <a,b>โ†’<c,d> hash table <a>โ†’<c,a,d> <a>โ†’<c,b,b> <a,b>โ†’<c,d> <a,b>โ†’<c,a> <a>โ†’<b> Rules <c,a,d>
  • 16. Parallel Mining of Non-Redundant Recurrent Rules (pNR3) 2019.11.18. 16
  • 17. Revisiting Non-Redundant Recurrent Rules Miner (NR3) ๏‚ง Input: a sequence database ๐‘†๐‘’๐‘ž๐ท๐ต; thresholds min_sup, min_supall, min_conf ๏‚ง Output: Significant and non-redundant recurrent rules ๐‘…๐‘ข๐‘™๐‘’๐‘  ๏‚ง Procedure 1. ๐‘ƒ๐‘Ÿ๐‘’๐ถ๐‘œ๐‘›๐‘‘ โ‰” A pruned set of pre-conditions from ๐‘†๐‘’๐‘ž๐ท๐ต satisfying ๐‘š๐‘–๐‘› _๐‘ ๐‘ข๐‘ 2. foreach ๐‘๐‘Ÿ๐‘’ โˆˆ ๐‘ƒ๐‘Ÿ๐‘’๐ถ๐‘œ๐‘›๐‘‘ do 1. ๐‘†๐‘’๐‘ž๐ท๐ต๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ โ‰” ๐‘†๐‘’๐‘ž๐ท๐ต allโˆ’projected on ๐‘๐‘Ÿ๐‘’ 2. ๐‘๐‘กโ„Ž๐‘‘ โ‰” ๐‘š๐‘–๐‘› _๐‘๐‘œ๐‘›๐‘“ ร— ๐‘†๐‘’๐‘ž๐ท๐ต๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ 3. ๐‘ƒ๐‘œ๐‘ ๐‘ก๐ถ๐‘œ๐‘›๐‘‘ โ‰” A pruned set of post-conditions from ๐‘†๐‘’๐‘ž๐ท๐ต๐‘๐‘Ÿ๐‘’ ๐‘Ž๐‘™๐‘™ satisfying ๐‘๐‘กโ„Ž๐‘‘ 4. foreach ๐‘๐‘œ๐‘ ๐‘ก โˆˆ ๐‘ƒ๐‘œ๐‘ ๐‘ก๐ถ๐‘œ๐‘›๐‘‘ do 1. if ๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ ๐‘๐‘Ÿ๐‘’ ++๐‘๐‘œ๐‘ ๐‘ก, ๐‘†๐‘’๐‘ž๐ท๐ต โ‰ฅ ๐‘š๐‘–๐‘› _๐‘ ๐‘ข๐‘ ๐‘Ž๐‘™๐‘™ then 1. ๐‘…๐‘ข๐‘™๐‘’๐‘  = ๐‘…๐‘ข๐‘™๐‘’๐‘  โˆช ๐‘๐‘Ÿ๐‘’ โ†’ ๐‘๐‘œ๐‘ ๐‘ก 3. Remove remaining redundancy in ๐‘…๐‘ข๐‘™๐‘’๐‘  ๏‚ง Parallelization Strategy ๏‚ง 1. the single-producer-multiple-consumer framework ๏‚ง 2. the loop-level parallelization 2019.11.18. 17 a c b ac b a a b c ๐œ€ <a>โ†’<c,a,d> <a>โ†’<c,b,b> <a>โ†’<b> Rules <a,b>โ†’<c,d> hash table <a>โ†’<c,a,d> <a>โ†’<c,b,b> <a,b>โ†’<c,d> <a,b>โ†’<c,a> <a>โ†’<b> Rules <c,a,d> 1 2
  • 18. Parallel Non-Redundant Recurrent Rules Miner (pNR3) 2019.11.18. 18 a c b ac b a a b c GenPre task <a>โž<c,a,d> <a>โž<c,b,b> <a,b>โž<c,d> <a,b>โž<c,a> <a>โž<b> RulesThread pool GenRule[c,b] GenRule[c,b,c] GenRule[a,b] GenRule[a] task queue worker threads GenPre [1] GenRule[a] [2] GenRule[a,b] [N] <a>โž<c,a,d> <a>โž<c,b,b> <a>โž<b> Rules <a,b>โž<c,d> RemRedun task hash table Image UML
  • 19. Parallel Non-Redundant Recurrent Rules Miner (pNR3) 2019.11.18. 19 - pNR3 framework - GenPre task - GenRule task Source codes are available at https://bitbucket.org/sekilab/nr3
  • 20. Parallelization Effects of pNR3 ๏‚ง Let ๐‘ก ๐‘‡ be the runtime of a task ๐‘‡, ๐‘ be the number of available threads ๏‚ง NR3 : ๐‘ก ๐บ๐‘’๐‘›๐‘ƒ๐‘Ÿ๐‘’ + ๐‘ก ๐บ๐‘’๐‘›๐‘…๐‘ข๐‘™๐‘’ + ๐‘ก ๐‘…๐‘’๐‘š๐‘…๐‘’๐‘‘๐‘ข๐‘› ๏‚ง pNR3 : max ๐‘ก ๐บ๐‘’๐‘›๐‘ƒ๐‘Ÿ๐‘’, ๐‘ก ๐บ๐‘’๐‘›๐‘…๐‘ข๐‘™๐‘’/๐‘ + ๐‘ก ๐‘…๐‘’๐‘š๐‘…๐‘’๐‘‘๐‘ข๐‘› ๏‚ง GenPre Concurrency : max ๐‘ก ๐บ๐‘’๐‘›๐‘ƒ๐‘Ÿ๐‘’, ๐‘ก ๐บ๐‘’๐‘›๐‘…๐‘ข๐‘™๐‘’ + ๐‘ก ๐‘…๐‘’๐‘š๐‘…๐‘’๐‘‘๐‘ข๐‘› ๏‚ง GenRule Parallelization : ๐‘ก ๐บ๐‘’๐‘›๐‘ƒ๐‘Ÿ๐‘’ + ๐‘ก ๐บ๐‘’๐‘›๐‘…๐‘ข๐‘™๐‘’/๐‘ + ๐‘ก ๐‘…๐‘’๐‘š๐‘…๐‘’๐‘‘๐‘ข๐‘› 2019.11.18. 20 a c b ac b a a b c ๐œ€ <a>โ†’<c,a,d> <a>โ†’<c,b,b> <a,b>โ†’<c,d> <a,b>โ†’<c,a> <a>โ†’<b> Rules <a> <a, b> <c,a,d> <a>โ†’<c,a,d> <a>โ†’<c,b,b> <a>โ†’<b> Rules <a,b>โ†’<c,d> hash table GenRule par. (1/N) GenPre Concurrency (max func) RemRedun
  • 21. Experiment Environment
    - Dataset
      - D10C10N10R0.5 (IBM synthetic data generator): 9,678 sequences, average length 31.22
      - BMSWebView1 (a click stream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
    - Experiment machine
      - Intel Core i7-3610QM 2.30GHz (4 physical and 8 logical cores)
      - 8GB RAM
      - Microsoft Windows 7 Professional x64
    - Implementation
      - Java SE 8
      - Default JVM settings
  • 22. D10C10N10R0.5
    - Varying min_sup: $min\_sup$ = 0.5-0.9%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
    - Varying min_conf: $min\_sup$ = 0.5%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
    - Runtime model: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
    [Charts for each setting: candidate and rule sizes; runtime of NR3 and 2-, 4-, 8-pNR3; share of NR3 runtime per task]

    Runtime (s), varying min_sup (%)
                   0.5     0.6     0.7     0.8     0.9
      NR3          241     152      99      69      54
      2-pNR3       118      78      49      37      26
      4-pNR3        74      47      31      22      17
      8-pNR3        54      35      23      18      14

    NR3 runtime per task (s), varying min_sup (%)
                   0.5     0.6     0.7     0.8     0.9
      GenPre        34      22      15      11       8
      GenRule      206     130      83      57      46
      RemRedun       0       0       0       0       0
      Elapsed      241     152      99      69      54

    Sizes, varying min_sup (%)
                   0.5     0.6     0.7     0.8     0.9
      PreCond    21563   15013   11105    8917    7262
      RuleCand    3965    2418    1622    1258     956
      Rules       3912    2414    1621    1258     956

    Runtime (s), varying min_conf (%)
                    50      60      70      80      90
      NR3          241     184     176     170     167
      2-pNR3       119      92      88      85      83
      4-pNR3        74      56      50      52      52
      8-pNR3        54      47      46      45      45

    NR3 runtime per task (s), varying min_conf (%)
                    50      60      70      80      90
      GenPre        34      34      34      34      34
      GenRule      206     149     140     135     132
      RemRedun       0       0       0       0       0
      Elapsed      241     184     176     170     167

    Sizes, varying min_conf (%)
                    50      60      70      80      90
      PreCond    21563   21563   21563   21563   21563
      RuleCand    3965    1392     527     374     297
      Rules       3912    1372     519     368     294
  • 23. BMSWebView1
    - Varying min_sup: $min\_sup$ = 0.080-0.100%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
    - Varying min_conf: $min\_sup$ = 0.090%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
    - Runtime model: $\max(t_{GenPre},\ t_{GenRule}/N) + t_{RemRedun}$
    [Charts for each setting: candidate and rule sizes; runtime of NR3 and 2-, 4-, 8-pNR3 (log scale); share of NR3 runtime per task]

    Runtime (s), varying min_sup (%)
                 0.080   0.085   0.090   0.095   0.100
      NR3        43357   23729   12049    5063    2212
      2-pNR3     21440   11737    6100    2567    1034
      4-pNR3     12937    6839    3566    1550     618
      8-pNR3      9567    5261    2721    1118     450

    NR3 runtime per task (s), varying min_sup (%)
                 0.080   0.085   0.090   0.095   0.100
      GenPre        16      11       9       8       7
      GenRule    43340   23718   12039    5055    2204
      RemRedun       0       0       0       0       0
      Elapsed    43357   23729   12049    5063    2212

    Sizes, varying min_sup (%)
                 0.080   0.085   0.090   0.095   0.100
      PreCond     9476    7222    5734    4725    3981
      RuleCand    6413    3638    2333    1605    1147
      Rules       5976    3498    2260    1570    1139

    Runtime (s), varying min_conf (%)
                    50      60      70      80      90
      NR3        12049    1778     304     145     104
      2-pNR3      6100     932     157      72      50
      4-pNR3      3566     580      90      42      32
      8-pNR3      2721     400      69      32      22

    NR3 runtime per task (s), varying min_conf (%)
                    50      60      70      80      90
      GenPre         9       9       9       9      10
      GenRule    12039    1768     294     135      93
      RemRedun       0       0       0       0       0
      Elapsed    12049    1778     304     145     104

    Sizes, varying min_conf (%)
                    50      60      70      80      90
      PreCond     5734    5734    5734    5734    5734
      RuleCand    2333    1703    1173     685     288
      Rules       2260    1648    1123     645     268
  • 24. Loop Fused Mining of NR3 (LF-NR3)
  • 25. Simplifying the all-projection operation
    - Given the projected database $SeqDB_{pre}$,
    - the all-projected database $SeqDB^{all}_{pre}$ can be simplified:
      $SeqDB^{all}_{pre} = SeqDB_{pre} \cup (SeqDB_{pre})^{all}_{last(pre)}$
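The identity above can be checked on the program traces from the motivation slides. The sketch assumes the following reading of the definitions, which is ours: the projection keeps the suffix after the minimum prefix containing the pattern, and the all-projection keeps the suffix after every temporal point that holds the last event of the pattern and whose prefix contains the pattern. All function names are hypothetical.

```python
def contains(seq, pattern):
    """True if `pattern` is a subsequence of `seq`."""
    it = iter(seq)
    return all(event in it for event in pattern)


def project(seq_db, pattern):
    """Suffix after the minimum prefix containing `pattern`, per sequence."""
    out = []
    for i, seq in enumerate(seq_db):
        matched = 0
        for pos, event in enumerate(seq):
            if event == pattern[matched]:
                matched += 1
                if matched == len(pattern):
                    out.append((i, tuple(seq[pos + 1:])))
                    break
    return out


def project_all(seq_db, pattern):
    """Direct all-projection: one suffix per instance end point."""
    out = []
    for i, seq in enumerate(seq_db):
        for pos, event in enumerate(seq):
            if event == pattern[-1] and contains(seq[:pos + 1], pattern):
                out.append((i, tuple(seq[pos + 1:])))
    return out


def project_all_fused(seq_db, pattern):
    """All-projection derived from the projected database, as on the slide."""
    out = []
    for i, suffix in project(seq_db, pattern):
        out.append((i, suffix))                     # the SeqDB_pre part
        for pos, event in enumerate(suffix):        # all-projection on last(pre)
            if event == pattern[-1]:
                out.append((i, suffix[pos + 1:]))
    return out


TRACES = [t.split(", ") for t in (
    "check, lock, use, use, unlock, exit",
    "check, lock, use, check, lock, use, unlock, exit",
    "check, use, unlock, exit",
    "check, lock, use",
    "check, lock, use, unlock, check, lock, use, unlock, exit",
)]
```

The fused version only scans each projected suffix for a single event, whereas the direct version re-checks containment of the whole pattern at every candidate point.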
  • 26. Non-Redundant Recurrent Rules Miner (NR3)
  • 28. Data Structure Level Optimization for Projections
    - For each sequence $S_i$ in $SeqDB$ and a set $I$ of events,
    - a hash map $Map_i : I \to 2^{\{1, \dots, |S_i|\}}$
    - such that each key $e \in I$ is mapped to the set of temporal points at which event $e$ occurs in $S_i$
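A minimal sketch of such a per-sequence map is shown below. It stores each set of temporal points as a sorted list so that the next occurrence of an event can be found by binary search; that representation and the helper names are assumptions of this sketch, not details taken from the slides.

```python
from bisect import bisect_right
from collections import defaultdict


def build_position_map(seq):
    """Map each event to the sorted list of 1-based temporal points where it occurs."""
    positions = defaultdict(list)
    for point, event in enumerate(seq, start=1):
        positions[event].append(point)
    return positions


def next_occurrence(pos_map, event, after):
    """Smallest temporal point greater than `after` holding `event`, or None."""
    points = pos_map.get(event, [])
    k = bisect_right(points, after)
    return points[k] if k < len(points) else None


def min_prefix_end(pos_map, pattern):
    """Temporal point that ends the minimum prefix containing `pattern`, or None."""
    point = 0
    for event in pattern:
        point = next_occurrence(pos_map, event, point)
        if point is None:
            return None
    return point


trace = ["check", "lock", "use", "check", "lock", "use", "unlock", "exit"]
trace_map = build_position_map(trace)
```

With the map in place, locating the end of a projection costs one binary search per pattern event instead of a scan over the sequence, which matters mostly when sequences are long.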
  • 29. Experiment Environment
    - Dataset
      - D10C10N10R0.5 (IBM synthetic data generator): 9,678 sequences, average length 31.22
      - BMSWebView1 (a click stream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
    - Experiment machine
      - Intel Core i7-3610QM 2.30GHz (4 physical and 8 logical cores)
      - 8GB RAM
      - Microsoft Windows 7 Professional x64
    - Implementation
      - Java SE 8
      - Default JVM settings
  • 30. D10C10N10R0.5
    - (a)-(c): $min\_sup$ = 0.5-0.9%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
    - (d)-(f): $min\_sup$ = 0.5%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
  • 31. BMSWebView1
    - (a)-(c): $min\_sup$ = 0.100-0.120%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
    - (d)-(f): $min\_sup$ = 0.090%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
  • 32. Discussion
    - Computational complexity of the algorithms
      - $|I|^k \times |I|^k$ ($I$: the set of events, $k$: the length of the longest frequent pattern)
    - The effects of fusing loops in NR3
      - The foreach loop in the GenRule step is eliminated
      - The use of the intermediate data $SeqDB_{pre}$ simplifies the computation of
        - $SeqDB^{all}_{pre} = SeqDB_{pre} \cup (SeqDB_{pre})^{all}_{last(pre)}$
        - $sup^{all}(pre \to post, SeqDB) = sup^{all}(post, SeqDB_{pre})$
    - The effect of the hash-based data structure
      - It allows efficient computation of (all-)projected databases
      - It is not always efficient when the sequences are short
  • 33. Parallel Loop Fused Mining of NR3 (pLF-NR3)
  • 34. Loop-Fused NR3 (LF-NR3)
    - The task parallelism underlying the LF-NR3 algorithm can be exploited,
    - and it can be handled within the single-producer-multiple-consumer framework
  • 35. Parallel Loop Fused NR3 (pLF-NR3)
  • 36. Experiment Environment
    - Dataset
      - D10C10N10R0.5 (IBM synthetic data generator): 9,678 sequences, average length 31.22
      - BMSWebView1 (a click stream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
    - Experiment machine
      - Intel Core i7-3610QM 2.30GHz (4 physical and 8 logical cores)
      - 8GB RAM
      - Microsoft Windows 7 Professional x64
    - Implementation
      - Java SE 8
      - Default JVM settings
  • 37. D10C10N10R0.5
    - (a)-(c): $min\_sup$ = 0.5-0.9%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
    - (d)-(f): $min\_sup$ = 0.5%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
  • 38. BMSWebView1
    - (a)-(c): $min\_sup$ = 0.092-0.108%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
    - (d)-(f): $min\_sup$ = 0.092%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
  • 39. Bidirectional Mining of Non-Redundant Recurrent Rules (BOB), based on David Lo, Bolin Ding, Lucia, and Jiawei Han, ICDE 2011
  • 40. Additional Definitions
    - A sequence database $SeqDB$ is a set of sequences
    - A sequence $S = \langle e_1, e_2, \dots, e_n \rangle$
    - The $j$-suffix of $S$ is $\langle e_{n-j+1}, e_{n-j+2}, \dots, e_n \rangle$
    - $S'$ is the $j$-th minimum suffix of $S$ (with respect to a pattern $P$) if $S'$ is a suffix of $S$ and there is no suffix starting with $first(P)$ that is shorter than $S'$ and longer than the $(j-1)$-th minimum suffix
    - The $j$-th suf-projection of $SeqDB$ with regard to a pattern $P$
      - $SeqDB^{suf\text{-}j}_{P} = \{ (i, sx) \mid S_i = px \mathbin{+\!+} sx \in SeqDB,\ sx \text{ is the } j\text{-th minimum suffix of } S_i \text{ of } P \}$
    - $SeqDB$ pre-projected on $P$
      - $SeqDB^{pre}_{P} = \{ (i, px) \mid S_i = px \mathbin{+\!+} sx \in SeqDB,\ sx \text{ is the minimum suffix of } P \}$
  • 41. Anti-Monotonicity Property of Confidence
    - Proposition 1
      - Consider a rule $R$, in the form of $R_{pre} \to R_{post}$, and a sequence database $SeqDB$
      - $conf(R, SeqDB) = \dfrac{sup(R_{post}, SeqDB^{all}_{R_{pre}})}{sup^{all}(R_{pre}, SeqDB)} = \dfrac{sup^{all}(R_{pre}, SeqDB^{pre}_{R_{post}})}{sup^{all}(R_{pre}, SeqDB)}$
    - Proposition 2
      - Consider two rules $R$ and $R'$ in a sequence database $SeqDB$ with $R'_{pre} = R_{pre}$ and $R'_{post} = e \mathbin{+\!+} R_{post}$ for some event $e \in I$
      - $conf(R) \ge conf(R')$
    - Theorem (Anti-Monotonicity Property of Confidence)
      - Consider two rules $R$ and $R'$ in a sequence database $SeqDB$ with $R'_{pre} = R_{pre}$ and $R'_{post} = evs \mathbin{+\!+} R_{post}$, where $evs$ is an arbitrary series of events
      - $conf(R) \ge conf(R')$
      - If $R$ is not confident enough ($conf(R) < min\_conf$), then $R'$ is not either
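The first form of the confidence in Proposition 1 can be illustrated on the program traces from the motivation slides. This is a sketch under our reading of the definitions (the all-projection keeps the suffix after every temporal point that ends an instance of the pre-condition); the helper names are hypothetical, and the example checks the anti-monotonicity property on one case rather than proving it.

```python
from fractions import Fraction


def contains(seq, pattern):
    """True if `pattern` is a subsequence of `seq`."""
    it = iter(seq)
    return all(event in it for event in pattern)


def project_all(seq_db, pattern):
    """Suffix after every temporal point that ends an instance of `pattern`."""
    out = []
    for seq in seq_db:
        for pos, event in enumerate(seq):
            if event == pattern[-1] and contains(seq[:pos + 1], pattern):
                out.append(tuple(seq[pos + 1:]))
    return out


def confidence(pre, post, seq_db):
    """conf(pre -> post): share of all-projected suffixes that contain `post`."""
    suffixes = project_all(seq_db, pre)
    if not suffixes:
        return Fraction(0)
    hits = sum(1 for suffix in suffixes if contains(suffix, post))
    return Fraction(hits, len(suffixes))


TRACES = [t.split(", ") for t in (
    "check, lock, use, use, unlock, exit",
    "check, lock, use, check, lock, use, unlock, exit",
    "check, use, unlock, exit",
    "check, lock, use",
    "check, lock, use, unlock, check, lock, use, unlock, exit",
)]

conf_short = confidence(("lock",), ("unlock",), TRACES)          # 5/6
conf_long = confidence(("lock",), ("check", "unlock"), TRACES)   # 1/3
```

Prepending `check` to the post-condition lowers the confidence from 5/6 to 1/3, so once `<lock> -> <unlock>` falls below `min_conf`, its backward extensions need not be examined.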
  • 42. Pruning Redundant Post-Conds
    - In a sequence database $SeqDB$, consider a post-condition candidate $R_{post}$.
    - Lemma 1
      - If there is a post-condition candidate $R'_{post} \sqsupset R_{post}$ such that
        - (i) $R'_{post} = P_1 \mathbin{+\!+} e \mathbin{+\!+} P_2$ while $R_{post} = P_1 \mathbin{+\!+} P_2$, for some event $e$ and subsequences $P_1$, (nonempty) $P_2$
        - (ii) $SeqDB^{pre}_{R_{post}} = SeqDB^{pre}_{R'_{post}}$
      - then for any pre-condition candidate $pre$ and any backward extension $P \mathbin{+\!+} R_{post}$ of $R_{post}$, the rule $R = pre \to P \mathbin{+\!+} R_{post}$ is not confidence-closed
      - i.e., there exists another rule $R' \sqsupset R$ such that $conf(R) = conf(R')$
    - Lemma 2
      - If there is a post-condition candidate $R'_{post} \sqsupset R_{post}$ such that
        - (i) $R'_{post} = P_1 \mathbin{+\!+} e \mathbin{+\!+} P_2$ while $R_{post} = P_1 \mathbin{+\!+} P_2$, for some event $e$ and (nonempty) subsequences $P_1$, $P_2$
        - (iii) $\forall j : SeqDB^{suf\text{-}j}_{R_{post}} = SeqDB^{suf\text{-}j}_{R'_{post}}$, and
        - (iv) $\forall j : (SeqDB^{suf\text{-}j}_{R_{post}})^{all}_{R_{post}} = (SeqDB^{suf\text{-}j}_{R'_{post}})^{all}_{R'_{post}}$
      - then for any pre-condition candidate $pre$ and any backward extension $P \mathbin{+\!+} R_{post}$ of $R_{post}$, the rule $R = pre \to P \mathbin{+\!+} R_{post}$ is not support-closed
      - i.e., there exists another rule $R' \sqsupset R$ such that $sup(R) = sup(R')$ and $sup^{all}(R) = sup^{all}(R')$
    - Theorem (Pruning Redundant Post-Conds)
      - If the properties (i)-(iv) in Lemmas 1 and 2 are satisfied,
      - then for any pre-condition candidate $pre$ and any backward extension $P \mathbin{+\!+} R_{post}$ of $R_{post}$, the rule $R = pre \to P \mathbin{+\!+} R_{post}$ is redundant.
  • 43. Bidirectional Pruning-based Recurrent Rule Mining (BOB)
  • 44. Interleaved Bidirectional Mining of NR3 (iBiRM)
  • 45. Optimizing Operations
    - Given the sequence database $SeqDB$ and the rule $R = pre \to post$
      - $sup(R, SeqDB) = sup(post, SeqDB_{pre})$
      - $sup^{all}(R, SeqDB) = sup^{all}(post, SeqDB_{pre})$
    - Pruning the search space of $PRE$ early
      - for $R = pre \to post$ and $R' = pre \mathbin{+\!+} e \to post$,
      - if $sup(R, SeqDB) \le min\_sup$, then $sup(R', SeqDB) \le min\_sup$
      - if $sup^{all}(R, SeqDB) \le min\_sup^{all}$, then $sup^{all}(R', SeqDB) \le min\_sup^{all}$
    - Decreasing the number of database scans using a prefix tree
      - for each pre-condition $pre \in PRE$, suppose that a node $N_0 \in T_{POST}$ has the child nodes $N_1, \dots, N_k$
      - we can compute the instance supports of the child nodes $N_1, \dots, N_k$ by scanning $SeqDB$ once
      - When $N_0$ corresponds to a post-condition $post \in POST$, each child node $N_i$ corresponds to a post-condition $post_i = e_i \mathbin{+\!+} post$ for some event $e_i$, so the post-conditions of the child nodes share the suffix $post$.
      - When scanning a sequence $s \in SeqDB$, we record the positions of each $e_i$ and those of the events appearing in $post$, from which we can compute the number of instances of $pre \mathbin{+\!+} post_i$ in $s$
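The two support identities at the top of this slide can be checked numerically on the program traces from the motivation slides. The sketch assumes that the sequence support counts sequences containing a pattern and that the instance support counts the temporal points ending an instance of it; these readings and all names below are ours.

```python
def contains(seq, pattern):
    """True if `pattern` is a subsequence of `seq`."""
    it = iter(seq)
    return all(event in it for event in pattern)


def project(seq_db, pattern):
    """Suffix after the minimum prefix containing `pattern`, per sequence."""
    out = []
    for seq in seq_db:
        matched = 0
        for pos, event in enumerate(seq):
            if event == pattern[matched]:
                matched += 1
                if matched == len(pattern):
                    out.append(tuple(seq[pos + 1:]))
                    break
    return out


def sup(pattern, seqs):
    """Sequence support: number of sequences containing `pattern`."""
    return sum(1 for seq in seqs if contains(seq, pattern))


def sup_all(pattern, seqs):
    """Instance support: number of temporal points ending an instance of `pattern`."""
    return sum(
        1
        for seq in seqs
        for pos, event in enumerate(seq)
        if event == pattern[-1] and contains(seq[:pos + 1], pattern)
    )


TRACES = [t.split(", ") for t in (
    "check, lock, use, use, unlock, exit",
    "check, lock, use, check, lock, use, unlock, exit",
    "check, use, unlock, exit",
    "check, lock, use",
    "check, lock, use, unlock, check, lock, use, unlock, exit",
)]

PRE, POST = ("lock",), ("use", "unlock")
projected = project(TRACES, PRE)   # computed once per pre-condition
```

Because `projected` is computed once per pre-condition, every post-condition candidate can be evaluated against the shorter suffixes instead of the full database.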
  • 46. Bidirectional Pruning-based Recurrent Rule Mining (BOB)
  • 47. Interleaved Bidirectional Recurrent Rule Miner (iBiRM)
  • 48. Experiment Environment
    - Dataset
      - D5C20N10R0.5 (IBM synthetic data generator): 4,999 sequences, average length 64.39
      - BMSWebView1 (a click stream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
    - Experiment machine
      - Intel Core i5 2.50GHz
      - 8GB RAM
      - Microsoft Windows 7 Professional x64
    - Implementation
      - Java SE 8
      - Default JVM settings
  • 49. D5C20N10R0.5
    - (a)-(c): $min\_sup$ = 2.0-2.8%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
    - (d)-(f): $min\_sup$ = 2.4%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
  • 50. BMSWebView1
    - (a)-(c): $min\_sup$ = 0.092-0.108%, $min\_conf$ = 50%, $min\_sup^{all}$ = 1
    - (d)-(f): $min\_sup$ = 0.092%, $min\_conf$ = 50-90%, $min\_sup^{all}$ = 1
  • 52. Conclusion & Future Works
    - Conclusion
      - We have proposed the Parallel Non-Redundant Recurrent Rules Miner (pNR3)
      - We have proposed the Loop-Fused Non-Redundant Recurrent Rules Miner (LF-NR3)
      - We have proposed the Parallel Loop-Fused Non-Redundant Recurrent Rules Miner (pLF-NR3)
      - We have proposed the Interleaved Bidirectional Non-Redundant Recurrent Rules Miner (iBiRM)
    - Future works
      - Improvement of the sequential recurrent rule mining algorithm
      - Improvement of the parallel algorithms
    - Source code is available at https://bitbucket.org/sekilab/nr3

Editor's Notes

  1. Good morning, everyone. I am Yoon SeungYong, a student at Nagoya Institute of Technology. Seki Hirohisa is my advisor and participated in this research. I would now like to introduce my research, 'Parallel Mining of Non-Redundant Recurrent Rules from a Sequence Database'.
  2. First, I will talk about the motivation for this research and introduce recurrent rules and the NR3 algorithm, which is the basis of this research. Then I will present our algorithm for parallel mining of recurrent rules, pNR3, and show its effectiveness based on experimental results.
  3. Our motivation for the research.
  4. First, I will talk about sequence databases and sequential rules. One example of a sequence database is transaction histories. For instance, Alice rented Star Wars 4, 5, and 6, and then Star Wars 1, following the release order. Another example is program traces. From these databases, we can infer rules such as <Star Wars 4> then <Star Wars 5>, and <lock> then <unlock>.
  5. But why recurrent rules? Because a recurrent rule captures temporal constraints both within a sequence and across multiple sequences. Recall the previous examples. In the transaction histories, we rarely care how many times a customer rents the same video. But in the program traces, we have to consider how many times a series of commands has been executed. This is the reason recurrent rules were proposed. In addition, mined recurrent rules can be directly converted into Linear Temporal Logic, the most widely used formalism for program verification. For more details, refer to a favorite textbook, Model Checking.
  6. Now I will introduce the mining of recurrent rules and the NR3 algorithm.
  7. We first define some terminology. A sequence database is a set of sequences. A sequence is a series of events. In a sequence, we call the position of each event a temporal point. We refer to the first j events as the j-prefix of the sequence.
  8. Next, we define some operations on sequences. This is the concatenation of S and S'. We say S is a super-sequence of S' if S contains S'. A matched prefix is called an instance, and the shortest one is the minimum instance.
  9. Next, we define operations on a database. When a database is projected on a sequence P, for each sequence that contains P, the longest remaining part goes into the projected database; this is a well-known operation. When a database is all-projected on P, for each sequence that contains P, all of the remaining parts go into the all-projected database. We call the number of such sequences the support; specifically, the sequence support is defined via projection, and the instance support via all-projection.
  10. We define a recurrent rule R as pre then post. The supports are almost the same as those we defined previously. The confidence has a special form: intuitively, it measures how many sequences in the database all-projected on pre contain post. We say a rule is significant if its supports and confidence are above the thresholds.
  11. Next, we define the notion of rule redundancy. Consider these two rules. R contains R', and they have the same support and confidence. This means that if a sequence contains R, then it also contains R'. We do not need to mine all of these rules, so we prune some of them. We define a rule to be redundant if there is another, longer rule that has the same support and confidence. This is processed using BIDE, a well-known frequent closed sequence miner.
  12. Now I will introduce the Non-Redundant Recurrent Rules Miner, NR3, the work of David Lo and others. NR3 receives a sequence database and three thresholds, and emits significant and non-redundant recurrent rules. It first generates the pre-condition candidates using BIDE, which consists of recursions. We call this step GenPre. Next, by looping over the candidate pre-conditions, it generates the post-condition candidates and generates rules. We call this step GenRule, and in this step we obtain the significant rules. Finally, we remove the remaining redundant rules using hash tables that use the supports and confidence as a key. We call this step RemRedun.
  13. Now I will present our algorithm for parallel mining of recurrent rules, pNR3.
  14. Let's review the previous work. First, as soon as the GenPre task finds one pre-condition candidate, we can handle a GenRule task immediately. We call this strategy the single-producer-multiple-consumer framework, because the GenRule tasks can be consumed as the GenPre task produces each pre-condition. Second, we can handle the GenRule tasks concurrently. We call this strategy the loop-level parallelization.
  15. This is our algorithm, the Parallel Non-Redundant Recurrent Rules Miner, pNR3. The pNR3 instance starts to mine pre-conditions. The GenPre task then emits GenRule tasks for the pre-conditions it finds and pushes them into the thread pool. The thread pool handles these GenRule tasks, and the tasks collect significant rules. Finally, the RemRedun instance removes redundant rules.
  16. This is our Java implementation. It works as I explained. The source code is available at our Bitbucket repository.
  17. I will discuss the effect of parallelization. We used two strategies: GenPre concurrency, which is the single-producer-multiple-consumer framework, and GenRule parallelization, which is the loop-level parallelization. GenPre concurrency acts as a maximum function over GenPre and GenRule, because the longer task determines the total runtime. GenRule parallelization acts as a divider function, because the available threads can each handle a GenRule task. As a result, the runtime of our pNR3 is the maximum of GenPre and GenRule divided by N, plus RemRedun. We will check this discussion against the experimental results.
  18. I will explain the experiment environment. We used two well-known datasets: one is a synthetic dataset and the other is a real dataset. We implemented NR3 and pNR3 in Java 8 and ran them on a common Core i7 machine with 4 physical cores.
  19. This is the experimental result on the synthetic dataset. The upper part shows the results when the minimum support changes, and the lower part when the confidence changes. The first chart shows the runtime of the algorithms, NR3 and pNR3 on 2, 4, and 8 threads; the second shows the ratio of each task in NR3; and the third shows the sizes of the pre-condition candidates and rules. As we discussed before, the runtime of our parallel algorithm is the maximum of GenPre and GenRule divided by N, plus RemRedun. In NR3, GenPre takes about 20% of the runtime, and RemRedun is negligible on this dataset. So if the runtime of our parallel algorithm drops to about 20% of that of NR3 on this dataset, we can say our algorithm is effective. As the results show, the runtime of 8-pNR3 is about 20% of that of NR3, so we can say our algorithm is very effective.
  20. This is the experimental result on the real-world dataset. The upper part shows the results when the minimum support changes, and the lower part when the confidence changes. The first chart shows the runtime of the algorithms, NR3 and pNR3 on 2, 4, and 8 threads; the second shows the ratio of each task in NR3; and the third shows the sizes of the pre-condition candidates and rules. As we discussed before, the runtime of our parallel algorithm is the maximum of GenPre and GenRule divided by N, plus RemRedun. In NR3, GenRule takes almost 100% of the runtime, and GenPre and RemRedun are negligible on this dataset. So if the runtime of our parallel algorithm decreases as we increase the number of threads, we can say our algorithm is effective. As the results show, the runtime of 4-pNR3 is about 30% of that of NR3, and that of 8-pNR3 is about 20%, so we can say our algorithm is effective, even when we take into account some overhead due to parallelization.
  36. Now, finally, I will conclude.
  37. We have proposed the Parallel Non-Redundant Recurrent Rules Miner, pNR3. It uses two strategies: the single-producer-multiple-consumer framework and the loop-level parallelism. We showed the effectiveness of our algorithm based on experiments on synthetic and real datasets. As future work, we will run experiments on program traces, which are the intended application of the rules. We will run experiments on many-core processors to measure the effects more accurately. Also, using a larger memory, we will compare our algorithm to BOB, the successor of NR3. We are now working on improving the sequential recurrent rule mining algorithms. You can refer to our implementation in this repository. That is all for my presentation. Thank you for listening. Do you have any questions?