Mining Non-Redundant Recurrent Rules from a Sequence Database
1. Mining Non-Redundant Recurrent Rules from a Sequence Database
Yoon SeungYong
Ministry of Science and ICT, Republic of Korea
forcom@forcom.kr
- Efficient Mining of Recurrent Rules from a Sequence Database (Lo et al., DASFAA 2008)
- Parallel Mining of Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, ISIS 2017)
- A Parallel Algorithm for Mining Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, JACIII 2019)
- Towards Efficient Mining of Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, IWCIA 2017)
- Mining Non-Redundant Recurrent Rules from a Sequence Database (Yoon and Seki, IJCISTUDIES 2018)
- Efficient Mining of Recurrent Rules from a Sequence Database Using Multi-Core Processors (Yoon and Seki, SCIS&ISIS 2018)
- Bidirectional Mining of Non-Redundant Recurrent Rules from a Sequence Database (Lo et al., IEEE ICDE 2011)
- A New Algorithm for Mining Recurrent Rules from a Sequence Database (Seki and Yoon, IEEE SMC 2019)
2. Table of Contents
1. Motivation
2. Mining Non-Redundant Recurrent Rules (NR3) by Lo et al.
3. Parallel Mining of Non-Redundant Recurrent Rules (pNR3)
4. Loop-Fused Mining of NR3 (LF-NR3)
5. Parallel Loop-Fused Mining of NR3 (pLF-NR3)
6. Bidirectional Mining of NR3 (BOB) by Lo et al.
7. Interleaved Bidirectional Mining of NR3 (iBiRM)
8. Conclusion
2019.11.18. 2
4. Sequence Database & Sequential Rule
- Transaction Histories
- Program Traces
Customer Movie Rental History
Alice Star Wars 4, Star Wars 5, Star Wars 6, Star Wars 1
Bob Shrek, Spirited Away, Your Name
Clara Spirited Away, Howl's Moving Castle, Princess Mononoke
David Star Wars 1, Star Wars 2, Star Wars 3, Star Wars 4, Star Wars 5
Eve Your Name
Trace ID Command
1 check, lock, use, use, unlock, exit
2 check, lock, use, check, lock, use, unlock, exit
3 check, use, unlock, exit
4 check, lock, use
5 check, lock, use, unlock, check, lock, use, unlock, exit
⟨Star Wars 4⟩ ⇒ ⟨Star Wars 5⟩
⟨lock⟩ ⇒ ⟨unlock⟩
5. What is a recurrent rule?
- Recurrent Rule R = pre ⇒ post
  - "Whenever a series of precedent events occurs, eventually another series of consequent events occurs"
  - e.g., R = ⟨check, lock⟩ ⇒ ⟨use, unlock⟩: "Whenever ⟨check, lock⟩ occurs, eventually ⟨use, unlock⟩ occurs"
  - Captures temporal constraints that repeat a meaningful number of times, both within a sequence and across multiple sequences
- A sequential rule R = pre ⇒ post means "whenever a sequence is a super-sequence of pre, it will be a super-sequence of pre ++ post"
- Linear Temporal Logic (LTL)
  - One of the most widely used formalisms for program verification
  - Clarke, Edmund M., Orna Grumberg, and Doron Peled. Model Checking. MIT Press, 1999.
  - A recurrent rule can be expressed in the form of LTL
- proposed by David Lo
6. Mining Non-Redundant Recurrent Rules (NR3)
based on David Lo, Siau-Cheng Khoo (NUS), and Chao Liu, DASFAA 2008
7. Preliminaries & Examples (1)
- a sequence database SeqDB: a set of sequences {S1, S2, S3, S4, S5}
- the set of events I in SeqDB: {check, exit, lock, unlock, use}
- the size of SeqDB = |SeqDB|: |SeqDB| = 5
- a sequence S = ⟨e1, e2, …, en⟩: S1 = ⟨check, lock, use, use, unlock, exit⟩
- a temporal point j of ej in S: the event at temporal point 5 in S1 is unlock
- the length of S = |S| = n: |S1| = 6
- the last event of S = last(S) = S[n]: last(S1) = exit
- the j-prefix of S = S^j = ⟨e1, e2, …, ej⟩: S1^2 = ⟨check, lock⟩
SID Sequence
S1 ⟨check, lock, use, use, unlock, exit⟩
S2 ⟨check, lock, use, check, lock, use, unlock, exit⟩
S3 ⟨check, use, unlock, exit⟩
S4 ⟨check, lock, use⟩
S5 ⟨check, lock, use, unlock, check, lock, use, unlock, exit⟩
an example sequence database SeqDB
8. Preliminaries & Examples (2)
- Given sequences S = ⟨e1, …, en⟩ and S' = ⟨e1', …, em'⟩
- the concatenation of S and S': S ++ S' = ⟨e1, …, en, e1', …, em'⟩
- S is a super-sequence of S' (S ⊒ S') if e_{i1} = e1', …, e_{im} = em' for some 1 ≤ i1 < ⋯ < im ≤ n
  - e.g., S1 ⊒ ⟨check, lock, unlock⟩
- S^j is an instance of S' in S if S^j ⊒ S' and last(S') = S[j]
- S^j is the minimum instance of S' in S, if S^j is an instance of S' and there is no i < j such that S^i is an instance of S'
  - e.g., S1^3 and S1^4 are instances of ⟨check, lock, use⟩ in S1, and S1^3 is the minimum
  - S5^9 is an instance of S1 in S5, and it is the minimum
SID Sequence
S1 ⟨check, lock, use, use, unlock, exit⟩
S2 ⟨check, lock, use, check, lock, use, unlock, exit⟩
S3 ⟨check, use, unlock, exit⟩
S4 ⟨check, lock, use⟩
S5 ⟨check, lock, use, unlock, check, lock, use, unlock, exit⟩
S1 = ⟨check, lock, use, use, unlock, exit⟩
an example sequence database SeqDB
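The operations above can be illustrated with a minimal Java sketch (class and method names are ours, not from the paper): the leftmost greedy embedding of a pattern ends exactly at its minimum instance point.

```java
import java.util.Arrays;
import java.util.List;

public class SeqOps {
    // S is a super-sequence of T (S ⊒ T): T embeds into S left-to-right.
    public static boolean isSuperSequence(List<String> s, List<String> t) {
        int i = 0;
        for (String e : s) {
            if (i < t.size() && e.equals(t.get(i))) i++;
        }
        return i == t.size();
    }

    // Temporal point (1-based) of the minimum instance of pattern p in s:
    // the smallest j such that s^j ⊒ p and s[j] = last(p); -1 if none.
    public static int minimumInstancePoint(List<String> s, List<String> p) {
        int i = 0;
        for (int j = 0; j < s.size(); j++) {
            if (i < p.size() && s.get(j).equals(p.get(i))) {
                i++;
                if (i == p.size()) return j + 1;  // 1-based temporal point
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> s1 = Arrays.asList("check", "lock", "use", "use", "unlock", "exit");
        System.out.println(isSuperSequence(s1, Arrays.asList("check", "lock", "unlock")));
        System.out.println(minimumInstancePoint(s1, Arrays.asList("check", "lock", "use")));
    }
}
```

On S1, ⟨check, lock, unlock⟩ embeds, and the minimum instance of ⟨check, lock, use⟩ ends at temporal point 3, matching the examples above.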
11. Rule Redundancy
- Consider R = ⟨check⟩ ⇒ ⟨lock, use, unlock⟩ and R' = ⟨check⟩ ⇒ ⟨unlock⟩ with the same sequence/instance support and confidence
  - Do we really need both of these rules?
- Rule Redundancy
  - A rule R' = pre' ⇒ post' is redundant if there is another rule R = pre ⇒ post such that
    1. R and R' have the same sequence/instance support and confidence
    2. pre ++ post ⊒ pre' ++ post' (R is longer than R')
- Mining Non-Redundant Recurrent Rules
  - Mine pruned pre-/post-conditions using a modified BIDE (the LS-Set miner)
  - BIDE: a frequent closed sequence mining algorithm based on the pattern-growth strategy
    - Wang, Jianyong, and Jiawei Han. "BIDE: Efficient Mining of Frequent Closed Sequences." Proceedings of the 20th International Conference on Data Engineering (ICDE), IEEE, 2004.
P = ⟨check, lock, use, unlock⟩
12. FS-Set, CS-Set, LS-Set
- The set of frequent sequential patterns (FS-Set)
  - FS = {P | support(P) ≥ min_sup}
- The set of closed frequent sequential patterns (CS-Set)
  - CS = {P | P ∈ FS and ∄P' ∈ FS such that P ⊏ P' and support(P) = support(P')}
- The projected-database closed set (LS-Set)
  - LS = {P | support(P) ≥ min_sup and ∄P' such that P ⊏ P' and SeqDB_P = SeqDB_P'}
  - cf. SeqDB_P = SeqDB_P' ⇒ |SeqDB_P| = |SeqDB_P'|
- Xifeng Yan, Jiawei Han, and Ramin Afshar, "CloSpan: Mining Closed Sequential Patterns in Large Datasets", SIAM SDM 2003
13. Pruning Redundant Pre-Conds
- In a sequence database SeqDB, consider a pre-condition candidate pre.
- If there is a pre-condition candidate pre' ⊐ pre such that
  - (i) pre' = P1 ++ ⟨e⟩ ++ P2 while pre = P1 ++ P2, for some event e and nonempty P1, P2
  - (ii) SeqDB_pre = SeqDB_pre'
- then, for any post-condition candidate post and any forward extension pre ++ P of pre,
  the rule pre ++ P ⇒ post is redundant
14. LS-Set BIDE
Backward-extension event checking is omitted from the original BIDE algorithm
- David Lo, Siau-Cheng Khoo, and Chao Liu, "Mining Recurrent Rules from a Sequence Database", Technical Report TR12/07, NUS
15. Non-Redundant Recurrent Rules Miner (NR3)
- Input: a sequence database SeqDB; thresholds min_sup, min_sup_all, min_conf
- Output: significant and non-redundant recurrent rules Rules
- Procedure
  1. preCond ← a pruned set of pre-conditions from SeqDB satisfying min_sup
  2. foreach pre ∈ preCond do
     2.1. SeqDB^all_pre ← SeqDB all-projected on pre
     2.2. pthd ← min_conf × |SeqDB^all_pre|
     2.3. postCond ← a pruned set of post-conditions from SeqDB^all_pre satisfying pthd
     2.4. foreach post ∈ postCond do
          if sup_all(pre ++ post, SeqDB) ≥ min_sup_all then Rules ← Rules ∪ {pre ⇒ post}
  3. Remove remaining redundancy in Rules
- Aliases for the tasks
  - Procedure line 1: the GenPre task
  - Procedure lines 2.1-2.4: the GenRule task
  - Procedure line 3: the RemRedun task
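The overall flow of the procedure can be sketched in Java in a drastically simplified form: pre- and post-conditions are restricted to single events, the instance-support threshold is applied to pre alone, and the BIDE-style pruning and the RemRedun step are omitted. All identifiers are illustrative, not from the original implementation.

```java
import java.util.*;

public class Nr3Sketch {
    // SeqDB all-projected on a single-event pre: the suffix after every occurrence.
    public static List<List<String>> allProject(List<List<String>> db, String pre) {
        List<List<String>> proj = new ArrayList<>();
        for (List<String> s : db)
            for (int j = 0; j < s.size(); j++)
                if (s.get(j).equals(pre))
                    proj.add(s.subList(j + 1, s.size()));
        return proj;
    }

    // conf(pre => post) = sup(post, SeqDB^all_pre) / |SeqDB^all_pre|.
    public static double confidence(List<List<String>> db, String pre, String post) {
        List<List<String>> proj = allProject(db, pre);
        if (proj.isEmpty()) return 0.0;
        long hit = proj.stream().filter(s -> s.contains(post)).count();
        return (double) hit / proj.size();
    }

    // GenPre / GenRule, restricted to single-event candidates.
    public static Set<String> mine(List<List<String>> db, int minSupAll, double minConf) {
        Set<String> events = new TreeSet<>();
        db.forEach(events::addAll);
        Set<String> rules = new TreeSet<>();
        for (String pre : events) {
            if (allProject(db, pre).size() < minSupAll) continue;   // GenPre threshold
            for (String post : events)                              // GenRule step
                if (confidence(db, pre, post) >= minConf)
                    rules.add(pre + " => " + post);
        }
        return rules;
    }

    public static void main(String[] args) {
        List<List<String>> db = Arrays.asList(
            Arrays.asList("check", "lock", "use", "use", "unlock", "exit"),
            Arrays.asList("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
            Arrays.asList("check", "use", "unlock", "exit"),
            Arrays.asList("check", "lock", "use"),
            Arrays.asList("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"));
        System.out.println(mine(db, 5, 0.8));
    }
}
```

On the example trace database, ⟨lock⟩ ⇒ ⟨unlock⟩ is found with confidence 5/6: lock occurs six times across the sequences, and five of the six suffixes after it contain unlock.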
[Figure residue: an example prefix tree over events a, b, c, and the RemRedun hash table bucketing rules such as ⟨a⟩ ⇒ ⟨c,a,d⟩, ⟨a⟩ ⇒ ⟨c,b,b⟩, ⟨a,b⟩ ⇒ ⟨c,d⟩, ⟨a,b⟩ ⇒ ⟨c,a⟩, and ⟨a⟩ ⇒ ⟨b⟩ by their supports and confidence]
28. Data Structure Level Optimization for Projections
- For each sequence Si in SeqDB and the set I of events,
  - a hash map pos_i : I → 2^{1,…,|Si|}
  - such that each key e ∈ I is mapped to the set of temporal points at which event e occurs in Si
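A minimal sketch of this index in Java, assuming events are strings and temporal points are 1-based (names are ours): with a sorted set of points per event, the first occurrence of an event after a given temporal point becomes a logarithmic-time lookup instead of a linear scan, which is what the projection operations need.

```java
import java.util.*;

public class PosIndex {
    // Build pos_i: each event maps to the sorted set of its temporal points in seq.
    public static Map<String, TreeSet<Integer>> build(List<String> seq) {
        Map<String, TreeSet<Integer>> pos = new HashMap<>();
        for (int j = 0; j < seq.size(); j++)            // temporal points are 1-based
            pos.computeIfAbsent(seq.get(j), k -> new TreeSet<>()).add(j + 1);
        return pos;
    }

    // First temporal point of event e strictly after point p, or -1 if none.
    public static int firstAfter(Map<String, TreeSet<Integer>> pos, String e, int p) {
        TreeSet<Integer> pts = pos.get(e);
        if (pts == null) return -1;
        Integer q = pts.higher(p);
        return q == null ? -1 : q;
    }

    public static void main(String[] args) {
        List<String> s2 = Arrays.asList("check", "lock", "use", "check", "lock", "use", "unlock", "exit");
        Map<String, TreeSet<Integer>> pos = build(s2);
        System.out.println(pos.get("lock"));            // temporal points of lock in S2
        System.out.println(firstAfter(pos, "lock", 2)); // first lock after point 2
    }
}
```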
29. Experiment Environment
- Dataset
  - D10C10N10R0.5 (IBM synthetic data generator): 9,678 sequences, average length 31.22
  - BMSWebView1 (a clickstream dataset (Gazelle) from KDD Cup 2000): 59,601 sequences, average length 2.42
- Experiment machine
  - Intel Core i7-3610QM 2.30 GHz (4 physical cores, 8 logical cores)
  - 8 GB RAM
  - Microsoft Windows 7 Professional x64
- Implementation
  - Java SE 8
  - Default JVM settings
32. Discussion
- Computational complexity of the algorithms
  - O(|I|^k × |I|^k) (I: the set of events, k: the length of the longest frequent pattern)
- The effects of fusing loops in NR3
  - The foreach loop in the GenRule step is eliminated
  - The use of the intermediate data SeqDB_pre simplifies the computation of
    - SeqDB^all_pre = SeqDB_pre ∪ (SeqDB_pre)^all_pre
    - sup_all(pre ⇒ post, SeqDB) = sup_all(post, SeqDB^all_pre)
- The effect of the hash-based data structure
  - Efficient computation of (all-)projected databases
  - Using the hash-based data structure is not always efficient when the sequences are short
34. Loop-Fused NR3 (LF-NR3)
It is possible to exploit the task parallelism underlying the LF-NR3 algorithm, which can be handled within the single-producer-multiple-consumer framework.
40. Additional Definitions
- a sequence database SeqDB: a set of sequences
- a sequence S = ⟨e1, e2, …, en⟩
- the j-suffix of S = ⟨e_{n−j+1}, e_{n−j+2}, …, en⟩
- sx is the j-th minimum suffix of S with respect to a pattern P, if sx is a suffix of S starting with first(P), and no suffix starting with first(P) is shorter than sx yet longer than the (j−1)-th minimum suffix
- the j-th suf-projection of SeqDB with regard to a pattern P
  - SeqDB^{suf-j}_P = {(i, sx) | Si = px ++ sx ∈ SeqDB, sx is the j-th minimum suffix of Si with respect to P}
- SeqDB pre-projected on P
  - SeqDB^pre_P = {(i, px) | Si = px ++ sx ∈ SeqDB, sx is the 1st minimum suffix of Si with respect to P}
41. Anti-Monotonicity Property of Confidence
- Proposition 1
  - Consider a rule R of the form pre ⇒ post and a sequence database SeqDB
  - conf(R, SeqDB) = sup(post, SeqDB^all_pre) / sup_all(pre, SeqDB) = sup_all(pre, SeqDB^pre_post) / sup_all(pre, SeqDB)
- Proposition 2
  - Consider two rules R and R' in a sequence database SeqDB with pre' = pre and post' = ⟨e⟩ ++ post for some event e ∈ I
  - conf(R) ≥ conf(R')
- Theorem (Anti-Monotonicity Property of Confidence)
  - Consider two rules R and R' in a sequence database SeqDB with pre' = pre and post' = evs ++ post, where evs is an arbitrary series of events
  - conf(R) ≥ conf(R')
  - If R is not confident enough (conf(R) < min_conf), R' is not either
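The theorem can be checked numerically on the example trace database: prepending events to the post-condition can only keep or lower the confidence. The sketch below (our names; pre restricted to a single event for brevity) computes conf(pre ⇒ post) as sup(post, SeqDB^all_pre) / |SeqDB^all_pre|.

```java
import java.util.*;

public class ConfDemo {
    // t embeds into s left-to-right (s is a super-sequence of t).
    public static boolean superSeq(List<String> s, List<String> t) {
        int i = 0;
        for (String e : s) if (i < t.size() && e.equals(t.get(i))) i++;
        return i == t.size();
    }

    // conf(pre => post) over the all-projected database on a single-event pre.
    public static double conf(List<List<String>> db, String pre, List<String> post) {
        List<List<String>> proj = new ArrayList<>();
        for (List<String> s : db)
            for (int j = 0; j < s.size(); j++)
                if (s.get(j).equals(pre)) proj.add(s.subList(j + 1, s.size()));
        if (proj.isEmpty()) return 0.0;
        long hit = proj.stream().filter(s -> superSeq(s, post)).count();
        return (double) hit / proj.size();
    }

    public static void main(String[] args) {
        List<List<String>> db = Arrays.asList(
            Arrays.asList("check", "lock", "use", "use", "unlock", "exit"),
            Arrays.asList("check", "lock", "use", "check", "lock", "use", "unlock", "exit"),
            Arrays.asList("check", "use", "unlock", "exit"),
            Arrays.asList("check", "lock", "use"),
            Arrays.asList("check", "lock", "use", "unlock", "check", "lock", "use", "unlock", "exit"));
        System.out.println(conf(db, "lock", Arrays.asList("unlock")));
        System.out.println(conf(db, "lock", Arrays.asList("use", "unlock")));
    }
}
```

Here conf(⟨lock⟩ ⇒ ⟨unlock⟩) = 5/6 and conf(⟨lock⟩ ⇒ ⟨use, unlock⟩) = 5/6 as well, consistent with conf(R) ≥ conf(R').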
42. Pruning Redundant Post-Conds
- In a sequence database SeqDB, consider a post-condition candidate post.
- Lemma 1
  - If there is a post-condition candidate post' ⊐ post such that
    - (i) post' = P1 ++ ⟨e⟩ ++ P2 while post = P1 ++ P2, for some event e and subsequences P1, (nonempty) P2
    - (ii) SeqDB^pre_post = SeqDB^pre_post'
  - then for any pre-condition candidate pre and any backward extension P ++ post of post, the rule R = pre ⇒ P ++ post is not confidence-closed
    - i.e., there exists another rule R' ⊐ R such that conf(R) = conf(R')
- Lemma 2
  - If there is a post-condition candidate post' ⊐ post such that
    - (i) post' = P1 ++ ⟨e⟩ ++ P2 while post = P1 ++ P2, for some event e and subsequences (nonempty) P1, P2
    - (iii) ∀j: SeqDB^{suf-j}_post = SeqDB^{suf-j}_post', and
    - (iv) ∀j: (SeqDB^{suf-j}_post)^pre_post = (SeqDB^{suf-j}_post')^pre_post'
  - then for any pre-condition candidate pre and any backward extension P ++ post of post, the rule R = pre ⇒ P ++ post is not support-closed
    - i.e., there exists another rule R' ⊐ R such that sup(R) = sup(R') and sup_all(R) = sup_all(R')
- Theorem (Pruning Redundant Post-Conds)
  - If properties (i)-(iv) of Lemmas 1 and 2 are satisfied, then for any pre-condition candidate pre and any backward extension P ++ post of post, the rule R = pre ⇒ P ++ post is redundant.
45. Optimizing Operations
- Given a sequence database SeqDB and a rule R = pre ⇒ post:
  - sup(R, SeqDB) = sup(post, SeqDB^all_pre)
  - sup_all(R, SeqDB) = sup_all(post, SeqDB^all_pre)
- Pruning the search space of PRE early
  - for R = pre ⇒ post and R' = pre ++ P ⇒ post,
    - if sup(R, SeqDB) ≤ min_sup, then sup(R', SeqDB) ≤ min_sup
    - if sup_all(R, SeqDB) ≤ min_sup_all, then sup_all(R', SeqDB) ≤ min_sup_all
- Decreasing the number of database scans using a prefix tree
  - for each pre-condition pre ∈ PRE, suppose that a node n0 of the prefix tree has children n1, …, nk
  - we can compute the instance supports of the children n1, …, nk by scanning SeqDB once
  - when n0 corresponds to a post-condition post ∈ POST, each child node ni corresponds to a post-condition post_i = ⟨e_i⟩ ++ post for some event e_i, so the post-condition of each child node has the suffix post in common
  - when scanning a sequence S ∈ SeqDB, we record the positions of each e_i and of the events appearing in post, from which we can compute the number of instances of pre ++ post_i in S
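The position-recording idea can be sketched as follows (illustrative names; for clarity each child pattern is counted from the sequence directly rather than from one shared scan). An instance of a pattern P at temporal point j requires S[j] = last(P) with the earlier events of P embedded before j, so the instance count equals the number of occurrences of last(P) after the leftmost embedding of P minus its last event.

```java
import java.util.*;

public class ChildScan {
    // Number of temporal points j with seq^j ⊒ p and seq[j] = last(p).
    public static int countInstances(List<String> seq, List<String> p) {
        int i = 0, q = 0;                      // embed p[0..m-2] greedily; q = end point (1-based)
        for (int j = 0; j < seq.size() && i < p.size() - 1; j++)
            if (seq.get(j).equals(p.get(i))) { i++; q = j + 1; }
        if (i < p.size() - 1) return 0;        // the prefix of p does not embed
        String last = p.get(p.size() - 1);
        int count = 0;
        for (int j = q; j < seq.size(); j++)   // occurrences of last(p) after point q
            if (seq.get(j).equals(last)) count++;
        return count;
    }

    // Instance counts in seq of every child pattern ⟨e⟩ ++ post sharing the suffix post.
    public static Map<String, Integer> childCounts(List<String> seq, List<String> post,
                                                   List<String> events) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String e : events) {
            List<String> child = new ArrayList<>();
            child.add(e);
            child.addAll(post);
            out.put(e, countInstances(seq, child));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> s1 = Arrays.asList("check", "lock", "use", "use", "unlock", "exit");
        System.out.println(countInstances(s1, Arrays.asList("check", "lock", "use")));
        System.out.println(childCounts(s1, Arrays.asList("unlock"), Arrays.asList("lock", "use")));
    }
}
```

On S1 this reproduces the instance count of ⟨check, lock, use⟩ (two instances, S1^3 and S1^4) from slide 8.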
52. Conclusion & Future Work
- Conclusion
  - We have proposed the Parallel Non-Redundant Recurrent Rules Miner (pNR3)
  - We have proposed the Loop-Fused Non-Redundant Recurrent Rules Miner (LF-NR3)
  - We have proposed the Parallel Loop-Fused Non-Redundant Recurrent Rules Miner (pLF-NR3)
  - We have proposed the Interleaved Bidirectional Non-Redundant Recurrent Rules Miner (iBiRM)
- Future work
  - Improving the sequential recurrent rule mining algorithm
  - Improving the parallel algorithms
- Source code is available at https://bitbucket.org/sekilab/nr3
Editor's Notes
Good morning, everyone.
I am Yoon SeungYong, a student at Nagoya Institute of Technology.
Seki Hirohisa is my advisor and participated in this research.
From now on, I would like to introduce my research, "Parallel Mining of Non-Redundant Recurrent Rules from a Sequence Database".
I will first speak about the motivation of this research, and introduce recurrent rules and the algorithm NR3, the basis of this research.
I will then present our algorithm for parallel mining of recurrent rules, pNR3, and show its effectiveness based on experimental results.
Our motivation for this research.
I will first talk about sequence databases and sequential rules.
One example of a sequence database is transaction histories.
For instance, Alice rented Star Wars 4, 5, and 6, and then Star Wars 1, following the release dates.
Another example is program traces.
From these databases, we can infer rules such as ⟨Star Wars 4⟩ then ⟨Star Wars 5⟩, and ⟨lock⟩ then ⟨unlock⟩.
But why recurrent rules?
Because a recurrent rule captures temporal constraints both within a sequence and across multiple sequences.
Recall the previous examples.
In the transaction histories, we rarely care how many times a customer rents the same videos.
But in the program traces, we have to consider how many times a series of commands has been executed.
This is the reason recurrent rules were proposed.
Mined recurrent rules can be directly converted into Linear Temporal Logic, the most widely used formalism for program verification.
For more details, refer to the well-known textbook Model Checking.
From now on, I will introduce mining recurrent rules and the algorithm NR3.
We first define some terminology.
A sequence database is a set of sequences.
A sequence is a series of events.
In a sequence, we call the position of each event a temporal point.
And we refer to the first j events as the j-prefix of the sequence.
We will define some operations on sequences.
This is the concatenation of S and S'.
We say S is a super-sequence of S' if S contains S'.
A matched prefix is called an instance, and the shortest one is the minimum instance.
We will also define operations on a database.
A database is projected on a sequence P: for each sequence containing P, the longest remaining part goes into the projected database, as in the well-known operation.
A database is all-projected on a sequence P: for each sequence containing P, all of the remaining parts go into the all-projected database.
The number of such sequences is the support: the sequence support is for projection, and the instance support is for all-projection.
We define a recurrent rule R = pre then post.
The supports are almost the same as previously defined.
The confidence has a special form: intuitively, it measures how many sequences in the all-projected database on pre contain post.
We say a rule is significant if its supports and confidence are above the thresholds.
We now define the notion of rule redundancy.
Consider these two rules: R contains R', and they have the same supports and confidence.
This means that if a sequence contains R, it also contains R'.
We do not need to mine both of these rules, so we prune some of them.
We define a rule as redundant if there is another, longer rule with the same supports and confidence.
This is handled using the algorithm BIDE, a well-known frequent closed sequence miner.
Now I will introduce the Non-Redundant Recurrent Rules Miner, NR3, the work of David Lo and others.
NR3 receives a sequence database and three thresholds, and emits significant and non-redundant recurrent rules.
It first generates the candidate pre-conditions using BIDE, which consists of recursions; we call this step GenPre.
Next, looping over the candidate pre-conditions, it generates the candidate post-conditions and generates rules.
We call this step GenRule, and in this step we obtain the significant rules.
Finally, we remove the remaining redundant rules using hash tables, with the supports and confidence as keys.
We call this step RemRedun.
From now on, I will present our algorithm for parallel mining of recurrent rules, pNR3.
Let us review the previous work.
First, once the GenPre task finds a pre-condition candidate, we can handle the corresponding GenRule task immediately.
We call this strategy the single-producer-multiple-consumer framework, because GenRule tasks can be consumed as the GenPre task produces pre-conditions.
Second, we can handle the GenRule tasks concurrently.
We call this strategy loop-level parallelization.
This is our algorithm, the Parallel Non-Redundant Recurrent Rules Miner, pNR3.
The pNR3 instance starts mining pre-conditions.
GenPre then emits a GenRule task for each found pre-condition and pushes it into the thread pool.
The thread pool handles these GenRule tasks, and the tasks collect significant rules.
Finally, the RemRedun instance removes redundant rules.
This is our Java implementation.
It works as I explained.
The source code is available in our Bitbucket repository.
I will discuss the effect of parallelization.
We utilized two strategies: GenPre concurrency, the single-producer-multiple-consumer framework, and GenRule parallelization, the loop-level parallelization.
GenPre concurrency behaves as a maximum function of GenPre and GenRule, because the longer task determines the total runtime.
GenRule parallelization behaves as a divider function, because the available threads handle the GenRule tasks in parallel.
As a result, the runtime of our pNR3 is max(GenPre, GenRule / N) plus RemRedun.
We will see this discussion reflected in the experiment results.
I will explain the experiment environment.
We used two well-known datasets: one synthetic and one real.
We implemented NR3 and pNR3 in Java 8 and executed them on a common Core i7 machine with 4 physical cores.
This is the experiment result on the synthetic dataset.
The upper charts vary the minimum support, and the lower ones vary the confidence.
The first chart shows the runtime of the algorithms (NR3, and pNR3 on 2, 4, and 8 threads), the second the ratio of each task in NR3, and the third the numbers of pre-condition candidates and rules.
As discussed before, the runtime of our parallel algorithm is max(GenPre, GenRule / N) plus RemRedun.
In NR3, GenPre takes about 20% of the runtime, and RemRedun is negligible on this dataset.
So if the runtime of our parallel algorithm drops to about 20% of NR3's on this dataset, we can say our algorithm is effective.
As the results show, the runtime of 8-thread pNR3 is about 20% of NR3's, so our algorithm is very effective.
This is the experiment result on the real-world dataset.
The upper charts vary the minimum support, and the lower ones vary the confidence.
The first chart shows the runtime of the algorithms (NR3, and pNR3 on 2, 4, and 8 threads), the second the ratio of each task in NR3, and the third the numbers of pre-condition candidates and rules.
As discussed before, the runtime of our parallel algorithm is max(GenPre, GenRule / N) plus RemRedun.
In NR3, GenRule takes almost 100% of the runtime, and GenPre and RemRedun are negligible on this dataset.
So if the runtime of our parallel algorithm decreases as we increase the number of threads, we can say our algorithm is effective.
As the results show, the runtime of 4-thread pNR3 is about 30% of NR3's, and that of 8-thread pNR3 about 20%, so our algorithm is effective even taking into account some parallelization overhead.
Now I conclude.
We have proposed the algorithm Parallel Non-Redundant Recurrent Rules Miner, pNR3.
It utilizes two strategies: the single-producer-multiple-consumer framework and loop-level parallelism.
We showed the effectiveness of our algorithm based on experiments on synthetic and real datasets.
As future work, we will run experiments on program traces, the intended application of the rules.
We will also experiment on a many-core processor to see the effects more accurately.
Also, using a large memory, we will compare our algorithm to BOB, the successor of NR3.
We are now working on improving the sequential recurrent rule mining algorithms.
You can find our implementation in this repository.
This is all of my presentation.
Thank you for listening.
Do you have any questions?