EDBT 12 - Top-k interesting phrase mining in ad-hoc collections using sequenc... - Chuancong Gao
The document describes a method for mining top-k interesting phrases from ad-hoc document collections using sequence pattern indexing. It discusses existing approaches, presents the problem definition, and proposes a new approach that indexes prefix-maximal phrases ordered by position. This indexing structure allows efficient computation of top-k phrases through a merge join process combined with growing phrase patterns, enabled by optimizations like early termination and search space pruning. An evaluation compares the new approach to baseline methods.
CIKM 2009 - Efficient itemset generator discovery over a stream sliding window - Chuancong Gao
The document describes an algorithm called StreamGen for efficiently mining frequent generator itemsets over data streams using a sliding window model. It introduces the concepts of generator itemsets and why they are important. StreamGen uses a novel enumeration tree structure and optimization techniques. It is the first algorithm that can mine generator itemsets from data streams. Evaluation results show it outperforms other algorithms for related tasks and achieves high classification accuracy when extended to mine classification rules.
35. Addition and Deletion Operations
Core component: the state transition matrix for enumeration tree nodes
Type   Addition                   Deletion
       x < y   x = y   x > y      x < y   x = y   x > y
G      G       G       G          G       G/U     I/G
U      U       G/U     U          U       U       I/U
I      I       I       I/G/U      I       I       I
x = |itemset(n) ∩ t|, y = |itemset(n)| − 1
G: generator, U: useless, I: infrequent
n: an enumeration tree node; t: the transaction currently being added or deleted
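
The matrix above can be read as a lookup table from (operation, current state, relation between x and y) to the set of states a node may move to. The following is a minimal Python sketch of that reading, not the StreamGen implementation. It assumes x and y are cardinalities (the slide text appears to have lost its absolute-value bars), and the names `TRANSITIONS`, `relation`, and `candidate_states` are hypothetical. Where a cell lists several states (for example G/U), the sketch returns all of them; choosing among them requires support and generator checks that the slide does not spell out.

```python
from typing import FrozenSet, Set

# Possible next states, copied cell by cell from the matrix.
# G = generator, U = useless, I = infrequent.
TRANSITIONS = {
    "add": {
        "G": {"<": {"G"}, "=": {"G"}, ">": {"G"}},
        "U": {"<": {"U"}, "=": {"G", "U"}, ">": {"U"}},
        "I": {"<": {"I"}, "=": {"I"}, ">": {"I", "G", "U"}},
    },
    "delete": {
        "G": {"<": {"G"}, "=": {"G", "U"}, ">": {"I", "G"}},
        "U": {"<": {"U"}, "=": {"U"}, ">": {"I", "U"}},
        "I": {"<": {"I"}, "=": {"I"}, ">": {"I"}},
    },
}


def relation(itemset: FrozenSet[str], transaction: FrozenSet[str]) -> str:
    """Compare x = |itemset ∩ t| with y = |itemset| - 1 (assumed cardinalities)."""
    x = len(itemset & transaction)
    y = len(itemset) - 1
    if x < y:
        return "<"
    if x == y:
        return "="
    return ">"  # x > y means the whole itemset is contained in t


def candidate_states(op: str, state: str,
                     itemset: FrozenSet[str],
                     transaction: FrozenSet[str]) -> Set[str]:
    """Return the states a node may take after adding or deleting a transaction."""
    return TRANSITIONS[op][state][relation(itemset, transaction)]


if __name__ == "__main__":
    node = frozenset({"a", "b"})
    t = frozenset({"a", "b", "c"})
    # An infrequent node fully contained in an added transaction may change state.
    print(candidate_states("add", "I", node, t))
```

Under this reading, a node's state can change only when the transaction contains the whole itemset (x > y) or all but one of its items (x = y); every x < y cell leaves the state unchanged, so such nodes need no further checks.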