LASH:
Large-Scale Sequence
Mining with Hierarchies
Kaustubh Beedkar (University of Mannheim)
Rainer Gemulla (University of Mannheim)
SIGMOD (2015)
Presenter: 구한준
• Introduction
• Problem Definition
• Proposed Algorithm
• Experiment
• Conclusion
Contents
• Sequential pattern mining is used in many areas, such as market-basket analysis, web usage mining, and language modeling.
• Some items are arranged in hierarchies, and frequency can differ between levels: a specific item may be infrequent while a more general item is frequent.
Ex)
Introduction
[Figure: example item hierarchy with the levels Photography; Analog Camera, Digital Camera; Canon, Nikon. Items in the figure are labeled "Frequent!" or "Not Frequent!".]
• MG-FSM, a state-of-the-art frequent sequence miner, was proposed (SIGMOD 2013), but it does not support hierarchies.
• Other sequential pattern mining algorithms
  ◦ BFS: APRIORI, GSP, SPADE, ...
  ◦ DFS: FP-Growth, PrefixSpan, SPAM, BIDE, GAP-BIDE, ...
Related Work
• Sequence database D = {T_1, T_2, …, T_|D|}
• Each sequence T = t_1 t_2 … t_n is composed of items from the vocabulary W = {w_1, w_2, …, w_|W|}
Problem Variables
𝑇1 𝑎 𝑏1 𝑎 𝑏1
𝑇2 𝑎 𝑏3 𝑐 𝑐 𝑏2
𝑇3 𝑎 𝑐
𝑇4 𝑏11 𝑎 𝑒 𝑎
𝑇5 𝑎 𝑏12 𝑑1 𝑐
𝑇6 𝑏13 𝑓 𝑑2
• In generalized sequence mining (GSM), the vocabulary is arranged in a hierarchy.
• For u, v ∈ W:
  ◦ u → v if u directly generalizes to v (v is the parent of u)
  ◦ u →* v if u generalizes to v (v is u itself or an ancestor of u)
Hierarchies
[Figure: hierarchy of the B items: b11, b12, b13 → b1 and b1, b2, b3 → B, so that for example b11 →* B.]
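The two relations can be sketched with a child-to-parent map. This is only an illustration: the concrete hierarchy below is an assumption reconstructed from the example items (the `d1`, `d2` → `D` edges are inferred from the later f-list), and the function names are hypothetical.

```python
# Child -> parent map. The hierarchy itself is an assumption reconstructed
# from the example items used in the slides.
PARENT = {
    "b11": "b1", "b12": "b1", "b13": "b1",
    "b1": "B", "b2": "B", "b3": "B",
    "d1": "D", "d2": "D",
}

def ancestors(item):
    """Return [item, parent, grandparent, ...], i.e. every v with item ->* v."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def directly_generalizes(u, v):
    """u -> v: v is the parent of u."""
    return PARENT.get(u) == v

def generalizes(u, v):
    """u ->* v: v is u itself or one of its ancestors."""
    return v in ancestors(u)

print(ancestors("b11"))                  # ['b11', 'b1', 'B']
print(generalizes("b3", "B"))            # True
print(directly_generalizes("b11", "B"))  # False: B is two steps up
```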
• Extend the relation → to sequences
• For sequences T = t_1 t_2 … t_n and S = s_1 s_2 … s_n′:
• T directly generalizes to S, denoted T → S, if
  ◦ n = n′
  ◦ there exists j, 1 ≤ j ≤ n, such that t_j → s_j
  ◦ t_i = s_i for all i ≠ j
Ex)
T_1 = a b1 a b1 satisfies
T_1 → a B a b1
T_1 → a b1 a B
Generalized Sequence
• Extend the relation →* to sequences
• For sequences T = t_1 t_2 … t_n and S = s_1 s_2 … s_n′:
• T generalizes to S, denoted T →* S, if
  ◦ n = n′
  ◦ t_i →* s_i for every i, 1 ≤ i ≤ n (any number of positions may be generalized, each by any number of steps)
Ex)
T_1 = a b1 a b1 satisfies
T_1 →* a B a B
Generalized Sequence
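A minimal sketch of the two sequence relations, assuming the hierarchy b11, b12, b13 → b1 and b1, b2, b3 → B. The `PARENT` map and the function names are illustrative, not taken from the paper.

```python
# Assumed example hierarchy (child -> parent).
PARENT = {"b11": "b1", "b12": "b1", "b13": "b1",
          "b1": "B", "b2": "B", "b3": "B"}

def item_generalizes(u, v):
    """u ->* v: climb from u towards the root until v is found or the root is passed."""
    while u != v and u in PARENT:
        u = PARENT[u]
    return u == v

def seq_directly_generalizes(t, s):
    """T -> S: same length, exactly one position differs, by one parent step."""
    if len(t) != len(s):
        return False
    diff = [i for i in range(len(t)) if t[i] != s[i]]
    return len(diff) == 1 and PARENT.get(t[diff[0]]) == s[diff[0]]

def seq_generalizes(t, s):
    """T ->* S: same length and every position satisfies t_i ->* s_i."""
    return len(t) == len(s) and all(item_generalizes(u, v) for u, v in zip(t, s))

T1 = ["a", "b1", "a", "b1"]
print(seq_directly_generalizes(T1, ["a", "B", "a", "b1"]))  # True
print(seq_directly_generalizes(T1, ["a", "B", "a", "B"]))   # False: two positions change
print(seq_generalizes(T1, ["a", "B", "a", "B"]))            # True
```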
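• S is a subsequence of T, denoted S ⊆_γ T, if the items of S occur in T in the same order and consecutive items of S are separated by at most γ items in T
• Gap constraint γ ≥ 0 (at most γ items of T in between consecutive items of S)
Ex) T_5 = a b12 d1 c
a b12 ⊆_0 T_5, a d1 c ⊆_1 T_5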
Subsequence
T_5:  a  b12  d1  c
⊆_0:  a  b12
⊆_0:     b12  d1
⊆_1:  a  b12      c
⊆_1:  a       d1  c
⊆_2:  a           c
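The gap-constrained subsequence test can be sketched as a small backtracking search. The function name `is_subsequence` is hypothetical, and the code is an illustration of the definition rather than the implementation used in LASH.

```python
def is_subsequence(s, t, gamma):
    """True if s occurs in t with at most `gamma` skipped items between matches."""
    def match(si, prev):
        if si == len(s):
            return True
        if prev is None:
            candidates = range(len(t))  # the first item may match anywhere
        else:
            # the next match may skip at most `gamma` items after position prev
            candidates = range(prev + 1, min(len(t), prev + gamma + 2))
        return any(t[p] == s[si] and match(si + 1, p) for p in candidates)
    return match(0, None)

T5 = ["a", "b12", "d1", "c"]
print(is_subsequence(["a", "b12"], T5, 0))      # True
print(is_subsequence(["a", "d1", "c"], T5, 1))  # True
print(is_subsequence(["a", "c"], T5, 1))        # False: needs gamma = 2
```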
• S is a generalized subsequence of T, denoted S ⊑_γ T, if there exists a subsequence S′ ⊆_γ T such that S′ →* S
Ex) T_5 = a b12 d1 c
a b12 ⊑_0 T_5, a b1 ⊑_0 T_5, a B ⊑_0 T_5, a D ⊑_1 T_5
Generalized Subsequences
T_5:  a  b12  d1  c
⊑_0:  a  b12
⊑_0:  a  b1
⊑_0:  a  B
⊑_1:  a       D
⊑_2:  a           c
• Sup_γ(S, D) = {T ∈ D : S ⊑_γ T}
  Support set of sequence S in the database D (the sequences T of which S is a generalized subsequence)
• f_γ(S, D) = |Sup_γ(S, D)|
  S is frequent in D if f_γ(S, D) ≥ σ, where σ > 0 is the support threshold
Ex)
Sup_1(a B c, D) = {T_2, T_5}
Sup_0(a B c, D) = {T_2}
Support
𝑇1 𝑎 𝑏1 𝑎 𝑏1
𝑇2 𝑎 𝑏3 𝑐 𝑐 𝑏2
𝑇3 𝑎 𝑐
𝑇4 𝑏11 𝑎 𝑒 𝑎
𝑇5 𝑎 𝑏12 𝑑1 𝑐
𝑇6 𝑏13 𝑓 𝑑2
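The support example can be reproduced with a short sketch. It assumes the hierarchy b11, b12, b13 → b1; b1, b2, b3 → B; d1, d2 → D, and all names (`PARENT`, `DB`, `support`) are illustrative.

```python
# Assumed example hierarchy (child -> parent) and the example database.
PARENT = {"b11": "b1", "b12": "b1", "b13": "b1",
          "b1": "B", "b2": "B", "b3": "B", "d1": "D", "d2": "D"}
DB = {
    "T1": ["a", "b1", "a", "b1"],
    "T2": ["a", "b3", "c", "c", "b2"],
    "T3": ["a", "c"],
    "T4": ["b11", "a", "e", "a"],
    "T5": ["a", "b12", "d1", "c"],
    "T6": ["b13", "f", "d2"],
}

def item_generalizes(u, v):
    """u ->* v in the hierarchy."""
    while u != v and u in PARENT:
        u = PARENT[u]
    return u == v

def is_generalized_subsequence(s, t, gamma):
    """S is a generalized subsequence of T under maximum gap gamma."""
    def match(si, prev):
        if si == len(s):
            return True
        start = 0 if prev is None else prev + 1
        stop = len(t) if prev is None else min(len(t), prev + gamma + 2)
        return any(item_generalizes(t[p], s[si]) and match(si + 1, p)
                   for p in range(start, stop))
    return match(0, None)

def support(s, db, gamma):
    """Sup_gamma(S, D): names of the sequences that contain S."""
    return {name for name, t in db.items()
            if is_generalized_subsequence(s, t, gamma)}

print(sorted(support(["a", "B", "c"], DB, 1)))  # ['T2', 'T5']
print(sorted(support(["a", "B", "c"], DB, 0)))  # ['T2']
```

The frequency f_γ(S, D) is then `len(support(s, DB, gamma))`, which is compared against σ.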
• Given
  ◦ σ > 0, a minimum support threshold
  ◦ γ ≥ 0, a maximum-gap constraint
  ◦ λ ≥ 2, a maximum-length constraint
• Find all frequent generalized sequences S that satisfy
  ◦ 2 ≤ |S| ≤ λ
  ◦ f_γ(S, D) ≥ σ
Problem Definition
• Generate all possible generalized subsequences of each sequence (map phase) and count all of them (reduce phase).
• G_{λ,γ}(T) = {S : S ⊑_γ T, 2 ≤ |S| ≤ λ}
Ex)
T_4 = b11 a e a
G_{λ=3,γ=1}(T_4)
= {b11 a, b11 e, a e, a a, e a, b11 a e, b11 a a, b11 e a,
a e a, b1 a, b1 e, b1 a e, b1 a a, b1 e a, B a, B e, B a e, B a a, B e a}
Naïve Algorithm
T_4:  b11  a  e  a
⊑_1:  b11  a
⊑_1:  b1   a
⊑_1:  B    a
⊑_1:  B       e
⊑_1:       a     a
…
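A single-machine sketch of the naive approach follows: the "map" step enumerates G_{λ,γ}(T) for every sequence and the "reduce" step counts. The hierarchy and all names are assumptions for illustration; a real deployment would run the two steps as MapReduce jobs.

```python
from collections import Counter
from itertools import product

# Assumed example hierarchy (child -> parent) and the example database.
PARENT = {"b11": "b1", "b12": "b1", "b13": "b1",
          "b1": "B", "b2": "B", "b3": "B", "d1": "D", "d2": "D"}
DB = [["a", "b1", "a", "b1"], ["a", "b3", "c", "c", "b2"], ["a", "c"],
      ["b11", "a", "e", "a"], ["a", "b12", "d1", "c"], ["b13", "f", "d2"]]

def ancestors(item):
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def generalized_subsequences(t, lam, gamma):
    """G_{lam,gamma}(T): all generalized subsequences of length 2..lam."""
    result = set()
    def extend(positions):
        if len(positions) >= 2:
            # every chosen position may be replaced by any of its ancestors
            result.update(product(*(ancestors(t[p]) for p in positions)))
        if len(positions) == lam:
            return
        last = positions[-1]
        for nxt in range(last + 1, min(len(t), last + gamma + 2)):
            extend(positions + [nxt])
    for start in range(len(t)):
        extend([start])
    return result

def naive_mine(db, sigma, lam, gamma):
    counts = Counter()
    for t in db:                      # "map": emit each S once per sequence
        counts.update(generalized_subsequences(t, lam, gamma))
    return {s: c for s, c in counts.items() if c >= sigma}  # "reduce": count, filter

print(len(generalized_subsequences(DB[3], 3, 1)))  # 19, as in the T_4 example
```

The number of emitted subsequences grows quickly with λ, γ, and the hierarchy depth, which is what the later steps try to avoid.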
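• In the preprocessing phase, compute the f-list (the frequent items with their frequencies) and a total order on the items
  ◦ w_1 < w_2 when f_0(w_1, D) > f_0(w_2, D) (more frequent items are smaller)
  ◦ An ancestor is smaller than its descendants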
Preprocess
𝑇1 𝑎 𝑏1 𝑎 𝑏1
𝑇2 𝑎 𝑏3 𝑐 𝑐 𝑏2
𝑇3 𝑎 𝑐
𝑇4 𝑏11 𝑎 𝑒 𝑎
𝑇5 𝑎 𝑏12 𝑑1 𝑐
𝑇6 𝑏13 𝑓 𝑑2
f-list (σ = 2)
a : 5
B : 5
𝑏1: 4
c : 3
D : 2
total order : a<B<𝑏1<c<D
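The f-list above can be recomputed with a short sketch. Each sequence contributes at most once to an item and to each of its ancestors. The hierarchy is an assumption reconstructed from the example, and the tie-breaking rule (hierarchy depth, then case-insensitive name) is a hypothetical choice that happens to reproduce the order on the slide.

```python
from collections import Counter

# Assumed example hierarchy (child -> parent) and the example database.
PARENT = {"b11": "b1", "b12": "b1", "b13": "b1",
          "b1": "B", "b2": "B", "b3": "B", "d1": "D", "d2": "D"}
DB = [["a", "b1", "a", "b1"], ["a", "b3", "c", "c", "b2"], ["a", "c"],
      ["b11", "a", "e", "a"], ["a", "b12", "d1", "c"], ["b13", "f", "d2"]]

def ancestors(item):
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def f_list(db, sigma):
    """Frequent items with their frequencies, smallest item first."""
    counts = Counter()
    for t in db:
        seen = set()
        for item in t:
            seen.update(ancestors(item))  # an item also supports its ancestors
        counts.update(seen)               # count each sequence at most once
    frequent = [w for w in counts if counts[w] >= sigma]
    # Descending frequency; on ties the shallower item first, so an ancestor
    # always precedes its descendants. The name tie-break is arbitrary.
    frequent.sort(key=lambda w: (-counts[w], len(ancestors(w)), w.lower()))
    return [(w, counts[w]) for w in frequent]

print(f_list(DB, 2))  # [('a', 5), ('B', 5), ('b1', 4), ('c', 3), ('D', 2)]
```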
• Generate a subsequence only if all of its items are frequent
Ex) T_4 = b11 a e a
G_{λ=3,γ=1}(T_4) = {a a, b1 a, b1 a a, B a, B a a}
Semi-Naïve Algorithm
f-list (σ = 2)
a : 5
B : 5
𝑏1: 4
c : 3
D : 2
T_4:  b11  a  e  a
⊑_1:  b11  a         (not generated: b11 is infrequent)
⊑_1:  b1   a     a
⊑_1:  B    a
⊑_1:  B       e      (not generated: e is infrequent)
⊑_1:       a     a
…
• Total order: a < B < b1 < c < D (a is the most frequent)
• p(S) = max_{w ∈ S} w, the pivot item of S (the item of S that is largest in the total order)
Ex) T_1 = a b1 a b1, p(T_1) = b1
• A partition P_w is the set of sequences that have w as pivot
Ex) T_1 ∈ P_b1, a ∈ P_a, a a ∈ P_a, …
• From P_w, mine all generalized sequences that contain w but no larger item (in the total order)
Ex) sequences in P_a consist of a's only; sequences in P_B consist of a's and B's
Partition
• Total order: a < B < b1 < c < D (a is the most frequent)
• G_{λ=3,γ=1}(T_4) = {a a, b1 a, b1 a a, B a, B a a}
Partition
(← lists the items that may occur in the partition; the last one is the pivot)
P_a:   a a            ← a
P_B:   B a, B a a     ← a, B
P_b1:  b1 a, b1 a a   ← a, B, b1
P_c:                  ← a, B, b1, c
P_D:                  ← a, B, b1, c, D
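Pivot selection and partitioning can be sketched as follows, using the total order a < B < b1 < c < D from the f-list. The names `ORDER`, `RANK`, `pivot`, and `partition` are illustrative assumptions.

```python
from collections import defaultdict

# Total order from the f-list: a smaller rank means a more frequent item.
ORDER = ["a", "B", "b1", "c", "D"]
RANK = {w: i for i, w in enumerate(ORDER)}

def pivot(seq):
    """p(S): the item of S that is largest in the total order."""
    return max(seq, key=RANK.__getitem__)

def partition(sequences):
    """Group sequences by their pivot item."""
    parts = defaultdict(list)
    for s in sequences:
        parts[pivot(s)].append(s)
    return dict(parts)

G_T4 = [("a", "a"), ("b1", "a"), ("b1", "a", "a"), ("B", "a"), ("B", "a", "a")]
print(pivot(["a", "b1", "a", "b1"]))  # b1
print(partition(G_T4))
```

Because every sequence has exactly one pivot, each frequent sequence is counted in exactly one partition, so partitions can be mined independently.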
• Two sequences T and T′ are w-equivalent if G_{w,λ,γ}(T) = G_{w,λ,γ}(T′), where
  G_{w,λ,γ}(T) = {S : S ⊑_γ T, 2 ≤ |S| ≤ λ, p(S) = w}
Total order: a < B < b1 < c < D
Ex) T_4 = b11 a e a
G_{w=B,λ=3,γ=1}(T_4) = {B a a, B a} = G_{w=B,λ=3,γ=1}(B a a)
w-equivalency
P_a:   a a
P_B:   B a, B a a
P_b1:  b1 a, b1 a a
P_c:
P_D:   Not necessary!
• An item w′ is w-relevant if w′ ≤ w (w′ is frequent and not larger than the pivot w in the total order)
• 1) Replace each irrelevant item that has a w-relevant ancestor by that ancestor
• 2) Replace each remaining irrelevant item (one without a w-relevant ancestor) by the blank symbol ⊔
Ex) a < B < b1 < c < D (pivot B)
T_2 = a b3 c c b2 →* T_2′ = a B ⊔ ⊔ B regarding pivot B
w-generalization
T_2:   a  b3  c  c  b2
1)     a  B   c  c  b2
1)     a  B   c  c  B
2)     a  B   ⊔  c  B
T_2′:  a  B   ⊔  ⊔  B
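The rewriting shown in the table can be sketched as below. Each item is kept if it is w-relevant, otherwise it climbs the hierarchy to the first w-relevant ancestor, and it becomes a blank if there is none. The hierarchy, the total order, and the `"_"` stand-in for ⊔ are assumptions for illustration; removing items that are too far from the pivot is not covered here.

```python
# Assumed example hierarchy (child -> parent) and total order from the f-list.
PARENT = {"b11": "b1", "b12": "b1", "b13": "b1",
          "b1": "B", "b2": "B", "b3": "B", "d1": "D", "d2": "D"}
ORDER = ["a", "B", "b1", "c", "D"]
RANK = {w: i for i, w in enumerate(ORDER)}
BLANK = "_"  # stands for the blank symbol

def w_generalize(t, w):
    """Rewrite sequence t for pivot w: generalize or blank out irrelevant items."""
    limit = RANK[w]
    out = []
    for item in t:
        cur = item
        # infrequent items have no rank and are treated as larger than any pivot
        while cur is not None and RANK.get(cur, len(ORDER)) > limit:
            cur = PARENT.get(cur)   # climb to the parent, or None at the root
        out.append(BLANK if cur is None else cur)
    return out

T2 = ["a", "b3", "c", "c", "b2"]
print(w_generalize(T2, "B"))  # ['a', 'B', '_', '_', 'B']
```

With pivot B, item c is frequent but larger than B in the total order, so it is blanked, while b3 and b2 are lifted to B.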
• Purpose: make each sequence as short as possible
• 3) Remove items that are too far away from every pivot occurrence to be part of a pivot sequence
Ex) γ = 1, pivot D, a < B < b1 < c < D
-> λ = 2: a c D a D c ⊔; λ = 3: a b1 a c D a D c ⊔ B
w-generalization
T:      a  b1  a  c  d1  a  d2  c  f  b2  c
T′:     a  b1  a  c  D   a  D   c  ⊔  B   c
λ = 2:         a  c  D   a  D   c  ⊔
λ = 3:  a  b1  a  c  D   a  D   c  ⊔  B
Proposed Algorithm
For each transaction T_i:
  generate T_i′ with respect to each frequent item f_j (as pivot)
  send each T_i′ to the partition of its pivot
Perform local mining in each partition
• Local mining can be done efficiently with the pivot sequence miner (PSM) instead of Apriori-style (BFS or DFS) miners
• Instead of searching for every frequent sequence, PSM efficiently enumerates only the sequences that contain the pivot
Ex) pivot c: {abc, cab, …}
There is no need to find {ab} because it does not contain c
Pivot Sequence Miner
• Data sets: NYT, AMZN
  ◦ NYT (50M sentences from 1.8M articles): n-gram mining from textual data
  ◦ AMZN (35M reviews from 6M users): customer behavior mining from product sequences
• Cluster
  ◦ 11 Dell PowerEdge R720 servers
  ◦ 64 GB memory, 8 × 2 TB hard disks, 2 × Intel Xeon E5-2640 6-core CPUs
• Hadoop 0.20.2 (JDK 1.7)
Test Environment
Experiment
• LASH is the first parallel algorithm for mining frequent sequences with hierarchies
• LASH partitions the sequences by pivot item and performs local mining (PSM) in each partition
• Because of PSM, LASH can outperform MG-FSM (the state-of-the-art frequent sequence miner without hierarchy support)
Conclusion

Lash

  • 1. LASH: Large-Scale Sequence Mining with Hierarchies Kaustubh Beedkar (University of Mannheim) Rainer Gemulla (University of Mannheim) SIGMOD (2015) 발표자: 구한준
  • 2.  Introduction  Problem Definition  Proposed Algorithm  Experiment  Conclusion Contents
  • 3.  Sequential Pattern Mining is used in many area such as market-basket analysis, web usage mining, language model etc.  Some of items have hierarchies and frequency can be different Ex) Introduction Photography Analog CameraDigital Camera Canon Nikon Frequent! Not Frequent!
  • 4. Related Work: MG-FSM, a state-of-the-art frequent sequence miner, was proposed (SIGMOD, 2013) but does not support hierarchies. Other sequential pattern mining algorithms: BFS: APRIORI, GSP, SPADE, ...; DFS: FP-Growth, PrefixSpan, SPAM, BIDE, GAP-BIDE, ...
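  • 5. Problem Variables: Sequence database $\mathcal{D} = \{T_1, T_2, \dots, T_{|\mathcal{D}|}\}$. Each sequence $T = t_1 t_2 t_3 \dots t_n$ is composed of items from the vocabulary $W = \{w_1, w_2, \dots, w_{|W|}\}$. Example database: $T_1 = a\,b_1\,a\,b_1$; $T_2 = a\,b_3\,c\,c\,b_2$; $T_3 = a\,c$; $T_4 = b_{11}\,a\,e\,a$; $T_5 = a\,b_{12}\,d_1\,c$; $T_6 = b_{13}\,f\,d_2$.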
  • 6. Hierarchies: In GSM, the vocabulary is arranged in a hierarchy. For $u, v \in W$: if $u$ directly generalizes to $v$, write $u \to v$; if $u$ generalizes to $v$ (including $u$ itself), write $u \to^* v$. Hierarchy figure: $b_{11}, b_{12}, b_{13} \to b_1$ and $b_1, b_2, b_3 \to B$.
  • 7. Generalized Sequence: Extend the relation $\to$ to sequences. For sequences $T = t_1 t_2 \dots t_n$ and $S = s_1 s_2 \dots s_{n'}$, $T$ directly generalizes to $S$, denoted $T \to S$, if $n = n'$, there exists $j$ with $1 \le j \le n$ such that $t_j \to s_j$, and $t_i = s_i$ for $i \ne j$. Ex) $T_1 = a\,b_1\,a\,b_1$ satisfies $T_1 \to a\,B\,a\,b_1$ and $T_1 \to a\,b_1\,a\,B$.
  • 8. Generalized Sequence: Extend the relation $\to^*$ to sequences. For sequences $T = t_1 t_2 \dots t_n$ and $S = s_1 s_2 \dots s_{n'}$, $T$ generalizes to $S$, denoted $T \to^* S$, if $n = n'$, there exists at least one $j$ with $1 \le j \le n$ such that $t_j \to^* s_j$, and $t_i = s_i$ at all remaining positions $i$. Ex) $T_1 = a\,b_1\,a\,b_1$ satisfies $T_1 \to^* a\,B\,a\,B$.
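  • 9. Subsequence: $S$ is a subsequence of $T$, denoted $S \subseteq_\gamma T$, under the gap constraint $\gamma \ge 0$ (at most $\gamma$ items of $T$ may lie between consecutive items of $S$). Ex) $T_5 = a\,b_{12}\,d_1\,c$: $a\,b_{12} \subseteq_0 T_5$, $b_{12}\,d_1 \subseteq_0 T_5$, $a\,b_{12}\,c \subseteq_1 T_5$, $a\,d_1\,c \subseteq_1 T_5$, $a\,c \subseteq_2 T_5$.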
  • 10. Generalized Subsequences: $S$ is a generalized subsequence of $T$, denoted $S \sqsubseteq_\gamma T$. Ex) $T_5 = a\,b_{12}\,d_1\,c$: $a\,b_{12} \sqsubseteq_0 T_5$, $a\,b_1 \sqsubseteq_0 T_5$, $a\,B \sqsubseteq_0 T_5$, $a\,D \sqsubseteq_1 T_5$, $a\,c \sqsubseteq_2 T_5$.
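  • 11. Support: $Sup_\gamma(S, \mathcal{D}) = \{T \in \mathcal{D} : S \sqsubseteq_\gamma T\}$ is the support set of sequence $S$ in the database $\mathcal{D}$ ($S$ is a generalized subsequence of each such $T$). $f_\gamma(S, \mathcal{D}) = |Sup_\gamma(S, \mathcal{D})|$. $S$ is frequent in $\mathcal{D}$ if $f_\gamma(S, \mathcal{D}) \ge \sigma$, where $\sigma > 0$ is the support threshold. Ex) over the example database of slide 5: $Sup_1(aBc, \mathcal{D}) = \{T_2, T_5\}$, $Sup_0(aBc, \mathcal{D}) = \{T_2\}$.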
  • 12. Problem Definition: Given a minimum support threshold $\sigma > 0$, a maximum-gap constraint $\gamma \ge 0$, and a maximum-length constraint $\lambda \ge 2$, find all frequent generalized sequences $S$ that satisfy $2 \le |S| \le \lambda$ and $f_\gamma(S, \mathcal{D}) \ge \sigma$.
  • 13. Naïve Algorithm: Generate all possible generalized subsequences (Map phase) and count all of them (Reduce phase). $G_{\lambda,\gamma}(T) = \{S : S \sqsubseteq_\gamma T,\ 2 \le |S| \le \lambda\}$. Ex) $T_4 = b_{11}\,a\,e\,a$: $G_{\lambda=3,\gamma=1}(T_4) = \{b_{11}a,\ b_{11}e,\ ae,\ aa,\ ea,\ b_{11}ae,\ b_{11}aa,\ b_{11}ea,\ aea,\ b_1a,\ b_1e,\ b_1ae,\ b_1aa,\ b_1ea,\ Ba,\ Be,\ Bae,\ Baa,\ Bea\}$.
  • 14. Preprocess: In the preprocessing phase, build the f-list and a total order on the frequent items: $w_1 < w_2$ when $f_0(w_1, \mathcal{D}) > f_0(w_2, \mathcal{D})$, and an ancestor is smaller than its descendants. For the example database of slide 5, the f-list for $\sigma = 2$ is $a: 5$, $B: 5$, $b_1: 4$, $c: 3$, $D: 2$, which gives the total order $a < B < b_1 < c < D$.
  • 15. Semi-Naïve Algorithm: Generate a subsequence only if all of its items are frequent. Ex) $T_4 = b_{11}\,a\,e\,a$ with f-list ($\sigma = 2$) $a: 5$, $B: 5$, $b_1: 4$, $c: 3$, $D: 2$: $G_{\lambda=3,\gamma=1}(T_4) = \{aa,\ b_1a,\ b_1aa,\ Ba,\ Baa\}$.
  • 16. Partition: Total order $a < B < b_1 < c < D$ ($a$ is the most frequent). $p(S) = \max_{w \in S} w$ is the pivot item of $S$ (the item that is largest in the total order). Ex) $T_1 = a\,b_1\,a\,b_1$, $p(T_1) = b_1$. A partition $P_w$ is the set of sequences that have $w$ as pivot. Ex) $T_1 \in P_{b_1}$, $a \in P_a$, $aa \in P_a$, ... From $P_w$, mine all generalized sequences that contain $w$ but no larger (in the total order) item. Ex) $P_a$ consists of $a$'s only; $P_B$ consists of $a$'s and $B$'s.
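  • 17. Partition: Total order $a < B < b_1 < c < D$ ($a$ is the most frequent). $G_{\lambda=3,\gamma=1}(T_4) = \{aa,\ b_1a,\ b_1aa,\ Ba,\ Baa\}$ is split as $P_a = \{aa\}$ (allowed items: $a$); $P_B = \{Ba,\ Baa\}$ (allowed items: $a, B$); $P_{b_1} = \{b_1a,\ b_1aa\}$ (allowed items: $a, B, b_1$); $P_c = \{\}$ (allowed items: $a, B, b_1, c$); $P_D = \{\}$ (allowed items: $a, B, b_1, c, D$). The last allowed item of each partition is its pivot.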
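  • 18. w-equivalency: Two sequences $T$ and $T'$ are w-equivalent if $G_{w,\lambda,\gamma}(T) = G_{w,\lambda,\gamma}(T')$, where $G_{w,\lambda,\gamma}(T) = \{S : S \sqsubseteq_\gamma T,\ 2 \le |S| \le \lambda,\ p(S) = w\}$. Total order: $a < B < b_1 < c < D$. Ex) $T_4 = b_{11}\,a\,e\,a$: $G_{w=B,\lambda=3,\gamma=1}(T_4) = \{Baa,\ Ba\} = G_{w=B,\lambda=3,\gamma=1}(Baa)$. Partitions: $P_a = \{aa\}$, $P_B = \{Ba,\ Baa\}$, $P_{b_1} = \{b_1a,\ b_1aa\}$, $P_c$, $P_D$ (figure label: "Not necessary!").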
  • 19. w-generalization: An item $w'$ is w-relevant if $w' \le w$ (at least as frequent as the pivot). 1) Replace each irrelevant item that has no ancestor $w' \le w$ by the blank symbol $\sqcup$. 2) Replace each irrelevant item that has an ancestor no larger than the pivot by that ancestor. Ex) $a < B < b_1 < c < D$, pivot $B$: $T_2 = a\,b_3\,c\,c\,b_2 \to^* T_2' = a\,B\,\sqcup\,\sqcup\,B$. Walkthrough: $a\,b_3\,c\,c\,b_2$, then $a\,B\,c\,c\,b_2$, then $a\,B\,c\,c\,B$ (generalizing $b_3$ and $b_2$ to $B$), then $a\,B\,\sqcup\,c\,B$, then $a\,B\,\sqcup\,\sqcup\,B$ (blanking each $c$).
  • 20. w-generalization: Purpose: make each sequence as short as possible. 3) Remove items that are located too far away from the pivot to appear in any pivot sequence. Ex) $\gamma = 1$, pivot $D$, $a < B < b_1 < c < D$: $T = a\,b_1\,a\,c\,d_1\,a\,d_2\,c\,f\,b_2\,c$ becomes $T' = a\,b_1\,a\,c\,D\,a\,D\,c\,\sqcup\,B\,c$. For $\lambda = 2$ this shrinks to $a\,c\,D\,a\,D\,c\,\sqcup$; for $\lambda = 3$ it shrinks to $a\,b_1\,a\,c\,D\,a\,D\,c\,\sqcup\,B$.
  • 21. Proposed Algorithm: For each transaction $T_i$, generate $T_i'$ with respect to each frequent item $f_j$; send each $T_i'$ to the corresponding partition; do local mining in each partition.
  • 22. Pivot Sequence Miner: Local mining can be done efficiently with PSM instead of Apriori-style miners (BFS, DFS). Instead of searching for every frequent sequence, LASH efficiently enumerates only the sequences that contain the pivot. Ex) pivot: c, {abc, cab, abc, …}: there is no need to find {ab} because it does not contain {c}.
  • 24. Test Environment: Data sets: NYT and AMZN. NYT (50M sentences from 1.8M articles): n-gram mining from textual data. AMZN (35M reviews from 6M users): customer behavior mining from product sequences. Cluster: 11 Dell PowerEdge R720 machines, each with 64GB memory, 8 × 2TB hard disks, and 2 × Intel Xeon E5-2640 6-core CPUs; Hadoop 0.20.2 (JDK 1.7).
  • 27. Conclusion: LASH is the first parallel algorithm for mining frequent sequences with hierarchies. LASH partitions the sequences by pivot item and performs local mining (PSM) on each partition. Because of PSM, LASH searches more efficiently than MG-FSM (the state-of-the-art frequent sequence miner without hierarchies).
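
The definitions on slides 9 to 17 can be checked against the running example with a small, single-machine sketch. It is not the LASH implementation: it follows the naïve enumeration of slide 13 and only groups the result by pivot as on slide 17. The names `PARENT`, `DATABASE`, `ORDER`, and all function names are made up for illustration; the hierarchy edges are read off the examples on the slides, and `ORDER` is copied from slide 14 rather than computed.

```python
from itertools import product

# Assumed child -> parent edges of the example hierarchy
# (b11, b12, b13 -> b1; b1, b2, b3 -> B; d1, d2 -> D).
PARENT = {"b11": "b1", "b12": "b1", "b13": "b1",
          "b1": "B", "b2": "B", "b3": "B",
          "d1": "D", "d2": "D"}

# Example database from slide 5.
DATABASE = {
    "T1": ["a", "b1", "a", "b1"],
    "T2": ["a", "b3", "c", "c", "b2"],
    "T3": ["a", "c"],
    "T4": ["b11", "a", "e", "a"],
    "T5": ["a", "b12", "d1", "c"],
    "T6": ["b13", "f", "d2"],
}

# Total order of the frequent items, copied from slide 14 (sigma = 2).
ORDER = ["a", "B", "b1", "c", "D"]


def ancestors(item):
    """Return the item followed by all of its ancestors (the ->* relation)."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def index_tuples(n, max_len, gap):
    """Yield index tuples of length 2..max_len in which neighbouring
    indices skip at most `gap` positions."""
    def extend(prefix):
        if len(prefix) >= 2:
            yield tuple(prefix)
        if len(prefix) == max_len:
            return
        last = prefix[-1]
        for nxt in range(last + 1, min(n, last + gap + 2)):
            yield from extend(prefix + [nxt])

    for start in range(n):
        yield from extend([start])


def generalized_subsequences(seq, max_len, gap):
    """Naive G_{lambda,gamma}(T): every gap-constrained subsequence,
    with every item optionally replaced by one of its ancestors."""
    out = set()
    for idx in index_tuples(len(seq), max_len, gap):
        out.update(product(*(ancestors(seq[i]) for i in idx)))
    return out


def support(pattern, database, gap):
    """Sup_gamma(S, D) for a pattern of length >= 2."""
    pattern = tuple(pattern)
    return {name for name, seq in database.items()
            if pattern in generalized_subsequences(seq, len(pattern), gap)}


def partition_by_pivot(sequences):
    """Keep sequences built from frequent items only and group them by
    pivot, i.e. by their largest item in ORDER."""
    rank = {w: i for i, w in enumerate(ORDER)}
    parts = {}
    for s in sequences:
        if all(w in rank for w in s):
            parts.setdefault(max(s, key=rank.__getitem__), set()).add(s)
    return parts
```

Under these assumptions, `generalized_subsequences(DATABASE["T4"], 3, 1)` yields the 19 sequences listed on slide 13, `support(["a", "B", "c"], DATABASE, 1)` yields T2 and T5 as on slide 11, and `partition_by_pivot` reproduces the split shown on slide 17. The enumeration is exponential in the sequence length, which is the cost that the semi-naïve pruning and the pivot-based rewriting are meant to avoid.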

Editor's Notes

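  1. It finds recurring patterns! Market-basket analysis: discovering information such as "people who buy item A also buy item B" is useful. Web usage mining: one can discover, for example, that a visit to site A is always followed by a visit to site B. Language model: one can find the words that co-occur with a given word, related words, and so on. As in these examples, it is used to find recurring patterns in documents. In addition, some items have a structure like the one in the figure, and even if each specific item appears rarely, its parent may appear often. So the goal is to discover statements such as: people who buy item C rarely buy Canon, but they often buy photography.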
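  2. MG-FSM was fast, but it could not take a structure like the one in the figure into account. Besides MapReduce algorithms that run on multiple computers, there are many single-machine algorithms, in both BFS style and DFS style.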
  3. The problem! Under the condition that consecutive items may be at most gamma apart, find the patterns of length at most lambda that occur at least sigma times.
  4. The naive way: for each transaction, generate every possible case, count each of them, and report those that occur at least sigma times.
  5. The semi-naive and LASH algorithms need a preprocessing step: do a full scan of the data, count how many times each item occurs, and sort only the items that occur at least sigma times to build an order.
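  6. The naive algorithm generated every case for each transaction, but now it is enough to generate and count only the combinations made of frequent items. This works because a sequence that contains an infrequent word cannot itself be frequent (Apriori rule).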
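  7. LASH goes one step further and divides the sequences generated from each transaction using the notion of a partition. The pivot is the item with the lowest frequency in a sequence. A partition is the set of sequences that share the same pivot.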
  8. The earlier semi-naive example is partitioned like this.
  9. Suppose the transactions collected in each partition follow a distribution like this one. (It does not have to be Gaussian, but some partitions will receive many and others few.)
  10. Then, if the sparsely filled partitions are gathered on one machine, would the load not be spread fairly?
  11. As in the figure here, we would like the number of transactions collected on each machine to be similar.
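  12. Stated as a problem: if the total amount of sequences in the i-th of n partitions is c_i, distribute the partitions well over k machines. The objective is to minimize the maximum total amount of sequences assigned to any machine (minimize the weight of the heaviest one).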
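  13. This is the multiprocessor scheduling problem. To solve it, make the greedy choice of placing the heaviest items onto the lightest machine. This is an approximation algorithm with the bound shown on the slide.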
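  14. The cost of each partition has to be estimated, and this is not easy either. How can we know how many sequences will land in a partition before actually counting them? It can be computed probabilistically as shown above. Each item of a sequence of length L is drawn from the set of frequent items, and the probability of drawing an item equals its frequency. Computing in this way the probability that a sequence belonging to each partition is generated gives a rough prediction of how many items each partition will hold; this is used as the cost, and the partitions are assigned with the LPT algorithm.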
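  15. When sending data to each partition, it is better to keep only the information that the partition needs and to cut the rest or replace it with blanks (this compresses well). Which information to keep is explained through w-equivalency and w-generalization.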
  16. As written, exploiting the existence of the pivot allows a better search than existing single-machine frequent sequence miners; that is PSM.
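
Notes 12 to 14 describe assigning partitions to machines with the LPT greedy rule: take the partitions from heaviest to lightest and always place the next one on the machine that currently carries the least load. A minimal sketch of that rule follows. The function name `lpt_assign` and the cost values in the usage note are invented for illustration; in LASH the costs would come from the estimate described in note 14.

```python
import heapq


def lpt_assign(costs, k):
    """Assign partitions to k machines with the LPT greedy rule.

    costs: dict mapping a partition name to its estimated cost c_i.
    Returns (assignment, loads), both keyed by machine index 0..k-1.
    """
    # Min-heap of (current load, machine index): the lightest machine is on top.
    heap = [(0, m) for m in range(k)]
    heapq.heapify(heap)
    assignment = {m: [] for m in range(k)}

    # Heaviest partition first; ties are broken by name to stay deterministic.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, m = heapq.heappop(heap)
        assignment[m].append(name)
        heapq.heappush(heap, (load + cost, m))

    loads = {m: sum(costs[p] for p in parts) for m, parts in assignment.items()}
    return assignment, loads
```

With the made-up costs `{"P_a": 7, "P_B": 5, "P_b1": 4, "P_c": 3, "P_D": 1}` and `k = 2`, the rule places P_a and P_c on one machine and P_B, P_b1, and P_D on the other, so both machines carry a load of 10. The heap keeps each assignment step logarithmic in the number of machines.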