- 1. Sequence mining algorithm. Monica Dăgădiţă, ISI
- 2. Introduction to sequence mining; Why sequence mining?; Sequence mining algorithms; SPADE; Motivation; Definitions and examples; Algorithm; Implementation. Data Mining 11/8/2011
- 3. Aim: finding statistically relevant patterns between data examples where the values are delivered in a sequence. Originally introduced for market basket analysis (customer behaviour predictions). Two types of sequence mining: string mining (biology: gene/protein sequences) and itemset mining (marketing and CRM applications).
- 4. Discovering patterns. Bookstore: 70% of the people who buy Jane Austen’s “Pride and Prejudice” also buy “Emma” within a month. Website: finding the most frequently accessed sequences of pages. Usage: promotions, shelf placement, restructuring the website, recommender systems.
- 5. Apriori; GSP (Generalized Sequential Pattern); FreeSpan (Frequent pattern-projected Sequential pattern mining); PrefixSpan (Prefix-projected Sequential pattern mining); SPADE (Sequential PAttern Discovery using Equivalence classes).
- 6. Problems of existing solutions: repeated database scans; complex internal data structures. Key features of SPADE: a fixed number of database scans; a vertical id-list database format; decomposition of the search space into smaller pieces that are processed independently.
- 7. Itemset: set of m distinct items I = {i1, i2, …, im}. Event: non-empty collection of items (i1, i2, …, ik). Sequence: ordered list of events <e1 -> e2 -> … -> en>. k-sequence: sequence with k items, e.g. (B->AC) is a 3-sequence.
- 8. Subsequence: given two sequences α = <a1 a2 … an> and β = <b1 b2 … bm>, α is called a subsequence of β, denoted α ⊆ β, if there exist integers 1 ≤ j1 < j2 < … < jn ≤ m such that a1 ⊆ bj1, a2 ⊆ bj2, …, an ⊆ bjn. Examples: (1) (B->AC) is a subsequence of (AB->E->ACD); (2) (AB->E) is not a subsequence of (ABE).
- 10. Id-lists of the most frequent items (1-sequences).
- 11. D->BF->A. Step 1: D->B. Step 2: D->BF.
- 12. D->BF->A. Step 3: D->BF->A. Not space-efficient. Solution: keep only 2 columns, (sid, eid), for each sequence, where eid is the event id of the sequence’s last item.
- 13. D->BF->A (space-efficient id-list joins). Id-lists as (SID, EID) pairs: D->B: (1, 15), (1, 20), (4, 20). D->BF: (1, 20), (4, 20). D->BF->A: (1, 25), (4, 25).
- 14. Complete lattice representation.
- 16. Decomposing the lattice => smaller pieces that can be solved independently. Equivalence classes: two sequences are in the same class (Θk) if they share a common prefix of length k. Example, k=1: Θ1 -> {[A], [B], [D], [F]}.
- 19. SPADE(min_sup, D): // min_sup: minimum support; D: initial dataset. F1 <- {frequent items or 1-sequences}; F2 <- {frequent 2-sequences}; E <- {equivalence classes [X] of Θ1}; for all [X] in E: Enumerate_frequent_seq([X], min_sup).
- 20. Enumerate_frequent_seq(S, min_sup): for all Ai in S: Ti <- {}; for all Aj in S with j ≥ i: R <- Ai ∨ Aj (join); if R satisfies min_sup: Ti <- Ti ∪ {R}; end; if searching depth-first (DFS): Enumerate_frequent_seq(Ti, min_sup); end. If searching breadth-first (BFS): for all non-empty Ti: Enumerate_frequent_seq(Ti, min_sup).
- 21. The R Project for Statistical Computing: a different implementation of the S language, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. SPADE is available in R through the arulesSequences package.
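
The id-list joins and the class-by-class enumeration described in the slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presenter's or the arulesSequences implementation: the toy database `DB` is assumed (it is chosen so that the id-lists for D->B, D->BF and D->BF->A come out as listed on slide 13), all function names are hypothetical, the search is depth-first only, and recursion starts directly from the frequent 1-sequences instead of computing F2 in a separate step.

```python
from collections import defaultdict

# Assumed toy database (not taken from the slides):
# sid -> list of (eid, items); each character of the string is one item.
DB = {
    1: [(10, "CD"), (15, "ABC"), (20, "ABF"), (25, "ACDF")],
    2: [(15, "ABF"), (20, "E")],
    3: [(10, "ABF")],
    4: [(10, "DGH"), (20, "BF"), (25, "AGH")],
}

def item_idlists(db):
    """Vertical format: item -> set of (sid, eid) pairs where the item occurs."""
    idlists = defaultdict(set)
    for sid, events in db.items():
        for eid, items in events:
            for item in items:
                idlists[item].add((sid, eid))
    return idlists

def support(idlist):
    """Number of distinct input sequences (sids) present in an id-list."""
    return len({sid for sid, _ in idlist})

def temporal_join(left, right):
    """Entries of `right` that come after some `left` entry with the same sid."""
    first = {}
    for sid, eid in left:
        first[sid] = min(eid, first.get(sid, eid))
    return {(sid, eid) for sid, eid in right if sid in first and first[sid] < eid}

def equality_join(left, right):
    """Entries where both patterns end in the same event of the same sequence."""
    return left & right

def enumerate_frequent(atoms, min_sup, out):
    """Depth-first enumeration of one equivalence class.

    `atoms` is a list of (pattern, idlist) pairs sharing a common prefix.
    A pattern is a tuple of events; an event is a tuple of sorted items.
    """
    for pat_i, ids_i in atoms:
        out[pat_i] = support(ids_i)
        item_i, i_is_seq = pat_i[-1][-1], len(pat_i[-1]) == 1
        children = []  # the class whose common prefix is pat_i
        for pat_j, ids_j in atoms:
            item_j, j_is_seq = pat_j[-1][-1], len(pat_j[-1]) == 1
            candidates = []
            if j_is_seq:
                # Append item_j as a new event: pat_i -> item_j.
                candidates.append((pat_i + ((item_j,),),
                                   temporal_join(ids_i, ids_j)))
            if i_is_seq == j_is_seq and item_j > item_i:
                # Add item_j to the last event of pat_i.
                candidates.append((pat_i[:-1] + (pat_i[-1] + (item_j,),),
                                   equality_join(ids_i, ids_j)))
            children += [(p, ids) for p, ids in candidates
                         if support(ids) >= min_sup]
        enumerate_frequent(children, min_sup, out)

def spade(db, min_sup):
    """Return {pattern: support} for every frequent sequence in `db`."""
    atoms = [(((item,),), ids)
             for item, ids in sorted(item_idlists(db).items())
             if support(ids) >= min_sup]
    out = {}
    enumerate_frequent(atoms, min_sup, out)
    return out

def show(pattern):
    """Render a pattern in the slides' notation, e.g. D->BF->A."""
    return "->".join("".join(event) for event in pattern)

frequent = {show(p): s for p, s in spade(DB, min_sup=2).items()}
for name in sorted(frequent, key=lambda n: (len(n.replace("->", "")), n)):
    print(name, frequent[name])
```

Each id-list stores only (sid, eid) of a pattern's last event, as on slide 12, so a join never has to look at the original database again. The temporal join only needs the earliest eid per sid from the left-hand id-list, which keeps it linear in the size of the two lists.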
