Sequential pattern mining

Published in: Education

• 1. GUIDE: MS. ANAGHA CHAUDHARI
• 2. A sequence: <(ef)(ab)(df)cb>
   An element may contain a set of items. Items within an element are unordered, and we list them alphabetically.
   A sequence database:
     SID  Sequence
     10   <a(abc)(ac)d(cf)>
     20   <(ad)c(bc)(ae)>
     30   <(ef)(ab)(df)cb>
     40   <eg(af)cbc>
   <a(bc)df> is a subsequence of <a(abc)(ac)d(cf)>.
   Given support threshold min_sup = 2, <(ab)c> is a sequential pattern (it is contained in sequences 10 and 30).
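The containment and support definitions above can be checked with a short Python sketch. It assumes single-character items, as in the slide's examples, and the helper names (parse_sequence, is_subsequence, support) are hypothetical. Containment is tested greedily: each element of the candidate must be a subset of a strictly later element of the data sequence.

```python
def parse_sequence(text):
    """Parse notation like '<a(abc)(ac)d(cf)>' into a list of frozensets.

    Assumes single-character items (an assumption matching the slide).
    """
    elements, i = [], 0
    body = text.strip("<>")
    while i < len(body):
        if body[i] == "(":
            j = body.index(")", i)
            elements.append(frozenset(body[i + 1:j]))  # multi-item element
            i = j + 1
        else:
            elements.append(frozenset(body[i]))  # single-item element
            i += 1
    return elements


def is_subsequence(sub, seq):
    """True if each element of sub is a subset of a distinct, later element
    of seq, preserving order (greedy earliest match)."""
    pos = 0
    for element in sub:
        while pos < len(seq) and not element <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1  # the next element of sub must match a later element of seq
    return True


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database.values())


db = {
    10: parse_sequence("<a(abc)(ac)d(cf)>"),
    20: parse_sequence("<(ad)c(bc)(ae)>"),
    30: parse_sequence("<(ef)(ab)(df)cb>"),
    40: parse_sequence("<eg(af)cbc>"),
}

print(is_subsequence(parse_sequence("<a(bc)df>"), db[10]))  # True
print(support(parse_sequence("<(ab)c>"), db))               # 2
```

A pattern is a sequential pattern exactly when support(pattern, db) >= min_sup.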
• 3. CHALLENGES ON SEQUENTIAL PATTERN MINING
   A huge number of possible sequential patterns are hidden in databases. A mining algorithm should:
   - find the complete set of patterns, when possible, that satisfy the minimum support (frequency) threshold;
   - be highly efficient and scalable, involving only a small number of database scans;
   - be able to incorporate various kinds of user-specific constraints.
• 4. The Apriori Algorithm: An Example (Sup_min = 2)
   Database TDB:
     Tid  Items
     10   A, C, D
     20   B, C, E
     30   A, B, C, E
     40   B, E
   1st scan, C1: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
   L1: {A} 2, {B} 3, {C} 3, {E} 3
   C2 (generated from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
   2nd scan, C2 with support: {A, B} 1, {A, C} 2, {A, E} 1, {B, C} 2, {B, E} 3, {C, E} 2
   L2: {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2
   C3 (generated from L2): {B, C, E}
   3rd scan, L3: {B, C, E} 2
• 5. The Apriori Algorithm [Pseudo-Code]
   Ck: candidate itemsets of size k
   Lk: frequent itemsets of size k

   L1 = {frequent items};
   for (k = 1; Lk != ∅; k++) do begin
       Ck+1 = candidates generated from Lk;
       for each transaction t in database do
           increment the count of all candidates in Ck+1 that are contained in t
       Lk+1 = candidates in Ck+1 with min_support
   end
   return ∪k Lk;
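A minimal Python sketch of the pseudo-code, run on the TDB from the Apriori example. The names apriori and tdb are hypothetical. The join step here simply unions pairs of frequent k-itemsets that differ in one item and then prunes any candidate with an infrequent k-subset; this is a simpler variant of the usual prefix-based join that yields the same candidate set after pruning.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset (frozenset): support} for every frequent itemset."""
    # First scan: count single items to obtain L1
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(level)
    k = 1
    while level:  # loop while Lk is non-empty
        # Generate Ck+1: join pairs of Lk, prune if any k-subset is infrequent
        prev = list(level)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k + 1 and all(
                    frozenset(sub) in level for sub in combinations(union, k)
                ):
                    candidates.add(union)
        # One database scan: count candidates contained in each transaction
        cand_counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        level = {c: n for c, n in cand_counts.items() if n >= min_sup}  # Lk+1
        frequent.update(level)
        k += 1
    return frequent  # union of all Lk


# TDB from the example (Tids 10, 20, 30, 40)
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, min_sup=2)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

On this data the sketch reproduces the L1, L2, and L3 tables of the example, ending with {B, C, E} at support 2.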
• 6. APRIORI ADV/DISADV
   Advantages:
   - Uses the large itemset property (every subset of a frequent itemset is also frequent).
   - Easily parallelized.
   - Easy to implement.
   Disadvantages:
   - Assumes the transaction database is memory resident.
   - Requires up to m database scans (one scan per candidate itemset size).
• 7. FP-GROWTH (J. Han, J. Pei, and Y. Yin, 2000)
   - Depth-first search.
   - Avoids explicit candidate generation.
   - Adopts a divide-and-conquer strategy.
   - Two-step approach:
     Step 1: Build a compact data structure called the FP-tree.
     Step 2: Extract frequent itemsets from the FP-tree.
• 8. Step 1: FP-Tree Construction
   The FP-tree is constructed using 2 passes over the data set.
   Pass 1:
   - Scan the data and find the support of each item.
   - Discard infrequent items.
   - Sort frequent items in decreasing order of support.
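Pass 1 can be sketched in a few lines of Python. Because the slides' own transactional data table is not reproduced in the transcript, this sketch reuses the four-transaction TDB from the Apriori example. The helper names (first_pass, reorder) are hypothetical, and breaking support ties alphabetically is an assumption made only so the order is deterministic.

```python
from collections import Counter


def first_pass(transactions, min_sup):
    """Pass 1: count item supports, drop infrequent items, fix the item order."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {item: n for item, n in counts.items() if n >= min_sup}
    # Decreasing support; ties broken alphabetically (an assumption)
    order = sorted(frequent, key=lambda item: (-frequent[item], item))
    return frequent, order


def reorder(transaction, order):
    """Keep only frequent items, listed in the global order used by pass 2."""
    return [item for item in order if item in transaction]


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
frequent, order = first_pass(tdb, min_sup=2)
print(order)                            # ['B', 'C', 'E', 'A']
print([reorder(t, order) for t in tdb])
```

Item D (support 1) is discarded, and every transaction is rewritten in the same B, C, E, A order, which is what lets pass 2 share prefixes.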
• 9. Pass 2: Nodes correspond to items and have a counter.
   1. FP-Growth reads one transaction at a time and maps it to a path.
   2. A fixed order is used, so paths can overlap when transactions share items (when they have the same prefix). In this case, counters are incremented.
   3. Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines). The more paths that overlap, the higher the compression; the FP-tree may then fit in memory.
   4. Frequent itemsets are extracted from the FP-tree.
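Both passes of FP-tree construction can be sketched in Python as follows, again assuming the TDB from the Apriori example as input. The class and function names (FPNode, build_fp_tree, item_support, show) are hypothetical. Each node keeps an item, a counter, its children, and a `next` node-link; the header table maps each item to the head of its node-link list, and new nodes are prepended to that list.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its counter, and a node-link to the next
    node carrying the same item (the dotted lines in the slide)."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> child FPNode
        self.next = None    # node-link


def build_fp_tree(transactions, min_sup):
    # Pass 1: supports, drop infrequent items, fixed order (ties alphabetical)
    counts = Counter(item for t in transactions for item in t)
    order = sorted((i for i in counts if counts[i] >= min_sup),
                   key=lambda i: (-counts[i], i))
    root, header = FPNode(None, None), {}  # header: item -> head of node-link list
    # Pass 2: map each transaction to a path from the root
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):
            child = node.children.get(item)
            if child is None:                 # no shared prefix here: new node
                child = FPNode(item, node)
                node.children[item] = child
                child.next = header.get(item)  # prepend to the node-link list
                header[item] = child
            child.count += 1                  # shared prefix: increment counter
            node = child
    return root, header


def item_support(header, item):
    """Follow the node-link list of `item` and add up its counters."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.next
    return total


def show(node, depth=0):
    """Print the tree as an indented outline, one 'item:count' per node."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
root, header = build_fp_tree(tdb, min_sup=2)
show(root)
print(item_support(header, "E"))  # 3
```

Here the four transactions collapse into two branches from the root: B:3 (with children C:2 → E:2 → A:1 and E:1) and C:1 → A:1. Summing counters along an item's node-link list recovers that item's support.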
• 10. Start from each frequent length-1 pattern (as an initial suffix pattern) and construct its conditional pattern base: a "subdatabase" consisting of the set of prefix paths in the FP-tree that co-occur with the suffix pattern.
   Construct its (conditional) FP-tree and perform mining recursively on that tree.
   Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated from a conditional FP-tree.
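A minimal Python sketch of this recursive mining step, under stated assumptions: it reuses the four-transaction TDB from the Apriori example, keeps each item's nodes in a Python list instead of a linked node-link chain, and uses hypothetical names (Node, build_tree, fp_growth). Because build_tree accepts weighted (items, count) pairs, the same function builds both the initial FP-tree and every conditional FP-tree.

```python
from collections import Counter, defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}


def build_tree(weighted_paths, min_sup):
    """Build an FP-tree from (items, count) pairs; return a header table
    mapping each frequent item to the list of nodes holding it."""
    counts = Counter()
    for items, n in weighted_paths:
        for item in items:
            counts[item] += n
    # Sort key: decreasing support, ties alphabetical (an assumption)
    rank = {i: (-c, i) for i, c in counts.items() if c >= min_sup}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted_paths:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header


def fp_growth(weighted_paths, min_sup, suffix=frozenset()):
    """Yield (itemset, support) pairs by recursive pattern growth."""
    header = build_tree(weighted_paths, min_sup)
    for item, nodes in header.items():
        pattern = suffix | {item}  # concatenate suffix with the new item
        yield pattern, sum(node.count for node in nodes)
        # Conditional pattern base: prefix path of every node holding `item`
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))  # order is fixed by build_tree
        # Build the conditional FP-tree from that base and mine it recursively
        yield from fp_growth(base, min_sup, pattern)


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = dict(fp_growth([(t, 1) for t in tdb], min_sup=2))
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

On this data the sketch finds the same nine frequent itemsets, with the same supports, as the Apriori example, including {B, C, E} with support 2, without generating any candidate sets.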
• 11. Table: Table after first scan of database
   Table: Transactional data
• 12. Fig.: FP-tree construction
• 13. EXAMPLE (CONT.)
   Table: Mining the FP-tree by creating conditional (sub-)pattern bases
• 14. EXAMPLE (CONT.)
   Fig.: The conditional FP-tree associated with the conditional node I3
• 15. FP-GROWTH ADV/DISADV
   Advantages of FP-Growth:
   - Only 2 passes over the data set.
   - "Compresses" the data set.
   - No candidate generation.
   - Typically much faster than Apriori.
   Disadvantages of FP-Growth:
   - The FP-tree may not fit in memory.
   - The FP-tree is expensive to build.
• 16. APPLICATIONS
   - Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months.
   - Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
   - Telephone calling patterns, Weblog click streams.
   - DNA sequences and gene structures.
• 17. THANK YOU