Sequential pattern mining
Transcript

  • 1. GUIDE: MS. ANAGHA CHAUDHARI
  • 2. A sequence: < (ef) (ab) (df) c b >

    A sequence database:

    SID | sequence
    10  | <a(abc)(ac)d(cf)>
    20  | <(ad)c(bc)(ae)>
    30  | <(ef)(ab)(df)cb>
    40  | <eg(af)cbc>

    An element may contain a set of items. Items within an element are
    unordered, and we list them alphabetically.

    <a(bc)df> is a subsequence of <a(abc)(ac)d(cf)>.

    Given support threshold min_sup = 2, <(ab)c> is a sequential pattern.
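The subsequence relation above can be made concrete in a few lines of Python. This is an illustrative sketch (not from the slides); sequences are represented as lists of sets, one set per element:

```python
def is_subsequence(sub, seq):
    """Check whether `sub` is a subsequence of `seq`.

    Both arguments are lists of sets; each set is one element of the
    sequence. `sub` matches if its elements can be mapped, in order,
    to distinct elements of `seq`, each contained in its match."""
    i = 0  # scan position in seq
    for elem in sub:
        # advance until an element of seq contains this element of sub
        while i < len(seq) and not elem <= seq[i]:
            i += 1
        if i == len(seq):
            return False
        i += 1  # the next element of sub must match strictly later
    return True

# The sequence database from the slide
db = [
    [{'a'}, {'a', 'b', 'c'}, {'a', 'c'}, {'d'}, {'c', 'f'}],  # SID 10
    [{'a', 'd'}, {'c'}, {'b', 'c'}, {'a', 'e'}],              # SID 20
    [{'e', 'f'}, {'a', 'b'}, {'d', 'f'}, {'c'}, {'b'}],       # SID 30
    [{'e'}, {'g'}, {'a', 'f'}, {'c'}, {'b'}, {'c'}],          # SID 40
]

# <a(bc)df> is a subsequence of <a(abc)(ac)d(cf)>
print(is_subsequence([{'a'}, {'b', 'c'}, {'d'}, {'f'}], db[0]))  # True

# Support of <(ab)c> in the database is 2, so it is a sequential
# pattern at min_sup = 2
print(sum(is_subsequence([{'a', 'b'}, {'c'}], s) for s in db))  # 2
```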
  • 3. CHALLENGES ON SEQUENTIAL PATTERN MINING

    A huge number of possible sequential patterns are hidden in databases.
    A mining algorithm should:
      - find the complete set of patterns, when possible, satisfying the
        minimum support (frequency) threshold
      - be highly efficient and scalable, involving only a small number
        of database scans
      - be able to incorporate various kinds of user-specific constraints
  • 4. The Apriori Algorithm: An Example (sup_min = 2)

    Transaction database TDB (1st scan):

    Tid | Items
    10  | A, C, D
    20  | B, C, E
    30  | A, B, C, E
    40  | B, E

    C1 (candidate 1-itemsets): {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
    L1 (frequent 1-itemsets):  {A}:2, {B}:3, {C}:3, {E}:3

    C2 (2nd scan): {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
    L2:            {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

    C3 (3rd scan): {B,C,E}
    L3:            {B,C,E}:2
  • 5. The Apriori Algorithm [Pseudo-Code]

    Ck: candidate itemset of size k
    Lk: frequent itemset of size k

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1
            that are contained in t;
        Lk+1 = candidates in Ck+1 with min_support;
    end
    return ∪k Lk;
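The pseudo-code above translates almost line-for-line into Python. This is a minimal sketch (not production code); it reproduces the counts of the example slide, with candidate generation done by joining frequent items and pruning via the Apriori property:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise Apriori; returns {frozenset: support} for all
    frequent itemsets."""
    # 1st scan: count single items to obtain L1
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup}
    frequent, k = dict(Lk), 1
    while Lk:
        # Generate C(k+1) from Lk, pruning any candidate that has an
        # infrequent k-subset (the Apriori property).
        items = sorted({i for s in Lk for i in s})
        Ck1 = [frozenset(c) for c in combinations(items, k + 1)
               if all(frozenset(sub) in Lk for sub in combinations(c, k))]
        # One database scan per level: count candidates contained in t
        counts = {c: sum(1 for t in transactions if c <= set(t))
                  for c in Ck1}
        Lk = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(Lk)
        k += 1
    return frequent

# The TDB from the example slide, sup_min = 2
tdb = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
freq = apriori(tdb, 2)
print(freq[frozenset('BCE')])  # 2, matching L3 on the example slide
```

Note that each level costs one full pass over the database, which is exactly the "up to m database scans" drawback listed on the next slide.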
  • 6. APRIORI ADV/DISADV

    Advantages:
      - Uses the large-itemset property.
      - Easily parallelized.
      - Easy to implement.
    Disadvantages:
      - Assumes the transaction database is memory resident.
      - Requires up to m database scans.
  • 7. FP-GROWTH (J. Han, J. Pei, and Y. Yin, 2000)

    Depth-first search; avoids explicit candidate generation.
    Adopts a divide-and-conquer strategy.
    Two-step approach:
      Step 1: Build a compact data structure called the FP-tree.
      Step 2: Extract frequent itemsets from the FP-tree.
  • 8. Step 1: FP-Tree Construction

    The FP-tree is constructed using 2 passes over the data set.
    Pass 1:
      - Scan the data and find the support for each item.
      - Discard infrequent items.
      - Sort the frequent items in decreasing order of support.
  • 9. Pass 2: Nodes correspond to items and have a counter.
    1. FP-Growth reads one transaction at a time and maps it to a path.
    2. A fixed item order is used, so paths can overlap when transactions
       share items (i.e., when they have the same prefix); in this case,
       the counters are incremented.
    3. Pointers are maintained between nodes containing the same item,
       creating singly linked lists (dotted lines). The more paths that
       overlap, the higher the compression; the FP-tree may fit in memory.
    4. Frequent itemsets are extracted from the FP-tree.
  • 10. Step 2: Mining the FP-Tree

    Start from each frequent length-1 pattern (as an initial suffix
    pattern) and construct its conditional pattern base (a "sub-database"
    consisting of the set of prefix paths in the FP-tree co-occurring
    with the suffix pattern). Then construct its (conditional) FP-tree
    and perform mining recursively on that tree. The pattern growth is
    achieved by concatenating the suffix pattern with the frequent
    patterns generated from its conditional FP-tree.
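The two steps can be sketched compactly in Python. This is an illustrative implementation under my own naming (`build_fp_tree`, `fp_growth` are not from the slides), with a plain node-list standing in for the linked header pointers; run on the same TDB as the Apriori example, it yields the same frequent itemsets:

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a counter, and child links."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    """Pass 1: count item supports. Pass 2: insert each transaction's
    frequent items, sorted by decreasing support, as a path."""
    sup = defaultdict(int)
    for t in transactions:
        for i in t:
            sup[i] += 1
    freq = {i: c for i, c in sup.items() if c >= min_sup}
    root = Node(None, None)
    header = defaultdict(list)  # item -> its nodes (header-list stand-in)
    for t in transactions:
        items = sorted((i for i in t if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for i in items:  # shared prefixes overlap; counters increment
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += 1
    return root, header, freq

def fp_growth(transactions, min_sup, suffix=frozenset()):
    """Recursively mine frequent itemsets via conditional pattern bases."""
    _, header, freq = build_fp_tree(transactions, min_sup)
    patterns = {}
    for item in freq:
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        # Conditional pattern base: prefix paths ending at `item`,
        # each repeated once per count of the ending node.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            base.extend([path] * node.count)
        if any(base):  # recurse on the conditional FP-tree
            patterns.update(fp_growth(base, min_sup, pattern))
    return patterns

tdb = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
print(fp_growth(tdb, 2)[frozenset('BCE')])  # 2, as with Apriori
```

Notice that only `build_fp_tree` touches the raw transactions (two passes); the recursion works entirely on prefix paths, which is why no candidate generation is needed.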
  • 11. EXAMPLE
    Table: Transactional data.
    Table: Item table after the first scan of the database.
  • 12. Fig.: FP-tree construction.
  • 13. EXAMPLE CONT.
    Table: Mining the FP-tree by creating conditional (sub-)pattern bases.
  • 14. EXAMPLE CONT.
    Fig.: The conditional FP-tree associated with the conditional node I3.
  • 15. FP-GROWTH ADV/DISADV

    Advantages of FP-Growth:
      - Only 2 passes over the data set.
      - "Compresses" the data set.
      - No candidate generation.
      - Much faster than Apriori.
    Disadvantages of FP-Growth:
      - The FP-tree may not fit in memory!
      - The FP-tree is expensive to build.
  • 16. APPLICATIONS

    Customer shopping sequences: first buy a computer, then a CD-ROM,
    and then a digital camera, within 3 months.
    Medical treatments, natural disasters (e.g., earthquakes), science &
    engineering processes, stocks and markets, etc.
    Telephone calling patterns, Weblog click streams.
    DNA sequences and gene structures.
  • 17. THANK YOU
