Mining top k frequent closed itemsets

Published in: Technology, Business

  1. 1. Mining top-k frequent closed itemsets over data streams using the sliding window model. Author: Pauray S.M. Tsai. Publication: ESA 2010. Presenter: Yuan-Chung Chang.
  2. 2. Outline <ul><li>Introduction </li></ul><ul><li>Motivation </li></ul><ul><li>Mining top-k frequent closed itemsets </li></ul><ul><ul><li>FCI_max algorithm </li></ul></ul><ul><li>Example for FCI_max algorithm </li></ul><ul><li>Conclusion </li></ul>
  3. 3. Introduction <ul><li>With the emergence of new applications, the data we process are no longer static but arrive as continuous, dynamic data streams. </li></ul><ul><li>Because the data in streams arrive at high speed and are continuous and unbounded, data stream mining faces three challenges. </li></ul><ul><ul><li>First, each item in a stream can be examined only once. </li></ul></ul><ul><ul><li>Second, although the data are generated continuously, the available memory space is limited. </li></ul></ul><ul><ul><li>Third, the mining result should be generated as fast as possible. </li></ul></ul>
  4. 4. Introduction (cont.) <ul><li>In the database community, one of the major applications is mining association rules in large transaction databases. </li></ul><ul><li>There are two problems occurring in traditional association rule mining. </li></ul><ul><ul><li>First, a minimum support is required for mining. </li></ul></ul><ul><ul><li>Second, there are usually a lot of association rules generated from the mining, which gives rise to difficulties in practical applications. </li></ul></ul>
  5. 5. Introduction (cont.) <ul><li>In the data stream environment, the problem of mining frequent itemsets becomes more complicated. </li></ul><ul><li>Traditional algorithms for mining frequent itemsets cannot satisfy the requirement of examining each item in a stream only once. </li></ul><ul><ul><li>How to effectively maintain frequent itemsets over data streams is another important issue. </li></ul></ul><ul><li>Because data are generated continuously in data streams, present frequent itemsets may become infrequent, and present infrequent itemsets may become frequent. </li></ul><ul><li>We cannot save all the itemsets and their related information in the memory due to the restriction of memory space. </li></ul>
  6. 6. Introduction (cont.) <ul><li>The time models for data stream mining mainly include the landmark model (2002), the tilted-time window model (2003) and the sliding window model (2006). </li></ul><ul><ul><li>The landmark model considers all the data from a specified point of time to the current time. </li></ul></ul><ul><ul><li>The tilted-time window model is a variation of the landmark model. </li></ul></ul><ul><ul><li>The sliding window model focuses on the recent data from the current moment back to a specified time point. </li></ul></ul>
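The difference between the landmark model and the sliding window model can be illustrated with a minimal Python sketch. The class name `SlidingWindow`, the batch layout, and the toy transactions are assumptions made for illustration; they are not taken from the paper.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical helper: keeps only the n most recent batches of transactions."""

    def __init__(self, n):
        # A deque with maxlen drops the oldest batch automatically.
        self.windows = deque(maxlen=n)

    def push(self, batch):
        self.windows.append(list(batch))

    def transactions(self):
        # Flatten the retained batches, oldest first.
        return [t for w in self.windows for t in w]


# Landmark model: everything since the landmark is retained.
landmark = []
# Sliding window model: only the last two batches are retained.
sliding = SlidingWindow(n=2)

for batch in [[{"a", "b"}], [{"a"}], [{"b", "c"}]]:
    landmark.extend(batch)
    sliding.push(batch)
```

After the third batch arrives, `landmark` still holds all three transactions, while `sliding.transactions()` holds only the two most recent ones.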
  7. 7. Motivation <ul><li>The two problems occurring in traditional association rule mining also exist in the data stream environment: specifying an appropriate minimum support and reducing the number of frequent itemsets. </li></ul><ul><li>The idea of mining frequent closed itemsets was first proposed in 1999. </li></ul>
  8. 8. Motivation (cont.) <ul><li>An alternative approach, proposed in 2005, mines the top-k frequent closed itemsets of length no less than min_l without requiring a minimum support. </li></ul><ul><ul><li>The mining result presents only frequent closed itemsets of length no less than min_l, so information about closed itemsets with high support but short length is lost. </li></ul></ul><ul><ul><li>In general, the longer a closed itemset is, the smaller its support will be. </li></ul></ul><ul><li>In this paper, the author proposes an efficient single-pass algorithm, FCI_max, to discover the top-k frequent closed itemsets of length no more than max_l, using a sliding window technique. </li></ul>
  9. 9. Motivation (cont.) <ul><li>For mining top-k frequent closed itemsets of length no less than min_l (2005) </li></ul><ul><ul><li>Case 1: Mining top-3 frequent closed itemsets with min_l = 2. </li></ul></ul><ul><ul><ul><li>The mining result is {ab:7, abc:6, ad:4}. Excluded because it is too short: {a:8}. </li></ul></ul></ul><ul><ul><li>Case 2: Mining top-3 frequent closed itemsets with min_l = 3. </li></ul></ul><ul><ul><ul><li>The mining result is {abc:6, abcd:3, abe:2, ace:2}. Excluded because they are too short: {a:8}, {ab:7}, {ad:4}. </li></ul></ul></ul>
  10. 10. Motivation (cont.) <ul><li>For mining top-k frequent closed itemsets of length no more than max_l (2010) </li></ul><ul><ul><li>Case 3: Mining top-4 frequent closed itemsets with max_l = 3. </li></ul></ul><ul><ul><ul><li>The mining result is {a:8, ab:7, abc:6, ad:4}. </li></ul></ul></ul><ul><ul><li>Case 4: Mining top-4 frequent closed itemsets with max_l = 2. </li></ul></ul><ul><ul><ul><li>The mining result is {a:8, ab:7, ad:4, ae:3}. </li></ul></ul></ul>
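The contrast between a min_l constraint and a max_l constraint can be reproduced on a small scale. The sketch below is a brute-force illustration in Python, not the FCI_max algorithm; the toy transactions, the helper names `closed_itemsets`, `top_k`, and `fmt`, and the choice to keep itemsets tied with the k-th support are all assumptions.

```python
from itertools import combinations


def closed_itemsets(transactions):
    """Brute force: return {itemset: support} for every closed itemset."""
    items = sorted(set().union(*transactions))
    support = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            s = sum(1 for t in transactions if set(combo) <= t)
            if s > 0:
                support[frozenset(combo)] = s
    # An itemset is closed if no proper superset has the same support.
    return {
        x: s
        for x, s in support.items()
        if not any(x < y and s == sy for y, sy in support.items())
    }


def top_k(closed, k, min_l=1, max_l=None):
    """Top-k closed itemsets by support under a length constraint (ties kept)."""
    cands = [
        (x, s)
        for x, s in closed.items()
        if len(x) >= min_l and (max_l is None or len(x) <= max_l)
    ]
    cands.sort(key=lambda p: (-p[1], len(p[0]), sorted(p[0])))
    if len(cands) <= k:
        return cands
    cutoff = cands[k - 1][1]
    return [p for p in cands if p[1] >= cutoff]


def fmt(pairs):
    """Render (itemset, support) pairs in the slides' 'ab:3' style as a dict."""
    return {"".join(sorted(x)): s for x, s in pairs}


# Hypothetical toy data, not the data set behind the slides' cases.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"a"}]
closed = closed_itemsets(transactions)

with_min_l = fmt(top_k(closed, k=2, min_l=2))  # drops the short itemset a:5
with_max_l = fmt(top_k(closed, k=2, max_l=2))  # keeps a:5
```

On this toy data the closed itemsets are a:5, ab:3, abc:2, and ad:1. With min_l = 2 the top-2 result omits a:5 even though it has the highest support, whereas with max_l = 2 it is retained.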
  11. 11. Mining top-k frequent closed itemsets <ul><li>The author uses the sliding window model shown in Fig. 1 for the following discussion. </li></ul>
  12. 12. Mining top-k frequent closed itemsets <ul><li>The number of windows: n </li></ul><ul><li>The time covered by each window: t </li></ul><ul><li>Items in window: {x_1, x_2, ..., x_m} </li></ul><ul><li>The sliding windows: {W_{i1}, W_{i2}, ..., W_{in}} </li></ul><ul><li>The set of identifiers of transactions containing itemset {x_1, x_2, ..., x_m} in window W_{ij}: SP_{ij}({x_1, x_2, ..., x_m}) </li></ul><ul><li>The union of SP_{ij}({x_1, x_2, ..., x_m}) over the windows: CS_i({x_1, x_2, ..., x_m}) </li></ul><ul><li>The number of transaction identifiers in CS_i({x_1, x_2, ..., x_m}): |CS_i({x_1, x_2, ..., x_m})| </li></ul><ul><li>The top-k 1-itemsets by |CS_i|: {S_1, S_2, ..., S_k} </li></ul><ul><li>The current top-k frequent closed itemsets are denoted as a set: P </li></ul><ul><ul><li>The initial value of P is set to {S_1, S_2, ..., S_k} </li></ul></ul>
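The definitions above can be made concrete with a short Python sketch. It covers only the notation (SP, CS, and the initial value of P), not the FCI_max algorithm itself. The function names, the `(tid, items)` transaction layout, the toy windows, and the alphabetical tie-break are assumptions, and transaction identifiers are assumed to be unique across windows.

```python
def sp(window, itemset):
    """SP_ij: ids of the transactions in one window that contain `itemset`."""
    return {tid for tid, items in window if itemset <= items}


def cs(windows, itemset):
    """CS_i: union of SP_ij over all windows of the current sliding window."""
    ids = set()
    for w in windows:
        ids |= sp(w, itemset)
    return ids


def initial_p(windows, k):
    """Initial P: the k 1-itemsets with the largest |CS_i| (ties: alphabetical)."""
    items = set()
    for w in windows:
        for _, t in w:
            items |= t
    ranked = sorted(items, key=lambda x: (-len(cs(windows, frozenset([x]))), x))
    return [frozenset([x]) for x in ranked[:k]]


# Hypothetical sliding window made of two windows of (tid, items) pairs.
windows = [
    [(1, {"a", "b"}), (2, {"a", "c"})],
    [(3, {"a", "b", "c"}), (4, {"b"})],
]

cs_ab = cs(windows, frozenset({"a", "b"}))  # transactions 1 and 3
p0 = initial_p(windows, k=2)                # 1-itemsets {a} and {b}
```

Here |CS_i({a})| = 3, |CS_i({b})| = 3, and |CS_i({c})| = 2, so the initial P for k = 2 consists of {a} and {b}.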
  13. 13. Mining top-k frequent closed itemsets <ul><li>The detailed algorithm for mining top-k frequent closed itemsets with max_l </li></ul><ul><ul><li>FCI_max algorithm </li></ul></ul>
  14. 14. Mining top-k frequent closed itemsets
  15. 15. Example for FCI_max algorithm <ul><li>Assume the number of windows is 4 and each window covers 5 minutes. </li></ul><ul><li>Assume the number of frequent closed itemsets to be mined is k = 5 and the maximum length of a frequent closed itemset is max_l = 4. </li></ul>
  34. 34. Conclusion <ul><li>In this paper, the author proposes an efficient single-pass algorithm, FCI_max, to discover the top-k frequent closed itemsets of length no more than max_l. </li></ul><ul><li>Replacing the minimum support with a maximum length resolves the problem of losing information about itemsets with short length but high support. </li></ul><ul><li>The FCI_max algorithm need not store all the support counts of itemsets at each time point. </li></ul><ul><li>It uses dynamic computation to generate the frequent closed itemsets and their related information, which allows it to discover the top-k frequent closed itemsets efficiently in the data stream environment. </li></ul>
  35. 35. Thank you for listening. Q & A
