
Mining 3-Clusters in Vertically Partitioned Data

  • Show overlapping clusters with more emphasis and introduce the idea of a lattice for organizing the overlapping clusters.
  • Mark the datasets as D1 and D2. Show more columns in D2.
  • Explain using monotonicity ideas.
  • State that width is always anti-monotonic.

    1. Mining 3-Clusters in Vertically Partitioned Data
         • Faris Alqadah & Raj Bhatnagar
         • University of Cincinnati
    2. Outline
         • Introduction to 3-clustering in binary (categorical), vertically partitioned data
         • Proposed cluster quality measure
         • 3-Clu: algorithm for enumerating 3-clusters from two datasets
    3. Introduction
         • Traditional clustering
         • Bi-clustering
         • 3-clustering
    4. Why 3-clusters?
         • Find correspondence between bi-clusters of two different datasets
         • Sharpen local clusters with outside knowledge
         • Alternative? "Join datasets, then search"
              • Does not capture underlying interactions
              • Inefficient
              • Not always possible
    5. Why 3-clusters?
         • [Lattice figure; node labels:] <A,1234>, <AB,134>, <AWB,13>, <AY,12>, <AX,24>, <AWBCYZ,1>, <ABDX,4>
    6. Formal Definitions
         • Pattern in Di
         • Bi-cluster in Di
         • 3-cluster across D1 and D2
    7. Defining 3-clusters
         • D1 is the "learner"
         • Maximal rectangle of 1's under suitable permutation in learner
         • Best correspondence to rectangle of 1's in D2
         • [Figure labels: D1, D2]
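The definition on slide 7 can be made concrete with a small sketch. The toy data below is hypothetical: it is one assignment of items to objects that happens to reproduce the node labels on slide 5, not data taken from the deck. The helper names (`common_items`, `supporting_objects`, `three_cluster`) are also made up for illustration, and the reading of "best correspondence" as "items in D2 shared by the same objects" is an assumption, since the formal definition slide is not reproduced in this transcript.

```python
# Hypothetical toy data: four objects split vertically into two binary
# datasets. D1 (the "learner") holds items A-D, D2 holds items W-Z.
D1 = {1: {"A", "B", "C"}, 2: {"A"}, 3: {"A", "B"}, 4: {"A", "B", "D"}}
D2 = {1: {"W", "Y", "Z"}, 2: {"X", "Y"}, 3: {"W"}, 4: {"X"}}


def common_items(data, objects):
    """Items shared by every object in a non-empty object set."""
    return set.intersection(*(data[o] for o in objects))


def supporting_objects(data, items):
    """Objects whose row contains every item in `items`."""
    return {o for o, row in data.items() if items <= row}


def three_cluster(objects):
    """Assumed reading of a 3-cluster: one object set paired with the items
    those objects share in D1 and the items they share in D2, closed so that
    no further object has all of those items in both datasets."""
    i1 = common_items(D1, objects)
    i2 = common_items(D2, objects)
    closed = supporting_objects(D1, i1) & supporting_objects(D2, i2)
    return closed, i1, i2


objs, items_d1, items_d2 = three_cluster({1, 3})
print(sorted(objs), sorted(items_d1), sorted(items_d2))
```

Under these assumptions, seeding with objects 1 and 3 gives the object set {1, 3}, D1 items {A, B} and D2 item {W}, which corresponds to the node <AWB,13> on slide 5.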
    8. Cluster Quality Measure
         • Intuition: maximize the number of 1's while also maximizing the number of items and objects
         • Trade-off between objects and items
              • More items, fewer objects
              • More objects, fewer items
    9. Quality Measure
         • Consider bi-clusters in the learner alone
         • [Figure labels: I1, O, C1, C2]
         • Which is preferable?
         • User decides
    10. Quality Measure
         • Quality measure:
              • Monotonic in both width and height
                   • Reflects intuition
              • Balances width and height according to a user-defined parameter
         • Introduce β
         • Amount of width (attributes) the user is willing to trade for a single unit of height (objects)
    11. Quality Measure
    12. Extending to 3-clusters
         • Utilize the same intuition
         • Width of a 3-cluster is the sum of the individual widths
    13. Selecting β
         • Larger values yield 3-clusters that are "wide" and "short" in both D1 and D2
              • Cluster key websites popular with a large number of Democrats and Republicans
         • Smaller values produce 3-clusters that are "narrow" and "long"
              • Discover a long list of websites utilized by a select few Democrats and Republicans
    14. 3-Clu: Our Algorithm
         • Search for 3-clusters is similar to the search for closed itemsets
         • How to formulate the search space?
              • The assumption that objects outnumber attributes may not hold
              • Several possible orderings of the search space
    15. Algorithm
    16. Algorithm
         • Define the search space with primacy to objects
         • Only need to maintain one search tree
         • Mimic a closed itemset algorithm with simultaneous pruning of the search space
         • Prune with the quality measure
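To illustrate what a search space "with primacy to objects" can look like, here is a minimal sketch that enumerates the closed bi-clusters of a single binary dataset by growing object sets depth-first. It follows the well-known Close-by-One scheme, swapped in here as a stand-in; it is not the 3-Clu procedure itself, which the transcript does not reproduce. The function name `closed_biclusters` and the toy rows are hypothetical.

```python
def closed_biclusters(rows):
    """Yield every (object set, item set) pair that forms a maximal rectangle
    of 1's in a binary dataset given as {object: set_of_items}.
    Close-by-One style depth-first search over objects."""
    objects = sorted(rows)
    all_items = set().union(*rows.values())

    def extent_of(intent):
        # All objects that contain every item of the intent.
        return {o for o in objects if intent <= rows[o]}

    def search(extent, intent, start):
        yield frozenset(extent), frozenset(intent)
        for pos in range(start, len(objects)):
            obj = objects[pos]
            if obj in extent:
                continue
            # Adding an object can only shrink the shared item set (width).
            new_intent = intent & rows[obj]
            new_extent = extent_of(new_intent)
            # Canonicity test: skip if closing pulled in an earlier object,
            # because that cluster is reached on another branch.
            if all(o in extent for o in new_extent if o < obj):
                yield from search(new_extent, new_intent, pos + 1)

    yield from search(extent_of(all_items), all_items, 0)


# Hypothetical toy rows, for illustration only.
toy = {1: {"A", "B", "C"}, 2: {"A"}, 3: {"A", "B"}, 4: {"A", "B", "D"}}
for extent, intent in closed_biclusters(toy):
    print(sorted(extent), sorted(intent))
```

Because each step down the tree adds objects and intersects item sets, height grows while width can only shrink, which is the property a quality-based pruning rule can exploit.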
    17. Algorithm
    18. Algorithm
         • Cluster quality measure is neither monotone nor anti-monotone in the search space
         • Pruning is still possible
         • Is C2 of higher quality?
    19. Algorithm
    20. Algorithm
         • Pruning rule is very optimistic
         • Can be adjusted with some a priori information
         • Example: β = 0.5
         • x = 2.73, so the branch cannot be pruned
              • This assumes w will stay at 15 for 3 more levels
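The optimistic rule on slide 20 can be sketched as a generic upper bound. The sketch assumes only what the slides and notes state: the quality measure is monotonic in both width and height, width never grows while descending the search tree, and height can grow by at most the number of remaining levels. The deck's actual β-based formula is not in this transcript, so the quality function is passed in as a parameter, and the `area` function used in the demonstration is a placeholder, not the authors' measure. All names here are hypothetical.

```python
def optimistic_bound(quality, width, height, levels_left):
    """Upper bound on the quality of any descendant, assuming the width
    stays where it is while the height grows on every remaining level."""
    return quality(width, height + levels_left)


def can_prune(quality, width, height, levels_left, best_so_far):
    """Prune a branch only if even the optimistic bound cannot beat the
    best quality found so far."""
    return optimistic_bound(quality, width, height, levels_left) < best_so_far


def area(width, height):
    """Placeholder quality (monotonic in both arguments); NOT the deck's measure."""
    return width * height


# Width 15 and 3 remaining levels echo slide 20; height 2 is made up.
print(can_prune(area, width=15, height=2, levels_left=3, best_so_far=80))
print(can_prune(area, width=15, height=2, levels_left=3, best_so_far=60))
```

The bound is loose by design: real descendants usually lose width as objects are added, so a branch that survives this test may still contain nothing better, which is why the slide calls the rule "very optimistic".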
    21. Algorithm Analysis
         • Computational cost: O(|O|*i*N)
              • Only as expensive as enumerating bi-clusters in a single dataset
         • Communication cost: O(N)
         • Correctness guaranteed by FCA theory
    22. Experimental Results
         • Performance tests
         • Randomly split benchmark datasets CHESS and CONNECT
         • Genetic dataset: Genes, GO terms, Phenotypes
         • Compared to LCM and CHARM
    23. [Plots: Chess, Connect, GO-Pheno]
    24. Experimental Results
         • Test validity of 3-clusters
         • Randomly partitioned Mushrooms dataset by attributes
    25. Conclusion
         • Novel concept of 3-clusters in vertically partitioned data
         • Introduced quality measure framework for 3-clusters
         • Presented efficient algorithm based on closed itemset mining algorithms, with adaptations:
              • Defined search space to enable simultaneous pruning
              • Incorporated novel pruning method based on cluster quality measure
