Information-Theoretic Co-Clustering

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.

Transcript

  • 1. Information-theoretic co-clustering
    Authors / Inderjit S. Dhillon, Subramanyam Mallela and Dharmendra S. Modha
    Conference / ACM SIGKDD ’03, August 24-27, 2003, Washington
    Presenter / Meng-Lun, Wu
  • 2. Outline
    Introduction
    Problem Formulation
    Co-Clustering Algorithm
    Experimental Result
    Conclusions And Future Work
  • 3. Introduction
    Clustering is a fundamental tool in unsupervised learning.
    Most clustering algorithms focus on one-way clustering.
    [Figure: one-way clustering of a data matrix.]
  • 4. Introduction (cont.)
    It is often desirable to co-cluster, i.e., simultaneously cluster both dimensions.
    The normalized non-negative contingency table can be viewed as a joint probability distribution between two discrete random variables.
    The optimal co-clustering is one that leads to the largest mutual information between the clustered random variables.
  • 5. Introduction (cont.)
    Equivalently, the optimal co-clustering is one that minimizes the loss in mutual information.
    The mutual information of two random variables measures their mutual dependence.
    Formally, the mutual information is defined as:
    I(X; Y) = Σx Σy p(x,y) log( p(x,y) / (p(x) p(y)) )
  • 6. Introduction (cont.)
    The Kullback-Leibler (K-L) divergence measures the difference between two probability distributions.
    Given the true probability distribution p(x,y) and another distribution q(x,y), it is defined as:
    D(p ‖ q) = Σx Σy p(x,y) log( p(x,y) / q(x,y) )
    (A NumPy sketch of these two quantities appears after the transcript.)
  • 7. Problem formulation
    Let X and Y be discrete random variables.
    X: {x1,…,xm}, Y: {y1,…,yn}
    p(X, Y) denotes the joint probability distribution.
    Let the k clusters of X be: {x̂1, x̂2, …, x̂k}
    Let the l clusters of Y be: {ŷ1, ŷ2, …, ŷl}
  • 8. Problem formulation (cont.)
    Definition
    An optimal co-clustering minimizes the loss in mutual information,
    I(X; Y) − I(X̂; Ŷ),
    subject to constraints on the number of row and column clusters.
    For a fixed co-clustering (CX, CY), we can write the loss in mutual information as
    I(X; Y) − I(X̂; Ŷ) = D( p(X,Y) ‖ q(X,Y) ).
  • 9. Problem formulation (cont.)
    [Derivation showing that the loss in mutual information equals D(p(X,Y) ‖ q(X,Y)).]
  • 10. Problem formulation (cont.)
    q(X,Y) is a distribution of the form q(x, y) = p(x̂, ŷ) p(x|x̂) p(y|ŷ), for x ∈ x̂, y ∈ ŷ.
    Suppose p(X,Y) is a 6×6 example with row marginals (0.15, 0.15, 0.15, 0.15, 0.2, 0.2) and column marginals (0.18, 0.18, 0.14, 0.14, 0.18, 0.18); the row-cluster marginals are then (0.3, 0.3, 0.4) and the column-cluster marginals are (0.5, 0.5).
    (A sketch constructing q from such an example appears after the transcript.)
  • 11. Co-Clustering Algorithm
    Input:
    The joint probability distribution p(X,Y), k the desired number of row clusters, and l the desired number of column clusters.
    Output:
    The partition functions C†X and C†Y.
    (A minimal implementation sketch appears after the transcript.)
  • 12. Co-Clustering Algorithm (cont.)
    [Figure: row re-assignment step over row clusters x̂1, x̂2, x̂3.]
  • 13. Co-Clustering Algorithm (cont.)
    [Figure: column re-assignment step over column clusters ŷ1, ŷ2.]
  • 14. Co-Clustering Algorithm (cont.)
    After convergence, the loss in mutual information on the example is D(p‖q) = 0.02881.
  • 15. Experimental results
    For our experimental results we use various subsets of the 20-Newsgroups dataset (NG20).
    We use 1D-clustering to denote document clustering without any word clustering.
    Evaluation measures (see the sketch after the transcript):
    Micro-averaged precision
    Micro-averaged recall
  • 16. Experimental results (cont.)
    [Results table/figure.]
  • 17. Experimental results (cont.)
    [Results table/figure.]
  • 18. Experimental results (cont.)
    [Results table/figure.]
  • 19. Conclusions and Future Work
    The information-theoretic co-clustering algorithm is guaranteed to reach a local minimum of the objective in a finite number of steps.
    The approach co-clusters the joint distribution of two discrete random variables.
    In this paper, the numbers of row and column clusters are pre-specified.
    We hope that an information-theoretic regularization procedure may allow us to select the numbers of clusters automatically.
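
The mutual information and K-L divergence on slides 5 and 6 translate directly into code. The following is a minimal NumPy sketch (my own, not from the paper); p and q are joint distributions stored as 2-D arrays that sum to 1, and base-2 logarithms are used so both quantities are in bits.

    import numpy as np

    def mutual_information(p):
        """I(X;Y) = sum over x,y of p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
        px = p.sum(axis=1, keepdims=True)      # marginal p(x), column vector
        py = p.sum(axis=0, keepdims=True)      # marginal p(y), row vector
        mask = p > 0                           # convention: 0 log 0 = 0
        return float((p[mask] * np.log2(p[mask] / (px @ py)[mask])).sum())

    def kl_divergence(p, q):
        """D(p || q) = sum over x,y of p(x,y) * log2( p(x,y) / q(x,y) )."""
        mask = p > 0
        return float((p[mask] * np.log2(p[mask] / q[mask])).sum())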
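
Slide 10's approximation q can likewise be built from cluster labels. The sketch below reuses mutual_information and kl_divergence from above and checks the paper's key identity, loss in mutual information = D(p ‖ q), on a toy 6×6 joint distribution that I constructed to match the marginals listed on slide 10 (it may differ from the exact table on the slide).

    import numpy as np

    def build_q(p, rows, cols, k, l):
        """q(x,y) = p(xhat,yhat) * p(x|xhat) * p(y|yhat) for x in xhat, y in yhat."""
        R, C = np.eye(k)[rows], np.eye(l)[cols]                # one-hot indicators
        p_hat = R.T @ p @ C                                    # p(xhat, yhat), k x l
        p_x_given = p.sum(axis=1) / p_hat.sum(axis=1)[rows]    # p(x | xhat)
        p_y_given = p.sum(axis=0) / p_hat.sum(axis=0)[cols]    # p(y | yhat)
        return p_hat[np.ix_(rows, cols)] * np.outer(p_x_given, p_y_given)

    # Toy 6x6 joint distribution matching the marginals shown on slide 10.
    p = np.array([[.05, .05, .05, .00, .00, .00],
                  [.05, .05, .05, .00, .00, .00],
                  [.00, .00, .00, .05, .05, .05],
                  [.00, .00, .00, .05, .05, .05],
                  [.04, .04, .00, .04, .04, .04],
                  [.04, .04, .04, .00, .04, .04]])
    rows = np.array([0, 0, 1, 1, 2, 2])   # row clusters:    {x1,x2}, {x3,x4}, {x5,x6}
    cols = np.array([0, 0, 0, 1, 1, 1])   # column clusters: {y1,y2,y3}, {y4,y5,y6}
    q = build_q(p, rows, cols, k=3, l=2)
    p_hat = np.eye(3)[rows].T @ p @ np.eye(2)[cols]
    # The identity from slide 8: loss in mutual information equals D(p || q).
    assert np.isclose(mutual_information(p) - mutual_information(p_hat),
                      kl_divergence(p, q))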
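
Slides 11-14 step through the alternating algorithm. Below is a minimal reconstruction of the two update steps as described in the paper: each row x moves to the row cluster x̂ minimizing D(p(Y|x) ‖ q(Y|x̂)), then each column moves symmetrically. It is a sketch, not the authors' implementation; initialization and the stopping rule are simplified, and eps-smoothing stands in for careful zero handling.

    import numpy as np

    def itcc(p, k, l, n_iter=30, eps=1e-12, seed=0):
        """Minimal sketch of information-theoretic co-clustering (ITCC)."""
        rng = np.random.default_rng(seed)
        m, n = p.shape
        rows = rng.integers(k, size=m)            # random initial row labels
        cols = rng.integers(l, size=n)            # random initial column labels
        px, py = p.sum(axis=1), p.sum(axis=0)     # marginals p(x), p(y)

        def kl_to_clusters(P, Q):
            # KL(P[i] || Q[a]) for every pair; rows of P and Q are distributions
            log_q = np.log2(np.maximum(Q, eps))
            neg_ent = np.where(P > 0, P * np.log2(np.maximum(P, eps)), 0.0).sum(axis=1)
            return neg_ent[:, None] - P @ log_q.T

        for _ in range(n_iter):
            R, C = np.eye(k)[rows], np.eye(l)[cols]
            p_hat = R.T @ p @ C                   # p(xhat, yhat)
            # q(y|xhat) = p(y|yhat) p(yhat|xhat): one distribution over Y per row cluster
            p_y_given = py / np.maximum(p_hat.sum(axis=0)[cols], eps)
            q_y = (p_hat / np.maximum(p_hat.sum(axis=1, keepdims=True), eps))[:, cols] * p_y_given
            rows = kl_to_clusters(p / np.maximum(px[:, None], eps), q_y).argmin(axis=1)

            p_hat = np.eye(k)[rows].T @ p @ C     # refresh q after the row step
            # q(x|yhat) = p(x|xhat) p(xhat|yhat): one distribution over X per column cluster
            p_x_given = px / np.maximum(p_hat.sum(axis=1)[rows], eps)
            q_x = (p_hat / np.maximum(p_hat.sum(axis=0, keepdims=True), eps))[rows, :].T * p_x_given
            cols = kl_to_clusters((p / np.maximum(py[None, :], eps)).T, q_x).argmin(axis=1)
        return rows, cols

    # Usage with the toy p from the previous sketch:
    # rows, cols = itcc(p, k=3, l=2)
    # loss = kl_divergence(p, build_q(p, rows, cols, 3, 2))  # compare with slide 14

Each step can only decrease D(p ‖ q), which is why the objective decreases monotonically and the algorithm stops at a local minimum in finitely many steps.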
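
Slide 15's evaluation measures are named but not defined. A common reading, and the one assumed here, maps each cluster to its majority true class and scores the fraction of documents matching that class; when every document is assigned to exactly one cluster, micro-averaged precision computed this way coincides with micro-averaged recall. This helper is illustrative, not the paper's evaluation code.

    import numpy as np

    def micro_averaged_precision(true_labels, cluster_labels):
        """Fraction of documents whose true class matches their cluster's majority class."""
        true_labels = np.asarray(true_labels)
        cluster_labels = np.asarray(cluster_labels)
        correct = 0
        for c in np.unique(cluster_labels):
            _, counts = np.unique(true_labels[cluster_labels == c], return_counts=True)
            correct += counts.max()       # documents agreeing with the majority class
        return correct / len(true_labels)

    # Example: three documents, all placed in one cluster whose majority class is 0.
    print(micro_averaged_precision([0, 0, 1], [0, 0, 0]))  # 2/3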
