Information-Theoretic Co-Clustering

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log
and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses
the co-clustering problem as an optimization problem in information theory — the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm
that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous
word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.


Transcript

  • 1. Information-theoretic co-clustering
    Authors / Inderjit S. Dhillon, Subramanyam Mallela and Dharmendra S. Modha
    Conference / ACM SIGKDD ’03, August 24-27, 2003, Washington, DC
    Presenter / Meng-Lun Wu
  • 2. Outline
    Introduction
    Problem Formulation
    Co-Clustering Algorithm
    Experimental Results
    Conclusions and Future Work
  • 3. Introduction
    Clustering is a fundamental tool in unsupervised learning.
    Most clustering algorithms focus on one-way clustering, i.e., clustering only the rows or only the columns of a data matrix.
  • 4. Introduction (cont.)
    It is often desirable to co-cluster, or simultaneously cluster, both dimensions.
    The normalized non-negative contingency table is viewed as an empirical joint probability distribution between two discrete random variables.
    The optimal co-clustering is one that leads to the largest mutual information between the clustered random variables.
  • 5. Introduction (cont.)
    Equivalently, the optimal co-clustering is one that minimizes the loss in mutual information.
    The mutual information of two random variables is a quantity that measures their mutual dependence.
    Formally, the mutual information is defined as:
    I(X;Y) = Σ_x Σ_y p(x,y) log( p(x,y) / (p(x) p(y)) )
  • 6. Introduction (cont.)
    The Kullback-Leibler (K-L) divergence measures the difference between two probability distributions.
    Given the true distribution p(x,y) and an approximating distribution q(x,y), it is defined as:
    D(p||q) = Σ_x Σ_y p(x,y) log( p(x,y) / q(x,y) )
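
As a concrete illustration of these two quantities (my own sketch, not from the slides; the 2×2 joint distribution below is invented for the example):

```python
import numpy as np

def mutual_information(p):
    """I(X;Y) in bits for a joint distribution p(x,y) given as a 2-D array."""
    px = p.sum(axis=1, keepdims=True)            # marginal p(x), column vector
    py = p.sum(axis=0, keepdims=True)            # marginal p(y), row vector
    mask = p > 0                                 # convention: 0·log(0/·) = 0
    return float((p[mask] * np.log2(p[mask] / (px @ py)[mask])).sum())

def kl_divergence(p, q):
    """D(p||q) in bits between two distributions on the same support."""
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

p = np.array([[0.3, 0.2],
              [0.1, 0.4]])                       # invented joint distribution
q = np.full((2, 2), 0.25)                        # uniform reference distribution
print(mutual_information(p))                     # ≈ 0.125 bits
print(kl_divergence(p, q))                       # ≈ 0.154 bits
```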
  • 7. Problem formulation
    Let X and Y be discrete random variables taking values in {x1,…,xm} and {y1,…,yn}.
    Let p(X, Y) denote their joint probability distribution.
    Let the k clusters of X be: {x̂1, x̂2, . . . , x̂k}
    Let the l clusters of Y be: {ŷ1, ŷ2, . . . , ŷl}
  • 8. Problem formulation (cont.)
    Definition
    An optimal co-clustering minimizes the loss in mutual information
    I(X;Y) − I(X̂;Ŷ)
    subject to constraints on the number of row and column clusters.
    For a fixed co-clustering (C_X, C_Y), this loss can be written in closed form.
  • 9. Problem formulation (cont.)
    Lemma: the loss in mutual information equals a K-L divergence,
    I(X;Y) − I(X̂;Ŷ) = D( p(X,Y) || q(X,Y) ),
    where q(X,Y) is an approximation of p(X,Y) defined on the next slide.
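
Spelled out in LaTeX (my transcription of this standard identity, using the definition of q from the next slide):

```latex
\begin{aligned}
I(X;Y) - I(\hat X;\hat Y)
  &= \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}
   - \sum_{\hat x,\hat y} p(\hat x,\hat y)\log\frac{p(\hat x,\hat y)}{p(\hat x)\,p(\hat y)} \\
  &= \sum_{x,y} p(x,y)\log\frac{p(x,y)}{q(x,y)}
   = D\bigl(p(X,Y)\,\|\,q(X,Y)\bigr), \\
\text{with}\quad q(x,y) &= p(\hat x,\hat y)\,p(x\mid\hat x)\,p(y\mid\hat y),
  \qquad x\in\hat x,\; y\in\hat y.
\end{aligned}
```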
  • 10. Problem formulation (cont.)
    q(X,Y) is a distribution of the form
    q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ), where x ∈ x̂ and y ∈ ŷ.
    (Slide figure: a worked numerical example of a joint distribution p(X,Y) and the induced approximation q(X,Y).)
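
A small NumPy sketch (my own construction, with invented data and cluster labels) of how q is built from p and a fixed co-clustering; `mutual_information` and `kl_divergence` are the helpers sketched earlier, and the final assert checks the lemma from slide 9:

```python
import numpy as np
# mutual_information / kl_divergence: see the earlier sketch

def build_q(p, row_c, col_c, k, l):
    """Return q(x,y) = p(x̂,ŷ)·p(x|x̂)·p(y|ŷ) and the co-cluster joint p(x̂,ŷ)."""
    px, py = p.sum(axis=1), p.sum(axis=0)        # marginals p(x), p(y)
    R = np.eye(k)[row_c]                         # m×k row-cluster indicator matrix
    C = np.eye(l)[col_c]                         # n×l column-cluster indicator matrix
    p_hat = R.T @ p @ C                          # p(x̂,ŷ)
    px_hat, py_hat = R.T @ px, C.T @ py          # p(x̂), p(ŷ)
    q = ((px / px_hat[row_c])[:, None]           # p(x|x̂) for each row
         * p_hat[np.ix_(row_c, col_c)]           # p(x̂,ŷ) for each cell
         * (py / py_hat[col_c])[None, :])        # p(y|ŷ) for each column
    return q, p_hat

p = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.20, 0.05],
              [0.05, 0.05, 0.30]])               # invented 3×3 joint distribution
row_c, col_c = np.array([0, 0, 1]), np.array([0, 0, 1])   # invented co-clustering
q, p_hat = build_q(p, row_c, col_c, k=2, l=2)
# Lemma check: the loss in mutual information equals D(p||q)
assert np.isclose(mutual_information(p) - mutual_information(p_hat),
                  kl_divergence(p, q))
```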
  • 11. Co-Clustering Algorithm
    Input:
    The joint probability distribution p(X,Y), k the desired number of row clusters, and l the desired number of column clusters.
    Output:
    The partition functions C_X† and C_Y†.
  • 12. Co-Clustering Algorithm (cont.)
    (Slide figure: the row step of the example; rows are re-assigned among the row clusters x̂1, x̂2, x̂3.)
  • 13. Co-Clustering Algorithm (cont.)
    (Slide figure: the column step of the example; columns are re-assigned between the column clusters ŷ1 and ŷ2.)
  • 14. Co-Clustering Algorithm (cont.)
    The loss after these updates is D(p||q) = 0.02881.
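
Putting the pieces together, here is a compact sketch of the alternating procedure the slides illustrate (my paraphrase of the paper's algorithm, reusing `build_q` from the earlier snippet; `cocluster` and `kl` are my own names): each row x moves to the row cluster x̂ minimizing D(p(Y|x) || q(Y|x̂)), then columns are updated symmetrically; the paper proves such steps never increase D(p||q).

```python
import numpy as np
# build_q: see the earlier sketch

def kl(p_vec, q_vec):
    """D(p||q) for 1-D distributions, with the convention 0·log(0/·) = 0."""
    m = p_vec > 0
    return float((p_vec[m] * np.log2(p_vec[m] / q_vec[m])).sum())

def cocluster(p, k, l, n_iters=20):
    """Alternating row/column updates (assumes p > 0 and no cluster empties out)."""
    m, n = p.shape
    px, py = p.sum(axis=1), p.sum(axis=0)
    row_c, col_c = np.arange(m) % k, np.arange(n) % l    # simple non-empty init
    for _ in range(n_iters):
        # Row step: move each row x to argmin over x̂ of D(p(Y|x) || q(Y|x̂)).
        _, p_hat = build_q(p, row_c, col_c, k, l)
        px_hat = np.bincount(row_c, weights=px, minlength=k)
        py_hat = np.bincount(col_c, weights=py, minlength=l)
        # q(y|x̂) = q(ŷ|x̂)·p(y|ŷ): one row-cluster "prototype" per row of this k×n matrix
        q_y = (p_hat[:, col_c] / px_hat[:, None]) * (py / py_hat[col_c])[None, :]
        row_c = np.array([min(range(k), key=lambda a: kl(p[i] / px[i], q_y[a]))
                          for i in range(m)])
        # Column step: the symmetric update for the columns.
        _, p_hat = build_q(p, row_c, col_c, k, l)
        px_hat = np.bincount(row_c, weights=px, minlength=k)
        q_x = (p_hat[row_c, :] / py_hat[None, :]) * (px / px_hat[row_c])[:, None]
        col_c = np.array([min(range(l), key=lambda b: kl(p[:, j] / py[j], q_x[:, b]))
                          for j in range(n)])
    return row_c, col_c

row_c, col_c = cocluster(p, k=2, l=2)            # p from the earlier sketch
```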
  • 15. Experimental results
    For the experiments, various subsets of the 20-Newsgroups dataset (NG20) are used.
    We use 1D-clustering to denote document clustering without any word clustering.
    Evaluation measures:
    Micro-averaged precision
    Micro-averaged recall
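
For clarity, here is one common way micro-averaged precision is computed for clustering output (a sketch under the assumption that each cluster is labeled with its dominant true class; the label arrays are placeholders):

```python
import numpy as np

def micro_averaged_precision(true_labels, cluster_ids):
    """Label each cluster by its dominant class, then score all documents at once."""
    correct = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        correct += np.bincount(members).max()    # members matching the dominant class
    return correct / len(true_labels)

true_labels = np.array([0, 0, 1, 1, 1, 2])       # placeholder ground-truth classes
cluster_ids = np.array([0, 0, 1, 1, 2, 2])       # placeholder cluster assignments
print(micro_averaged_precision(true_labels, cluster_ids))   # 5/6 ≈ 0.83
```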
  • 16.–18. Experimental results (cont.)
    (Slide figures: result tables and plots for the NG20 experiments.)
  • 19. Conclusions and Future Work
    The information-theoretic co-clustering algorithm is guaranteed to reach a local minimum of the loss in a finite number of steps.
    Co-clustering applies to the joint distribution of any two discrete random variables.
    In this paper, the numbers of row and column clusters are pre-specified.
    We hope that an information-theoretic regularization procedure may allow the number of clusters to be selected automatically.