Information-Theoretic Co-Clustering
Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.

    Presentation Transcript

    • Information-theoretic co-clustering
      Authors / Inderjit S. Dhillon, Subramanyam Mallela and Dharmendra S. Modha
      Conference / ACM SIGKDD ’03, August 24-27, 2003, Washington, DC
      Presenter / Meng-Lun Wu
    • Outline
      Introduction
      Problem Formulation
      Co-Clustering Algorithm
      Experimental Results
      Conclusions And Future Work
    • Introduction
      Clustering is a fundamental tool in unsupervised learning.
      Most clustering algorithms focus on one-way clustering, i.e., clustering along a single dimension (the rows or the columns alone).
      [Figure: illustration of one-way clustering]
    • Introduction (cont.)
      It is often desirable to co-cluster, i.e., simultaneously cluster, both dimensions.
      The normalized non-negative contingency table can be viewed as an empirical joint probability distribution between two discrete random variables.
      The optimal co-clustering is the one that leads to the largest mutual information between the clustered random variables.
    • Introduction (cont.)
      Equivalently, the optimal co-clustering is the one that minimizes the loss in mutual information.
      The mutual information of two random variables measures their mutual dependence.
      Formally, the mutual information is defined as:
      I(X;Y) = Σx Σy p(x,y) log( p(x,y) / (p(x)p(y)) )
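
As a concrete companion to this definition, here is a minimal NumPy sketch that computes I(X;Y) from a joint distribution given as a 2-D array (the function name and the use of the natural logarithm are choices of this writeup, not from the slides):

    import numpy as np

    def mutual_information(p_xy):
        """I(X;Y) for a joint distribution p_xy (2-D array summing to 1)."""
        p_xy = np.asarray(p_xy, dtype=float)
        p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x), shape (m, 1)
        p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, n)
        mask = p_xy > 0                         # 0 * log 0 is taken as 0
        outer = (p_x @ p_y)[mask]               # p(x) * p(y) on the support
        return float((p_xy[mask] * np.log(p_xy[mask] / outer)).sum())
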
    • Introduction (cont.)
      The Kullback-Leibler (K-L) divergence measures the difference between two probability distributions.
      Given the true distribution p(x,y) and an approximating distribution q(x,y), it is defined as:
      D(p||q) = Σx Σy p(x,y) log( p(x,y) / q(x,y) )
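
A matching sketch for the K-L divergence between two joint distributions over the same support (again with the natural logarithm, a choice of this writeup):

    import numpy as np

    def kl_divergence(p_xy, q_xy):
        """D(p || q); assumes q > 0 wherever p > 0."""
        p = np.asarray(p_xy, dtype=float)
        q = np.asarray(q_xy, dtype=float)
        mask = p > 0                     # terms with p(x,y) = 0 contribute 0
        return float((p[mask] * np.log(p[mask] / q[mask])).sum())
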
    • Problem formulation
      Let X and Y be discrete random variables taking values in {x1,…,xm} and {y1,…,yn}, respectively.
      Let p(X,Y) denote their joint probability distribution.
      Let the k clusters of X be {x̂1, x̂2, …, x̂k}.
      Let the l clusters of Y be {ŷ1, ŷ2, …, ŷl}.
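
One simple way to encode these partitions in code, used by the sketches that follow, is a pair of integer arrays mapping each row and column to its cluster index (the specific values here are hypothetical):

    import numpy as np

    C_X = np.array([0, 0, 1, 1, 2, 2])   # m = 6 rows into k = 3 row clusters
    C_Y = np.array([0, 0, 0, 1, 1, 1])   # n = 6 columns into l = 2 column clusters
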
    • Problem formulation (cont.)
      Definition: an optimal co-clustering minimizes the loss in mutual information
      I(X;Y) - I(X̂;Ŷ)
      subject to constraints on the numbers of row and column clusters, k and l.
      For a fixed co-clustering (CX,CY), the loss in mutual information can be written in closed form (next slide).
    • Problem formulation (cont.)
      Lemma: the loss in mutual information equals the K-L divergence between p and an approximating distribution q determined by the co-clustering:
      I(X;Y) - I(X̂;Ŷ) = D( p(X,Y) || q(X,Y) )
    • Problem formulation (cont.)
      q(X,Y) is a distribution of the form
      q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ), where x ∈ x̂ and y ∈ ŷ.
      [Slide shows a worked numerical example of a joint distribution p(X,Y) and the induced approximation q(X,Y).]
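
Below is a sketch of how q(X,Y) can be assembled from p(X,Y) and the cluster maps, reusing mutual_information and kl_divergence from the earlier sketches; the block-structured example distribution at the end is my own construction, chosen so the lemma I(X;Y) - I(X̂;Ŷ) = D(p||q) can be checked numerically:

    import numpy as np

    def build_q(p, C_X, C_Y, k, l):
        """q(x,y) = p(xhat,yhat) * p(x|xhat) * p(y|yhat) for x in xhat, y in yhat.
        Assumes every cluster has positive probability mass."""
        p = np.asarray(p, dtype=float)
        R = np.eye(k)[C_X]                   # (m, k) row-cluster indicators
        S = np.eye(l)[C_Y]                   # (n, l) column-cluster indicators
        p_hat = R.T @ p @ S                  # p(xhat, yhat), shape (k, l)
        px_given = p.sum(axis=1) / p_hat.sum(axis=1)[C_X]   # p(x | xhat)
        py_given = p.sum(axis=0) / p_hat.sum(axis=0)[C_Y]   # p(y | yhat)
        return p_hat[C_X][:, C_Y] * np.outer(px_given, py_given)

    # Hypothetical example: the co-clustering is perfect, so the loss in
    # mutual information and D(p||q) should both be (numerically) zero.
    p = np.array([[.05, .05, 0, 0],
                  [.05, .05, 0, 0],
                  [0, 0, .2, .2],
                  [0, 0, .2, .2]])
    C_X = np.array([0, 0, 1, 1])
    C_Y = np.array([0, 0, 1, 1])
    q = build_q(p, C_X, C_Y, k=2, l=2)
    p_hat = np.eye(2)[C_X].T @ p @ np.eye(2)[C_Y]
    loss = mutual_information(p) - mutual_information(p_hat)
    assert abs(loss - kl_divergence(p, q)) < 1e-12
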
    • Co-Clustering Algorithm
      Input:
      the joint probability distribution p(X,Y), the desired number of row clusters k, and the desired number of column clusters l.
      Output:
      the partition functions C†X and C†Y.
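
The algorithm itself appears on the slides only as images, so here is an unofficial, unoptimized sketch of the alternating scheme the paper describes: each row x moves to the cluster x̂ minimizing D(p(Y|x) || q(Y|x̂)), then each column y moves to the cluster ŷ minimizing D(p(X|y) || q(X|ŷ)), with q recomputed in between. The random initialization, fixed iteration count, and EPS guards are simplifications of this sketch, not part of the paper's specification:

    import numpy as np

    EPS = 1e-12   # guards divisions involving (possibly empty) clusters

    def kl_costs(cond, protos):
        """Pairwise K-L costs: cond is (a, d) rows of conditionals, protos is
        (b, d) prototype conditionals; returns the (a, b) matrix D(cond_i || protos_j)."""
        ratio = cond[:, None, :] / (protos[None, :, :] + EPS)
        logs = np.where(cond[:, None, :] > 0, np.log(ratio + EPS), 0.0)
        return (cond[:, None, :] * logs).sum(axis=-1)

    def itcc(p, k, l, n_iter=20, seed=0):
        """Alternating row/column re-assignment; returns the cluster maps."""
        p = np.asarray(p, dtype=float)
        rng = np.random.default_rng(seed)
        m, n = p.shape
        C_X = rng.integers(0, k, size=m)
        C_Y = rng.integers(0, l, size=n)
        p_x, p_y = p.sum(axis=1), p.sum(axis=0)
        for _ in range(n_iter):
            # Row step: prototypes are q(y|xhat) = p(yhat|xhat) * p(y|yhat).
            p_hat = np.eye(k)[C_X].T @ p @ np.eye(l)[C_Y]
            protos = (p_hat / (p_hat.sum(1, keepdims=True) + EPS))[:, C_Y] \
                     * (p_y / (p_hat.sum(0)[C_Y] + EPS))
            C_X = kl_costs(p / (p_x[:, None] + EPS), protos).argmin(axis=1)
            # Column step: prototypes are q(x|yhat) = p(xhat|yhat) * p(x|xhat).
            p_hat = np.eye(k)[C_X].T @ p @ np.eye(l)[C_Y]
            protos = (p_hat / (p_hat.sum(0, keepdims=True) + EPS))[C_X].T \
                     * (p_x / (p_hat.sum(1)[C_X] + EPS))
            C_Y = kl_costs((p / (p_y[None, :] + EPS)).T, protos).argmin(axis=1)
        return C_X, C_Y

The paper proves that each such re-assignment step never increases D(p||q), which is what guarantees termination at a local minimum; this sketch simply runs a fixed number of iterations instead of testing for convergence.
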
    • Co-Clustering Algorithm (cont.)
      [Slide steps through the row-update of a worked example, re-assigning rows among the row clusters x̂1, x̂2, x̂3.]
    • Co-Clustering Algorithm (cont.)
      [Slide steps through the corresponding column-update, re-assigning columns between the column clusters ŷ1 and ŷ2.]
    • Co-Clustering Algorithm (cont.)
      [Slide shows the objective value reached on the example: D(p||q) = 0.02881.]
    • Experimental results
      For the experimental results we use various subsets of the 20-Newsgroups dataset (NG20).
      We use 1D-clustering to denote document clustering without any word clustering.
      Evaluation measures:
      Micro-averaged precision
      Micro-averaged recall
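
The slides do not spell out the evaluation formulas; one common concrete reading of micro-averaged precision for clustering against known classes, assuming integer newsgroup labels, is sketched below (the paper's exact definition via the confusion matrix may differ in detail):

    import numpy as np

    def micro_averaged_precision(clusters, labels):
        """Score a document clustering against true class labels by treating
        each cluster's majority label as that cluster's prediction."""
        clusters = np.asarray(clusters)
        labels = np.asarray(labels)
        correct = 0
        for c in np.unique(clusters):
            members = labels[clusters == c]
            correct += np.bincount(members).max()   # majority-label hits
        return correct / len(labels)
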
    • Experimental results (cont.)
      [Slides 16-18 present the result tables and figures comparing co-clustering with 1D-clustering on the NG20 subsets.]
    • Conclusions And Future Work
      The co-clustering algorithm derived from the information-theoretic formulation monotonically decreases the loss in mutual information and is guaranteed to reach a local minimum in a finite number of steps.
      Co-clustering is formulated for the joint distribution of two random variables.
      In this paper, the numbers of row and column clusters are pre-specified.
      We hope that an information-theoretic regularization procedure may allow the numbers of clusters to be selected automatically.