A Generalized Maximum Entropy Approach to Bregman Co-Clustering
This presentation mainly uses the Bregman divergence to define the co-clustering loss function; the best clusters are found by minimizing this loss function.
Slide notes:
  • Each row x is assigned to the row-cluster prototype q_t(Y | x̂) that is closest to p(Y | x) in Kullback-Leibler divergence (see the sketch after these notes).
  • The algorithm keeps iterating Steps 2 through 5 until some desired convergence condition is met.
  • "Distortion" here means loss of fidelity, i.e., the divergence being minimized.
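A minimal sketch of the row-assignment rule described in the first note: each row's conditional distribution p(Y | x) is compared against every row-cluster prototype and the row joins the cluster with the smallest KL divergence. The function name, prototypes, and toy distributions below are illustrative, not taken from the slides.

```python
import numpy as np

def assign_rows(p_y_given_x, prototypes):
    """Assign each row x to the row cluster whose prototype q(Y | x_hat)
    is closest to p(Y | x) in Kullback-Leibler divergence."""
    def kl(p, q):
        m = p > 0
        return float(np.sum(p[m] * np.log(p[m] / q[m])))
    return np.array([int(np.argmin([kl(p_row, proto) for proto in prototypes]))
                     for p_row in p_y_given_x])

# Toy conditional distributions p(Y | x) and two made-up cluster prototypes
p_y_given_x = np.array([[0.7, 0.2, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.1, 0.2, 0.7]])
prototypes = np.array([[0.65, 0.25, 0.10],
                       [0.10, 0.25, 0.65]])
print(assign_rows(p_y_given_x, prototypes))   # -> [0 0 1]
```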
1. Author: Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Srujana Merugu, and Dharmendra S. Modha
   Source: KDD '04, August 22-25, 2004, ACM, pp. 509-514
   Presenter: Allen Wu
2. - Introduction
   - Bregman divergences
   - Bregman co-clustering algorithm
   - Experiments
   - Conclusion
3. - Information-theoretic co-clustering (ITCC) models the co-clustering problem in terms of the joint probability distribution p(X, Y).
   - We seek a co-clustering of both dimensions such that the loss in mutual information is minimized, given a fixed number of row and column clusters.
4. - The loss in mutual information equals I(X; Y) - I(X̂; Ŷ) = D(p(X, Y) || q(X, Y)),
     where q(x, y) = p(x̂, ŷ) p(x | x̂) p(y | ŷ), with x ∈ x̂ and y ∈ ŷ.
   - It can be shown that q(x, y) is a "maximum entropy" approximation to p(x, y).
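As a numerical sanity check of this result, here is a small numpy sketch that builds q from fixed row/column cluster assignments and evaluates D(p || q); the 4x4 distribution, the cluster labels, and all function names are illustrative, not taken from the slides.

```python
import numpy as np

def itcc_approximation(p, row_labels, col_labels):
    """Build q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)
    from a joint distribution p and fixed row/column cluster labels."""
    px = p.sum(axis=1)                       # marginal p(x)
    py = p.sum(axis=0)                       # marginal p(y)
    k, l = row_labels.max() + 1, col_labels.max() + 1
    # p(x_hat, y_hat): total probability mass of each co-cluster block
    p_block = np.zeros((k, l))
    for g in range(k):
        for h in range(l):
            p_block[g, h] = p[np.ix_(row_labels == g, col_labels == h)].sum()
    # p(x | x_hat) and p(y | y_hat)
    px_hat = np.array([px[row_labels == g].sum() for g in range(k)])
    py_hat = np.array([py[col_labels == h].sum() for h in range(l)])
    p_x_given_xhat = px / px_hat[row_labels]
    p_y_given_yhat = py / py_hat[col_labels]
    return (p_block[np.ix_(row_labels, col_labels)]
            * np.outer(p_x_given_xhat, p_y_given_yhat))

def kl_divergence(p, q):
    """D(p || q) summed over all cells (0 log 0 treated as 0)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy 4x4 joint distribution, 2 row clusters and 2 column clusters
p = np.array([[0.05, 0.05, 0.00, 0.00],
              [0.05, 0.05, 0.00, 0.00],
              [0.00, 0.00, 0.15, 0.15],
              [0.10, 0.10, 0.15, 0.15]])
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 0, 1, 1])
q = itcc_approximation(p, rows, cols)
# By the result quoted above, this equals the loss in mutual information.
print("D(p||q) =", kl_divergence(p, q))
```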
5. [Worked example (figure): a numerical matrix; entries shown on the slide include 0.18, 0.14, 0.15, 0.2, 0.5, 0.3, and 0.4.]
6. [Worked example, contd. (figure): KL-divergence values]
   D(p||q): 0.041909, 0.041909, 0.05696, 0.05696, 0.0376, 0.049641
   D(p||q): 0.05696, 0.05696, 0.04191, 0.04191, 0.049641, 0.0376
7. [Worked example, contd. (figure): KL-divergence values]
   D(p||q): 0.02118, 0.02118, 0.02243, 0.040765, 0.04893, 0.04893
   D(p||q): 0.048138, 0.048138, 0.041942, 0.02295, 0.02052, 0.02052
8. [Figure-only slide; no text captured in the transcript.]
9. - However, the data matrix may contain negative entries, or a distortion measure other than KL-divergence may be called for.
   - The squared Euclidean distance might be more appropriate in such cases.
   - This paper addresses the general situation by extending ITCC along three directions:
     - "Nearness" is now measured by any Bregman divergence.
     - A larger class of constraints can be specified.
     - The maximum entropy approach is generalized.
10.-13. [Figure-only slides; no text captured in the transcript.]
14. - The objective function is to minimize the expected Bregman divergence between the original matrix Z and its co-clustering-based approximation Ẑ, i.e., min over (ρ, γ) of E[d_ф(Z, Ẑ)].
15. - Let ф be a real-valued strictly convex function defined on the convex set S = dom(ф) ⊆ R, such that ф is differentiable on int(S), the interior of S.
    - The Bregman divergence d_ф : S × int(S) → [0, ∞) is defined as
      d_ф(z1, z2) = ф(z1) - ф(z2) - (z1 - z2) ф'(z2).
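A small numpy sketch of this scalar definition; the function names and the explicit derivative argument are my own choices, not the paper's notation.

```python
import numpy as np

def bregman_divergence(phi, phi_prime, z1, z2):
    """d_phi(z1, z2) = phi(z1) - phi(z2) - (z1 - z2) * phi'(z2),
    applied elementwise to scalars or numpy arrays."""
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    return phi(z1) - phi(z2) - (z1 - z2) * phi_prime(z2)

# Example with phi(z) = z log z (the I-divergence case on the next slide)
d = bregman_divergence(lambda z: z * np.log(z), lambda z: np.log(z) + 1.0, 0.3, 0.5)
print(d)   # equals 0.3 * log(0.3 / 0.5) - (0.3 - 0.5)
```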
16. [Figure-only slide; no text captured in the transcript.]
17. - I-divergence: given z ∈ R+, let ф(z) = z log(z). For z1, z2 ∈ R+,
      d_ф(z1, z2) = z1 log(z1/z2) - (z1 - z2).
    - Squared Euclidean distance: given z ∈ R, let ф(z) = z². For z1, z2 ∈ R,
      d_ф(z1, z2) = (z1 - z2)².
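The two closed forms written out directly (toy values; they agree with what the generic bregman_divergence sketch above returns for the corresponding ф):

```python
import numpy as np

def i_divergence(z1, z2):
    """d_phi for phi(z) = z log z on R+:  z1 * log(z1/z2) - (z1 - z2)."""
    z1, z2 = np.asarray(z1, float), np.asarray(z2, float)
    return z1 * np.log(z1 / z2) - (z1 - z2)

def squared_euclidean(z1, z2):
    """d_phi for phi(z) = z^2 on R:  (z1 - z2)^2."""
    z1, z2 = np.asarray(z1, float), np.asarray(z2, float)
    return (z1 - z2) ** 2

print(i_divergence(0.3, 0.5))        # 0.3 * log(0.3/0.5) + 0.2
print(squared_euclidean(0.3, 0.5))   # 0.04
```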
18. - Bregman information is defined as the expected Bregman divergence to the expectation: I_ф(Z) = E[d_ф(Z, E[Z])].
    - I-divergence: given a real non-negative random variable Z, the Bregman information is I_ф(Z) = E[Z log(Z / E[Z])].
    - Squared Euclidean distance: given any real random variable Z, the Bregman information is I_ф(Z) = E[(Z - E[Z])²], i.e., the variance of Z.
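A self-contained sketch of Bregman information as the expected divergence to the mean, checked against the two closed forms on the slide; the helper name, weighting convention, and sample values are illustrative.

```python
import numpy as np

def bregman_information(divergence, z, weights=None):
    """I_phi(Z) = E[ d_phi(Z, E[Z]) ] for a finite sample z with optional weights."""
    z = np.asarray(z, dtype=float)
    w = (np.full(z.shape, 1.0 / z.size) if weights is None
         else np.asarray(weights, dtype=float))
    mean = float(np.sum(w * z))
    return float(np.sum(w * divergence(z, mean)))

z = np.array([0.2, 0.4, 0.1, 0.3])
# Squared Euclidean case: the Bregman information is the variance of Z.
print(bregman_information(lambda a, b: (a - b) ** 2, z), np.var(z))
# I-divergence case: the Bregman information is E[Z log(Z / E[Z])].
print(bregman_information(lambda a, b: a * np.log(a / b) - (a - b), z),
      np.mean(z * np.log(z / z.mean())))
```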
19. - Let (X, Y) ~ p(X, Y) be jointly distributed random variables X and Y.
    - p(X, Y) can be written in the form of a matrix Z.
    - The quality of the co-clustering can be defined in terms of the expected Bregman divergence between Z and its approximation Ẑ (equation shown on the slide).
    20. 20. <ul><li>(  ,  ) involves four random variables corresponding to the various partitioning of the matrix Z. </li></ul><ul><li>We can obtain different matrix approximations based on the statistics of Z corresponding to the non-trivial combinations of </li></ul>03/12/10
    21. 21. <ul><li> ( Γ ) denotes the class of matrix approximation schemes based on (  ,  ). </li></ul><ul><li>The set of approximations M A (  ,  ,C) consists of all Z’  S m×n . </li></ul><ul><li>The “best” approximation Z. </li></ul>03/12/10
22. [Figure-only slide; no text captured in the transcript.]
23. - We present brief case studies to demonstrate two salient features:
      - Dimensionality reduction
      - Missing value prediction
24. - Clustering interleaved with implicit dimensionality reduction.
    - Superior performance as compared to one-sided clustering.
25. - Assign zero measure to missing elements, co-cluster, and use the reconstructed matrix for prediction (see the sketch below).
    - Implicit discovery of correlated sub-matrices.
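A toy sketch of that missing-value recipe under the same simplified squared-error, block-mean scheme: missing cells get zero weight when block means are computed, and the reconstructed matrix supplies the predictions. The cluster labels are taken as given here, whereas the actual algorithm would update them iteratively; the data, mask, and function name are illustrative.

```python
import numpy as np

def predict_missing(Z, mask, row_labels, col_labels):
    """Missing cells (mask == False) get zero weight: block means are computed
    from observed cells only, and the reconstruction fills in the gaps."""
    Z = np.asarray(Z, dtype=float)
    Z_hat = np.empty_like(Z)
    for g in range(row_labels.max() + 1):
        for h in range(col_labels.max() + 1):
            block = np.ix_(row_labels == g, col_labels == h)
            observed = Z[block][mask[block]]
            Z_hat[block] = observed.mean() if observed.size else 0.0
    return np.where(mask, Z, Z_hat)   # keep observed values, predict the rest

Z = np.array([[4.0, 5.0, 1.0],
              [5.0, 0.0, 2.0],       # the 0.0 here stands for a missing entry
              [1.0, 2.0, 5.0]])
mask = np.array([[True, True,  True],
                 [True, False, True],
                 [True, True,  True]])
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 1])
print(predict_missing(Z, mask, rows, cols))
```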
26. - The Bregman divergence serves as the co-clustering loss function (e.g., I-divergence and squared Euclidean distance).
    - Approximation models of various complexities are possible, depending on the statistics that are preserved.
    - The minimum Bregman information principle is a generalization of the maximum entropy principle.
