Co-clustering with augmented data

Clustering plays an important role in data mining, as many applications use it as a preprocessing step for data analysis. Traditional clustering groups similar objects, while two-way co-clustering groups dyadic data (objects as well as their attributes) simultaneously. Most co-clustering research focuses on a single correlation matrix, but other descriptions of the dyadic data may be available that could improve co-clustering performance. In this research, we extend ITCC (Information-Theoretic Co-Clustering) to the problem of co-clustering with an augmented matrix and propose CCAM (Co-Clustering with Augmented Data Matrix) to exploit this augmented data for better co-clustering. We apply CCAM to the analysis of online advertising, where both ads and users must be clustered. The key data connecting ads and users is the user-ad link matrix, which identifies the ads each user has clicked; both ads and users also have their own feature data, i.e., the augmented data matrices. To evaluate the proposed method, we use two measures: classification accuracy and K-L divergence. The experiments use advertisement and user data from Morgenstern, a financial social website that acts as an advertising agency. The results show that CCAM outperforms ITCC because it makes use of the augmented data during clustering.


  1. Co-clustering with augmented data matrix
     Authors: Meng-Lun Wu, Chia-Hui Chang, and Rui-Zhe Liu
     Dept. of Computer Science and Information Engineering, National Central University
     DaWaK 2011, Toulouse, France, 2011/8/24
  2. Outline
     • Introduction
     • Related Work
     • Problem Formulation
     • Co-Clustering Algorithm
     • Experiment Results and Evaluation
     • Conclusion
  3. Introduction
     • Over the past decade, co-clustering has arisen to solve the simultaneous clustering of dyadic data.
     • However, most research takes the dyadic data only as the main clustering matrix, without considering additional information.
     • In addition to a user-movie click matrix, we might have user preferences and movie descriptions.
     • Similarly, in addition to a document-word co-occurrence matrix, we might have document genres and word meanings.
  4. Introduction (cont.)
     • To fully utilize the augmented matrices, we propose a new method called Co-Clustering with Augmented Data Matrix (CCAM).
     • Umatch¹ social websites provide the Ad$mart service, which lets users click ads and shares the profit with them.
     • We were able to work with the Umatch website, which asked us to analyze its ad-user information based on the following data: ad-user click data, ad setting data, and user profiles (Lohas questionnaire).
     1. Umatch: http://www.morgenstern.com.tw/users2/index.php/u_match1/
  5. Related work
     Co-clustering research can be divided into three categories: MDCC, MOCC², and ITCC.
     • MDCC: matrix decomposition co-clustering
       - Long et al. (2005), "Co-clustering by Block Value Decomposition"
       - Ding et al. (2005) gave a similar co-clustering approach based on nonnegative matrix factorization.
     • MOCC²: topic-model-based co-clustering
       - Shafiei et al. (2006), "Latent Dirichlet Co-clustering"
       - Hanhuai et al. (2008), "Bayesian Co-clustering"
     2. M. Mahdi Shafiei and Evangelos E. Milios, "Model-based Overlapping Co-Clustering", supported by grants from the Natural Sciences and Engineering Research Council.
  6. Related work (cont.)
     • ITCC: an optimization-based method
       - Dhillon et al. (2003), "Information-Theoretic Co-Clustering"
       - Banerjee et al. (2004), "A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation"
       - Li et al. employ the ITCC framework to propagate class structure and knowledge from in-domain data to out-of-domain data.
     • Inspired by Li and Dhillon, we extend the ITCC framework with augmented matrices to co-cluster ads and users.
  7. Problem formulation
     Let A, U, S, and L be discrete random variables:
     • A denotes ads, ranging over {a1, ..., am}
     • U denotes users, ranging over {u1, ..., un}
     • S denotes ad settings, ranging over {s1, ..., sr}
     • L denotes user Lohas questionnaire items, ranging over {l1, ..., lv}
     Input data: the joint probability distributions
     • p(A, U): ad-user link matrix
     • p(A, S): ad-setting matrix
     • p(U, L): user-Lohas matrix
     Given p(A, U), the mutual information is defined as
     $$I(A;U) = \sum_{a}\sum_{u} p(a,u)\log\frac{p(a,u)}{p(a)\,p(u)}$$
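As a concrete illustration of this definition, here is a minimal NumPy sketch (not from the slides; the matrix `P` is an illustrative stand-in for the normalized ad-user link matrix p(A, U)):

```python
import numpy as np

def mutual_information(P, eps=1e-12):
    """I(A;U) for a joint distribution P, where P[a, u] = p(a, u)."""
    P = P / P.sum()                       # normalize to a proper joint distribution
    pa = P.sum(axis=1, keepdims=True)     # marginal p(a), shape (m, 1)
    pu = P.sum(axis=0, keepdims=True)     # marginal p(u), shape (1, n)
    ratio = P / (pa @ pu + eps)           # p(a,u) / (p(a) p(u))
    mask = P > 0                          # convention: 0 log 0 = 0
    return float(np.sum(P[mask] * np.log(ratio[mask])))
```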
  8. Problem formulation (cont.)
     Goal: to obtain
     • k ad clusters, denoted {â1, ..., âk}
     • l user groups, denoted {û1, ..., ûl}
     such that the loss in mutual information after co-clustering is minimized, i.e., minimize the objective function
     $$f(\hat{A},\hat{U}) = \big[I(A;U)-I(\hat{A};\hat{U})\big] + \lambda\big[I(A;S)-I(\hat{A};S)\big] + \varphi\big[I(U;L)-I(\hat{U};L)\big]$$
     where λ and φ are trade-off parameters that balance the effect of the augmented matrices on the ad clusters and user groups.
  9. Problem formulation (cont.)
     Let q(A, U) denote the approximating distribution for p(A, U).
     Lemma 1. For a fixed co-clustering (Â, Û), the loss in mutual information can be written as
     $$f(\hat{A},\hat{U}) = D\big(p(A,U)\,\|\,q(A,U)\big) + \lambda\, D\big(p(A,S)\,\|\,q(A,S)\big) + \varphi\, D\big(p(U,L)\,\|\,q(U,L)\big)$$
     where q(A, U), q(A, S), and q(U, L) are obtained by
     $$q(a,u) = p(\hat{a},\hat{u})\,p(a|\hat{a})\,p(u|\hat{u}), \quad \text{where } \hat{a}=C_A(a),\ \hat{u}=C_U(u)$$
     $$q(a,s) = p(\hat{a},s)\,p(a|\hat{a}), \quad \text{where } \hat{a}=C_A(a)$$
     $$q(u,l) = p(\hat{u},l)\,p(u|\hat{u}), \quad \text{where } \hat{u}=C_U(u)$$
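To make Lemma 1 concrete, here is a sketch of how the approximation q(A, U) and the K-L terms can be computed from hard cluster assignments. This is not the authors' code; `row_c` and `col_c` are illustrative arrays giving C_A(a) and C_U(u), and the one-sided q(A, S) and q(U, L) follow the same pattern with only the row (or column) side clustered:

```python
def approx_q(P, row_c, col_c, k, l):
    """q(a,u) = p(â,û) p(a|â) p(u|û) for hard assignments row_c, col_c."""
    m, n = P.shape
    R = np.zeros((m, k)); R[np.arange(m), row_c] = 1.0   # ad -> cluster indicators
    C = np.zeros((n, l)); C[np.arange(n), col_c] = 1.0   # user -> group indicators
    P_cc = R.T @ P @ C                        # p(â,û), summed within co-clusters
    pa, pu = P.sum(axis=1), P.sum(axis=0)
    pa_hat, pu_hat = R.T @ pa, C.T @ pu       # cluster marginals p(â), p(û)
    a_given = pa / (pa_hat[row_c] + 1e-12)    # p(a|â) = p(a)/p(â)
    u_given = pu / (pu_hat[col_c] + 1e-12)    # p(u|û) = p(u)/p(û)
    return P_cc[np.ix_(row_c, col_c)] * np.outer(a_given, u_given)

def kl_divergence(P, Q, eps=1e-12):
    """D(P || Q) for joint distributions of the same shape."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))
```

The objective of the previous slide is then `kl_divergence(P_AU, q_AU) + lam * kl_divergence(P_AS, q_AS) + phi * kl_divergence(P_UL, q_UL)`.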
  10. Lemma 1 proof
      Since we consider hard clustering,
      $$p(\hat{a},\hat{u}) = \sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u), \qquad p(\hat{a},s) = \sum_{a\in\hat{a}} p(a,s), \qquad p(\hat{u},l) = \sum_{u\in\hat{u}} p(u,l)$$
      Then
      $$I(A;U)-I(\hat{A};\hat{U}) = \sum_{\hat{a}}\sum_{\hat{u}}\sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)\log\frac{p(a,u)}{p(a)\,p(u)} - \sum_{\hat{a}}\sum_{\hat{u}}\sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)\log\frac{p(\hat{a},\hat{u})}{p(\hat{a})\,p(\hat{u})}$$
      $$= \sum_{\hat{a}}\sum_{\hat{u}}\sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)\log\frac{p(a,u)}{p(\hat{a},\hat{u})\,\frac{p(a)}{p(\hat{a})}\,\frac{p(u)}{p(\hat{u})}} = \sum_{\hat{a}}\sum_{\hat{u}}\sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)\log\frac{p(a,u)}{q(a,u)} = D\big(p(A,U)\,\|\,q(A,U)\big)$$
      where $p(a|\hat{a}) = p(a)/p(\hat{a})$ for $\hat{a}=C_A(a)$, and similarly for $p(u|\hat{u})$.
  11. Lemma 1 proof (cont.)
      $$I(A;S)-I(\hat{A};S) = \sum_{\hat{a}}\sum_{a\in\hat{a}}\sum_{s} p(a,s)\log\frac{p(a,s)}{p(a)\,p(s)} - \sum_{\hat{a}}\sum_{a\in\hat{a}}\sum_{s} p(a,s)\log\frac{p(\hat{a},s)}{p(\hat{a})\,p(s)}$$
      $$= \sum_{\hat{a}}\sum_{a\in\hat{a}}\sum_{s} p(a,s)\log\frac{p(a,s)}{p(\hat{a},s)\,\frac{p(a)}{p(\hat{a})}} = \sum_{\hat{a}}\sum_{a\in\hat{a}}\sum_{s} p(a,s)\log\frac{p(a,s)}{q(a,s)} = D\big(p(A,S)\,\|\,q(A,S)\big)$$
      $$I(U;L)-I(\hat{U};L) = \sum_{\hat{u}}\sum_{u\in\hat{u}}\sum_{l} p(u,l)\log\frac{p(u,l)}{p(u)\,p(l)} - \sum_{\hat{u}}\sum_{u\in\hat{u}}\sum_{l} p(u,l)\log\frac{p(\hat{u},l)}{p(\hat{u})\,p(l)}$$
      $$= \sum_{\hat{u}}\sum_{u\in\hat{u}}\sum_{l} p(u,l)\log\frac{p(u,l)}{q(u,l)} = D\big(p(U,L)\,\|\,q(U,L)\big)$$
  12. Problem formulation (cont.)
      Lemma 2. The K-L divergence can be reduced iteratively through the following alternative decompositions:
      $$D\big(p(A,U)\,\|\,q(A,U)\big) = \sum_{\hat{a}}\sum_{a\in\hat{a}} p(a)\, D\big(p(U|a)\,\|\,q(U|\hat{a})\big) = \sum_{\hat{u}}\sum_{u\in\hat{u}} p(u)\, D\big(p(A|u)\,\|\,q(A|\hat{u})\big)$$
      $$D\big(p(U,L)\,\|\,q(U,L)\big) = \sum_{\hat{u}}\sum_{u\in\hat{u}} p(u)\, D\big(p(L|u)\,\|\,q(L|\hat{u})\big)$$
      $$D\big(p(A,S)\,\|\,q(A,S)\big) = \sum_{\hat{a}}\sum_{a\in\hat{a}} p(a)\, D\big(p(S|a)\,\|\,q(S|\hat{a})\big)$$
      Theorem 1. The CCAM algorithm monotonically decreases the objective function:
      $$f^{(t)}(\hat{A},\hat{U}) \ge f^{(t+1)}(\hat{A},\hat{U})$$
      where t is the iteration number.
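Lemma 2 suggests how the assignments can be updated one side at a time: each ad's contribution to the objective is p(a) times a weighted sum of row-wise K-L divergences, so reassigning the ad to the cluster minimizing that sum cannot increase the objective. Here is a sketch of the ad-side update under this reading (illustrative names, continuing the NumPy sketches above; `qU_given` and `qS_given` hold the cluster-conditional distributions q(U|â) and q(S|â) as rows):

```python
def update_ad_clusters(P_AU, P_AS, qU_given, qS_given, lam, eps=1e-12):
    """Reassign each ad to the cluster minimizing its K-L contribution."""
    def kl_rows(p_rows, q_rows):
        # pairwise D(p_i || q_c): entropy term minus cross term, shape (m, k)
        cross = p_rows @ np.log(q_rows + eps).T
        ent = np.sum(np.where(p_rows > 0, p_rows * np.log(p_rows + eps), 0.0), axis=1)
        return ent[:, None] - cross

    pU = P_AU / (P_AU.sum(axis=1, keepdims=True) + eps)   # p(U|a)
    pS = P_AS / (P_AS.sum(axis=1, keepdims=True) + eps)   # p(S|a)
    # p(a) scales every candidate cluster equally, so it drops out of the argmin
    cost = kl_rows(pU, qU_given) + lam * kl_rows(pS, qS_given)
    return cost.argmin(axis=1)                            # new assignment C_A
```

The user-side update is symmetric, using p(A|u), p(L|u), and the weight φ.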
  13. Co-clustering algorithm
      [Figure: flowchart of the CCAM algorithm; no text survives in the transcript.]
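Since only the flowchart's title survives, here is a compact sketch of the alternating scheme the lemmas describe, built on the helpers above. It is illustrative throughout, and for brevity uses the raw cluster profiles p(U|â), p(S|â) where ITCC-style methods use the smoothed q(·|â) distributions:

```python
def ccam_sketch(P_AU, P_AS, P_UL, k, l, lam, phi, n_iter=50, seed=0):
    """Alternating co-clustering sketch: the ad-side update is shown;
    the symmetric user-side update (using P_UL and phi) is elided."""
    rng = np.random.default_rng(seed)
    m, n = P_AU.shape
    row_c = rng.integers(0, k, size=m)        # random initial ad clusters
    col_c = rng.integers(0, l, size=n)        # random initial user groups
    for _ in range(n_iter):
        # cluster-conditional profiles from the current ad assignment
        qU = np.stack([P_AU[row_c == c].sum(axis=0) for c in range(k)])
        qS = np.stack([P_AS[row_c == c].sum(axis=0) for c in range(k)])
        qU /= qU.sum(axis=1, keepdims=True) + 1e-12
        qS /= qS.sum(axis=1, keepdims=True) + 1e-12
        row_c = update_ad_clusters(P_AU, P_AS, qU, qS, lam)
        # ... symmetric update of col_c here; by Theorem 1 the objective
        # f(Â, Û) cannot increase across these alternating steps
    return row_c, col_c
```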
  14. [Figure: CCAM algorithm, continued; slide contains only a figure.]
  15. [Figure: CCAM algorithm, continued; slide contains only a figure.]
  16. Experiment results and evaluation
      • The difficulty in clustering research is performance evaluation, because there is no standard target.
      • We therefore present two evaluation methods, based on class prediction and group variance:
        - Classification-based evaluation
        - Mutual-information-based evaluation
      • We retrieved data from 2009/09/01 to 2010/03/31, containing 530 ads and 9,865 users.
      • For Lohas, only 2,124 users have values (i.e., filled in the Lohas questionnaire); the others are filled with zeros.
  17. Classification-based evaluation
      Clustering is commonly evaluated via classification; since we have no target labels, we generate them as follows (a sketch of this step appears below).
      Target (initial cluster) generation: the targets are obtained by applying K-means clustering to the following data:
      • Ad matrix (Ad): p(A, S) + p(A, U)
      • User matrix (User): p(U, L) + p(U, A)
      Parameter settings:
      • K-means iterations: 1000
      • Cluster number K: from 2 to 5
      Output: ad clusters C_A^(0) and user groups C_U^(0)
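A sketch of this target-generation step, assuming scikit-learn's KMeans and interpreting the '+' in the slide as horizontal concatenation of the matrices (both are assumptions, not stated on the slide):

```python
from sklearn.cluster import KMeans

def initial_targets(P_AS, P_AU, P_UL, K, seed=0):
    """K-means targets: C_A^(0) from the ad matrix, C_U^(0) from the user matrix."""
    ad_matrix = np.hstack([P_AS, P_AU])       # Ad: p(A,S) + p(A,U), one row per ad
    user_matrix = np.hstack([P_UL, P_AU.T])   # User: p(U,L) + p(U,A), one row per user
    km_ad = KMeans(n_clusters=K, max_iter=1000, n_init=10, random_state=seed)
    km_user = KMeans(n_clusters=K, max_iter=1000, n_init=10, random_state=seed)
    return km_ad.fit_predict(ad_matrix), km_user.fit_predict(user_matrix)
```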
  18. Classification-based evaluation (cont.)
      Co-clustering features (ITCC and CCAM):
      • User-ad-cluster matrix: summation over the ads a_i belonging to ad cluster â_k,
        $$U\hat{A}_{jk} = \ln \sum_{a_i\in\hat{a}_k} UA_{ji}$$
      • Ad-user-group matrix: summation over the users u_j belonging to user group û_l,
        $$A\hat{U}_{il} = \ln \sum_{u_j\in\hat{u}_l} AU_{ij}$$
      After generating the targets and co-clustering features, we apply a decision tree to classify the co-clustering result and use the F-measure as the evaluation metric (see the sketch below).
      Test data with co-clustering features:
      • Ad + AÛ
      • User + UÂ
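A sketch of the feature construction and the decision-tree evaluation. The log-sum aggregation follows the formulas above; the cross-validated macro F-measure is an assumption, since the slide does not specify the classification protocol:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def cocluster_features(M, assignment, n_clusters, eps=1e-12):
    """ln of column sums of M within each cluster, e.g. AÛ from AU and C_U."""
    agg = np.stack([M[:, assignment == c].sum(axis=1)
                    for c in range(n_clusters)], axis=1)
    return np.log(agg + eps)

def f_measure(base_matrix, features, targets):
    """Decision-tree F-measure on base features augmented with cluster features."""
    X = np.hstack([base_matrix, features])    # e.g. Ad + AÛ
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X, targets, scoring='f1_macro', cv=5).mean()
```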
  19. Ad cluster evaluation
      [Figure: F-measure of ad-cluster classification, ITCC vs. CCAM; the legend's surviving labels pair λ values (0.6, 0.2, 0.8, 0.6) with φ=1.0.]
  20. User group evaluation
      [Figure: F-measure of user-group classification, ITCC vs. CCAM; the legend's surviving labels pair λ values (0.6, 0.2, 0.8, 0.6) with φ=1.0.]
  21. Parameter tuning of CCAM
      • We fix φ=1.0, set λ from 0.2 to 1.0, and observe the average F-measure over ads and users.
      • The optimal parameters for the different K are:
        - K=2, 4: φ=1.0, λ=0.6
        - K=3: φ=1.0, λ=0.8
        - K=5: φ=1.0, λ=0.2
      • We also fix λ=1.0 and set φ from 0.2 to 1.0, with K from 3 to 5; the results do not change.
      • We suspect that φ controls p(U, L), but the zero entries dominate the 161 x 7736 p(U, L) matrix.
  22. Parameter tuning (fix φ=1.0)
      [Figure: average F-measure as λ varies from 0.2 to 1.0.]
  23. Parameter tuning (fix λ=1.0)
      [Figure: average F-measure as φ varies from 0.2 to 1.0.]
  24. Mutual-information-based evaluation
      • Mutual information exploits the nature of co-clustering by measuring the association between ad clusters and user groups.
      • The higher the retained mutual information, the better the co-clustering.
      • We measure
        $$I(\hat{A};\hat{U}) = \sum_{\hat{a}}\sum_{\hat{u}} p(\hat{a},\hat{u})\log\frac{p(\hat{a},\hat{u})}{p(\hat{a})\,p(\hat{u})}$$
        where $p(\hat{a},\hat{u}) = \sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)$.
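This quantity follows directly from the earlier `mutual_information` helper applied to the cluster-level joint distribution (same illustrative indicator construction as before):

```python
def clustered_mutual_information(P_AU, row_c, col_c, k, l):
    """I(Â;Û): mutual information of the co-clustered joint p(â,û)."""
    m, n = P_AU.shape
    R = np.zeros((m, k)); R[np.arange(m), row_c] = 1.0
    C = np.zeros((n, l)); C[np.arange(n), col_c] = 1.0
    P_cc = R.T @ P_AU @ C     # p(â,û) = sum over a in â, u in û of p(a,u)
    return mutual_information(P_cc)
```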
  25. Mutual-information-based evaluation (cont.)
      [Figure: mutual-information results; slide contains only a figure.]
  26. Monotonic decrease of the mutual information loss
      [Figure: objective function value per iteration, illustrating the monotonic decrease of Theorem 1.]
  27. Conclusion
      • Co-clustering achieves the dual goals of row clustering and column clustering.
      • However, most co-clustering algorithms focus only on the correlation matrix between rows and columns.
      • Our proposed method, Co-Clustering with Augmented Data Matrix (CCAM), fully utilizes the augmented data to achieve better co-clustering.
      • CCAM achieves better classification performance than ITCC and comparable performance in the mutual-information evaluation.
  28. Thank you for listening.
      Q & A
