Co-clustering with augmented data matrix
Authors: Meng-Lun Wu, Chia-Hui Chang, and Rui-Zhe Liu
Dept. of Computer Science and Information Engineering, National Central University
2011/8/24, DaWaK 2011 in Toulouse, France
Outline
Introduction; Related Work; Problem Formulation; Co-Clustering Algorithm; Experimental Results and Evaluation; Conclusion
Introduction (cont.)
Over the past decade, co-clustering has emerged as a way to cluster both dimensions of dyadic data simultaneously. However, most research treats the dyadic data as the only clustering matrix and does not take additional information into account. For example, in addition to a user-movie click matrix, we might have user preferences and movie descriptions. Similarly, in addition to a document-word co-occurrence matrix, we might have document genres and word meanings.
Introduction (cont.)
To fully utilize augmented matrices, we propose a new method called Co-Clustering with Augmented data Matrix (CCAM).
The Umatch¹ social website provides the Ad$mart service, which lets users click ads and shares the profit with them. We cooperated with Umatch, which asked us to analyze its ad-user information based on the following data: ad-user click data, ad setting data, and user profiles (the Lohas questionnaire).
¹ Umatch: http://www.morgenstern.com.tw/users2/index.php/u_match1/
Related work
Co-clustering research can be divided into three categories: MDCC, MOCC², and ITCC.
MDCC (matrix decomposition co-clustering): Long et al. (2005), "Co-clustering by Block Value Decomposition"; Ding et al. (2005) gave a similar co-clustering approach based on nonnegative matrix factorization.
MOCC² (topic-model-based co-clustering): Shafiei et al. (2006), "Latent Dirichlet Co-clustering"; Shan et al. (2008), "Bayesian Co-clustering".
² M. Mahdi Shafiei and Evangelos E. Milios, "Model-based Overlapping Co-Clustering." Supported by grants from the Natural Sciences and Engineering Research.
Related work (cont.)
ITCC (an optimization method): Dhillon et al. (2003), "Information-Theoretic Co-Clustering"; Banerjee et al. (2004), "A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation."
Li et al. employ the ITCC framework to propagate class structure and knowledge from in-domain data to out-of-domain data.
Inspired by Li et al. and Dhillon et al., we extend the ITCC framework with augmented matrices to co-cluster ads and users.
Problem formulation
Let A, U, S, and L be discrete random variables. A denotes ads, ranging over {a1, …, am}; U denotes users, ranging over {u1, …, un}; S denotes ad settings, ranging over {s1, …, sr}; L denotes users' Lohas questionnaire answers, ranging over {l1, …, lv}.
Input data (joint probability distributions): p(A, U), the ad-user link matrix; p(A, S), the ad-setting matrix; p(U, L), the user-Lohas matrix.
Given p(A, U), the mutual information is defined as
$$I(A;U)=\sum_{a}\sum_{u}p(a,u)\log\frac{p(a,u)}{p(a)\,p(u)}$$
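As a concrete illustration of the definition above, the sketch below computes I(A;U) from a joint matrix stored as a list of rows. It assumes the matrix is already normalised to sum to 1 and uses the natural logarithm (the slides do not state the log base); the function name `mutual_information` is illustrative, not taken from the original work.

```python
import math


def mutual_information(p):
    """I(X;Y) in nats for a joint distribution given as a list of rows.

    p[i][j] = p(x_i, y_j); entries are non-negative and sum to 1.
    Zero entries contribute nothing (the convention 0 log 0 = 0).
    """
    px = [sum(row) for row in p]          # marginal p(x)
    py = [sum(col) for col in zip(*p)]    # marginal p(y)
    mi = 0.0
    for i, row in enumerate(p):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


p_au = [[0.5, 0.0],   # toy ad-user matrix: each ad clicked by exactly one user
        [0.0, 0.5]]
print(mutual_information(p_au))   # ln 2 ≈ 0.693
```

A uniform matrix such as [[0.25, 0.25], [0.25, 0.25]] gives 0, since clicks then carry no information about which user is involved.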
Problem formulation
Goal: obtain k ad clusters, denoted {â1, …, âk}, and l user groups, denoted {û1, …, ûl}, such that the loss in mutual information after co-clustering is minimized. The objective function is
$$f(\hat A,\hat U)=\bigl[I(A;U)-I(\hat A;\hat U)\bigr]+\lambda\,\bigl[I(A;S)-I(\hat A;S)\bigr]+\varphi\,\bigl[I(U;L)-I(\hat U;L)\bigr]$$
where λ and φ are trade-off parameters that balance the influence of the ad-setting matrix on the ad clusters and of the user-Lohas matrix on the user groups.
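A minimal sketch of evaluating this objective for a given hard co-clustering, assuming each input matrix is a normalised joint distribution and cluster assignments are integer label lists. The helper names (`aggregate`, `ccam_objective`) are hypothetical; S and L are not clustered, which the sketch expresses with identity labels.

```python
import math


def mutual_information(p):
    """I(X;Y) in nats for a joint matrix p (0 log 0 treated as 0)."""
    px = [sum(r) for r in p]
    py = [sum(c) for c in zip(*p)]
    return sum(v * math.log(v / (px[i] * py[j]))
               for i, r in enumerate(p) for j, v in enumerate(r) if v > 0)


def aggregate(p, row_labels, col_labels):
    """Sum a joint matrix into (row cluster, column cluster) blocks."""
    out = [[0.0] * (max(col_labels) + 1) for _ in range(max(row_labels) + 1)]
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            out[row_labels[i]][col_labels[j]] += v
    return out


def ccam_objective(p_au, p_as, p_ul, ads, users, lam, phi):
    """f(Â,Û) = [I(A;U)-I(Â;Û)] + λ[I(A;S)-I(Â;S)] + φ[I(U;L)-I(Û;L)]."""
    s_ids = list(range(len(p_as[0])))   # S is not clustered
    l_ids = list(range(len(p_ul[0])))   # L is not clustered
    loss_au = (mutual_information(p_au)
               - mutual_information(aggregate(p_au, ads, users)))
    loss_as = (mutual_information(p_as)
               - mutual_information(aggregate(p_as, ads, s_ids)))
    loss_ul = (mutual_information(p_ul)
               - mutual_information(aggregate(p_ul, users, l_ids)))
    return loss_au + lam * loss_as + phi * loss_ul


half = [[0.5, 0.0], [0.0, 0.5]]
print(ccam_objective(half, half, half, [0, 1], [0, 1], 0.6, 1.0))  # 0.0
print(ccam_objective(half, half, half, [0, 0], [0, 0], 0.6, 1.0))  # 2.6 * ln 2 ≈ 1.802
```

Keeping every ad and user in its own cluster loses nothing, while merging everything into one cluster loses each matrix's full mutual information, weighted by 1, λ, and φ.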
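Problem formulation (cont.)
Lemma 1. For a fixed co-clustering (Â, Û), the loss in mutual information can be written as
$$f(\hat A,\hat U)=\bigl[I(A;U)-I(\hat A;\hat U)\bigr]+\lambda\,\bigl[I(A;S)-I(\hat A;S)\bigr]+\varphi\,\bigl[I(U;L)-I(\hat U;L)\bigr]=D\bigl(p(A,U)\,\|\,q(A,U)\bigr)+\lambda\,D\bigl(p(A,S)\,\|\,q(A,S)\bigr)+\varphi\,D\bigl(p(U,L)\,\|\,q(U,L)\bigr)$$
where q(A, U), q(A, S), and q(U, L) are given by
$$q(a,u)=p(\hat a,\hat u)\,p(a\mid\hat a)\,p(u\mid\hat u),\quad\text{where }\hat a=C_A(a)\text{ and }\hat u=C_U(u)$$
$$q(a,s)=p(\hat a,s)\,p(a\mid\hat a),\quad\text{where }\hat a=C_A(a)$$
$$q(u,l)=p(\hat u,l)\,p(u\mid\hat u),\quad\text{where }\hat u=C_U(u)$$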
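Lemma 1 can be checked numerically. The sketch below builds q(a, u) = p(â, û) p(a|â) p(u|û) for a toy joint matrix and compares D(p‖q) with the loss I(A;U) − I(Â;Û); passing identity labels for the column side gives q(a, s) = p(â, s) p(a|â), the ad-setting case. The matrix values and function names are illustrative assumptions.

```python
import math


def mutual_information(p):
    """I(X;Y) in nats for a joint matrix p (0 log 0 treated as 0)."""
    px = [sum(r) for r in p]
    py = [sum(c) for c in zip(*p)]
    return sum(v * math.log(v / (px[i] * py[j]))
               for i, r in enumerate(p) for j, v in enumerate(r) if v > 0)


def kl(p, q):
    """D(p || q) for two joint matrices of the same shape."""
    return sum(pv * math.log(pv / qv)
               for pr, qr in zip(p, q) for pv, qv in zip(pr, qr) if pv > 0)


def aggregate(p, row_labels, col_labels):
    """p(x̂, ŷ): sum of p(x, y) over each (row cluster, column cluster) block."""
    out = [[0.0] * (max(col_labels) + 1) for _ in range(max(row_labels) + 1)]
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            out[row_labels[i]][col_labels[j]] += v
    return out


def q_matrix(p, row_labels, col_labels):
    """q(x, y) = p(x̂, ŷ) p(x | x̂) p(y | ŷ); identity labels = unclustered side."""
    p_hat = aggregate(p, row_labels, col_labels)
    px, py = [sum(r) for r in p], [sum(c) for c in zip(*p)]
    px_hat, py_hat = [sum(r) for r in p_hat], [sum(c) for c in zip(*p_hat)]
    q = []
    for i, a in enumerate(row_labels):
        q.append([p_hat[a][b] * (px[i] / px_hat[a]) * (py[j] / py_hat[b])
                  if px_hat[a] > 0 and py_hat[b] > 0 else 0.0
                  for j, b in enumerate(col_labels)])
    return q


p = [[0.20, 0.10, 0.00],      # toy p(A, U): 3 ads x 3 users
     [0.10, 0.20, 0.00],
     [0.00, 0.10, 0.30]]
ads, users = [0, 0, 1], [0, 0, 1]
loss = mutual_information(p) - mutual_information(aggregate(p, ads, users))
print(loss, kl(p, q_matrix(p, ads, users)))   # equal up to rounding
```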
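Lemma 1 proof
Since we consider hard clustering,
$$p(\hat a,\hat u)=\sum_{a\in\hat a}\sum_{u\in\hat u}p(a,u),\qquad p(\hat a,s)=\sum_{a\in\hat a}p(a,s),\qquad p(\hat u,l)=\sum_{u\in\hat u}p(u,l)$$
Therefore
$$\begin{aligned}
I(A;U)-I(\hat A;\hat U)&=\sum_{\hat a}\sum_{\hat u}\sum_{a\in\hat a}\sum_{u\in\hat u}p(a,u)\log\frac{p(a,u)}{p(a)\,p(u)}-\sum_{\hat a}\sum_{\hat u}\sum_{a\in\hat a}\sum_{u\in\hat u}p(a,u)\log\frac{p(\hat a,\hat u)}{p(\hat a)\,p(\hat u)}\\
&=\sum_{\hat a}\sum_{\hat u}\sum_{a\in\hat a}\sum_{u\in\hat u}p(a,u)\log\frac{p(a,u)}{p(\hat a,\hat u)\,\frac{p(a)}{p(\hat a)}\,\frac{p(u)}{p(\hat u)}}\\
&=\sum_{\hat a}\sum_{\hat u}\sum_{a\in\hat a}\sum_{u\in\hat u}p(a,u)\log\frac{p(a,u)}{q(a,u)}=D\bigl(p(A,U)\,\|\,q(A,U)\bigr)
\end{aligned}$$
where p(a|â) = p(a)/p(â) for â = C_A(a), and similarly p(u|û) = p(u)/p(û) for û = C_U(u).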
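Lemma 1 proof (cont.)
$$\begin{aligned}
I(A;S)-I(\hat A;S)&=\sum_{\hat a}\sum_{s}\sum_{a\in\hat a}p(a,s)\log\frac{p(a,s)}{p(a)\,p(s)}-\sum_{\hat a}\sum_{s}\sum_{a\in\hat a}p(a,s)\log\frac{p(\hat a,s)}{p(\hat a)\,p(s)}\\
&=\sum_{\hat a}\sum_{s}\sum_{a\in\hat a}p(a,s)\log\frac{p(a,s)}{p(\hat a,s)\,\frac{p(a)}{p(\hat a)}}=\sum_{\hat a}\sum_{s}\sum_{a\in\hat a}p(a,s)\log\frac{p(a,s)}{q(a,s)}=D\bigl(p(A,S)\,\|\,q(A,S)\bigr)
\end{aligned}$$
$$\begin{aligned}
I(U;L)-I(\hat U;L)&=\sum_{\hat u}\sum_{l}\sum_{u\in\hat u}p(u,l)\log\frac{p(u,l)}{p(u)\,p(l)}-\sum_{\hat u}\sum_{l}\sum_{u\in\hat u}p(u,l)\log\frac{p(\hat u,l)}{p(\hat u)\,p(l)}\\
&=\sum_{\hat u}\sum_{l}\sum_{u\in\hat u}p(u,l)\log\frac{p(u,l)}{p(\hat u,l)\,\frac{p(u)}{p(\hat u)}}=\sum_{\hat u}\sum_{l}\sum_{u\in\hat u}p(u,l)\log\frac{p(u,l)}{q(u,l)}=D\bigl(p(U,L)\,\|\,q(U,L)\bigr)
\end{aligned}$$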
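Problem formulation (cont.)
Lemma 2. The K-L divergences can be rewritten in a form that allows them to be reduced iteratively:
$$D\bigl(p(A,U)\,\|\,q(A,U)\bigr)=\sum_{\hat a}\sum_{a\in\hat a}p(a)\,D\bigl(p(U\mid a)\,\|\,q(U\mid\hat a)\bigr)=\sum_{\hat u}\sum_{u\in\hat u}p(u)\,D\bigl(p(A\mid u)\,\|\,q(A\mid\hat u)\bigr)$$
$$D\bigl(p(U,L)\,\|\,q(U,L)\bigr)=\sum_{\hat u}\sum_{u\in\hat u}p(u)\,D\bigl(p(L\mid u)\,\|\,q(L\mid\hat u)\bigr)$$
$$D\bigl(p(A,S)\,\|\,q(A,S)\bigr)=\sum_{\hat a}\sum_{a\in\hat a}p(a)\,D\bigl(p(S\mid a)\,\|\,q(S\mid\hat a)\bigr)$$
Theorem 1. The CCAM algorithm monotonically decreases the objective function:
$$f^{(t)}(\hat A,\hat U)\ge f^{(t+1)}(\hat A,\hat U)$$
where t is the iteration number.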
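Lemma 2 is stated without proof on the slide. For the (A, U) term, a short derivation in the standard ITCC style (assuming every p(a) > 0) runs as follows; the other two identities follow the same pattern with q(s|â) = p(s|â) and q(l|û) = p(l|û), because S and L are not clustered.

$$
\begin{aligned}
q(a,u)&=p(\hat a,\hat u)\,\frac{p(a)}{p(\hat a)}\,p(u\mid\hat u)=p(a)\,p(\hat u\mid\hat a)\,p(u\mid\hat u)=p(a)\,q(u\mid\hat a),\\
D\bigl(p(A,U)\,\|\,q(A,U)\bigr)&=\sum_{\hat a}\sum_{a\in\hat a}\sum_{u}p(a)\,p(u\mid a)\log\frac{p(a)\,p(u\mid a)}{p(a)\,q(u\mid\hat a)}=\sum_{\hat a}\sum_{a\in\hat a}p(a)\,D\bigl(p(U\mid a)\,\|\,q(U\mid\hat a)\bigr).
\end{aligned}
$$

Since each ad contributes a term that depends only on its own cluster â, moving every ad to the cluster with the smallest term (with q(U|â) and q(S|â) held fixed) cannot increase the objective; this is the usual reasoning behind monotone-decrease results such as Theorem 1.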
Co-clustering algorithm [figure]
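The algorithm itself appears only as figures in the deck. The following is a hedged sketch of what an ITCC-style alternating procedure consistent with Lemma 2 and Theorem 1 could look like; it is not the authors' pseudocode. Assumptions: random initial labels unless given, ties resolved in favour of the current cluster, a fixed iteration cap, and each count matrix normalised separately into its own joint distribution. In the ad step, each ad a moves to the cluster â minimising p(a) D(p(U|a)‖q(U|â)) + λ p(a) D(p(S|a)‖q(S|â)), with p(a) taken from the respective matrix; the user step is symmetric with φ and p(U, L). All names are hypothetical.

```python
import math
import random


def normalise(m):
    """Scale a non-negative count matrix so that its entries sum to 1."""
    total = float(sum(map(sum, m)))
    return [[v / total for v in row] for row in m]


def prototypes(p, row_labels, k, col_labels):
    """q(Y | x̂) = p(ŷ | x̂) p(y | ŷ) for each row cluster x̂, ITCC style.

    With identity col_labels (an unclustered side such as S or L) this is
    simply p(y | x̂). Empty row clusters get None.
    """
    p_hat = [[0.0] * (max(col_labels) + 1) for _ in range(k)]
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            p_hat[row_labels[i]][col_labels[j]] += v
    p_col = [sum(c) for c in zip(*p)]              # p(y)
    p_col_hat = [sum(c) for c in zip(*p_hat)]      # p(ŷ)
    out = []
    for c in range(k):
        mass = sum(p_hat[c])                       # p(x̂)
        if mass == 0:
            out.append(None)
            continue
        out.append([p_hat[c][g] / mass * p_col[j] / p_col_hat[g]
                    if p_col_hat[g] > 0 else 0.0
                    for j, g in enumerate(col_labels)])
    return out


def row_cost(p_row, q_row):
    """p(x) D(p(Y|x) || q(Y|x̂)); 0 for an all-zero row, inf if unsupported."""
    weight = sum(p_row)
    if weight == 0:
        return 0.0
    if q_row is None:
        return math.inf
    cost = 0.0
    for v, qv in zip(p_row, q_row):
        if v > 0:
            if qv <= 0:
                return math.inf
            cost += v * math.log(v / (weight * qv))
    return cost


def reassign(main, aux, trade_off, labels, k, other_labels):
    """Move each row to the cluster minimising its share of the objective."""
    q_main = prototypes(main, labels, k, other_labels)
    q_aux = prototypes(aux, labels, k, list(range(len(aux[0]))))
    new_labels = []
    for i, current in enumerate(labels):
        costs = [row_cost(main[i], q_main[c])
                 + (trade_off * row_cost(aux[i], q_aux[c]) if trade_off > 0 else 0.0)
                 for c in range(k)]
        # Prefer the current cluster on ties so the loop can terminate.
        new_labels.append(min(range(k), key=lambda c: (costs[c], c != current)))
    return new_labels


def ccam(p_au, p_as, p_ul, k, l, lam=1.0, phi=1.0, max_iter=20,
         ads=None, users=None, seed=0):
    """Alternate ad and user reassignment until the labels stop changing."""
    rng = random.Random(seed)
    p_au, p_as, p_ul = normalise(p_au), normalise(p_as), normalise(p_ul)
    p_ua = [list(col) for col in zip(*p_au)]       # users x ads
    if ads is None:
        ads = [rng.randrange(k) for _ in p_au]
    if users is None:
        users = [rng.randrange(l) for _ in p_ua]
    for _ in range(max_iter):
        new_ads = reassign(p_au, p_as, lam, ads, k, users)
        new_users = reassign(p_ua, p_ul, phi, users, l, new_ads)
        if new_ads == ads and new_users == users:
            break
        ads, users = new_ads, new_users
    return ads, users


counts_au = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]  # ads x users
counts_as = [[1, 0], [1, 0], [0, 1], [0, 1]]                          # ads x settings
counts_ul = [[1, 0], [1, 0], [0, 1], [0, 1]]                          # users x Lohas
print(ccam(counts_au, counts_as, counts_ul, k=2, l=2, lam=0.6, phi=1.0))
```

Rows with no setting or Lohas entries (such as users who never filled in the questionnaire) contribute nothing to the auxiliary term, so only the click matrix drives their assignment.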
Experimental results and evaluation
A key difficulty in clustering research is performance evaluation, because there is no standard target. We therefore present two evaluation methods, based on class prediction and on group variance:
Classification-based evaluation
Mutual-information-based evaluation
We retrieved data from 2009/09/01 to 2010/03/31 containing 530 ads and 9,865 users. For the Lohas data, only 2,124 users have values (i.e., they filled in the Lohas questionnaire); the entries of all other users are filled with zeros.
Classification-based evaluation
Clustering is commonly evaluated through classification. Since we do not have target labels, we generate them as follows.
Target (initial cluster) generation: the targets are produced by K-means clustering applied to the following data:
Ad matrix (Ad): p(A, S) + p(A, U)
User matrix (User): p(U, L) + p(U, A)
Parameter settings: K-means iterations: 1000; number of clusters K from 2 to 5.
Output: ad clusters $C_A^{(0)}$ and user groups $C_U^{(0)}$
Classification-based evaluation (cont.)
Co-clustering features (ITCC and CCAM):
User-ad-cluster matrix, summing over the ads $a_i$ that belong to ad cluster $\hat a_k$: $U\hat A_{jk}=\ln\sum_{a_i\in\hat a_k}UA_{ji}$
Ad-user-group matrix, summing over the users $u_j$ that belong to user group $\hat u_l$: $A\hat U_{il}=\ln\sum_{u_j\in\hat u_l}AU_{ij}$
After generating the targets and the co-clustering features, we apply a decision tree to classify the co-clustering results and use the F-measure as the evaluation metric.
Testing data with co-clustering features: Ad + AÛ; User + UÂ
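A small sketch of how the cluster-level features could be computed from a count matrix and a hard cluster assignment. The slide applies ln directly to the sum, which is undefined when a row has no entries in a cluster; this sketch uses ln(1 + x) instead, which is an assumption rather than the authors' choice. The function name is hypothetical.

```python
import math


def cluster_log_features(m, col_labels, n_clusters):
    """Per-row features ln(1 + sum of m[r][c] over the columns c in each cluster).

    With m = UA (users x ads) and the ad cluster of each ad this gives UÂ;
    with m = AU and the user group of each user it gives AÛ.
    """
    feats = []
    for row in m:
        sums = [0.0] * n_clusters
        for value, label in zip(row, col_labels):
            sums[label] += value
        feats.append([math.log1p(s) for s in sums])   # ln(1 + x): assumption
    return feats


UA = [[3, 0, 1],   # user 0's clicks on ads 0..2
      [0, 2, 0]]   # user 1's clicks on ads 0..2
ad_clusters = [0, 1, 0]
print(cluster_log_features(UA, ad_clusters, 2))   # [[ln 5, 0.0], [0.0, ln 3]]
```

Appending these columns to each user's (or ad's) original row gives test data of the "User + UÂ" (or "Ad + AÛ") form; the decision tree itself is not shown.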
Ad cluster evaluation [figure; panels labeled λ = 0.6, φ = 1.0; λ = 0.2, φ = 1.0; λ = 0.8, φ = 1.0; λ = 0.6, φ = 1.0]
User group evaluation [figure; panels labeled λ = 0.6, φ = 1.0; λ = 0.2, φ = 1.0; λ = 0.8, φ = 1.0; λ = 0.6, φ = 1.0]
Parameter tuning of CCAM
We fix φ = 1.0, vary λ from 0.2 to 1.0, and observe the average F-measure over ads and users. The optimal parameters for different K are:
K = 2, 4: φ = 1.0, λ = 0.6
K = 3: φ = 1.0, λ = 0.8
K = 5: φ = 1.0, λ = 0.2
However, when we fix λ = 1.0 and vary φ from 0.2 to 1.0 with K from 3 to 5, nothing changes. We suspect this is because φ controls p(U, L), and zero entries dominate the 161 × 7736 matrix p(U, L).
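The slides do not say which F-measure variant was used. The sketch below assumes a macro average over target classes, with F1 = 2TP / (2TP + FP + FN) per class, and averages the ad and user scores as the tuning step describes. Function names are hypothetical.

```python
def f_measure(y_true, y_pred):
    """Macro-averaged F-measure: mean over target classes of 2TP/(2TP+FP+FN)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        scores.append(2 * tp / (2 * tp + fp + fn))   # > 0 denominator: c occurs
    return sum(scores) / len(scores)


def average_f_measure(ad_true, ad_pred, user_true, user_pred):
    """Average of the ad-cluster and user-group F-measures."""
    return (f_measure(ad_true, ad_pred) + f_measure(user_true, user_pred)) / 2


print(f_measure([0, 0, 1, 1], [0, 1, 1, 1]))   # (2/3 + 4/5) / 2 ≈ 0.733
```

For each K, the reported λ would then be the grid value whose average score is highest.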
Parameter tuning (fix φ = 1.0) [figure]
Parameter tuning (fix λ = 1.0) [figure]
Mutual-information-based evaluation
Mutual information captures the nature of co-clustering by measuring how much information the ad clusters and user groups share. The higher the mutual information between ad clusters and user groups, the better the co-clustering. We measure it with
$$I(\hat A;\hat U)=\sum_{\hat a}\sum_{\hat u}p(\hat a,\hat u)\log\frac{p(\hat a,\hat u)}{p(\hat a)\,p(\hat u)},\quad\text{where }p(\hat a,\hat u)=\sum_{a\in\hat a}\sum_{u\in\hat u}p(a,u)$$
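A sketch of this evaluation measure, assuming a normalised ad-user joint matrix and integer cluster labels; the function name and toy matrix are hypothetical.

```python
import math


def clustered_mutual_information(p, ad_labels, user_labels):
    """I(Â;Û), where p(â,û) sums p(a,u) over a in â and u in û."""
    p_hat = [[0.0] * (max(user_labels) + 1) for _ in range(max(ad_labels) + 1)]
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            p_hat[ad_labels[i]][user_labels[j]] += v
    pa = [sum(r) for r in p_hat]             # p(â)
    pu = [sum(c) for c in zip(*p_hat)]       # p(û)
    return sum(v * math.log(v / (pa[x] * pu[y]))
               for x, r in enumerate(p_hat) for y, v in enumerate(r) if v > 0)


e = 1 / 8
p = [[e, e, 0, 0],   # toy p(A, U) with two diagonal blocks
     [e, e, 0, 0],
     [0, 0, e, e],
     [0, 0, e, e]]
print(clustered_mutual_information(p, [0, 0, 1, 1], [0, 0, 1, 1]))  # ln 2 ≈ 0.693
print(clustered_mutual_information(p, [0, 1, 0, 1], [0, 1, 0, 1]))  # 0.0
```

Grouping the two diagonal blocks correctly preserves ln 2 nats, while the interleaved grouping preserves none.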
Mutual-information-based evaluation (cont.) [figure]
Monotonic decrease of the mutual information loss [figure]
Conclusion
Co-clustering aims to achieve the dual goals of row clustering and column clustering. However, most co-clustering algorithms focus only on the correlation matrix between rows and columns. Our proposed method, Co-Clustering with Augmented data Matrix (CCAM), fully utilizes augmented data to achieve better co-clustering. CCAM achieves better classification performance than ITCC and comparable performance in the mutual information evaluation.
Thank you for listening. Q & A

More Related Content

Similar to Co-clustering with augmented data

THIC MedIX Summer 2015 Poster
THIC MedIX Summer 2015 PosterTHIC MedIX Summer 2015 Poster
THIC MedIX Summer 2015 Poster
Diana Zajac
 
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Dawen Liang
 
A_multi-objective_set_covering_problem_A_case_stud.pdf
A_multi-objective_set_covering_problem_A_case_stud.pdfA_multi-objective_set_covering_problem_A_case_stud.pdf
A_multi-objective_set_covering_problem_A_case_stud.pdf
appaji nayak
 
An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz...
An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz...An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz...
An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz...
idescitation
 
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
sherinmm
 
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
sherinmm
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering Networks
Tomaso Aste
 
Why start using uplift models for more efficient marketing campaigns
Why start using uplift models for more efficient marketing campaignsWhy start using uplift models for more efficient marketing campaigns
Why start using uplift models for more efficient marketing campaigns
Data Con LA
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systems
recsysfr
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networks
butest
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Xin-She Yang
 
Maxmizing Profits with the Improvement in Product Composition - ICIEOM - Mer...
Maxmizing Profits with the Improvement in Product Composition  - ICIEOM - Mer...Maxmizing Profits with the Improvement in Product Composition  - ICIEOM - Mer...
Maxmizing Profits with the Improvement in Product Composition - ICIEOM - Mer...
MereoConsulting
 
TAO Fayan_ Introduction to WEKA
TAO Fayan_ Introduction to WEKATAO Fayan_ Introduction to WEKA
TAO Fayan_ Introduction to WEKA
Fayan TAO
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
VenkateswaraBabuRavi
 
Two-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionTwo-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential Evolution
Xin-She Yang
 
DEA
DEADEA
F5233444
F5233444F5233444
F5233444
IOSR-JEN
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
International Journal of Engineering Inventions www.ijeijournal.com
 
Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Rd1 r17a19 datawarehousing and mining_cap617t_cap617Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Ravi Kumar
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Xin-She Yang
 

Similar to Co-clustering with augmented data (20)

THIC MedIX Summer 2015 Poster
THIC MedIX Summer 2015 PosterTHIC MedIX Summer 2015 Poster
THIC MedIX Summer 2015 Poster
 
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
 
A_multi-objective_set_covering_problem_A_case_stud.pdf
A_multi-objective_set_covering_problem_A_case_stud.pdfA_multi-objective_set_covering_problem_A_case_stud.pdf
A_multi-objective_set_covering_problem_A_case_stud.pdf
 
An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz...
An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz...An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz...
An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz...
 
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
 
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering Networks
 
Why start using uplift models for more efficient marketing campaigns
Why start using uplift models for more efficient marketing campaignsWhy start using uplift models for more efficient marketing campaigns
Why start using uplift models for more efficient marketing campaigns
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systems
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networks
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
 
Maxmizing Profits with the Improvement in Product Composition - ICIEOM - Mer...
Maxmizing Profits with the Improvement in Product Composition  - ICIEOM - Mer...Maxmizing Profits with the Improvement in Product Composition  - ICIEOM - Mer...
Maxmizing Profits with the Improvement in Product Composition - ICIEOM - Mer...
 
TAO Fayan_ Introduction to WEKA
TAO Fayan_ Introduction to WEKATAO Fayan_ Introduction to WEKA
TAO Fayan_ Introduction to WEKA
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
 
Two-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionTwo-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential Evolution
 
DEA
DEADEA
DEA
 
F5233444
F5233444F5233444
F5233444
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Rd1 r17a19 datawarehousing and mining_cap617t_cap617Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Rd1 r17a19 datawarehousing and mining_cap617t_cap617
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
 

More from AllenWu

A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
AllenWu
 
Collaborative filtering with CCAM
Collaborative filtering with CCAMCollaborative filtering with CCAM
Collaborative filtering with CCAM
AllenWu
 
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsDSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
AllenWu
 
Ch4.mapreduce algorithm design
Ch4.mapreduce algorithm designCh4.mapreduce algorithm design
Ch4.mapreduce algorithm design
AllenWu
 
地震知識
地震知識地震知識
地震知識
AllenWu
 
Collaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixCollaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrix
AllenWu
 
Co clustering by-block_value_decomposition
Co clustering by-block_value_decompositionCo clustering by-block_value_decomposition
Co clustering by-block_value_decomposition
AllenWu
 
Information Theoretic Co Clustering
Information Theoretic Co ClusteringInformation Theoretic Co Clustering
Information Theoretic Co Clustering
AllenWu
 
Semantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisSemantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual Analysis
AllenWu
 

More from AllenWu (9)

A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Collaborative filtering with CCAM
Collaborative filtering with CCAMCollaborative filtering with CCAM
Collaborative filtering with CCAM
 
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsDSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
 
Ch4.mapreduce algorithm design
Ch4.mapreduce algorithm designCh4.mapreduce algorithm design
Ch4.mapreduce algorithm design
 
地震知識
地震知識地震知識
地震知識
 
Collaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixCollaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrix
 
Co clustering by-block_value_decomposition
Co clustering by-block_value_decompositionCo clustering by-block_value_decomposition
Co clustering by-block_value_decomposition
 
Information Theoretic Co Clustering
Information Theoretic Co ClusteringInformation Theoretic Co Clustering
Information Theoretic Co Clustering
 
Semantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisSemantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual Analysis
 

Recently uploaded

Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
deepaannamalai16
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
haiqairshad
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
deepaannamalai16
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
ssuser13ffe4
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
MJDuyan
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
RAHUL
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
imrankhan141184
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
MysoreMuleSoftMeetup
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
سمير بسيوني
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
Steve Thomason
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
RidwanHassanYusuf
 
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDFLifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Vivekanand Anglo Vedic Academy
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 

Recently uploaded (20)

Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
 
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDFLifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 

Co-clustering with augmented data

  • 1. Co-clustering with augmented data matrix Authors: Meng-Lun Wu, Chia-HuiChang, and Rui-Zhe Liu Dept. of Computer Science Information Engineering National Central University 1 2011/8/24 DaWak 2011 in Toulouse, France
  • 2. Outline Introduction Related Work Problem Formulation Co-Clustering Algorithm Experiments Result and Evaluation Conclusion 2 2011/8/24 DaWak 2011 in Toulouse, France
  • 3. Introduction (cont.) Over the past decade, co-clustering are arisen to solve the simultaneously clustering of dyadic data. However, most research only take account of the dyadic data as the main clustering matrix, which are not considering of addition information. In addition to user-movie click matrix, we might have user preference and movie description. Similarly, in addition to document-word co-occurrence matrix, we might have document genre and word meaning. 3 2011/8/24 DaWak 2011 in Toulouse, France
  • 4. Introduction (cont.) To fully utilize augmented matrix, we proposed a new method called Co-Clustering with Augmented data Matrix (CCAM). Umatch1 social websites provide the Ad$mart service that could let user to click the ads and share the profit with users. Fortunately, we could cope with Umatchwebsites, which hope us to analyze the ad-user information according to the following data. ad-user click data, ad setting data, and user profile (Lohasquestionary). 4 2011/8/24 DaWak 2011 in Toulouse, France 1. Umatch: http://www.morgenstern.com.tw/users2/index.php/u_match1/
  • 5. Related work Co-clustering research could separate three kinds categories, MDCC, MOCC2 andITCC. MDCC: Matrix decomposition co-clustering Long et al. (2005) “Co-clustering by Block Value Decomposition” Ding et al. (2005) gave a similar co-clustering approach based on nonnegative matrix factorization. MOCC2: topic model based co-clustering Shafiei et al. (2006) “Latent Dirichlet Co-clustering“.     Hanhuai et al. (2008) “Bayesian Co-clustering “ 2011/8/24 5 DaWak 2011 in Toulouse, France 2. M. MahdiShafiei and Evangelos E. Milios “Model-based Overlapping Co-Clustering” Supported by grants from the Natural Sciences and Engineering Research.
  • 6. Related work (cont.) ITCC (an optimization method): Dhillon et al. (2003), "Information-Theoretic Co-Clustering"; Banerjee et al. (2004), "A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation". Li et al. employ the ITCC framework to propagate class structure and knowledge from in-domain data to out-of-domain data. Inspired by Li et al. and Dhillon et al., we extend the ITCC framework with augmented matrices to co-cluster ads and users.
  • 7. Problem formulation. Let A, U, S, and L be discrete random variables:
    A denotes ads, taking values in $\{a_1,\dots,a_m\}$; U denotes users, taking values in $\{u_1,\dots,u_n\}$;
    S denotes ad settings, taking values in $\{s_1,\dots,s_r\}$; L denotes users' Lohas questionnaire answers, taking values in $\{l_1,\dots,l_v\}$.
    Input data (joint probability distributions): p(A, U): ad-user link matrix; p(A, S): ad-setting matrix; p(U, L): user-Lohas matrix.
    Given p(A, U), the mutual information is defined as $I(A;U) = \sum_{a}\sum_{u} p(a,u)\log\frac{p(a,u)}{p(a)\,p(u)}$.
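The mutual information definition on this slide can be computed directly from a joint-probability matrix. The following is a minimal NumPy sketch. It uses the natural log (nats), because the slide does not state a log base. The click counts and the helper name `mutual_information` are hypothetical.

```python
import numpy as np

def mutual_information(p_au: np.ndarray) -> float:
    """I(A;U) in nats for a joint distribution p(A, U) stored as an m x n matrix summing to 1."""
    p_a = p_au.sum(axis=1, keepdims=True)  # marginal p(a), shape (m, 1)
    p_u = p_au.sum(axis=0, keepdims=True)  # marginal p(u), shape (1, n)
    nz = p_au > 0                          # convention: 0 * log 0 = 0
    return float(np.sum(p_au[nz] * np.log(p_au[nz] / (p_a * p_u)[nz])))

# Hypothetical 3-ad x 4-user click counts, normalized into p(A, U).
clicks = np.array([[4, 2, 0, 0],
                   [3, 1, 0, 0],
                   [0, 0, 5, 5]], dtype=float)
p_au = clicks / clicks.sum()
print(round(mutual_information(p_au), 4))
```

An independent joint distribution (an outer product of its marginals) gives zero. A perfectly diagonal 2 x 2 distribution gives log 2.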
  • 8. Problem formulation. Goal: obtain k ad clusters, denoted $\{\hat{a}_1,\dots,\hat{a}_k\}$, and l user groups, denoted $\{\hat{u}_1,\dots,\hat{u}_l\}$, such that the mutual information loss after co-clustering is minimized. The objective function is
    $f(\hat{A},\hat{U}) = [I(A;U) - I(\hat{A};\hat{U})] + \lambda\,[I(A;S) - I(\hat{A};S)] + \varphi\,[I(U;L) - I(\hat{U};L)]$,
    where λ and φ are trade-off parameters that balance the effect on ad clusters and user groups, respectively.
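The objective on this slide can be evaluated for any given hard clustering by merging rows and columns of the joint matrices into cluster-level distributions. The sketch below rests on a few assumptions. Clusterings are encoded as integer label arrays. The helper names (`merge_rows`, `ccam_objective`) and the random toy data are hypothetical, and the natural log is used.

```python
import numpy as np

def mutual_information(p):
    """I(X;Y) in nats for a joint distribution matrix p (rows X, columns Y)."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px * py)[nz])))

def merge_rows(p, labels, n_clusters):
    """p(x̂, y): sum of the rows of p(x, y) that share the same hard cluster label."""
    return np.eye(n_clusters)[labels].T @ p

def ccam_objective(p_au, p_as, p_ul, ad_labels, user_labels, k, l, lam, phi):
    """f(Â,Û) = [I(A;U)-I(Â;Û)] + lam [I(A;S)-I(Â;S)] + phi [I(U;L)-I(Û;L)]."""
    p_hat_au = merge_rows(merge_rows(p_au, ad_labels, k).T, user_labels, l).T  # p(â, û)
    p_hat_as = merge_rows(p_as, ad_labels, k)     # p(â, s)
    p_hat_ul = merge_rows(p_ul, user_labels, l)   # p(û, l)
    return ((mutual_information(p_au) - mutual_information(p_hat_au))
            + lam * (mutual_information(p_as) - mutual_information(p_hat_as))
            + phi * (mutual_information(p_ul) - mutual_information(p_hat_ul)))

# Hypothetical toy data: 4 ads, 5 users, 3 ad settings, 2 Lohas answers.
rng = np.random.default_rng(0)
p_au = rng.random((4, 5)); p_au /= p_au.sum()
p_as = rng.random((4, 3)); p_as /= p_as.sum()
p_ul = rng.random((5, 2)); p_ul /= p_ul.sum()
f = ccam_objective(p_au, p_as, p_ul, np.array([0, 0, 1, 1]), np.array([0, 1, 1, 0, 1]),
                   k=2, l=2, lam=0.6, phi=1.0)
print(f)
```

Each bracketed term is nonnegative, because merging can only lose information. The objective is therefore zero when every item is its own cluster.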
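  • 10. Lemma 1 Proof. Since we are considering hard clustering, $p(\hat{a},\hat{u}) = \sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)$, $p(\hat{a},s) = \sum_{a\in\hat{a}} p(a,s)$, and $p(\hat{u},l) = \sum_{u\in\hat{u}} p(u,l)$. Then
    $I(A;U) - I(\hat{A};\hat{U}) = \sum_{\hat{a}}\sum_{\hat{u}}\sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)\log\frac{p(a,u)}{p(a)\,p(u)} - \sum_{\hat{a}}\sum_{\hat{u}}\sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)\log\frac{p(\hat{a},\hat{u})}{p(\hat{a})\,p(\hat{u})}$
    $= \sum_{\hat{a}}\sum_{\hat{u}}\sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)\log\frac{p(a,u)}{p(\hat{a},\hat{u})\,\frac{p(a)}{p(\hat{a})}\,\frac{p(u)}{p(\hat{u})}}$
    $= \sum_{\hat{a}}\sum_{\hat{u}}\sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)\log\frac{p(a,u)}{q(a,u)} = D(p(A,U)\,\|\,q(A,U))$,
    where $q(a,u) = p(\hat{a},\hat{u})\,p(a|\hat{a})\,p(u|\hat{u})$, $p(a|\hat{a}) = \frac{p(a)}{p(\hat{a})}$ for $\hat{a} = C_A(a)$, and similarly for $p(u|\hat{u})$.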
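  • 11. Lemma 1 Proof (cont.)
    $I(A;S) - I(\hat{A};S) = \sum_{\hat{a}}\sum_{a\in\hat{a}}\sum_{s} p(a,s)\log\frac{p(a,s)}{p(a)\,p(s)} - \sum_{\hat{a}}\sum_{a\in\hat{a}}\sum_{s} p(a,s)\log\frac{p(\hat{a},s)}{p(\hat{a})\,p(s)} = \sum_{\hat{a}}\sum_{a\in\hat{a}}\sum_{s} p(a,s)\log\frac{p(a,s)}{p(\hat{a},s)\,\frac{p(a)}{p(\hat{a})}} = \sum_{\hat{a}}\sum_{a\in\hat{a}}\sum_{s} p(a,s)\log\frac{p(a,s)}{q(a,s)} = D(p(A,S)\,\|\,q(A,S))$
    $I(U;L) - I(\hat{U};L) = \sum_{\hat{u}}\sum_{u\in\hat{u}}\sum_{l} p(u,l)\log\frac{p(u,l)}{p(u)\,p(l)} - \sum_{\hat{u}}\sum_{u\in\hat{u}}\sum_{l} p(u,l)\log\frac{p(\hat{u},l)}{p(\hat{u})\,p(l)} = \sum_{\hat{u}}\sum_{u\in\hat{u}}\sum_{l} p(u,l)\log\frac{p(u,l)}{p(\hat{u},l)\,\frac{p(u)}{p(\hat{u})}} = \sum_{\hat{u}}\sum_{u\in\hat{u}}\sum_{l} p(u,l)\log\frac{p(u,l)}{q(u,l)} = D(p(U,L)\,\|\,q(U,L))$,
    where $q(a,s) = p(\hat{a},s)\,p(a|\hat{a})$ and $q(u,l) = p(\hat{u},l)\,p(u|\hat{u})$.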
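Lemma 1 can be sanity-checked numerically. For a random joint matrix and arbitrary hard clusterings, the mutual information loss I(A;U) - I(Â;Û) should equal D(p(A,U) || q(A,U)), where q(a,u) = p(â,û) p(a|â) p(u|û). This is a sketch under a few assumptions: strictly positive toy data, so that q > 0 wherever p > 0; hypothetical helper names; and the natural log. The augmented terms follow the same pattern with a single clustering.

```python
import numpy as np

def mutual_information(p):
    """I(X;Y) in nats for a joint distribution matrix p."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px * py)[nz])))

def kl(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

def lemma1_sides(p_au, ad_labels, user_labels):
    """Return (I(A;U) - I(Â;Û), D(p(A,U) || q(A,U))) for hard clusterings C_A, C_U."""
    R = np.eye(ad_labels.max() + 1)[ad_labels]      # R[a, â] = 1 iff C_A(a) = â
    C = np.eye(user_labels.max() + 1)[user_labels]  # C[u, û] = 1 iff C_U(u) = û
    p_hat = R.T @ p_au @ C                          # p(â, û)
    p_a, p_u = p_au.sum(axis=1), p_au.sum(axis=0)
    p_a_given = p_a / (R.T @ p_a)[ad_labels]        # p(a | â) = p(a) / p(â)
    p_u_given = p_u / (C.T @ p_u)[user_labels]      # p(u | û) = p(u) / p(û)
    q = p_hat[np.ix_(ad_labels, user_labels)] * np.outer(p_a_given, p_u_given)
    return mutual_information(p_au) - mutual_information(p_hat), kl(p_au, q)

rng = np.random.default_rng(1)
p_au = rng.random((6, 5)); p_au /= p_au.sum()
lhs, rhs = lemma1_sides(p_au, np.array([0, 0, 1, 1, 2, 2]), np.array([0, 1, 1, 0, 1]))
print(lhs, rhs)
```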
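  • 12. Problem formulation (cont.) Lemma 2. An alternative way to iteratively reduce the K-L divergence values:
    $D(p(A,U)\,\|\,q(A,U)) = \sum_{\hat{a}\in\hat{A}}\sum_{a\in\hat{a}} p(a)\,D(p(U|a)\,\|\,q(U|\hat{a})) = \sum_{\hat{u}\in\hat{U}}\sum_{u\in\hat{u}} p(u)\,D(p(A|u)\,\|\,q(A|\hat{u}))$
    $D(p(U,L)\,\|\,q(U,L)) = \sum_{\hat{u}\in\hat{U}}\sum_{u\in\hat{u}} p(u)\,D(p(L|u)\,\|\,q(L|\hat{u}))$
    $D(p(A,S)\,\|\,q(A,S)) = \sum_{\hat{a}\in\hat{A}}\sum_{a\in\hat{a}} p(a)\,D(p(S|a)\,\|\,q(S|\hat{a}))$
    Theorem 1. The CCAM algorithm monotonically decreases the objective function: $f^{(t)}(\hat{A},\hat{U}) \ge f^{(t+1)}(\hat{A},\hat{U})$, where t is the iteration number.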
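The first identity of Lemma 2 rewrites the joint divergence as a p(a)-weighted sum of per-ad divergences to a cluster "prototype" q(U|â) = p(û|â) p(u|û). That is what makes a per-ad reassignment step possible. The following numerical check is a minimal sketch. It assumes strictly positive toy data and the natural log, and the helper name `lemma2_sides` is hypothetical.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

def lemma2_sides(p_au, ad_labels, user_labels):
    """Return (D(p(A,U)||q(A,U)), sum_â sum_{a in â} p(a) D(p(U|a)||q(U|â)))."""
    R = np.eye(ad_labels.max() + 1)[ad_labels]
    C = np.eye(user_labels.max() + 1)[user_labels]
    p_hat = R.T @ p_au @ C                          # p(â, û)
    p_a, p_u = p_au.sum(axis=1), p_au.sum(axis=0)
    p_a_given = p_a / (R.T @ p_a)[ad_labels]        # p(a | â)
    p_u_given = p_u / (C.T @ p_u)[user_labels]      # p(u | û)
    q_au = p_hat[np.ix_(ad_labels, user_labels)] * np.outer(p_a_given, p_u_given)
    # q(u | â) = p(û | â) p(u | û): one prototype distribution over users per ad cluster
    q_u_given_ahat = (p_hat / p_hat.sum(axis=1, keepdims=True))[:, user_labels] * p_u_given
    row_form = sum(p_a[i] * kl(p_au[i] / p_a[i], q_u_given_ahat[c])
                   for i, c in enumerate(ad_labels))
    return kl(p_au, q_au), row_form

rng = np.random.default_rng(2)
p_au = rng.random((6, 5)); p_au /= p_au.sum()
joint_form, row_form = lemma2_sides(p_au, np.array([0, 1, 0, 1, 2, 2]), np.array([1, 0, 1, 0, 0]))
print(joint_form, row_form)
```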
  • 13. Co-clustering algorithm
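The algorithm itself appeared on these slides only as images. What follows is therefore a hypothetical ITCC-style sketch, not the authors' CCAM pseudocode. It assumes the alternating scheme suggested by Lemma 2. Each ad moves to the cluster minimizing D(p(U|a) || q(U|â)) + λ D(p(S|a) || p(S|â)). Each user then moves to the group minimizing D(p(A|u) || q(A|û)) + φ D(p(L|u) || p(L|û)). The fixed iteration count, the random initialization, and all function names are assumptions.

```python
import numpy as np

def cond(joint):
    """Row-normalize a joint matrix into p(col | row); all-zero rows stay zero."""
    s = joint.sum(axis=1, keepdims=True)
    return np.divide(joint, s, out=np.zeros_like(joint), where=s > 0)

def kl_rows(P, Q):
    """Matrix of D(P[i] || Q[c]) in nats; +inf where Q[c] misses mass that P[i] has."""
    P3, Q3 = P[:, None, :], Q[None, :, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(P3 > 0, np.log(P3) - np.log(Q3), 0.0)
    return np.sum(P3 * logs, axis=2)

def reassign(p_xy, p_xz, x_labels, y_labels, n_x, n_y, weight):
    """Hypothetical step: move each row item x to the cluster x̂ minimizing
    D(p(Y|x) || q(Y|x̂)) + weight * D(p(Z|x) || p(Z|x̂))."""
    R = np.eye(n_x)[x_labels]
    C = np.eye(n_y)[y_labels]
    p_hat = R.T @ p_xy @ C                            # p(x̂, ŷ)
    p_y = p_xy.sum(axis=0)
    p_yhat = (C.T @ p_y)[y_labels]                    # p(ŷ) for each column item's cluster
    p_y_given_yhat = np.divide(p_y, p_yhat, out=np.zeros_like(p_y), where=p_yhat > 0)
    q_y = cond(p_hat)[:, y_labels] * p_y_given_yhat   # q(y|x̂) = p(ŷ|x̂) p(y|ŷ)
    q_z = cond(R.T @ p_xz)                            # q(z|x̂) = p(z|x̂)
    cost = kl_rows(cond(p_xy), q_y)
    if weight:                                        # avoid 0 * inf = nan when weight is 0
        cost = cost + weight * kl_rows(cond(p_xz), q_z)
    return np.argmin(cost, axis=1)

def ccam_sketch(p_au, p_as, p_ul, k, l, lam=1.0, phi=1.0, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    ad = rng.integers(k, size=p_au.shape[0])          # random initial C_A (assumption)
    user = rng.integers(l, size=p_au.shape[1])        # random initial C_U (assumption)
    for _ in range(iters):
        ad = reassign(p_au, p_as, ad, user, k, l, lam)        # ad step: p(A,U), p(A,S)
        user = reassign(p_au.T, p_ul, user, ad, l, k, phi)    # user step: p(U,A), p(U,L)
    return ad, user

# Hypothetical block-structured toy data: 4 ads, 4 users, 2 settings, 2 Lohas answers.
p_au = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], float) / 8
p_as = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float) / 4
p_ul = p_as.copy()
print(ccam_sketch(p_au, p_as, p_ul, k=2, l=2, lam=0.6, phi=1.0))
```

This sketch has not been checked against the authors' algorithm. In ITCC, alternating reassignments of this kind are what give the monotone decrease stated in Theorem 1.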
  • 16. Experimental results and evaluation. A key difficulty in clustering research is performance evaluation, because there is no standard target. We therefore present two evaluation methods, based on class prediction and on group variance: classification-based evaluation and mutual-information-based evaluation. We retrieved data from 2009/09/01 to 2010/03/31, covering 530 ads and 9,865 users. For Lohas, only 2,124 users have values (i.e., they filled in the Lohas questionnaire); the others are filled with zeros.
  • 17. Classification-based evaluation. Clustering is commonly evaluated through classification. Since we do not have target labels, we generate them as follows. Target (initial cluster) generation: the target is produced by K-means clustering applied to the following data: Ad matrix (Ad): p(A, S) + p(A, U); User matrix (User): p(U, L) + p(U, A). Parameter setting: K-means iterations: 1000; number of clusters K: 2 to 5. Output: ad clusters $C_A^{(0)}$ and user groups $C_U^{(0)}$.
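The target-generation step can be sketched with a plain Lloyd's K-means in NumPy. This stands in for whatever K-means implementation the authors used. The sketch assumes that "+" in "p(A, S) + p(A, U)" means column-wise concatenation, since the two matrices have different widths. The function names are hypothetical. The iteration cap of 1000 follows the slide.

```python
import numpy as np

def kmeans(X, k, max_iter=1000, seed=0):
    """Plain Lloyd's K-means; returns one hard label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                   # assignments stable: converged
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):                 # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def initial_targets(p_au, p_as, p_ul, k, l):
    """C_A^(0), C_U^(0) from K-means on [p(A,S) | p(A,U)] and [p(U,L) | p(U,A)]."""
    ad_features = np.hstack([p_as, p_au])        # "+" read as column concatenation (assumption)
    user_features = np.hstack([p_ul, p_au.T])
    return kmeans(ad_features, k), kmeans(user_features, l)
```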
  • 18. Classification-based evaluation (cont.) Co-clustering features (ITCC and CCAM): User-ad cluster matrix, summing over the ads $a_i$ that belong to ad cluster $\hat{a}_k$: $U\hat{A}_{jk} = \ln \sum_{a_i\in\hat{a}_k} UA_{ji}$. Ad-user group matrix, summing over the users $u_j$ that belong to user group $\hat{u}_l$: $A\hat{U}_{il} = \ln \sum_{u_j\in\hat{u}_l} AU_{ij}$. After generating the targets and co-clustering features, we apply a decision tree to classify the co-clustering results and use the F-measure as the evaluation metric. Testing data with co-clustering features: Ad + AÛ; User + UÂ.
  • 19. Ad cluster evaluation [Figure: F-measure results; panel labels: λ = 0.6, φ = 1.0; λ = 0.2, φ = 1.0; λ = 0.8, φ = 1.0; λ = 0.6, φ = 1.0]
  • 20. User group evaluation [Figure: F-measure results; panel labels: λ = 0.6, φ = 1.0; λ = 0.2, φ = 1.0; λ = 0.8, φ = 1.0; λ = 0.6, φ = 1.0]
  • 21. Parameter tuning of CCAM. We fix φ = 1.0, vary λ from 0.2 to 1.0, and observe the average F-measure over ads and users. The optimal parameters for different K are: K = 2, 4: φ = 1.0, λ = 0.6; K = 3: φ = 1.0, λ = 0.8; K = 5: φ = 1.0, λ = 0.2. However, when we fix λ = 1.0 and vary φ from 0.2 to 1.0, with K from 3 to 5, nothing changes. We suspect this is because φ controls p(U, L), and zero entries dominate p(U, L) (161 × 7736).
  • 22. Parameter tuning (fix φ = 1.0)
  • 23. Parameter tuning (fix λ = 1.0)
  • 24. Mutual-information-based evaluation. Mutual information captures the nature of co-clustering by measuring how much information the ad clusters and user groups share; the higher the mutual information, the better the clustering. We measure it with $I(\hat{A};\hat{U}) = \sum_{\hat{a}}\sum_{\hat{u}} p(\hat{a},\hat{u})\log\frac{p(\hat{a},\hat{u})}{p(\hat{a})\,p(\hat{u})}$, where $p(\hat{a},\hat{u}) = \sum_{a\in\hat{a}}\sum_{u\in\hat{u}} p(a,u)$.
  • 25. Mutual information based evaluation (cont.)
  • 26. Monotonic decrease of the mutual information loss
  • 27. Conclusion. Co-clustering aims to achieve the dual goals of row clustering and column clustering. However, most co-clustering algorithms focus only on the correlation matrix between rows and columns. Our proposed method, Co-Clustering with Augmented data Matrix (CCAM), fully utilizes the augmented data to achieve better co-clustering. CCAM achieves better classification performance than ITCC and comparable performance in the mutual information evaluation.
  • 28. Thank you for listening. Q & A