Tutorial SIAM Data Mining 2012

These are the slides for our tutorial on multi-task learning, given at SIAM Data Mining 2012.

  1. Multi-Task Learning: Theory, Algorithms, and Applications
     Jiayu Zhou (1,2), Jianhui Chen (3), Jieping Ye (1,2)
     (1) Computer Science and Engineering, Arizona State University, AZ
     (2) The Biodesign Institute, Arizona State University, AZ
     (3) GE Global Research, NY
     Center for Evolutionary Medicine and Informatics
     The Twelfth SIAM Information Conference on Data Mining, April 27, 2012
  2. Tutorial Goals
     • Understand the basic concepts in multi-task learning
     • Understand different approaches to model task relatedness
     • Get familiar with different types of multi-task learning techniques
     • Introduce multi-task learning applications
     • Introduce the multi-task learning package: MALSAR
  3. Tutorial Road Map
     • Part I: Multi-task Learning (MTL) background and motivations
     • Part II: MTL formulations
     • Part III: Case study of real-world applications
       – Incomplete Multi-Source Fusion
       – Drosophila Gene Expression Image Analysis
     • Part IV: An MTL Package (MALSAR)
     • Current and future directions
  5. Multiple Tasks
     • Examination scores prediction (Argyriou et al. '08), using data from the Inner London Education Authority (ILEA)
     • Student id, birth year, and previous score are student-dependent; school ranking is school-dependent; the exam score is unknown (?)

     School                                      | Student id | Birth year | Previous score | … | School ranking | … | Exam score
     School 1 - Alverno High School              | 72981      | 1985       | 95             | … | 83%            | … | ?
     School 138 - Jefferson Intermediate School  | 31256      | 1986       | 87             | … | 72%            | … | ?
     School 139 - Rosemead High School           | 12381      | 1986       | 83             | … | 77%            | … | ?
  6. Learning Multiple Tasks
     • Learning each task independently
     [Figure: the three school tables from the previous slide (School 1, School 138, School 139), each treated as a separate task (1st, 138th, 139th) with its own model]
  7. Learning Multiple Tasks
     • Learning multiple tasks simultaneously
       – Learn the tasks simultaneously
       – Model the relationships among the tasks
     [Figure: the three school tables (tasks 1, 138, and 139) feeding one joint learning step]
  8. Performance of MTL
     • Evaluation on the School data:
       – Predict exam scores for 15362 students from 139 schools
       – Describe each student by 27 attributes
       – Compare single-task learning approaches (Ridge Regression, Lasso) and one multi-task learning approach (trace-norm regularized learning)
     • Performance measure: $\text{N-MSE} = \dfrac{\text{mean squared error}}{\text{variance}(\text{target})}$
     [Figure: N-MSE (y-axis, 0.7 to 1.05) versus index of training ratio (x-axis, 1 to 8) for Ridge Regression, Lasso, and Trace Norm]
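
The N-MSE measure above can be sketched in a few lines. This is a minimal illustration, assuming the variance is the population variance of the test targets (the slide does not say which estimator is used); the function name `nmse` and the sample scores are hypothetical.

```python
from statistics import fmean, pvariance


def nmse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of the targets."""
    mse = fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Assumption: population variance; the slide only says "variance (target)".
    return mse / pvariance(y_true)


# Hypothetical exam scores and predictions, for illustration only.
scores = [20.0, 30.0, 40.0, 50.0]
predictions = [25.0, 30.0, 35.0, 50.0]
print(nmse(scores, predictions))

# Always predicting the mean score gives an N-MSE of 1,
# so values below 1 indicate a model that beats this trivial baseline.
print(nmse(scores, [35.0] * 4))
```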
  9. Multi-Task Learning
     • Multi-task learning is different from single-task learning in the training (induction) process.
     • Inductions of multiple tasks are performed simultaneously to capture intrinsic relatedness.
     [Figure: in single-task learning, each of tasks 1, 2, ..., t has its own training data, training step, trained model, and generalization; in multi-task learning, one joint training step over the training data of all tasks produces the trained models]
  10. Learning Methods
  11. Web Pages Categorization (Chen et al. 2009 ICML)
      • Classify documents into categories
      • The classification of each category is a task
      • The tasks of predicting different categories may be latently related
      [Figure: category classifiers for Health, Travel, ..., World, Politics, US; the classifiers' parameters (the models of different categories) are latently related]
  12. MTL for HIV Therapy Screening (Bickel et al. ICML 08)
      • Hundreds of drug combinations are possible, some of which use similar biochemical mechanisms
      • The samples available for each combination are limited
      • For a patient, predicting the outcome of using one combination is a task
      • Exploit the similarity information through multi-task learning
      [Figure: a patient treatment record listing drug combinations built from ETV, ABC, AZT, FTC, D4T, and DDI, with the outcome of one combination unknown (?)]
  13. Other Applications
      • Portfolio selection [Ghosn and Bengio, NIPS'97]
      • Collaborative ordinal regression [Yu et al. NIPS'06]
      • Web image and video search [Wang et al. CVPR'09]
      • Disease progression modeling [Zhou et al. KDD'11]
      • Disease prediction [Zhang et al. NeuroImage 12]
  15. How Tasks Are Related
      • Assumption: all tasks are related
      • Methods:
        – Mean-regularized MTL
        – Joint feature learning
        – Trace-norm regularized MTL
        – Alternating structure optimization (ASO)
        – Shared parameter Gaussian process
  16. How Tasks Are Related
      • Assuming that all tasks are related may be too strong for practical applications.
      • There may be some irrelevant (outlier) tasks.
      • Assumption: there are outlier tasks
  17. How Tasks Are Related
      • Methods and their assumptions:
        – Clustered MTL: tasks have group structures
        – Tree MTL: tasks have tree structures
        – Network MTL: tasks have graph/network structures
  18. Multi-Task Learning Methods
      • Regularization-based MTL
        – All tasks are related: mean-regularized MTL, joint feature learning, low-rank MTL, ASO
        – Learning with outlier tasks: robust MTL
        – Tasks form groups/graphs/trees: clustered MTL, network MTL, tree MTL
      • Other methods
        – Shared hidden nodes in neural networks
        – Shared parameter Gaussian process
  19. Regularization-based Multi-Task Learning
      • All tasks are related
        – Mean-Regularized MTL
        – MTL in high dimensional feature space
          · Embedded Feature Selection
          · Low-Rank Subspace Learning
      • Clustered MTL
      • MTL with Tree/Graph structure
  20. Notation
      • Task $i$ has a feature matrix $X_i$ ($n_i$ samples by $d$ dimensions) and a target vector $Y_i$ ($n_i$ samples); learning produces the model matrix $W$ ($d$ dimensions by $m$ tasks).
      • We focus on linear models: $Y_i = X_i W_i$, with $X_i \in \mathbb{R}^{n_i \times d}$, $Y_i \in \mathbb{R}^{n_i \times 1}$, $W = [W_1, W_2, \ldots, W_m]$
  21. Mean-Regularized Multi-Task Learning (Evgeniou & Pontil, 2004 KDD)
      • Assumption: the parameter vectors of all tasks are close to each other.
        – Advantage: simple, intuitive, easy to implement
        – Disadvantage: may not hold in real applications
      • The regularization penalizes the deviation of each task from the mean:
        $$\min_W \frac{1}{2}\|XW - Y\|_F^2 + \lambda \sum_{i=1}^{m} \Big\| W_i - \frac{1}{m}\sum_{s=1}^{m} W_s \Big\|_2^2$$
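
The mean-regularization term can be evaluated directly. The sketch below is only an illustration of the penalty (without the factor $\lambda$ and without the loss term); the function name and the toy weight vectors are hypothetical.

```python
def mean_regularizer(task_weights):
    """Sum over tasks of the squared L2 distance to the mean weight vector.

    task_weights is a list of m weight vectors W_1..W_m, each of length d.
    """
    m = len(task_weights)
    d = len(task_weights[0])
    mean = [sum(w[k] for w in task_weights) / m for k in range(d)]
    return sum((w[k] - mean[k]) ** 2 for w in task_weights for k in range(d))


# Hypothetical task models: identical models incur no penalty,
# models that spread away from their mean are penalized.
identical = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
spread = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
print(mean_regularizer(identical))  # 0.0
print(mean_regularizer(spread))     # 8.0
```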
  23. Multi-Task Learning with High Dimensional Data
      • In practical applications, we may deal with high dimensional data.
        – Gene expression data, biomedical image data
      • Curse of dimensionality
      • Dealing with high dimensional data in multi-task learning
        – Embedded feature selection: L1/Lq (group Lasso)
        – Low-rank subspace learning (low-rank assumption): ASO, trace-norm regularization
  25. Multi-Task Learning with Joint Feature Learning
      • One way to capture the task relatedness from multiple related tasks is to constrain all models to share a common set of features.
      • For example, in the school data, the scores from different schools may be determined by a similar set of features.
      [Figure: a model matrix with rows Feature 1 to Feature 10 and columns Task 1 to Task 5, in which the same rows are selected for every task]
  26. Multi-Task Learning with Joint Feature Learning (Obozinski et al. 2009 Stat Comput; Liu et al. 2010 Technical Report)
      • Using group sparsity: $\ell_1/\ell_q$-norm regularization, $\|W\|_{1,q} = \sum_{i=1}^{d} \|\mathbf{w}^i\|_q$, where $\mathbf{w}^i$ is the $i$-th row of $W$
      • When $q > 1$ we have group sparsity.
      • Model: $Y \approx X \times W$, with output $Y$ ($n \times m$), input $X$ ($n \times d$), and model $W$ ($d \times m$)
        $$\min_W \frac{1}{2}\|XW - Y\|_F^2 + \lambda \|W\|_{1,q}$$
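
To make the group-sparsity effect concrete, here is a minimal sketch for the case $q = 2$. It computes $\|W\|_{1,2}$ and applies row-wise soft-thresholding, which is the standard proximal operator of this norm used in proximal-gradient solvers. The slides do not say which solver is used, so the operator is shown as an assumption, and the function names are hypothetical.

```python
import math


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (rows = features, columns = tasks)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)


def prox_l21(W, threshold):
    """Row-wise soft-thresholding: shrink each row norm by `threshold`, zeroing small rows."""
    result = []
    for row in W:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        result.append([scale * x for x in row])
    return result


# Hypothetical 3-feature, 2-task model matrix.
W = [[3.0, 4.0], [0.0, 0.5], [0.0, 0.0]]
print(l21_norm(W))  # 5.5
shrunk = prox_l21(W, 1.0)
print(shrunk)  # the weak second feature is removed for all tasks at once
```

Because a whole row is either kept or zeroed, a feature is selected or discarded jointly for every task, which is the behavior the slide describes.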
  27. Writer-Specific Character Recognition (Obozinski, Taskar, and Jordan, 2006)
      • Each task is a classification between two letters for one writer.
  28. Dirty Model for Multi-Task Learning (Jalali et al. 2010 NIPS)
      • In practical applications, it is too restrictive to constrain all tasks to share a single structure.
      • Model $W$ = group sparse component $P$ + sparse component $Q$
        $$\min_{P,Q} \|Y - X(P + Q)\|_F^2 + \lambda_1 \|P\|_{1,q} + \lambda_2 \|Q\|_1$$
  29. Robust Multi-Task Learning
      • Most existing MTL approaches assume that all tasks are related (all tasks are relevant).
      • Robust MTL approaches assume that there are outlier tasks (relevant tasks plus irrelevant tasks).
  30. Robust Multi-Task Feature Learning (Gong et al. 2012, submitted)
      • Simultaneously captures a common set of features among relevant tasks and identifies outlier tasks.
      • Model $W$ = group sparse component $P$ (jointly selected features) + group sparse component $Q$ (outlier tasks)
        $$\min_{P,Q} \|Y - X(P + Q)\|_F^2 + \lambda_1 \|P\|_{1,q} + \lambda_2 \|Q^T\|_{1,q}$$
  32. Trace-Norm Regularized MTL
      • Capture task relatedness via a shared low-rank structure
      [Figure: the training data of tasks 1, 2, ..., n go through one joint training step that yields a trained model per task; the models lie in a shared low-rank subspace]
  33. Low-Rank Structure for MTL
      • Assume we have a rank 2 model matrix. For each task, training data × weight vector ≈ target, and each weight vector is a combination of the same two basis vectors:
        – Task 1: weight vector = $\alpha_1$ × (basis vector 1) + $\alpha_2$ × (basis vector 2)
        – Task 2: weight vector = $\beta_1$ × (basis vector 1) + $\beta_2$ × (basis vector 2)
        – Task 3: weight vector = $\gamma_1$ × (basis vector 1) + $\gamma_2$ × (basis vector 2)
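  34. Low-Rank Structure for MTL
      • For the rank 2 example, the model matrix is the product of the basis vectors and the coefficients:
        $$W = [\text{basis vectors}] \times \begin{bmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \\ \gamma_1 & \gamma_2 \end{bmatrix}^T$$
      • In general, a low-rank structure for $m$ tasks with $p$ basis vectors, $m > p$:
        $$W = [\text{basis vectors}] \times \begin{bmatrix} \tau_{11} & \cdots & \tau_{1m} \\ \vdots & \ddots & \vdots \\ \tau_{p1} & \cdots & \tau_{pm} \end{bmatrix}$$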
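
The factorization can be checked on a toy example. The sketch below builds a 3 x 3 model matrix for three tasks from two basis vectors and verifies that it is rank deficient (zero determinant). The basis vectors, coefficients, and function names are hypothetical values chosen for illustration.

```python
def combine(basis, coefficients):
    """Build the d x m model matrix whose column i is sum_k coefficients[i][k] * basis[k]."""
    d = len(basis[0])
    columns = [
        [sum(c * b[r] for c, b in zip(coef, basis)) for r in range(d)]
        for coef in coefficients
    ]
    return [[col[r] for col in columns] for r in range(d)]  # transpose to d x m


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (
        M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
        - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
        + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0])
    )


# Two hypothetical basis vectors (p = 2) in d = 3 dimensions.
basis = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
# One coefficient pair per task (m = 3), playing the role of (alpha, beta, gamma).
coefficients = [(1.0, 2.0), (3.0, 1.0), (0.0, 2.0)]

W = combine(basis, coefficients)
print(W)        # each column is one task's weight vector
print(det3(W))  # 0.0: three columns spanned by two basis vectors cannot have rank 3
```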
  35. Low-Rank Structure for MTL (Ji et al. 2009 ICML)
      • Rank minimization formulation
        – $\min_W \text{Loss}(W) + \lambda \times \text{Rank}(W)$
        – Rank minimization is NP-hard for general loss functions
      • Convex relaxation: trace norm minimization
        – $\min_W \text{Loss}(W) + \lambda \times \|W\|_*$, where $\|W\|_*$ is the sum of the singular values of $W$
        – The trace norm is theoretically shown to be a good approximation for the rank function (Fazel et al., 2001).
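
A small worked example may help show why the trace norm favors low rank. The two matrices below are hypothetical and chosen to have the same Frobenius norm, so only the distribution of singular values differs.

```latex
% Rank-1 matrix: singular values (2, 0)
W_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},
\qquad \|W_1\|_F = 2, \qquad \|W_1\|_* = 2 + 0 = 2

% Rank-2 matrix: singular values (\sqrt{2}, \sqrt{2})
W_2 = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix},
\qquad \|W_2\|_F = 2, \qquad \|W_2\|_* = \sqrt{2} + \sqrt{2} = 2\sqrt{2} \approx 2.83
```

With the same overall magnitude, the rank-1 matrix has the smaller trace norm, so a trace-norm penalty pushes the solution toward models that concentrate their energy in few singular directions.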
  36. Low-Rank Structure for MTL
      • Evaluation on the School data:
        – Predict exam scores for 15362 students from 139 schools
        – Describe each student by 27 attributes
        – Compare Ridge Regression, Lasso, and Trace Norm (for inducing a low-rank structure)
      • Performance measure: $\text{N-MSE} = \dfrac{\text{mean squared error}}{\text{variance}(\text{target})}$
      • The low-rank structure (induced via the trace norm) leads to the smallest N-MSE.
      [Figure: N-MSE (y-axis, 0.7 to 1.05) versus index of training ratio (x-axis, 1 to 8) for Ridge Regression, Lasso, and Trace Norm]
  37. Alternating Structure Optimization (ASO) (Ando and Zhang, 2005 JMLR)
      • ASO assumes that the model is the sum of two components: a task-specific one and one that lies in a shared low-dimensional subspace.
      [Figure: for each task $i = 1, \ldots, m$, the prediction is input $X$ × $w_i$ + $\Theta$ $X$ × $v_i$, where $\Theta$ is a low-dimensional feature map shared by all tasks]
  38. Alternating Structure Optimization (ASO) (Ando and Zhang, 2005 JMLR)
      • Learning from the $i$-th task: $u_i = \Theta v_i + w_i$, where $\Theta$ is the shared low-dimensional feature map
      • Empirical loss function for the $i$-th task:
        $$\mathcal{L}_i\big(X_i(\Theta v_i + w_i), y_i\big) = \|X_i(\Theta v_i + w_i) - y_i\|^2$$
      • ASO simultaneously learns the models and the shared structure:
        $$\min_{\Theta, \{v_i, w_i\}} \sum_{i=1}^{m} \Big( \mathcal{L}_i\big(X_i(\Theta v_i + w_i), y_i\big) + \alpha \|w_i\|^2 \Big) \quad \text{subject to } \Theta^T \Theta = I$$
  39. iASO Formulation (Chen et al., 2009 ICML)
      • iASO formulation:
        $$\min_{\Theta, \{v_i, w_i\}} \sum_{i=1}^{m} \Big( \mathcal{L}_i\big(X_i(\Theta v_i + w_i), y_i\big) + \alpha \|\Theta v_i + w_i\|^2 + \beta \|w_i\|^2 \Big) \quad \text{subject to } \Theta^T \Theta = I$$
        – Controls both model complexity and task relatedness
        – Subsumes ASO (Ando et al. '05) as a special case
        – Naturally leads to a convex relaxation (Chen et al., 09, ICML)
        – The convex relaxed ASO is equivalent to iASO under certain mild conditions
  40. Incoherent Low-Rank and Sparse Structures (Chen et al. 2010 KDD)
      • ASO uses the L2-norm on the task-specific component; we can also use the L1-norm to learn task-specific features.
      • Model $W$ = element-wise sparse component $Q$ (task-specific features) + low-rank component $P$ (captures task relatedness; $P$ = basis × coefficient)
      • Convex formulation, with $\gamma$ a regularization parameter:
        $$\min_{P,Q} \sum_{i=1}^{m} \mathcal{L}_i\big(X_i(P_i + Q_i), y_i\big) + \gamma \|Q\|_1 \quad \text{subject to } \|P\|_* \leq \eta$$
  41. Robust Low-Rank in MTL (Chen et al. 2011 KDD)
      • Simultaneously perform low-rank MTL and identify outlier tasks.
      • Model $W$ = group sparse component $Q$ (identifies irrelevant, outlier tasks) + low-rank component $P$ (captures task relatedness; $P$ = basis × coefficient)
        $$\min_{P,Q} \sum_{i=1}^{m} \mathcal{L}_i\big(X_i(P_i + Q_i), y_i\big) + \alpha \|P\|_* + \beta \|Q^T\|_{1,q}$$
  42. Summary
      • All multi-task learning formulations discussed above fit into the $W = P + Q$ schema.
        – Component $P$: shared structure
        – Component $Q$: information not captured by the shared structure

      Method                       | Shared structure P                | Component Q
      Embedded feature selection   |                                   |
        L1/Lq                      | Feature selection (L1/Lq norm)    | 0
        Dirty                      | Feature selection (L1/Lq norm)    | L1-norm
        rMTFL                      | Feature selection (L1/Lq norm)    | Outlier (column-wise L1/Lq norm)
      Low-rank subspace learning   |                                   |
        Trace Norm                 | Low-rank (trace norm)             | 0
        ISLR                       | Low-rank (trace norm)             | L1-norm
        ASO                        | Low-rank (shared subspace)        | L2-norm on independent component
        RMTL                       | Low-rank (trace norm)             | Outlier (column-wise L1/Lq norm)
  44. Multi-Task Learning with Clustered Structures
      • Most MTL techniques assume that all tasks are related
      • This is not true in many applications
      • Clustered multi-task learning assumes that
        – the tasks have a group structure
        – the models of tasks from the same group are closer to each other than those from a different group
      • Example: the tasks in the yellow group predict heart-related diseases and the tasks in the blue group predict brain-related diseases.
  45. Clustered Multi-Task Learning (Jacob et al. 2008 NIPS; Zhou et al. 2011 NIPS)
      • Use regularization to capture clustered structures.
      [Figure: training data $X$ multiplied by clustered models, whose columns fall into Cluster 1, Cluster 2, ..., Cluster k-1, Cluster k]
  46. Clustered Multi-Task Learning (Zhou et al. 2011 NIPS)
      • Capture structures by minimizing the sum-of-square error (SSE) in K-means clustering of the $m$ task models:
        $$\min_{I} \sum_{j=1}^{k} \sum_{v \in I_j} \|w_v - \bar{w}_j\|_2^2$$
        where $I_j$ is the index set of the $j$-th cluster and $\bar{w}_j$ is the mean of its task models.
      • Equivalently:
        $$\min_{F} \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F)$$
        where $F$ is the $m \times k$ orthogonal cluster indicator matrix with $F_{i,j} = 1/\sqrt{n_j}$ if $i \in I_j$ and 0 otherwise, and the task number $m$ exceeds the cluster number $k$.
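
The equivalence between the K-means SSE and the trace form can be checked numerically. The sketch below assumes $F_{i,j} = 1/\sqrt{n_j}$ for task $i$ in cluster $j$ (the scaling that makes $F^T F = I$) and uses hypothetical task models and function names.

```python
import math


def kmeans_sse(tasks, clusters):
    """Sum of squared distances from each task model to the mean of its cluster."""
    total = 0.0
    for members in clusters:
        dim = len(tasks[members[0]])
        mean = [sum(tasks[v][k] for v in members) / len(members) for k in range(dim)]
        total += sum((tasks[v][k] - mean[k]) ** 2 for v in members for k in range(dim))
    return total


def trace_form(tasks, clusters):
    """tr(W^T W) - tr(F^T W^T W F), with the columns of W given by `tasks`."""
    m, k = len(tasks), len(clusters)
    # Gram matrix G = W^T W of the task models.
    gram = [[sum(a * b for a, b in zip(tasks[i], tasks[j])) for j in range(m)] for i in range(m)]
    # Orthogonal cluster indicator matrix F (m x k).
    F = [[0.0] * k for _ in range(m)]
    for j, members in enumerate(clusters):
        for i in members:
            F[i][j] = 1.0 / math.sqrt(len(members))
    trace_gram = sum(gram[i][i] for i in range(m))
    trace_projected = sum(
        F[a][j] * gram[a][b] * F[b][j]
        for j in range(k) for a in range(m) for b in range(m)
    )
    return trace_gram - trace_projected


# Four hypothetical task models in two clusters.
tasks = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
clusters = [[0, 1], [2, 3]]
print(kmeans_sse(tasks, clusters))
print(trace_form(tasks, clusters))
```

Both calls should agree up to floating-point rounding, which is the identity the slide relies on before relaxing the constraint on $F$.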
  47. Clustered Multi-Task Learning (Zhou et al. 2011 NIPS)
      • Directly minimizing the SSE is hard because of the non-linear constraint on $F$:
        $$\min_{F} \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F)$$
        where $F$ is the $m \times k$ orthogonal cluster indicator matrix with $F_{i,j} = 1/\sqrt{n_j}$ if $i \in I_j$ and 0 otherwise.
      • Relaxation (Zha et al. 2001 NIPS), keeping only the orthogonality constraint, with task number $m$ > cluster number $k$:
        $$\min_{F : F^T F = I_k} \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F)$$
  48. Clustered Multi-Task Learning (Zhou et al. 2011 NIPS)
      • Clustered multi-task learning (CMTL) formulation:
        $$\min_{W, F : F^T F = I_k} \text{Loss}(W) + \alpha \big[ \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F) \big] + \beta \operatorname{tr}(W^T W)$$
        – The $\alpha$ term captures cluster structures
        – The $\beta$ term improves generalization performance
      • CMTL has been shown to be equivalent to ASO, provided that the dimension of the shared low-rank subspace in ASO equals the number of clusters in CMTL.
  49. Convex Clustered Multi-Task Learning (Zhou et al. 2011 NIPS)
      [Figure: four panels comparing the ground truth with the results of mean-regularized MTL, trace-norm regularized MTL, and convex relaxed CMTL]
      • The relaxations introduce noise
      • Low rank can also capture the cluster structure well
  51. Multi-Task Learning with Tree Structures
      • Assumption: tasks have tree structures
      • In some applications, the tasks may be equipped with a tree structure:
        – The tasks belonging to the same node are similar to each other
        – The similarity between two nodes is related to the depth of their 'common' tree node
      • Example: with leaf models a, b, and c, task a is more similar to b than to c
  52. Multi-Task Learning with Tree Structures (Kim and Xing 2010 ICML)
      • Tree-Guided Group Lasso:
        $$\min_W \text{Loss}(W) + \lambda \sum_{j=1}^{d} \sum_{v \in V} w_v \big\| W^j_{G_v} \big\|_2$$
      • Example tree over the outputs (tasks) $W_1, W_2, W_3$, with the input features as rows: $G_{v_1} = \{W_1\}$, $G_{v_2} = \{W_2\}$, $G_{v_3} = \{W_3\}$, $G_{v_4} = \{W_1, W_2\}$, $G_{v_5} = \{W_1, W_2, W_3\}$
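
The tree-guided penalty can be evaluated for the example tree on the slide. This is a sketch of the penalty term only (without $\lambda$ and the loss); the node weights $w_v$ are hypothetical, since the slide does not give values, and the function name is an assumption.

```python
import math


def tree_group_penalty(W, groups):
    """Sum over feature rows j and tree nodes v of weight_v * ||W[j] restricted to G_v||_2."""
    total = 0.0
    for row in W:
        for weight, tasks in groups:
            total += weight * math.sqrt(sum(row[t] ** 2 for t in tasks))
    return total


# Tree from the slide: leaves v1..v3 hold single tasks,
# v4 = {W1, W2}, v5 = {W1, W2, W3}. Node weights are hypothetical.
groups = [
    (1.0, [0]),
    (1.0, [1]),
    (1.0, [2]),
    (0.5, [0, 1]),
    (0.5, [0, 1, 2]),
]

# Hypothetical model matrix: 2 features (rows) x 3 tasks (columns).
W = [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
print(tree_group_penalty(W, groups))  # 12.0
```

The first feature contributes 3 + 4 + 0 from the leaves and 0.5 x 5 from each internal node; the unused second feature contributes nothing.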
  53. Multi-Task Learning with Graph Structures
      • Assumption: tasks have graph/network structures
      • In real applications, tasks involved in MTL may have graph structures
        – Two tasks are related if they are connected in the graph, i.e. connected tasks are similar
        – The similarity of two related tasks can be represented by the weight of the connecting edge
  54. Multi-Task Learning with Graph Structures
      • A simple way to encode graph structure is to penalize the difference of two tasks that have an edge between them
      • Given a set of edges $E$, we thus penalize:
        $$\sum_{i=1}^{\|E\|} \big\| W_{e\{i,1\}} - W_{e\{i,2\}} \big\|_2^2 = \|W R^T\|_F^2, \qquad R \in \mathbb{R}^{\|E\| \times m}$$
      • The graph regularization term can also be represented in the form of a Laplacian term, with $\mathcal{L} = R^T R$:
        $$\|W R^T\|_F^2 = \operatorname{tr}\big((W R^T)^T W R^T\big) = \operatorname{tr}(W R^T R W^T) = \operatorname{tr}(W \mathcal{L} W^T)$$
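
The identity between the edge-wise penalty and the Laplacian form can be verified on a small graph. In this sketch each row of $R$ has +1 and -1 in the columns of the two tasks joined by an edge (an assumption consistent with the penalty on the slide); the graph, the model matrix, and the function names are hypothetical.

```python
def edge_matrix(edges, num_tasks):
    """R has one row per edge, with +1 and -1 at the two connected tasks."""
    R = []
    for a, b in edges:
        row = [0.0] * num_tasks
        row[a], row[b] = 1.0, -1.0
        R.append(row)
    return R


def laplacian(R):
    """Graph Laplacian L = R^T R."""
    m = len(R[0])
    return [[sum(r[a] * r[b] for r in R) for b in range(m)] for a in range(m)]


def edge_penalty(W, edges):
    """Sum over edges of the squared L2 distance between the two task columns of W."""
    return sum((row[a] - row[b]) ** 2 for a, b in edges for row in W)


def laplacian_penalty(W, L):
    """tr(W L W^T), accumulated row by row of W."""
    m = len(L)
    return sum(row[a] * L[a][b] * row[b] for row in W for a in range(m) for b in range(m))


# Hypothetical model: 2 features (rows) x 3 tasks (columns), chain graph 0 - 1 - 2.
W = [[1.0, 2.0, 4.0], [0.0, 1.0, 1.0]]
edges = [(0, 1), (1, 2)]
L = laplacian(edge_matrix(edges, 3))
print(L)                        # degrees on the diagonal, -1 for each edge
print(edge_penalty(W, edges))   # 6.0
print(laplacian_penalty(W, L))  # 6.0
```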
  55. Multi-Task Learning with Graph Structures
      • How to obtain graph information
        – External domain knowledge
          · protein-protein interaction (PPI) for microarray
        – Discover task relatedness from data
          · Pairwise correlation coefficient
          · Sparse inverse covariance (Friedman et al. 2008 Biostatistics)
  56. Multi-Task Learning with Graph Structures (Chen et al. 2011 UAI; Kim et al. 2009 Bioinformatics)
      • Graph-guided Fused Lasso:
        $$\min_W \text{Loss}(W) + \lambda \|W\|_1 + \Omega(W)$$
      • Graph-guided fusion penalty:
        $$\Omega(W) = \gamma \sum_{e=(m,l) \in E} \sum_{j=1}^{J} \big| W_{jm} - \operatorname{sign}(r_{ml}) W_{jl} \big|$$
      [Figure: SNP inputs (gene sequence ACGTTTTACTGTACAATTTAC) mapped to phenotype outputs, comparing Lasso with Graph-Guided Fused Lasso]
  57. Multi-Task Learning with Graph Structures (Kim et al. 2009 Bioinformatics)
      • In some applications, we know not only which pairs are related, but also how they are related.
      • Graph-Weighted Fused Lasso adds the weight information $\tau(r_{ml})$:
        $$\min_W \text{Loss}(W) + \lambda \|W\|_1 + \gamma \sum_{e=(m,l) \in E} \tau(r_{ml}) \sum_{j=1}^{J} \big| W_{jm} - \operatorname{sign}(r_{ml}) W_{jl} \big|$$
      [Figure: SNP inputs (gene sequence ACGTTTTACTGTACAATTTAC) mapped to phenotype outputs under Graph-Weighted Fused Lasso]
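
Both fusion penalties can be written as one function, with the edge weight $\tau$ as an optional argument. This is a sketch of the penalty term only. Using the absolute correlation as $\tau$ is shown as one possible choice and is an assumption, as are the function name and the toy data.

```python
def fusion_penalty(W, edges, gamma, tau=lambda r: 1.0):
    """gamma * sum over edges (m, l, r_ml) of tau(r_ml) * sum_j |W[j][m] - sign(r_ml) * W[j][l]|.

    W has one row per input feature j and one column per output (task).
    With the default tau this is the unweighted graph-guided fusion penalty.
    """
    total = 0.0
    for m, l, r in edges:
        sign = 1.0 if r >= 0 else -1.0
        total += tau(r) * sum(abs(row[m] - sign * row[l]) for row in W)
    return gamma * total


# Hypothetical model: 2 features x 3 outputs, and two edges with correlations r_ml.
W = [[1.0, 1.0, -1.0], [2.0, 0.0, -2.0]]
edges = [(0, 1, 0.5), (0, 2, -0.8)]

unweighted = fusion_penalty(W, edges, gamma=1.0)
weighted = fusion_penalty(W, edges, gamma=1.0, tau=abs)  # tau = |r| is an assumed choice
print(unweighted)  # 2.0
print(weighted)    # 1.0
```

Outputs 0 and 2 are negatively correlated and have exactly opposite coefficients, so that edge adds no penalty; only the disagreement across edge (0, 1) is charged.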
  58. Practical Guideline
      • MTL versus STL
        – MTL is preferred when dealing with multiple related tasks, each with a small number of training samples
      • Shared features versus shared subspace
        – Identifying shared features is preferred when the data dimensionality is large
        – Identifying a shared subspace is preferred when the number of tasks is large
  60. Case Study I: Incomplete Multi-Source Data Fusion (Yuan et al. 2012 NeuroImage)
      • In many applications, multiple data sources may contain a considerable amount of missing data.
      • In ADNI, over half of the subjects lack CSF measurements; an independent half of the subjects do not have FDG-PET.
      [Figure: data matrix with subjects (Subject1 to Subject245) as rows and features from PET (P1 to P116), MRI (M1 to M305), and CSF (C1 to C5) as columns, with whole source blocks missing for groups of subjects]
  61. Overview of iMSF (Yuan et al. 2012 NeuroImage)
      [Figure: the subjects are split by available sources (PET, MRI, CSF) into Task I to Task IV, with one model (Model I to Model IV) per task]
        $$\min_W \frac{1}{m} \sum_{i=1}^{m} \text{Loss}(X_i, Y_i, W_i) + \lambda \sum_{k=1}^{d} \|W_{G_k}\|_2$$
  62. iMSF: Performance (Yuan et al. 2012 NeuroImage)
      [Figure: three bar charts (accuracy, sensitivity, specificity) comparing iMSF with Zero, EM, KNN, and SVD at the settings 50.0%, 66.7%, and 75.0%]
  63. Case Study II: Drosophila Gene Expression Image Analysis
      • Drosophila (fruit fly) is a favorite model system for geneticists and developmental biologists studying embryogenesis.
        – The small size and short generation time make it ideal for genetic studies.
      • In situ hybridization allows us to generate images showing when and where individual genes were active.
        – The analysis of such images can potentially reveal gene functions and gene-gene interactions.
  64. Drosophila gene expression pattern images
      [Figure: expression images of bgm and en at stages 1-3, 4-6, 7-8, 9-10, 11-12, and 13-, from the Berkeley Drosophila Genome Project (BDGP), http://www.fruitfly.org/; expression images of hb and runt at stages 1-3, 4-5, 6-7, 8-9, and 10-, from Fly-FISH, http://fly-fish.ccbr.utoronto.ca/]
      [Tomancak et al. (2002) Genome Biology; Lécuyer et al. (2007) Cell]
  65. Comparative image analysis
      [Figure: expression images of Twist, heartless, and stumps at stages 4-6 and 7-8, each annotated with anatomical terms such as anterior endoderm AISN, trunk mesoderm AISN, head mesoderm AISN, dorsal ectoderm AISN, procephalic ectoderm AISN, cellular blastoderm, trunk mesoderm PR, head mesoderm PR, anterior endoderm anlage, and yolk nuclei]
      • We need the spatial and temporal annotations of expressions
      [Tomancak et al. (2002) Genome Biology; Sandmann et al. (2007) Genes & Dev.]
  66. Challenges of manual annotation
      [Figure: cumulative number of in situ images by year, 1992 to 2010; y-axis: number of images, 0 to 180000]
  67. Method outline (Ji et al. 2008 Bioinformatics; Ji et al. 2009 BMC Bioinformatics; Ji et al. 2009 NIPS)
      • Images → feature extraction (sparse coding) → model construction (low-rank multi-task, graph-based multi-task)
  68. Low rank multi-task learning model (Ji et al. 2009 BMC Bioinformatics)
      • Model: $X \times W \approx Y$, with a low-rank $W$
        $$\min_W \sum_{i=1}^{k} \sum_{j=1}^{n} L(w_i^T x_j, Y_{ji}) + \lambda \|W\|_*$$
        – First term: loss term
        – Second term: trace norm = sum of singular values
  69. Graph-based multi-task learning model (Ji et al. 2009 SIGKDD)
      [Figure: graph over annotation terms (sensory system head, ventral sensory complex, embryonic antennal sense organ, dorsal/lateral sensory complexes, embryonic maxillary sensory complex, sensory nervous system) with edge weights between 0.31 and 0.79]
        $$\min_W \sum_{i=1}^{k} \sum_{j=1}^{n} L(w_i^T x_j, Y_{ji}) + \lambda_1 \|W\|_F^2 + \lambda_2 \sum_{(p,q) \in G} g(C_{pq}^2) \, \| w_p - \operatorname{sgn}(C_{pq}) w_q \|_2^2$$
        – The terms are the loss, the 2-norm regularizer, and the graph penalty
        – Closed-form solution
  70. 70. Spatial annotation performance
[Figure: AUC (axis range 65 to 85) for Stages 4-6, 7-8, 9-10, 11-12, and 13-16, comparing four methods: SC+LR, SC+Graph, BoW+Graph, Kernel+SVM.]
• 50% of the data is used for training and 50% for testing; 30 random trials are generated
• Multi-task approaches outperform single-task approaches
  72. 72. MALSAR
• A multi-task learning package
• Encodes task relationships via structural regularization
• www.public.asu.edu/~jye02/Software/MALSAR/
  73. 73. MTL Algorithms in MALSAR 1.0
• Mean-Regularized Multi-Task Learning
• MTL with Embedded Feature Selection
  – Joint Feature Learning
  – Dirty Multi-Task Learning
  – Robust Multi-Task Feature Learning
• MTL with Low-Rank Subspace Learning
  – Trace Norm Regularized Learning
  – Alternating Structure Optimization
  – Incoherent Sparse and Low Rank Learning
  – Robust Low-Rank Multi-Task Learning
• Clustered Multi-Task Learning
• Graph Regularized Multi-Task Learning
  75. 75. Trends in Multi-Task Learning
• Develop efficient algorithms for large-scale multi-task learning
• Semi-supervised and unsupervised MTL
• Learn task structures automatically in MTL
• Asymmetric MTL
  – The relationship is not mutual
• Cross-Domain MTL
  – The features may be different
  76. 76. Acknowledgement
• National Science Foundation
• National Institutes of Health