Learning similarity functions from qualitative feedback

The performance of a case-based reasoning system often depends on the suitability of an underlying similarity (distance) measure, and specifying such a measure by hand can be very difficult. In this paper, we therefore develop a machine learning approach to similarity assessment. More precisely, we propose a method that learns how to combine given local similarity measures into a global one. As training information, the method merely assumes qualitative feedback in the form of similarity comparisons, revealing which of two candidate cases is more similar to a reference case. Experimental results, focusing on the ranking performance of this approach, are very promising and show that good models can be obtained with a reasonable amount of training information. See more at www.chengweiwei.com.


  1. Learning Similarity Functions from Qualitative Feedback. Weiwei Cheng and Eyke Hüllermeier, University of Marburg, Germany.
  2. Introduction. Proper definition of similarity (distance) measures is crucial for CBR systems. The specification of local similarity measures, pertaining to individual properties (attributes) of a case, is often less difficult than their combination into a global measure. Goal of this work: using machine learning techniques to support the elicitation of similarity measures (the combination of local into global measures) on the basis of qualitative feedback.
  3. Problem Setting. Local-global principle: the global distance is an aggregation of the local distances, $\Delta(a,b) = f(\delta_1(a_1,b_1), \ldots, \delta_d(a_d,b_d))$. For now, we focus on a linear model, $\Delta(a,b) = \sum_{i=1}^{d} w_i\,\delta_i(a_i,b_i)$ with $w_i \ge 0$ (monotonicity). Such a model makes it easy to incorporate background knowledge, is amenable to an efficient learning scheme, and admits a non-linear extension via kernelization.
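As a minimal illustration of the linear model (the function name and the example numbers are illustrative, not from the slides):

```python
# A minimal sketch of the linear local-global model; names and numbers are
# illustrative. Each delta_i is a precomputed local distance on attribute i.
import numpy as np

def global_distance(weights, local_distances):
    """Delta(a, b) = sum_i w_i * delta_i(a_i, b_i), with w_i >= 0."""
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0.0), "monotonicity requires non-negative weights"
    return float(w @ np.asarray(local_distances, dtype=float))

# Three attributes with local distances 0.1, 0.9, 0.4:
print(global_distance([0.5, 0.2, 0.3], [0.1, 0.9, 0.4]))  # 0.35
```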
  4. Problem Setting (cont.). Learning the weights from qualitative feedback: a feedback triplet $(a, b, c)$ means "case a is more similar to b than to c", i.e., $\Delta(a,b) < \Delta(a,c)$. Given a query $q$, a distance measure induces a linear order on the cases: $x_{(1)}, x_{(2)}, \ldots$ with $\Delta(q, x_{(1)}) \le \Delta(q, x_{(2)}) \le \ldots$. Notice: since the ordering of cases is often more important than the distance values themselves, it is sufficient to find a weight vector $w$ such that the induced order is consistent with all feedback constraints.
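A small sketch of the induced ordering; per-attribute absolute differences stand in for the local measures here, which is an assumption for illustration (the method allows arbitrary local measures):

```python
# Sorting the case base by global distance to a query yields the induced
# linear order on cases.
import numpy as np

def rank_cases(query, cases, weights):
    """Return cases ordered by increasing global distance to the query."""
    w = np.asarray(weights, dtype=float)
    def dist(c):
        local = np.abs(np.asarray(query, float) - np.asarray(c, float))
        return float(w @ local)
    return sorted(cases, key=dist)

print(rank_cases([0, 0], [[3, 4], [1, 0], [0, 2]], weights=[0.5, 0.5]))
# -> [[1, 0], [0, 2], [3, 4]]
```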
  5. The Learning Algorithm. Basic idea: from distance learning to classification. Extension 1: incorporating monotonicity. Extension 2: ensemble learning. Extension 3: active learning.
  6. From Distance Learning to Classification. [Figure: cases from the case base are encoded as d-dimensional vectors of local distances, turning each similarity comparison into a classification example.]
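The figure itself is not recoverable, but the reduction it depicts can be sketched: the feedback "a is more similar to b than to c" is the linear constraint $w \cdot (\delta(a,c) - \delta(a,b)) > 0$, i.e., a positive example for a linear classifier through the origin. The encoding below is a plausible reading of the slide, again with illustrative local measures:

```python
# Hedged sketch of the reduction from distance learning to classification:
# each triplet "a closer to b than to c" becomes a d-dimensional vector z
# that the weight vector w must score positively (w . z > 0).
import numpy as np

def local_vector(a, b):
    """Illustrative local distances: per-attribute absolute differences."""
    return np.abs(np.asarray(a, float) - np.asarray(b, float))

def triplet_to_example(a, b, c):
    """z = delta(a, c) - delta(a, b); the feedback holds iff w . z > 0."""
    return local_vector(a, c) - local_vector(a, b)

print(triplet_to_example(a=[0, 0], b=[1, 0], c=[3, 4]))  # [2. 4.]
```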
  7. Monotonicity. Our model requires that when a local distance increases, the global distance cannot decrease, i.e., the weights must remain non-negative. Our approach: (noise-tolerant) perceptron learning with an update rule modified so that the weights stay non-negative. The modified algorithm provably converges after a finite number of iterations.
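The slide does not spell the modified rule out. One straightforward way to keep the weights non-negative, sketched below as an assumption rather than the authors' exact rule, is to clip negative components after each standard perceptron step:

```python
# Perceptron learning on the constraint vectors z from the reduction above.
# Clipping negative weights to zero after each update is an assumed reading
# of the "modified update rule"; the slide only states that monotonicity is
# preserved and that convergence is provable.
import numpy as np

def monotone_perceptron(Z, epochs=100, lr=1.0):
    """Every row z of Z is a positive example: we want w . z > 0, w >= 0."""
    Z = np.asarray(Z, dtype=float)
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for z in Z:
            if w @ z <= 0.0:             # constraint violated
                w = w + lr * z           # standard perceptron step
                w = np.maximum(w, 0.0)   # enforce non-negativity
                mistakes += 1
        if mistakes == 0:
            break                        # all constraints satisfied
    return w

print(monotone_perceptron([[2.0, 4.0], [1.0, -0.5]]))  # [3.  3.5]
```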
  8. Ensemble Learning. [Figure: permutations of the training data yield an ensemble of perceptrons; their center of mass (CoM) approximates the Bayes point of the version space within the hypothesis space.]
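A sketch of the committee construction, reusing monotone_perceptron from the sketch above (the committee size is an arbitrary choice): each member is trained on a random permutation of the training data, and the center of mass of the resulting weight vectors approximates the Bayes point.

```python
# Approximating the Bayes point by the center of mass (CoM) of perceptrons
# trained on random permutations of the training data.
import numpy as np

def bayes_point_ensemble(Z, n_members=11, seed=0):
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    members = [monotone_perceptron(Z[rng.permutation(len(Z))])
               for _ in range(n_members)]
    return np.asarray(members), np.mean(members, axis=0)  # committee, CoM

committee, w_com = bayes_point_ensemble([[2.0, 4.0], [1.0, -0.5]])
```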
  9. Active Learning. Goal: reducing the feedback effort of the user by choosing the most informative training data. Our approach (a variation of QBC): 1. choose the two most conflicting models; 2. generate two rankings with these two models; 3. query the first conflict pair of these rankings. Example: ranking 1: a b c d e; ranking 2: a b d e c; the first conflict pair is (c, d).
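A sketch of step 3, using the slide's example. (How the "two most conflicting models" are chosen is not detailed on the slide; one plausible reading is the pair of committee members whose weight vectors disagree most, which is left out here.)

```python
# Find the first position at which two rankings disagree and return the
# conflicting pair of cases; this is the pair the user is asked about.
def first_conflict_pair(ranking1, ranking2):
    for x, y in zip(ranking1, ranking2):
        if x != y:
            return x, y
    return None  # the rankings agree everywhere: nothing informative to ask

print(first_conflict_pair(list("abcde"), list("abdec")))  # ('c', 'd')
```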
  10. Experimental Setting. Goal: investigating the efficacy of our approach and the effectiveness of the extensions: 1. incorporating monotonicity; 2. ensemble learning; 3. active learning. Data sets:

         data set    uni   iris   wine   yeast    nba
         #features     6      4     13      24     15
         #cases      200    150    178    2465   3924
  11. Quality Measures. Kendall's tau (a common rank correlation measure), defined via the number of rank inversions and normalized to $[-1, +1]$: $\tau = 1 - \frac{4 \cdot \#\text{inversions}}{n(n-1)}$. Recall (a common retrieval measure), defined as the number of predicted among the true top-k cases (k = 10): $\text{recall} = |\text{predicted top-}k \cap \text{true top-}k| / k$. Position error, defined as the position of the true topmost case in the predicted ranking (minus 1).
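Implementations of the three measures as defined on the slide (the tau formula follows directly from the inversion count, normalized to [-1, +1]):

```python
# The three quality measures, implemented from the slide's definitions.
def kendall_tau(true_rank, pred_rank):
    """1 - 4 * #inversions / (n * (n - 1)), in [-1, +1]."""
    pos = {x: i for i, x in enumerate(pred_rank)}
    n = len(true_rank)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if pos[true_rank[i]] > pos[true_rank[j]])
    return 1.0 - 4.0 * inversions / (n * (n - 1))

def recall_at_k(true_rank, pred_rank, k=10):
    """Fraction of the true top-k cases among the predicted top-k."""
    return len(set(true_rank[:k]) & set(pred_rank[:k])) / k

def position_error(true_rank, pred_rank):
    """Predicted position of the true topmost case, minus 1 (0 is best)."""
    return pred_rank.index(true_rank[0])

r_true, r_pred = list("abcde"), list("abdec")
print(kendall_tau(r_true, r_pred),        # 0.6
      recall_at_k(r_true, r_pred, k=3),   # 0.666...
      position_error(r_true, r_pred))     # 0
```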
  12. [Figure-only slide: plots of the experimental results; no text recoverable.]
  13. Extension to Nonlinear Models. Actually, we only need linearity in the coefficients, not in the local distances. Therefore, some generalizations are easily possible, such as nonlinear transformations of the individual local distances. More generally, $\Delta(a,b) = \sum_{j} w_j\,\phi_j(\delta_1, \ldots, \delta_d)$ with nonlinear basis functions $\phi_j$ of the local distances.
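A sketch of such a generalization: the feature vector is expanded with nonlinear basis functions of the local distances, and the same linear machinery, including the perceptron above, applies unchanged. The particular basis functions below are illustrative choices, not the slide's.

```python
# The model stays linear in w, so the perceptron machinery carries over to
# an expanded feature vector. Basis functions here: identities, squares,
# and the product of all local distances (illustrative).
import numpy as np

def expand(local_distances):
    """Map (d_1, ..., d_d) to basis functions phi_j of the local distances."""
    d = np.asarray(local_distances, dtype=float)
    return np.concatenate([d, d ** 2, [d.prod()]])

w = np.array([0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1])  # one weight per phi_j
print(float(w @ expand([0.1, 0.9, 0.4])))           # global distance
```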
  14. Extensions. A special case of a kernel function leads to kernelization: nonlinear classification and sorting. [Figure: a triplet (a, b, c) is fed to the classifier, which outputs a distance comparison for (b, c), e.g. 0.7.]
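A hedged sketch of a kernelized classifier in dual form; whether this matches the authors' construction is not stated on the slide, and both the RBF kernel and the ±z labeling of each constraint are assumptions for illustration:

```python
# Dual (kernel) perceptron: the weight vector is never formed explicitly;
# only kernel evaluations between constraint vectors are needed. Each
# triplet contributes z with label +1 and -z with label -1 (an assumption).
import numpy as np

def rbf(u, v, gamma=1.0):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(np.exp(-gamma * np.sum((u - v) ** 2)))

def kernel_perceptron(Z, y, kernel=rbf, epochs=50):
    """alpha[i] counts mistakes on example i; the sign of the expansion
    sum_i alpha[i] * y[i] * K(Z[i], z) classifies a new vector z."""
    alpha = np.zeros(len(Z))
    for _ in range(epochs):
        mistakes = 0
        for t in range(len(Z)):
            score = sum(alpha[i] * y[i] * kernel(Z[i], Z[t])
                        for i in range(len(Z)))
            if y[t] * score <= 0.0:   # mistake -> dual update
                alpha[t] += 1.0
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

Z = np.array([[2.0, 4.0], [-2.0, -4.0], [1.0, -0.5], [-1.0, 0.5]])
y = np.array([1, -1, 1, -1])
print(kernel_perceptron(Z, y))
```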
  15. Conclusions. Learning to combine local distance measures into a global measure, assuming only qualitative feedback of the type "a is more similar to b than to c". Reduction of distance learning to classification.
