A new instance-based label ranking approach using the Mallows model
Published in: Technology, Business
See more at www.chengweiwei.com


Transcript

  • 1. Weiwei Cheng & Eyke Hüllermeier, Knowledge Engineering & Bioinformatics Lab, Department of Mathematics and Computer Science, University of Marburg, Germany
  • 2. Label Ranking (an example) Learning customers’ preferences on cars:
      customer 1:   MINI ≻ Toyota ≻ BMW
      customer 2:   BMW ≻ MINI ≻ Toyota
      customer 3:   BMW ≻ Toyota ≻ MINI
      customer 4:   Toyota ≻ MINI ≻ BMW
      new customer: ???
    The customers are typically described by feature vectors, e.g., (gender, age, place of birth, has child, …).
  • 3. Label Ranking (an example) The same preferences, encoded as rank positions:
                     MINI  Toyota  BMW
      customer 1       1      2     3
      customer 2       2      3     1
      customer 3       3      2     1
      customer 4       2      1     3
      new customer     ?      ?     ?
    π(i) = position of the i‐th label in the ranking, with 1: MINI, 2: Toyota, 3: BMW.
  • 4. Label Ranking (more formally) Given: a set of training instances {x_k | k = 1, …, n} ⊆ X, a set of labels L = {λ_1, …, λ_m}, and, for each training instance x_k, a set of pairwise preferences of the form λ_i ≻ λ_j. Find: a ranking function (an X → S_m mapping) that maps each x ∈ X to a ranking ≻_x of L (a permutation π_x).
  • 5. Model‐Based Approaches … essentially reduce label ranking to classification:
      Ranking by pairwise comparison (RPC): Fürnkranz and Hüllermeier, ECML‐03
      Constraint classification (CC): Har‐Peled, Roth and Zimak, NIPS‐03
      Log-linear models for label ranking (LL): Dekel, Manning and Singer, NIPS‐03
  • 6. Instance‐Based Approach (this work) Lazy learning: instead of eagerly inducing a model from the data, simply store the observations. Target functions are estimated on demand in a local way; there is no need to define the mapping explicitly. The core part is the aggregation of preference (order) information from neighboring examples.
  • 7. Related Work Case‐based label ranking (Brinker and Hüllermeier, ECML‐06): aggregation of complete rankings is done by median ranking or Borda count. Our aggregation method is based on a probabilistic model and can handle both complete and incomplete rankings.
  • 8. Instance‐Based Prediction Basic assumption: the distribution of the output is (approximately) constant in the neighborhood of the query; consider the outputs of the neighbors as an i.i.d. sample. Conventional classification: a discrete distribution on class labels; estimate probabilities by relative class frequencies; predict the class by majority vote.
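The conventional classification case can be illustrated with a minimal k-nearest-neighbor sketch (the one-dimensional toy data and the `knn_classify` helper are illustrative, not from the slides):

```python
from collections import Counter

def knn_classify(query, examples, k=3):
    """Majority vote among the k nearest neighbors (1-D toy distance)."""
    neighbors = sorted(examples, key=lambda ex: abs(ex[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

examples = [(1.0, "A"), (1.2, "A"), (3.5, "B"), (3.7, "B"), (1.1, "A")]
print(knn_classify(1.05, examples))  # all three nearest neighbors vote "A"
```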
  • 9. Probabilistic Model for Ranking Mallows model (Mallows, Biometrika, 1957): P(σ | θ, π) = exp(−θ · D(σ, π)) / φ(θ), with center ranking π, spread parameter θ ≥ 0, and normalization constant φ(θ), where D is a right-invariant metric on permutations (e.g., the Kendall distance).
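As a sketch (not the authors' implementation), the Mallows distribution over all permutations of a small label set can be tabulated directly, assuming the Kendall distance as the metric D; `mallows_pmf` is an illustrative helper name:

```python
from itertools import permutations
from math import exp

def kendall_distance(a, b):
    """Number of discordantly ordered label pairs between rankings a and b."""
    pos_b = {label: i for i, label in enumerate(b)}
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a))
               if pos_b[a[i]] > pos_b[a[j]])

def mallows_pmf(center, theta):
    """P(sigma) proportional to exp(-theta * D(sigma, center)), normalized
    over all permutations of the label set."""
    perms = list(permutations(center))
    weights = [exp(-theta * kendall_distance(p, center)) for p in perms]
    z = sum(weights)  # the normalization constant phi(theta)
    return {p: w / z for p, w in zip(perms, weights)}

pmf = mallows_pmf(("a", "b", "c"), theta=1.0)
# The center ranking is the mode; the probabilities sum to 1.
```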
  • 10. Inference (full rankings) We have observed rankings σ_1, …, σ_k from the neighbors. The ML estimate of the center ranking minimizes the sum of distances Σ_i D(σ_i, π); given the center, the ML equation for the spread is monotone in θ, so θ̂ is obtained as its unique solution.
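Assuming D is the Kendall distance, the ML center for fully observed rankings can be found by brute force over all permutations (a sketch that is feasible only for small label sets; `ml_center` is an illustrative name, not the authors' code):

```python
from itertools import permutations

def kendall_distance(a, b):
    pos_b = {label: i for i, label in enumerate(b)}
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a))
               if pos_b[a[i]] > pos_b[a[j]])

def ml_center(observed):
    """Brute-force ML center: the permutation minimizing the total
    Kendall distance to the observed rankings."""
    labels = observed[0]
    return min(permutations(labels),
               key=lambda pi: sum(kendall_distance(s, pi) for s in observed))

obs = [("MINI", "Toyota", "BMW"),
       ("BMW", "MINI", "Toyota"),
       ("BMW", "Toyota", "MINI")]
print(ml_center(obs))  # consensus of the three observed rankings
```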
  • 11. Inference (incomplete rankings) “Marginal” distribution: P(σ) = Σ_{σ* ∈ E(σ)} P(σ* | θ, π), where E(σ) denotes the set of all consistent extensions of σ. Example for label set {a, b, c}:
      Observation σ:    a ≻ b
      Extensions E(σ):  a ≻ b ≻ c,  a ≻ c ≻ b,  c ≻ a ≻ b
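For small label sets, the set E(σ) of consistent extensions can be enumerated by brute force; this sketch (the helper name `consistent_extensions` is illustrative) reproduces the {a, b, c} example:

```python
from itertools import permutations

def consistent_extensions(partial, labels):
    """All total orders over `labels` that respect the relative order
    given by the incomplete ranking `partial`."""
    pos = {label: i for i, label in enumerate(partial)}
    def respects(perm):
        seen = [label for label in perm if label in pos]
        return seen == sorted(seen, key=pos.get)
    return [p for p in permutations(labels) if respects(p)]

exts = consistent_extensions(("a", "b"), ("a", "b", "c"))
print(exts)  # the three extensions of a > b over {a, b, c}
```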
  • 12. Inference (incomplete rankings) cont. The corresponding likelihood: L(θ, π) = Π_i Σ_{σ* ∈ E(σ_i)} P(σ* | θ, π). ML estimation becomes more difficult.
  • 13. Inference Not only the estimated ranking π̂ is of interest, but also the spread parameter θ, which is a measure of precision and therefore reflects the confidence/reliability of the prediction (just like the variance of an estimated mean). The bigger θ, the more peaked the distribution around the center ranking.
  • 14. Experimental Setting Data sets:
      Name       #instances  #features  #labels
      Iris¹          150         4         3
      Wine¹          178        13         3
      Glass¹         214         9         6
      Vehicle¹       846        18         4
      Dtt²          2465        24         4
      Cold²         2465        24         4
    ¹ UCI data sets.  ² Phylogenetic profiles and DNA microarray expression data.
  • 15. Accuracy (Kendall tau) [Figure: Kendall tau vs. missing rate of labels in a typical run.] Main observation: our approach is quite competitive with the state‐of‐the‐art model-based approaches.
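The Kendall tau coefficient between two complete rankings can be computed directly; this is a minimal sketch, not the evaluation code used in the experiments:

```python
def kendall_tau(a, b):
    """Kendall tau rank correlation between two rankings of the same labels:
    (concordant - discordant) / total pairs, a value in [-1, 1]."""
    pos_b = {label: i for i, label in enumerate(b)}
    n, disc = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            if pos_b[a[i]] > pos_b[a[j]]:
                disc += 1
    total = n * (n - 1) // 2
    return (total - 2 * disc) / total

print(kendall_tau(("MINI", "Toyota", "BMW"), ("MINI", "Toyota", "BMW")))  # 1.0
print(kendall_tau(("MINI", "Toyota", "BMW"), ("BMW", "Toyota", "MINI")))  # -1.0
```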
  • 16. Accuracy‐Rejection Curve θ as a measure of the reliability of a prediction. [Figure: accuracy vs. the ratio of the data for which a prediction is made.] Main observation: the decreasing curve confirms that θ is a reasonable measure of confidence.
  • 17. Take‐away Messages An instance‐based label ranking approach using a probabilistic model; suitable for complete and incomplete rankings; comes with a natural measure of the reliability of a prediction. Ongoing and future work:
      o More efficient inference for the incomplete case.
      o Generalization: distance‐weighted prediction.
      o Dealing with variants of the label ranking problem, such as calibrated label ranking and multi‐label classification.
  • 18. Median rank
                   MINI  Toyota  BMW
      ranking 1      1      3     2
      ranking 2      2      3     1
      ranking 3      3      2     1
      median rank    2      3     1
    Tends to optimize the Spearman footrule distance.
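The component-wise median rank can be sketched as follows; the rankings correspond to the table above, and `median_rank` is an illustrative helper name:

```python
from statistics import median

def median_rank(rankings, labels):
    """Component-wise median of the rank positions assigned to each label."""
    positions = {
        label: [r.index(label) + 1 for r in rankings] for label in labels
    }
    return {label: median(p) for label, p in positions.items()}

rankings = [("MINI", "BMW", "Toyota"),   # MINI=1, Toyota=3, BMW=2
            ("BMW", "MINI", "Toyota"),   # MINI=2, Toyota=3, BMW=1
            ("BMW", "Toyota", "MINI")]   # MINI=3, Toyota=2, BMW=1
print(median_rank(rankings, ("MINI", "Toyota", "BMW")))
# MINI: 2, Toyota: 3, BMW: 1 (matches the table above)
```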
  • 19. Borda count
      ranking 1: MINI ≻ Toyota ≻ BMW
      ranking 2: BMW ≻ MINI ≻ Toyota
      ranking 3: BMW ≻ Toyota ≻ MINI
      Borda count: BMW: 4, MINI: 3, Toyota: 2
    Tends to optimize Spearman rank correlation.
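Borda count, scoring each label by the number of labels ranked below it and summing over all rankings, can be sketched as follows (illustrative helper, same rankings as above):

```python
def borda(rankings):
    """Score each label by the number of labels ranked below it, summed
    over all rankings."""
    scores = {}
    for r in rankings:
        n = len(r)
        for pos, label in enumerate(r):
            scores[label] = scores.get(label, 0) + (n - 1 - pos)
    return scores

rankings = [("MINI", "Toyota", "BMW"),
            ("BMW", "MINI", "Toyota"),
            ("BMW", "Toyota", "MINI")]
print(borda(rankings))  # BMW: 4, MINI: 3, Toyota: 2, as on the slide
```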