CLEF 2010 - Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation

Talk related to our paper presented at the 1st Conference on Multilingual and Multimodal Information Access Evaluation.


  1. 1. CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation September 20-23, Padua, Italy Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment
  2. 2. Effect of the Tie-Breaking Bias G. Cabanac et al. Outline 1. Motivation A tale about two TREC participants 2. Context IRS effectiveness evaluation Issue Tie-breaking bias effects 3. Contribution Reordering strategies 4. Experiments Impact of the tie-breaking bias 5. Conclusion and Future Works 2
  3. 3. Effect of the Tie-Breaking Bias G. Cabanac et al. Outline 1. Motivation A tale about two TREC participants 2. Context IRS effectiveness evaluation Issue Tie-breaking bias effects 3. Contribution Reordering strategies 4. Experiments Impact of the tie-breaking bias 5. Conclusion and Future Works 3
  4. 4. 1. Motivation  Tie-breaking bias illustration G. Cabanac et al. A tale about two TREC participants (1/2) Topic 031 “satellite launch contracts” 5 relevant documents Chris Ellen one single difference C = ( , 0.8), ( , 0.8), ( , 0.5) E = ( , 0.8), ( , 0.8), ( , 0.5) unlucky lucky Why such a huge difference? 4
  5. 5. 1. Motivation  Tie-breaking bias illustration G. Cabanac et al. A tale about two TREC participants (2/2) Chris Ellen one single difference C = ( , 0.8), ( , 0.8), ( , 0.5) E = ( , 0.8), ( , 0.8), ( , 0.5) After 15 days of hard work  Only difference: the name of one document  5
  6. 6. Effect of the Tie-Breaking Bias G. Cabanac et al. Outline 1. Motivation A tale about two TREC participants 2. Context IRS effectiveness evaluation Issue Tie-breaking bias effects 3. Contribution Reordering strategies 4. Experiments Impact of the tie-breaking bias 5. Conclusion and Future Works 6
  7. 7. 2. Context & issue  Tie-breaking bias G. Cabanac et al. Measuring the effectiveness of IRSs  User-centered vs. System-focused [Spärck Jones & Willett, 1997]  Evaluation campaigns  1958 Cranfield UK  1992 TREC Text Retrieval Conference USA  1999 NTCIR NII Test Collection for IR Systems Japan  2001 CLEF Cross-Language Evaluation Forum Europe  …  “Cranfield” methodology  Task  Test collection  Corpus  Topics  Qrels  Measures: MAP, P@X, … computed using trec_eval [Voorhees, 2007] 7
  8. 8. 2. Context & issue  Tie-breaking bias G. Cabanac et al. Runs are reordered prior to their evaluation  Qrels = qid, iter, docno, rel  Run = qid, iter, docno, rank, sim, run_id  Reordering by trec_eval: qid asc, sim desc, docno desc  Effectiveness measure (MAP, P@X, MRR…) = f(intrinsic_quality, luck) 8
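     To make the reordering rule above concrete, here is a minimal Python sketch of the trec_eval-style sort (qid asc, sim desc, docno desc); the run entries and document identifiers are illustrative, not actual TREC data.

        # Reorder a run as described on the slide: within each topic (qid),
        # documents are sorted by decreasing score (sim), and tied scores are
        # broken by decreasing document identifier (docno).
        run = [
            ("031", "FT911-3032",     0.5),   # (qid, docno, sim), made-up values
            ("031", "AP880712-0145",  0.8),
            ("031", "WSJ870220-0093", 0.8),   # same sim as the previous doc: a tie
        ]

        # Stable sorts applied from the least to the most significant key
        # implement the compound ordering qid asc, sim desc, docno desc.
        run.sort(key=lambda r: r[1], reverse=True)   # docno desc (tie-breaker)
        run.sort(key=lambda r: r[2], reverse=True)   # sim desc
        run.sort(key=lambda r: r[0])                 # qid asc

        print(run)  # the two 0.8-scored documents are now ordered by docno alone

     Whether a relevant tied document lands before or after its equally scored neighbour thus depends only on its name, which is the luck factor in the effectiveness measure above.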
  9. 9. Effect of the Tie-Breaking Bias G. Cabanac et al. Outline 1. Motivation A tale about two TREC participants 2. Context IRS effectiveness evaluation Issue Tie-breaking bias effects 3. Contribution Reordering strategies 4. Experiments Impact of the tie-breaking bias 5. Conclusion and Future Works 9
  10. 10. 3. Contribution  Reordering strategies G. Cabanac et al. Consequences of run reordering  Measures of effectiveness for an IRS s, all sensitive to document rank:  RR(s,t) 1/rank of the 1st relevant document, for topic t  P(s,t,d) precision at document d, for topic t  AP(s,t) average precision for topic t  MAP(s) mean average precision  Tie-breaking bias  Is the Wall Street Journal collection more relevant than Associated Press?  Problem 1 comparing 2 systems AP(s1, t) vs. AP(s2, t)  Problem 2 comparing 2 topics AP(s, t1) vs. AP(s, t2) 10
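     As a companion to these definitions, the following Python sketch implements RR and AP for one ranked list; it illustrates the standard formulas rather than the actual trec_eval code.

        def reciprocal_rank(ranking, relevant):
            """RR: 1 / rank of the first relevant document (0 if none is retrieved)."""
            for rank, doc in enumerate(ranking, start=1):
                if doc in relevant:
                    return 1.0 / rank
            return 0.0

        def average_precision(ranking, relevant):
            """AP: mean of the precision values observed at each relevant document."""
            hits, precision_sum = 0, 0.0
            for rank, doc in enumerate(ranking, start=1):
                if doc in relevant:
                    hits += 1
                    precision_sum += hits / rank
            return precision_sum / len(relevant) if relevant else 0.0

        # Swapping two tied documents so that a relevant one moves up or down
        # changes both values: exactly the tie-breaking bias at stake here.
        print(average_precision(["d1", "d2", "d3"], {"d2"}))  # 0.5
        print(average_precision(["d2", "d1", "d3"], {"d2"}))  # 1.0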
  11. 11. 3. Contribution  Reordering strategies G. Cabanac et al. Alternative unbiased reordering strategies  Conventional reordering (TREC)  Ties sorted Z → A: qid asc, sim desc, docno desc  Realistic reordering  Relevant docs last: qid asc, sim desc, rel asc, docno desc  Optimistic reordering  Relevant docs first: qid asc, sim desc, rel desc, docno desc 11
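     A minimal sketch of the three strategies for a single topic, assuming a result list of (docno, sim) pairs and a dict rel mapping docno to its relevance judgement; the function name and data layout are assumptions for illustration.

        # Stable sorts applied from the least to the most significant key; the qid
        # key is omitted because the list belongs to a single topic.
        def reorder(results, rel, strategy="conventional"):
            results = sorted(results, key=lambda r: r[0], reverse=True)     # docno desc
            if strategy == "realistic":                                     # relevant docs last
                results.sort(key=lambda r: rel.get(r[0], 0))                # rel asc
            elif strategy == "optimistic":                                  # relevant docs first
                results.sort(key=lambda r: rel.get(r[0], 0), reverse=True)  # rel desc
            results.sort(key=lambda r: r[1], reverse=True)                  # sim desc
            return results

     Evaluating a run after the realistic and optimistic reorderings gives, respectively, a lower and an upper bound on the score obtained with the conventional reordering, as the experiments below show for AP.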
  12. 12. Effect of the Tie-Breaking Bias G. Cabanac et al. Outline 1. Motivation A tale about two TREC participants 2. Context IRS effectiveness evaluation Issue Tie-breaking bias effects 3. Contribution Reordering strategies 4. Experiments Impact of the tie-breaking bias 5. Conclusion and Future Works 12
  13. 13. 4. Experiments  Impact of the tie-breaking bias G. Cabanac et al. Effect of the tie-breaking bias  Study of 4 TREC tasks (adhoc, routing, filtering, web) spanning 1993–2009  22 editions  3 GB of data from trec.nist.gov  1360 runs  Assessing the effect of tie-breaking  Proportion of document ties  How frequent is the bias?  Effect on measure values  Top 3 observed differences  Observed difference in %  Significance of the observed difference: Student’s t-test (paired, one-tailed) 13
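     To illustrate the significance test mentioned on this slide, here is a small sketch of a paired, one-tailed Student's t-test on per-topic AP values obtained under two reordering strategies; the AP values are placeholders, not results from the paper.

        from scipy import stats

        # Per-topic AP under two reordering strategies (placeholder values).
        ap_conventional = [0.31, 0.12, 0.45, 0.27, 0.08]
        ap_realistic    = [0.29, 0.12, 0.40, 0.27, 0.05]

        # scipy reports a two-sided p-value; halve it for the one-tailed test
        # when the t statistic has the expected sign.
        t, p_two_sided = stats.ttest_rel(ap_conventional, ap_realistic)
        p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
        print(f"t = {t:.3f}, one-tailed p = {p_one_sided:.4f}")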
  14. 14. 4. Experiments  Impact of the tie-breaking bias G. Cabanac et al. Ties demographics  89.6% of the runs comprise ties  Ties occur throughout the runs 14
  15. 15. 4. Experiments  Impact of the tie-breaking bias G. Cabanac et al. Proportion of tied documents in submitted runs  On average, 25.2% of a result list consists of tied documents  On average, a tied group contains 10.6 documents 15
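     The statistics reported on this slide can be computed per result list as in the sketch below, which groups documents sharing a retrieval score and measures the share of documents belonging to a tied group; the scores are made up.

        from itertools import groupby

        def tie_stats(scores):
            """Share of tied documents and average tied-group size in one result list."""
            group_sizes = [len(list(g)) for _, g in groupby(sorted(scores, reverse=True))]
            tied_groups = [n for n in group_sizes if n >= 2]
            share_tied = sum(tied_groups) / len(scores) if scores else 0.0
            avg_group_size = sum(tied_groups) / len(tied_groups) if tied_groups else 0.0
            return share_tied, avg_group_size

        print(tie_stats([0.9, 0.8, 0.8, 0.8, 0.5, 0.5]))  # (0.8333..., 2.5)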
  16. 16. 4. Experiments  Impact of the tie-breaking bias G. Cabanac et al. Effect on Reciprocal Rank (RR) 16
  17. 17. 4. Experiments  Impact of the tie-breaking bias G. Cabanac et al. Effect on Average Precision (AP) 17
  18. 18. 4. Experiments  Impact of the tie-breaking bias G. Cabanac et al. Effect on Mean Average Precision (MAP)  Difference between system rankings computed on MAP not significant (Kendall’s τ) 18
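     The system-ranking comparison on this slide can be sketched as follows: correlate the orderings of systems induced by MAP under two reordering strategies with Kendall's τ (the MAP values are placeholders).

        from scipy.stats import kendalltau

        # One MAP value per system, under each reordering strategy (placeholders).
        map_conventional = [0.21, 0.35, 0.18, 0.29, 0.33]
        map_realistic    = [0.20, 0.34, 0.17, 0.29, 0.32]

        tau, p_value = kendalltau(map_conventional, map_realistic)
        print(f"Kendall's tau = {tau:.3f} (p = {p_value:.4f})")
        # A tau close to 1 means both strategies rank the systems the same way.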
  19. 19. 4. Experiments  Impact of the tie-breaking bias G. Cabanac et al. What we learnt: beware of tie-breaking for AP  Small effect on MAP, larger effect on AP  Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic (e.g., run padre1, adhoc’94)  Failure analysis for the ranking process  Error bar = element of chance → potential for improvement 19
  20. 20. 4. Experiments  Impact of the tie-breaking bias G. Cabanac et al. Related works in IR evaluation Topics reliability? [Buckley & Voorhees, 2000]  25 [Voorhees & Buckley, 2002] error rate [Voorhees, 2009] n collections Qrels reliability? [Voorhees, 1998] quality [Al-Maskari et al., 2008] TREC vs. TREC [Voorhees, 2007] Measures reliability? [Buckley & Voorhees, 2000] MAP  [Sakai, 2008] ‘system bias’ [Moffat & Zobel, 2008] new measures [Raghavan et al., 1989] Precall Pooling reliability? [McSherry & Najork, 2008] Tied scores [Zobel, 1998] approximation  [Sanderson & Joho, 2004] manual [Cabanac et al., 2010] tie-breaking bias [Buckley et al., 2007] size adaptation 20
  21. 21. Effect of the Tie-Breaking Bias G. Cabanac et al. Outline 1. Motivation A tale about two TREC participants 2. Context IRS effectiveness evaluation Issue Tie-breaking bias effects 3. Contribution Reordering strategies 4. Experiments Impact of the tie-breaking bias 5. Conclusion and Future Works 21
  22. 22. Impact of the “tie-breaking bias” in IR evaluation G. Cabanac et al. Conclusions and future work  Context: IR evaluation  TREC and other campaigns based on trec_eval  Contributions  Measure = f(intrinsic_quality, luck)  tie-breaking bias  Measure bounds (realistic ≤ conventional ≤ optimistic)  Study of the tie-breaking bias effect  Difference (conventional vs. realistic) for RR, AP and MAP  Strong correlation, yet significant difference  No difference on system rankings (based on MAP)  Future work  Study of other / more recent evaluation campaigns  Reordering-free measures  Finer grained analyses: finding vs. ranking 22
  23. 23. CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation September 20-23, Padua, Italy Thank you
