# On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

• The focus is still on designing a scoring function for a document, but with acknowledgement of the various retrieval goals and the final rank context.
• Informativeness argument: some evaluation metrics are less informative than others [4], so some IR metrics do not necessarily summarize the (training) data well. If we optimize an IR metric directly from the data, the statistics of the data may not be fully explored and utilized. The approach is also not adaptive: the whole training has to be redone in order to optimize another metric.
• In the first stage, the aim is to estimate the relevance of documents as accurately as possible and to summarize it by the joint probability of the documents' relevance. Only in the second stage is the rank preference specified, possibly by an IR metric. The rank decision is a stochastic one because of the uncertainty about relevance. As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric.

1. On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics. Jun Wang, joint work with Jianhan Zhu. Department of Computer Science, University College London. J.Wang@cs.ucl.ac.uk
2. Motivation: IR models calculate (relevance) scores for individual documents: Probability Indexing, BM25, Language Models, the Binary Independent Rel. Model
3. Motivation: A general definition of an IR metric: m(a rank order | "true" relevance of documents). Slide illustration: a ranked list whose documents are marked ✔ ✖ ✔ ✖.
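To make the definition m(rank order | relevance) concrete, here is a minimal Python sketch of two of the metrics named later in the deck. The function names, the binary-relevance encoding, and the choice to normalize Average Precision by the number of relevant documents in the ranked list are assumptions made for illustration, not taken from the slides.

```python
def precision_at_n(order, rel, n):
    """Fraction of the top-n ranked documents that are relevant."""
    return sum(rel[doc] for doc in order[:n]) / n


def average_precision(order, rel):
    """Mean of precision@k over the ranks k that hold a relevant document.

    Assumption: every relevant document appears somewhere in `order`.
    """
    hits = 0
    total = 0.0
    for k, doc in enumerate(order, start=1):
        if rel[doc]:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


# Rank order a_1..a_4 (document ids) and "true" binary relevance r_1..r_4,
# mirroring the ✔ ✖ ✔ ✖ pattern on the slide.
order = (0, 1, 2, 3)
rel = (1, 0, 1, 0)

p_at_2 = precision_at_n(order, rel, 2)
ap = average_precision(order, rel)
```

Both functions take the same two inputs, a rank order and the relevance of the documents, which is exactly the signature the general definition describes.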
4. Motivation: We have different rank preferences and thus different IR metrics (NDCG, MRR, MAP, …). Something is missing in between the IR models and the IR metrics.
5. Motivation: The fundamental question: what is the underlying generative retrieval process?
6. Outline
    • What is happening right now
    • The statistical retrieval process
    • Text retrieval experiments
7. What is happening right now (1)?
    • Still focusing on the (relevance) score, but with acknowledgement of the final rank context
        – The "less is more" model [Chen&Karger 2006] extended the relevance model
        – It assumed the previously retrieved documents to be non-relevant when calculating the relevance of documents for the current rank position,
        – which is equivalent to maximizing the Reciprocal Rank measure
8. What is happening right now (2)?
    • Still focusing on the (relevance) score, but with acknowledgement of the final rank context
        – In the Language Model framework, various loss functions were defined to incorporate various ranking strategies [Zhai&Lafferty 2006]
9. What is happening right now (3)?
    • Focusing on IR metrics and ranking
        – bypass the step of estimating the relevance states of individual documents
        – construct a document ranking model from training data by directly optimizing an IR metric [Volkovs&Zemel 2009]
    • However, not all IR metrics necessarily summarize the (training) data well; thus, the training data may not be fully explored
10. A "balanced" view of the retrieval process
    – First, understand (infer) the relevance of documents as accurately as possible,
    – and summarize it by the joint probability of the documents' relevance
    – Dependency between documents is considered
    – Secondly, given an IR metric, the rank preference is specified by that metric
    – The rank decision is a stochastic one because of the uncertainty about relevance
    – As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric
11. The statistical document ranking process

    $$\hat{a} = \arg\max_{a} E(m \mid q) = \arg\max_{a_1,\dots,a_N} \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$

    Here $p(r_1,\dots,r_N \mid q)$ is the joint probability of relevance given a query, and $m$ is the IR metric, whose inputs are (1) a rank order $a_1,\dots,a_N$ and (2) the relevance of the documents $r_1,\dots,r_N$.
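The maximization over rank orders can be sketched by brute force for a handful of documents. The following is an illustrative sketch only: the function names, the dictionary encoding of the joint distribution, and the example probabilities are assumptions, and exhaustive enumeration is not the method the slides propose (it costs N! rank orders times 2^N relevance states).

```python
from itertools import permutations, product


def reciprocal_rank(order, rel):
    """m(a | r): 1/k for the first rank k holding a relevant document, else 0."""
    for k, doc in enumerate(order, start=1):
        if rel[doc]:
            return 1.0 / k
    return 0.0


def expected_metric(order, joint, metric):
    """E(m | q): sum over relevance states r of m(a | r) * p(r | q)."""
    return sum(p * metric(order, rel) for rel, p in joint.items())


def optimal_ranking(joint, metric, n_docs):
    """Exhaustive arg max over all rank orders; feasible only for tiny N."""
    return max(
        permutations(range(n_docs)),
        key=lambda order: expected_metric(order, joint, metric),
    )


# Hypothetical joint distribution p(r_1, r_2, r_3 | q), built here from
# independent marginals purely to keep the example short. Any dictionary
# mapping relevance tuples to probabilities (summing to 1) would work.
marginals = [0.2, 0.7, 0.5]
joint = {}
for rel in product([0, 1], repeat=3):
    prob = 1.0
    for r, m in zip(rel, marginals):
        prob *= m if r else 1.0 - m
    joint[rel] = prob

best = optimal_ranking(joint, reciprocal_rank, 3)
best_value = expected_metric(best, joint, reciprocal_rank)
```

Swapping `reciprocal_rank` for another metric function changes the ranking objective without touching the relevance model, which is the separation of stages the slides argue for.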
12. The Optimal Ranker: given a fixed IR metric $m$, a rank order $a_1,\dots,a_N$, and the uncertainty about relevance $p(r_1,\dots,r_N \mid q)$, the output is the estimated performance score

    $$E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
13. Now the question is how to calculate the expected IR metric under the joint probability of relevance if we predefine the IR metric $m(a_1,\dots,a_N \mid r_1,\dots,r_N)$:

    $$E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
14. We worked it out for the major IR metrics (Average Precision, DCG, Precision at N, Reciprocal Rank)
    • Certain assumptions are needed
    • The joint distribution of relevance $p(r_1,\dots,r_N \mid q)$ is summarized by the marginal means $E(r_1 \mid q),\dots,E(r_N \mid q)$ and the covariances $\operatorname{cov}(r_i, r_j \mid q)$
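For metrics that are linear in the relevance variables, linearity of expectation means the marginal means alone determine the expected value; the covariances only enter for non-linear metrics such as Average Precision and Reciprocal Rank. The sketch below shows this for Precision at N and DCG. It assumes binary relevance and the DCG gain form r_k / log2(1 + k); the slides do not say which DCG variant was used, so treat these as illustrative choices rather than the authors' formulas.

```python
import math


def expected_precision_at_n(means, n):
    """E[P@n] = (1/n) * sum of E(r_k | q) over the top-n rank positions."""
    return sum(means[:n]) / n


def expected_dcg(means):
    """E[DCG] = sum over ranks k of E(r_k | q) / log2(1 + k).

    Assumes binary relevance, so the gain of a document equals r_k.
    """
    return sum(p / math.log2(1 + k) for k, p in enumerate(means, start=1))


# Hypothetical marginal means E(r_k | q), listed in rank order.
means_in_rank_order = [0.7, 0.5, 0.2]

ep2 = expected_precision_at_n(means_in_rank_order, 2)
edcg = expected_dcg(means_in_rank_order)
```

Because both expectations are weighted sums of the means with weights that do not increase with rank, sorting documents by decreasing marginal mean maximizes them, regardless of how the documents are correlated.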
15. Some of the results
    • Expected Average Precision:
    • Expected Reciprocal Rank (two documents): E[ m ]
16. Properties of IR metrics under uncertainty
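The formulas on this slide did not survive in the transcript. As a worked illustration of how a covariance term can arise, the two-document Reciprocal Rank case can be derived directly if relevance is assumed to be binary, with $r_1$ the document at rank 1 and $r_2$ the document at rank 2. This is a reconstruction under that assumption and may differ in form from the slide's own result.

$$
\begin{aligned}
E[\mathrm{RR}] &= P(r_1 = 1) + \tfrac{1}{2}\, P(r_1 = 0,\ r_2 = 1) \\
&= E[r_1] + \tfrac{1}{2}\, E[(1 - r_1)\, r_2] \\
&= E[r_1] + \tfrac{1}{2}\bigl(E[r_2] - E[r_1 r_2]\bigr) \\
&= E[r_1] + \tfrac{1}{2}\bigl(E[r_2] - E[r_1]\,E[r_2] - \operatorname{cov}(r_1, r_2)\bigr)
\end{aligned}
$$

The last step uses $E[r_1 r_2] = E[r_1]E[r_2] + \operatorname{cov}(r_1, r_2)$. Under this derivation, a positive covariance between the two documents lowers the expected Reciprocal Rank, so a second document that is negatively correlated with the first is preferred.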
17. But can this analysis be used in practice?
    • The key question is how to obtain the joint probability of relevance
        – Click-through data
        – Marginal means $E(r_1 \mid q),\dots,E(r_N \mid q)$: current IR models (relevance models, language models)
        – Covariance of relevance $\operatorname{cov}(r_i, r_j \mid q)$: use the documents' score correlation to estimate the relevance correlation. It is query-independent; we approximate it by sampling queries and calculating the correlation between the documents' ranking scores
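The covariance estimate described above can be sketched as follows: score every document against a sample of queries, then correlate the score columns. This is a minimal illustration, not the authors' implementation. The function names and the synthetic scores are hypothetical, and the final step that scales the correlation by Bernoulli standard deviations to obtain a covariance is an added assumption that the slide does not spell out.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def score_correlation(scores):
    """scores[q][d] is the ranking score of document d for sampled query q."""
    n_docs = len(scores[0])
    cols = [[row[d] for row in scores] for d in range(n_docs)]
    return [[pearson(cols[i], cols[j]) for j in range(n_docs)]
            for i in range(n_docs)]


def relevance_covariance(corr, means):
    """Assumed scaling: cov_ij = corr_ij * sd_i * sd_j, sd_i = sqrt(p_i (1 - p_i))."""
    sd = [math.sqrt(p * (1.0 - p)) for p in means]
    n = len(means)
    return [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]


# Synthetic scores for 50 sampled queries and 3 documents; documents 0 and 1
# are constructed to score alike, document 2 is unrelated.
random.seed(0)
scores = []
for _ in range(50):
    base = random.gauss(0, 1)
    scores.append([base, base + 0.1 * random.gauss(0, 1), random.gauss(0, 1)])

corr = score_correlation(scores)
cov = relevance_covariance(corr, [0.7, 0.5, 0.2])
```

Since the correlation matrix is computed once over sampled queries, it can be reused for every incoming query, which matches the slide's remark that the estimate is query-independent.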
18. TREC evaluation
19. No free lunch
20. The idea can be applied to evaluation too. Inputs: a fixed IR metric $m$, the rank order $a_1,\dots,a_N$ from an IR model, and the uncertainty $p(r_1,\dots,r_N \mid q)$ from relevance judgments. Output: the estimated performance score $E(m \mid q)$.