SlideShare a Scribd company logo
1 of 22
Download to read offline
Portfolio Theory of Information
Retrieval
Jun Wang and Jianhan Zhu
jun.wang@cs.ucl.ac.uk
Department of Computer Science
University College London, UK
Portfolio Theory of Information Retrieval – p. 1/22
Outline
Research Problem
An Analogy: Stock Selection In Financial Markets
Portfolio Theory of Information Retrieval
Evaluations on ad hoc text retrieval
Conclusions
Portfolio Theory of Information Retrieval – p. 2/22
Two-stage Process in IR
Stage 1 calculates the relevance between the given
user information need (query) and each of the
documents
Producing a “best guess” at the relevance, e.g. the
BM25 and the language modelling approaches
Stage 2 presents (normally rank) relevant documents
The probability ranking principle (PRP) states that
“the system should rank documents in order of
decreasing probability of relevance” [Cooper(1971)]
Under certain assumptions, the overall effectiveness,
e.g., expected Precision, is maximized
[Robertson(1977)]
Portfolio Theory of Information Retrieval – p. 3/22
Ranking Under Uncertainty
the PRP ignores [Gordon and Lenk(1991)]:
there is uncertainty when we calculate relevance
scores, e.g., due to limited sample size, and
relevance scores of documents are correlated
0 0.2 0.4 0.6 0.8 1
0
1
2
3
4
5
6
Document A
Document B
Uncertainty of the relevance scores Correlations of relevance scores
Portfolio Theory of Information Retrieval – p. 4/22
An Example
Suppose we have query apple, and two classes of users
U1: Apple_Computers and U2: Apple_Fruit; U1 has
twice as many members as U2
An IR system retrieved three documents d1, d2 and d3;
their probabilities of relevance are as follows:
UserClass d1: Apple_Comp. d2: Apple_Comp. d3: Apple_Fruit
Apple Computers 1 1 0
Apple Fruit 0 0 1
p(r) 2/3 2/3 1/3
The PRP is not good
({d1Apple_Comp., d2Apple_Comp., d3Apple_Fruit}) as
user group U2 (Apple Fruit) has to reject two documents
before reaching the one it wants [Robertson(1977)]
Portfolio Theory of Information Retrieval – p. 5/22
A Little History of Probabilistic Ranking
1960s [Maron and Kuhns(1960)] mentioned the
two-stage process implicitly
1970s [Cooper(1971)] examined the PRP explicitly
well discussed in [Robertson(1977)] and Stirling’s
thesis [Stirling(1977)]
1991 [Gordon and Lenk(1991)] studied its limitations
1998 [Carbonell and Goldstein(1998)] proposed
diversity-based reranking (MMR)
2006 The “less is more” model
[Chen and Karger(2006)]
maximize the probability of finding a relevant
documents in top-n ranked list, where a < n
Portfolio Theory of Information Retrieval – p. 6/22
Our View of the Ranking Problem (1)
We argue that ranking under uncertainty is not just
about picking individual relevant documents, but about
choosing the right combination of relevant
document - the Portfolio Effect
There is a similar scenario in financial markets:
Two observations:
The future returns of stocks cannot be estimated
with absolute certainty
The future returns are correlated
Portfolio Theory of Information Retrieval – p. 7/22
Our View of the Ranking Problem (2)
The analogy:
According to the PRP, one might first rank stocks and
then choose the top-n most “profitable” stocks
Such a principle that essentially maximizes the
expected future return was, however, rejected by
Markowitz in Modern Portfolio Theory [Markowitz(1952)]
Portfolio Theory of Information Retrieval – p. 8/22
Our View of the Ranking Problems (3)
Markowitz’ approach is based on the analysis of the
expected return (mean) of a portfolio and its variance
(or standard deviation) of return. The latter serves as a
measure of risk
4 6 8 10 12 14
0
0.5
1
1.5
2
2.5
3
Standard Deviation
Return
Efficient Frontier
Google
Coca− Cola
0 20 40 60 80 100 120
0
0.2
0.4
0.6
0.8
1
α (Risk Preference)
PortfolioPercentage
Google
Coca− Cola
Efficient Frontier Percentage in the Portfolio
Portfolio Theory of Information Retrieval – p. 9/22
Portfolio Theory in IR (1)
Objective: find an optimal ranked list (consisting of n
documents from rank 1 to n) that has the maximum
effectiveness in response to the given information need
Define effectiveness: consider the weighted average of
the relevance scores in the ranked list:
Rn ≡
n
i=1
wiri
where Rn denotes the overall relevance of a ranked list.
Variable wi, where n
i=1 wi = 1, differentiates the
importance of rank positions. ri is the relevance score
of a document in the list, where i = {1, ..., n}, for each of
the rank positions
Portfolio Theory of Information Retrieval – p. 10/22
Portfolio Theory in IR (2)
Weight wi is similar to the discount factors that have
been applied to IR evaluation in order to penalize
late-retrieved relevant documents
[Järvelin and Kekäläinen(2002)]
It can be easily shown that when w1 > w2... > wn, the
maximum value of Rn gives the ranking order
r1 > r2... > rn
This follows immediately that maximizing R – by which
the document with highest relevance score is retrieved
first, the document with next highest is retrieved
second, etc. – is equivalent to the PRP
Portfolio Theory of Information Retrieval – p. 11/22
Portfolio Theory in IR (3)
During retrieval, the overall relevance Rn cannot be
calculated with certainty
Quantify a ranked list based on its expectation (mean
E[Rn] ) and its variance (V ar(Rn)):
E[Rn] =
n
i=1
wiE[ri]
V ar(Rn) =
n
i=1
n
j=1
wiwjci,j
where ci,j is the (co)variance of the relevance scores
between the two documents at position i and j. E[ri] is
the expected relevance score, determined by a point
estimate from the specific retrieval model
Portfolio Theory of Information Retrieval – p. 12/22
Portfolio Theory in IR (4)
What to be optimized?
1. Maximize the mean E[Rn] regardless of its variance
2. Minimize the variance V ar(Rn) regardless of its mean
3. Minimize the variance for a specified mean t
(parameter): min V ar(Rn), subject to E[Rn] = t
4. Maximize the mean for a specified variance h
(parameter): max E[Rn], subject to V ar(Rn) = h
5. Maximize the mean and minimize the variance by using
a specified risk preference parameter b:
max On = E[Rn] − bV ar(Rn)
Portfolio Theory of Information Retrieval – p. 13/22
Portfolio Theory in IR (5)
The Efficient Frontier:
0 0.05 0.1 0.15 0.2 0.25
2
2.5
3
3.5
4
4.5
5
5.5
6
Variance
ExpectedRelevance
Objective function: On = E[Rn] − bV ar(Rn) where b is a
parameter adjusting the risk level
Portfolio Theory of Information Retrieval – p. 14/22
Portfolio Theory in IR (6)
Our solution provides a mathematical model of rank
diversification
Suppose we have two documents. Their relevance
scores are 0.5 and 0.9 respectively
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
0.4
0.5
0.6
0.7
0.8
0.9
1
Relevance
Standard Deviation
ρ = −1
ρ = −0.5
ρ = 0
ρ = 1
Portfolio Theory of Information Retrieval – p. 15/22
Evaluations on Ad hoc Text Retrieval (1)
We experimented on Ad hoc and sub topic retrieval
Calculation of the Mean and Variance:
Mean: posterior mean of the chosen text retrieval
model
Covariance matrix:
- largely missing in IR modelling
- formally, should be determined by the second
moment of the relevance scores (model
parameters), e.g., applying the Bayesian paradigm
- can be approximated by the covariance with
respect to their term occurrences
Portfolio Theory of Information Retrieval – p. 16/22
Evaluations on Ad hoc Text Retrieval (2)
Impact of parameter b on different evaluation metrics
(a) Mean Reciprocal Rank (MRR) (b) Mean Average Precision (MAP)
(a) positive b: “invest” into different docs. increases the
chance of early returning the first rel. docs
(b) negative b: “invest” in “similar” docs (big variance)
might hurt the MRR but on average increases the
performance of the entire ranked list
Portfolio Theory of Information Retrieval – p. 17/22
Evaluations on Ad hoc Text Retrieval (3)
The impact of the parameter b on a risk-sensitive
metric, k-call [Chen and Karger(2006)]
10-call: ambitious, return 10 rel. docs.
1-call: conservative, return at least one rel. doc.
Positive b when k is small (1 and 2). diversifying
reduces the risk of not returning any rel docs.
Negative b as k increases. Taking risk increases the
chance of finding more rel. docs.
Portfolio Theory of Information Retrieval – p. 18/22
Evaluations on Ad hoc Text Retrieval (4)
Comparison with the PRP (Linear smoothing LM)
Measures CSIRO WT10g Robust Robust hard TREC8
MRR 0.869 0.558 0.592 0.393 0.589
0.843 0.492 0.549 0.352 0.472
+3.08% +13.41%* +7.83%* +11.65%* +24.79%*
MAP 0.41 0.182 0.204 0.084 0.212
0.347 0.157 0.185 0.078 0.198
+18.16%*+15.92%*+10.27%* +7.69%* +7.07%*
NDCG 0.633 0.433 0.421 0.271 0.452
0.587 0.398 0.396 0.252 0.422
+7.88%* +8.82%* +6.25%* +7.55%* +7.05%*
NDCG@10 0.185 0.157 0.175 0.081 0.149
0.170 0.141 0.169 0.078 0.140
+8.96%* +11.23%* +3.80% +3.90% +6.36%*
NDCG@100 0.377 0.286 0.314 0.169 0.305
0.355 0.262 0.292 0.159 0.287
+6.25%* +9.27%* +7.55%* +6.58%* +6.34%*
Portfolio Theory of Information Retrieval – p. 19/22
Evaluations on Ad hoc Text Retrieval (5)
Comparison with diversity-based reranking, the MMR
[Carbonell and Goldstein(1998)]
In each cell, the first line shows the performance of our approach, and the second line shows
the performance of the MMR method and gain of our method over the MMR method.
Models Dirichlet Jelinek-Mercer BM25
sub-MRR 0.014 0.011 0.009
0.012 (+16.67%*)0.009 (+22.22%*)0.007 (+28.57%*)
sub-Recall@5 0.324 0.255 0.275
0.304 (+6.58%*) 0.234 (+8.97%*) 0.27 (+1.85%)
sub-Recall@10 0.381 0.366 0.352
0.362 (+5.25%) 0.351 (+4.27%) 0.344 (+2.33%)
sub-Recall@20 0.472 0.458 0.464
0.455 (+3.74%) 0.41 (+11.71%*) 0.446 (+4.04%)
sub-Recall@100 0.563 0.582 0.577
0.558 (+0.90%) 0.55 (+5.82%*) 0.558 (+3.41%)
Portfolio Theory of Information Retrieval – p. 20/22
Conclusions
We have presented a new ranking theory for IR
The benefit of diversification is well quantified
Is it a unified theory to explain risk and reward in IR?
Portfolio Theory of Information Retrieval – p. 21/22
References
[Cooper(1971)] William S. Cooper. The inadequacy of probability of usefulness as a ranking
criterion for retrieval system output. University of California, Berkeley, 1971.
[Robertson(1977)] S. E. Robertson. The probability ranking principle in IR. Journal of
Documentation, pages 294–304, 1977.
[Gordon and Lenk(1991)] Michael D. Gordon and Peter Lenk. A utility theoretic examination
of the probability ranking principle in information retrieval. JASIS, 42(10):703–714, 1991.
[Maron and Kuhns(1960)] M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing
and information retrieval. J. ACM, 7(3), 1960.
[Stirling(1977)] Keith H. Stirling. The Effect of Document Ranking on Retrieval System
Performance: A Search for an Optimal Ranking Rule. PhD thesis, UC, Berkeley, 1977.
[Carbonell and Goldstein(1998)] Jaime Carbonell and Jade Goldstein. The use of MMR,
diversity-based reranking for reordering documents and producing summaries. In SIGIR,
1998.
[Chen and Karger(2006)] Harr Chen and David R. Karger. Less is more: probabilistic
models for retrieving fewer relevant documents. In SIGIR, 2006.
[Markowitz(1952)] H Markowitz. Portfolio selection. Journal of Finance, 1952.
[Järvelin and Kekäläinen(2002)] Kalervo Järvelin and Jaana Kekäläinen. Cumulated
gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 2002.
Portfolio Theory of Information Retrieval – p. 22/22

More Related Content

What's hot

Statistics for management
Statistics for managementStatistics for management
Statistics for managementVinay Aradhya
 
Lu2 introduction to statistics
Lu2 introduction to statisticsLu2 introduction to statistics
Lu2 introduction to statisticsLamineKaba6
 
Statistics Assignments 090427
Statistics Assignments 090427Statistics Assignments 090427
Statistics Assignments 090427amykua
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1TimKasse
 
The pertinent single-attribute-based classifier for small datasets classific...
The pertinent single-attribute-based classifier  for small datasets classific...The pertinent single-attribute-based classifier  for small datasets classific...
The pertinent single-attribute-based classifier for small datasets classific...IJECEIAES
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Edureka!
 
Optimizing transformation for linearity between online
Optimizing transformation for linearity between onlineOptimizing transformation for linearity between online
Optimizing transformation for linearity between onlineAlexander Decker
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualizationDr. Hamdan Al-Sabri
 
5. Llinking employers and employees responses
5. Llinking employers and employees responses5. Llinking employers and employees responses
5. Llinking employers and employees responsesBEYOND4.0
 
Mathematical Econometrics
Mathematical EconometricsMathematical Econometrics
Mathematical Econometricsjonren
 
2. Joint analysis - TNO
2. Joint analysis - TNO2. Joint analysis - TNO
2. Joint analysis - TNOBEYOND4.0
 
A second order confirmatory factor analysis of composite
A second order confirmatory factor analysis of compositeA second order confirmatory factor analysis of composite
A second order confirmatory factor analysis of compositeAlexander Decker
 
Statistics for managers
Statistics for managersStatistics for managers
Statistics for managerssonia gupta
 
Business Development Analysis
Business Development Analysis Business Development Analysis
Business Development Analysis Manpreet Chandhok
 
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentSMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentrahul kumar verma
 
IRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET Journal
 

What's hot (20)

Statistics for management
Statistics for managementStatistics for management
Statistics for management
 
Lu2 introduction to statistics
Lu2 introduction to statisticsLu2 introduction to statistics
Lu2 introduction to statistics
 
Statistics assignment help
Statistics assignment helpStatistics assignment help
Statistics assignment help
 
Statistics Assignments 090427
Statistics Assignments 090427Statistics Assignments 090427
Statistics Assignments 090427
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1
 
The pertinent single-attribute-based classifier for small datasets classific...
The pertinent single-attribute-based classifier  for small datasets classific...The pertinent single-attribute-based classifier  for small datasets classific...
The pertinent single-attribute-based classifier for small datasets classific...
 
Dmml report final
Dmml report finalDmml report final
Dmml report final
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
 
Optimizing transformation for linearity between online
Optimizing transformation for linearity between onlineOptimizing transformation for linearity between online
Optimizing transformation for linearity between online
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Machine learning meetup
Machine learning meetupMachine learning meetup
Machine learning meetup
 
5. Llinking employers and employees responses
5. Llinking employers and employees responses5. Llinking employers and employees responses
5. Llinking employers and employees responses
 
Mathematical Econometrics
Mathematical EconometricsMathematical Econometrics
Mathematical Econometrics
 
2. Joint analysis - TNO
2. Joint analysis - TNO2. Joint analysis - TNO
2. Joint analysis - TNO
 
AI Algorithms
AI AlgorithmsAI Algorithms
AI Algorithms
 
A second order confirmatory factor analysis of composite
A second order confirmatory factor analysis of compositeA second order confirmatory factor analysis of composite
A second order confirmatory factor analysis of composite
 
Statistics for managers
Statistics for managersStatistics for managers
Statistics for managers
 
Business Development Analysis
Business Development Analysis Business Development Analysis
Business Development Analysis
 
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentSMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
 
IRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET- Online Course Recommendation System
IRJET- Online Course Recommendation System
 

Viewers also liked

Weinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display Advertising
Weinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display AdvertisingWeinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display Advertising
Weinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display AdvertisingJun Wang
 
On Search, Personalisation and Real-time Advertising
On Search, Personalisation and Real-time AdvertisingOn Search, Personalisation and Real-time Advertising
On Search, Personalisation and Real-time AdvertisingJun Wang
 
A Brief Introduction of Real-time Bidding Display Advertising and Evaluation ...
A Brief Introduction of Real-time Bidding Display Advertising and Evaluation ...A Brief Introduction of Real-time Bidding Display Advertising and Evaluation ...
A Brief Introduction of Real-time Bidding Display Advertising and Evaluation ...Jun Wang
 
Wsdm17 value-at-risk-bidding
Wsdm17 value-at-risk-biddingWsdm17 value-at-risk-bidding
Wsdm17 value-at-risk-biddingJun Wang
 
Statistical Information Retrieval Modelling: from the Probability Ranking Pr...
Statistical Information Retrieval Modelling:  from the Probability Ranking Pr...Statistical Information Retrieval Modelling:  from the Probability Ranking Pr...
Statistical Information Retrieval Modelling: from the Probability Ranking Pr...Jun Wang
 
Deep Learning
Deep LearningDeep Learning
Deep LearningJun Wang
 

Viewers also liked (7)

Weinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display Advertising
Weinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display AdvertisingWeinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display Advertising
Weinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display Advertising
 
On Search, Personalisation and Real-time Advertising
On Search, Personalisation and Real-time AdvertisingOn Search, Personalisation and Real-time Advertising
On Search, Personalisation and Real-time Advertising
 
A Brief Introduction of Real-time Bidding Display Advertising and Evaluation ...
A Brief Introduction of Real-time Bidding Display Advertising and Evaluation ...A Brief Introduction of Real-time Bidding Display Advertising and Evaluation ...
A Brief Introduction of Real-time Bidding Display Advertising and Evaluation ...
 
Wsdm17 value-at-risk-bidding
Wsdm17 value-at-risk-biddingWsdm17 value-at-risk-bidding
Wsdm17 value-at-risk-bidding
 
Wsdm2015
Wsdm2015Wsdm2015
Wsdm2015
 
Statistical Information Retrieval Modelling: from the Probability Ranking Pr...
Statistical Information Retrieval Modelling:  from the Probability Ranking Pr...Statistical Information Retrieval Modelling:  from the Probability Ranking Pr...
Statistical Information Retrieval Modelling: from the Probability Ranking Pr...
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 

Similar to Portfolio Theory of Information Retrieval

Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisYabebal Ayalew
 
Collaborative Proposal Analysis
Collaborative Proposal AnalysisCollaborative Proposal Analysis
Collaborative Proposal AnalysisKristi Lucas
 
Model Reuse With Bike Rental Station
Model Reuse With Bike Rental StationModel Reuse With Bike Rental Station
Model Reuse With Bike Rental StationConnie Ripp
 
V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1warishali570
 
Performance evaluation of IR models
Performance evaluation of IR modelsPerformance evaluation of IR models
Performance evaluation of IR modelsNisha Arankandath
 
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...inventionjournals
 
Integration of Principal Component Analysis and Support Vector Regression fo...
 Integration of Principal Component Analysis and Support Vector Regression fo... Integration of Principal Component Analysis and Support Vector Regression fo...
Integration of Principal Component Analysis and Support Vector Regression fo...IJCSIS Research Publications
 
A Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality IndicatorsA Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality Indicatorsvie_dels
 
Design And Development Of Simulator For Risk Analysis
Design And Development Of Simulator For Risk AnalysisDesign And Development Of Simulator For Risk Analysis
Design And Development Of Simulator For Risk AnalysisLauren Barker
 
Quantitative Risk Assessment - Road Development Perspective
Quantitative Risk Assessment - Road Development PerspectiveQuantitative Risk Assessment - Road Development Perspective
Quantitative Risk Assessment - Road Development PerspectiveSUBIR KUMAR PODDER
 
An Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAn Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAkhil Goyal
 
Application of K-Means Clustering Algorithm for Classification of NBA Guards
Application of K-Means Clustering Algorithm for Classification of NBA GuardsApplication of K-Means Clustering Algorithm for Classification of NBA Guards
Application of K-Means Clustering Algorithm for Classification of NBA GuardsEditor IJCATR
 
Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27IJARIIE JOURNAL
 
Fuzzy formal concept analysis: Approaches, applications and issues
Fuzzy formal concept analysis: Approaches, applications and issuesFuzzy formal concept analysis: Approaches, applications and issues
Fuzzy formal concept analysis: Approaches, applications and issuesCSITiaesprime
 

Similar to Portfolio Theory of Information Retrieval (20)

Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
fma.ny.presentation
fma.ny.presentationfma.ny.presentation
fma.ny.presentation
 
Collaborative Proposal Analysis
Collaborative Proposal AnalysisCollaborative Proposal Analysis
Collaborative Proposal Analysis
 
Model Reuse With Bike Rental Station
Model Reuse With Bike Rental StationModel Reuse With Bike Rental Station
Model Reuse With Bike Rental Station
 
Missing Data imputation
Missing Data imputationMissing Data imputation
Missing Data imputation
 
V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1
 
Performance evaluation of IR models
Performance evaluation of IR modelsPerformance evaluation of IR models
Performance evaluation of IR models
 
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
 
FSRM 582 Project
FSRM 582 ProjectFSRM 582 Project
FSRM 582 Project
 
Integration of Principal Component Analysis and Support Vector Regression fo...
 Integration of Principal Component Analysis and Support Vector Regression fo... Integration of Principal Component Analysis and Support Vector Regression fo...
Integration of Principal Component Analysis and Support Vector Regression fo...
 
A Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality IndicatorsA Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality Indicators
 
Design And Development Of Simulator For Risk Analysis
Design And Development Of Simulator For Risk AnalysisDesign And Development Of Simulator For Risk Analysis
Design And Development Of Simulator For Risk Analysis
 
Base Stock Model Essay
Base Stock Model EssayBase Stock Model Essay
Base Stock Model Essay
 
Quantitative Risk Assessment - Road Development Perspective
Quantitative Risk Assessment - Road Development PerspectiveQuantitative Risk Assessment - Road Development Perspective
Quantitative Risk Assessment - Road Development Perspective
 
An Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAn Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing Theory
 
Application of K-Means Clustering Algorithm for Classification of NBA Guards
Application of K-Means Clustering Algorithm for Classification of NBA GuardsApplication of K-Means Clustering Algorithm for Classification of NBA Guards
Application of K-Means Clustering Algorithm for Classification of NBA Guards
 
G1803054653
G1803054653G1803054653
G1803054653
 
Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27
 
Fuzzy formal concept analysis: Approaches, applications and issues
Fuzzy formal concept analysis: Approaches, applications and issuesFuzzy formal concept analysis: Approaches, applications and issues
Fuzzy formal concept analysis: Approaches, applications and issues
 
Lecture_note1.pdf
Lecture_note1.pdfLecture_note1.pdf
Lecture_note1.pdf
 

Recently uploaded

Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 

Recently uploaded (20)

Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 

Portfolio Theory of Information Retrieval

  • 1. Portfolio Theory of Information Retrieval Jun Wang and Jianhan Zhu jun.wang@cs.ucl.ac.uk Department of Computer Science University College London, UK Portfolio Theory of Information Retrieval – p. 1/22
  • 2. Outline Research Problem An Analogy: Stock Selection In Financial Markets Portfolio Theory of Information Retrieval Evaluations on ad hoc text retrieval Conclusions Portfolio Theory of Information Retrieval – p. 2/22
  • 3. Two-stage Process in IR Stage 1 calculates the relevance between the given user information need (query) and each of the documents Producing a “best guess” at the relevance, e.g. the BM25 and the language modelling approaches Stage 2 presents (normally rank) relevant documents The probability ranking principle (PRP) states that “the system should rank documents in order of decreasing probability of relevance” [Cooper(1971)] Under certain assumptions, the overall effectiveness, e.g., expected Precision, is maximized [Robertson(1977)] Portfolio Theory of Information Retrieval – p. 3/22
  • 4. Ranking Under Uncertainty the PRP ignores [Gordon and Lenk(1991)]: there is uncertainty when we calculate relevance scores, e.g., due to limited sample size, and relevance scores of documents are correlated 0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 6 Document A Document B Uncertainty of the relevance scores Correlations of relevance scores Portfolio Theory of Information Retrieval – p. 4/22
  • 5. An Example Suppose we have query apple, and two classes of users U1: Apple_Computers and U2: Apple_Fruit; U1 has twice as many members as U2 An IR system retrieved three documents d1, d2 and d3; their probabilities of relevance are as follows: UserClass d1: Apple_Comp. d2: Apple_Comp. d3: Apple_Fruit Apple Computers 1 1 0 Apple Fruit 0 0 1 p(r) 2/3 2/3 1/3 The PRP is not good ({d1Apple_Comp., d2Apple_Comp., d3Apple_Fruit}) as user group U2 (Apple Fruit) has to reject two documents before reaching the one it wants [Robertson(1977)] Portfolio Theory of Information Retrieval – p. 5/22
  • 6. A Little History of Probabilistic Ranking 1960s [Maron and Kuhns(1960)] mentioned the two-stage process implicitly 1970s [Cooper(1971)] examined the PRP explicitly well discussed in [Robertson(1977)] and Stirling’s thesis [Stirling(1977)] 1991 [Gordon and Lenk(1991)] studied its limitations 1998 [Carbonell and Goldstein(1998)] proposed diversity-based reranking (MMR) 2006 The “less is more” model [Chen and Karger(2006)] maximize the probability of finding a relevant documents in top-n ranked list, where a < n Portfolio Theory of Information Retrieval – p. 6/22
  • 7. Our View of the Ranking Problem (1) We argue that ranking under uncertainty is not just about picking individual relevant documents, but about choosing the right combination of relevant document - the Portfolio Effect There is a similar scenario in financial markets: Two observations: The future returns of stocks cannot be estimated with absolute certainty The future returns are correlated Portfolio Theory of Information Retrieval – p. 7/22
  • 8. Our View of the Ranking Problem (2) The analogy: According to the PRP, one might first rank stocks and then choose the top-n most “profitable” stocks Such a principle that essentially maximizes the expected future return was, however, rejected by Markowitz in Modern Portfolio Theory [Markowitz(1952)] Portfolio Theory of Information Retrieval – p. 8/22
  • 9. Our View of the Ranking Problems (3) Markowitz’ approach is based on the analysis of the expected return (mean) of a portfolio and its variance (or standard deviation) of return. The latter serves as a measure of risk 4 6 8 10 12 14 0 0.5 1 1.5 2 2.5 3 Standard Deviation Return Efficient Frontier Google Coca− Cola 0 20 40 60 80 100 120 0 0.2 0.4 0.6 0.8 1 α (Risk Preference) PortfolioPercentage Google Coca− Cola Efficient Frontier Percentage in the Portfolio Portfolio Theory of Information Retrieval – p. 9/22
  • 10. Portfolio Theory in IR (1) Objective: find an optimal ranked list (consisting of n documents from rank 1 to n) that has the maximum effectiveness in response to the given information need Define effectiveness: consider the weighted average of the relevance scores in the ranked list: Rn ≡ n i=1 wiri where Rn denotes the overall relevance of a ranked list. Variable wi, where n i=1 wi = 1, differentiates the importance of rank positions. ri is the relevance score of a document in the list, where i = {1, ..., n}, for each of the rank positions Portfolio Theory of Information Retrieval – p. 10/22
  • 11. Portfolio Theory in IR (2) Weight wi is similar to the discount factors that have been applied to IR evaluation in order to penalize late-retrieved relevant documents [Järvelin and Kekäläinen(2002)] It can be easily shown that when w1 > w2... > wn, the maximum value of Rn gives the ranking order r1 > r2... > rn This follows immediately that maximizing R – by which the document with highest relevance score is retrieved first, the document with next highest is retrieved second, etc. – is equivalent to the PRP Portfolio Theory of Information Retrieval – p. 11/22
  • 12. Portfolio Theory in IR (3) During retrieval, the overall relevance Rn cannot be calculated with certainty Quantify a ranked list based on its expectation (mean E[Rn] ) and its variance (V ar(Rn)): E[Rn] = n i=1 wiE[ri] V ar(Rn) = n i=1 n j=1 wiwjci,j where ci,j is the (co)variance of the relevance scores between the two documents at position i and j. E[ri] is the expected relevance score, determined by a point estimate from the specific retrieval model Portfolio Theory of Information Retrieval – p. 12/22
  • 13. Portfolio Theory in IR (4) What to be optimized? 1. Maximize the mean E[Rn] regardless of its variance 2. Minimize the variance V ar(Rn) regardless of its mean 3. Minimize the variance for a specified mean t (parameter): min V ar(Rn), subject to E[Rn] = t 4. Maximize the mean for a specified variance h (parameter): max E[Rn], subject to V ar(Rn) = h 5. Maximize the mean and minimize the variance by using a specified risk preference parameter b: max On = E[Rn] − bV ar(Rn) Portfolio Theory of Information Retrieval – p. 13/22
  • 14. Portfolio Theory in IR (5) The Efficient Frontier: 0 0.05 0.1 0.15 0.2 0.25 2 2.5 3 3.5 4 4.5 5 5.5 6 Variance ExpectedRelevance Objective function: On = E[Rn] − bV ar(Rn) where b is a parameter adjusting the risk level Portfolio Theory of Information Retrieval – p. 14/22
  • 15. Portfolio Theory in IR (6) Our solution provides a mathematical model of rank diversification Suppose we have two documents. Their relevance scores are 0.5 and 0.9 respectively 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 0.4 0.5 0.6 0.7 0.8 0.9 1 Relevance Standard Deviation ρ = −1 ρ = −0.5 ρ = 0 ρ = 1 Portfolio Theory of Information Retrieval – p. 15/22
  • 16. Evaluations on Ad hoc Text Retrieval (1) We experimented on Ad hoc and sub topic retrieval Calculation of the Mean and Variance: Mean: posterior mean of the chosen text retrieval model Covariance matrix: - largely missing in IR modelling - formally, should be determined by the second moment of the relevance scores (model parameters), e.g., applying the Bayesian paradigm - can be approximated by the covariance with respect to their term occurrences Portfolio Theory of Information Retrieval – p. 16/22
  • 17. Evaluations on Ad hoc Text Retrieval (2) Impact of parameter b on different evaluation metrics (a) Mean Reciprocal Rank (MRR) (b) Mean Average Precision (MAP) (a) positive b: “invest” into different docs. increases the chance of early returning the first rel. docs (b) negative b: “invest” in “similar” docs (big variance) might hurt the MRR but on average increases the performance of the entire ranked list Portfolio Theory of Information Retrieval – p. 17/22
  • 18. Evaluations on Ad hoc Text Retrieval (3) The impact of the parameter b on a risk-sensitive metric, k-call [Chen and Karger(2006)] 10-call: ambitious, return 10 rel. docs. 1-call: conservative, return at least one rel. doc. Positive b when k is small (1 and 2). diversifying reduces the risk of not returning any rel docs. Negative b as k increases. Taking risk increases the chance of finding more rel. docs. Portfolio Theory of Information Retrieval – p. 18/22
  • 19. Evaluations on Ad hoc Text Retrieval (4) Comparison with the PRP (Linear smoothing LM) Measures CSIRO WT10g Robust Robust hard TREC8 MRR 0.869 0.558 0.592 0.393 0.589 0.843 0.492 0.549 0.352 0.472 +3.08% +13.41%* +7.83%* +11.65%* +24.79%* MAP 0.41 0.182 0.204 0.084 0.212 0.347 0.157 0.185 0.078 0.198 +18.16%*+15.92%*+10.27%* +7.69%* +7.07%* NDCG 0.633 0.433 0.421 0.271 0.452 0.587 0.398 0.396 0.252 0.422 +7.88%* +8.82%* +6.25%* +7.55%* +7.05%* NDCG@10 0.185 0.157 0.175 0.081 0.149 0.170 0.141 0.169 0.078 0.140 +8.96%* +11.23%* +3.80% +3.90% +6.36%* NDCG@100 0.377 0.286 0.314 0.169 0.305 0.355 0.262 0.292 0.159 0.287 +6.25%* +9.27%* +7.55%* +6.58%* +6.34%* Portfolio Theory of Information Retrieval – p. 19/22
  • 20. Evaluations on Ad hoc Text Retrieval (5) Comparison with diversity-based reranking, the MMR [Carbonell and Goldstein(1998)] In each cell, the first line shows the performance of our approach, and the second line shows the performance of the MMR method and gain of our method over the MMR method. Models Dirichlet Jelinek-Mercer BM25 sub-MRR 0.014 0.011 0.009 0.012 (+16.67%*)0.009 (+22.22%*)0.007 (+28.57%*) sub-Recall@5 0.324 0.255 0.275 0.304 (+6.58%*) 0.234 (+8.97%*) 0.27 (+1.85%) sub-Recall@10 0.381 0.366 0.352 0.362 (+5.25%) 0.351 (+4.27%) 0.344 (+2.33%) sub-Recall@20 0.472 0.458 0.464 0.455 (+3.74%) 0.41 (+11.71%*) 0.446 (+4.04%) sub-Recall@100 0.563 0.582 0.577 0.558 (+0.90%) 0.55 (+5.82%*) 0.558 (+3.41%) Portfolio Theory of Information Retrieval – p. 20/22
  • 21. Conclusions We have presented a new ranking theory for IR The benefit of diversification is well quantified Is it a unified theory to explain risk and reward in IR? Portfolio Theory of Information Retrieval – p. 21/22
  • 22. References [Cooper(1971)] William S. Cooper. The inadequacy of probability of usefulness as a ranking criterion for retrieval system output. University of California, Berkeley, 1971. [Robertson(1977)] S. E. Robertson. The probability ranking principle in IR. Journal of Documentation, pages 294–304, 1977. [Gordon and Lenk(1991)] Michael D. Gordon and Peter Lenk. A utility theoretic examination of the probability ranking principle in information retrieval. JASIS, 42(10):703–714, 1991. [Maron and Kuhns(1960)] M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and information retrieval. J. ACM, 7(3), 1960. [Stirling(1977)] Keith H. Stirling. The Effect of Document Ranking on Retrieval System Performance: A Search for an Optimal Ranking Rule. PhD thesis, UC, Berkeley, 1977. [Carbonell and Goldstein(1998)] Jaime Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998. [Chen and Karger(2006)] Harr Chen and David R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, 2006. [Markowitz(1952)] H Markowitz. Portfolio selection. Journal of Finance, 1952. [Järvelin and Kekäläinen(2002)] Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 2002. Portfolio Theory of Information Retrieval – p. 22/22