Click Model-Based Information Retrieval Metrics
Aleksandr Chuklin (1,2)*, Pavel Serdyukov (1), Maarten de Rijke (2)
(1) Yandex, Moscow, Russia
(2) ISLA, University of Amsterdam, The Netherlands
SIGIR 2013
Dublin, Ireland
* Now at Google Switzerland
1 / 24
§ IR Metrics Overview
§ Click Model-Based Metrics
§ Analysis of the New Metrics
2 / 24
Classification of IR evaluation techniques
Offline Metrics
  Traditional            | Click Model-Based
  Precision              | uSDBN, ERR (Chapelle et al., 2009)
  nDCG, DCG              | EBU (Yilmaz et al., 2010), rrDBN
  MAP                    | uDCM, rrDCM
                         | uUBM
Online Experiments
  Absolute Metrics       | Interleaving
  MaxRR, MinRR, MeanRR   | Team-Draft Interleaving
  UCTR, QCTR             | Balanced Interleaving
  PLC                    |
3 / 24
Offline metrics
§ Fixed set of queries $Q$
§ Documents are assessed by human judges using graded relevance $R \in \{0, 1, \ldots, R_{\max}\}$

$$\mathrm{SystemQuality} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{Utility}(q)$$

§ Where Utility usually has the following form:

$$\mathrm{Utility}(q) = \sum_{i=1}^{N} \mathrm{decay}_i \cdot R(\mathrm{doc}_i)$$
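As a rough illustration of the two formulas above, the following sketch averages a decay-weighted utility over a query set. The logarithmic discount, the function names, and the toy relevance grades are assumptions made for the example; the slides do not fix a particular decay.

```python
import math


def utility(relevances, decay):
    """Utility(q) = sum_i decay_i * R(doc_i), with ranks i starting at 1."""
    return sum(decay(i) * r for i, r in enumerate(relevances, start=1))


def system_quality(judged_rankings, decay):
    """Average Utility(q) over the query set Q (a dict: query -> relevance grades by rank)."""
    total = sum(utility(rels, decay) for rels in judged_rankings.values())
    return total / len(judged_rankings)


def log_decay(i):
    # DCG-style discount; one common choice, assumed here for illustration only.
    return 1.0 / math.log2(i + 1)


# Hypothetical graded judgments for two queries, listed in ranked order.
rankings = {"q1": [2, 0, 1], "q2": [0, 1, 0]}
quality = system_quality(rankings, log_decay)
print(quality)
```

Passing the decay as a function keeps the same code usable for other position discounts.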
4 / 24
Click metrics: DBN
Example: DBN click model
(Chapelle and Zhang, 2009)
[Figure 1 (Chapelle and Zhang, 2009): The DBN used for click modeling, with nodes $E_{i-1}$, $E_i$, $E_{i+1}$, $C_i$, $A_i$, $S_i$ and parameters $a_u$, $s_u$. $C_i$ is the only observed variable.]
For each position $i$, the model defines one observed click variable and three hidden binary variables that model examination, perceived relevance, and actual relevance, respectively:
§ $C_i$: the user clicked the i-th document (observed)
§ $E_i$: the user examined the i-th document
§ $A_i$: the user was attracted by the i-th document
§ $S_i$: the user was satisfied by the i-th document
The following equations describe the model:
$$C_k = 1 \Leftrightarrow A_k = 1 \text{ and } E_k = 1$$
$$P(A_k = 1) = a_q(u_k)$$
$$P(S_k = 1 \mid C_k = 0) = 0$$
$$P(S_k = 1 \mid C_k = 1) = s_q(u_k)$$
$$E_{k+1} = 1 \Leftrightarrow E_k = 1 \text{ and } S_k = 0$$
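The equations above determine the marginal examination, click, and satisfaction probabilities at every rank. A minimal sketch follows, assuming that the first document is always examined (this is not stated on the slide); the function name and the toy parameter values are hypothetical.

```python
def dbn_probabilities(attractiveness, satisfaction):
    """Return a list of (P(E_k=1), P(C_k=1), P(S_k=1)) per rank.

    attractiveness[k] plays the role of a_q(u_k), satisfaction[k] of s_q(u_k).
    Assumption: P(E_1 = 1) = 1, i.e. the top document is always examined.
    """
    p_exam = 1.0
    result = []
    for a, s in zip(attractiveness, satisfaction):
        p_click = p_exam * a   # C_k = 1 <=> E_k = 1 and A_k = 1
        p_sat = p_click * s    # satisfaction is only possible after a click
        result.append((p_exam, p_click, p_sat))
        # E_{k+1} = 1 <=> E_k = 1 and S_k = 0; S_k = 1 implies E_k = 1,
        # so P(E_{k+1} = 1) = P(E_k = 1) - P(S_k = 1).
        p_exam -= p_sat
    return result


# Toy parameters for a two-document result list.
probs = dbn_probabilities([0.5, 0.5], [0.5, 0.5])
print(probs)
```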
5 / 24
Converting click model into metric
§ $a_q(u_k) \to a_q(R_k)$, $s_q(u_k) \to s_q(R_k)$
§ Compute the click probability $P(C_k = 1)$ and the satisfaction probability $P(S_k = 1)$
§ Use the following equations for utility-based and effort-based (reciprocal rank) metrics (similar to (Carterette, 2011)):

$$\mathrm{uMetric} = \sum_{k=1}^{N} P(C_k = 1) \cdot R_k \quad \text{(utility-based)}$$

$$\mathrm{rrMetric} = \sum_{k=1}^{N} P(S_k = 1) \cdot \frac{1}{k} \quad \text{(effort-based)}$$
Implementation:
https://github.com/varepsilon/clickmodels
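The recipe above can be sketched in a few lines. This is not the API of the linked repository; the function name and the parameter tables `ATTRACT` and `SATISFY` are hypothetical, and the sketch assumes the simplified DBN transitions from the previous slide with the first document always examined.

```python
def click_model_metrics(relevances, attract, satisfy):
    """Compute (uMetric, rrMetric) for one ranked list of relevance grades.

    attract and satisfy map a relevance grade R_k to a(R_k) and s(R_k).
    Assumption: P(E_1 = 1) = 1 and E_{k+1} = 1 <=> E_k = 1 and S_k = 0.
    """
    p_exam, u_metric, rr_metric = 1.0, 0.0, 0.0
    for k, rel in enumerate(relevances, start=1):
        p_click = p_exam * attract[rel]
        p_sat = p_click * satisfy[rel]
        u_metric += p_click * rel   # utility-based: P(C_k = 1) * R_k
        rr_metric += p_sat / k      # effort-based: P(S_k = 1) * 1/k
        p_exam -= p_sat
    return u_metric, rr_metric


# Hypothetical parameter tables for grades 0..2; in practice these would be
# estimated from click logs for each relevance grade.
ATTRACT = {0: 0.25, 1: 0.5, 2: 1.0}
SATISFY = {0: 0.0, 1: 0.5, 2: 0.5}

u, rr = click_model_metrics([2, 0, 1], ATTRACT, SATISFY)
print(u, rr)
```

Both metrics share the same pass over the ranking, which makes it easy to see that they differ only in what is accumulated at each rank.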
6 / 24
Click model-based metrics and their underlying models
                                     Derived metric
Underlying click model             | Utility-based | Effort-based
SDBN (Chapelle and Zhang, 2009)    | uSDBN         | ERR
DBN (Chapelle and Zhang, 2009)     | EBU           | rrDBN
DCM (Guo et al., 2009)             | uDCM          | rrDCM
UBM (Dupret and Piwowarski, 2008)  | uUBM          | –
Previous work:
§ ERR, uSDBN (Chapelle et al., 2009)
§ EBU (Yilmaz et al., 2010)
7 / 24
Evaluating the metrics
§ Correlation with other metrics
§ Correlation with click metrics
§ Correlation with interleaving
Hypothesis
Model-based metrics should be better correlated with online user
metrics.
8 / 24
Aspect one: comparison to other metrics
Table: TREC 2011 runs, Kendall tau correlation. Values higher than 0.9 are marked with an asterisk (boldface in the original slides).

            Precision2  DCG     ERR     uSDBN   EBU     rrDBN   uDCM    rrDCM   uUBM
Precision   0.649       0.841   0.597   0.730   0.568   0.397   0.562   0.442   0.537
Precision2  –           0.785   0.663   0.780   0.675   0.526   0.693   0.551   0.681
DCG         –           –       0.740   0.857   0.711   0.530   0.704   0.592   0.685
ERR         –           –       –       0.807   0.919*  0.754   0.902*  0.826   0.888
uSDBN       –           –       –       –       0.792   0.585   0.794   0.638   0.754
EBU         –           –       –       –       –       0.788   0.970*  0.822   0.930*
rrDBN       –           –       –       –       –       –       0.786   0.917*  0.807
uDCM        –           –       –       –       –       –       –       0.813   0.947*
rrDCM       –           –       –       –       –       –       –       –       0.841
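Each cell of the table compares how two metrics order the same set of runs. A minimal sketch of such a comparison is given below; it implements Kendall tau-a in plain Python, which is an assumption, since the slides do not say which tau variant or tie handling was used. The scores are made up.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equally long score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores of four runs under two metrics.
dcg_scores = [0.61, 0.55, 0.70, 0.42]
err_scores = [0.30, 0.33, 0.41, 0.20]
tau = kendall_tau(dcg_scores, err_scores)
print(tau)
```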
9 / 24
Model-based metrics
Hypothesis
Model-based metrics should be better correlated with online user
metrics.
10 / 24
Aspect two: absolute online metrics
Table: Pearson correlation between offline and absolute click metrics. Superscripts show statistically significant difference from ERR (first symbol) and EBU (second symbol): ▲ significantly higher, ▼ significantly lower, – no significant difference.

            MaxRR      MinRR      MeanRR     UCTR       PLC
Precision   -0.117     -0.163     -0.155     0.042      -0.027
Precision2  0.026      0.093      0.075      0.092      0.094
DCG         0.178      0.243      0.237      0.163      0.245
ERR         0.378      0.471      0.469      0.199      0.399
EBU         0.374      0.467      0.464      0.198      0.397
rrDBN       0.384^▲▲   0.475^▲▲   0.473^▲▲   0.194^▼▼   0.399^–▲
rrDCM       0.387^▲▲   0.478^▲▲   0.476^▲▲   0.194^▼▼   0.400^–▲
uSDBN       0.322^▼▼   0.412^▼▼   0.407^▼▼   0.206^▲▲   0.370^▼▼
uDCM        0.374^▼▼   0.466^▼▼   0.463^▼▼   0.198^––   0.396^▼▼
uUBM        0.377^–▲   0.469^▼▲   0.467^▼▲   0.198^––   0.398^–▲
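For reference, the Pearson correlation reported in the table can be computed from paired observations as sketched below. The function is a plain-Python textbook implementation, and the paired values are hypothetical; how the offline and online values were paired in the study is not spelled out on the slide.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired values: an offline metric and an online click metric.
offline = [0.2, 0.5, 0.4, 0.9]
online = [0.1, 0.4, 0.5, 0.8]
r = pearson(offline, online)
print(r)
```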
11 / 24
Aspect three: interleaving
Figure and excerpt from "Large Scale Validation and Analysis of Interleaved Search Evaluation", p. A:5:

      Input Ranking   Interleaved Rankings
                      Balanced            Team-Draft
Rank  A      B        A first   B first   AAA    BAA    ABA    ...
1     a      b        a         b         a^A    b^B    a^A
2     b      e        b         a         b^B    a^A    b^B
3     c      a        e         e         c^A    c^A    e^B
4     d      f        c         c         e^B    e^B    c^A
5     g      g        d         f         d^A    d^A    d^A
6     h      h        f         d         f^B    f^B    f^B
...

Fig. 1. Examples illustrating how Balanced and Team-Draft Interleaving combine input rankings A and B over different randomizations. Superscript for the Team-Draft interleavings indicates team membership.

Interleaving methods address these problems by merging the two rankings A and B into a single interleaved ranking I, which is presented to the user. The retrieval system observes clicks on the documents in I and attributes them to A, B, or both, depending on the origin of the document. The goal is to make the interleaving process and click attribution as "fair" as possible with respect to biases in user behavior (e.g. position bias [Joachims et al. 2007]), so that clicks in the interleaved ranking I can be interpreted as unbiased feedback for a paired comparison between A and B. The precise definition of "fair" varies for different interleaving methods, but all have the goal of equalizing the influence of biases on clicks in I for A and B. This equalization of behavioral biases is ...
12 / 24
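The Team-Draft columns of the figure above can be reproduced with a short sketch: in each round the team with fewer picks goes next, ties are broken by a coin flip, and the chosen team contributes its highest-ranked document not yet shown. The function name and the `rng` parameter are assumptions made for the example, not part of any published implementation.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=random):
    """Return (interleaved documents, team label per document)."""
    interleaved, teams = [], []
    count = {"A": 0, "B": 0}
    sources = {"A": ranking_a, "B": ranking_b}
    while len(interleaved) < length:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if count["A"] != count["B"]:
            order = ["A", "B"] if count["A"] < count["B"] else ["B", "A"]
        else:
            order = ["A", "B"] if rng.random() < 0.5 else ["B", "A"]
        for team in order:
            # Highest-ranked document of this team that is not yet shown.
            doc = next((d for d in sources[team] if d not in interleaved), None)
            if doc is not None:
                interleaved.append(doc)
                teams.append(team)
                count[team] += 1
                break
        else:
            break  # both rankings are exhausted
    return interleaved, teams


# Input rankings A and B from the figure; the outcome depends on the coin flips.
docs, teams = team_draft_interleave(list("abcdgh"), list("beafgh"), 6,
                                    rng=random.Random(0))
print(list(zip(docs, teams)))
```

Clicks on the interleaved list can then be credited to the team that contributed the clicked document.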
Interleaving vs. offline metrics
§ 10 Team-Draft Interleaving Experiments ∆i AB.
§ For each experiment compute $\mathrm{TdiSignal} = \frac{\mathrm{Win}_B}{\mathrm{Win}_A + \mathrm{Win}_B} - \frac{1}{2}$
§ Judged query-document pairs matched against click log giving set of queries $Q$ ($|Q| \sim 10^2 \ldots 10^3$); some documents may be unjudged (up to #unjudged docs per query)
§ For each metric compute:

$$\mathrm{MetricSignal} = \frac{1}{|Q'|} \sum_{q \in Q'} \left(\mathrm{Metric}_B(q) - \mathrm{Metric}_A(q)\right),$$

where $Q' = \{q \in Q \mid \mathrm{Metric}_B(q) \neq \mathrm{Metric}_A(q)\}$
§ Compare MetricSignal to TdiSignal using Pearson
Correlation (similar to (Radlinski and Craswell, 2010))
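The two signals defined on this slide can be sketched as follows. The function names, the toy win counts, and the per-query metric values are hypothetical, and returning 0.0 when no query distinguishes the systems is an assumption the slide does not cover.

```python
def tdi_signal(wins_a, wins_b):
    """TdiSignal = Win_B / (Win_A + Win_B) - 1/2 for one interleaving experiment."""
    return wins_b / (wins_a + wins_b) - 0.5


def metric_signal(metric_a, metric_b):
    """Mean of Metric_B(q) - Metric_A(q) over queries where the two values differ.

    metric_a and metric_b map each query to the metric value of system A or B.
    Assumption: returns 0.0 if the metric never distinguishes the systems.
    """
    diffs = [metric_b[q] - metric_a[q]
             for q in metric_a if metric_b[q] != metric_a[q]]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy data: B wins 60 of 100 decided impressions; q2 is a tie and is skipped.
online = tdi_signal(40, 60)
offline = metric_signal({"q1": 0.5, "q2": 0.25, "q3": 0.75},
                        {"q1": 1.0, "q2": 0.25, "q3": 0.5})
print(online, offline)
```

Collecting one such pair of signals per experiment gives the two sequences that are then compared with Pearson correlation.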
13 / 24
Interleaving vs. offline metrics
[Plot "Simple Metrics": correlation with interleaving (y-axis) against #unjudged (x-axis, 0 to 10), one curve per metric: Precision, Precision2, DCG, uSDBN, ERR, EBU, rrDBN, uDCM, rrDCM, uUBM.]
Figure: Unjudged documents considered irrelevant
14 / 24
Making use of unjudged documents
[Plot "Condensed Metrics": correlation with interleaving (y-axis) against #unjudged (x-axis, 0 to 10), one curve per metric: Precision, Precision2, DCG, uSDBN, ERR, EBU, rrDBN, uDCM, rrDCM, uUBM.]
Figure: Method by Sakai, T. Alternatives to Bpref. SIGIR'2007: unjudged documents skipped (result page is condensed)
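The two treatments of unjudged documents compared on these slides differ only in how the relevance list is built before a metric is applied. A minimal sketch, with hypothetical function names and toy judgments:

```python
def as_irrelevant(ranking, judgments):
    """Treat unjudged documents as irrelevant (grade 0), keeping their positions."""
    return [judgments.get(doc, 0) for doc in ranking]


def condense(ranking, judgments):
    """Skip unjudged documents, so judged documents move up to fill the gaps."""
    return [judgments[doc] for doc in ranking if doc in judgments]


# Toy example: d2 has no judgment.
ranking = ["d1", "d2", "d3", "d4"]
judgments = {"d1": 0, "d3": 2, "d4": 1}

simple = as_irrelevant(ranking, judgments)
condensed = condense(ranking, judgments)
print(simple, condensed)
```

Either list can then be passed to any rank-based metric; in the condensed list the relevant documents sit at higher ranks and therefore receive less position discount.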
15 / 24
Thresholds
§ Modify offline metric usage protocol. Introduce a threshold $\delta$:

$$\mathrm{MetricSignal} = \frac{1}{|Q_\delta|} \sum_{q \in Q_\delta} \left(\mathrm{Metric}_B(q) - \mathrm{Metric}_A(q)\right),$$

where $Q_\delta = \{q \in Q \mid |\mathrm{Metric}_B(q) - \mathrm{Metric}_A(q)| > \delta\}$
§ Choose a threshold to maximize correlation with interleaving
§ Use 5 experiments to tune thresholds and the other 5 experiments to test. Repeat for each possible 5/5 split (total $\binom{10}{5} = 252$ splits)
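A sketch of the thresholded signal and of the enumeration of 5/5 tune/test splits is shown below. The function names and the toy values are hypothetical, and returning 0.0 for an empty query set is an assumption; the actual threshold search (maximizing correlation on the tuning experiments) is left out.

```python
from itertools import combinations
from math import comb


def thresholded_metric_signal(metric_a, metric_b, delta):
    """Mean of Metric_B(q) - Metric_A(q) over queries with |difference| > delta.

    Assumption: returns 0.0 when no query passes the threshold.
    """
    diffs = [metric_b[q] - metric_a[q] for q in metric_a
             if abs(metric_b[q] - metric_a[q]) > delta]
    return sum(diffs) / len(diffs) if diffs else 0.0


def tune_test_splits(experiment_ids, tune_size=5):
    """Yield every (tune, test) partition with tune_size experiments for tuning."""
    ids = list(experiment_ids)
    for tune in combinations(ids, tune_size):
        test = tuple(e for e in ids if e not in tune)
        yield tune, test


splits = list(tune_test_splits(range(10)))
print(len(splits), comb(10, 5))

# Toy per-query values: the small difference on q2 is filtered out by delta = 0.1.
signal = thresholded_metric_signal({"q1": 0.5, "q2": 0.25, "q3": 0.75},
                                   {"q1": 1.0, "q2": 0.3125, "q3": 0.5},
                                   delta=0.1)
print(signal)
```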
16 / 24
Thresholds
[Plot "Thresholded Metrics": correlation with interleaving (y-axis) against #unjudged (x-axis, 0 to 10), one curve per metric: Precision, Precision2, DCG, uSDBN, ERR, EBU, rrDBN, uDCM, rrDCM, uUBM.]
17 / 24
Thresholds+condensation
[Plot "Thresholded Condensed Metrics": correlation with interleaving (y-axis) against #unjudged (x-axis, 0 to 10), one curve per metric: Precision, Precision2, DCG, uSDBN, ERR, EBU, rrDBN, uDCM, rrDCM, uUBM.]
18 / 24
All in one
[Four plots side by side: "Simple Metrics", "Condensed Metrics", "Thresholded Metrics", "Thresholded Condensed Metrics". Each shows correlation with interleaving (y-axis) against #unjudged (x-axis, 0 to 10), one curve per metric: Precision, Precision2, DCG, uSDBN, ERR, EBU, rrDBN, uDCM, rrDCM, uUBM.]
19 / 24
Summary
§ A recipe for turning a click model into a metric
§ Two families of metrics: utility-based and effort-based
§ Multi-aspect analysis of the metrics
20 / 24
Key results
§ Effort-based metrics are substantially different from
utility-based ones, even when based on the same user model
§ Model-based metrics show better agreement with
interleaving and deal better with unjudged documents
§ Using techniques such as condensation and thresholding we
can improve agreement with interleaving
21 / 24
What’s next?
§ Judging snippets. Drop the assumption that snippet
attractiveness is a function of document relevance as was
assumed by the click model-based metrics
§ Good abandonments. Modify any evaluation metric by
adding additional gain from the snippets that contain an
answer to the user’s information need
22 / 24
Bibliography
B. Carterette. System effectiveness, user models, and user utility:
a conceptual framework for investigation. In SIGIR, 2011.
O. Chapelle and Y. Zhang. A dynamic bayesian network click
model for web search ranking. In WWW. ACM, 2009.
O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected
reciprocal rank for graded relevance. In CIKM. ACM, 2009.
G. Dupret and B. Piwowarski. A user browsing model to predict
search engine click data from past observations. In SIGIR. ACM,
2008.
F. Guo, C. Liu, and Y. Wang. Efficient multiple-click models in
web search. In WSDM. ACM, 2009.
F. Radlinski and N. Craswell. Comparing the sensitivity of
information retrieval metrics. In SIGIR. ACM, 2010.
E. Yilmaz, M. Shokouhi, N. Craswell, and S. Robertson. Expected
browsing utility for web search evaluation. In CIKM. ACM, 2010.
24 / 24

PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 

Recently uploaded (20)

Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 

Click Model-Based Information Retrieval Metrics

  • 1. Click Model-Based Information Retrieval Metrics
      Aleksandr Chuklin* (1,2), Pavel Serdyukov (1), Maarten de Rijke (2)
      (1) Yandex, Moscow, Russia
      (2) ISLA, University of Amsterdam, The Netherlands
      SIGIR 2013, Dublin, Ireland
      * Now at Google Switzerland
  • 2. § IR Metrics Overview
      § Click Model-Based Metrics
      § Analysis of the New Metrics
  • 5. Classification of IR evaluation techniques
      Offline Metrics
        Traditional: Precision; nDCG, DCG; MAP
        Click Model-Based: uSDBN, ERR (Chapelle et al., 2009); EBU (Yilmaz et al., 2010), rrDBN; uDCM, rrDCM; uUBM
      Online Experiments
        Absolute Metrics: MaxRR, MinRR, MeanRR; UCTR, QCTR; PLC
        Interleaving: Team-Draft Interleaving; Balanced Interleaving
  • 6. Offline metrics
      § Fixed set of queries $Q$
      § Documents are assessed by human judges using graded relevance $R \in \{0, 1, \dots, R_{\max}\}$
        $\mathrm{SystemQuality} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{Utility}(q)$
      § where Utility usually has the following form:
        $\mathrm{Utility}(q) = \sum_{i=1}^{N} \mathrm{decay}_i \cdot R(\mathrm{doc}_i)$
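The averaging scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the toy judgments, and the logarithmic `dcg_decay` are assumptions chosen for the example (the slide leaves the decay function open).

```python
import math
from typing import Callable, Dict, List


def utility(relevances: List[int], decay: Callable[[int], float]) -> float:
    """Utility(q) = sum over ranks i = 1..N of decay_i * R(doc_i)."""
    return sum(decay(i) * r for i, r in enumerate(relevances, start=1))


def system_quality(judged: Dict[str, List[int]],
                   decay: Callable[[int], float]) -> float:
    """Average Utility(q) over the fixed query set Q."""
    return sum(utility(rels, decay) for rels in judged.values()) / len(judged)


def dcg_decay(i: int) -> float:
    # Assumption for this sketch: a DCG-style discount 1 / log2(i + 1).
    return 1.0 / math.log2(i + 1)


# Hypothetical graded judgments for two queries, listed in ranked order.
judged = {"q1": [2, 0, 1], "q2": [0, 1, 0]}
quality = system_quality(judged, dcg_decay)
```

Swapping in a different `decay` callable changes the metric without touching the averaging over queries.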
  • 7. Click metrics: DBN
      Example: DBN click model (Chapelle and Zhang, 2009)
      [Figure 1 from the paper: The DBN used for clicks modeling. $C_i$ is the only observed variable. Hidden binary variables model examination ($E_i$: did the user examine the url?), perceived relevance ($A_i$: was the user attracted by the url?), and actual relevance ($S_i$: was the user satisfied by the landing page?).]
      § $C_i$: user clicked i-th document
      § $E_i$: user examined i-th document
      § $A_i$: user was attracted by i-th document
      § $S_i$: user was satisfied by i-th document
      $C_k = 1 \Leftrightarrow A_k = 1 \text{ and } E_k = 1$
      $P(A_k = 1) = a_q(u_k)$
      $P(S_k = 1 \mid C_k = 0) = 0$
      $P(S_k = 1 \mid C_k = 1) = s_q(u_k)$
      $E_{k+1} = 1 \Leftrightarrow E_k = 1 \text{ and } S_k = 0$
  • 8. Converting click model into metric
      § $a_q(u_k) \to a_q(R_k)$, $s_q(u_k) \to s_q(R_k)$
      § Compute click probability $C_i$ and satisfaction probability $S_i$
      § Use the following equations for utility-based and effort-based (reciprocal rank) metrics (similar to (Carterette, 2011)):
        $\mathrm{uMetric} = \sum_{k=1}^{N} P(C_k = 1) \cdot R_k$ (utility-based)
        $\mathrm{rrMetric} = \sum_{k=1}^{N} P(S_k = 1) \cdot \frac{1}{k}$ (effort-based)
      Implementation: https://github.com/varepsilon/clickmodels
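As a rough illustration of this recipe, the sketch below derives both metric families from the DBN equations of the previous slide. It is not the code from the linked repository. It assumes that the first document is always examined ($E_1 = 1$), and the attractiveness and satisfaction tables passed in are hypothetical values rather than trained parameters.

```python
from typing import Dict, List, Tuple


def click_model_metrics(relevances: List[int],
                        attract: Dict[int, float],
                        satisfy: Dict[int, float]) -> Tuple[float, float]:
    """Return (uMetric, rrMetric) for one ranked list of relevance grades.

    attract[r] plays the role of a(R_k) and satisfy[r] of s(R_k).
    """
    p_exam = 1.0  # assumption: the user always examines rank 1
    u_metric = 0.0
    rr_metric = 0.0
    for k, r in enumerate(relevances, start=1):
        p_click = p_exam * attract[r]   # C_k = 1 iff A_k = 1 and E_k = 1
        p_sat = p_click * satisfy[r]    # S_k = 1 is only possible after a click
        u_metric += p_click * r         # utility-based term P(C_k = 1) * R_k
        rr_metric += p_sat / k          # effort-based term P(S_k = 1) / k
        p_exam -= p_sat                 # E_{k+1} = 1 iff E_k = 1 and S_k = 0
    return u_metric, rr_metric


# Hypothetical parameter tables indexed by relevance grade.
attract = {0: 0.0, 1: 1.0}
satisfy = {0: 0.0, 1: 0.5}
u, rr = click_model_metrics([1, 1], attract, satisfy)
```

With these toy tables the second document is examined only half of the time, because the user is satisfied by the first one with probability 0.5.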
  • 9. Click model-based metrics and their underlying models
      Underlying click model               Utility-based   Effort-based
      SDBN (Chapelle and Zhang, 2009)      uSDBN           ERR
      DBN (Chapelle and Zhang, 2009)       EBU             rrDBN
      DCM (Guo et al., 2009)               uDCM            rrDCM
      UBM (Dupret and Piwowarski, 2008)    uUBM            -
      Previous work:
      § ERR, uSDBN (Chapelle et al., 2009)
      § EBU (Yilmaz et al., 2010)
  • 10. Evaluating the metrics
      § Correlation with other metrics
      § Correlation with click metrics
      § Correlation with interleaving
      Hypothesis: Model-based metrics should be better correlated with online user metrics.
  • 11. Aspect one: comparison to other metrics
      Table: TREC 2011 runs, Kendall tau correlation. Values higher than 0.9 are marked with an asterisk (boldface on the original slide).
                  Precision2  DCG     ERR     uSDBN   EBU     rrDBN   uDCM    rrDCM   uUBM
      Precision   0.649       0.841   0.597   0.730   0.568   0.397   0.562   0.442   0.537
      Precision2  -           0.785   0.663   0.780   0.675   0.526   0.693   0.551   0.681
      DCG         -           -       0.740   0.857   0.711   0.530   0.704   0.592   0.685
      ERR         -           -       -       0.807   0.919*  0.754   0.902*  0.826   0.888
      uSDBN       -           -       -       -       0.792   0.585   0.794   0.638   0.754
      EBU         -           -       -       -       -       0.788   0.970*  0.822   0.930*
      rrDBN       -           -       -       -       -       -       0.786   0.917*  0.807
      uDCM        -           -       -       -       -       -       -       0.813   0.947*
      rrDCM       -           -       -       -       -       -       -       -       0.841
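Each cell of such a table compares how two metrics order the same set of runs. A minimal sketch of that computation follows, using the tau-a variant in which tied pairs count as neither concordant nor discordant; the slide does not say which variant or tie handling was used, and the per-run scores below are made up.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a between two score lists over the same runs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both metrics order runs i and j the same way
        elif sign < 0:
            discordant += 1   # the two metrics disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores of four runs under two different metrics.
scores_metric_1 = [0.31, 0.42, 0.57, 0.66]
scores_metric_2 = [0.20, 0.35, 0.30, 0.50]
tau = kendall_tau(scores_metric_1, scores_metric_2)
```

Here only the pair formed by the second and third runs is ordered differently by the two metrics, so five of the six pairs are concordant.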
  • 12. Model-based metrics
      Hypothesis: Model-based metrics should be better correlated with online user metrics.
  • 13. Aspect two: absolute online metrics
      Table: Pearson correlation between offline and absolute click metrics. Superscripts show statistically significant difference from ERR (first symbol) and EBU (second symbol): ▲ higher, ▼ lower, - no significant difference.
                  MaxRR     MinRR     MeanRR    UCTR      PLC
      Precision   -0.117    -0.163    -0.155    0.042     -0.027
      Precision2  0.026     0.093     0.075     0.092     0.094
      DCG         0.178     0.243     0.237     0.163     0.245
      ERR         0.378     0.471     0.469     0.199     0.399
      EBU         0.374     0.467     0.464     0.198     0.397
      rrDBN       0.384▲▲   0.475▲▲   0.473▲▲   0.194▼▼   0.399-▲
      rrDCM       0.387▲▲   0.478▲▲   0.476▲▲   0.194▼▼   0.400-▲
      uSDBN       0.322▼▼   0.412▼▼   0.407▼▼   0.206▲▲   0.370▼▼
      uDCM        0.374▼▼   0.466▼▼   0.463▼▼   0.198--   0.396▼▼
      uUBM        0.377-▲   0.469▼▲   0.467▼▲   0.198--   0.398-▲
  • 14. Aspect three: interleaving
      [Excerpt shown on the slide, from "Large Scale Validation and Analysis of Interleaved Search Evaluation":]
      Rank  Input rankings   Balanced             Team-Draft
            A     B          A first   B first    AAA    BAA    ABA   ...
      1     a     b          a         b          a^A    b^B    a^A
      2     b     e          b         a          b^B    a^A    b^B
      3     c     a          e         e          c^A    c^A    e^B
      4     d     f          c         c          e^B    e^B    c^A
      5     g     g          d         f          d^A    d^A    d^A
      6     h     h          f         d          f^B    f^B    f^B
      Fig. 1. Examples illustrating how Balanced and Team-Draft Interleaving combine input rankings A and B over different randomizations. Superscript for the Team-Draft interleavings indicates team membership.
      Interleaving methods address these problems by merging the two rankings A and B into a single interleaved ranking I, which is presented to the user. The retrieval system observes clicks on the documents in I and attributes them to A, B, or both, depending on the origin of the document. The goal is to make the interleaving process and click attribution as "fair" as possible with respect to biases in user behavior (e.g. position bias [Joachims et al. 2007]), so that clicks in the interleaved ranking I can be interpreted as unbiased feedback for a paired comparison between A and B. The precise definition of "fair" varies for different interleaving methods, but all have the goal of equalizing the influence of biases on clicks in I for A and B. This equalization of behavioral biases is ...
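The Team-Draft procedure illustrated in the excerpt can be sketched as follows. This follows the commonly described scheme (the team with fewer picks goes next, ties are broken by a coin toss, and each team contributes its highest-ranked document not yet shown). The function names and the fallback for an exhausted ranking are assumptions of this sketch, not details taken from the slides.

```python
import random
from typing import List, Optional, Sequence, Tuple


def team_draft_interleave(ranking_a: Sequence[str],
                          ranking_b: Sequence[str],
                          rng: Optional[random.Random] = None
                          ) -> Tuple[List[str], List[str]]:
    """Merge two rankings; return the interleaved list and each slot's team."""
    rng = rng or random.Random()
    interleaved: List[str] = []
    teams: List[str] = []
    count_a = count_b = 0
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        # The team with fewer picks goes next; a coin toss breaks ties.
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        source = ranking_a if a_turn else ranking_b
        doc = next((d for d in source if d not in interleaved), None)
        if doc is None:
            # Assumption: if a ranking is exhausted, the other team picks.
            a_turn = not a_turn
            source = ranking_a if a_turn else ranking_b
            doc = next(d for d in source if d not in interleaved)
        interleaved.append(doc)
        teams.append("A" if a_turn else "B")
        if a_turn:
            count_a += 1
        else:
            count_b += 1
    return interleaved, teams


def click_winner(teams: Sequence[str], clicked_ranks: Sequence[int]) -> str:
    """Credit each click (1-based rank) to the team that owns that slot."""
    clicks_a = sum(1 for r in clicked_ranks if teams[r - 1] == "A")
    clicks_b = len(clicked_ranks) - clicks_a
    if clicks_a == clicks_b:
        return "tie"
    return "A" if clicks_a > clicks_b else "B"


# Input rankings A and B from the figure above.
ranking_a = ["a", "b", "c", "d", "g", "h"]
ranking_b = ["b", "e", "a", "f", "g", "h"]
merged, owners = team_draft_interleave(ranking_a, ranking_b, random.Random(0))
```

When team A wins every coin toss, the first six slots reproduce the "AAA" column of the figure.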
  • 15. Interleaving vs. offline metrics
      § 10 Team-Draft Interleaving experiments $\Delta_i$ (A vs. B).
      § For each experiment compute
        $\mathrm{TdiSignal} = \frac{\mathrm{Win}_B}{\mathrm{Win}_A + \mathrm{Win}_B} - \frac{1}{2}$
      § Judged query-document pairs matched against click log giving set of queries $Q$ ($|Q| \sim 10^2 \dots 10^3$); some documents may be unjudged (up to #unjudged docs per query)
      § For each metric compute:
        $\mathrm{MetricSignal} = \frac{1}{|Q'|} \sum_{q \in Q'} \left( \mathrm{Metric}_B(q) - \mathrm{Metric}_A(q) \right)$,
        where $Q' = \{ q \in Q \mid \mathrm{Metric}_B(q) \neq \mathrm{Metric}_A(q) \}$
      § Compare MetricSignal to TdiSignal using Pearson correlation (similar to (Radlinski and Craswell, 2010))
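The protocol on this slide reduces to three small computations, sketched below in plain Python. The function names and the toy per-query scores are illustrative, and returning 0.0 when no query distinguishes the two systems is an assumption of this sketch.

```python
import math
from typing import Dict, Sequence


def tdi_signal(wins_a: int, wins_b: int) -> float:
    """TdiSignal = Win_B / (Win_A + Win_B) - 1/2."""
    return wins_b / (wins_a + wins_b) - 0.5


def metric_signal(metric_a: Dict[str, float], metric_b: Dict[str, float]) -> float:
    """Mean of Metric_B(q) - Metric_A(q) over queries where the two differ."""
    diffs = [metric_b[q] - metric_a[q]
             for q in metric_a if metric_b[q] != metric_a[q]]
    # Assumption: report 0.0 if Q' is empty.
    return sum(diffs) / len(diffs) if diffs else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical per-query metric values for systems A and B.
scores_a = {"q1": 0.5, "q2": 0.3, "q3": 0.4}
scores_b = {"q1": 0.9, "q2": 0.3, "q3": 0.6}
signal = metric_signal(scores_a, scores_b)  # q2 is excluded: the scores are equal
```

Computing one `tdi_signal` and one `metric_signal` per experiment yields two lists of ten values that can be passed to `pearson`.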
  • 16. Interleaving vs. offline metrics
      [Plot "Simple Metrics": correlation (y-axis) against #unjudged (x-axis, 0 to 10) for Precision, Precision2, DCG, uSDBN, ERR, EBU, rrDBN, uDCM, rrDCM, uUBM.]
      Figure: Unjudged documents considered irrelevant
  • 17. Making use of unjudged documents
      [Plot "Condensed Metrics": correlation (y-axis) against #unjudged (x-axis, 0 to 10) for Precision, Precision2, DCG, uSDBN, ERR, EBU, rrDBN, uDCM, rrDCM, uUBM.]
      Figure: Method by Sakai, T. Alternatives to Bpref. SIGIR 2007: unjudged documents skipped (result page is condensed)
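The difference between the two treatments of unjudged documents (counted as irrelevant on the previous slide, skipped here) can be made concrete with a short sketch. The function names, document ids, and judgments below are hypothetical.

```python
from typing import Dict, List, Sequence


def simple_relevances(ranking: Sequence[str],
                      judgments: Dict[str, int]) -> List[int]:
    """Unjudged documents are treated as irrelevant (grade 0)."""
    return [judgments.get(doc, 0) for doc in ranking]


def condensed_relevances(ranking: Sequence[str],
                         judgments: Dict[str, int]) -> List[int]:
    """Unjudged documents are skipped, so judged ones move up in rank."""
    return [judgments[doc] for doc in ranking if doc in judgments]


# Hypothetical result page in which d2 has no relevance judgment.
ranking = ["d1", "d2", "d3"]
judgments = {"d1": 0, "d3": 2}
simple = simple_relevances(ranking, judgments)        # [0, 0, 2]
condensed = condensed_relevances(ranking, judgments)  # [0, 2]
```

Either list can then be fed to any rank-based metric; in the condensed list the relevant document d3 sits at rank 2 instead of rank 3.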
  • 18. Thresholds
      § Modify offline metric usage protocol. Introduce a threshold $\delta$:
        $\mathrm{MetricSignal} = \frac{1}{|Q_\delta|} \sum_{q \in Q_\delta} \left( \mathrm{Metric}_B(q) - \mathrm{Metric}_A(q) \right)$,
        where $Q_\delta = \{ q \in Q \mid |\mathrm{Metric}_B(q) - \mathrm{Metric}_A(q)| > \delta \}$
      § Choose a threshold to maximize correlation with interleaving
      § Use 5 experiments to tune thresholds and 5 experiments to test. Repeat for each possible 5/5 split (total $C_{10}^{5} = 252$ splits)
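A small sketch of the thresholded signal and of the 5/5 split enumeration is given below. The names are illustrative, experiments are identified by the integers 0 to 9 for convenience, and returning 0.0 when no query passes the threshold is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, Iterator, Tuple


def thresholded_signal(metric_a: Dict[str, float],
                       metric_b: Dict[str, float],
                       delta: float) -> float:
    """Mean of Metric_B(q) - Metric_A(q) over queries with |difference| > delta."""
    diffs = [metric_b[q] - metric_a[q]
             for q in metric_a if abs(metric_b[q] - metric_a[q]) > delta]
    # Assumption: report 0.0 if no query passes the threshold.
    return sum(diffs) / len(diffs) if diffs else 0.0


def tune_test_splits(n_experiments: int = 10,
                     n_tune: int = 5
                     ) -> Iterator[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Yield every (tuning set, test set) partition of the experiment ids."""
    ids = range(n_experiments)
    for tune in combinations(ids, n_tune):
        test = tuple(i for i in ids if i not in tune)
        yield tune, test


# Hypothetical per-query metric values for systems A and B.
scores_a = {"q1": 0.50, "q2": 0.50, "q3": 0.50}
scores_b = {"q1": 0.52, "q2": 0.80, "q3": 0.30}
signal = thresholded_signal(scores_a, scores_b, delta=0.1)  # q1 is filtered out
n_splits = sum(1 for _ in tune_test_splits())
```

For each split, a threshold would be picked on the tuning experiments (for example by a grid search that maximizes the correlation with TdiSignal) and then evaluated on the held-out experiments.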
  • 19. Thresholds
      [Plot "Thresholded Metrics": correlation (y-axis) against #unjudged (x-axis, 0 to 10) for Precision, Precision2, DCG, uSDBN, ERR, EBU, rrDBN, uDCM, rrDCM, uUBM.]
  • 20. Thresholds+condensation
      [Plot "Thresholded Condensed Metrics": same axes and metrics.]
  • 21. All in one
      [Four plots side by side: "Simple Metrics", "Condensed Metrics", "Thresholded Metrics", "Thresholded Condensed Metrics"; same axes and metrics.]
  • 22. Summary
      § A recipe for turning a click model into a metric
      § Two families of metrics: utility-based and effort-based
      § Multi-aspect analysis of the metrics
  • 23. Key results
      § Effort-based metrics are substantially different from utility-based ones, even when based on the same user model
      § Model-based metrics show better agreement with interleaving and deal better with unjudged documents
      § Using techniques such as condensation and thresholding, we can improve agreement with interleaving
  • 24. What’s next?
      § Judging snippets. Drop the assumption, made by the click model-based metrics, that snippet attractiveness is a function of document relevance
      § Good abandonments. Modify any evaluation metric by adding gain from the snippets that contain an answer to the user’s information need
  • 26. Bibliography
      B. Carterette. System effectiveness, user models, and user utility: a conceptual framework for investigation. In SIGIR, 2011.
      O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. In WWW. ACM, 2009.
      O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In CIKM. ACM, 2009.
      G. Dupret and B. Piwowarski. A user browsing model to predict search engine click data from past observations. In SIGIR. ACM, 2008.
      F. Guo, C. Liu, and Y. Wang. Efficient multiple-click models in web search. In WSDM. ACM, 2009.
      F. Radlinski and N. Craswell. Comparing the sensitivity of information retrieval metrics. In SIGIR. ACM, 2010.
      E. Yilmaz, M. Shokouhi, N. Craswell, and S. Robertson. Expected browsing utility for web search evaluation. In CIKM. ACM, 2010.