Slides by Aleksandr Chuklin and Maarten de Rijke, presented at the 2016 CIKM Conference. The authors propose a methodology for better evaluating searcher satisfaction and incorporating it into how search results are evaluated and ranked.
p.s. This document was originally published at https://www.researchgate.net/publication/309416715_Slides_Incorporating_Clicks_Attention_and_Satisfaction_into_a_SERP_Evaluation_Model
Incorporating Clicks, Attention and Satisfaction into a SERP Evaluation Model
1. Background Motivation Model & Metric Experimental Setup Results Summary
Incorporating Clicks, Attention and Satisfaction
into a SERP Evaluation Model
Aleksandr Chuklin¶,§ Maarten de Rijke§
chuklin@google.com derijke@uva.nl
¶Google Research Europe
§University of Amsterdam
AC–MdR Incorporating Clicks, Attention and Satisfaction. . . 1
Search Engine Result Page (SERP) Evaluation
Main problem
Combining relevance of individual SERP items ($R_k$) into a whole-page metric.
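Examples
Precision at N:
$$\mathrm{P@}N = \frac{1}{N} \sum_{k=1}^{N} R_k$$
Discounted Cumulative Gain (DCG):
$$\mathrm{DCG@}N = \sum_{k=1}^{N} \frac{1}{\log_2(1 + k)} \cdot R_k$$
Model-Based Metrics (Chuklin et al. 2013):
$$\mathrm{Utility@}N = \sum_{k=1}^{N} P(C_k = 1) \cdot R_k$$
[Figure: a ranked list of five results, shown in the order document 3, document 4, document 1, document 2, document 5]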
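The three metrics above can be sketched in a few lines of Python. This is only an illustration: the relevance labels and click probabilities below are made-up values, and the function names are hypothetical, not taken from the slides.

```python
import math

def precision_at_n(rels, n):
    """P@N: average relevance R_k over the top-n items."""
    return sum(rels[:n]) / n

def dcg_at_n(rels, n):
    """DCG@N: relevance discounted by 1 / log2(1 + k), with k starting at 1."""
    return sum(r / math.log2(1 + k) for k, r in enumerate(rels[:n], start=1))

def utility_at_n(click_probs, rels, n):
    """Model-based utility: relevance weighted by the click probability P(C_k = 1)."""
    return sum(p * r for p, r in zip(click_probs[:n], rels[:n]))

# Hypothetical binary relevance labels and click probabilities for five results.
rels = [1, 0, 1, 1, 0]
click_probs = [0.6, 0.3, 0.2, 0.1, 0.05]

p5 = precision_at_n(rels, 5)
dcg5 = dcg_at_n(rels, 5)
u5 = utility_at_n(click_probs, rels, 5)
```

The only difference between the three is the weight applied to each $R_k$: a constant $1/N$, a rank-based discount, or a click probability supplied by a click model.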
Main Goal of This Paper
Better measure for SERP utility
Namely, improve this (Chuklin et al. 2013):
$$\sum_{k=1}^{N} P(C_k = 1) \cdot R_k$$
Motivation 1: Non-Trivial Attention Patterns
[Figure: panel (c) Mouse Data]
Image credits: F. Diaz, R.W. White, G. Buscher, and D. Liebling. Robust models of mouse movement on dynamic
web search results pages. In CIKM, 2013. ACM Press
Motivation 2: Satisfaction Without Clicks
High direct page utility (measured by DCG or ERR) leads to higher abandonment rate (SERPs with no clicks)
[Figure: plot with a "direct page utility" axis]
Image credits: from A. Chuklin and P. Serdyukov. Good abandonments in factoid queries. In WWW, 2012.
Problems of Existing Models and Evaluation Metrics
existing models mostly do not model non-trivial user attention patterns
existing models do not use explicit user satisfaction data
Click Model
Examination assumption: click happens only when an item was examined and attractive:
$$P(C_k = 1) = P(E_k = 1) \cdot P(C_k = 1 \mid E_k = 1)$$
N.B. Here we assume that $P(C_k = 1 \mid E_k = 1) = \alpha(R_k)$ where $R_k$ comes from the raters and $\alpha$ is a logistic function.
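A minimal sketch of the examination assumption, assuming the attractiveness is a one-dimensional logistic function of a numeric rating. The `weight` and `bias` parameters are hypothetical placeholders; the slides do not give the fitted values or the exact parameterization.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def attractiveness(rating, weight=1.0, bias=-1.0):
    """alpha(R_k): logistic function of the rater-assigned relevance.
    weight and bias are made-up values, not parameters from the slides."""
    return logistic(weight * rating + bias)

def click_probability(p_exam, rating):
    """Examination assumption: P(C_k=1) = P(E_k=1) * P(C_k=1 | E_k=1)."""
    return p_exam * attractiveness(rating)

# An item examined with probability 0.5 and rated 1.0 by the raters.
p_click = click_probability(p_exam=0.5, rating=1.0)
```

An item that is never examined gets a click probability of zero regardless of how attractive it is, which is the point of the factorization.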
Attention (Examination) Model
Logistic regression model:
$$P(E_k = 1) = \varepsilon(\vec{\varphi}_k),$$
where $\vec{\varphi}_k$ is a vector of features for SERP item $k$.

| Feature group | Features | # of features |
|---|---|---|
| rank | user-perceived rank of the SERP item (can be different from $k$) | 1 |
| CSS classes | SERP item type (Web, News, Weather, Currency, Knowledge Panel, etc.) | 10 |
| geometry | offset from the top, first or second column (binary), width ($w$), height ($h$), $w \times h$ | 5 |
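The attention model can be illustrated with a hand-rolled logistic regression over a feature vector. The sketch below uses only three hypothetical features instead of the 16 listed in the table, and the weights are invented for the example; a trained model would learn them from mousing data.

```python
import math

def examination_probability(features, weights, bias):
    """P(E_k = 1) as a logistic function of a linear score over the item features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature vector: [user-perceived rank, is_second_column, height].
phi_top = [1.0, 0.0, 0.2]
phi_low = [8.0, 0.0, 0.2]

# Made-up parameters: attention drops with rank and in the second column,
# and grows with the height of the item.
weights = [-0.4, -1.0, 2.0]
bias = 1.0

p_top = examination_probability(phi_top, weights, bias)
p_low = examination_probability(phi_low, weights, bias)
```

With these assumed weights the item at rank 1 is more likely to be examined than the otherwise identical item at rank 8.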
Satisfaction Model
in previous models, satisfaction comes only from clicked results;
in our model it also comes from the SERP items that simply attracted attention;
$$P(S = 1) = \sigma(\tau_0 + U) = \sigma\Big(\tau_0 + \sum_k P(E_k = 1)\, u_d(D_k) + \sum_k P(C_k = 1)\, u_r(R_k)\Big)$$
where $D_k$ and $R_k$ are ratings assigned by the raters for direct snippet relevance and result relevance respectively. $u_d$ and $u_r$ are linear functions of rating histograms.
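As a rough numeric illustration of the satisfaction model, the sketch below computes the utility as the sum of an attention term and a click term and passes it through a sigmoid together with an intercept (named `tau0` here). All per-item numbers are hypothetical, and the utilities $u_d(D_k)$ and $u_r(R_k)$ are assumed to be already computed from the rating histograms.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cas_utility(p_exam, p_click, u_direct, u_result):
    """U = sum_k P(E_k=1) * u_d(D_k) + sum_k P(C_k=1) * u_r(R_k)."""
    attention_term = sum(pe * ud for pe, ud in zip(p_exam, u_direct))
    click_term = sum(pc * ur for pc, ur in zip(p_click, u_result))
    return attention_term + click_term

def satisfaction_probability(utility, tau0):
    """P(S = 1): sigmoid of the intercept plus the page utility."""
    return sigmoid(tau0 + utility)

# Hypothetical values for a three-item SERP.
p_exam = [0.9, 0.6, 0.3]      # P(E_k = 1)
p_click = [0.5, 0.2, 0.1]     # P(C_k = 1)
u_direct = [1.0, 0.0, 0.5]    # u_d(D_k), assumed precomputed
u_result = [2.0, 1.0, 0.0]    # u_r(R_k), assumed precomputed

U = cas_utility(p_exam, p_click, u_direct, u_result)
p_sat = satisfaction_probability(U, tau0=-1.0)
```

Note that the first item contributes to the utility through its snippet even if it is never clicked, which is how the model accounts for satisfaction without clicks.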
The CAS Metric
Utility that determines the satisfaction probability:
$$U = \underbrace{\sum_k P(E_k = 1)\, u_d(D_k)}_{\text{NEW}} + \underbrace{\sum_k P(C_k = 1)\, u_r(R_k)}_{\text{Chuklin et al. 2013}}$$
has an additional term
trained on mousing and satisfaction (in addition to clicks)
Dataset
199 queries with explicit unambiguous feedback (satisfied / not satisfied);
1,739 rated results
direct snippet relevance (D)
result relevance (R)
Baselines and CAS Model Variants
UBM: model that agrees well with online team-draft experimental outcomes;
PBM: position-based model, a robust model with fewer parameters than UBM;
random: model that predicts click and satisfaction with fixed probabilities (learned from the data).
uUBM: from Chuklin et al. 2013. Similar to UBM, but parameters are trained on a different and much bigger dataset.
CASnod is a stripped-down version that does not use (D) labels;
CASnosat is a version of the CAS model that does not include the satisfaction term while optimizing the model;
CASnoreg is a version of the CAS model that does not use regularization while training. All other models were trained with L2-regularization.
Is the New Metric Really New?
Correlation Between Metrics
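Table: Correlation between metrics measured by average Pearson's correlation coefficient.

| | CASnosat | CASnoreg | CAS | UBM | PBM | DCG | uUBM |
|---|---|---|---|---|---|---|---|
| CASnod | 0.593 | 0.564 | 0.633 | 0.470 | 0.487 | 0.546 | 0.441 |
| CASnosat | | 0.664 | 0.715 | 0.707 | 0.668 | 0.735 | 0.684 |
| CASnoreg | | | 0.974 | 0.363 | 0.379 | 0.417 | 0.341 |
| CAS | | | | 0.377 | 0.394 | 0.440 | 0.360 |
| UBM | | | | | 0.814 | 0.972 | 0.882 |
| PBM | | | | | | 0.906 | 0.965 |
| DCG | | | | | | | 0.943 |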
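For reference, each cell in a table like this is a Pearson correlation between the scores two metrics assign to the same set of SERPs. The sketch below computes one such coefficient from scratch; the score lists are hypothetical, and the averaging mentioned in the caption (how the data was split before averaging) is not reproduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Hypothetical per-SERP scores from two metrics; metric_b is exactly 2 * metric_a,
# so the two metrics rank and scale the pages identically up to a constant factor.
metric_a = [0.2, 0.5, 0.9, 0.4]
metric_b = [0.4, 1.0, 1.8, 0.8]

r = pearson(metric_a, metric_b)
```

Two metrics that are linear rescalings of each other correlate perfectly, so a low coefficient in the table indicates that a metric is capturing something the others do not.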
Is the New Metric Measuring the Right Thing?
Metric Correlation with True Satisfaction
[Figure: Pearson correlation coefficient between different model-based metrics and the user-reported satisfaction. Metrics shown: CASnod, CASnosat, CASnoreg, CAS, UBM, PBM, random, DCG, uUBM.]
Bonus Point
Log-Likelihood of Click Prediction
[Figure: Log-likelihood of the click data for CASnod, CASnosat, CASnoreg, CAS, UBM, PBM, random, uUBM.]
Note that uUBM was trained on a totally different dataset.
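The log-likelihood of click data can be sketched as a sum of Bernoulli log-probabilities over SERP items. This is an assumption about the general form only: the slides do not say whether the reported values are summed or averaged per SERP, and the predictions below are hypothetical.

```python
import math

def click_log_likelihood(pred_probs, clicks):
    """Sum over items of log P(observed click outcome) under the model."""
    total = 0.0
    for p, clicked in zip(pred_probs, clicks):
        total += math.log(p) if clicked else math.log(1.0 - p)
    return total

# Hypothetical model predictions P(C_k = 1) and observed clicks for two items.
pred = [0.5, 0.5]
clicks = [1, 0]

ll = click_log_likelihood(pred, clicks)
```

Values closer to zero indicate better click prediction; a model that assigns probability 0.5 to everything, as here, scores log(0.5) per item.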
Summary
A model-based metric needs to model satisfaction explicitly and use it for training.
Direct snippet relevance (D) is essential for predicting satisfaction.
The CAS metric is quite different from the previously used metrics, making it an interesting addition to TREC.
When used as a model, CAS consistently predicts user satisfaction with a relatively small penalty in click prediction.
Acknowledgments
All content represents the opinion of the authors which is not necessarily shared or endorsed by their respective
employers and/or sponsors.
Evaluating the User Model
Log-Likelihood of Satisfaction Prediction
[Figure: Log-likelihood of the satisfaction prediction for CASnod, CASnosat, CASnoreg, CAS, UBM, PBM, random, uUBM.]
Some models have log-likelihood below −0.8, hence there are no boxes for them.
Analyzing the Attention Features
CASrank is the model that only uses the rank to predict attention;
CASnogeom only uses the rank and SERP item type information and does not use geometry;
CASnoclass does not use the CSS class features (SERP item type).
[Figure: three panels comparing CASrank, CASnogeom, CASnoclass, CASnod and CAS: Pearson correlation with satisfaction; log-likelihood of clicks; log-likelihood of satisfaction.]
Heterogeneous SERPs
12% of the SERPs in our data are heterogeneous and our metric does well for them.
Table: Pearson correlation between utility of heterogeneous SERP and user-reported satisfaction.

| Model | Correlation |
|---|---|
| CAS | 0.60 |
| UBM | 0.38 |
| PBM | -0.05 |
| random | -0.39 |
| DCG | 0.24 |
| uUBM | -0.08 |
| CASrank | 0.15 |
| CASnogeom | -0.04 |
| CASclass | 0.27 |
| CASnod | -0.04 |
| CASnosat | 0.48 |
| CASnoreg | 0.67 |
Spammers
Some raters were filtered out as spammers, but there was still some natural disagreement:
Table: Filtered out workers and agreement scores for remaining workers.

| label | % of workers removed | % of ratings removed | Cohen's kappa | Krippendorff's alpha |
|---|---|---|---|---|
| (D) | 32% | 27% | 0.339 | 0.144 |
| (R) | 41% | 29% | 0.348 | 0.117 |
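To make the agreement scores easier to read, here is a minimal sketch of Cohen's kappa for two raters labeling the same items. The labels are hypothetical, and the slides do not describe how kappa was aggregated over many workers, so this shows only the basic two-rater definition.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labels independently according to their own label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical binary ratings from two raters on four items.
rater_a = [1, 1, 0, 0]
rater_b = [1, 0, 0, 0]

kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on three of four items, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate.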